High-dimensional multi-trait GWAS by reverse prediction of genotypes

Bibliographic Details
Main Authors	Malik, Muhammad Ammar; Ludl, Adriaan-Alexander; Michoel, Tom
Format	Journal Article
Language	English
Published	29.10.2021
Subjects	Computer Science - Learning; Quantitative Biology - Genomics; Quantitative Biology - Quantitative Methods; Statistics - Methodology
DOI	10.48550/arxiv.2111.00108

Summary:	Multi-trait genome-wide association studies (GWAS) use multivariate statistical methods to identify associations between genetic variants and multiple correlated traits simultaneously, and have higher statistical power than independent univariate analyses of traits. Reverse regression, where genotypes of genetic variants are regressed on multiple traits simultaneously, has emerged as a promising approach to perform multi-trait GWAS in high-dimensional settings where the number of traits exceeds the number of samples. We analyzed different machine learning methods (ridge regression, naive Bayes/independent univariate, random forests and support vector machines) for reverse regression in multi-trait GWAS, using genotypes, gene expression data and ground-truth transcriptional regulatory networks from the DREAM5 SysGen Challenge and from a cross between two yeast strains to evaluate the methods. We found that genotype prediction performance, measured by root mean squared error (RMSE), made it possible to distinguish between genomic regions with high and low transcriptional activity. Moreover, model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-eQTL target genes, with complementary findings across methods. Code to reproduce the analysis is available at https://github.com/michoel-lab/Reverse-Pred-GWAS
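
The reverse-regression idea in the summary can be illustrated with a minimal sketch. It is not the authors' code (that lives in the repository linked above), and it uses simulated data rather than the DREAM5 or yeast datasets. The sample size, number of traits, effect size, penalty `alpha`, and all function names are assumptions chosen for illustration. One variant's genotype is regressed on all traits with ridge regression, held-out RMSE is compared between a variant that affects some traits and one that affects none, and the largest coefficients are checked against the simulated target traits.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Ridge regression solved in dual form, which only needs an
    n-by-n system and so suits the case of more traits than samples."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n = Xc.shape[0]
    dual = np.linalg.solve(Xc @ Xc.T + alpha * np.eye(n), yc)
    coef = Xc.T @ dual
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def reverse_predict(traits, genotype, n_train, alpha=10.0):
    """Fit genotype ~ traits on the first n_train samples and
    return the held-out RMSE together with the trait coefficients."""
    coef, intercept = ridge_fit(traits[:n_train], genotype[:n_train], alpha)
    pred = traits[n_train:] @ coef + intercept
    return rmse(genotype[n_train:], pred), coef

# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_samples, n_traits, n_targets = 120, 300, 10
genotype_linked = rng.integers(0, 2, n_samples).astype(float)  # affects traits
genotype_null = rng.integers(0, 2, n_samples).astype(float)    # affects nothing
traits = rng.normal(size=(n_samples, n_traits))
traits[:, :n_targets] += 1.5 * genotype_linked[:, None]        # first 10 traits are targets

rmse_linked, coef_linked = reverse_predict(traits, genotype_linked, n_train=80)
rmse_null, _ = reverse_predict(traits, genotype_null, n_train=80)

# Traits with the largest absolute coefficients are candidate targets.
top = np.argsort(-np.abs(coef_linked))[:n_targets]
n_recovered = len(set(top.tolist()) & set(range(n_targets)))

print(f"held-out RMSE, variant with targets:    {rmse_linked:.3f}")
print(f"held-out RMSE, variant without targets: {rmse_null:.3f}")
print(f"simulated targets among top {n_targets} coefficients: {n_recovered}")
```

In this toy setting a lower held-out RMSE marks the variant whose genotype is reflected in the traits, and the coefficient ranking points to the traits it affects. The other methods named in the summary (random forests, support vector machines) could be swapped in for `ridge_fit` under the same train/test split.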