High-dimensional multi-trait GWAS by reverse prediction of genotypes
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 29.10.2021 |
DOI | 10.48550/arxiv.2111.00108 |
Summary: | Multi-trait genome-wide association studies (GWAS) use multivariate
statistical methods to identify associations between genetic variants and
multiple correlated traits simultaneously, and have higher statistical power
than independent univariate analyses of traits. Reverse regression, where
genotypes of genetic variants are regressed on multiple traits simultaneously,
has emerged as a promising approach to perform multi-trait GWAS in
high-dimensional settings where the number of traits exceeds the number of
samples. We analyzed different machine learning methods (ridge regression,
naive Bayes/independent univariate, random forests and support vector machines)
for reverse regression in multi-trait GWAS, using genotypes, gene expression
data and ground-truth transcriptional regulatory networks from the DREAM5
SysGen Challenge and from a cross between two yeast strains to evaluate
methods. We found that genotype prediction performance, measured by root mean
squared error (RMSE), made it possible to distinguish between genomic regions with high
and low transcriptional activity. Moreover, model feature coefficients
correlated with the strength of association between variants and individual
traits, and were predictive of true trans-eQTL target genes, with complementary
findings across methods. Code to reproduce the analysis is available at
https://github.com/michoel-lab/Reverse-Pred-GWAS |
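
To illustrate the reverse regression idea described in the summary, the following is a minimal Python sketch. It is not the authors' code, which is available at the repository linked above. It assumes NumPy, binary 0/1 genotype coding, and fully synthetic toy data. The function names `ridge_fit` and `cv_rmse`, the penalty `lam`, the sample and trait counts, and the effect size are all hypothetical choices made for this example. The sketch regresses the genotype of one variant on all traits at once with ridge regression, scores the variant by cross-validated RMSE, and then ranks traits by the size of their coefficients.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients via the dual form, which only needs an
    n_samples x n_samples solve and so suits n_traits > n_samples."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha  # one coefficient per trait


def cv_rmse(traits, genotype, lam=10.0, n_folds=5):
    """Cross-validated RMSE of predicting one variant's genotype
    from all traits simultaneously (reverse regression)."""
    n = traits.shape[0]
    preds = np.empty(n)
    for test_idx in np.array_split(np.arange(n), n_folds):
        train = np.setdiff1d(np.arange(n), test_idx)
        # Center with training-fold statistics only.
        mu_x = traits[train].mean(axis=0)
        mu_y = genotype[train].mean()
        w = ridge_fit(traits[train] - mu_x, genotype[train] - mu_y, lam)
        preds[test_idx] = (traits[test_idx] - mu_x) @ w + mu_y
    return float(np.sqrt(np.mean((genotype - preds) ** 2)))


# --- Synthetic toy data (assumption: not taken from the paper) ---
rng = np.random.default_rng(0)
n_samples, n_traits = 60, 200      # more traits than samples
target_idx = np.arange(20)         # traits affected by the "active" variant

g_active = rng.integers(0, 2, n_samples).astype(float)    # influences 20 traits
g_inactive = rng.integers(0, 2, n_samples).astype(float)  # influences none

traits = rng.normal(size=(n_samples, n_traits))
traits[:, target_idx] += 2.0 * g_active[:, None]  # hypothetical effect size

# A variant with many target traits should be easier to predict.
rmse_active = cv_rmse(traits, g_active)
rmse_inactive = cv_rmse(traits, g_inactive)
print(f"RMSE, active variant:   {rmse_active:.3f}")
print(f"RMSE, inactive variant: {rmse_inactive:.3f}")

# Rank traits by absolute coefficient to nominate candidate targets.
w_full = ridge_fit(traits - traits.mean(axis=0),
                   g_active - g_active.mean(), lam=10.0)
top_traits = np.argsort(-np.abs(w_full))[:20]
n_recovered = len(set(top_traits.tolist()) & set(target_idx.tolist()))
print(f"Simulated targets among top 20 coefficients: {n_recovered}")
```

In this toy setting a lower RMSE marks the variant that influences many traits, and the largest coefficients point to the simulated targets. The other methods named in the summary (naive Bayes/independent univariate, random forests, support vector machines) could be substituted for `ridge_fit` in the same loop.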