Regularized multi-trait multi-locus linear mixed models for genome-wide association studies and genomic selection in crops


Bibliographic Details
Published in	BMC bioinformatics Vol. 24; no. 1; pp. 1-15
Main Authors	Lozano, Aurélie C.; Ding, Hantian; Abe, Naoki; Lipka, Alexander E.
Format	Journal Article
Language	English
Published	London: BioMed Central, 26.10.2023; Springer Nature B.V.; BMC
Subjects	Agronomy; Algorithms; Bioinformatics; Biomedical and Life Sciences; Clustering; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Environmental effects; Estimates; Estimation; Gene loci; Genetic diversity; Genetic variance; Genome-wide association studies; Genomes; Genomics; Genotypes; GWAS and genomic selection in plants; Life Sciences; Microarrays; Multi-trait multi-locus linear mixed model; Normal distribution; Plant breeding; Population structure; Prediction models; Quantitative genetics; Regularization; Single-nucleotide polymorphism; Sorghum
ISSN	1471-2105
DOI	10.1186/s12859-023-05519-2

Summary:	Background: We consider two key problems in genomics involving multiple traits: multi-trait genome-wide association studies (GWAS), where the goal is to detect genetic variants associated with the traits; and multi-trait genomic selection (GS), where the emphasis is on accurately predicting trait values. Multi-trait linear mixed models build on the linear mixed model to jointly model multiple traits. Existing estimation methods, however, are limited to the joint analysis of a small number of genotypes; in fact, most approaches consider one SNP at a time. Estimating multi-dimensional genetic and environmental effects also results in considerable computational burden. Efficient approaches that incorporate regularization into multi-trait linear models (no random effects) have recently been proposed to identify genomic loci associated with multiple traits (Yu et al. in Multitask learning using task clustering with applications to predictive modeling and GWAS of plant varieties. arXiv:1710.01788, 2017; Yu et al. in Front Big Data 2:27, 2019), but these ignore population structure and familial relatedness (Yu et al. in Nat Genet 38:203–208, 2006). Results: This work addresses this gap by proposing a novel class of regularized multi-trait linear mixed models, along with scalable approaches for estimation in the presence of high-dimensional genotypes and a large number of traits. We evaluate the effectiveness of the proposed methods using datasets from maize and sorghum diversity panels, and demonstrate benefits both in achieving high prediction accuracy in GS and in identifying relevant marker-trait associations. Conclusions: The proposed regularized multivariate linear mixed models are relevant for both GWAS and GS. We hope that they will facilitate agronomy-related research in plant biology and crop breeding endeavors.