Correcting for volunteer bias in GWAS increases SNP effect sizes and heritability estimates

Bibliographic Details
Published in Nature Communications Vol. 16; no. 1; pp. 3578 - 11
Main Authors van Alten, Sjoerd, Domingue, Benjamin W., Faul, Jessica, Galama, Titus, Marees, Andries T.
Format Journal Article
Language English
Published London Nature Publishing Group UK 15.04.2025
Nature Publishing Group
Nature Portfolio
ISSN 2041-1723
DOI 10.1038/s41467-025-58684-8

Summary: Selection bias in genome-wide association studies (GWASs) due to volunteer-based sampling (volunteer bias) is poorly understood. The UK Biobank (UKB), one of the largest and most widely used cohorts, is highly selected. Using inverse probability (IP) weights, we estimate inverse probability weighted GWAS (WGWAS) to correct GWAS summary statistics in the UKB for volunteer bias. Our IP weights were estimated using UK Census data – the largest source of population-representative data – made representative of the UKB’s sampling population. These weights have a substantial SNP-based heritability of 4.8% (s.e. 0.8%), suggesting they capture volunteer bias in GWAS. Across ten phenotypes, WGWAS yields larger SNP effect sizes, larger heritability estimates, and altered gene-set tissue expression, despite decreasing the effective sample size by 62% on average, compared to GWAS. The impact of volunteer bias on GWAS results varies by phenotype. Traits related to disease, health behaviors, and socioeconomic status are most affected. We recommend that GWAS consortia provide population weights for their data sets, or use population-representative samples. Genetic studies may be biased due to volunteer-based biobanks. Using UK Biobank, the authors apply inverse probability weighting based on UK Census data, finding that genome-wide association studies showed bias in SNP effect sizes, heritability, and gene-set tissue expression.
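
The summary describes two mechanics that a small simulation can make concrete: reweighting a selected sample by inverse participation probabilities, and the loss of effective sample size that weighting brings. The sketch below is illustrative only and is not the authors' pipeline. The toy population, the participation model, the function names `weighted_snp_effect` and `kish_effective_sample_size`, and the use of Kish's approximation for effective sample size are all assumptions made for this example. In particular, the participation probabilities are known exactly here, whereas the summary says the study estimated its weights from UK Census data.

```python
import numpy as np


def weighted_snp_effect(genotype, phenotype, weights):
    """Weighted least-squares slope of phenotype on genotype, with intercept."""
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    w = np.asarray(weights, dtype=float)
    g_centered = g - np.average(g, weights=w)
    y_centered = y - np.average(y, weights=w)
    return np.sum(w * g_centered * y_centered) / np.sum(w * g_centered**2)


def kish_effective_sample_size(weights):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w**2)


# Toy population; every value below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 200_000
true_beta = 0.1
genotype = rng.binomial(2, 0.3, size=n)  # allele counts 0/1/2
phenotype = true_beta * genotype + rng.normal(size=n)

# Participation depends on the phenotype, so volunteers are a selected sample.
p_participate = 1.0 / (1.0 + np.exp(-(-1.0 + phenotype)))
selected = rng.random(n) < p_participate

g_s, y_s = genotype[selected], phenotype[selected]
# Inverse probability weights; known exactly here, estimated in practice.
ip_weights = 1.0 / p_participate[selected]

beta_unweighted = weighted_snp_effect(g_s, y_s, np.ones(g_s.size))
beta_weighted = weighted_snp_effect(g_s, y_s, ip_weights)
n_eff = kish_effective_sample_size(ip_weights)

print(f"unweighted beta:  {beta_unweighted:.3f}")
print(f"IP-weighted beta: {beta_weighted:.3f}")
print(f"selected n: {g_s.size}, Kish effective n: {n_eff:.0f}")
```

Unequal weights always give a Kish effective sample size below the number of selected individuals, which is the trade-off the summary points to: weighting targets the population-level effect at the cost of precision.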