PCA-based population structure inference with generic clustering algorithms
Background Handling genotype data typed at hundreds of thousands of loci is very time-consuming and it is no exception for population structure inference. Therefore, we propose to apply PCA to the genotype data of a population, select the significant principal components using the Tracy-Widom distri...
        Saved in:
      
    
          | Published in | BMC bioinformatics Vol. 10; no. Suppl 1; p. S73 | 
|---|---|
| Main Authors | , , | 
| Format | Journal Article | 
| Language | English | 
| Published | 
        London
          BioMed Central
    
        30.01.2009
     BMC  | 
| Subjects | |
| Online Access | Get full text | 
| ISSN | 1471-2105 1471-2105  | 
| DOI | 10.1186/1471-2105-10-S1-S73 | 
Cover
| Summary: | Background
Handling genotype data typed at hundreds of thousands of loci is very time-consuming and it is no exception for population structure inference. Therefore, we propose to apply PCA to the genotype data of a population, select the significant principal components using the Tracy-Widom distribution, and assign the individuals to one or more subpopulations using generic clustering algorithms.
Results
We investigated K-means, soft K-means and spectral clustering and made comparison to STRUCTURE, a model-based algorithm specifically designed for population structure inference. Moreover, we investigated methods for predicting the number of subpopulations in a population. The results on four simulated datasets and two real datasets indicate that our approach performs comparably well to STRUCTURE. For the simulated datasets, STRUCTURE and soft K-means with BIC produced identical predictions on the number of subpopulations. We also showed that, for real dataset, BIC is a better index than likelihood in predicting the number of subpopulations.
Conclusion
Our approach has the advantage of being fast and scalable, while STRUCTURE is very time-consuming because of the nature of MCMC in parameter estimation. Therefore, we suggest choosing the proper algorithm based on the application of population structure inference. | 
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23  | 
| ISSN: | 1471-2105 1471-2105  | 
| DOI: | 10.1186/1471-2105-10-S1-S73 |