A high‐performance SNP panel developed by machine‐learning approaches for characterizing genetic differences of Southern and Northern Han Chinese, Korean, and Japanese individuals

Bibliographic Details
Published in: Electrophoresis, Vol. 43, no. 11, pp. 1183–1192
Main authors: Gu, Jia‐Qi; Zhao, Hui; Guo, Xiao‐Yuan; Sun, Hao‐Yun; Xu, Jing‐Yi; Wei, Yi‐Liang
Format: Journal Article
Language: English
Published: Germany: Wiley Subscription Services, Inc., 01.06.2022
ISSN: 0173-0835; 1522-2683
DOI: 10.1002/elps.202100184

Summary: Population stratification analyses of genetically closely related East Asians have revealed distinguishable differentiation between Han Chinese, Korean, and Japanese individuals, as well as between southern (S‐) and northern (N‐) Han Chinese. Previous studies offer a number of candidate sets of ancestry informative single nucleotide polymorphisms (AISNPs) for discriminating East‐Asian populations. In this study, we collected 1185 AISNPs and examined their efficiency using frequency and genotype data from various publicly available databases. To perform fine‐scale classification of S‐Han, N‐Han, Korean, and Japanese subjects, we used machine‐learning methods (Softmax and Random Forest) to screen a panel of highly informative AISNPs and to develop a superior classification model. Stepwise classification was implemented during AISNP selection to increase and balance discrimination: the first step discriminates Han, Korean, and Japanese individuals, and the second characterizes stratification between S‐Han and N‐Han. The final 272‐AISNP panel, an alternative optimization of various previous panels, provides reliable classification of the four East‐Asian groups with >90% accuracy. This AISNP panel and the machine‐learning model could be a useful and superior choice for medical genome‐wide association studies and for forensic investigations involving suspects of unknown identity.
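The stepwise scheme in the summary (first Han vs. Korean vs. Japanese, then S‐Han vs. N‐Han among individuals assigned to Han) can be illustrated with a minimal sketch. This is not the authors' model: it assumes genotypes coded as 0/1/2 allele counts, uses a pure-Python softmax (multinomial logistic) regression trained by stochastic gradient descent, lets both stages read the same SNP vector, and omits the Random Forest component and the AISNP screening step. The class names, learning rate, epoch count, and four-SNP toy genotypes are hypothetical and are not data from the study.

```python
import math

# Hypothetical label sets for the two stages (illustrative names only).
STAGE1 = ("Han", "Korean", "Japanese")
STAGE2 = ("S-Han", "N-Han")


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxRegression:
    """Multinomial logistic regression on genotypes coded as 0/1/2 allele counts."""

    def __init__(self, n_snps, classes, lr=0.1, epochs=200):
        self.classes = list(classes)
        self.lr = lr          # assumed learning rate
        self.epochs = epochs  # assumed number of passes over the data
        # One weight vector per class; the extra last weight is a bias term.
        self.w = [[0.0] * (n_snps + 1) for _ in self.classes]

    def predict_proba(self, genotype):
        x = list(genotype) + [1.0]
        return softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in self.w])

    def fit(self, genotypes, labels):
        # Plain stochastic gradient descent on the cross-entropy loss.
        for _ in range(self.epochs):
            for genotype, label in zip(genotypes, labels):
                p = self.predict_proba(genotype)
                x = list(genotype) + [1.0]
                for k, cls in enumerate(self.classes):
                    # d(loss)/d(score_k) = p_k - 1[class k is the true label]
                    grad = p[k] - (1.0 if cls == label else 0.0)
                    self.w[k] = [wi - self.lr * grad * xi for wi, xi in zip(self.w[k], x)]
        return self

    def predict(self, genotype):
        p = self.predict_proba(genotype)
        return self.classes[max(range(len(p)), key=lambda k: p[k])]


class StepwiseClassifier:
    """Stage 1: Han vs. Korean vs. Japanese; stage 2: S-Han vs. N-Han (Han only)."""

    def __init__(self, n_snps):
        self.stage1 = SoftmaxRegression(n_snps, STAGE1)
        self.stage2 = SoftmaxRegression(n_snps, STAGE2)

    def fit(self, genotypes, labels):
        # Collapse S-Han and N-Han into a single "Han" class for stage 1.
        coarse = ["Han" if lab in STAGE2 else lab for lab in labels]
        self.stage1.fit(genotypes, coarse)
        # Stage 2 is trained only on Han individuals.
        han = [(g, lab) for g, lab in zip(genotypes, labels) if lab in STAGE2]
        self.stage2.fit([g for g, _ in han], [lab for _, lab in han])
        return self

    def predict(self, genotype):
        coarse = self.stage1.predict(genotype)
        return self.stage2.predict(genotype) if coarse == "Han" else coarse


if __name__ == "__main__":
    # Toy, noise-free genotypes at 4 hypothetical SNPs; NOT real population data.
    X = [[2, 0, 0, 2], [2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0]]
    y = ["S-Han", "N-Han", "Korean", "Japanese"]
    model = StepwiseClassifier(n_snps=4).fit(X, y)
    print([model.predict(g) for g in X])  # expected: ['S-Han', 'N-Han', 'Korean', 'Japanese']
```

Splitting the problem this way lets the second, finer S‐Han/N‐Han decision be trained only on Han individuals, which is one plausible reading of how stepwise classification "balances" discrimination between the coarse and fine contrasts.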
Bibliography: See the online article to view Figs. 1–8 in color.