Assessing the factors influencing the performance of machine learning for classifying haplogroups from Y-STR haplotypes
| Published in | Forensic science international Vol. 340; p. 111466 |
|---|---|
| Main Author | Fan, Guang-Yao |
| Format | Journal Article |
| Language | English |
| Published | Amsterdam: Elsevier B.V.; Elsevier Limited, 01.11.2022 |
| ISSN | 0379-0738 (print); 1872-6283 (electronic) |
| DOI | 10.1016/j.forsciint.2022.111466 |
| Abstract | Two distinct genetic markers, single nucleotide polymorphisms (Y-SNPs) and short tandem repeats (Y-STRs), coexist in the non-recombining portion of the Y chromosome. Because they mutate at different rates, Y-STRs and Y-SNPs play distinct roles in forensic and evolutionary genetics. Current approaches to inferring haplogroup status rely on genotyping many Y-SNP loci. Given the relationship between the haplotype and the haplogroup of a Y chromosome, Y-STR typing offers a cost-effective strategy for haplogroup prediction. Many machine learning algorithms have been developed for assigning a Y-STR haplotype to a haplogroup. However, several issues must be resolved before machine learning methods can be used in practice. In this study, k-nearest neighbor (kNN) classifiers were therefore built separately under different conditions, and we assessed the factors that may influence the performance of the kNN prediction model for classifying haplogroups. The training set was based on a diverse ground-truth data set comprising Y-STR haplotypes and the corresponding Y-SNP haplogroups. Our results showed that combining different levels of haplogroups in the observations, or performing transracial prediction, was impractical. Moreover, including more slowly mutating Y-STR loci improved classification accuracy. The study identifies the preconditions for effective and accurate haplogroup assignment with the kNN classifier. Highlights: • The factors influencing the performance of kNN algorithms for classifying haplogroups were assessed. • Combining all levels of haplogroups in the observations is inappropriate. • Transracial prediction proved impractical. • Classification accuracy with the slowly mutating (SM) group of Y-STR loci was higher than with the rapidly mutating (RM) group. • The kNN classifier can be used effectively for accurate haplogroup assignment. |
| ArticleNumber | 111466 |
| Author | Fan, Guang-Yao |
| Author affiliation | Forensic Center, College of Medicine, Shaoxing University, Shaoxing 312000, China (fanyoyo1983@163.com) |
| Cited by | DOI 10.1080/03014460.2023.2168057 |
| Copyright | Copyright © 2022 Elsevier B.V. All rights reserved. |
| Discipline | Public Health |
| EISSN | 1872-6283 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Y-STR haplotype; KNN; Prediction performance; Y-SNP haplogroup; Machine learning |
| References | Baeta, Núñez, Villaescusa, Ortueta, Ibarbia, Herrera, Blazquez-Caeiro, Builes, Jiménez-Moreno, Martínez-Jarreta, de Pancorbo (bib6) 2018; 34 Song, Song, Zhao, Hou (bib16) 2021 Linden (bib26) 2006; 12 Kayser, Roewer, Hedman, Henke, Henke, Brauer, Krüger, Krawczak, Nagy, Dobosz, Szibor, de Knijff, Stoneking, Sajantila (bib8) 2000; 66 Lang, Liu, Song, Qiao, Ye, Ren, Li, Huang, Xie, Chen, Song, Zhang, Qian, Yuan, Wang, Liu, Wang, Liu, Liu, Hou (bib19) 2019; 42 Bosch, Calafell, Santos, Pérez-Lezaun, Comas, Benchemsi, Tyler-Smith, Bertranpetit (bib15) 1999; 65 Yin, He, Wang, He, Zhang, Xia, Zhai, Chang, Chen, Chen, Chen, Jin, Li (bib7) 2022; 57 Schlecht, Kaplan, Barnard, Karafet, Hammer, Merchant, Ouzounis (bib9) 2008; 4 Song, Wang, Zhang, Zhao, Lang, Xie, Qian, Wang, Hou (bib18) 2019; 39 Claerhout, Verstraete, Warnez, Vanpaemel, Larmuseau, Decorte, Gojobori (bib10) 2021; 17 Song, Song, Luo, Xie, Wang, Dai, Hou (bib22) 2021; 42 Claerhout, Vandenbosch, Nivelle, Gruyters, Peeters, Larmuseau, Decorte (bib3) 2018; 34 Short, Fukunaga (bib25) 1981; 27 Wang, He, Zou, Liu, Ye, Ming, Du, Wang, Hou (bib23) 2021; 54 Votrubova, Saskova, Frolik, Vanek (bib5) 2017; 6 Tiirikka, Moilanen (bib11) 2015; 4 Wilson (bib1) 2021; 30 Altman (bib17) 1992; 46 Zhang (bib24) 2016; 4 Yin, Su, He, Zhai, Guo, Chen, Jin, Li (bib20) 2020; 11 Consortium, Y (bib12) 2002; 12 Claerhout, Roelens, Van der Haegen, Verstraete, Larmuseau, Decorte, Ysurnames? 
(bib2) 2020; 44 Fan, Xie, Wang, Ru, Tan, Ding, Wang, Huang, Wang, Li, Wang, He, Gu, Liu, Ma, Wen, Qiu (bib13) 2022; 59 Wang, Song, Song, Li, Xie, Hou (bib21) 2021; 42 Li, Zhang, Luo, Bian, Li (bib14) 2020; 41 Fan, Pan, Tang, Zhou, Liu, Luo (bib4) 2020; 135 Thompson (bib27) 2001; 20 Consortium (10.1016/j.forsciint.2022.111466_bib12) 2002; 12 Schlecht (10.1016/j.forsciint.2022.111466_bib9) 2008; 4 Song (10.1016/j.forsciint.2022.111466_bib18) 2019; 39 Kayser (10.1016/j.forsciint.2022.111466_bib8) 2000; 66 Claerhout (10.1016/j.forsciint.2022.111466_bib3) 2018; 34 Li (10.1016/j.forsciint.2022.111466_bib14) 2020; 41 Yin (10.1016/j.forsciint.2022.111466_bib20) 2020; 11 Claerhout (10.1016/j.forsciint.2022.111466_bib10) 2021; 17 Altman (10.1016/j.forsciint.2022.111466_bib17) 1992; 46 Tiirikka (10.1016/j.forsciint.2022.111466_bib11) 2015; 4 Short (10.1016/j.forsciint.2022.111466_bib25) 1981; 27 Claerhout (10.1016/j.forsciint.2022.111466_bib2) 2020; 44 Baeta (10.1016/j.forsciint.2022.111466_bib6) 2018; 34 Votrubova (10.1016/j.forsciint.2022.111466_bib5) 2017; 6 Yin (10.1016/j.forsciint.2022.111466_bib7) 2022; 57 Fan (10.1016/j.forsciint.2022.111466_bib13) 2022; 59 Zhang (10.1016/j.forsciint.2022.111466_bib24) 2016; 4 Wilson (10.1016/j.forsciint.2022.111466_bib1) 2021; 30 Bosch (10.1016/j.forsciint.2022.111466_bib15) 1999; 65 Linden (10.1016/j.forsciint.2022.111466_bib26) 2006; 12 Song (10.1016/j.forsciint.2022.111466_bib16) 2021 Wang (10.1016/j.forsciint.2022.111466_bib23) 2021; 54 Song (10.1016/j.forsciint.2022.111466_bib22) 2021; 42 Lang (10.1016/j.forsciint.2022.111466_bib19) 2019; 42 Fan (10.1016/j.forsciint.2022.111466_bib4) 2020; 135 Thompson (10.1016/j.forsciint.2022.111466_bib27) 2001; 20 Wang (10.1016/j.forsciint.2022.111466_bib21) 2021; 42 |
| References_xml | – volume: 44 year: 2020 ident: bib2 article-title: The patrilineal Y-chromosome and surname correlation for DNA kinship research publication-title: Forensic Sci. Int. Genet. – volume: 17 year: 2021 ident: bib10 article-title: CSYseq: the first Y-chromosome sequencing tool typing a large number of Y-SNPs and Y-STRs to unravel worldwide human population genetics publication-title: Plos Genet – volume: 39 start-page: e14 year: 2019 end-page: e20 ident: bib18 article-title: Forensic characteristics and phylogenetic analysis of both Y-STR and Y-SNP in the Li and Han ethnic groups from Hainan Island of China publication-title: Forensic Sci. Int.: Genet. – volume: 27 start-page: 622 year: 1981 end-page: 627 ident: bib25 article-title: The optimal distance measure for nearest neighbor classification publication-title: IEEE T Inf. Theory – volume: 20 start-page: 2895 year: 2001 end-page: 2906 ident: bib27 article-title: Estimating equations for kappa statistics publication-title: Stat. Med. – volume: 42 start-page: 1892 year: 2021 end-page: 1899 ident: bib22 article-title: Paternal genetic structure of Kyrgyz ethnic group in China revealed by high‐resolution Y‐chromosome STRs and SNPs publication-title: Electrophoresis – volume: 4 year: 2016 ident: bib24 article-title: Introduction to machine learning: k-nearest neighbors publication-title: Ann. Transl. Med. – volume: 57 year: 2022 ident: bib7 article-title: Improving the regional Y-STR haplotype resolution utilizing haplogroup-determining Y-SNPs and the application of machine learning in Y-SNP haplogroup prediction in a forensic Y-STR database: a pilot study on male Chinese Yunnan Zhaoyang Han population publication-title: Forensic Sci. Int.: Genet. – volume: 59 year: 2022 ident: bib13 article-title: Microhaplotype and Y-SNP/STR (MY): a novel MPS-based system for genotype pattern recognition in two-person DNA mixtures publication-title: Forensic Sci. Int.: Genet. 
– volume: 12 start-page: 132 year: 2006 end-page: 139 ident: bib26 article-title: Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis publication-title: J. Eval. Clin. Pr. – volume: 4 year: 2008 ident: bib9 article-title: Machine-learning approaches for classifying haplogroup from Y chromosome STR data publication-title: Plos Comput. Biol. – volume: 34 start-page: e7 year: 2018 end-page: e12 ident: bib6 article-title: Assessment of a subset of slowly mutating Y-STRs for forensic and evolutionary studies publication-title: Forensic Sci. Int.: Genet. – volume: 4 start-page: 1 year: 2015 end-page: 9 ident: bib11 article-title: Human chromosome Y and Haplogroups; introducing YDHS database publication-title: Clin. Transl. Med. – volume: 34 start-page: 1 year: 2018 end-page: 10 ident: bib3 article-title: Determining Y-STR mutation rates in deep-routing genealogies: identification of haplogroup differences publication-title: Forensic Sci. Int. Genet. – year: 2021 ident: bib16 article-title: YHP: Y-chromosome Haplogroup Predictor for Predicting Male Lineages Based on Y-STRs – volume: 65 start-page: 1623 year: 1999 end-page: 1638 ident: bib15 article-title: Variation in short tandem repeats is deeply structured by genetic background on the human Y chromosome publication-title: Am. J. Hum. Genet – volume: 54 year: 2021 ident: bib23 article-title: Genetic insights into the paternal admixture history of Chinese Mongolians via high-resolution customized Y-SNP SNaPshot panels publication-title: Forensic Sci. Int.: Genet. – volume: 66 start-page: 1580 year: 2000 end-page: 1588 ident: bib8 article-title: Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs publication-title: Am. J. Hum. 
Genet – volume: 12 start-page: 339 year: 2002 end-page: 348 ident: bib12 article-title: A nomenclature system for the tree of human Y-chromosomal binary haplogroups publication-title: Genome Res – volume: 6 start-page: e129 year: 2017 end-page: e131 ident: bib5 article-title: Linking the Y-chromosomal haplotype from a high medieval (1160–1421) skeleton from a Podlazice excavation site with living descendants publication-title: Forensic Sci. Int. Genet. Suppl. Ser. – volume: 46 start-page: 175 year: 1992 end-page: 185 ident: bib17 article-title: An introduction to Kernel and nearest-neighbor nonparametric regression publication-title: Am. Stat. – volume: 30 start-page: R296 year: 2021 end-page: R300 ident: bib1 article-title: The Y chromosome and its impact on health and disease publication-title: Hum. Mol. Genet. – volume: 42 start-page: 1480 year: 2021 end-page: 1487 ident: bib21 article-title: Genetic reconstruction and phylogenetic analysis by 193 Y‐SNPs and 27 Y‐STRs in a Chinese Yi ethnic group publication-title: Electrophoresis – volume: 11 start-page: 743 year: 2020 ident: bib20 article-title: Genetic reconstruction and forensic analysis of Chinese shandong and yunnan han populations by Co-analyzing Y chromosomal STRs and SNPs publication-title: Genes – volume: 42 start-page: e13 year: 2019 end-page: e20 ident: bib19 article-title: Forensic characteristics and genetic analysis of both 27 Y-STRs and 143 Y-SNPs in Eastern Han Chinese population publication-title: Forensic Sci. Int.: Genet. – volume: 135 start-page: 409 year: 2020 end-page: 419 ident: bib4 article-title: Technical note: developmental validation of a novel 41-plex Y-STR system for the direct amplification of reference samples publication-title: Int J. Leg. 
Med – volume: 41 start-page: 2047 year: 2020 end-page: 2054 ident: bib14 article-title: Development and validation of a custom panel including 183 Y‐SNPs for Chinese Y‐chromosomal haplogroups dissection using a MALDI‐TOF MS system publication-title: Electrophoresis – volume: 12 start-page: 339 issue: 2 year: 2002 ident: 10.1016/j.forsciint.2022.111466_bib12 article-title: A nomenclature system for the tree of human Y-chromosomal binary haplogroups publication-title: Genome Res doi: 10.1101/gr.217602 – volume: 30 start-page: R296 issue: R2 year: 2021 ident: 10.1016/j.forsciint.2022.111466_bib1 article-title: The Y chromosome and its impact on health and disease publication-title: Hum. Mol. Genet. doi: 10.1093/hmg/ddab215 – volume: 44 year: 2020 ident: 10.1016/j.forsciint.2022.111466_bib2 article-title: The patrilineal Y-chromosome and surname correlation for DNA kinship research publication-title: Forensic Sci. Int. Genet. doi: 10.1016/j.fsigen.2019.102204 – volume: 135 start-page: 409 issue: 2 year: 2020 ident: 10.1016/j.forsciint.2022.111466_bib4 article-title: Technical note: developmental validation of a novel 41-plex Y-STR system for the direct amplification of reference samples publication-title: Int J. Leg. Med doi: 10.1007/s00414-020-02326-9 – volume: 41 start-page: 2047 issue: 23 year: 2020 ident: 10.1016/j.forsciint.2022.111466_bib14 article-title: Development and validation of a custom panel including 183 Y‐SNPs for Chinese Y‐chromosomal haplogroups dissection using a MALDI‐TOF MS system publication-title: Electrophoresis doi: 10.1002/elps.202000145 – volume: 46 start-page: 175 issue: 3 year: 1992 ident: 10.1016/j.forsciint.2022.111466_bib17 article-title: An introduction to Kernel and nearest-neighbor nonparametric regression publication-title: Am. Stat. 
doi: 10.1080/00031305.1992.10475879 – volume: 39 start-page: e14 year: 2019 ident: 10.1016/j.forsciint.2022.111466_bib18 article-title: Forensic characteristics and phylogenetic analysis of both Y-STR and Y-SNP in the Li and Han ethnic groups from Hainan Island of China publication-title: Forensic Sci. Int.: Genet. doi: 10.1016/j.fsigen.2018.11.016 – volume: 12 start-page: 132 issue: 2 year: 2006 ident: 10.1016/j.forsciint.2022.111466_bib26 article-title: Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis publication-title: J. Eval. Clin. Pr. doi: 10.1111/j.1365-2753.2005.00598.x – volume: 66 start-page: 1580 issue: 5 year: 2000 ident: 10.1016/j.forsciint.2022.111466_bib8 article-title: Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs publication-title: Am. J. Hum. Genet doi: 10.1086/302905 – volume: 54 year: 2021 ident: 10.1016/j.forsciint.2022.111466_bib23 article-title: Genetic insights into the paternal admixture history of Chinese Mongolians via high-resolution customized Y-SNP SNaPshot panels publication-title: Forensic Sci. Int.: Genet. – volume: 34 start-page: e7 year: 2018 ident: 10.1016/j.forsciint.2022.111466_bib6 article-title: Assessment of a subset of slowly mutating Y-STRs for forensic and evolutionary studies publication-title: Forensic Sci. Int.: Genet. doi: 10.1016/j.fsigen.2018.03.008 – volume: 4 issue: 11 year: 2016 ident: 10.1016/j.forsciint.2022.111466_bib24 article-title: Introduction to machine learning: k-nearest neighbors publication-title: Ann. Transl. Med. 
doi: 10.21037/atm.2016.03.37 – volume: 17 issue: 9 year: 2021 ident: 10.1016/j.forsciint.2022.111466_bib10 article-title: CSYseq: the first Y-chromosome sequencing tool typing a large number of Y-SNPs and Y-STRs to unravel worldwide human population genetics publication-title: Plos Genet doi: 10.1371/journal.pgen.1009758 – volume: 42 start-page: e13 year: 2019 ident: 10.1016/j.forsciint.2022.111466_bib19 article-title: Forensic characteristics and genetic analysis of both 27 Y-STRs and 143 Y-SNPs in Eastern Han Chinese population publication-title: Forensic Sci. Int.: Genet. doi: 10.1016/j.fsigen.2019.07.011 – volume: 27 start-page: 622 issue: 5 year: 1981 ident: 10.1016/j.forsciint.2022.111466_bib25 article-title: The optimal distance measure for nearest neighbor classification publication-title: IEEE T Inf. Theory doi: 10.1109/TIT.1981.1056403 – volume: 4 issue: 6 year: 2008 ident: 10.1016/j.forsciint.2022.111466_bib9 article-title: Machine-learning approaches for classifying haplogroup from Y chromosome STR data publication-title: Plos Comput. Biol. doi: 10.1371/journal.pcbi.1000093 – volume: 42 start-page: 1892 issue: 19 year: 2021 ident: 10.1016/j.forsciint.2022.111466_bib22 article-title: Paternal genetic structure of Kyrgyz ethnic group in China revealed by high‐resolution Y‐chromosome STRs and SNPs publication-title: Electrophoresis doi: 10.1002/elps.202100142 – volume: 65 start-page: 1623 issue: 6 year: 1999 ident: 10.1016/j.forsciint.2022.111466_bib15 article-title: Variation in short tandem repeats is deeply structured by genetic background on the human Y chromosome publication-title: Am. J. Hum. Genet doi: 10.1086/302676 – volume: 4 start-page: 1 issue: 1 year: 2015 ident: 10.1016/j.forsciint.2022.111466_bib11 article-title: Human chromosome Y and Haplogroups; introducing YDHS database publication-title: Clin. Transl. Med. 
doi: 10.1186/s40169-015-0060-7 – year: 2021 ident: 10.1016/j.forsciint.2022.111466_bib16 – volume: 6 start-page: e129 year: 2017 ident: 10.1016/j.forsciint.2022.111466_bib5 article-title: Linking the Y-chromosomal haplotype from a high medieval (1160–1421) skeleton from a Podlazice excavation site with living descendants publication-title: Forensic Sci. Int. Genet. Suppl. Ser. doi: 10.1016/j.fsigss.2017.09.031 – volume: 11 start-page: 743 issue: 7 year: 2020 ident: 10.1016/j.forsciint.2022.111466_bib20 article-title: Genetic reconstruction and forensic analysis of Chinese shandong and yunnan han populations by Co-analyzing Y chromosomal STRs and SNPs publication-title: Genes doi: 10.3390/genes11070743 – volume: 20 start-page: 2895 issue: 19 year: 2001 ident: 10.1016/j.forsciint.2022.111466_bib27 article-title: Estimating equations for kappa statistics publication-title: Stat. Med. doi: 10.1002/sim.603 – volume: 42 start-page: 1480 issue: 14–15 year: 2021 ident: 10.1016/j.forsciint.2022.111466_bib21 article-title: Genetic reconstruction and phylogenetic analysis by 193 Y‐SNPs and 27 Y‐STRs in a Chinese Yi ethnic group publication-title: Electrophoresis doi: 10.1002/elps.202100003 – volume: 59 year: 2022 ident: 10.1016/j.forsciint.2022.111466_bib13 article-title: Microhaplotype and Y-SNP/STR (MY): a novel MPS-based system for genotype pattern recognition in two-person DNA mixtures publication-title: Forensic Sci. Int.: Genet. – volume: 34 start-page: 1 year: 2018 ident: 10.1016/j.forsciint.2022.111466_bib3 article-title: Determining Y-STR mutation rates in deep-routing genealogies: identification of haplogroup differences publication-title: Forensic Sci. Int. Genet. 
doi: 10.1016/j.fsigen.2018.01.005 – volume: 57 year: 2022 ident: 10.1016/j.forsciint.2022.111466_bib7 article-title: Improving the regional Y-STR haplotype resolution utilizing haplogroup-determining Y-SNPs and the application of machine learning in Y-SNP haplogroup prediction in a forensic Y-STR database: a pilot study on male Chinese Yunnan Zhaoyang Han population publication-title: Forensic Sci. Int.: Genet. |
| StartPage | 111466 |
| SubjectTerms | Accuracy Algorithms Chromosomes Classification Classifiers cost effectiveness data collection Datasets Decision trees Efficiency Evolutionary genetics Forensic science Forensic sciences Genetic markers Genetics Genotyping Haplotypes K-nearest neighbors algorithm KNN Learning algorithms Machine learning Mutation Nucleotides prediction Prediction models Prediction performance Short tandem repeats Single-nucleotide polymorphism Y chromosome Y chromosomes Y-SNP haplogroup Y-STR haplotype |
| Title | Assessing the factors influencing the performance of machine learning for classifying haplogroups from Y-STR haplotypes |
| URI | https://www.clinicalkey.com/#!/content/1-s2.0-S0379073822002961 https://www.proquest.com/docview/2722695564 https://www.proquest.com/docview/2717691185 https://www.proquest.com/docview/2723118208 |
| Volume | 340 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Assessing+the+factors+influencing+the+performance+of+machine+learning+for+classifying+haplogroups+from+Y-STR+haplotypes&rft.jtitle=Forensic+science+international&rft.au=Fan%2C+Guang-Yao&rft.date=2022-11-01&rft.issn=1872-6283&rft.eissn=1872-6283&rft.volume=340&rft.spage=111466&rft_id=info:doi/10.1016%2Fj.forsciint.2022.111466&rft.externalDBID=NO_FULL_TEXT |