Generative haplotype prediction outperforms statistical methods for small variant detection in next-generation sequencing data
| Published in | Bioinformatics (Oxford, England) Vol. 40; no. 11 |
|---|---|
| Main Authors | Brendan O’Fallon, Ashini Bolia, Jacob Durtschi, Luobin Yang, Eric Fredrickson, Hunter Best |
| Format | Journal Article |
| Language | English |
| Published | Oxford University Press / Oxford Publishing Limited (England), England, 01.11.2024 |
| ISSN | 1367-4803 (print); 1367-4811 (electronic) |
| DOI | 10.1093/bioinformatics/btae565 |

Abstract

Motivation

Detection of germline variants in next-generation sequencing data is an essential component of modern genomics analysis. Variant detection tools typically rely on statistical algorithms built on de Bruijn graphs or hidden Markov models, and are often coupled with heuristic techniques and thresholds to maximize accuracy. Despite significant progress in recent years, current methods still generate thousands of false-positive detections in a typical human whole genome, creating a significant manual review burden.

Results

We introduce a new approach that replaces the handcrafted statistical techniques of previous methods with a single deep generative model. Using a standard transformer-based encoder and double-decoder architecture, our model learns to construct diploid germline haplotypes generatively, in a manner analogous to modern large language models. We train our model on 37 whole-genome sequences from Genome-in-a-Bottle samples and show that it learns to produce accurate haplotypes with correct phase and genotype for all classes of small variants. We compare our method, called Jenever, to FreeBayes, GATK HaplotypeCaller, Clair3, and DeepVariant, and show that it has higher overall accuracy than the other methods. At F1-maximizing quality thresholds, our model delivers the highest sensitivity, the highest precision, and the fewest genotyping errors for insertion and deletion variants. For single nucleotide variants, our model demonstrates the highest sensitivity but somewhat lower precision, and achieves the highest overall F1 score among all callers we tested.
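
Because the model emits a pair of phased haplotypes rather than a list of variants, calls with phased genotypes have to be read off by comparing each haplotype with the reference. The sketch below illustrates that idea only; it is not taken from the Jenever source. The function `phased_snvs` is hypothetical, and it assumes all three sequences have the same length, so it handles substitutions only; the insertions and deletions evaluated in the paper would additionally require aligning each haplotype to the reference.

```python
from typing import List, Tuple


def phased_snvs(ref: str, hap1: str, hap2: str) -> List[Tuple[int, str, str, str]]:
    """Hypothetical helper: report SNVs as (0-based position, REF, ALT, phased GT).

    Simplifying assumption: ref, hap1 and hap2 have equal length, so only
    substitutions are detected. Indels would need a haplotype-to-reference
    alignment first.
    """
    if not (len(ref) == len(hap1) == len(hap2)):
        raise ValueError("sketch assumes equal-length, SNV-only haplotypes")
    calls = []
    for pos, (r, a, b) in enumerate(zip(ref, hap1, hap2)):
        if a == r and b == r:
            continue  # both haplotypes match the reference: no variant here
        # Distinct non-reference alleles, in order of first appearance
        alts: List[str] = []
        for base in (a, b):
            if base != r and base not in alts:
                alts.append(base)
        # VCF allele indices: 0 = reference, 1.. = position in the ALT list
        gt1 = 0 if a == r else alts.index(a) + 1
        gt2 = 0 if b == r else alts.index(b) + 1
        # "|" marks a phased genotype: first index belongs to hap1, second to hap2
        calls.append((pos, r, ",".join(alts), f"{gt1}|{gt2}"))
    return calls
```

For example, `phased_snvs("ACGTAC", "TCTTAC", "TCGTGC")` returns `[(0, 'A', 'T', '1|1'), (2, 'G', 'T', '1|0'), (4, 'A', 'G', '0|1')]`: a homozygous SNV followed by two heterozygous SNVs on opposite haplotypes. Because genotypes use the VCF `|` separator, the order of the allele indices carries the phase.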

Availability and implementation

Jenever is implemented as a Python-based command-line tool. Source code is available at https://github.com/ARUP-NGS/jenever/.
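
The accuracy comparison in the Results is reported at each caller's F1-maximizing quality threshold. The following is a minimal sketch of how such a threshold can be chosen, assuming every call has already been labeled as a true or false positive against a truth set (for example by a benchmarking tool such as hap.py) and that the number of truth variants is known. The function `f1_maximizing_threshold` and its input layout are hypothetical. Sensitivity here is recall, TP / (TP + FN), and precision is TP / (TP + FP).

```python
from typing import List, Tuple


def f1_maximizing_threshold(
    calls: List[Tuple[float, bool]], n_truth: int
) -> Tuple[float, float, float, float]:
    """Hypothetical helper returning (threshold, precision, recall, F1).

    calls:   (quality, is_true_positive) for every emitted call
    n_truth: number of variants in the truth set (TP + FN before filtering)
    A call is kept when its quality is >= the threshold.
    """
    best = (0.0, 0.0, 0.0, 0.0)
    # Precision and recall only change when the cutoff crosses an observed
    # quality value, so those values are the only candidates worth testing.
    for t in sorted({q for q, _ in calls}):
        kept = [is_tp for q, is_tp in calls if q >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        precision = tp / (tp + fp) if kept else 0.0
        recall = tp / n_truth if n_truth else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Strict ">" keeps the lowest threshold among ties
        if f1 > best[3]:
            best = (t, precision, recall, f1)
    return best
```

With five calls at qualities 10, 20, 30, 40 and 50 (true positives at 20, 30 and 50) and four truth variants, the best cutoff is 20, giving precision, recall and F1 of 0.75 each.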

| Field | Value |
|---|---|
| Authors (in order) | 1. O’Fallon, Brendan (ORCID 0000-0001-7185-7894; brendan.ofallon@aruplab.com); 2. Bolia, Ashini; 3. Durtschi, Jacob; 4. Yang, Luobin; 5. Fredrickson, Eric; 6. Best, Hunter |
| Copyright | The Author(s) 2024. Published by Oxford University Press. |
| Discipline | Biology |
| PMCID | PMC11549014 |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
| PMID | 39298478 |
| PublicationDate | 2024-11-01 |
| SubjectTerms | Accuracy; Algorithms; Availability; Diploids; Gene deletion; Gene sequencing; Genome, Human; Genomes; Genomic analysis; Genomics - methods; Genotypes; Genotyping; Graph theory; Haplotypes; High-Throughput Nucleotide Sequencing - methods; Humans; Large language models; Markov chains; Next-generation sequencing; Nucleotides; Original Paper; Polymorphism, Single Nucleotide; Sensitivity; Sequence Analysis, DNA - methods; Software; Source code; Statistical analysis; Statistical methods; Thresholds |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/39298478; https://www.proquest.com/docview/3213586538; https://www.proquest.com/docview/3107155412; https://pubmed.ncbi.nlm.nih.gov/PMC11549014; https://doi.org/10.1093/bioinformatics/btae565 |
| Volume | 40 |