Generative haplotype prediction outperforms statistical methods for small variant detection in next-generation sequencing data

Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 40; no. 11
Main Authors O’Fallon, Brendan; Bolia, Ashini; Durtschi, Jacob; Yang, Luobin; Fredrickson, Eric; Best, Hunter
Format Journal Article
Language English
Published England Oxford University Press 01.11.2024
Oxford Publishing Limited (England)
ISSN 1367-4811
1367-4803
DOI 10.1093/bioinformatics/btae565

Abstract
Motivation: Detection of germline variants in next-generation sequencing data is an essential component of modern genomics analysis. Variant detection tools typically rely on statistical algorithms such as de Bruijn graphs or Hidden Markov models, and are often coupled with heuristic techniques and thresholds to maximize accuracy. Despite significant progress in recent years, current methods still generate thousands of false-positive detections in a typical human whole genome, creating a significant manual review burden.
Results: We introduce a new approach that replaces the handcrafted statistical techniques of previous methods with a single deep generative model. Using a standard transformer-based encoder and double-decoder architecture, our model learns to construct diploid germline haplotypes in a generative fashion identical to that of modern large language models. We train our model on 37 whole genome sequences from Genome-in-a-Bottle samples, and demonstrate that our method learns to produce accurate haplotypes with correct phase and genotype for all classes of small variants. We compare our method, called Jenever, to FreeBayes, GATK HaplotypeCaller, Clair3, and DeepVariant, and demonstrate that it has superior overall accuracy. At F1-maximizing quality thresholds, our model delivers the highest sensitivity, the highest precision, and the fewest genotyping errors for insertion and deletion variants. For single nucleotide variants, our model demonstrates the highest sensitivity but at somewhat lower precision, and achieves the highest overall F1 score among all callers we tested.
Availability and implementation: Jenever is implemented as a Python-based command line tool. Source code is available at https://github.com/ARUP-NGS/jenever/
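The abstract reports results at F1-maximizing quality thresholds. As a rough illustration of what that means (this is not Jenever's evaluation code and is not taken from the paper), the sketch below picks the quality cutoff that maximizes F1 for a set of variant calls. The function names and the `(quality, is_true_positive)` input format are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical input format: one (quality, is_true_positive) pair per call.
Call = Tuple[float, bool]


def f1_at_threshold(calls: List[Call], total_true: int,
                    threshold: float) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) when keeping calls with quality >= threshold.

    total_true is the number of true variants in the truth set, including
    any that the caller missed entirely.
    """
    kept = [is_tp for qual, is_tp in calls if qual >= threshold]
    tp = sum(kept)            # true positives that survive the filter
    fp = len(kept) - tp       # false positives that survive the filter
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / total_true if total_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_threshold(calls: List[Call], total_true: int) -> float:
    """Scan every observed quality value and return the one with the highest F1."""
    candidates = sorted({qual for qual, _ in calls})
    return max(candidates, key=lambda t: f1_at_threshold(calls, total_true, t)[2])
```

Comparing callers at their own F1-maximizing thresholds, rather than at a single fixed cutoff, avoids penalizing a tool merely because its quality scores are scaled differently.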
Author_xml – sequence: 1
  givenname: Brendan
  orcidid: 0000-0001-7185-7894
  surname: O’Fallon
  fullname: O’Fallon, Brendan
  email: brendan.ofallon@aruplab.com
– sequence: 2
  givenname: Ashini
  surname: Bolia
  fullname: Bolia, Ashini
– sequence: 3
  givenname: Jacob
  surname: Durtschi
  fullname: Durtschi, Jacob
– sequence: 4
  givenname: Luobin
  surname: Yang
  fullname: Yang, Luobin
– sequence: 5
  givenname: Eric
  surname: Fredrickson
  fullname: Fredrickson, Eric
– sequence: 6
  givenname: Hunter
  surname: Best
  fullname: Best, Hunter
BackLink https://www.ncbi.nlm.nih.gov/pubmed/39298478 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2024. Published by Oxford University Press. 2024
Discipline Biology
EISSN 1367-4811
ExternalDocumentID 10.1093/bioinformatics/btae565
PMC11549014
39298478
10_1093_bioinformatics_btae565
Genre Journal Article
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
https://creativecommons.org/licenses/by/4.0
cc-by
ORCID 0000-0001-7185-7894
OpenAccessLink https://doi.org/10.1093/bioinformatics/btae565
PMID 39298478
PublicationCentury 2000
PublicationDate 2024-11-01
PublicationDateYYYYMMDD 2024-11-01
PublicationDate_xml – month: 11
  year: 2024
  text: 2024-11-01
  day: 01
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2024
Publisher Oxford University Press
Oxford Publishing Limited (England)
Publisher_xml – name: Oxford University Press
– name: Oxford Publishing Limited (England)
SubjectTerms Accuracy
Algorithms
Availability
Diploids
Gene deletion
Gene sequencing
Genome, Human
Genomes
Genomic analysis
Genomics - methods
Genotypes
Genotyping
Graph theory
Haplotypes
High-Throughput Nucleotide Sequencing - methods
Humans
Large language models
Markov chains
Next-generation sequencing
Nucleotides
Original Paper
Polymorphism, Single Nucleotide
Sensitivity
Sequence Analysis, DNA - methods
Software
Source code
Statistical analysis
Statistical methods
Thresholds
Title Generative haplotype prediction outperforms statistical methods for small variant detection in next-generation sequencing data
URI https://www.ncbi.nlm.nih.gov/pubmed/39298478
https://www.proquest.com/docview/3213586538
https://www.proquest.com/docview/3107155412
https://pubmed.ncbi.nlm.nih.gov/PMC11549014
https://doi.org/10.1093/bioinformatics/btae565
UnpaywallVersion publishedVersion
Volume 40