Personalized pangenome references

Bibliographic Details
Published in	Nature Methods, Vol. 21, no. 11, pp. 2017-2023
Main Authors	Sirén, Jouni; Eskandar, Parsa; Ungaro, Matteo Tommaso; Hickey, Glenn; Eizenga, Jordan M.; Novak, Adam M.; Chang, Xian; Chang, Pi-Chuan; Kolmogorov, Mikhail; Carroll, Andrew; Monlong, Jean; Paten, Benedict
Format	Journal Article
Language	English
Published	New York: Nature Publishing Group US, 01.11.2024
Subjects	631/208; 631/208/212; Accuracy; Algorithms; Bioinformatics; Biological Microscopy; Biological Techniques; Biomedical and Life Sciences; Biomedical Engineering/Biotechnology; Customization; Error reduction; Gene Frequency; Genetic diversity; Genetic Variation; Genome, Human; Genomes; Genomic analysis; Genomics - methods; Genotyping; Graph theory; Graphs; Haplotypes; High-Throughput Nucleotide Sequencing - methods; Humans; Life Sciences; Proteomics; Software; Toolkits
ISSN	1548-7091 1548-7105
DOI	10.1038/s41592-024-02407-2

Summary:	Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency, and have previously been dealt with by filtering rare variants. However, this blunt heuristic both fails to remove some irrelevant variants and removes many relevant variants. We propose a new approach that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. We implement the approach in the vg toolkit (https://github.com/vgteam/vg) for the Giraffe short-read aligner and compare its accuracy to state-of-the-art methods using human pangenome graphs from the Human Pangenome Reference Consortium. This reduces small variant genotyping errors by four times relative to the Genome Analysis Toolkit and makes short-read structural variant genotyping of known variants competitive with long-read variant discovery methods. This work introduces a k-mer-based approach to customizing a pangenome reference, making it more relevant to a new sample of interest. This method enhances the accuracy of genotyping small variants and large structural variants.
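
Illustration:	The summary describes selecting local haplotypes according to k-mer counts in the reads. The toy Python sketch below illustrates only that general idea and is not the vg implementation: the function names, the presence/absence scoring rule (+1 for a haplotype k-mer seen in the reads, -1 for one that is missing), and the parameters `k`, `n`, and `min_count` are all assumptions made for this example. It works on plain strings for a single local block, whereas the published method operates on a pangenome graph.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def score_haplotype(hap, read_counts, k, min_count=1):
    """Hypothetical scoring rule: +1 for each distinct haplotype k-mer
    seen at least min_count times in the reads, -1 for each one that is not."""
    score = 0
    for kmer in set(kmers(hap, k)):
        score += 1 if read_counts[kmer] >= min_count else -1
    return score


def sample_haplotypes(haplotypes, reads, k=5, n=2):
    """Return the names of the n best-scoring haplotypes in one local block."""
    read_counts = count_read_kmers(reads, k)
    ranked = sorted(
        haplotypes,
        key=lambda name: score_haplotype(haplotypes[name], read_counts, k),
        reverse=True,
    )
    return ranked[:n]


# Made-up example data: three candidate haplotypes and reads drawn from hapB.
haplotypes = {
    "hapA": "ACGTACGTTAGC",
    "hapB": "ACGTACCTTAGC",
    "hapC": "ACGTTTTTTAGC",
}
reads = ["ACGTACCT", "TACCTTAGC", "CGTACCTTA"]
print(sample_haplotypes(haplotypes, reads, k=5, n=1))
```

In this made-up example the reads were drawn from hapB, so every k-mer of hapB is found in the reads and it ranks first. A subgraph built only from the top-ranked haplotypes would then omit variants, such as those unique to hapC, that the reads do not support.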