Personalized pangenome references

Bibliographic Details
Published in: Nature Methods, Vol. 21, no. 11, pp. 2017-2023
Main Authors: Sirén, Jouni; Eskandar, Parsa; Ungaro, Matteo Tommaso; Hickey, Glenn; Eizenga, Jordan M.; Novak, Adam M.; Chang, Xian; Chang, Pi-Chuan; Kolmogorov, Mikhail; Carroll, Andrew; Monlong, Jean; Paten, Benedict
Format: Journal Article
Language: English
Published: Nature Publishing Group US, New York, 01.11.2024
Publisher: Nature Publishing Group
ISSN: 1548-7091; 1548-7105
DOI: 10.1038/s41592-024-02407-2

Summary: Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example by causing false read mappings. These irrelevant variants generally have lower allele frequencies, and they have previously been dealt with by filtering out rare variants. However, this blunt heuristic both fails to remove some irrelevant variants and removes many relevant ones. We propose a new approach that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. We implement the approach in the vg toolkit (https://github.com/vgteam/vg) for the Giraffe short-read aligner and compare its accuracy to state-of-the-art methods using human pangenome graphs from the Human Pangenome Reference Consortium. The approach reduces small variant genotyping errors fourfold relative to the Genome Analysis Toolkit and makes short-read genotyping of known structural variants competitive with long-read variant discovery methods. This work introduces a k-mer-based approach to customizing a pangenome reference so that it is more relevant to a new sample of interest. The method improves the accuracy of genotyping both small variants and large structural variants.
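The summary describes choosing local haplotypes according to k-mer counts in the reads. Below is a minimal, hypothetical Python sketch of that idea for a single block of candidate haplotypes. It is not the vg implementation: the function names, the scoring rule (+1 for each distinguishing k-mer seen at least min_count times in the reads, -1 for each one absent from them), and the parameters k, n, and min_count are illustrative assumptions.

```python
from collections import Counter

# Hypothetical illustration of k-mer-guided haplotype selection for one
# pangenome block; not the vg toolkit's code or scoring scheme.


def kmer_set(seq, k):
    """Distinct k-mers of a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_read_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def informative_kmers(candidates, k):
    """K-mers present in some, but not all, candidate haplotypes of the block."""
    sets = [kmer_set(h, k) for h in candidates]
    return set.union(*sets) - set.intersection(*sets)


def score(hap, read_counts, informative, k, min_count):
    """+1 per informative k-mer well supported by reads, -1 per absent one."""
    s = 0
    for km in kmer_set(hap, k) & informative:
        c = read_counts[km]
        if c >= min_count:
            s += 1
        elif c == 0:
            s -= 1
        # Counts between 0 and min_count are treated as inconclusive.
    return s


def sample_block(candidates, reads, k, n=2, min_count=2):
    """Return the n best-scoring candidate haplotypes for one block."""
    counts = count_read_kmers(reads, k)
    informative = informative_kmers(candidates, k)
    # sorted() is stable, so tied candidates keep their input order.
    return sorted(
        candidates,
        key=lambda h: score(h, counts, informative, k, min_count),
        reverse=True,
    )[:n]


if __name__ == "__main__":
    block = ["ACGTACGTTA", "ACGTTCGTTA", "ACGGACGTTA"]
    reads = ["ACGTACGT", "TACGTTA", "ACGTTCGT", "TCGTTA"]
    print(sample_block(block, reads, k=4, min_count=1))
```

Run as a script, this prints ['ACGTACGTTA', 'ACGTTCGTTA']: the two candidates whose distinguishing k-mers occur in the reads are kept, and the third, whose distinguishing k-mers are all absent, is dropped regardless of its population frequency. Scoring only k-mers that differ between candidates avoids rewarding sequence that every haplotype shares. A realistic sampler would also need reverse-complement handling, coverage-aware thresholds, and some way to avoid selecting redundant haplotypes.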