Curated variation benchmarks for challenging medically relevant autosomal genes
Published in | Nature Biotechnology, Vol. 40, no. 5, pp. 672-680 |
---|---|
Format | Journal Article |
Language | English |
Published | New York: Nature Publishing Group US; 01.05.2022; Nature Publishing Group |
ISSN | 1087-0156, 1546-1696 |
DOI | 10.1038/s41587-021-01158-1 |
Summary: | The repetitive nature and complexity of some medically relevant genes poses a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome. Variant detection in problematic genes is facilitated with a curated benchmark. |
---|---|
Bibliography: | These authors contributed equally. Author contributions: Conceptualization: JW, NDO, AF, KHM, SEL, MTWE, HL, C-SC, JMZ, FJS; Methodology: JW, HC, HL, C-SC, JMZ, FJS; Software: JW, NDO; Validation: JW, NDO, LH, JM, HC, AF, Y-CH, RG, AMW, WJR, ZMK, JF, YZ, AP, MM, CX, BY, SMES, DJ, JML-S, AM-B, LAR-R, CF, GN, USE, SEC, JL, HL, C-SC, JMZ, FJS; Formal Analysis - benchmark: JW, NDO, JM, JMZ; Formal Analysis - assembly: HC, AS, HL, C-SC; Resources: CX; Data curation: JW, NDO, JM; Writing - original draft: JW, LH, C-SC, JMZ, FJS; Writing - review & editing: JW, NDO, DEM, JL, CEM, SEL, MTWE, C-SC, JMZ, FJS; Visualization: JW, NDO, HC, HL, C-SC; Supervision: C-SC, JMZ, FJS; Project administration: JW, JMZ, FJS |