Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows

Bibliographic Details
Published in: Viruses, Vol. 16, no. 3, p. 430
Main Authors: Connor, Ryan; Shakya, Migun; Yarmosh, David A.; Maier, Wolfgang; Martin, Ross; Bradford, Rebecca; Brister, J. Rodney; Chain, Patrick S. G.; Copeland, Courtney A.; di Iulio, Julia; Hu, Bin; Ebert, Philip; Gunti, Jonathan; Jin, Yumi; Katz, Kenneth S.; Kochergin, Andrey; LaRosa, Tré; Li, Jiani; Li, Po-E; Lo, Chien-Chi; Rashid, Sujatha; Maiorova, Evguenia S.; Xiao, Chunlin; Zalunin, Vadim; Purcell, Lisa; Pruitt, Kim D.
Format: Journal Article
Language: English
Published: MDPI AG, Switzerland, 11 March 2024
ISSN: 1999-4915
DOI: 10.3390/v16030430

Summary: Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may affect the identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes that differed in single nucleotide polymorphisms and in the inclusion or exclusion of insertions and/or deletions, despite using the same raw sequence datasets as input. We compared and characterized such discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.
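The kinds of discrepancy described in the summary (SNP differences and the inclusion or exclusion of insertions and deletions between consensus genomes built from the same reads) can be made concrete with a small comparison routine. The following is a minimal Python sketch, not the authors' method. It assumes that the two consensus sequences have already been pairwise-aligned to equal length with "-" as the gap character and that "N" marks a masked base to be skipped. The `compare_consensus` function and `Difference` record are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Difference:
    kind: str      # "SNP", "insertion" or "deletion" (relative to seq_a)
    position: int  # 1-based column in the pairwise alignment (not reference coordinates)
    ref: str       # bases taken from seq_a
    alt: str       # bases taken from seq_b


def compare_consensus(seq_a: str, seq_b: str, gap: str = "-", masked: str = "N") -> list[Difference]:
    """Classify differences between two consensus genomes that are ALREADY
    pairwise-aligned (assumption: equal length, gaps written as `gap`).

    Columns where either sequence carries the masked base are skipped, and
    runs of consecutive gap columns are merged into a single indel event.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    diffs: list[Difference] = []
    n = len(seq_a)
    i = 0
    while i < n:
        a, b = seq_a[i].upper(), seq_b[i].upper()
        if a == gap and b != gap:
            # seq_b carries extra bases here: an insertion relative to seq_a
            start = i
            while i < n and seq_a[i] == gap and seq_b[i] != gap:
                i += 1
            diffs.append(Difference("insertion", start + 1, "", seq_b[start:i].upper()))
            continue
        if b == gap and a != gap:
            # seq_b lacks bases present in seq_a: a deletion relative to seq_a
            start = i
            while i < n and seq_b[i] == gap and seq_a[i] != gap:
                i += 1
            diffs.append(Difference("deletion", start + 1, seq_a[start:i].upper(), ""))
            continue
        if a != b and masked not in (a, b):
            # both columns hold unmasked bases that disagree
            diffs.append(Difference("SNP", i + 1, a, b))
        i += 1
    return diffs
```

For example, `compare_consensus("ACGT-ACGTN", "ACTTGAC--A")` reports a G-to-T SNP at column 3, a one-base insertion at column 5 and a two-base deletion starting at column 8; the last column is skipped because one sequence is masked with N. In practice the sequences would first have to be aligned and the columns translated into reference-genome coordinates; both steps are outside this sketch.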
Funding: European Union Horizon 2020; USDOE Laboratory Directed Research and Development (LDRD) Program; National Institutes of Health (NIH)
Grant numbers: 89233218CNA000001; 871075; 101046203; HHSN272201600013C; 20200732ER; 20210767DI
These authors contributed equally to this work.