Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance


Bibliographic Details
Published in BMC Research Notes Vol. 7; no. 1; p. 747
Main Authors Kumar, Pankaj, Al-Shafai, Mashael, Al Muftah, Wadha Ahmed, Chalhoub, Nader, Elsaid, Mahmoud F, Aleem, Alice Abdel, Suhre, Karsten
Format Journal Article
Language English
Published London BioMed Central 22.10.2014
BioMed Central Ltd
Springer Nature B.V
ISSN 1756-0500
DOI 10.1186/1756-0500-7-747

Summary: Background: With the diminishing cost of next-generation sequencing (NGS), whole-genome analysis is becoming a standard tool for identifying the genetic causes of inherited diseases. Commercial NGS service providers generally deliver not only raw genomic reads but also SNP calls to their clients. The user must then decide whether to use these SNP calls as delivered or to reprocess the raw sequencing data through a more sophisticated SNP calling pipeline with more advanced algorithms. Results: We report a detailed comparison of SNPs called with the popular GATK multiple-sample calling protocol against the variants provided by the Illumina CASAVA pipeline, based on a 40x whole-genome sequencing project carried out by Illumina Inc. on 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases). GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional GATK variants are robust in terms of Mendelian consistency but weak in terms of statistical parameters such as the transition/transversion (Ts/Tv) ratio. However, these additional variants make no difference to the detection of the causative variants in the studied phenotype. Conclusion: GATK multi-sample calling and Illumina CASAVA single-sample calling perform very similarly in SNP calling at the level of putatively causative variants.
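The two quality measures named in the summary, the Ts/Tv ratio and Mendelian consistency in trios, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the genotype representation (pairs of single-base alleles) are assumptions made here for illustration, and the consistency check assumes autosomal, biallelic sites with no missing genotypes and no de novo mutations.

```python
from itertools import product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ts/Tv ratio for an iterable of (ref, alt) single-base pairs, ref != alt."""
    ts = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    # With no transversions the ratio is undefined; report infinity.
    return ts / tv if tv else float("inf")


def is_mendelian_consistent(child, mother, father):
    """Genotypes are 2-tuples of alleles, e.g. ("A", "G").

    The call is consistent if the child's genotype can be formed by
    taking one allele from the mother and one from the father.
    """
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("A", "C")]
    print(ts_tv_ratio(calls))
    print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))
```

In practice these statistics would be computed over the variant calls of each pipeline (for example from VCF files), and the fraction of Mendelian-consistent sites across trios would be compared between call sets.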