Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance

Bibliographic Details
Published in	BMC research notes Vol. 7; no. 1; p. 747
Main Authors	Kumar, Pankaj, Al-Shafai, Mashael, Al Muftah, Wadha Ahmed, Chalhoub, Nader, Elsaid, Mahmoud F, Aleem, Alice Abdel, Suhre, Karsten
Format	Journal Article
Language	English
Published	London BioMed Central 22.10.2014 BioMed Central Ltd Springer Nature B.V
Subjects	Algorithms; Analysis; Arabs - genetics; Bioinformatics; Biomedical and Life Sciences; Biomedical research; Biomedicine; Biotechnology industry; Comparative analysis; Data analysis; Databases, Genetic; Diabetes Mellitus - ethnology; Diabetes Mellitus - genetics; DNA sequencing; Freeware; Genetic aspects; Genetic Predisposition to Disease; Genome, Human; Genome-Wide Association Study - methods; Genomes; Heredity; High-Throughput Nucleotide Sequencing - methods; Humans; Life Sciences; Medicine/Public Health; Models, Genetic; Nucleotide sequencing; Obesity - ethnology; Obesity - genetics; Pedigree; Phenotype; Polymorphism, Single Nucleotide; Rare Diseases - ethnology; Rare Diseases - genetics; Reproducibility of Results; Research Article; Single nucleotide polymorphisms; Software; Standard deviation; Studies; United States; Qatar; GATK; Variant; Qatari population; Multi-sample calling; WGS pipeline; Mendelian inheritance; CASAVA; Trios; NGS; Illumina; Genotype calling
ISSN	1756-0500
DOI	10.1186/1756-0500-7-747

More Information
Summary:	Background: With the diminishing cost of next-generation sequencing (NGS), whole-genome analysis is becoming a standard tool for identifying the genetic causes of inherited diseases. Commercial NGS service providers generally deliver not only raw genomic reads but also SNP calls to their clients. The question for the user is then whether to use the SNP data as is, or to process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms. Results: Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol with SNPs provided by the Illumina CASAVA pipeline, delivered as part of a 40x whole-genome sequencing project by Illumina Inc. of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases). GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust in terms of Mendelian consistency but weak in terms of statistical parameters such as the transition/transversion (Ts/Tv) ratio. However, these additional variants do not make a difference in detecting the causative variants for the studied phenotype. Conclusion: Both pipelines, GATK multi-sample calling and Illumina CASAVA single-sample calling, perform very similarly in SNP calling at the level of putatively causative variants.
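The summary names two quality measures for the variant calls: Mendelian consistency in trios and the Ts/Tv ratio. The sketch below illustrates what each measure computes. It is not the authors' pipeline: the function names are hypothetical, and it assumes biallelic autosomal SNPs with complete diploid genotypes and alleles restricted to A, C, G, and T.

```python
from itertools import product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for a purine<->purine or pyrimidine<->pyrimidine substitution."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Ts/Tv ratio for an iterable of (ref, alt) single-base pairs.

    Assumes every allele is one of A, C, G, T (illustrative only).
    """
    ts = tv = 0
    for ref, alt in snps:
        if ref == alt:
            continue  # not a substitution, skip
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions, ratio undefined")
    return ts / tv


def is_mendelian_consistent(father, mother, child):
    """Check whether a child's genotype can arise from its parents.

    Each genotype is a 2-tuple of alleles, e.g. ("A", "G"). The call is
    consistent if the child carries one allele from each parent.
    De novo mutations are not modelled here.
    """
    child_sorted = sorted(child)
    return any(
        sorted((f, m)) == child_sorted for f, m in product(father, mother)
    )
```

For example, `is_mendelian_consistent(("A", "A"), ("G", "G"), ("A", "G"))` holds, whereas a heterozygous child of two `("A", "A")` parents would be flagged as inconsistent.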