DNA Variant Calling in Targeted Sequencing Data
| Published in | Advances in Statistical Bioinformatics, pp. 54-76 |
|---|---|
| Format | Book Chapter |
| Language | English |
| Published | Cambridge University Press, 10.06.2013 |
| ISBN | 1107027527 9781107027527 |
| DOI | 10.1017/CBO9781139226448.004 |
| Summary: | Introduction. Rare DNA variants (minor allele frequency [MAF] of 1% or less in a population), occurring less than one in every 1 KB (Wang et al., 1998), can be distributed in different genes, interact with each other, and affect more than one disease phenotype. To study the association of rare variants with diseases, it is necessary to obtain many DNA genomes from individuals with specific disorders. Even though next-generation sequencing has achieved a low cost per base and a high throughput on the terabase (TB) scale, it is still challenging to sequence hundreds of samples at regular laboratories while also complying with the high standards of accuracy and completeness in medical research. Recent developments in targeted sequencing provide a timely solution by generating sequencing data only from the genomic regions of interest (e.g., 1 MB for 500 candidate genes vs. 3 TB for whole-genome, per sample), thereby reducing the time, the cost, and the amount of data in the downstream analysis. The selection of these regions or candidate genes can be done through linkage mapping, phenotype-based gene association, or network analysis (Scharfe et al., 2009). Efficient and specific enrichment of tens of thousands of selected genomic regions across hundreds of samples is essential for the success of a targeted sequencing study. This field is still under development. The available methods include hybridization-based capture and in-solution capture. Compared with hybridization-based methods, in-solution enrichment strategies usually deliver higher target specificity (>98%) with lower costs and smaller DNA sample requirements, which is useful for multisample studies. In particular, we have developed a novel probe-based in-solution capture technology called the long padlock probes (LPP) method (Shen et al., 2011). |
|---|---|