Bayesian Model-Based Approaches for Solexa Sequencing Data

Bibliographic Details
Published in: Advances in Statistical Bioinformatics, pp. 126-137
Main Authors: Mitra, Riten; Mueller, Peter; Ji, Yuan
Format: Book Chapter
Language: English
Published: Cambridge University Press, 10.06.2013
ISBN: 1107027527
9781107027527
DOI: 10.1017/CBO9781139226448.007

Summary: Introduction. Recent advances in next-generation sequencing have had a major impact on biological research through high-throughput platforms that generate megabases of sequence data per day. These technologies improve both speed and cost and have found applications in genotyping, protein-DNA interactions (Barski et al., 2007; Mikkelsen et al., 2007), transcriptome analysis (Friedländer et al., 2008; Hafner et al., 2008; Vera et al., 2008), and de novo genome assembly (Chaisson and Pevzner, 2008). In this chapter, we focus on the Illumina/Solexa sequencing platform. However, data from other technologies have similar characteristics, and we expect models similar to the one presented here to remain useful for those technologies as well.

Solexa sequencing (www.illumina.com) produces millions of short-read sequences that are amplified by polymerase chain reaction (PCR) and labeled. For each short read, the measured fluorescence intensities are stored in an I × 4 matrix, where I is the length of the read (e.g., I = 36). Such a matrix corresponds to a colony. The positions i = 1, …, I in the short read are sequenced in cycles by a biochemical procedure called sequencing-by-synthesis. As a result, each row of the colony matrix contains the measurements from one cycle of the experiment, in which a single base of the sequence is synthesized. At each cycle, all four nucleotides (A, C, G, and T), labeled with four different fluorescent dyes, are probed, producing a vector of four fluorescence intensity scores.
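
The colony matrix described in the summary can be pictured with a small sketch. The code below is only an illustration under stated assumptions: the intensities are simulated with made-up signal and noise levels, the function names `simulate_colony` and `naive_base_call` are hypothetical, and the per-cycle "largest intensity wins" rule is a naive stand-in, not the Bayesian model developed in the chapter.

```python
import random

BASES = ("A", "C", "G", "T")  # column order of the colony matrix (an assumption)


def simulate_colony(read, signal=1000.0, noise=50.0, seed=0):
    """Build a hypothetical I x 4 intensity matrix for one short read.

    Each row is one sequencing cycle; each column is one nucleotide channel.
    The signal and noise levels are invented for illustration only.
    """
    rng = random.Random(seed)
    colony = []
    for base in read:
        # Background intensity in all four channels.
        row = [abs(rng.gauss(0.0, noise)) for _ in BASES]
        # Add a strong signal in the channel of the base synthesized this cycle.
        row[BASES.index(base)] += signal
        colony.append(row)
    return colony


def naive_base_call(colony):
    """Call, for each cycle (row), the base whose channel has the largest intensity."""
    return "".join(BASES[max(range(4), key=row.__getitem__)] for row in colony)


read = "ACGTTGCA"                 # I = 8 here; real reads are longer (e.g., I = 36)
colony = simulate_colony(read)    # list of 8 rows, 4 intensities each
called = naive_base_call(colony)
print(len(colony), len(colony[0]), called)
```

In real data the four channels are correlated and the signal decays over cycles, which is one reason a model-based treatment of the intensities is of interest rather than a simple per-row maximum.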