Bayesian Model-Based Approaches for Solexa Sequencing Data

Bibliographic Details
Published in	Advances in Statistical Bioinformatics, pp. 126-137
Main Authors	Mitra, Riten, Mueller, Peter, Ji, Yuan
Format	Book Chapter
Language	English
Published	Cambridge University Press 10.06.2013
ISBN	1107027527 9781107027527
DOI	10.1017/CBO9781139226448.007

Abstract	Introduction. Recent advances in next-generation sequencing have had a major impact on biological research through high-throughput platforms that generate megabases of sequence data per day. These technologies improve both speed and cost and have found applications in genotyping, protein-DNA interactions (Barski et al., 2007; Mikkelsen et al., 2007), transcriptome analysis (Friedländer et al., 2008; Hafner et al., 2008; Vera et al., 2008), and de novo genome assembly (Chaisson and Pevzner, 2008). In this chapter, we focus on the Illumina/Solexa sequencing platform. However, data from other technologies have similar characteristics, and we expect models similar to the one presented here to remain useful for these technologies as well. Solexa sequencing (www.illumina.com) produces millions of polymerase chain reaction (PCR) amplified and labeled sequences of short reads. For each short read, the measurements of its fluorescent intensities are stored in an I × 4 matrix, where I is the length of the read (e.g., I = 36). Such a matrix corresponds to a colony. The positions i = 1, …, I in the short read are sequenced in cycles by a biochemical procedure called sequencing-by-synthesis. As a result, each row of the colony matrix contains measurements from one cycle of the experiment, in which a single base of the sequence is synthesized. At each cycle, all four nucleotides (A, C, G, and T), labeled with four different fluorescent dyes, are probed, thus producing a vector of four fluorescent intensity scores.
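
The I × 4 colony matrix described in the abstract can be pictured with a small sketch. This is only an illustration of the data layout, not the Bayesian model developed in the chapter: the intensities are simulated, the column order A, C, G, T is an assumption, and the names `simulate_colony` and `naive_base_call` are hypothetical helpers introduced here. The base-calling rule (pick the brightest channel in each cycle) is a deliberately naive stand-in for a model-based approach.

```python
import random

BASES = "ACGT"  # assumed column order of the colony matrix (not given in the abstract)


def simulate_colony(read, signal=1000.0, noise=50.0, seed=0):
    """Return an I x 4 list of lists of made-up intensities for `read`.

    Row i corresponds to sequencing cycle i; column j to the dye for BASES[j].
    """
    rng = random.Random(seed)
    colony = []
    for base in read:
        # Background intensity in all four channels (illustrative noise only).
        row = [abs(rng.gauss(0.0, noise)) for _ in BASES]
        # Strong signal in the channel of the base synthesized in this cycle.
        row[BASES.index(base)] += signal
        colony.append(row)
    return colony


def naive_base_call(colony):
    """Call, for each cycle (row), the base whose channel is brightest."""
    return "".join(BASES[max(range(4), key=row.__getitem__)] for row in colony)


read = "ACGTTGCA"                 # a toy read of length I = 8
colony = simulate_colony(read)    # I x 4 intensity matrix
called = naive_base_call(colony)  # one called base per cycle
```

Because the simulated signal is far above the simulated background, the naive rule recovers the toy read here. Real intensity data are noisier and show dependence across channels and cycles, which is the kind of structure a model-based approach aims to capture.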
Author Affiliations	Mitra, Riten (University of Texas); Mueller, Peter (University of Texas); Ji, Yuan (NorthShore University Health-System)
Copyright	Cambridge University Press 2013
Discipline	Biology
EISBN	9781139226448 1139226444
PublicationSubtitle	Models and Integrative Inference for High-Throughput Data