Alignment‐Free Measures for Whole‐Genome Comparison

Bibliographic Details
Published in	Pattern Recognition in Computational Molecular Biology, pp. 43-64
Main Authors	Comin, Matteo; Verzotto, Davide
Format	Book Chapter
Language	English
Published	Hoboken, NJ, USA: John Wiley & Sons, Inc., 19.11.2015
Subjects	alignment‐free methods; irredundant common subwords; matching statistics; phylogeny reconstruction; string algorithms; whole‐genome sequence analysis
ISBN	9781118893685; 1118893689
DOI	10.1002/9781119078845.ch3

Summary:	The global spread of low‐cost, high‐throughput sequencing technologies has made a large number of complete genomes publicly available, and this number is still growing rapidly. This chapter addresses the phylogeny reconstruction problem for different organisms, namely viruses, prokaryotes, and unicellular eukaryotes, using whole‐genome pairwise sequence comparison. It reviews the problem of whole‐genome comparison, the most important methods, and the use of alignment‐free distance functions based on subword composition. It proves that matching statistics, a popular concept in string algorithms that captures the statistics of common words between two sequences, can be derived from a small set of independent subwords, namely the irredundant common subwords. The chapter also shows how to compute the irredundant common subwords and the matching statistics.