Alignment‐Free Measures for Whole‐Genome Comparison

Bibliographic Details
Published in: Pattern Recognition in Computational Molecular Biology, pp. 43-64
Main Authors: Comin, Matteo; Verzotto, Davide
Format: Book Chapter
Language: English
Published: Hoboken, NJ, USA: John Wiley & Sons, Inc., 19.11.2015
ISBN: 9781118893685, 1118893689
DOI: 10.1002/9781119078845.ch3

Summary: The global spread of low‐cost high‐throughput sequencing technologies has made a large number of complete genomes publicly available, and this number is still growing rapidly. This chapter addresses the phylogeny reconstruction problem for different organisms, namely viruses, prokaryotes, and unicellular eukaryotes, using whole‐genome pairwise sequence comparison. It reviews the problem of whole‐genome comparison and the most important methods, as well as the use of alignment‐free distance functions based on subword compositions. It proves that the matching statistics, a popular concept in the field of string algorithms that captures the statistics of common words between two sequences, can be derived from a small set of independent subwords, namely the irredundant common subwords. The chapter also shows how to compute the irredundant common subwords and the matching statistics.
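To make the notion of matching statistics concrete, here is a minimal sketch using the standard textbook definition: for each position i of a sequence x, ms[i] is the length of the longest prefix of x[i:] that occurs somewhere in a second sequence y. This is not the chapter's algorithm (which works through irredundant common subwords); the function name `matching_statistics` is hypothetical, and the brute-force substring checks are chosen for clarity. Practical tools typically compute the same values in linear time with a suffix tree or suffix array of y.

```python
from typing import List


def matching_statistics(x: str, y: str) -> List[int]:
    """Hypothetical helper: ms[i] = length of the longest prefix of x[i:]
    that occurs as a substring of y (naive, for illustration only)."""
    n = len(x)
    ms: List[int] = []
    length = 0
    for i in range(n):
        # If x[i-1 : i-1+L] occurs in y, then its suffix x[i : i-1+L]
        # (length L-1) also occurs in y, so we can resume from L-1.
        length = max(length - 1, 0)
        # Extend the match one character at a time while it still occurs in y.
        while i + length < n and x[i:i + length + 1] in y:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("banana", "ana"))  # [0, 3, 2, 3, 2, 1]
```

Alignment-free measures built on this idea summarize the vector ms (for example, by averaging it) into a score for the pair of genomes; how the chapter weights and combines these values is described in the full text, not here.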