Alignment‐Free Measures for Whole‐Genome Comparison
| Published in | Pattern Recognition in Computational Molecular Biology, pp. 43–64 |
|---|---|
| Main Authors | , |
| Format | Book Chapter |
| Language | English |
| Published | Hoboken, NJ, USA: John Wiley & Sons, Inc, 19.11.2015 |
| ISBN | 9781118893685 1118893689 |
| DOI | 10.1002/9781119078845.ch3 |
| Summary: | The global spread of low‐cost high‐throughput sequencing technologies has made a large number of complete genomes publicly available, and this number is still growing rapidly. This chapter addresses the phylogeny reconstruction problem for different organisms, namely viruses, prokaryotes, and unicellular eukaryotes, using whole‐genome pairwise sequence comparison. It reviews the problem of whole‐genome comparison, the most important methods for solving it, and the use of alignment‐free distance functions based on subword composition. It proves that the matching statistics, a popular concept in string algorithms that captures the statistics of common words between two sequences, can be derived from a small set of independent subwords, namely the irredundant common subwords. The chapter also shows how to compute the irredundant common subwords and the matching statistics. |
|---|---|