Genometa - A Fast and Accurate Classifier for Short Metagenomic Shotgun Reads

Bibliographic Details
Published in	PLoS ONE Vol. 7; no. 8; p. e41224
Main Authors	Davenport, Colin F., Neugebauer, Jens, Beckmann, Nils, Friedrich, Benedikt, Kameri, Burim, Kokott, Svea, Paetow, Malte, Siekmann, Björn, Wieding-Drewes, Matthias, Wienhöfer, Markus, Wolf, Stefan, Tümmler, Burkhard, Ahlers, Volker, Sprengel, Frauke
Format	Journal Article
Language	English
Published	Public Library of Science (PLoS), United States, 21.08.2012
Subjects	Algorithms; Bacteria; Bacteria - classification; Bacteria - genetics; bacterial genomics; BASIC BIOLOGICAL SCIENCES; Bioinformatics; Biology; BLAST algorithm; Computer Science; Computer simulation; Cystic fibrosis; Data processing; Datasets; DNA sequencing; Feasibility studies; Gene sequencing; Genes; genome sequencing; Genomes; Genomics; Graphical user interface; High-Throughput Nucleotide Sequencing; human genomics; Humans; Intestines - microbiology; Java (programming language); Medical schools; metagenomics; Metagenomics - instrumentation; Metagenomics - methods; Microbial activity; Microcomputers; Microorganisms; Next-generation sequencing; Operating systems; Pediatrics; Personal computers; Phylogenetics; sequence alignment; shotgun sequencing; Species; Taxa; Taxonomy; Time Factors; Germany
ISSN	1932-6203
DOI	10.1371/journal.pone.0041224

More Information
Summary:	Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and the reliable identification of microbial species, as opposed to higher-level taxa such as phyla. Here we present Genometa, a computationally undemanding program with a graphical user interface that identifies bacterial species and gene content in datasets generated by inexpensive high-throughput short-read sequencing technologies. The approach was first verified on two simulated metagenomic short-read datasets, detecting 100% and 94% of the bacterial species included, with few false positives or false negatives. Subsequent comparative benchmarking against three popular metagenomic algorithms on an Illumina human gut dataset showed that Genometa attributed the most reads to bacteria at species level (i.e. including all strains of that species), with accuracy similar to or better than that of the other programs. Lastly, because it uses modern short-read aligners, Genometa was shown to run many times faster than BLAST. The method is highly accurate if the bacteria in the sample are represented by genomes in the reference sequence, but it cannot find species absent from the reference. It is one of the most user-friendly and resource-efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step-by-step tutorial and the Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and http://code.google.com/p/genometa/. The program has been tested on Ubuntu Linux and Windows XP/7.
Bibliography:	Funding: German Research Council (DFG), GRK 653/3; USDOE Office of Science (SC), Biological and Environmental Research (BER), AC02-05CH11231. Conceived and designed the experiments: CD JN BT VA FS. Performed the experiments: CD JN NB BF BK SK MP BS MW-D MW SW. Analyzed the data: CD BT. Wrote the paper: CD BT. Competing Interests: The authors have declared that no competing interests exist.
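The Summary describes attributing aligned short reads to bacterial species, with all strains of a species pooled together. This record does not describe Genometa's actual implementation, so the following is only a minimal Python sketch of that counting step. It assumes the short-read aligner writes standard SAM output. The function name `species_counts`, the `ref_to_species` table, and the example reference and species names are hypothetical.

```python
from collections import Counter

# SAM FLAG bits (as defined in the SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def species_counts(sam_lines, ref_to_species):
    """Count primary aligned reads per species from SAM-format lines.

    Hypothetical helper: ref_to_species maps each reference sequence name
    (SAM RNAME column) to a species label. Several strain genomes can share
    one species label, so their reads are pooled at species level.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, and only if it aligned
        species = ref_to_species.get(rname)
        if species is not None:  # references without a species label are ignored
            counts[species] += 1
    return counts


# Toy example with made-up reference names (strainA, strainB, other).
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tstrainA\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tstrainB\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t256\tother\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tother\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
example_refs = {"strainA": "Species X", "strainB": "Species X", "other": "Species Y"}
print(dict(species_counts(example_sam, example_refs)))
# {'Species X': 2, 'Species Y': 1}
```

Only primary alignments are counted, so a read that also aligns to a second genome (r2 above) contributes once. Unaligned reads (r3) are dropped, which mirrors the stated limitation that species absent from the reference cannot be found.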