Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm

Bibliographic Details
Published in	PloS one Vol. 8; no. 1; p. e53112
Main Authors	Abbas, Ahmed; Kong, Xin-Bing; Liu, Zhi; Jing, Bing-Yi; Gao, Xin
Format	Journal Article
Language	English
Published	Public Library of Science (PLoS), United States, 07.01.2013
Subjects	Algorithms; Automation; Bioinformatics; Biology; Chemistry; Computation; Computational Biology - methods; Computer applications; Computer Science; Computers; Discriminant analysis; Magnetic resonance; Methods; NMR; Noise; Nuclear magnetic resonance; Nuclear Magnetic Resonance, Biomolecular - methods; Picking; Protein structure; Proteins; Proteins - chemistry; Science; Software
ISSN	1932-6203
DOI	10.1371/journal.pone.0053112

More Information
Summary:	A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming increasingly common, although expert knowledge remains the method of choice for determining how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and can be applied straightforwardly to other prediction selection problems in bioinformatics.
The source code, documentation and example data of the proposed method are available at http://sfb.kaust.edu.sa/pages/software.aspx.
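The summary describes the selection step only in outline: sort candidates by score, convert them to p-values, then let a B-H rule pick how many to keep. The sketch below illustrates the standard Benjamini-Hochberg step-up procedure applied to such a list. It is not the authors' implementation. This record does not say how volumes or intensities become p-values, so `empirical_p_values` assumes a hypothetical sample of noise-only scores as an empirical null, purely as an illustrative stand-in. The function names, the `alpha` default, and the peak ids are likewise hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_p_values(scores: Sequence[float], null_scores: Sequence[float]) -> list[float]:
    """Map candidate peak scores (volumes or intensities) to empirical p-values.

    HYPOTHETICAL: `null_scores` stands in for scores of noise-only peaks.
    p = (1 + #null scores >= s) / (1 + #null scores), so a larger score
    never gets a larger p-value.
    """
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    p_values = []
    for s in scores:
        at_least = n - bisect_left(null_sorted, s)  # count of null scores >= s
        p_values.append((1 + at_least) / (1 + n))
    return p_values


def benjamini_hochberg_count(p_values: Sequence[float], alpha: float = 0.05) -> int:
    """Standard B-H step-up rule: number of hypotheses to reject at FDR level alpha.

    Returns the largest k with p_(k) <= k * alpha / m (p_(k) is the k-th
    smallest p-value), or 0 if no such k exists.
    """
    m = len(p_values)
    k = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= i * alpha / m:
            k = i  # step-up: keep the largest passing rank, even after failures
    return k


def select_peaks(
    peaks: Sequence[tuple[str, float]],
    null_scores: Sequence[float],
    alpha: float = 0.05,
) -> list[tuple[str, float]]:
    """Return the top-k (peak_id, score) pairs, with k chosen by B-H."""
    ranked = sorted(peaks, key=lambda pk: pk[1], reverse=True)
    p_values = empirical_p_values([score for _, score in ranked], null_scores)
    # Scores sorted high-to-low give p-values sorted low-to-high, so the
    # k smallest p-values are exactly the first k ranked peaks.
    k = benjamini_hochberg_count(p_values, alpha)
    return ranked[:k]


if __name__ == "__main__":
    noise = [float(x) for x in range(100)]  # illustrative null sample, not real data
    candidates = [("f", 10.5), ("a", 500.0), ("d", 98.5),
                  ("b", 400.0), ("e", 50.5), ("c", 300.0)]
    print([pid for pid, _ in select_peaks(candidates, noise)])  # → ['a', 'b', 'c', 'd']
```

In this toy run the top-ranked p-value (1/101) exceeds its threshold of 0.05/6. Because the rule is step-up, the search continues anyway. Ranks 2 to 4 pass their thresholds, so four peaks are kept instead of a fixed, pre-chosen number.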
Bibliography:	Critical revision of the manuscript: XK ZL. Conceived and designed the experiments: BJ XG. Performed the experiments: AA XG. Analyzed the data: AA BJ XG. Contributed reagents/materials/analysis tools: XK ZL XG. Wrote the paper: AA BJ XG. Competing Interests: The authors have declared that no competing interests exist.