A Naive Bayes Classifier for Protein Function Prediction

Bibliographic Details
Published in	In silico biology Vol. 9; no. 1-2; pp. 23-34
Main Authors	Kohonen, Jukka, Talikota, Sarish, Corander, Jukka, Auvinen, Petri, Arjas, Elja
Format	Journal Article
Language	English
Published	London, England: SAGE Publications; IOS Press, 2009
Subjects	Algorithms; Biological and medical sciences; Fundamental and applied biological sciences. Psychology; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Saccharomyces cerevisiae - genetics; Saccharomyces cerevisiae - metabolism; Saccharomyces cerevisiae Proteins - classification; Saccharomyces cerevisiae Proteins - physiology; Signal Transduction - physiology; Naive Bayes; Data integration; Gene Ontology; Protein function prediction; Prediction; Bayes network; Probabilistic model; Bioinformatics; Supervised classification; Protein
ISSN	1386-6338 1434-3207
DOI	10.3233/ISB-2009-0382

More Information
Summary:	A Naive Bayes classifier tool is presented for annotating proteins on the basis of amino acid motifs, cellular localization and protein-protein interactions. Annotations take the form of posterior probabilities within the Molecular Function hierarchy of the Gene Ontology (GO). Experiments with the data available for yeast, Saccharomyces cerevisiae, show that our prediction method can yield a relatively high level of accuracy. Several apparent challenges and possibilities for future developments are also discussed. A common approach to functional characterization is to use sequence similarities at varying levels, drawing on several existing databases and local alignment/identification algorithms. Such an approach is typically quite labor-intensive when performed manually by an expert. In this context, integrating several sources of information is generally considered the only way to obtain valuable predictions with practical implications. However, some improvement in the prediction accuracy of molecular functions, and thereby savings in computational effort, can be achieved by restricting attention to only those data sources that involve a higher degree of specificity. We employ a Naive Bayes model here to provide probabilistic predictions and to enable a computationally efficient approach to data integration.
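
The summary describes combining several evidence types (motifs, localization, interactions) into a posterior probability with a Naive Bayes model. The sketch below illustrates that general idea only; it is not the authors' tool. It assumes a single yes/no function label rather than the GO hierarchy, binary presence/absence features, and Laplace smoothing, and the feature names and training examples are invented for illustration.

```python
import math
from collections import defaultdict

def train(examples, alpha=1.0):
    """Count feature occurrences per class.

    examples: list of (set_of_feature_names, has_function) pairs.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    vocabulary = set()
    for features, label in examples:
        class_counts[label] += 1
        for name in features:
            feature_counts[label][name] += 1
            vocabulary.add(name)
    return class_counts, feature_counts, vocabulary, alpha

def posterior(model, features):
    """Return P(has_function | features) under the naive independence assumption."""
    class_counts, feature_counts, vocabulary, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        # Smoothed class prior.
        log_p = math.log((class_counts[label] + alpha) / (total + 2 * alpha))
        # Each feature contributes independently, whether present or absent.
        for name in vocabulary:
            p_present = (feature_counts[label][name] + alpha) / (
                class_counts[label] + 2 * alpha
            )
            log_p += math.log(p_present if name in features else 1.0 - p_present)
        log_scores[label] = log_p
    # Normalize in a numerically stable way.
    peak = max(log_scores.values())
    weights = {label: math.exp(score - peak) for label, score in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])

# Hypothetical toy data: feature names and labels are made up.
examples = [
    ({"motif:kinase_domain", "loc:cytoplasm"}, True),
    ({"motif:kinase_domain", "ppi:partner_is_kinase"}, True),
    ({"loc:nucleus"}, False),
    ({"loc:cytoplasm"}, False),
]
model = train(examples)
p_kinase_like = posterior(model, {"motif:kinase_domain", "ppi:partner_is_kinase"})
p_nuclear_only = posterior(model, {"loc:nucleus"})
print(p_kinase_like, p_nuclear_only)
```

Because each data source enters as an independent factor, a source can be added or dropped without retraining the others, which is one reason this kind of model is computationally cheap for data integration.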