A Naive Bayes Classifier for Protein Function Prediction

Bibliographic Details
Published in: In Silico Biology, Vol. 9, no. 1-2, pp. 23-34
Main authors: Kohonen, Jukka; Talikota, Sarish; Corander, Jukka; Auvinen, Petri; Arjas, Elja
Format: Journal article
Language: English
Published: London, England: SAGE Publications; IOS Press, 2009
ISSN: 1386-6338, 1434-3207
DOI: 10.3233/ISB-2009-0382

Summary: A Naive Bayes classifier tool is presented for annotating proteins on the basis of amino acid motifs, cellular localization and protein-protein interactions. Annotations take the form of posterior probabilities within the Molecular Function hierarchy of the Gene Ontology (GO). Experiments with the data available for the yeast Saccharomyces cerevisiae show that our prediction method can yield a relatively high level of accuracy. Several apparent challenges and possibilities for future development are also discussed.

A common approach to functional characterization is to exploit sequence similarity at varying levels, using several existing databases and local alignment and identification algorithms. When performed manually by an expert, such an approach is typically quite labor-intensive. In this context, integrating several sources of information is generally considered the only way to obtain predictions that are valuable in practice. However, the prediction accuracy for molecular functions can be somewhat improved, and computational effort saved, by restricting attention to those data sources that have a higher degree of specificity. Here we employ a Naive Bayes model to provide probabilistic predictions and to enable a computationally efficient approach to data integration.
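The summary does not spell out the model, so what follows is only a minimal sketch, not the authors' implementation, of how a Naive Bayes classifier can turn several binary evidence sources into one posterior probability. It assumes a single GO molecular function term treated as a yes/no label (the GO hierarchy is ignored), binary features with Bernoulli likelihoods and Laplace smoothing, and hypothetical feature names such as `motif:M1`, `loc:nucleus` and `ppi:partner_has_term` standing in for motif, localization and interaction evidence. The toy training data are invented for illustration.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(examples, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model for ONE (hypothetical) GO term.

    examples: list of (feature_set, has_term) pairs; feature_set holds the
    names of binary features observed for a protein.
    alpha: Laplace smoothing pseudo-count.
    """
    vocab = sorted(set().union(*(feats for feats, _ in examples)))
    n = {True: 0, False: 0}
    counts = {True: defaultdict(int), False: defaultdict(int)}
    for feats, label in examples:
        n[label] += 1
        for f in feats:
            counts[label][f] += 1
    # Smoothed prior P(term) and per-class P(feature present | class).
    prior = (n[True] + alpha) / (len(examples) + 2 * alpha)
    cond = {
        c: {f: (counts[c][f] + alpha) / (n[c] + 2 * alpha) for f in vocab}
        for c in (True, False)
    }
    return prior, cond, vocab


def posterior(model, feats):
    """Return P(term | features), assuming features are conditionally independent.

    Features not seen during training are ignored.
    """
    prior, cond, vocab = model
    log_score = {True: math.log(prior), False: math.log(1.0 - prior)}
    for c in (True, False):
        for f in vocab:
            p = cond[c][f]
            # Both presence and absence of each feature contribute evidence.
            log_score[c] += math.log(p if f in feats else 1.0 - p)
    # Normalize in log space to avoid underflow when there are many features.
    m = max(log_score.values())
    w = {c: math.exp(s - m) for c, s in log_score.items()}
    return w[True] / (w[True] + w[False])


# Hypothetical toy data: motif, localization and interaction evidence.
training = [
    ({"motif:M1", "loc:nucleus"}, True),
    ({"motif:M1", "ppi:partner_has_term"}, True),
    ({"motif:M1", "ppi:partner_has_term"}, True),
    ({"loc:cytoplasm"}, False),
    ({"loc:nucleus"}, False),
    ({"loc:cytoplasm", "ppi:other"}, False),
]
model = train_bernoulli_nb(training)
p_high = posterior(model, {"motif:M1", "ppi:partner_has_term"})
p_low = posterior(model, {"loc:cytoplasm"})
print(f"{p_high:.3f} {p_low:.3f}")  # prints 0.970 0.053
```

Because the features are assumed conditionally independent given the label, training reduces to counting and each additional data source simply contributes more factors to the product. This is one reason a Naive Bayes model makes data integration computationally cheap.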