Predicting residue-residue contacts using random forest models
| Published in | Bioinformatics Vol. 27; no. 24; pp. 3379 - 3384 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Oxford: Oxford University Press, 15.12.2011 |
| ISSN | 1367-4803; 1367-4811; 1460-2059 |
| DOI | 10.1093/bioinformatics/btr579 |

Summary:

Motivation: Predicted residue-residue contacts can be useful for predicting protein 3D structures. Current algorithms for this purpose leave room for improvement.

Results: We develop ProC_S3, a set of Random Forest-based models for predicting residue-residue contact maps. The models are built from a collection of 1490 non-redundant, high-resolution protein structures using more than 1280 sequence-based features. A new amino acid residue contact propensity matrix and a new set of seven amino acid groups based on contact preference are developed and used in ProC_S3. For long-range contacts (sequence separation ≥24), ProC_S3 delivers a 3-fold cross-validated accuracy of 26.9% with a coverage of 4.7% for the top L/5 predictions, where L is the number of residues in the protein. Further benchmark tests on an independent set of 329 proteins give an accuracy of 29.7% and a coverage of 5.6%. In the recently completed Ninth Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP9), ProC_S3 ranked No. 1, No. 3 and No. 2 in accuracy for the top L/5, top L/10 and best 5 predictions of long-range contacts, respectively, among 18 automatic prediction servers.

Availability: http://www.abl.ku.edu/proc/proc_s3.html

Contact: jwfang@ku.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.
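The accuracy and coverage figures for top L/5 long-range predictions can be illustrated with a minimal sketch. It assumes the usual definitions: accuracy is the fraction of the top-ranked long-range predictions that are observed contacts, and coverage is the fraction of all observed long-range contacts recovered by those predictions. The function `top_l_metrics`, its parameters, and the input format (each residue pair listed once, with a predicted score) are hypothetical illustrations, not ProC_S3's code. How `true_contacts` is derived from a structure (for example, a distance cutoff between residues) is also left as an assumption.

```python
from typing import Iterable, Set, Tuple


def top_l_metrics(
    scored_pairs: Iterable[Tuple[int, int, float]],
    true_contacts: Set[Tuple[int, int]],
    length: int,
    divisor: int = 5,
    min_separation: int = 24,
) -> Tuple[float, float]:
    """Hypothetical helper: (accuracy, coverage) of the top length // divisor
    long-range predictions.

    scored_pairs: (i, j, score) for each candidate residue pair, listed once.
    true_contacts: observed contacting residue pairs (i, j).
    """
    # Keep only long-range candidates (|i - j| >= min_separation), with i < j.
    long_range = [
        (min(i, j), max(i, j), s)
        for i, j, s in scored_pairs
        if abs(i - j) >= min_separation
    ]

    # Rank by predicted score (highest first) and keep the top L/divisor pairs.
    n_top = max(1, length // divisor)
    top = sorted(long_range, key=lambda p: p[2], reverse=True)[:n_top]

    # Observed long-range contacts, normalised so that i < j.
    observed = {
        (min(i, j), max(i, j))
        for i, j in true_contacts
        if abs(i - j) >= min_separation
    }

    hits = sum(1 for i, j, _ in top if (i, j) in observed)
    accuracy = hits / len(top) if top else 0.0
    coverage = hits / len(observed) if observed else 0.0
    return accuracy, coverage


if __name__ == "__main__":
    preds = [(0, 30, 0.9), (1, 40, 0.8), (2, 10, 0.95), (5, 35, 0.7)]
    truth = {(0, 30), (5, 35), (10, 45), (20, 49)}
    # (2, 10) is dropped as short-range; 2 of the 3 kept predictions are
    # correct, and 2 of the 4 observed long-range contacts are recovered.
    print(top_l_metrics(preds, truth, length=50))
```

Because only L/5 pairs are scored against all observed long-range contacts, coverage is expected to be much lower than accuracy, which matches the pattern of the reported figures (e.g. 26.9% accuracy versus 4.7% coverage).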