Prediction of DNA-binding residues from protein sequence information using random forests




Bibliographic Details
Published in	BMC Genomics Vol. 10; no. Suppl 1; p. S1
Main Authors	Wang, Liangjiang, Yang, Mary Qu, Yang, Jack Y
Format	Journal Article
Language	English
Published	London BioMed Central 07.07.2009
Subjects	Algorithms Animal Genetics and Genomics Artificial Intelligence Binding Sites Biomedical and Life Sciences Computational Biology - methods DNA-Binding Proteins - metabolism Life Sciences Microarrays Microbial Genetics and Genomics Plant Genetics and Genomics Proteomics ROC Curve Sequence Analysis, Protein - methods Software Classifier Performance Evolutionary Information Prediction Strength Receiver Operating Characteristic Curve Random Forest
ISSN	1471-2164
DOI	10.1186/1471-2164-10-S1-S1


Abstract	Background: Protein-DNA interactions are involved in many biological processes essential for cellular function. Understanding the molecular mechanism of protein-DNA recognition requires identifying the DNA-binding residues in DNA-binding proteins. However, structural data are available for only a few hundred protein-DNA complexes. With the rapid accumulation of sequence data, accurately predicting DNA-binding residues directly from amino acid sequences has become an important but challenging task. Results: In this study, a new machine learning approach was developed for predicting DNA-binding residues from amino acid sequence data. The approach used both labeled data instances collected from the available structures of protein-DNA complexes and the abundant unlabeled data found in protein sequence databases. The evolutionary information contained in the unlabeled sequence data was represented as position-specific scoring matrices (PSSMs) and several new descriptors. These sequence-derived features were then used to train random forests (RFs), which can handle a large number of input variables and avoid model overfitting. Including evolutionary information was found to significantly improve classifier performance. The RF classifier was further evaluated on a separate test dataset, and the predicted DNA-binding residues were examined in the context of three-dimensional structures. Conclusion: The results suggest that the RF-based approach predicts DNA-binding residues more accurately than the methods of previous studies. A new web server, BindN-RF (http://bioinfo.ggc.org/bindn-rf/), has therefore been developed to make the RF classifier accessible to the biological research community.
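The abstract says PSSM-derived features were used to train random forests, but it does not describe how a per-residue feature vector is assembled. A common convention for residue-level predictors, assumed here and not taken from the paper, is a sliding window: each residue is described by its own PSSM row plus the rows of its neighbours, with zero rows padding positions that fall past either terminus. The minimal Python sketch below illustrates that encoding only; the function name `window_features`, the default window size of 11, and the zero padding are hypothetical choices, not BindN-RF's actual parameters, and the paper's additional descriptors are not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def window_features(pssm: Matrix, window: int = 11) -> Matrix:
    """Build one feature vector per residue from an L x 20 PSSM.

    Hypothetical sliding-window encoding: the vector for residue i
    concatenates the PSSM rows of residues i - window//2 .. i + window//2.
    Positions outside the sequence are filled with zero rows.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not pssm:
        return []
    n_cols = len(pssm[0])      # 20 columns for a standard amino-acid PSSM
    half = window // 2
    zero_row = [0.0] * n_cols  # padding beyond the N- or C-terminus
    features: Matrix = []
    for i in range(len(pssm)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            row = pssm[j] if 0 <= j < len(pssm) else zero_row
            vec.extend(row)    # copies the row's values into this residue's vector
        features.append(vec)
    return features


# Toy example: 4 residues x 3 columns (a real PSSM would have 20 columns).
toy = [[float(r * 10 + c) for c in range(3)] for r in range(4)]
feats = window_features(toy, window=3)
print(len(feats), len(feats[0]))  # 4 residues, 3 rows x 3 columns = 9 features each
print(feats[0])
```

Each output row describes one residue. For training, the rows would be paired with binding/non-binding labels derived from the protein-DNA complex structures and passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`.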
AuthorAffiliation	1 Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA; 2 J.C. Self Research Institute of Human Genetics, Greenwood Genetic Center, Greenwood, SC 29646, USA; 3 National Human Genome Research Institute, National Institutes of Health (NIH), U.S. Department of Health and Human Services, Bethesda, MD 20852, USA; 4 Harvard Medical School, Harvard University, P.O. Box 400888, Cambridge, MA 02115, USA
Author_xml	– sequence: 1 givenname: Liangjiang surname: Wang fullname: Wang, Liangjiang email: liangjw@clemson.edu organization: Department of Genetics and Biochemistry, Clemson University, J.C. Self Research Institute of Human Genetics, Greenwood Genetic Center – sequence: 2 givenname: Mary Qu surname: Yang fullname: Yang, Mary Qu organization: U.S. Department of Health and Human Services, National Human Genome Research Institute, National Institutes of Health (NIH) – sequence: 3 givenname: Jack Y surname: Yang fullname: Yang, Jack Y organization: Harvard Medical School, Harvard University
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/19594868 (View this record in MEDLINE/PubMed)
Copyright	Copyright © 2009 Wang et al; licensee BioMed Central Ltd.
Discipline	Biology
EISSN	1471-2164
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	Suppl 1
Keywords	Classifier Performance Evolutionary Information Prediction Strength Receiver Operating Characteristic Curve Random Forest
License	This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
OpenAccessLink	https://link.springer.com/10.1186/1471-2164-10-S1-S1
PMID	19594868
PublicationDate	2009-07-07
PublicationPlace	London, England
PublicationTitle	BMC Genomics
PublicationYear	2009
Publisher	BioMed Central
URI	https://link.springer.com/article/10.1186/1471-2164-10-S1-S1 https://www.ncbi.nlm.nih.gov/pubmed/19594868 https://www.proquest.com/docview/20228216 https://www.proquest.com/docview/67475927 https://pubmed.ncbi.nlm.nih.gov/PMC2709252 https://bmcgenomics.biomedcentral.com/counter/pdf/10.1186/1471-2164-10-S1-S1