Prediction of DNA-binding residues from protein sequence information using random forests

Bibliographic Details
Published in: BMC Genomics, Vol. 10, Suppl 1, p. S1
Main Authors: Wang, Liangjiang; Yang, Mary Qu; Yang, Jack Y
Format: Journal Article
Language: English
Published: London: BioMed Central, 07.07.2009
ISSN: 1471-2164
DOI: 10.1186/1471-2164-10-S1-S1

Abstract

Background: Protein-DNA interactions are involved in many biological processes essential for cellular function. To understand the molecular mechanism of protein-DNA recognition, it is necessary to identify the DNA-binding residues in DNA-binding proteins. However, structural data are available for only a few hundred protein-DNA complexes. With the rapid accumulation of sequence data, accurately predicting DNA-binding residues directly from amino acid sequence has become an important but challenging task.

Results: A new machine learning approach was developed in this study for predicting DNA-binding residues from amino acid sequence data. The approach used both labeled data instances collected from the available structures of protein-DNA complexes and the abundant unlabeled data found in protein sequence databases. The evolutionary information contained in the unlabeled sequence data was represented as position-specific scoring matrices (PSSMs) and several new descriptors. The sequence-derived features were then used to train random forests (RFs), which can handle a large number of input variables and help avoid overfitting. The use of evolutionary information was found to significantly improve classifier performance. The RF classifier was further evaluated using a separate test dataset, and the predicted DNA-binding residues were examined in the context of three-dimensional structures.

Conclusion: The results suggest that the RF-based approach predicts DNA-binding residues more accurately than previous studies. A new web server called BindN-RF http://bioinfo.ggc.org/bindn-rf/ has thus been developed to make the RF classifier accessible to the biological research community.
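The Results paragraph describes turning PSSM-based evolutionary information into per-residue feature vectors for the random forest. The sketch below shows one common way to do this; it is an illustration under stated assumptions, not the authors' code. It assumes a PSSM given as one row of 20 log-odds scores per residue (as typically produced by PSI-BLAST), logistic scaling of each score into (0, 1), a sliding window of 11 residues (`half_window=5`) centred on the target residue, and zero-padding past the sequence termini. The window size, the scaling, and the function names `scale_pssm_score`, `window_features` and `encode_protein` are all hypothetical.

```python
import math
from typing import List, Sequence

N_COLS = 20  # one column per standard amino acid in a PSSM row


def scale_pssm_score(x: float) -> float:
    """Logistic scaling of a raw PSSM score into (0, 1) (assumed preprocessing).

    Split into two branches so math.exp never receives a large positive argument.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def window_features(pssm: Sequence[Sequence[float]], center: int,
                    half_window: int = 5) -> List[float]:
    """Concatenate scaled PSSM rows in a window centred on residue `center`.

    Positions outside the sequence are zero-padded, so every residue yields
    (2 * half_window + 1) * 20 features.
    """
    features: List[float] = []
    for pos in range(center - half_window, center + half_window + 1):
        if 0 <= pos < len(pssm):
            row = pssm[pos]
            if len(row) != N_COLS:
                raise ValueError(f"PSSM row {pos} has {len(row)} columns, expected {N_COLS}")
            features.extend(scale_pssm_score(s) for s in row)
        else:
            features.extend([0.0] * N_COLS)  # padding beyond a terminus
    return features


def encode_protein(pssm: Sequence[Sequence[float]],
                   half_window: int = 5) -> List[List[float]]:
    """One feature vector per residue of the protein."""
    return [window_features(pssm, i, half_window) for i in range(len(pssm))]


if __name__ == "__main__":
    toy_pssm = [[0.0] * N_COLS for _ in range(3)]  # hypothetical 3-residue protein
    vectors = encode_protein(toy_pssm, half_window=1)
    print(len(vectors), len(vectors[0]))  # 3 60
```

Each row of `encode_protein(pssm)` would be one training or test instance, labeled binding or non-binding from the protein-DNA complex structures. Any additional descriptors (the abstract mentions several new ones) could be appended to these vectors before RF training.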
AuthorAffiliation 1 Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
2 J.C. Self Research Institute of Human Genetics, Greenwood Genetic Center, Greenwood, SC 29646, USA
3 National Human Genome Research Institute, National Institutes of Health (NIH), U.S. Department of Health and Human Services, Bethesda, MD 20852, USA
4 Harvard Medical School, Harvard University, P.O. Box 400888, Cambridge, MA 02115, USA
Author_xml – sequence: 1
  givenname: Liangjiang
  surname: Wang
  fullname: Wang, Liangjiang
  email: liangjw@clemson.edu
  organization: Department of Genetics and Biochemistry, Clemson University, J.C. Self Research Institute of Human Genetics, Greenwood Genetic Center
– sequence: 2
  givenname: Mary Qu
  surname: Yang
  fullname: Yang, Mary Qu
  organization: U.S. Department of Health and Human Services, National Human Genome Research Institute, National Institutes of Health (NIH)
– sequence: 3
  givenname: Jack Y
  surname: Yang
  fullname: Yang, Jack Y
  organization: Harvard Medical School, Harvard University
BackLink https://www.ncbi.nlm.nih.gov/pubmed/19594868 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1111_jav_03404
crossref_primary_10_1371_journal_pcbi_1004619
crossref_primary_10_1021_acs_jcim_0c00735
crossref_primary_10_1002_prot_22898
crossref_primary_10_1371_journal_pgen_1002303
crossref_primary_10_1371_journal_pone_0049040
crossref_primary_10_1093_bib_bbaf016
crossref_primary_10_1371_journal_pone_0133260
crossref_primary_10_1093_bib_bbae040
crossref_primary_10_1093_bib_bbae162
crossref_primary_10_1186_1752_0509_5_S1_S7
crossref_primary_10_1109_TCBB_2016_2616469
crossref_primary_10_1093_nar_gkad1131
crossref_primary_10_9787_PBB_2023_11_3_208
crossref_primary_10_1016_j_fuel_2022_125889
crossref_primary_10_1186_s12859_018_2527_1
crossref_primary_10_1093_bioinformatics_btr579
crossref_primary_10_1093_bib_bbv023
crossref_primary_10_1039_c3mb70033j
crossref_primary_10_1007_s11033_019_04763_1
crossref_primary_10_3390_ijms16035194
crossref_primary_10_1007_s11831_021_09661_z
crossref_primary_10_1093_mutage_get067
crossref_primary_10_1002_widm_48
crossref_primary_10_1142_S0219720018400097
crossref_primary_10_1186_1471_2105_14_44
crossref_primary_10_1007_s00425_016_2560_0
crossref_primary_10_1093_bioinformatics_btt029
crossref_primary_10_3390_e18100379
crossref_primary_10_1016_j_jbiotec_2019_10_003
crossref_primary_10_1016_j_sjbs_2015_10_008
crossref_primary_10_1080_08839514_2011_570158
crossref_primary_10_1186_s12915_024_02014_9
crossref_primary_10_18632_oncotarget_17776
crossref_primary_10_3934_mbe_2024008
crossref_primary_10_1016_j_csbj_2020_02_008
crossref_primary_10_1109_TCBB_2021_3123828
crossref_primary_10_1016_j_jtbi_2016_06_002
crossref_primary_10_1002_term_2333
crossref_primary_10_1371_journal_pone_0096694
crossref_primary_10_1021_acs_jcim_3c02011
crossref_primary_10_1109_TCBB_2012_106
crossref_primary_10_1016_j_eujim_2013_08_001
crossref_primary_10_1371_journal_pone_0106542
crossref_primary_10_1093_nar_gkt617
crossref_primary_10_1016_j_biochi_2012_10_006
crossref_primary_10_1093_bib_bbac322
crossref_primary_10_3390_genes10120965
crossref_primary_10_1371_journal_pcbi_1007624
crossref_primary_10_1016_j_compbiolchem_2014_09_002
crossref_primary_10_1021_ci1003703
crossref_primary_10_1186_s12859_017_1792_8
crossref_primary_10_1186_s13059_017_1369_x
crossref_primary_10_1186_1471_2164_10_S1_I1
crossref_primary_10_1371_journal_pone_0028440
crossref_primary_10_1021_acs_jcim_7b00307
crossref_primary_10_1093_nar_gks481
crossref_primary_10_1155_2014_262850
crossref_primary_10_1016_j_jgeb_2024_100427
crossref_primary_10_3390_molecules22122079
crossref_primary_10_1007_s00438_014_0812_x
crossref_primary_10_1016_j_ymeth_2024_09_004
crossref_primary_10_1080_13102818_2022_2122871
crossref_primary_10_3109_09553002_2013_804962
crossref_primary_10_1371_journal_pone_0167345
crossref_primary_10_1007_s12663_024_02193_6
crossref_primary_10_1155_2014_845479
crossref_primary_10_1002_ps_5185
crossref_primary_10_1038_srep27653
crossref_primary_10_1002_jobm_201700162
crossref_primary_10_1038_s41598_024_77112_3
crossref_primary_10_1016_j_indcrop_2021_113615
crossref_primary_10_1155_2013_524502
crossref_primary_10_1093_nar_gkt544
crossref_primary_10_1093_nar_gkq396
crossref_primary_10_1021_acs_jcim_8b00749
crossref_primary_10_1093_bib_bbab336
crossref_primary_10_1109_TCBB_2018_2858806
ContentType Journal Article
Copyright © 2009 Wang et al.; licensee BioMed Central Ltd.
DBID C6C
AAYXX
CITATION
CGR
CUY
CVF
ECM
EIF
NPM
7TM
8FD
FR3
P64
RC3
7X8
5PM
ADTOC
UNPAY
DOI 10.1186/1471-2164-10-S1-S1
DatabaseName Springer Nature Link Open Access Journals
CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
Nucleic Acids Abstracts
Technology Research Database
Engineering Research Database
Biotechnology and BioEngineering Abstracts
Genetics Abstracts
MEDLINE - Academic
PubMed Central (Full Participant titles)
Unpaywall for CDI: Periodical Content
Unpaywall
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
Genetics Abstracts
Engineering Research Database
Technology Research Database
Nucleic Acids Abstracts
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
DatabaseTitleList Genetics Abstracts
MEDLINE

MEDLINE - Academic
Database_xml – sequence: 1
  dbid: C6C
  name: Springer Nature OA Free Journals
  url: http://www.springeropen.com/
  sourceTypes: Publisher
– sequence: 2
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 3
  dbid: EIF
  name: MEDLINE
  url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search
  sourceTypes: Index Database
– sequence: 4
  dbid: UNPAY
  name: Unpaywall
  url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 1471-2164
EndPage S1
ExternalDocumentID 10.1186/1471-2164-10-s1-s1
PMC2709252
19594868
10_1186_1471_2164_10_S1_S1
Genre Journal Article
GroupedDBID ---
0R~
23N
2VQ
2WC
2XV
4.4
53G
5VS
6J9
7X7
88E
8AO
8FE
8FH
8FI
8FJ
AAFWJ
AAHBH
AAJSJ
AASML
ABDBF
ABUWG
ACGFO
ACGFS
ACIHN
ACIWK
ACPRK
ACUHS
ADBBV
ADRAZ
ADUKV
AEAQA
AENEX
AEUYN
AFKRA
AFPKN
AFRAH
AHBYD
AHMBA
AHSBF
AHYZX
ALMA_UNASSIGNED_HOLDINGS
AMKLP
AMTXH
AOIJS
BAPOH
BAWUL
BBNVY
BCNDV
BENPR
BFQNJ
BHPHI
BMC
BPHCQ
BVXVI
C1A
C6C
CCPQU
CS3
DIK
DU5
E3Z
EAD
EAP
EAS
EBD
EBLON
EBS
EJD
EMB
EMK
EMOBN
ESX
F5P
FYUFA
GROUPED_DOAJ
GX1
H13
HCIFZ
HMCUK
HYE
IAO
IGS
IHR
INH
INR
IPNFZ
ISR
ITC
KQ8
LK8
M1P
M48
M7P
M~E
O5R
O5S
OK1
OVT
P2P
PGMZT
PHGZM
PHGZT
PIMPY
PJZUB
PPXIY
PQGLB
PQQKQ
PROAC
PSQYO
PUEGO
RBZ
RIG
RNS
ROL
RPM
RSV
SBL
SOJ
SV3
TR2
TUS
U2A
UKHRP
W2D
WOQ
WOW
XSB
AAYXX
CITATION
ALIPV
CGR
CUY
CVF
ECM
EIF
NPM
7TM
8FD
FR3
P64
RC3
7X8
5PM
ADTOC
AFFHD
UNPAY
ID FETCH-LOGICAL-c541t-649f08c5348ca4ec6b6739b4cdfc35510c8c1ef1e5a109ec6ac897bb9c41b7d3
IEDL.DBID M48
ISSN 1471-2164
IngestDate Wed Oct 29 12:17:38 EDT 2025
Tue Sep 30 16:40:12 EDT 2025
Fri Sep 05 07:45:02 EDT 2025
Wed Oct 01 17:10:56 EDT 2025
Mon Jul 21 06:04:25 EDT 2025
Thu Apr 24 23:03:37 EDT 2025
Wed Oct 01 03:02:51 EDT 2025
Sat Sep 06 07:28:46 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue Suppl 1
Keywords Classifier Performance
Evolutionary Information
Prediction Strength
Receiver Operating Characteristic Curve
Random Forest
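The keywords point to ROC analysis as the measure of classifier performance, and the abstract reports evaluating the RF classifier on a separate test dataset. As a minimal sketch (not the authors' evaluation code), the area under the ROC curve can be computed from per-residue scores with the Mann-Whitney formulation: the fraction of (binding, non-binding) residue pairs in which the binding residue scores higher, with ties counted as one half. The per-residue score is assumed here to be something like the fraction of trees voting "binding"; the function names, the 0/1 label convention and the 0.5 threshold are illustrative assumptions.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 = DNA-binding residue, 0 = non-binding residue.
    Returns the fraction of (positive, negative) pairs in which the positive
    has the higher score; tied pairs contribute 0.5.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binding and one non-binding residue")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity and specificity when residues scoring >= threshold are called binding."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one binding and one non-binding residue")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical vote fractions for four residues and their true labels.
    scores = [0.9, 0.8, 0.3, 0.2]
    labels = [1, 0, 1, 0]
    print(roc_auc(scores, labels))                  # 0.75
    print(sensitivity_specificity(scores, labels))  # (0.5, 0.5)
```

The pairwise loop is O(P × N) and is written for clarity; for large test sets, a rank-based computation after sorting the scores gives the same value in O(n log n).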
Language English
License This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
cc-by
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c541t-649f08c5348ca4ec6b6739b4cdfc35510c8c1ef1e5a109ec6ac897bb9c41b7d3
OpenAccessLink https://link.springer.com/10.1186/1471-2164-10-S1-S1
PMID 19594868
PQID 20228216
PQPubID 23462
ParticipantIDs unpaywall_primary_10_1186_1471_2164_10_s1_s1
pubmedcentral_primary_oai_pubmedcentral_nih_gov_2709252
proquest_miscellaneous_67475927
proquest_miscellaneous_20228216
pubmed_primary_19594868
crossref_primary_10_1186_1471_2164_10_S1_S1
crossref_citationtrail_10_1186_1471_2164_10_S1_S1
springer_journals_10_1186_1471_2164_10_S1_S1
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 20090707
PublicationDateYYYYMMDD 2009-07-07
PublicationDate_xml – month: 7
  year: 2009
  text: 20090707
  day: 7
PublicationDecade 2000
PublicationPlace London
PublicationPlace_xml – name: London
– name: England
PublicationTitle BMC genomics
PublicationTitleAbbrev BMC Genomics
PublicationTitleAlternate BMC Genomics
PublicationYear 2009
Publisher BioMed Central
Publisher_xml – name: BioMed Central
SSID ssj0017825
Score 2.2925153
SourceID unpaywall
pubmedcentral
proquest
pubmed
crossref
springer
SourceType Open Access Repository
Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage S1
SubjectTerms Algorithms
Animal Genetics and Genomics
Artificial Intelligence
Binding Sites
Biomedical and Life Sciences
Computational Biology - methods
DNA-Binding Proteins - metabolism
Life Sciences
Microarrays
Microbial Genetics and Genomics
Plant Genetics and Genomics
Proteomics
ROC Curve
Sequence Analysis, Protein - methods
Software
Title Prediction of DNA-binding residues from protein sequence information using random forests
URI https://link.springer.com/article/10.1186/1471-2164-10-S1-S1
https://www.ncbi.nlm.nih.gov/pubmed/19594868
https://www.proquest.com/docview/20228216
https://www.proquest.com/docview/67475927
https://pubmed.ncbi.nlm.nih.gov/PMC2709252
https://bmcgenomics.biomedcentral.com/counter/pdf/10.1186/1471-2164-10-S1-S1
UnpaywallVersion publishedVersion
Volume 10
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVADU
  databaseName: BioMedCentral
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: RBZ
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://www.biomedcentral.com/search/
  providerName: BioMedCentral
– providerCode: PRVAFT
  databaseName: Open Access Digital Library
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: KQ8
  dateStart: 20000701
  isFulltext: true
  titleUrlDefault: http://grweb.coalliance.org/oadl/oadl.html
  providerName: Colorado Alliance of Research Libraries
– providerCode: PRVAFT
  databaseName: Open Access Digital Library
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: KQ8
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: http://grweb.coalliance.org/oadl/oadl.html
  providerName: Colorado Alliance of Research Libraries
– providerCode: PRVAON
  databaseName: DOAJ Directory of Open Access Journals
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: DOA
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVEBS
  databaseName: EBSCOhost Academic Search Ultimate
  customDbUrl: https://search.ebscohost.com/login.aspx?authtype=ip,shib&custid=s3936755&profile=ehost&defaultdb=asn
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: ABDBF
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://search.ebscohost.com/direct.asp?db=asn
  providerName: EBSCOhost
– providerCode: PRVBFR
  databaseName: Free Medical Journals
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: DIK
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: http://www.freemedicaljournals.com
  providerName: Flying Publisher
– providerCode: PRVFQY
  databaseName: GFMER Free Medical Journals
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: GX1
  dateStart: 0
  isFulltext: true
  titleUrlDefault: http://www.gfmer.ch/Medical_journals/Free_medical.php
  providerName: Geneva Foundation for Medical Education and Research
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources (selected full-text only)
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: M~E
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
– providerCode: PRVAQN
  databaseName: PubMed Central (Selected Fulltext)
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: RPM
  dateStart: 20000101
  isFulltext: true
  titleUrlDefault: https://www.ncbi.nlm.nih.gov/pmc/
  providerName: National Library of Medicine
– providerCode: PRVPQU
  databaseName: Health & Medical Collection (Proquest)
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: 7X7
  dateStart: 20090101
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/healthcomplete
  providerName: ProQuest
– providerCode: PRVPQU
  databaseName: ProQuest Central
  customDbUrl: http://www.proquest.com/pqcentral?accountid=15518
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: BENPR
  dateStart: 20090101
  isFulltext: true
  titleUrlDefault: https://www.proquest.com/central
  providerName: ProQuest
– providerCode: PRVFZP
  databaseName: Scholars Portal Journals: Open Access
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 20250331
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: M48
  dateStart: 20000701
  isFulltext: true
  titleUrlDefault: http://journals.scholarsportal.info
  providerName: Scholars Portal
– providerCode: PRVAVX
  databaseName: Springer Nature HAS Fully OA
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: AAJSJ
  dateStart: 20001201
  isFulltext: true
  titleUrlDefault: https://www.springernature.com
  providerName: Springer Nature
– providerCode: PRVAVX
  databaseName: Springer Nature OA Free Journals
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: C6C
  dateStart: 20000112
  isFulltext: true
  titleUrlDefault: http://www.springeropen.com/
  providerName: Springer Nature
– providerCode: PRVAVX
  databaseName: SpringerLink Journals (ICM)
  customDbUrl:
  eissn: 1471-2164
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017825
  issn: 1471-2164
  databaseCode: U2A
  dateStart: 20001201
  isFulltext: true
  titleUrlDefault: http://www.springerlink.com/journals/
  providerName: Springer Nature
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Prediction+of+DNA-binding+residues+from+protein+sequence+information+using+random+forests&rft.jtitle=BMC+genomics&rft.au=Wang%2C+Liangjiang&rft.au=Yang%2C+Mary+Qu&rft.au=Yang%2C+Jack+Y&rft.date=2009-07-07&rft.issn=1471-2164&rft.eissn=1471-2164&rft.volume=10&rft.issue=Suppl+1&rft.spage=S1&rft.epage=S1&rft_id=info:doi/10.1186%2F1471-2164-10-S1-S1&rft.externalDBID=NO_FULL_TEXT