The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics

Bibliographic Details
Published in PloS one Vol. 8; no. 7; p. e67863
Main Authors Wei, Qiong; Dunbrack, Roland L.
Format Journal Article
Language English
Published United States: Public Library of Science (PLoS), 09.07.2013
ISSN 1932-6203
DOI 10.1371/journal.pone.0067863

Abstract Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand.
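The evaluation measures and the undersampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the label coding (1 = deleterious, 0 = neutral), and the fixed random seed are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn); labels are assumed coded 1 = deleterious, 0 = neutral."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Average of True Positive Rate and True Negative Rate.

    Both classes must be present in the test labels, otherwise a rate is undefined.
    """
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def undersample_majority(samples, labels, seed=0):
    """Randomly drop majority-class items until both classes have equal size."""
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

In use, undersample_majority would be applied to the training data only, while the test data keep whatever class proportions they have; balanced accuracy and the Matthews correlation coefficient are then computed from the test-set confusion counts.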
Audience Academic
Author Dunbrack, Roland L.
Wei, Qiong
AuthorAffiliation Miami University, United States of America
Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania, United States of America
BackLink https://www.ncbi.nlm.nih.gov/pubmed/23874456 (MEDLINE/PubMed record)
ContentType Journal Article
Copyright COPYRIGHT 2013 Public Library of Science
2013 Wei, Dunbrack. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Sciences (General)
Biology
Computer Science
DocumentTitleAlternate Balanced Data Sets in Bioinformatics Classifiers
EISSN 1932-6203
ExternalDocumentID PMC3706434
Genre Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GeographicLocations United States--US
Pennsylvania
GrantInformation NIGMS NIH HHS, grant R01 GM073784
NIGMS NIH HHS, grant R01 GM084453
NIGMS NIH HHS, grant GM73784
NIGMS NIH HHS, grant GM84453
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
cc-by
Creative Commons Attribution License
Notes Competing Interests: I have read the journal’s policy and have the following conflicts. I (Roland Dunbrack) have previously served as a guest editor for PLOS ONE. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials.
Conceived and designed the experiments: QW RLD. Performed the experiments: QW. Analyzed the data: QW RLD. Contributed reagents/materials/analysis tools: QW. Wrote the paper: QW RLD.
OpenAccessLink http://journals.scholarsportal.info/openUrl.xqy?doi=10.1371/journal.pone.0067863
PMID 23874456
PublicationDate 2013-07-09
PublicationPlace United States (San Francisco, USA)
PublicationTitle PloS one
PublicationTitleAlternate PLoS One
PublicationYear 2013
Publisher Public Library of Science (PLoS)
SourceID plos
doaj
unpaywall
pubmedcentral
proquest
gale
pubmed
crossref
SourceType Open Website
Open Access Repository
Aggregation Database
Index Database
Enrichment Source
StartPage e67863
SubjectTerms Accuracy
Algorithms
Analysis
Animals
Artificial Intelligence
Bioinformatics
Biology
Cancer
Classifiers
Computational Biology - methods
Computer Science
Correlation coefficient
Correlation coefficients
Data points
Databases, Genetic
Datasets
Genetic Association Studies
Genomes
Genomics
Genotype & phenotype
Humans
Learning algorithms
Machine learning
Mathematical models
Medical research
Missense mutation
Models, Biological
Mutation
Mutation, Missense
Oversampling
Phenotype
Polymorphism, Genetic
Proteins
Reproducibility of Results
Social and Behavioral Sciences
Teaching methods
Training
Title The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics
URI https://www.ncbi.nlm.nih.gov/pubmed/23874456
https://www.proquest.com/docview/1974583479
https://www.proquest.com/docview/1411633354
https://pubmed.ncbi.nlm.nih.gov/PMC3706434
https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0067863&type=printable
https://doaj.org/article/80946bd52e6c49b4a9a0e590b0b04382
http://dx.doi.org/10.1371/journal.pone.0067863
UnpaywallVersion publishedVersion
Volume 8