The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics
| Published in | PloS one Vol. 8; no. 7; p. e67863 |
|---|---|
| Main Authors | Wei, Qiong; Dunbrack, Roland L. |
| Format | Journal Article |
| Language | English |
| Published | United States; Public Library of Science; 09.07.2013; Public Library of Science (PLoS) |
| ISSN | 1932-6203 |
| DOI | 10.1371/journal.pone.0067863 |
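
The abstract of this record defines balanced accuracy as the average of the true positive rate and the true negative rate, and names undersampling of the majority class as one way to balance training data. The Python sketch below is an illustration only, not the authors' code: the function names, the 0/1 label encoding (1 = deleterious, 0 = neutral), and the toy data are assumptions made here.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels (assumed: 1 = deleterious, 0 = neutral)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Average of true positive rate and true negative rate.

    Assumes both classes occur in y_true; otherwise a rate is undefined.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def undersample_majority(samples, labels, seed=0):
    """Randomly drop majority-class items until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data (hypothetical): 10 deleterious and 90 neutral cases.
toy_labels = [1] * 10 + [0] * 90
always_neutral = [0] * 100  # a classifier that only predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(toy_labels, always_neutral)) / len(toy_labels)
print(plain_accuracy)                                  # 0.9
print(balanced_accuracy(toy_labels, always_neutral))   # 0.5
print(matthews_corrcoef(toy_labels, always_neutral))   # 0.0

balanced_x, balanced_y = undersample_majority(list(range(100)), toy_labels)
print(len(balanced_y), sum(balanced_y))                # 20 10
```

On this toy set, a classifier that always predicts the majority class reaches 0.9 plain accuracy but only 0.5 balanced accuracy and an MCC of 0.0, which suggests why the abstract reports these imbalance-aware measures rather than raw accuracy.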
| Abstract | Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand. |
|---|---|
| Audience | Academic |
| Author | Dunbrack, Roland L. Wei, Qiong |
| AuthorAffiliation | Miami University, United States of America Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania, United States of America |
| AuthorAffiliation_xml | – name: Miami University, United States of America – name: Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania, United States of America |
| Author_xml | – sequence: 1 givenname: Qiong surname: Wei fullname: Wei, Qiong – sequence: 2 givenname: Roland L. surname: Dunbrack fullname: Dunbrack, Roland L. |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/23874456$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1016_j_schres_2016_08_027 crossref_primary_10_1016_j_bspc_2019_04_032 crossref_primary_10_1016_j_rama_2024_10_004 crossref_primary_10_1007_s13205_016_0410_1 crossref_primary_10_1002_qute_202400084 crossref_primary_10_21501_21454086_2904 crossref_primary_10_3390_ijgi7020039 crossref_primary_10_1111_ene_16591 crossref_primary_10_1111_jop_13397 crossref_primary_10_1587_transinf_2019ICI0001 crossref_primary_10_1186_s12859_022_04677_z crossref_primary_10_1186_s12888_021_03117_1 crossref_primary_10_1016_j_geoderma_2024_116916 crossref_primary_10_3389_fdgth_2022_878369 crossref_primary_10_1007_s11517_022_02617_w crossref_primary_10_3390_bioengineering11100962 crossref_primary_10_1093_bib_bbx119 crossref_primary_10_1016_j_xcrp_2022_101113 crossref_primary_10_1017_ice_2019_288 crossref_primary_10_1002_humu_23222 crossref_primary_10_1186_s12911_023_02380_4 crossref_primary_10_1002_humu_22770 crossref_primary_10_3390_min12121621 crossref_primary_10_1007_s12530_021_09403_3 crossref_primary_10_1109_ACCESS_2024_3502540 crossref_primary_10_1016_j_fss_2014_01_015 crossref_primary_10_1038_s41598_018_21758_3 crossref_primary_10_1007_s11030_021_10256_w crossref_primary_10_1016_j_eswa_2023_122778 crossref_primary_10_1038_srep12512 crossref_primary_10_1038_s42003_022_03397_7 crossref_primary_10_1007_s10916_015_0428_7 crossref_primary_10_1109_ACCESS_2022_3184113 crossref_primary_10_1038_s41398_023_02422_5 crossref_primary_10_1093_bioinformatics_btab743 crossref_primary_10_1093_jamia_ocab267 crossref_primary_10_1111_aos_13334 crossref_primary_10_1093_biostatistics_kxaa028 crossref_primary_10_1007_s10278_017_0013_3 crossref_primary_10_3390_diagnostics13040584 crossref_primary_10_2196_54335 crossref_primary_10_3390_jcm13102955 crossref_primary_10_1038_s41598_018_26041_z crossref_primary_10_1371_journal_pcbi_1006615 crossref_primary_10_1007_s00259_021_05242_1 crossref_primary_10_1177_0306624X221102799 crossref_primary_10_1016_j_geoderma_2021_115446 
crossref_primary_10_1177_08944393211032950 crossref_primary_10_1177_0306624X211062139 crossref_primary_10_1021_acs_cgd_9b00318 crossref_primary_10_1111_jedm_12416 crossref_primary_10_1038_s41598_020_80570_0 crossref_primary_10_1109_TIFS_2017_2762828 crossref_primary_10_1016_j_str_2014_11_010 crossref_primary_10_1080_15384047_2017_1326439 crossref_primary_10_1186_s13073_021_00828_8 crossref_primary_10_1186_s12911_020_01335_3 crossref_primary_10_1186_s13634_021_00755_1 crossref_primary_10_1007_s10661_017_6333_4 crossref_primary_10_1186_s13040_021_00244_z crossref_primary_10_3390_rs14133157 crossref_primary_10_2196_17853 crossref_primary_10_1029_2021GL093455 crossref_primary_10_1109_JSTARS_2023_3326963 crossref_primary_10_1038_nmeth_3039 crossref_primary_10_3390_curroncol30110707 crossref_primary_10_1016_j_jtbi_2015_11_021 crossref_primary_10_1088_1748_0221_12_10_C10004 crossref_primary_10_1109_ACCESS_2025_3549271 crossref_primary_10_18553_jmcp_2017_23_9_926 crossref_primary_10_3390_make4010012 crossref_primary_10_3390_medicina60040558 crossref_primary_10_3390_rs14246307 crossref_primary_10_1007_s10661_023_11608_9 crossref_primary_10_1002_humu_23802 crossref_primary_10_1148_radiol_211860 crossref_primary_10_52586_5036 crossref_primary_10_2147_COPD_S271237 crossref_primary_10_3390_electronics10030249 crossref_primary_10_1002_prot_24997 crossref_primary_10_1038_s41598_022_16062_0 crossref_primary_10_1007_s12525_017_0254_5 crossref_primary_10_1007_s13534_022_00246_8 crossref_primary_10_1186_s13062_019_0252_y crossref_primary_10_1007_s11227_024_06108_7 crossref_primary_10_1371_journal_pone_0315928 crossref_primary_10_1186_s12859_017_1972_6 crossref_primary_10_1177_10790632231200838 crossref_primary_10_3390_diagnostics14161727 crossref_primary_10_1371_journal_pone_0117380 crossref_primary_10_3390_jpm13121660 crossref_primary_10_1155_2024_6052552 crossref_primary_10_3390_math9162015 crossref_primary_10_1016_j_compbiomed_2015_10_013 crossref_primary_10_3390_math11234735 
crossref_primary_10_3389_fcimb_2023_1223576 crossref_primary_10_1002_prot_25574 crossref_primary_10_1109_ACCESS_2024_3521497 crossref_primary_10_1007_s12539_016_0151_1 crossref_primary_10_1177_03611981231212162 crossref_primary_10_1021_acs_molpharmaceut_0c00326 crossref_primary_10_1007_s00521_020_05570_7 crossref_primary_10_18267_j_aip_163 crossref_primary_10_1002_humu_22987 crossref_primary_10_1093_bioinformatics_btw696 crossref_primary_10_1007_s00227_020_03811_w crossref_primary_10_1021_acs_jcim_3c01410 crossref_primary_10_1016_j_sbi_2015_01_003 crossref_primary_10_3389_fnagi_2023_1090400 crossref_primary_10_1158_0008_5472_CAN_18_2032 crossref_primary_10_1007_s00429_015_1059_y crossref_primary_10_3389_fenvs_2020_579676 crossref_primary_10_1007_s42977_023_00188_x crossref_primary_10_1016_j_compbiolchem_2014_11_002 crossref_primary_10_1007_s00018_019_03097_2 crossref_primary_10_1109_ACCESS_2018_2884249 crossref_primary_10_1038_s41467_021_25496_5 crossref_primary_10_1109_ACCESS_2024_3521319 crossref_primary_10_1155_2017_5151895 crossref_primary_10_1002_humu_23173 crossref_primary_10_1109_JBHI_2022_3147383 crossref_primary_10_1016_j_neucom_2017_07_004 crossref_primary_10_2196_27110 crossref_primary_10_3390_informatics11020034 crossref_primary_10_1016_j_xcrp_2024_102255 crossref_primary_10_1093_bioinformatics_bty262 crossref_primary_10_1109_ACCESS_2024_3487114 crossref_primary_10_1186_s12859_020_03835_5 crossref_primary_10_1038_s41598_020_78681_9 crossref_primary_10_1016_j_compbiomed_2024_108408 crossref_primary_10_1038_s41598_023_28571_7 crossref_primary_10_1117_1_JMI_6_2_026001 crossref_primary_10_1093_bib_bbz161 crossref_primary_10_1155_2020_8824625 crossref_primary_10_1007_s10579_023_09683_y crossref_primary_10_1002_tpg2_20249 crossref_primary_10_1109_ACCESS_2022_3158977 crossref_primary_10_1111_bph_17388 crossref_primary_10_1007_s00894_023_05626_0 crossref_primary_10_1371_journal_pone_0129024 crossref_primary_10_1021_acs_analchem_0c00710 
crossref_primary_10_1080_03091902_2016_1213902 crossref_primary_10_3389_fpsyg_2021_587943 crossref_primary_10_1007_s10639_023_12180_y crossref_primary_10_1111_ecc_13555 crossref_primary_10_1093_bioinformatics_btu297 crossref_primary_10_1098_rsos_240454 crossref_primary_10_1213_ANE_0000000000004651 crossref_primary_10_1038_s41746_023_00937_1 crossref_primary_10_1016_j_compbiolchem_2022_107766 crossref_primary_10_3390_math8050768 crossref_primary_10_1017_wtc_2021_9 crossref_primary_10_1186_s12931_020_1285_6 crossref_primary_10_1186_s41747_020_00203_z crossref_primary_10_1002_prot_24708 crossref_primary_10_1021_acs_molpharmaceut_6b00471 crossref_primary_10_3389_fpls_2024_1417912 crossref_primary_10_1016_j_gene_2018_02_044 crossref_primary_10_1021_acsami_3c00593 crossref_primary_10_1007_s10278_022_00770_0 crossref_primary_10_3390_network2040036 crossref_primary_10_1093_nar_gkw374 crossref_primary_10_1002_sta4_384 crossref_primary_10_3390_app14177459 crossref_primary_10_1007_s11042_024_18775_y crossref_primary_10_1177_09670335241269014 crossref_primary_10_1016_j_future_2021_04_007 crossref_primary_10_3390_ijms19041009 crossref_primary_10_3390_diagnostics12020463 crossref_primary_10_3390_s22218114 crossref_primary_10_1002_humu_23048 crossref_primary_10_1016_j_str_2018_08_011 crossref_primary_10_1186_s12864_017_3914_0 crossref_primary_10_3389_fenvs_2023_1213069 crossref_primary_10_1080_19466315_2014_1002628 crossref_primary_10_1186_1752_0509_9_S5_S1 |
| Cites_doi | 10.1110/ps.04881304 10.1145/1007730.1007737 10.1613/jair.953 10.1093/bioinformatics/btm324 10.1093/nar/gkr363 10.2753/MIS0742-1222290110 10.1093/bioinformatics/bti541 10.1186/1471-2164-11-S2-S5 10.1093/bioinformatics/bti486 10.1093/bioinformatics/17.8.700 10.1186/1471-2105-10-436 10.1186/1471-2105-5-113 10.1002/jcc.21701 10.1016/j.jmb.2005.08.020 10.1002/gepi.20211 10.1111/j.0824-7935.2004.t01-1-00228.x 10.1093/nar/gkm238 10.1007/11538059_91 10.1080/073911012010525022 10.1016/S0968-0004(97)01104-3 10.1186/1471-2105-7-208 10.1021/bi00387a002 10.1093/nar/gki372 10.1093/nar/gkm363 10.1038/nmeth0410-248 10.1016/0005-2795(75)90109-9 10.1093/bioinformatics/bth322 10.1002/humu.22110 10.1016/j.jmb.2004.02.002 10.1109/TKDE.2008.239 10.1016/j.jmb.2005.12.025 10.1002/prot.22722 10.1007/3-540-48229-6_9 10.1002/prot.20252 10.1093/nar/gkh340 10.1093/database/bar009 10.1007/978-1-4757-2440-0 10.1007/978-1-4615-0907-3 10.1007/978-3-642-22589-5_2 10.1101/gad.2017311 10.1093/nar/25.17.3389 10.1186/1471-2105-9-S2-S6 10.1093/bioinformatics/bti365 10.1093/bioinformatics/btr682 10.1006/jmbi.2001.5255 10.1186/1471-2105-7-166 10.1093/bioinformatics/btl423 10.1109/IJCNN.2010.5596486 10.1093/bioinformatics/btl504 10.1080/14786440009463897 10.1148/radiology.143.1.7063747 10.1093/bioinformatics/bth195 10.1093/nar/gki058 10.1093/bioinformatics/bti534 10.1111/j.1742-4658.2005.04945.x 10.1016/j.jmb.2005.01.071 10.1109/TSMC.1976.4309452 10.7551/mitpress/1130.003.0015 10.1186/1471-2105-7-217 10.1002/humu.21047 10.3233/ISB-2010-0426 10.1002/humu.20628 10.1093/bioinformatics/btq028 10.1002/psc.401 10.1093/nar/gkf493 10.1109/TKDE.2002.1000348 10.1093/bioinformatics/btn435 10.1093/nar/gkq1208 |
| ContentType | Journal Article |
| Copyright | COPYRIGHT 2013 Public Library of Science. 2013 Wei, Dunbrack. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| Copyright_xml | – notice: COPYRIGHT 2013 Public Library of Science – notice: 2013 Wei, Dunbrack. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. – notice: 2013 Wei, Dunbrack |
| DBID | AAYXX CITATION CGR CUY CVF ECM EIF NPM IOV ISR 3V. 7QG 7QL 7QO 7RV 7SN 7SS 7T5 7TG 7TM 7U9 7X2 7X7 7XB 88E 8AO 8C1 8FD 8FE 8FG 8FH 8FI 8FJ 8FK ABJCF ABUWG AEUYN AFKRA ARAPS ATCPS AZQEC BBNVY BENPR BGLVJ BHPHI C1K CCPQU D1I DWQXO FR3 FYUFA GHDGH GNUQQ H94 HCIFZ K9. KB. KB0 KL. L6V LK8 M0K M0S M1P M7N M7P M7S NAPCQ P5Z P62 P64 PATMY PDBOC PHGZM PHGZT PIMPY PJZUB PKEHL PPXIY PQEST PQGLB PQQKQ PQUKI PRINS PTHSS PYCSY RC3 7X8 5PM ADTOC UNPAY DOA |
| DOI | 10.1371/journal.pone.0067863 |
| DatabaseName | CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed Gale In Context: Opposing Viewpoints Gale In Context: Science ProQuest Central (Corporate) Animal Behavior Abstracts Bacteriology Abstracts (Microbiology B) Biotechnology Research Abstracts ProQuest Nursing & Allied Health Database Ecology Abstracts Entomology Abstracts (Full archive) Immunology Abstracts Meteorological & Geoastrophysical Abstracts Nucleic Acids Abstracts Virology and AIDS Abstracts Agricultural Science Collection ProQuest Health & Medical Collection ProQuest Central (purchase pre-March 2016) Medical Database (Alumni Edition) ProQuest Pharma Collection ProQuest Public Health Database Technology Research Database ProQuest SciTech Collection ProQuest Technology Collection ProQuest Natural Science Journals Hospital Premium Collection Hospital Premium Collection (Alumni Edition) ProQuest Central (Alumni) (purchase pre-March 2016) Materials Science & Engineering Collection ProQuest Central (Alumni) ProQuest One Sustainability ProQuest Central UK/Ireland ProQuest Advanced Technologies & Aerospace Database ProQuest Agricultural & Environmental Science Collection ProQuest Central Essentials Biological Science Collection ProQuest Central Technology Collection ProQuest Natural Science Collection Environmental Sciences and Pollution Management ProQuest One ProQuest Materials Science Collection ProQuest Central Engineering Research Database Health Research Premium Collection Health Research Premium Collection (Alumni) ProQuest Central Student AIDS and Cancer Research Abstracts SciTech Premium Collection ProQuest Health & Medical Complete (Alumni) Materials Science Database (Proquest) Nursing & Allied Health Database (Alumni Edition) Meteorological & Geoastrophysical Abstracts - Academic ProQuest Engineering Collection Biological Sciences Agricultural Science Database ProQuest Health & Medical Collection Medical Database Algology Mycology and Protozoology Abstracts (Microbiology C) 
ProQuest Biological Science Database Engineering Database (Proquest) Nursing & Allied Health Premium Advanced Technologies & Aerospace Database ProQuest Advanced Technologies & Aerospace Collection Biotechnology and BioEngineering Abstracts Environmental Science Database Materials Science Collection ProQuest Central Premium ProQuest One Academic Publicly Available Content Database ProQuest Health & Medical Research Collection ProQuest One Academic Middle East (New) ProQuest One Health & Nursing ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic ProQuest One Academic UKI Edition ProQuest Central China Engineering collection Environmental Science Collection Genetics Abstracts MEDLINE - Academic PubMed Central (Full Participant titles) Unpaywall for CDI: Periodical Content Unpaywall DOAJ Directory of Open Access Journals |
| DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) Agricultural Science Database Publicly Available Content Database ProQuest Central Student ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials Nucleic Acids Abstracts SciTech Premium Collection ProQuest Central China Environmental Sciences and Pollution Management ProQuest One Applied & Life Sciences ProQuest One Sustainability Health Research Premium Collection Meteorological & Geoastrophysical Abstracts Natural Science Collection Health & Medical Research Collection Biological Science Collection ProQuest Central (New) ProQuest Medical Library (Alumni) Engineering Collection Advanced Technologies & Aerospace Collection Engineering Database Virology and AIDS Abstracts ProQuest Biological Science Collection ProQuest One Academic Eastern Edition Agricultural Science Collection ProQuest Hospital Collection ProQuest Technology Collection Health Research Premium Collection (Alumni) Biological Science Database Ecology Abstracts ProQuest Hospital Collection (Alumni) Biotechnology and BioEngineering Abstracts Environmental Science Collection Entomology Abstracts Nursing & Allied Health Premium ProQuest Health & Medical Complete ProQuest One Academic UKI Edition Environmental Science Database ProQuest Nursing & Allied Health Source (Alumni) Engineering Research Database ProQuest One Academic Meteorological & Geoastrophysical Abstracts - Academic ProQuest One Academic (New) Technology Collection Technology Research Database ProQuest One Academic Middle East (New) Materials Science Collection ProQuest Health & Medical Complete (Alumni) ProQuest Central (Alumni Edition) ProQuest One Community College ProQuest One Health & Nursing ProQuest Natural Science Collection ProQuest Pharma Collection ProQuest Central ProQuest Health & Medical Research Collection Genetics Abstracts ProQuest Engineering Collection Biotechnology Research Abstracts Health and 
Medicine Complete (Alumni Edition) ProQuest Central Korea Bacteriology Abstracts (Microbiology B) Algology Mycology and Protozoology Abstracts (Microbiology C) Agricultural & Environmental Science Collection AIDS and Cancer Research Abstracts Materials Science Database ProQuest Materials Science Collection ProQuest Public Health ProQuest Nursing & Allied Health Source ProQuest SciTech Collection Advanced Technologies & Aerospace Database ProQuest Medical Library Animal Behavior Abstracts Materials Science & Engineering Collection Immunology Abstracts ProQuest Central (Alumni) MEDLINE - Academic |
| DatabaseTitleList | Agricultural Science Database MEDLINE MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 2 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 3 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database – sequence: 4 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository – sequence: 5 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Sciences (General) Biology Computer Science |
| DocumentTitleAlternate | Balanced Data Sets in Bioinformatics Classifiers |
| EISSN | 1932-6203 |
| ExternalDocumentID | 1974583479 oai_doaj_org_article_80946bd52e6c49b4a9a0e590b0b04382 10.1371/journal.pone.0067863 PMC3706434 A478358003 23874456 10_1371_journal_pone_0067863 |
| Genre | Research Support, Non-U.S. Gov't Journal Article Research Support, N.I.H., Extramural |
| GeographicLocations | United States--US Pennsylvania |
| GeographicLocations_xml | – name: Pennsylvania – name: United States--US |
| GrantInformation_xml | – fundername: NIGMS NIH HHS grantid: R01 GM073784 – fundername: NIGMS NIH HHS grantid: R01 GM084453 – fundername: NIGMS NIH HHS grantid: GM73784 – fundername: NIGMS NIH HHS grantid: GM84453 |
| GroupedDBID | --- 123 29O 2WC 53G 5VS 7RV 7X2 7X7 7XC 88E 8AO 8C1 8CJ 8FE 8FG 8FH 8FI 8FJ A8Z AAFWJ AAUCC AAWOE AAYXX ABDBF ABIVO ABJCF ABUWG ACGFO ACIHN ACIWK ACPRK ACUHS ADBBV ADRAZ AEAQA AENEX AEUYN AFKRA AFPKN AFRAH AHMBA ALMA_UNASSIGNED_HOLDINGS AOIJS APEBS ARAPS ATCPS BAWUL BBNVY BCNDV BENPR BGLVJ BHPHI BKEYQ BPHCQ BVXVI BWKFM CCPQU CITATION CS3 D1I D1J D1K DIK DU5 E3Z EAP EAS EBD EMOBN ESTFP ESX EX3 F5P FPL FYUFA GROUPED_DOAJ GX1 HCIFZ HH5 HMCUK HYE IAO IEA IGS IHR IHW INH INR IOV IPNFZ IPY ISE ISR ITC K6- KB. KQ8 L6V LK5 LK8 M0K M1P M48 M7P M7R M7S M~E NAPCQ O5R O5S OK1 OVT P2P P62 PATMY PDBOC PHGZM PHGZT PIMPY PJZUB PPXIY PQGLB PQQKQ PROAC PSQYO PTHSS PUEGO PYCSY RIG RNS RPM SV3 TR2 UKHRP WOQ WOW ~02 ~KM 3V. ALIPV BBORY CGR CUY CVF ECM EIF NPM PV9 RZL 7QG 7QL 7QO 7SN 7SS 7T5 7TG 7TM 7U9 7XB 8FD 8FK AZQEC C1K DWQXO FR3 GNUQQ H94 K9. KL. M7N P64 PKEHL PQEST PQUKI PRINS RC3 7X8 5PM ACCTH ADTOC AFFHD BBTPI UNPAY - 02 AAPBV ABPTK ADACO BBAFP KM |
| ID | FETCH-LOGICAL-c758t-fbcda7da8e78d7575495e5bd1b491be751b59e1287b28f5b2275df771ed356e33 |
| IEDL.DBID | M48 |
| ISSN | 1932-6203 |
| IngestDate | Fri Nov 26 17:12:39 EST 2021 Tue Oct 14 14:58:48 EDT 2025 Wed Oct 29 12:00:02 EDT 2025 Tue Sep 30 16:53:29 EDT 2025 Mon Sep 08 05:32:41 EDT 2025 Tue Oct 07 08:07:12 EDT 2025 Mon Oct 20 22:07:57 EDT 2025 Mon Oct 20 16:19:35 EDT 2025 Thu Oct 16 15:12:14 EDT 2025 Thu Oct 16 15:00:35 EDT 2025 Thu May 22 21:19:31 EDT 2025 Wed Feb 19 02:05:08 EST 2025 Wed Oct 01 05:10:02 EDT 2025 Thu Apr 24 22:58:12 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| Language | English |
| License | This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. cc-by Creative Commons Attribution License |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c758t-fbcda7da8e78d7575495e5bd1b491be751b59e1287b28f5b2275df771ed356e33 |
| Notes | Competing Interests: I have read the journal’s policy and have the following conflicts. I (Roland Dunbrack) have previously served as a guest editor for PLOS ONE. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials. Conceived and designed the experiments: QW RLD. Performed the experiments: QW. Analyzed the data: QW RLD. Contributed reagents/materials/analysis tools: QW. Wrote the paper: QW RLD. |
| OpenAccessLink | http://journals.scholarsportal.info/openUrl.xqy?doi=10.1371/journal.pone.0067863 |
| PMID | 23874456 |
| PQID | 1974583479 |
| PQPubID | 1436336 |
| PageCount | e67863 |
| ParticipantIDs | plos_journals_1974583479 doaj_primary_oai_doaj_org_article_80946bd52e6c49b4a9a0e590b0b04382 unpaywall_primary_10_1371_journal_pone_0067863 pubmedcentral_primary_oai_pubmedcentral_nih_gov_3706434 proquest_miscellaneous_1411633354 proquest_journals_1974583479 gale_infotracmisc_A478358003 gale_infotracacademiconefile_A478358003 gale_incontextgauss_ISR_A478358003 gale_incontextgauss_IOV_A478358003 gale_healthsolutions_A478358003 pubmed_primary_23874456 crossref_primary_10_1371_journal_pone_0067863 crossref_citationtrail_10_1371_journal_pone_0067863 |
| ProviderPackageCode | CITATION AAYXX |
| PublicationCentury | 2000 |
| PublicationDate | 2013-07-09 |
| PublicationDateYYYYMMDD | 2013-07-09 |
| PublicationDate_xml | – month: 07 year: 2013 text: 2013-07-09 day: 09 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States – name: San Francisco – name: San Francisco, USA |
| PublicationTitle | PloS one |
| PublicationTitleAlternate | PLoS One |
| PublicationYear | 2013 |
| Publisher | Public Library of Science Public Library of Science (PLoS) |
| Publisher_xml | – name: Public Library of Science – name: Public Library of Science (PLoS) |
| References | L Chin (ref26) 2011; 25 ref59 JJ Ward (ref15) 2004; 20 T Zhang (ref16) 2012; 29 SF Altschul (ref53) 1997; 25 BW Matthews (ref63) 1975; 405 C Ferrer-Costa (ref44) 2004; 57 SF Altschul (ref54) 2005; 272 SO Garbuzynskiy (ref11) 2004; 13 DL Masica (ref24) 2012; 33 R Calabrese (ref23) 2009; 30 RC Edgar (ref57) 2004; 32 A Estabrooks (ref29) 2004; 20 X Deng (ref76) 2009; 10 M Magrane (ref49) 2011; 2011 I Walsh (ref65) 2012; 28 L Bao (ref4) 2005; 21 S Teng (ref46) 2010; 11 NV Chawla (ref30) 2002; 16 K Pearson (ref61) 1900; 50 M Asgary (ref19) 2007; 23 P Yue (ref42) 2005; 353 G Wainreb (ref7) 2010; 38 KM Ting (ref37) 2002; 14 C Ferrer-Costa (ref2) 2005; 21 A Mottaz (ref48) 2010; 26 H He (ref39) 2009; 21 RJ Dobson (ref47) 2006; 7 G Cui (ref78) 2012; 29 P Yue (ref1) 2006; 7 IA Adzhubei (ref73) 2010; 7 Q Wei (ref25) 2010; 78 K Peng (ref66) 2006; 7 C-C Chang (ref71) 2011; 21–27 JJ Ward (ref75) 2004; 337 (ref74); 79 T Alber (ref52) 1987; 26 ref79 Z Dosztanyi (ref10) 2005; 21 MA Care (ref50) 2007; 23 ref36 YD Cai (ref20) 2002; 8 ref31 J Pei (ref58) 2001; 17 ref33 ref77 ref32 T Jo (ref35) 2004; 6 ref38 DR Velez (ref62) 2007; 31 P Yue (ref67) 2006; 356 V Ramensky (ref8) 2002; 30 Z Dosztanyi (ref9) 2005; 347 L Bao (ref3) 2005; 33 E Capriotti (ref41) 2008; 9 S Velankar (ref55) 2005; 33 ZR Yang (ref13) 2005; 21 I Tomek (ref34) 1976; 6 HC Pace (ref51) 1997; 22 CL Worth (ref22) 2011; 39 C Ferrer-Costa (ref43) 2002; 315 RC Edgar (ref56) 2004; 5 ref70 ref72 T Ishida (ref14) 2007; 35 OV Galzitskaya (ref12) 2006; 22 ref68 ref69 ref64 Y Bromberg (ref6) 2008; 24 E Capriotti (ref45) 2008; 29 ref28 ref27 E Capriotti (ref40) 2006; 22 EL Sonnhammer (ref21) 1998; 6 H Kaur (ref18) 2004; 20 Y Bromberg (ref5) 2007; 35 ref60 S Hirose (ref17) 2010; 10 |
| StartPage | e67863 |
| SubjectTerms | Accuracy; Algorithms; Analysis; Animals; Artificial Intelligence; Bioinformatics; Biology; Cancer; Classifiers; Computational Biology - methods; Computer Science; Correlation coefficient; Correlation coefficients; Data points; Databases, Genetic; Datasets; Genetic Association Studies; Genomes; Genomics; Genotype & phenotype; Humans; Learning algorithms; Machine learning; Mathematical models; Medical research; Missense mutation; Models, Biological; Mutation; Mutation, Missense; Oversampling; Phenotype; Polymorphism, Genetic; Proteins; Reproducibility of Results; Social and Behavioral Sciences; Teaching methods; Training |
| Title | The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/23874456 https://www.proquest.com/docview/1974583479 https://www.proquest.com/docview/1411633354 https://pubmed.ncbi.nlm.nih.gov/PMC3706434 https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0067863&type=printable https://doaj.org/article/80946bd52e6c49b4a9a0e590b0b04382 http://dx.doi.org/10.1371/journal.pone.0067863 |
| Volume | 8 |