Investigating the Efficacy of Nonlinear Dimensionality Reduction Schemes in Classifying Gene and Protein Expression Studies

The recent explosion in procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation in the ability to accurately classify th...

Bibliographic Details
Published in	IEEE/ACM transactions on computational biology and bioinformatics Vol. 5; no. 3; pp. 368 - 384
Main Authors	Lee, George; Rodriguez, Carlos; Madabhushi, Anant
Format	Journal Article
Language	English
Published	United States IEEE 01.07.2008 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Availability; Bioinformatics; Bioinformatics (genome or protein) databases; Biomedical measurements; Cancer; Clustering, classification, and association rules; Data and knowledge visualization; Data Interpretation, Statistical; Data mining; Decision trees; Discriminant analysis; Diseases; Explosions; Feature extraction or construction; Gene Expression Profiling - methods; Machine learning; Nonlinear Dynamics; Pattern Recognition, Automated - methods; Principal component analysis; Procurement; Protein engineering; Reproducibility of Results; Sensitivity and Specificity; Software; Studies
ISSN	1545-5963, 1557-9964, 2374-0043
DOI	10.1109/TCBB.2008.36

Abstract	The recent explosion in procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation in the ability to accurately classify these high-dimensional datasets stems from the 'curse of dimensionality', occurring in situations where the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of a dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. The motivation behind this work is to identify the appropriate DR methods for analysis of high-dimensional gene- and protein-expression studies. Towards this end, we empirically and rigorously compare three nonlinear (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
AbstractList	The recent explosion in procurement and availability of high-dimensional gene and protein expression profile data sets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. While some investigators are focused on identifying informative genes and proteins that play a role in specific diseases, other researchers have attempted instead to use patients' expression profiles to prognosticate disease status. A major limitation in the ability to accurately classify these high-dimensional data sets stems from the "curse of dimensionality," occurring in situations where the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of a dimensionality reduction (DR) scheme, principal component analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. While some researchers have begun to explore nonlinear DR methods for computer vision problems such as face detection and recognition, to the best of our knowledge, few such attempts have been made for classification and visualization of high-dimensional biomedical data. The motivation behind this work is to identify the appropriate DR methods for analysis of high-dimensional gene and protein expression studies. Toward this end, we empirically and rigorously compare three nonlinear (Isomap, Locally Linear Embedding, and Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, and Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable. Owing to the inherent nonlinear structure of gene and protein expression studies, our claim is that the nonlinear DR methods provide a more truthful low-dimensional representation of the data compared to the linear DR schemes. Evaluation of the DR schemes was done by 1) assessing the discriminability of two supervised classifiers (Support Vector Machine and C4.5 Decision Trees) in the different low-dimensional data embeddings and 2) using five cluster validity measures to evaluate the size, distance, and tightness of object aggregates in the low-dimensional space. For each of the seven evaluation measures considered, statistically significant improvement in the quality of the embeddings across 10 cancer data sets via the use of three nonlinear DR schemes over three linear DR techniques was observed. Similar trends were observed when linear and nonlinear DR was applied to the high-dimensional data following feature pruning to isolate the most informative features. Qualitative evaluation of the low-dimensional data embedding obtained via the six DR methods further suggests that the nonlinear schemes are better able to identify potential novel classes (e.g., cancer subtypes) within the data.
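
The abstract's point that Euclidean distance can misjudge similarity on nonlinearly structured data can be illustrated with a small toy sketch. This is not the authors' code and uses no data from the study: the spiral, the function names (`spiral_points`, `knn_graph`, `geodesic`), and the neighborhood size k=2 are all assumptions made for illustration. It shows only the general idea that Isomap builds on, namely approximating distances along the data manifold by shortest paths through a nearest-neighbor graph.

```python
import heapq
import math


def spiral_points(n=60, turns=1.5):
    """Sample n points along a planar spiral (hypothetical toy data)."""
    points = []
    for i in range(n):
        t = turns * 2.0 * math.pi * i / (n - 1)
        r = 1.0 + t
        points.append((r * math.cos(t), r * math.sin(t)))
    return points


def euclidean(a, b):
    """Straight-line distance, the similarity notion used by linear DR."""
    return math.dist(a, b)


def knn_graph(points, k=2):
    """Symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic(graph, source, target):
    """Shortest-path length through the graph (Dijkstra's algorithm)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # target not reachable


if __name__ == "__main__":
    pts = spiral_points()
    g = knn_graph(pts, k=2)
    print("Euclidean distance between spiral ends:", euclidean(pts[0], pts[-1]))
    print("Graph (geodesic) distance between ends:", geodesic(g, 0, len(pts) - 1))
```

On this toy curve the two ends look fairly close in a straight line but are far apart when distance is measured along the curve. A full Isomap embedding would go on to apply classical multidimensional scaling to the matrix of such graph distances; that step is omitted here.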
Author	Rodriguez, Carlos; Madabhushi, Anant; Lee, George
AuthorAffiliation	1 Rutgers, The State University of New Jersey, Department of Biomedical Engineering, Piscataway, NJ 08854, USA; 2 University of Puerto Rico, Mayagüez, PR 00681-9000
AuthorAffiliation_xml	– name: 1 Rutgers, The State University of New Jersey, Department of Biomedical Engineering, Piscataway, NJ 08854, USA – name: 2 University of Puerto Rico, Mayagüez, PR 00681-9000
Author_xml	– sequence: 1 givenname: George surname: Lee fullname: Lee, George email: geolee@eden.rutgers.edu organization: Rutgers University, Piscataway – sequence: 2 givenname: Carlos surname: Rodriguez fullname: Rodriguez, Carlos email: carlos@evri.com organization: University of Puerto Rico, Mayagüez – sequence: 3 givenname: Anant surname: Madabhushi fullname: Madabhushi, Anant email: anantm@rci.rutgers.edu organization: Rutgers University, Piscataway
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/18670041$$D View this record in MEDLINE/PubMed
CODEN	ITCBCY
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008
Discipline	Biology
EISSN	1557-9964
EndPage	384
ExternalDocumentID	oai:pubmedcentral.nih.gov:2562675 PMC2562675 2328962871 18670041 10_1109_TCBB_2008_36 4492764
Genre	orig-research; Research Support, Non-U.S. Gov't; Journal Article; Research Support, N.I.H., Extramural
GrantInformation_xml	– fundername: NCI NIH HHS grantid: R21 CA127186 – fundername: NCI NIH HHS grantid: R03 CA128081 – fundername: NCI NIH HHS grantid: R03CA128081-01 – fundername: NCI NIH HHS grantid: R21CA127186-01
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	3
Keywords	Bioinformatics (genome or protein) databases; Clustering, classification, and association rules; Feature extraction or construction; Data and knowledge visualization; Data mining
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
Notes	These datasets were downloaded from the Biomedical Kent-Ridge Repositories at http://sdmc.lit.org.sg/GEDatasets/Datasets and http://sdmc.i2r.a-star.edu.sg/rp, and from the Gene Expression Omnibus (GEO) Repository at http://www.ncbi.nlm.nih.gov/geo/.
OpenAccessLink	https://proxy.k.utb.cz/login?url=http://doi.org/10.1109/TCBB.2008.36
PMID	18670041
PageCount	17
PublicationDate	2008-07-01
PublicationPlace	United States
PublicationPlace_xml	– name: United States – name: New York
PublicationTitle	IEEE/ACM transactions on computational biology and bioinformatics
PublicationTitleAbbrev	TCBB
PublicationTitleAlternate	IEEE/ACM Trans Comput Biol Bioinform
PublicationYear	2008
Publisher	IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml	– name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage	368
URI	https://ieeexplore.ieee.org/document/4492764 https://www.ncbi.nlm.nih.gov/pubmed/18670041 https://www.proquest.com/docview/863293417 https://www.proquest.com/docview/20497689 https://www.proquest.com/docview/69370057 https://www.proquest.com/docview/869584247 https://pubmed.ncbi.nlm.nih.gov/PMC2562675 http://doi.org/10.1109/TCBB.2008.36
UnpaywallVersion	submittedVersion
Volume	5