MMCo-Clus - An Evolutionary Co-clustering Algorithm for Gene Selection
In the era of Big Data, cluster analysis of high-dimensional data sets often suffers from the curse of dimensionality. To overcome this problem, the dimensionality reduction through feature selection becomes inevitable. Co-clustering or two-way clustering is considered to be a more sophisticated to...
| Published in | IEEE Transactions on Knowledge and Data Engineering, Vol. 34, no. 9, pp. 4371-4384 |
|---|---|
| Main Authors | Cui, Laizhong; Acharya, Sudipta; Mishra, Sumit; Pan, Yi; Huang, Joshua Zhexue |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2022-09-01 |
| ISSN | 1041-4347 (print), 1558-2191 (electronic) |
| DOI | 10.1109/TKDE.2020.3035695 |
| Abstract | In the era of Big Data, cluster analysis of high-dimensional data sets often suffers from the curse of dimensionality. To overcome this problem, dimensionality reduction through feature selection becomes inevitable. Co-clustering, or two-way clustering, is considered a more sophisticated tool than conventional one-way clustering. Moreover, the advent of multi-view learning shows that the subjects of a data set can be interpreted in many ways. Interestingly, only a small number of existing feature selection algorithms take advantage of the co-clustering method and are designed to consider multi-view data. Motivated by this, in the current article, we propose a feature (gene) selection method for high-dimensional gene expression (GE) data through a Multi-objective optimization based Multi-view Co-Clustering algorithm (named MMCo-Clus). A popular evolutionary technique, the Non-dominated Sorting Genetic Algorithm-II (NSGA-II), is used as the proposed method's underlying optimization strategy. First, we construct two views of a chosen data set, utilizing knowledge from two different biological data sources. Next, we develop the MMCo-Clus algorithm considering the constructed views to identify a set of "good" co-clustering solutions. Finally, based on a consensus operation on the co-clustering outcome, a small number of the most relevant and non-redundant features are extracted from the original feature space. The reduced feature space decreases the computational burden and noise level of the original data. For experimental analysis, we have chosen three benchmark GE data sets. Our feature selection method's effectiveness is evaluated through sample-classification accuracy, accompanied by the cluster profile plot, Eisen plot, and t-SNE plot, and by biological and statistical significance tests. A thorough comparative analysis with existing feature selection algorithms using external and internal evaluation metrics supports our proposed method's potency. |
|---|---|
| Author | 1. Cui, Laizhong (ORCID 0000-0003-1991-290X; cuilz@szu.edu.cn), College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, P.R. China; 2. Acharya, Sudipta (ORCID 0000-0002-3014-0907; sudiptaszu@outlook.com), College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, P.R. China; 3. Mishra, Sumit (sumit@iiitg.ac.in), Department of Computer Science and Engineering, IIIT Guwahati, Guwahati, Assam, India; 4. Pan, Yi (ORCID 0000-0002-2766-3096; yipan@gsu.edu), Department of Computer Science, Georgia State University, Atlanta, GA, USA; 5. Huang, Joshua Zhexue (ORCID 0000-0002-6797-2571; zx.huang@szu.edu.cn), College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, P.R. China |
| CODEN | ITKEEH |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
| Discipline | Engineering Biology Computer Science |
| EISSN | 1558-2191 |
| EndPage | 4384 |
| Genre | orig-research |
| Grant Information | Shenzhen University (funder ID 10.13039/501100009019); National Key Research and Development Plan of China (grant 2018YFB1800302); Science and Technology Plan of Shenzhen (grant JCYJ20190808142207420); National Natural Science Foundation of China (grant 61772345; funder ID 10.13039/501100001809) |
| ISSN | 1041-4347 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 9 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0003-1991-290X 0000-0002-2766-3096 0000-0002-3014-0907 0000-0002-6797-2571 |
| PageCount | 14 |
| PublicationDate | 2022-09-01 |
| PublicationPlace | New York |
| PublicationTitle | IEEE transactions on knowledge and data engineering |
| PublicationTitleAbbrev | TKDE |
| PublicationYear | 2022 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 4371 |
| SubjectTerms | Algorithms; Big Data; Biology; Cluster analysis; Clustering; Clustering algorithms; co-clustering; Data analysis; Datasets; Dimensional analysis; Evaluation; Evolutionary algorithms; Feature extraction; Feature selection; Gene expression; Genetic algorithms; Germanium; multi-objective optimization; multi-view learning; Noise levels; Optimization; sample classification; Sorting algorithms |
| Title | MMCo-Clus - An Evolutionary Co-clustering Algorithm for Gene Selection |
| URI | https://ieeexplore.ieee.org/document/9250577 https://www.proquest.com/docview/2698816278 |
| Volume | 34 |
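
The abstract describes a pipeline in which NSGA-II produces a set of "good" (non-dominated) co-clustering solutions and a consensus operation over those solutions selects the final genes. The following Python sketch illustrates only those two ideas and is not the authors' implementation. The abstract does not state the objective functions or the consensus rule, so both are stand-ins: objectives are assumed to be minimized, and consensus is assumed to be a simple frequency vote. The names `dominates`, `pareto_front`, `consensus_features`, and `min_support` are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(objectives: List[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated objective vectors."""
    return [
        i
        for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]


def consensus_features(gene_sets: List[Set[str]], min_support: float = 0.5) -> List[str]:
    """Assumed consensus rule: keep genes that appear in at least a
    min_support fraction of the supplied solutions."""
    counts: Dict[str, int] = {}
    for genes in gene_sets:
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    threshold = min_support * len(gene_sets)
    return sorted(g for g, c in counts.items() if c >= threshold)


# Toy data: one objective vector and one selected gene set per candidate solution.
objectives = [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5)]
gene_sets = [{"g1", "g2", "g3"}, {"g2", "g3"}, {"g3", "g4"}, {"g9"}]

front = pareto_front(objectives)
selected = consensus_features([gene_sets[i] for i in front], min_support=0.5)
print(front)     # [0, 1, 2]
print(selected)  # ['g2', 'g3']
```

In the toy data, the fourth solution is dominated by the second and is discarded before the vote, so gene `g9` never reaches the consensus step. A full NSGA-II run would add ranking into successive fronts, crowding-distance selection, and genetic operators, none of which are shown here.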