MMCo-Clus - An Evolutionary Co-clustering Algorithm for Gene Selection

Bibliographic Details
Published in IEEE Transactions on Knowledge and Data Engineering, Vol. 34, no. 9, pp. 4371-4384
Main Authors Cui, Laizhong; Acharya, Sudipta; Mishra, Sumit; Pan, Yi; Huang, Joshua Zhexue
Format Journal Article
Language English
Published New York: IEEE, 1 September 2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 1041-4347 (print)
EISSN 1558-2191 (online)
DOI 10.1109/TKDE.2020.3035695

Abstract In the era of Big Data, cluster analysis of high-dimensional data sets often suffers from the curse of dimensionality. To overcome this problem, dimensionality reduction through feature selection becomes essential. Co-clustering, or two-way clustering, is considered a more sophisticated tool than conventional one-way clustering. Moreover, the advent of multi-view learning shows that the subjects of a data set can be interpreted in many ways. However, very few existing feature selection algorithms both exploit co-clustering and are designed to handle multi-view data. Motivated by this, we propose a feature (gene) selection method for high-dimensional gene expression (GE) data built on a multi-objective optimization-based multi-view co-clustering algorithm, named MMCo-Clus. A popular evolutionary technique, the Non-dominated Sorting Genetic Algorithm II (NSGA-II), serves as the method's underlying optimization strategy. First, we construct two views of a chosen data set using knowledge from two different biological data sources. Next, we develop the MMCo-Clus algorithm, which uses the constructed views to identify a set of "good" co-clustering solutions. Finally, a consensus operation over the co-clustering outcomes extracts a small number of the most relevant and non-redundant features from the original feature space. The reduced feature space lowers both the computational burden and the noise level of the original data. For experimental analysis, we have chosen three benchmark GE data sets. The effectiveness of our feature selection method is evaluated through sample-classification accuracy, supported by cluster profile plots, Eisen plots, and t-SNE plots, as well as biological and statistical significance tests.
A thorough comparative analysis against existing feature selection algorithms, using external and internal evaluation metrics, supports the effectiveness of the proposed method.
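The abstract names NSGA-II as the optimizer behind MMCo-Clus. The sketch below is a generic illustration of the standard NSGA-II ranking procedure. It is not the authors' MMCo-Clus implementation. It shows fast non-dominated sorting into Pareto fronts, crowding distance within a front, and the survivor-selection step that combines the two.

It makes the following assumptions:
* Every objective is minimized.
* Each candidate is represented only by its objective vector.
* The function names (`fast_non_dominated_sort`, `crowding_distance`, `nsga2_select`) are hypothetical.

This record does not specify how MMCo-Clus encodes co-clusterings, which objectives it optimizes over the two views, or how its consensus step works, so none of that is modelled here.

```python
# Generic NSGA-II ranking sketch (hypothetical names); NOT the MMCo-Clus code.
# Assumption: every objective is minimized.
from typing import Dict, List, Sequence

Objectives = Sequence[float]  # one objective vector per candidate solution


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` Pareto-dominates `b` (no worse everywhere, better somewhere)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def fast_non_dominated_sort(objs: Sequence[Objectives]) -> List[List[int]]:
    """Partition solution indices into Pareto fronts F1, F2, ... (best first)."""
    n = len(objs)
    dominated: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    dom_count = [0] * n  # number of solutions that dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)  # nobody dominates p: it belongs to the first front
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated[p]:
                dom_count[q] -= 1  # discount p; q joins the next front once count hits 0
                if dom_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # the loop always ends by appending one empty front


def crowding_distance(objs: Sequence[Objectives], front: List[int]) -> Dict[int, float]:
    """Crowding distance of each solution within one front; boundary points get inf."""
    dist = {i: 0.0 for i in front}
    if not front:
        return dist
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal on this objective: no spread to measure
        for j in range(1, len(ordered) - 1):
            gap = objs[ordered[j + 1]][k] - objs[ordered[j - 1]][k]
            dist[ordered[j]] += gap / (hi - lo)  # normalized neighbour gap
    return dist


def nsga2_select(objs: Sequence[Objectives], k: int) -> List[int]:
    """Keep k survivors: better fronts first, ties broken by larger crowding distance."""
    chosen: List[int] = []
    for front in fast_non_dominated_sort(objs):
        if len(chosen) + len(front) <= k:
            chosen.extend(front)  # the whole front fits
        else:
            dist = crowding_distance(objs, front)
            by_spread = sorted(front, key=lambda i: -dist[i])
            chosen.extend(by_spread[: k - len(chosen)])  # fill remaining slots
            break
    return chosen
```

Take the five two-objective vectors (1, 5), (2, 3), (4, 1), (3, 4) and (5, 5):
* The sort yields the fronts `[[0, 1, 2], [3], [4]]`.
* Asking `nsga2_select` for two survivors returns `[0, 2]`. These are the two boundary points of the first front, whose crowding distance is infinite.

Favouring boundary and sparsely populated points is how NSGA-II keeps a spread of trade-off solutions. That spread is consistent with the abstract's goal of identifying a *set* of good co-clustering solutions rather than a single one.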
Author_xml – sequence: 1
  givenname: Laizhong
  orcidid: 0000-0003-1991-290X
  surname: Cui
  fullname: Cui, Laizhong
  email: cuilz@szu.edu.cn
  organization: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, P.R. China
– sequence: 2
  givenname: Sudipta
  orcidid: 0000-0002-3014-0907
  surname: Acharya
  fullname: Acharya, Sudipta
  email: sudiptaszu@outlook.com
  organization: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, P.R. China
– sequence: 3
  givenname: Sumit
  surname: Mishra
  fullname: Mishra, Sumit
  email: sumit@iiitg.ac.in
  organization: Department of Computer Science and Engineering, IIIT Guwahati, Guwahati, Assam, India
– sequence: 4
  givenname: Yi
  orcidid: 0000-0002-2766-3096
  surname: Pan
  fullname: Pan, Yi
  email: yipan@gsu.edu
  organization: Department of Computer Science, Georgia State University, Atlanta, GA, USA
– sequence: 5
  givenname: Joshua Zhexue
  orcidid: 0000-0002-6797-2571
  surname: Huang
  fullname: Huang, Joshua Zhexue
  email: zx.huang@szu.edu.cn
  organization: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, P.R. China
CODEN ITKEEH
CitedBy_id crossref_primary_10_1016_j_artmed_2021_102228
crossref_primary_10_1109_TKDE_2022_3198800
crossref_primary_10_1145_3681793
crossref_primary_10_1109_THMS_2024_3483848
crossref_primary_10_3389_fcomp_2024_1441879
crossref_primary_10_1016_j_asoc_2024_112332
crossref_primary_10_3390_app122211795
crossref_primary_10_1109_TCYB_2024_3451292
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Discipline Engineering
Biology
Computer Science
EISSN 1558-2191
EndPage 4384
Genre orig-research
GrantInformation_xml – fundername: Shenzhen University
  funderid: 10.13039/501100009019
– fundername: National Key Research and Development Plan of China
  grantid: 2018YFB1800302
– fundername: Science and Technology Plan of Shenzhen
  grantid: JCYJ20190808142207420
– fundername: National Natural Science Foundation of China
  grantid: 61772345
  funderid: 10.13039/501100001809
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0003-1991-290X
0000-0002-2766-3096
0000-0002-3014-0907
0000-0002-6797-2571
PageCount 14
PublicationCentury 2000
PublicationDate 2022-09-01
PublicationDecade 2020
PublicationPlace New York
PublicationTitle IEEE transactions on knowledge and data engineering
PublicationTitleAbbrev TKDE
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 4371
SubjectTerms Algorithms
Big Data
Biology
Cluster analysis
Clustering
Clustering algorithms
co-clustering
Data analysis
Datasets
Dimensional analysis
Evaluation
Evolutionary algorithms
Feature extraction
Feature selection
Gene expression
Genetic algorithms
multi-objective optimization
multi-view learning
Noise levels
Optimization
sample classification
Sorting algorithms
Title MMCo-Clus - An Evolutionary Co-clustering Algorithm for Gene Selection
URI https://ieeexplore.ieee.org/document/9250577
https://www.proquest.com/docview/2698816278
Volume 34