A Novel Feature Selection Method on Mutual Information and Improved Gravitational Search Algorithm for High Dimensional Biomedical Data

In the past few decades, the field of bioinformatics has accumulated a large amount of gene expression data which provided important support for the diagnosis of disease. However, high dimensionality, small sample sizes, and redundant features often adversely affect the accuracy and the speed of pre...

Bibliographic Details
Published in 2021 13th International Conference on Computer and Automation Engineering (ICCAE) pp. 24 - 30
Main Authors Yan, Chaokun, Kang, Xi, Li, Mengyuan, Wang, Jianlin
Format Conference Proceeding
Language English
Published IEEE 20.03.2021
Subjects
DOI 10.1109/ICCAE51876.2021.9426130

Abstract In the past few decades, the field of bioinformatics has accumulated a large amount of gene expression data, which provides important support for the diagnosis of disease. However, high dimensionality, small sample sizes, and redundant features often adversely affect the accuracy and speed of prediction. Existing feature selection models cannot obtain the information of these datasets accurately. Filter and wrapper are two commonly used feature selection methods. Combining the fast calculation speed of the filter with the high accuracy of the wrapper, a new hybrid algorithm called MIIBGSA is proposed, which hybridizes mutual information and an improved Gravitational Search Algorithm (GSA). First, mutual information is used to rank and select important features; these features are then passed into the population of the wrapper method. Then, due to the effectiveness of GSA, it is adopted to further seek an optimal feature subset. However, GSA also has the disadvantages of slow search speed and premature convergence, which limit its optimization ability. In our work, a scale function is added to the speed update to enhance its search ability, an adaptive kbest particle update formula is proposed to improve its convergence accuracy, and a fitness sharing strategy is proposed to enhance the randomness of the particle population and its search ability through the niche algorithm of fitness sharing. We used the 10-fold cross-validation (CV) method with the KNN classifier to evaluate classification accuracy. Experimental results on five publicly available high-dimensional biomedical data sets show that the proposed MIIBGSA performs better than the other algorithms.
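Illustrative sketch (not taken from the paper): the abstract outlines a filter stage that ranks features by mutual information and a wrapper stage that scores candidate feature subsets by cross-validated KNN accuracy. The Python below is a minimal, standard-library-only sketch of those two building blocks under stated assumptions. It assumes features are already discretized, uses k = 1 for KNN and squared Euclidean distance, and every function name (mutual_information, rank_features, knn_predict, cv_accuracy) is hypothetical. The improved GSA search itself (scale function, adaptive kbest update, fitness sharing) is not reproduced here, because the abstract does not give its formulas.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(X, y, top_k):
    """Filter stage: indices of the top_k features by MI with the labels."""
    n_features = len(X[0])
    scores = [
        mutual_information([row[j] for row in X], y) for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, subset, k=1, folds=10, seed=0):
    """Wrapper fitness: k-fold cross-validated KNN accuracy on a feature subset."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    Xs = [[row[j] for j in subset] for row in X]  # keep selected columns only
    correct = 0
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_X = [Xs[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            knn_predict(train_X, train_y, Xs[i], k) == y[i] for i in test
        )
    return correct / len(X)


# Toy data (made up for illustration): feature 0 copies the label,
# feature 1 is constant, feature 2 is a weakly related pattern.
toy_y = [i % 2 for i in range(20)]
toy_X = [[toy_y[i], 0, i % 3] for i in range(20)]
selected = rank_features(toy_X, toy_y, top_k=1)
fitness = cv_accuracy(toy_X, toy_y, selected)
```

In a hybrid scheme of the kind the abstract describes, a function like cv_accuracy would serve as the fitness that the wrapper search evaluates for each candidate subset drawn from the features kept by the filter stage. How the paper combines accuracy with subset size, if at all, is not stated in the abstract.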
Author Yan, Chaokun
Li, Mengyuan
Wang, Jianlin
Kang, Xi
Author_xml – sequence: 1
  givenname: Chaokun
  surname: Yan
  fullname: Yan, Chaokun
  email: ckyan@henu.edu.cn
  organization: Henan University,School of Computer and Information Engineering,Kaifeng,China
– sequence: 2
  givenname: Xi
  surname: Kang
  fullname: Kang, Xi
  email: kx159951@163.com
  organization: Henan University,School of Computer and Information Engineering,Kaifeng,China
– sequence: 3
  givenname: Mengyuan
  surname: Li
  fullname: Li, Mengyuan
  email: myli@henu.edu.cn
  organization: Henan University,School of Computer and Information Engineering,Kaifeng,China
– sequence: 4
  givenname: Jianlin
  surname: Wang
  fullname: Wang, Jianlin
  email: jlwang@henu.edu.cn
  organization: Henan University,School of Computer and Information Engineering,Kaifeng,China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ICCAE51876.2021.9426130
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Statistics
EISBN 166541295X
9781665412957
EndPage 30
ExternalDocumentID 9426130
Genre orig-research
GrantInformation_xml – fundername: Technology Development
  funderid: 10.13039/100006180
– fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i203t-c966b3433df33202a143b71ff09b86ad1586a3a353e306ee14f9d6cbb207d5a93
IEDL.DBID RIE
IngestDate Thu Jun 29 18:37:52 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i203t-c966b3433df33202a143b71ff09b86ad1586a3a353e306ee14f9d6cbb207d5a93
PageCount 7
ParticipantIDs ieee_primary_9426130
PublicationCentury 2000
PublicationDate 2021-March-20
PublicationDateYYYYMMDD 2021-03-20
PublicationDate_xml – month: 03
  year: 2021
  text: 2021-March-20
  day: 20
PublicationDecade 2020
PublicationTitle 2021 13th International Conference on Computer and Automation Engineering (ICCAE)
PublicationTitleAbbrev ICCAE
PublicationYear 2021
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.7836078
Snippet In the past few decades, the field of bioinformatics has accumulated a large amount of gene expression data which provided important support for the diagnosis...
SourceID ieee
SourceType Publisher
StartPage 24
SubjectTerms adaptive kbest particle update
Classification algorithms
Feature extraction
Feature selection
Filtering algorithms
fitness sharing strategy
Gravitational Search Algorithm
Heuristic algorithms
Information filters
Mutual Information
Niche algorithm
scale function
Sociology
Statistics
Title A Novel Feature Selection Method on Mutual Information and Improved Gravitational Search Algorithm for High Dimensional Biomedical Data
URI https://ieeexplore.ieee.org/document/9426130
hasFullText 1
inHoldings 1
isFullTextHit
isPrint