A Novel Feature Selection Method on Mutual Information and Improved Gravitational Search Algorithm for High Dimensional Biomedical Data
| Published in | 2021 13th International Conference on Computer and Automation Engineering (ICCAE) pp. 24 - 30 |
|---|---|
| Main Authors | Yan, Chaokun; Kang, Xi; Li, Mengyuan; Wang, Jianlin |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 20.03.2021 |
| Subjects | |
| DOI | 10.1109/ICCAE51876.2021.9426130 |
| Abstract | In the past few decades, the field of bioinformatics has accumulated a large amount of gene expression data, which provides important support for the diagnosis of disease. However, high dimensionality, small sample sizes, and redundant features often degrade both the accuracy and the speed of prediction, and existing feature selection models cannot accurately extract the relevant information from such datasets. Filter and wrapper methods are the two most commonly used feature selection approaches. To combine the fast computation of the filter with the high accuracy of the wrapper, a new hybrid algorithm called MIIBGSA is proposed, which hybridizes mutual information with an improved Gravitational Search Algorithm (GSA). First, mutual information is used to rank the features and select the important ones, and these features then form the population of the wrapper method. Next, because of its effectiveness, GSA is adopted to search further for an optimal feature subset. However, GSA suffers from slow search speed and premature convergence, which limit its optimization ability. In our work, a scale function is added to the velocity update to enhance its search ability, an adaptive kbest particle update formula is proposed to improve its convergence accuracy, and a fitness sharing strategy based on the niche technique is introduced to increase the randomness of the particle population and its search ability. We used 10-fold cross-validation with a KNN classifier to evaluate the classification accuracy. Experimental results on five publicly available high-dimensional biomedical datasets show that the proposed MIIBGSA outperforms the other algorithms compared. |
|---|---|
| Author | Yan, Chaokun; Li, Mengyuan; Wang, Jianlin; Kang, Xi |
| Author_xml | 1. Yan, Chaokun (ckyan@henu.edu.cn); 2. Kang, Xi (kx159951@163.com); 3. Li, Mengyuan (myli@henu.edu.cn); 4. Wang, Jianlin (jlwang@henu.edu.cn). Affiliation (all authors): Henan University, School of Computer and Information Engineering, Kaifeng, China |
| ContentType | Conference Proceeding |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP All) 1998-Present |
| Discipline | Statistics |
| EISBN | 166541295X; 9781665412957 |
| EndPage | 30 |
| ExternalDocumentID | 9426130 |
| Genre | orig-research |
| GrantInformation_xml | Technology Development (funder ID 10.13039/100006180); National Natural Science Foundation of China (funder ID 10.13039/501100001809) |
| IsPeerReviewed | false |
| IsScholarly | false |
| PageCount | 7 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-March-20 |
| PublicationDecade | 2020 |
| PublicationTitle | 2021 13th International Conference on Computer and Automation Engineering (ICCAE) |
| PublicationTitleAbbrev | ICCAE |
| PublicationYear | 2021 |
| Publisher | IEEE |
| StartPage | 24 |
| SubjectTerms | adaptive kbest particle update; Classification algorithms; Feature extraction; Feature selection; Filtering algorithms; fitness sharing strategy; Gravitational Search Algorithm; Heuristic algorithms; Information filters; Mutual Information; Niche algorithm; scale function; Sociology; Statistics |
| Title | A Novel Feature Selection Method on Mutual Information and Improved Gravitational Search Algorithm for High Dimensional Biomedical Data |
| URI | https://ieeexplore.ieee.org/document/9426130 |
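
The abstract's filter stage ranks features by mutual information (MI) with the class label before the wrapper search. The abstract does not say how MI is estimated, so the sketch below is only an illustration under stated assumptions: features are discretized by equal-width binning (a hypothetical preprocessing choice) and MI is the plug-in estimate from empirical counts. The function names `discretize`, `mutual_information`, and `rank_features_by_mi` are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))) written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def discretize(column, bins=3):
    """Equal-width binning into integer levels 0..bins-1 (assumed preprocessing)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)  # constant feature: a single bin
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]


def rank_features_by_mi(X, y, top_k, bins=3):
    """Indices of the top_k columns of X (rows = samples), highest MI with y first."""
    scores = []
    for j in range(len(X[0])):
        col = discretize([row[j] for row in X], bins)
        scores.append((mutual_information(col, y), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # ties broken by column index
    return [j for _, j in scores[:top_k]]


X = [[0.1, 5.0, 1.0], [0.2, 5.1, 2.0], [0.9, 4.9, 1.0], [1.0, 5.0, 2.0]]
y = [0, 0, 1, 1]
print(rank_features_by_mi(X, y, top_k=2))  # [0, 1]
```

Column 0 separates the two classes perfectly (1 bit), column 1 only partly, and column 2 not at all, so the two highest-ranked columns are 0 and 1. In the hybrid scheme described above, such a top-k subset would seed the wrapper's population; the value of k and the estimator used in the paper are not given here.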
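
The wrapper stage searches binary feature masks with GSA. The abstract does not give the paper's scale function or its adaptive kbest formula, so this is a sketch of one iteration of the *standard* binary GSA, not of MIIBGSA. Agents with better fitness get larger masses. The gravitational constant decays as G(t) = G0·exp(-αt/T). Only the Kbest heaviest agents attract the others, and Kbest shrinks linearly over the run. Distance is Hamming distance between masks, and a bit flips with probability |tanh(v)|. The defaults `g0=100`, `alpha=20`, `final_kbest_frac=0.02`, and `vmax=6` are assumed typical values, and `bgsa_step` is a hypothetical name.

```python
import math
import random


def bgsa_step(positions, velocities, fitness, t, max_iter, g0=100.0, alpha=20.0,
              final_kbest_frac=0.02, vmax=6.0, rng=random):
    """One iteration of standard binary GSA for maximization (e.g. CV accuracy).

    positions: list of 0/1 feature masks; velocities: matching float lists.
    Returns (new_positions, new_velocities); inputs are not modified.
    """
    n, dim = len(positions), len(positions[0])
    best, worst = max(fitness), min(fitness)
    # Normalized masses: best agent heaviest, worst agent mass 0.
    if best == worst:
        masses = [1.0 / n] * n
    else:
        raw = [(f - worst) / (best - worst) for f in fitness]
        total = sum(raw)
        masses = [m / total for m in raw]
    g = g0 * math.exp(-alpha * t / max_iter)  # decaying gravitational constant
    # Standard linear Kbest schedule (NOT the paper's adaptive kbest rule).
    frac = final_kbest_frac + (1 - t / max_iter) * (1 - final_kbest_frac)
    kbest = max(1, round(frac * n))
    elite = sorted(range(n), key=lambda i: fitness[i], reverse=True)[:kbest]

    new_pos, new_vel = [], []
    for i in range(n):
        acc = [0.0] * dim
        for j in elite:
            if j == i:
                continue
            r = sum(a != b for a, b in zip(positions[i], positions[j]))  # Hamming R_ij
            # a_i = F_i / M_i, so the passive mass M_i cancels out of the force.
            coef = rng.random() * g * masses[j] / (r + 1e-10)
            for d in range(dim):
                acc[d] += coef * (positions[j][d] - positions[i][d])
        vel, pos = [], []
        for d in range(dim):
            v = rng.random() * velocities[i][d] + acc[d]
            v = max(-vmax, min(vmax, v))  # clamp velocity
            vel.append(v)
            flip = rng.random() < abs(math.tanh(v))  # V-shaped transfer function
            pos.append(1 - positions[i][d] if flip else positions[i][d])
        new_vel.append(vel)
        new_pos.append(pos)
    return new_pos, new_vel
```

In a full loop, `fitness` would come from the wrapper evaluation, such as the 10-fold cross-validated KNN accuracy on each mask's selected columns. The places where the paper's scale function and adaptive kbest would plug in are the velocity line and the `kbest` computation respectively.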
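
The abstract adds a niche-based fitness sharing strategy to keep the particle population diverse. Its sharing radius, distance measure, and sharing function are not stated there. The sketch below therefore uses the classic triangular sharing function sh(d) = 1 - (d/σ)^α for d < σ (and 0 otherwise), with Hamming distance between feature masks. Each agent's fitness is divided by its niche count Σⱼ sh(dᵢⱼ). The names `hamming` and `shared_fitness` and the choice of Hamming distance are assumptions for illustration.

```python
def hamming(a, b):
    """Number of positions where two binary feature masks differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(positions, fitness, sigma_share, alpha=1.0):
    """Classic fitness sharing: divide each raw fitness by the agent's niche count."""
    shared = []
    for i, pi in enumerate(positions):
        niche = 0.0
        for pj in positions:
            d = hamming(pi, pj)
            if d < sigma_share:
                niche += 1.0 - (d / sigma_share) ** alpha
        shared.append(fitness[i] / niche)  # niche >= 1 since each agent has d = 0 to itself
    return shared


masks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
print(shared_fitness(masks, [0.9, 0.9, 0.8], sigma_share=2))
```

The two identical masks share one niche, so each keeps only half of its raw fitness (0.45). The isolated mask keeps its full 0.8 and now ranks highest, which is how sharing pushes the search away from a single crowded region of the feature-subset space.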