GA-iForest: An Efficient Isolated Forest Framework Based on Genetic Algorithm for Numerical Data Outlier Detection
| Published in | Transactions of Nanjing University of Aeronautics & Astronautics Vol. 36; no. 6; pp. 1026 - 1038 |
|---|---|
| Main Authors | Li, Kexin; Li, Jing; Liu, Shuji; Li, Zhao; Bo, Jue; Liu, Biqi |
| Format | Journal Article |
| Language | Chinese; English |
| Published | Nanjing: Nanjing University of Aeronautics and Astronautics, 01.12.2019. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, P.R. China; State Grid Liaoning Electric Power Supply Co., LTD, Shenyang 110004, P.R. China |
| ISSN | 1005-1120 |
| DOI | 10.16356/j.1005-1120.2019.06.015 |
| Abstract | As data volumes grow, data quality has become a widespread concern. Outlier detection, a branch of data mining, bears directly on data quality. The isolation forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As iForest keeps generating isolation trees, the differences between trees gradually shrink or vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. This paper proposes GA-iForest, an improved iForest-based method. It optimizes the forest by selecting better isolation trees according to their detection accuracy and their difference from one another, thereby discarding duplicate, similar, and poorly detecting trees and improving the accuracy and stability of outlier detection. The experimental environment is built on Ubuntu and the Spark platform, and the outlier datasets provided by ODDS are used for testing. The method is evaluated in terms of accuracy, recall, ROC curves, AUC, and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection but also reduces the number of isolation trees by 20%-40% compared with the original iForest method. |
| Author | Li, Kexin; Bo, Jue; Li, Jing; Liu, Shuji; Liu, Biqi; Li, Zhao |
| AuthorAffiliation | College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, P.R. China; State Grid Liaoning Electric Power Supply Co., LTD, Shenyang 110004, P.R. China |
| Author_xml | – sequence: 1 givenname: Kexin surname: Li fullname: Li, Kexin – sequence: 2 givenname: Jing surname: Li fullname: Li, Jing – sequence: 3 givenname: Shuji surname: Liu fullname: Liu, Shuji – sequence: 4 givenname: Zhao surname: Li fullname: Li, Zhao – sequence: 5 givenname: Jue surname: Bo fullname: Bo, Jue – sequence: 6 givenname: Biqi surname: Liu fullname: Liu, Biqi |
| ClassificationCodes | TP301.6 |
| ContentType | Journal Article |
| Copyright | Copyright Nanjing University of Aeronautics and Astronautics 2019; Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
| DatabaseName | Mechanical & Transportation Engineering Abstracts; Technology Research Database; Engineering Research Database; Aerospace Database; Advanced Technologies Database with Aerospace; Wanfang Data Journals - Hong Kong; WANFANG Data Centre; Wanfang Data Journals; China Online Journals (COJ) |
| DatabaseTitle | Aerospace Database; Engineering Research Database; Technology Research Database; Mechanical & Transportation Engineering Abstracts; Advanced Technologies Database with Aerospace |
| DatabaseTitleList | Aerospace Database |
| EndPage | 1038 |
| GrantInformation_xml | – fundername: State Grid Liaoning Electric Power Supply Co., LTD, project "Key Technology and Application Research of the Self-Service Grid Big Data Governance" funderid: (No. SGLNXT00YJJS1800110). The authors also thank the reviewers for their support and valuable comments. |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 6 |
| Keywords | genetic algorithm; isolation tree; isolated forest; outlier detection; feature selection |
| PageCount | 13 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-12-01 |
| PublicationDecade | 2010 |
| PublicationPlace | Nanjing |
| PublicationTitle | Transactions of Nanjing University of Aeronautics & Astronautics |
| PublicationTitle_FL | Transactions of Nanjing University of Aeronautics and Astronautics |
| PublicationYear | 2019 |
| Publisher | Nanjing University of Aeronautics and Astronautics; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, P.R. China; State Grid Liaoning Electric Power Supply Co., LTD, Shenyang 110004, P.R. China |
| StartPage | 1026 |
| SubjectTerms | Accuracy; Algorithms; Data analysis; Data mining; Genetic algorithms; Outliers (statistics); Stability; Trees |
| URI | https://www.proquest.com/docview/2383097104 https://d.wanfangdata.com.cn/periodical/njhkhtdxxb-e201906016 |
| Volume | 36 |
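
The abstract describes selecting a subset of isolation trees by detection accuracy and by how much the trees differ from one another. The following Python sketch illustrates that general idea only; it is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the toy data, the precision-at-k accuracy measure, the mean absolute path-length difference used as the difference measure, the weight `alpha`, and the GA settings (tournament selection, one-point crossover, bit-flip mutation, elitism). This record does not state how the paper defines any of these.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful BST search (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, rng, depth=0, max_depth=8):
    """Grow one isolation tree with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the usual correction for unsplit leaves."""
    if "split" not in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def fitness(mask, depths, labels, alpha=0.1):
    """Assumed fitness: precision-at-k plus alpha times mean pairwise tree difference."""
    chosen = [d for bit, d in zip(mask, depths) if bit]
    if not chosen:
        return 0.0
    n, k = len(labels), sum(labels)
    mean_depth = [sum(d[i] for d in chosen) / len(chosen) for i in range(n)]
    # Shorter average path length means more anomalous; flag the k shortest.
    flagged = sorted(range(n), key=lambda i: mean_depth[i])[:k]
    accuracy = sum(labels[i] for i in flagged) / k
    pairs = [(a, b) for ai, a in enumerate(chosen) for b in chosen[ai + 1:]]
    diversity = 0.0
    if pairs:
        diversity = sum(sum(abs(x - y) for x, y in zip(a, b)) / n
                        for a, b in pairs) / len(pairs)
    return accuracy + alpha * diversity


def select_trees(depths, labels, rng, pop_size=12, generations=15, p_mut=0.05):
    """Evolve a 0/1 mask over trees; the full forest is seeded into the population."""
    n_trees = len(depths)

    def score(mask):
        return fitness(mask, depths, labels)

    population = [[1] * n_trees] + [[rng.randint(0, 1) for _ in range(n_trees)]
                                    for _ in range(pop_size - 1)]
    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1 = max(rng.sample(population, 3), key=score)  # tournament selection
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n_trees)                 # one-point crossover
            child = [1 - g if rng.random() < p_mut else g   # bit-flip mutation
                     for g in p1[:cut] + p2[cut:]]
            children.append(child)
        population = children
        best = max(population, key=score)
    return best


# Toy data (hypothetical): 48 Gaussian inliers and 4 hand-placed outliers.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(48)]
data += [(6.0, 6.0), (-7.0, 5.0), (8.0, -6.0), (-6.0, -8.0)]
labels = [0] * 48 + [1] * 4
trees = [build_tree(rng.sample(data, 32), rng) for _ in range(16)]
depths = [[path_length(t, x) for x in data] for t in trees]
best = select_trees(depths, labels, rng)
print(sum(best), "of", len(trees), "trees kept")
```

Because the full forest is seeded into the initial population and the best mask is carried over in every generation, the returned subset never scores below the full forest under this assumed fitness. Whether it also reproduces the 20%-40% reduction reported in the abstract depends on the paper's actual fitness definition, which this sketch does not claim to match.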