Robust method of sparse feature selection for multi-label classification with Naive Bayes
| Published in | 2014 Federated Conference on Computer Science and Information Systems, Vol. 2, pp. 375-380 |
|---|---|
| Main Author | Ruta, Dymitr |
| Format | Conference Proceeding; Journal Article |
| Language | English |
| Published | Polish Information Processing Society, 01.09.2014 |
| Subjects | Big data; Data mining; Data models; Feature extraction; Measurement; Predictive models; Robustness |
| ISSN | 2300-5963 |
| DOI | 10.15439/2014F502 |
| Abstract | The explosive growth of big data poses a processing challenge for predictive systems in terms of both data size and dimensionality. Generating features from text often yields many thousands of sparse features that rarely take non-zero values. In this work we propose a very fast and robust feature selection method that is optimised with the Naive Bayes classifier. The method takes advantage of the sparse feature representation and uses a diversified backward-forward greedy search to arrive at a highly competitive solution in minimal processing time. It promotes the paradigm of shifting the complexity of predictive systems away from the model algorithm and towards careful data preprocessing and filtering, which makes it possible to accomplish predictive big data tasks on a single processor despite billions of data examples nominally exposed for processing. The method was applied to the AAIA Data Mining Competition 2014, concerned with predicting human injuries resulting from fire incidents based on nearly 12000 risk factors extracted from thousands of fire incident reports, and took second place with a predictive accuracy of 96%. |
| Author | Ruta, Dymitr |
| Author affiliation | Dymitr Ruta (dymitr.ruta@kustar.ac.ae), British Telecom Innovation Centre, Khalifa Univ., Abu Dhabi, United Arab Emirates |
| Discipline | Computer Science |
| EISBN | 9788360810583 8360810583 |
| EISSN | 2300-5963 |
| EndPage | 380 |
| Genre | orig-research |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | cc-by |
| OpenAccessLink | https://doaj.org/article/df0e9c4a6a294e34861ae7290216b8b0 |
| PageCount | 6 |
| PublicationTitleAbbrev | FedCSIS |
| SubjectTerms | Big data; Data mining; Data models; Feature extraction; Measurement; Predictive models; Robustness |
| URI | https://ieeexplore.ieee.org/document/6933040 https://annals-csis.org/proceedings/2014/pliks/502.pdf https://doaj.org/article/df0e9c4a6a294e34861ae7290216b8b0 |
| Volume | 2 |
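
The abstract describes the method only at a high level: Naive Bayes scoring over a sparse feature representation, combined with a backward-forward greedy search over feature subsets. The sketch below illustrates that general idea and is not the paper's algorithm. Everything in it is an assumption made for illustration: the function names (`fit`, `predict`, `accuracy`, `greedy_select`, `make_toy_data`), the synthetic data, the Bernoulli form of Naive Bayes with Laplace smoothing, the single binary label (the paper addresses multi-label classification), and the use of held-out accuracy as the selection criterion. The "diversified" aspect of the search mentioned in the abstract is not modelled.

```python
import math
import random


def fit(rows, labels, features, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model restricted to `features`.

    Each row is a set of active (non-zero) feature ids, i.e. a sparse row.
    Returns {class: (offset, delta)}, where offset is the log-score of an
    all-zero row and delta[f] is the correction applied when f is active.
    """
    classes = (0, 1)
    n_c = {c: 0 for c in classes}
    hits = {c: {f: 0 for f in features} for c in classes}
    for active, y in zip(rows, labels):
        n_c[y] += 1
        for f in active & features:  # touch only the non-zero entries
            hits[y][f] += 1
    total = len(labels)
    model = {}
    for c in classes:
        offset = math.log((n_c[c] + alpha) / (total + 2 * alpha))
        delta = {}
        for f in features:
            p = (hits[c][f] + alpha) / (n_c[c] + 2 * alpha)  # smoothed P(f=1|c)
            offset += math.log(1 - p)
            delta[f] = math.log(p) - math.log(1 - p)
        model[c] = (offset, delta)
    return model


def predict(model, active):
    """Return the class with the highest log-score for one sparse row."""
    best_c, best_s = None, -math.inf
    for c, (offset, delta) in model.items():
        s = offset + sum(delta[f] for f in active if f in delta)
        if s > best_s:
            best_c, best_s = c, s
    return best_c


def accuracy(model, rows, labels):
    return sum(predict(model, a) == y for a, y in zip(rows, labels)) / len(labels)


def greedy_select(train, y_train, valid, y_valid, all_features):
    """Toggle features one at a time, keeping a change only if it helps.

    Toggling an included feature is a backward step; toggling an excluded
    one is a forward step. Stops when a full pass yields no improvement.
    """
    def evaluate(feats):
        return accuracy(fit(train, y_train, feats), valid, y_valid)

    selected = set(all_features)
    best = evaluate(selected)
    improved = True
    while improved:
        improved = False
        for f in sorted(all_features):
            candidate = selected ^ {f}
            if not candidate:
                continue  # never evaluate an empty feature set
            score = evaluate(candidate)
            if score > best:
                selected, best, improved = candidate, score, True
    return selected, best


def make_toy_data(n_rows, n_features, seed):
    """Synthetic sparse data: feature 0 is informative, the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_rows):
        y = rng.randint(0, 1)
        active = set()
        if rng.random() < (0.8 if y == 1 else 0.1):
            active.add(0)
        for f in range(1, n_features):
            if rng.random() < 0.05:
                active.add(f)
        rows.append(active)
        labels.append(y)
    return rows, labels


rows, labels = make_toy_data(400, 8, seed=0)
train_rows, train_y = rows[:300], labels[:300]
valid_rows, valid_y = rows[300:], labels[300:]
all_features = set(range(8))

baseline = accuracy(fit(train_rows, train_y, all_features), valid_rows, valid_y)
selected, best = greedy_select(train_rows, train_y, valid_rows, valid_y, all_features)
print(sorted(selected), baseline, best)
```

Storing each row as a set of active feature ids means both training and scoring cost time proportional to the number of non-zero entries rather than the full dimensionality, which is one plausible reading of how a sparse representation keeps such a search cheap. This toy version refits the model for every candidate subset; a practical implementation would likely update the counts incrementally.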