Robust method of sparse feature selection for multi-label classification with Naive Bayes

Bibliographic Details
Published in 2014 Federated Conference on Computer Science and Information Systems Vol. 2; pp. 375-380
Main Author Ruta, Dymitr
Format Conference Proceeding Journal Article
Language English
Published Polish Information Processing Society 01.09.2014
ISSN 2300-5963
DOI 10.15439/2014F502

Abstract The explosive growth of big data challenges predictive systems in terms of both data size and dimensionality. Generating features from text often yields many thousands of sparse features that rarely take non-zero values. In this work we propose a very fast and robust feature selection method that is optimised with the Naive Bayes classifier. The method takes advantage of the sparse feature representation and uses a diversified backward-forward greedy search to arrive at a highly competitive solution in minimal processing time. It promotes the paradigm of shifting the complexity of predictive systems away from the model algorithm and towards careful data preprocessing and filtering, which makes it possible to accomplish predictive big data tasks on a single processor despite billions of data examples nominally exposed for processing. The method was applied to the AAIA Data Mining Competition 2014, which concerned predicting human injuries resulting from fire incidents based on nearly 12000 risk factors extracted from thousands of fire incident reports, and it took second place with a predictive accuracy of 96%.
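The idea in the abstract, a greedy forward-backward feature search wrapped around Naive Bayes on sparse binary features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `nb_log_odds`, `accuracy` and `greedy_select`, the single binary label (the paper addresses a multi-label task), the Laplace smoothing constant `alpha`, training accuracy as the selection criterion, and the simple alternation of forward and backward passes. The "diversified" part of the author's search is not reproduced.

```python
import math

def nb_log_odds(rows, y, n_features, alpha=1.0):
    """Bernoulli Naive Bayes log-odds terms with Laplace smoothing (alpha is an assumed default).
    rows: list of sets holding the indices of non-zero features; y: list of 0/1 labels."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    cnt = [[0, 0] for _ in range(n_features)]  # cnt[j][c]: class-c samples with feature j active
    for active, label in zip(rows, y):
        for j in active:
            cnt[j][label] += 1
    on, off = [], []
    for j in range(n_features):
        p1 = (cnt[j][1] + alpha) / (n_pos + 2 * alpha)
        p0 = (cnt[j][0] + alpha) / (n_neg + 2 * alpha)
        on.append(math.log(p1 / p0))               # contribution when feature j is active
        off.append(math.log((1 - p1) / (1 - p0)))  # contribution when feature j is zero
    prior = math.log((n_pos + alpha) / (n_neg + alpha))
    return prior, on, off

def accuracy(rows, y, selected, prior, on, off):
    """Training accuracy of Naive Bayes restricted to `selected`.
    Per-sample cost depends only on that sample's non-zero features."""
    base = prior + sum(off[j] for j in selected)   # score of an all-zero sample
    correct = 0
    for active, label in zip(rows, y):
        score = base + sum(on[j] - off[j] for j in active if j in selected)
        correct += int((score > 0) == bool(label))
    return correct / len(y)

def greedy_select(rows, y, n_features, max_rounds=10):
    """Alternate forward (add) and backward (remove) passes until nothing improves."""
    prior, on, off = nb_log_odds(rows, y, n_features)
    selected = set()
    best = accuracy(rows, y, selected, prior, on, off)
    for _ in range(max_rounds):
        improved = False
        for j in range(n_features):                # forward pass: add if strictly better
            if j not in selected:
                acc = accuracy(rows, y, selected | {j}, prior, on, off)
                if acc > best:
                    selected.add(j)
                    best, improved = acc, True
        for j in sorted(selected):                 # backward pass: drop if strictly better
            acc = accuracy(rows, y, selected - {j}, prior, on, off)
            if acc > best:
                selected.discard(j)
                best, improved = acc, True
        if not improved:
            break
    return selected, best
```

Because the per-feature log-odds are computed once and a Naive Bayes score is a plain sum, evaluating a candidate subset only touches each sample's non-zero entries, which is the kind of saving a sparse representation makes possible. A real application would presumably score candidates on held-out data rather than on the training set.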
Subjects Big data; Data mining; Data models; Feature extraction; Measurement; Predictive models; Robustness
URI https://ieeexplore.ieee.org/document/6933040
https://annals-csis.org/proceedings/2014/pliks/502.pdf
https://doaj.org/article/df0e9c4a6a294e34861ae7290216b8b0