Solution Methods for Classification Problems with Categorical Attributes
The article considers various methods for classification of a set of objects into two classes when all the attributes are categorical (nominal or factor attributes), i.e., describe the membership of an object in a category. Some methods are a simple generalization of classical methods (Bayesian algo...
| Published in | Computational mathematics and modeling Vol. 26; no. 3; pp. 408 - 428 |
|---|---|
| Main Author | D’yakonov, A. G. |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.07.2015 |
| Subjects | |
| ISSN | 1046-283X 1573-837X |
| DOI | 10.1007/s10598-015-9281-2 |
| Abstract | The article considers various methods for classification of a set of objects into two classes when all the attributes are categorical (nominal or factor attributes), i.e., describe the membership of an object in a category. Some methods are a simple generalization of classical methods (Bayesian algorithms, singular decomposition methods), others are fundamentally novel. An efficient technique is proposed for encoding categorical attributes by real numbers, which makes it possible to apply classical machine-learning methods (e.g., the random forest). A generalization of the k nearest neighbors (kNN) algorithm and Zhuravlev’s estimate calculation algorithm (AEC) achieve best performance on real-life data. All methods have been tested on an applied problem involving construction of a recommender system for a security service. |
|---|---|
| Author | D’yakonov, A. G. |
| Author_xml | – sequence: 1 givenname: A. G. surname: D’yakonov fullname: D’yakonov, A. G. email: djakonov@mail.ru organization: Faculty of Computational Mathematics and Cybernetics, Moscow State University |
| Cites_doi | 10.1007/978-3-642-32115-3_51 10.1145/1390156.1390208 10.1023/A:1010933404324 10.1109/MC.2009.263 10.1137/07070111X 10.4169/amer.math.monthly.119.10.838 10.1145/2168752.2168771 10.1016/j.patrec.2005.10.010 |
| ContentType | Journal Article |
| Copyright | Springer Science+Business Media New York 2015 |
| Discipline | Mathematics Computer Science |
| EISSN | 1573-837X |
| EndPage | 428 |
| ISSN | 1046-283X |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Keywords | categorical attribute; factor attribute; singular decomposition; nominal attribute; classification; category; factor; machine learning; encoding |
| Language | English |
| PageCount | 21 |
| PublicationCentury | 2000 |
| PublicationDate | 20150700 |
| PublicationDateYYYYMMDD | 2015-07-01 |
| PublicationDate_xml | – month: 7 year: 2015 text: 20150700 |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | Computational mathematics and modeling |
| PublicationTitleAbbrev | Comput Math Model |
| PublicationYear | 2015 |
| Publisher | Springer US |
| Publisher_xml | – name: Springer US |
| StartPage | 408 |
| SubjectTerms | Applications of Mathematics; Computational Mathematics and Numerical Analysis; Mathematical Modeling and Industrial Mathematics; Mathematics; Mathematics and Statistics; Optimization |
| Title | Solution Methods for Classification Problems with Categorical Attributes |
| URI | https://link.springer.com/article/10.1007/s10598-015-9281-2 |
| Volume | 26 |
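
The abstract mentions two ideas that a small example may help make concrete: encoding categorical attributes by real numbers, and a kNN-style classifier that works directly on categories. The sketch below is only an illustration under stated assumptions. The abstract does not describe the article's actual encoding or its kNN generalization, so frequency encoding and the Hamming (mismatch-count) distance are stand-ins chosen here, and the function names and toy data are hypothetical.

```python
from collections import Counter


def frequency_encode(rows):
    """Replace each categorical value by its relative frequency in its column.

    Assumption: frequency encoding is used only as a simple example of mapping
    categories to real numbers; it is not claimed to be the article's technique.
    """
    n = len(rows)
    n_cols = len(rows[0])
    # One counter per attribute (column).
    counts = [Counter(row[j] for row in rows) for j in range(n_cols)]
    encoded = [[counts[j][row[j]] / n for j in range(n_cols)] for row in rows]
    return encoded, counts


def hamming_distance(a, b):
    """Number of attributes on which two objects fall into different categories."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training objects closest to the query.

    Assumption: plain Hamming distance with unweighted voting; ties in distance
    are broken by training order because Python's sort is stable.
    """
    order = sorted(
        range(len(train_rows)),
        key=lambda i: hamming_distance(train_rows[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (department, resource) -> access granted (1) or denied (0).
train_rows = [
    ("sales", "crm"),
    ("sales", "crm"),
    ("sales", "wiki"),
    ("dev", "repo"),
    ("dev", "crm"),
]
train_labels = [1, 1, 1, 1, 0]

encoded, counts = frequency_encode(train_rows)
prediction = knn_predict(train_rows, train_labels, ("sales", "repo"), k=3)
print(encoded[0], prediction)
```

Once the attributes are real-valued, the encoded rows could be passed to any classical learner, which is the role the abstract assigns to the random forest; the categorical kNN, by contrast, never leaves the category space.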