Improving the Performance of Naïve Bayes Algorithm for Arabic Text Categorization
| Published in | International Journal of Advanced Studies in Computers, Science and Engineering Vol. 5; no. 11; p. 105 |
|---|---|
| Main Authors | Al Mashaykhi, Akram M O; Aqoulah, Nibras Jamal Abu; Riadh, May H |
| Format | Journal Article |
| Language | English |
| Published | Gothenburg: International Journal of Advanced Studies in Computers, Science and Engineering, 01.11.2016 |
| ISSN | 2278-7917 |
| Abstract | Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. In this paper, four techniques are implemented using a Naïve Bayes classifier for Arabic text categorization: TF only, TF-IDF, normalized TF-IDF, and an N-gram (N = 2) statistical stemmer with a similarity threshold of 0.8. The four techniques are evaluated on two test sets. The results show that the best accuracy is obtained with normalized TF-IDF and the N-gram (N = 2) statistical stemmer with a similarity threshold of 0.8. The analysis of the Naïve Bayes classifier showed at least two advantages: first, it works well on both numeric and textual data; second, it is easier to implement and compute than other algorithms. The work also highlights at least three disadvantages: first, the conditional independence assumption is violated by real-world data; second, the classifier performs very poorly when features are highly correlated; and third, it does not consider the frequency of word occurrences. |
| Author | Aqoulah, Nibras Jamal Abu; Al Mashaykhi, Akram M O; Riadh, May H |
| ContentType | Journal Article |
| Copyright | Copyright International Journal of Advanced Studies in Computers, Science and Engineering 2016 |
| EISSN | 2278-7917 |
| EndPage | 105 |
| Genre | Feature |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| Language | English |
| PageCount | 1 |
| StartPage | 105 |
| SubjectTerms | Algorithms; Bayesian analysis; Classification; Classifiers; Mathematical models; Similarity; Texts; Thresholds |
| URI | https://www.proquest.com/docview/1848802536 https://www.proquest.com/docview/1879993169 |
| Volume | 5 |
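
The abstract mentions an N-gram (N = 2) statistical stemmer with a similarity threshold of 0.8 but does not say how similarity is computed. The sketch below shows one common way such a stemmer could work, under assumptions that are not taken from the paper: similarity is the Dice coefficient over character bigrams, and words are grouped greedily against the first member of each group. The function names `bigrams`, `dice_similarity`, and `group_words` are hypothetical.

```python
from typing import List, Set


def bigrams(word: str) -> Set[str]:
    """Return the set of distinct character bigrams of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigrams: 2*|A & B| / (|A| + |B|).

    Assumption: the paper's similarity measure is not stated in the
    abstract; Dice is used here only as a plausible stand-in.
    """
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        # Words shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def group_words(words: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping: a word joins the first group whose representative
    (its first member) is at least `threshold` similar; otherwise it
    starts a new group."""
    groups: List[List[str]] = []
    for w in words:
        for g in groups:
            if dice_similarity(w, g[0]) >= threshold:
                g.append(w)
                break
        else:
            groups.append([w])
    return groups


if __name__ == "__main__":
    # Transliterated toy words; real input would be Arabic tokens.
    print(group_words(["kitab", "kitaba", "madrasa"]))
```

With the toy input above, `kitab` and `kitaba` share four of their bigrams (similarity 8/9, above 0.8) and fall into one group, while `madrasa` shares none and forms its own group. Each group could then be treated as a single feature before term weighting, which is the role a statistical stemmer would usually play in a pipeline like the one the abstract describes.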