Improving the Performance of Naïve Bayes Algorithm for Arabic Text Categorization

Bibliographic Details
Published in International Journal of Advanced Studies in Computers, Science and Engineering Vol. 5; no. 11; p. 105
Main Authors Al Mashaykhi, Akram M O; Aqoulah, Nibras Jamal Abu; Riadh, May H
Format Journal Article
Language English
Published Gothenburg: International Journal of Advanced Studies in Computers, Science and Engineering, 01.11.2016
Subjects Algorithms; Bayesian analysis; Classification; Classifiers; Mathematical models; Similarity; Texts; Thresholds
Online Access https://www.proquest.com/docview/1848802536; https://www.proquest.com/docview/1879993169
ISSN 2278-7917

Abstract Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. In this paper, four techniques are implemented using a Naïve Bayes classifier for Arabic text categorization: TF only, TF-IDF, Normalized TF-IDF, and an N-Gram (N = 2) statistical stemmer with a similarity threshold of 0.8. The four techniques are evaluated on two test sets. The results show that Normalized TF-IDF and the N-Gram (N = 2) statistical stemmer with a similarity threshold of 0.8 achieved the best accuracy. The analysis of the Naïve Bayes classifier identified at least two advantages: first, it works well on both numeric and textual data; second, it is easy to implement and computationally simple compared with other algorithms. The work also highlights at least three disadvantages: first, the conditional independence assumption is violated by real-world data; second, it performs very poorly when features are highly correlated; and third, it does not consider the frequency of word occurrences.
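The abstract names the term-weighting schemes but does not give their formulas. The sketch below assumes the textbook definitions: TF is the raw count of a term in a document, IDF(t) = log(N / df(t)), TF-IDF is their product, and "Normalized TF-IDF" is read here as TF-IDF rescaled to unit Euclidean length per document. The paper may use other variants (for example log-scaled or length-normalized TF), so the function names, formulas, and toy corpus are illustrative assumptions, not the authors' implementation. Documents are assumed to be already tokenized lists of Arabic words.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """TF only: raw count of each token in each tokenized document."""
    return [Counter(doc) for doc in docs]


def idf_weights(docs):
    """IDF(t) = log(N / df(t)), where df(t) counts documents containing t."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts at most once per term
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vectors(docs):
    """TF-IDF: raw term count multiplied by the term's IDF."""
    idf = idf_weights(docs)
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def normalized_tfidf_vectors(docs):
    """TF-IDF rescaled so each document vector has unit Euclidean (L2) length."""
    result = []
    for vec in tfidf_vectors(docs):
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Leave all-zero vectors unchanged to avoid division by zero.
        result.append({t: w / norm for t, w in vec.items()} if norm > 0 else vec)
    return result


# Hypothetical toy corpus of pre-tokenized Arabic documents.
docs = [
    ["سوق", "اقتصاد", "سوق"],      # market, economy, market
    ["مباراة", "فريق", "سوق"],     # match, team, market
    ["فريق", "مباراة", "مباراة"],  # team, match, match
]
weights = normalized_tfidf_vectors(docs)
```

A term that occurs in every document gets IDF 0 and so contributes nothing to the TF-IDF weights; this is how IDF down-weights very common, uninformative words. The L2 normalization step removes the influence of document length, so long and short documents become comparable.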
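For the N-Gram (N = 2) statistical stemmer, the abstract gives only the n-gram size and the 0.8 similarity threshold. A common way to build such a stemmer, assumed here rather than taken from the paper, is to compare words by the Dice coefficient of their character bigram sets and to put words whose similarity reaches the threshold into the same stem class. The helper names (`char_bigrams`, `dice_similarity`, `build_stem_classes`) and the greedy first-match grouping are hypothetical; a fuller implementation might cluster over all word pairs instead.

```python
def char_bigrams(word):
    """Set of adjacent character pairs (N = 2), e.g. 'كتاب' -> {'كت', 'تا', 'اب'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) over the two words' bigram sets."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga and not gb:  # both words shorter than two characters
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def build_stem_classes(words, threshold=0.8):
    """Greedy grouping: map each word to the first class representative whose
    Dice similarity is >= threshold; otherwise the word starts a new class."""
    representatives = []
    mapping = {}
    for word in words:
        for rep in representatives:
            if dice_similarity(word, rep) >= threshold:
                mapping[word] = rep
                break
        else:  # no representative was similar enough
            representatives.append(word)
            mapping[word] = word
    return mapping


# Hypothetical vocabulary. Dice(الكتاب, الكتابة) = 2*5/(5+6) ≈ 0.91 (same class);
# Dice(الكتاب, الكتب) = 2*3/(5+4) ≈ 0.67 (separate classes).
words = ["الكتاب", "الكتابة", "الكتب", "مباراة", "المباراة"]
stems = build_stem_classes(words, threshold=0.8)
for word, rep in stems.items():
    print(word, "->", rep)
```

With greedy grouping, the resulting classes depend on the order in which words are visited. Replacing each token with its class representative before computing TF or TF-IDF shrinks the vocabulary, so related forms such as المباراة ("the match") and مباراة ("a match") are counted as a single feature.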
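The classifier itself can be sketched as a multinomial Naïve Bayes model with add-one (Laplace) smoothing, one common event model for text. The abstract does not state which event model or smoothing the authors used, so both are assumptions here, and the class `MultinomialNaiveBayes` and the toy training set are hypothetical. Each document is a mapping from term to weight: raw counts give the TF-only setting, and TF-IDF weights can be passed in the same way as fractional counts. Scores are summed as log probabilities to avoid numerical underflow, and multiplying per-term probabilities reflects the conditional independence assumption that the abstract lists as a disadvantage.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace when alpha=1) smoothing.

    A document is a mapping term -> weight: raw counts for the TF-only setting,
    or TF-IDF weights treated as fractional counts.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_count = Counter(labels)
        self.vocab = set()
        self.term_weight = {c: defaultdict(float) for c in self.classes}
        self.total_weight = {c: 0.0 for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, weight in doc.items():
                self.vocab.add(term)
                self.term_weight[label][term] += weight
                self.total_weight[label] += weight
        return self

    def log_posterior(self, doc):
        """Unnormalized log P(c | doc) = log P(c) + sum_t w_t * log P(t | c)."""
        n_docs = sum(self.class_count.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.class_count[c] / n_docs)  # log prior
            denom = self.total_weight[c] + self.alpha * len(self.vocab)
            for term, weight in doc.items():
                if term not in self.vocab:
                    continue  # terms never seen in training are ignored
                numer = self.term_weight[c].get(term, 0.0) + self.alpha
                score += weight * math.log(numer / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_posterior(doc)
        return max(scores, key=scores.get)


# Hypothetical toy training data (tokens already extracted).
train = [
    (["سوق", "اقتصاد", "سوق"], "economy"),
    (["اقتصاد", "بنك"], "economy"),
    (["مباراة", "فريق"], "sport"),
    (["فريق", "مباراة", "هدف"], "sport"),
]
model = MultinomialNaiveBayes().fit(
    [Counter(tokens) for tokens, _ in train], [label for _, label in train]
)
print(model.predict(Counter(["فريق", "هدف"])))  # sport
```

Smoothing keeps a term that never occurred in a class from forcing that class's probability to zero. Under this multinomial model a repeated term contributes once per occurrence; a Bernoulli event model would instead record only presence or absence.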

Copyright © 2016 International Journal of Advanced Studies in Computers, Science and Engineering
Peer Reviewed Yes
Scholarly Yes