Improving the Performance of Naïve Bayes Algorithm for Arabic Text Categorization
| Published in | International Journal of Advanced Studies in Computers, Science and Engineering Vol. 5; no. 11; p. 105 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Gothenburg, International Journal of Advanced Studies in Computers, Science and Engineering, 01.11.2016 |
| ISSN | 2278-7917 |
| Summary | Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. In this paper, four techniques are implemented using the Naïve Bayes classifier for Arabic text categorization: TF only, TF-IDF, Normalized TF-IDF, and N-Gram with N=2 statistical stemmer with threshold similarity 0.8. The four techniques are evaluated on two test sets. The results show that the Normalized TF-IDF and N-Gram with N=2 statistical stemmer with threshold similarity 0.8 technique has the best accuracy. The analysis of the Naïve Bayes classifier algorithm showed at least two advantages: first, it works well on numeric and textual data; second, it is easy to implement and compute compared with other algorithms. The work also highlights at least three disadvantages: first, the conditional independence assumption is violated by real-world data; second, it performs very poorly when features are highly correlated; and third, it does not consider the frequency of word occurrences. |
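
The summary names two building blocks that are easy to illustrate: grouping words with a bigram (N=2) similarity threshold of 0.8, and weighting terms with normalized TF-IDF. The following is a minimal sketch, not the authors' implementation. The record does not state which similarity measure, IDF formula, or normalization the paper uses, so the Dice coefficient, `log(N / df)`, and L2 normalization are assumptions here, and the function names `bigrams`, `dice_similarity`, `same_stem`, and `normalized_tfidf` are hypothetical.

```python
import math
from collections import Counter


def bigrams(word):
    """Return the set of character bigrams (N=2) of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over bigram sets (assumed measure, not confirmed by the source)."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def same_stem(a, b, threshold=0.8):
    """Treat two words as sharing a stem when similarity reaches the threshold."""
    return dice_similarity(a, b) >= threshold


def normalized_tfidf(docs):
    """Compute L2-normalized TF-IDF weights (assumed normalization).

    docs is a list of token lists; the result is one {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in weights.items()})
    return vectors
```

Under these assumptions, words that `same_stem` groups together would be mapped to one shared feature before the weights are computed, and the resulting vectors would then be passed to a Naïve Bayes classifier.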