A Rule-Based Subject-Correlated Arabic Stemmer

Bibliographic Details
Published in Arabian Journal for Science and Engineering Vol. 41; no. 8; pp. 2883-2891
Main Authors El-Defrawy, Mahmoud; El-Sonbaty, Yasser; Belal, Nahla A.
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.08.2016
ISSN 1319-8025
2191-4281
DOI 10.1007/s13369-016-2029-2

Summary: Arabic is a derivational language that provides invaluable features. Arabic roots are basic forms that are used to formulate words. They are limited sets that encapsulate the word’s linguistic features. The knowledge of roots’ frequencies is a valuable additional feature, especially when it is bound to a specific topic. This paper utilizes collision resulting from the stemming process where two or more words may have the same root. It minimizes the number of extracted roots within a specific subject using roots’ frequencies and explores its effect on multiple roots disambiguation.