A Rule-Based Subject-Correlated Arabic Stemmer
| Published in | Arabian Journal for Science and Engineering Vol. 41; no. 8; pp. 2883 - 2891 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.08.2016 |
| ISSN | 1319-8025 2191-4281 |
| DOI | 10.1007/s13369-016-2029-2 |
| Summary: | Arabic is a derivational language that provides invaluable features. Arabic roots are basic forms that are used to formulate words. They are limited sets that encapsulate the word’s linguistic features. The knowledge of roots’ frequencies is a valuable additional feature, especially when it is bound to a specific topic. This paper utilizes collision resulting from the stemming process where two or more words may have the same root. It minimizes the number of extracted roots within a specific subject using roots’ frequencies and explores its effect on multiple roots disambiguation. |
|---|---|
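
The summary describes using root frequencies within a subject to resolve words that yield more than one candidate root. The sketch below illustrates that general idea only; it is not the paper's algorithm. The function names, the transliterated roots, and the rule "pick the candidate seen most often in the subject" are all assumptions made for illustration, and the candidate lists stand in for the output of some stemmer.

```python
from collections import Counter


def build_root_frequencies(candidate_lists):
    """Count, per subject, how many words list each root as a candidate."""
    counts = Counter()
    for candidates in candidate_lists:
        # Use a set so a root is counted once per word, even if repeated.
        counts.update(set(candidates))
    return counts


def pick_root(candidates, root_freq):
    """Return the candidate root with the highest subject frequency.

    Ties keep the earliest candidate; an empty list gives None.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda root: root_freq[root])


# Hypothetical candidate roots (transliterated) for five words of one subject.
subject_words = [
    ["ktb"],
    ["ktb", "kbt"],  # ambiguous: two candidate roots
    ["drs"],
    ["drs", "dsr"],  # ambiguous: two candidate roots
    ["ktb"],
]

freq = build_root_frequencies(subject_words)
resolved = [pick_root(candidates, freq) for candidates in subject_words]
distinct_before = len(freq)          # number of distinct candidate roots
distinct_after = len(set(resolved))  # number of distinct roots after selection
```

In this toy data, each ambiguous word is mapped to the root that other words in the same subject already share, so the set of distinct roots for the subject shrinks. A real system would need a morphological analyzer to produce the candidates and a policy for ties and unseen roots.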