Naïve Bayes classifier for Kashmiri word sense disambiguation
| Published in | Sadhana (Bangalore) Vol. 49; no. 3; p. 226 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | New Delhi, Springer India, 29.07.2024, Springer Nature B.V |
| ISSN | 0256-2499, 0973-7677 |
| DOI | 10.1007/s12046-024-02551-7 |
| Summary: | Many Natural Language Processing (NLP) applications, such as machine translation, document clustering, and information retrieval, rely on Word Sense Disambiguation (WSD). WSD automatically predicts the sense of an ambiguous word that best fits the given context. While humans interpret the meaning of natural language with ease, machines must process huge amounts of data to perform similar tasks. In this paper, we propose an automatic WSD system for the Kashmiri language based on the Naïve Bayes classifier. To the best of our knowledge, this work is the first attempt at developing a WSD system for the Kashmiri language. Bag-of-Words (BoW) and Part-of-Speech (PoS) features are used to build the system. Experiments are carried out on a manually crafted sense-tagged dataset for 60 ambiguous Kashmiri words, selected based on their frequency in the collected raw corpus. The senses used to annotate these words are extracted from Kashmiri WordNet. The performance of the proposed system is measured using accuracy, precision, recall, and F-1 measure. The proposed WSD model reported its best performance (accuracy = 89.92, precision = 0.84, recall = 0.89, F-1 measure = 0.86) when PoS and BoW features were used together. |
|---|---|
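
The summary describes a Naïve Bayes classifier over combined BoW and PoS features. As a rough illustration of that general idea only, and not the authors' implementation, the following Python sketch trains a multinomial Naïve Bayes model on a tiny invented English example (the ambiguous word "bank"). The function and class names, the toy sentences, the PoS tags, and the use of add-one (Laplace) smoothing are all assumptions made for this sketch; the record does not say how the paper handles smoothing or feature encoding. A small helper also shows the standard F-1 formula as the harmonic mean of precision and recall.

```python
import math
from collections import Counter, defaultdict


def extract_features(context_words, pos_tags):
    """Merge bag-of-words and part-of-speech features into one token list.

    Prefixes keep the two feature families distinct (hypothetical encoding).
    """
    bow = [f"bow={w.lower()}" for w in context_words]
    pos = [f"pos={t}" for t in pos_tags]
    return bow + pos


class NaiveBayesWSD:
    """Multinomial Naive Bayes with add-alpha smoothing (assumed, not from the paper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()               # number of examples per sense
        self.feature_counts = defaultdict(Counter)  # feature counts per sense
        self.vocabulary = set()                     # all features seen in training

    def fit(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(features)
            self.vocabulary.update(features)
        return self

    def predict(self, features):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocabulary)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.feature_counts[sense].values()) + self.alpha * vocab_size
            for feat in features:
                if feat not in self.vocabulary:
                    continue  # ignore features never seen in training
                score += math.log((self.feature_counts[sense][feat] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Invented toy training data for the English word "bank" (illustration only).
train = [
    (extract_features(["deposit", "money", "account"], ["NOUN", "NOUN", "NOUN"]), "finance"),
    (extract_features(["loan", "interest", "money"], ["NOUN", "NOUN", "NOUN"]), "finance"),
    (extract_features(["river", "water", "flows"], ["NOUN", "NOUN", "VERB"]), "river"),
    (extract_features(["fishing", "muddy", "river"], ["VERB", "ADJ", "NOUN"]), "river"),
]
model = NaiveBayesWSD().fit(train)

print(model.predict(extract_features(["money", "loan"], ["NOUN", "NOUN"])))
print(model.predict(extract_features(["water", "river"], ["NOUN", "NOUN"])))
print(round(f1_score(0.84, 0.89), 2))
```

Under these assumptions, the two toy contexts resolve to the "finance" and "river" senses respectively, and plugging the reported precision (0.84) and recall (0.89) into the harmonic-mean formula gives roughly 0.86, which agrees with the F-1 measure quoted in the summary.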