Named Entity Recognition System for Postpositional Languages: Urdu as a Case Study
Published in | International journal of advanced computer science & applications Vol. 7; no. 10 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | West Yorkshire, Science and Information (SAI) Organization Limited, 01.01.2016 |
ISSN | 2158-107X, 2156-5570 |
DOI | 10.14569/IJACSA.2016.071019 |
Summary: | Named Entity Recognition and Classification is the process of identifying named entities and classifying them into classes such as person name, organization name, and location name. In this paper, we propose a tagging scheme, Begin Inside Last-2 (BIL2), for Subject Object Verb (SOV) languages that use postpositions, with Urdu as a case study. We compare the F-measure values obtained for the tagging schemes IO, BIO2, BILOU, and BIL2 using a Hidden Markov Model (HMM) and a Conditional Random Field (CRF). With the same parameters, including bigram and context window, the BIL2 tagging scheme gives better results than the other three tagging schemes. With HMM, the F-measure values for IO, BIO2, BILOU, and BIL2 are 44.87%, 44.88%, 45.14%, and 45.88%, respectively. With CRF, the F-measure values for IO, BIO2, BILOU, and BIL2 are 35.13%, 35.90%, 37.85%, and 38.39%, respectively. The F-measure values for BIL2 are better than those of previously reported techniques. |
---|---|
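
To make the compared tagging schemes concrete, the following is a minimal sketch of how the same entity spans are encoded under IO, BIO2, and BILOU, using their commonly used definitions. The function name `encode`, the span format, and the romanized Urdu sentence are hypothetical and chosen only for illustration; they are not taken from the paper. BIL2 is not encoded here because the summary names the scheme without specifying its tag set.

```python
from typing import List, Tuple

# (start index, end index exclusive, entity type); an assumed span format
Span = Tuple[int, int, str]


def encode(n_tokens: int, spans: List[Span], scheme: str) -> List[str]:
    """Convert entity spans into per-token tags under IO, BIO2, or BILOU."""
    tags = ["O"] * n_tokens  # every token starts outside any entity
    for start, end, label in spans:
        length = end - start
        for i in range(start, end):
            if scheme == "IO":
                # IO marks only membership, not entity boundaries
                prefix = "I"
            elif scheme == "BIO2":
                # BIO2 marks the first token of every entity with B
                prefix = "B" if i == start else "I"
            elif scheme == "BILOU":
                # BILOU also marks the last token (L) and single-token entities (U)
                if length == 1:
                    prefix = "U"
                elif i == start:
                    prefix = "B"
                elif i == end - 1:
                    prefix = "L"
                else:
                    prefix = "I"
            else:
                raise ValueError(f"unsupported scheme: {scheme}")
            tags[i] = f"{prefix}-{label}"
    return tags


# Hypothetical romanized Urdu example in SOV order; "mein" is a postposition.
tokens = ["Allama", "Iqbal", "Lahore", "mein", "rehte", "thay"]
spans = [(0, 2, "PERSON"), (2, 3, "LOCATION")]

for scheme in ("IO", "BIO2", "BILOU"):
    print(scheme, list(zip(tokens, encode(len(tokens), spans, scheme))))
```

In this example the location name is immediately followed by a postposition, which is the kind of boundary the richer schemes mark explicitly: IO cannot separate two adjacent entities of the same type, while BIO2 and BILOU encode progressively more boundary information.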