New Word Detection Based on Inner Combination Degree and Boundary Freedom Degree of Words (基于词内部结合度和边界自由度的新词发现)
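New word detection, a fundamental research topic in natural language processing, has long attracted wide attention from academia and industry. This paper recasts new word detection as a word boundary determination problem. The corpus is first segmented with a Chinese word segmenter, the "scattered strings" (散串) are then counted, and finally a new word detection method based on the inner combination degree and boundary freedom degree of words is proposed. Experiments on a large-scale corpus verify the effectiveness of the method. Future work will focus on how to identify low-frequency new words effectively, so as to improve the overall performance of the system.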
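
The record does not give the paper's formulas, so the following is only a minimal sketch of the general idea of scoring a candidate string by how tightly its parts bind together and how freely its boundaries vary. It assumes pointwise mutual information as a stand-in for inner combination degree and neighbour-character entropy as a stand-in for boundary freedom degree. It also skips the segmentation and scattered-string steps and scores raw character n-grams directly. The function name `score_candidates` and the parameter `max_len` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def score_candidates(text, max_len=4):
    """Score every character n-gram of length 2..max_len in `text`.

    Returns {candidate: (cohesion, freedom)} where
      cohesion = min over split points of PMI(left part, right part)
      freedom  = min(left-neighbour entropy, right-neighbour entropy)
    Both definitions are illustrative assumptions, not the paper's formulas.
    """
    total = len(text)
    counts = Counter()
    left_nb = defaultdict(Counter)   # characters seen just before each gram
    right_nb = defaultdict(Counter)  # characters seen just after each gram

    for n in range(1, max_len + 1):
        for i in range(total - n + 1):
            gram = text[i:i + n]
            counts[gram] += 1
            if i > 0:
                left_nb[gram][text[i - 1]] += 1
            if i + n < total:
                right_nb[gram][text[i + n]] += 1

    def prob(gram):
        # Rough relative frequency, normalised by the text length.
        return counts[gram] / total

    def entropy(neighbours):
        n = sum(neighbours.values())
        if n == 0:
            return 0.0
        return -sum(c / n * math.log(c / n) for c in neighbours.values())

    scores = {}
    for gram in counts:
        if len(gram) < 2:
            continue
        # Weakest binding over all ways of cutting the gram in two.
        cohesion = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        freedom = min(entropy(left_nb[gram]), entropy(right_nb[gram]))
        scores[gram] = (cohesion, freedom)
    return scores
```

Under these assumptions, a candidate would be kept as a new word only when both scores exceed thresholds chosen on held-out data. A string that always appears inside the same larger context gets a freedom score of zero and is rejected, however frequent it is.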
| Published in | 计算机应用研究 Vol. 32; no. 8; pp. 2302 - 2304 |
|---|---|
| Main Author | Li Wenkun |
| Format | Journal Article |
| Language | Chinese |
| Published | Institute of Intelligence Information Processing, Beijing Information Science & Technology University, Beijing 100192, 2015 |
| Subjects | |
| ISSN | 1001-3695 |
| DOI | 10.3969/j.issn.1001-3695.2015.08.015 |
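| Summary: | New word detection, a fundamental research topic in natural language processing, has long attracted wide attention from academia and industry. This paper recasts new word detection as a word boundary determination problem. The corpus is first segmented with a Chinese word segmenter, the "scattered strings" (散串) are then counted, and finally a new word detection method based on the inner combination degree and boundary freedom degree of words is proposed. Experiments on a large-scale corpus verify the effectiveness of the method. Future work will focus on how to identify low-frequency new words effectively, so as to improve the overall performance of the system. |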
|---|---|
| Bibliography: | 51-1196/TP. New word detection, as a basic research topic in natural language processing, has gained extensive attention from the academic and business communities. This paper transforms the new word detection problem into a word boundary determination problem. First, it segments the corpus and collects statistics on "the scattered words" in the corpus. Then, it proposes a new word detection method based on the inner combination degree and boundary freedom degree of words. Experimental results on a large-scale corpus verify the effectiveness of this method. Future research will focus on how to effectively identify low-frequency words and improve the overall performance of the system. Keywords: new word detection; inner combination degree; boundary freedom degree. Authors: Li Wenkun, Zhang Yangsen, Chen Ruoyu (Institute of Intelligence Information Processing, Beijing Information Science & Technology University, Beijing 100192, China) |