New Word Detection Based on the Inner Combination Degree and Boundary Freedom Degree of Words (基于词内部结合度和边界自由度的新词发现)

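New word detection is a basic research topic in natural language processing and has attracted extensive attention from both academia and industry. This paper transforms the new word detection problem into a word boundary determination problem. It first performs Chinese word segmentation on the corpus and then collects statistics on "scattered strings" (散串). It then proposes a new word detection method based on the inner combination degree and the boundary freedom degree of words. New word detection experiments on a large-scale corpus verify the effectiveness of the method. Future research will focus on how to identify low-frequency new words effectively, in order to improve the overall performance of the system.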
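The record outlines the method in three steps: segment the corpus, collect scattered strings, and score them by inner combination degree and boundary freedom degree. It does not give the paper's formulas. The Python sketch below illustrates the general idea using two common stand-ins, and both are assumptions rather than the authors' definitions. Inner combination degree is approximated as the minimum pointwise mutual information over a candidate's binary splits. Boundary freedom degree is approximated as the smaller of its left and right branching entropies. The sketch also assumes that "scattered strings" are runs of two or more adjacent single-character tokens left behind by the segmenter. The function `detect_new_words`, its thresholds, and the toy corpus are all hypothetical. This is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def branching_entropy(neighbors):
    """Shannon entropy (natural log) of a neighbor-token frequency table."""
    total = sum(neighbors.values())
    return -sum((c / total) * math.log(c / total) for c in neighbors.values())


def detect_new_words(sentences, max_len=4, min_freq=2, min_pmi=1.0, min_entropy=0.5):
    """Hypothetical sketch: score runs of single-character tokens as new-word candidates.

    Assumed measures (not taken from the paper): cohesion = minimum PMI over
    binary splits; boundary freedom = min(left, right) branching entropy.
    sentences: list of token lists already produced by a Chinese word segmenter.
    Returns (word, frequency, cohesion, boundary_freedom) tuples, most frequent first.
    """
    counts = Counter()            # frequency of every token n-gram, n = 1..max_len
    left = defaultdict(Counter)   # candidate -> counts of the token just before it
    right = defaultdict(Counter)  # candidate -> counts of the token just after it
    n_tokens = 0
    for tokens in sentences:
        n_tokens += len(tokens)
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for i in range(1, len(padded) - 1):
            for n in range(1, max_len + 1):
                j = i + n
                if j > len(padded) - 1:  # the n-gram would include the </s> marker
                    break
                gram = tuple(padded[i:j])
                counts[gram] += 1
                # Assumed "scattered string": two or more adjacent single-character tokens.
                if n >= 2 and all(len(t) == 1 for t in gram):
                    left[gram][padded[i - 1]] += 1
                    right[gram][padded[j]] += 1

    results = []
    for gram in left:
        freq = counts[gram]
        if freq < min_freq:
            continue
        p = freq / n_tokens
        # Inner cohesion: PMI of the weakest binary split of the candidate.
        cohesion = min(
            math.log(p / ((counts[gram[:k]] / n_tokens) * (counts[gram[k:]] / n_tokens)))
            for k in range(1, len(gram))
        )
        # Boundary freedom: the less varied of the two neighbor distributions.
        freedom = min(branching_entropy(left[gram]), branching_entropy(right[gram]))
        if cohesion >= min_pmi and freedom >= min_entropy:
            results.append(("".join(gram), freq, cohesion, freedom))
    return sorted(results, key=lambda r: -r[1])


# Toy corpus: the segmenter has split the unseen word 给力 into 给 / 力.
SAMPLE_CORPUS = [
    ["手机", "很", "给", "力", "啊"],
    ["今天", "天气", "给", "力", "了"],
    ["队员", "真", "给", "力", "。"],
    ["这", "也", "太", "给", "力", "吧"],
    ["我", "的", "书"],
    ["我", "的", "书"],
]

if __name__ == "__main__":
    for word, freq, cohesion, freedom in detect_new_words(SAMPLE_CORPUS):
        print(word, freq, round(cohesion, 3), round(freedom, 3))
```

On this toy corpus only 给力 is returned. The strings 我的 and 的书 occur often enough, but each always appears next to the same neighbor on one side, so its branching entropy on that side is zero. Taking the minimum of the two entropies is what lets the boundary check reject such fragments.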

Bibliographic Details
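Published in: 计算机应用研究 (Application Research of Computers), Vol. 32, No. 8, pp. 2302-2304
Authors: 李文坤 (Li Wenkun), 张仰森 (Zhang Yangsen), 陈若愚 (Chen Ruoyu)
Format: Journal Article
Language: Chinese
Published: 2015
Affiliation: Institute of Intelligence Information Processing, Beijing Information Science & Technology University, Beijing 100192, China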
ISSN: 1001-3695
DOI: 10.3969/j.issn.1001-3695.2015.08.015

CN (Chinese domestic serial number): 51-1196/TP
Keywords: new word detection; inner combination degree; boundary freedom degree
Li Wenkun, Zhang Yangsen, Chen Ruoyu (Institute of Intelligence Information Processing, Beijing Information Science & Technology University, Beijing 100192, China)