New word detection based on inner combination degree and boundary freedom degree of words (基于词内部结合度和边界自由度的新词发现)

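As a fundamental research task in natural language processing, new word detection has long attracted wide attention from both academia and industry. This paper recasts new word detection as the problem of determining word boundaries. The corpus is first segmented into Chinese words, statistics are then gathered on "scattered strings" (散串), and finally a new word detection method based on the inner combination degree and boundary freedom degree of words is proposed. New word detection experiments on a large-scale corpus verify the effectiveness of the method. Future research will focus on how to effectively identify low-frequency new words in order to improve the overall performance of the system.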
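
The abstract describes the pipeline (segment, collect scattered strings, score by inner combination degree and boundary freedom degree) but does not give the paper's formulas. The sketch below is a minimal illustration under stated assumptions, not the authors' method: a "scattered string" is assumed to be a run of two or more adjacent single-character tokens left behind by the segmenter; inner combination degree is approximated by the minimum pointwise mutual information (PMI) over all two-way splits of a candidate; boundary freedom degree is approximated by the smaller of the left- and right-neighbour entropies. The function names, the toy corpus, and the thresholds `min_freq`, `min_pmi` and `min_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(neighbours):
    """Shannon entropy (natural log) of a neighbour-frequency Counter; 0 if empty."""
    total = sum(neighbours.values())
    return -sum(c / total * math.log(c / total) for c in neighbours.values())


def ngram_counts(sentences, max_n):
    """Count every token n-gram (n = 1..max_n) in the segmented corpus."""
    counts = Counter()
    for sent in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return counts


def scattered_spans(sent, max_n):
    """Yield (start, end) for sub-spans of 2..max_n tokens inside runs of
    single-character tokens (assumed meaning of a 'scattered string')."""
    i = 0
    while i < len(sent):
        if len(sent[i]) != 1:
            i += 1
            continue
        j = i
        while j < len(sent) and len(sent[j]) == 1:
            j += 1
        for s in range(i, j):
            for e in range(s + 2, min(j, s + max_n) + 1):
                yield s, e
        i = j


def detect_new_words(sentences, max_n=4, min_freq=2, min_pmi=1.0, min_entropy=0.5):
    """Return (word, freq, inner_degree, boundary_freedom) for accepted candidates."""
    counts = ngram_counts(sentences, max_n)
    total = sum(c for gram, c in counts.items() if len(gram) == 1)
    candidates = set()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        for s, e in scattered_spans(sent, max_n):
            cand = tuple(sent[s:e])
            candidates.add(cand)
            # Sentence edges have no neighbouring token and are skipped.
            if s > 0:
                left[cand][sent[s - 1]] += 1
            if e < len(sent):
                right[cand][sent[e]] += 1

    results = []
    for cand in candidates:
        freq = counts[cand]
        if freq < min_freq:
            continue
        p = freq / total
        # Inner combination degree (assumed): weakest PMI over all two-way splits.
        inner = min(
            math.log(p / ((counts[cand[:k]] / total) * (counts[cand[k:]] / total)))
            for k in range(1, len(cand))
        )
        # Boundary freedom degree (assumed): the less varied of the two sides.
        freedom = min(entropy(left[cand]), entropy(right[cand]))
        if inner >= min_pmi and freedom >= min_entropy:
            results.append(("".join(cand), freq, inner, freedom))
    return sorted(results, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    # Invented toy corpus: "给力" is split into single characters every time.
    corpus = [
        ["今天", "给", "力", "天气"],
        ["比赛", "给", "力", "观众"],
        ["表现", "给", "力", "球迷"],
        ["看", "见", "东西"],
        ["看", "见", "东西"],
    ]
    print([word for word, *_ in detect_new_words(corpus)])  # prints ['给力']
```

Taking the minimum over splits and over both sides means a candidate must be internally cohesive and free at both edges. In the toy corpus, 给力 occurs with three different left and right neighbours and is kept, while 看见 always starts a sentence and is always followed by 东西, so its boundary freedom is 0 and it is rejected.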

Bibliographic Details
Published in	计算机应用研究 (Application Research of Computers), Vol. 32, No. 8, pp. 2302-2304
Authors	Li Wenkun (李文坤), Zhang Yangsen (张仰森), Chen Ruoyu (陈若愚)
Format	Journal Article
Language	Chinese
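Published	2015
Affiliation	Institute of Intelligence Information Processing, Beijing Information Science & Technology University, Beijing 100192, China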
Subjects	new word detection (新词发现); inner combination degree (内部结合度); boundary freedom degree (边界自由度)
ISSN	1001-3695
DOI	10.3969/j.issn.1001-3695.2015.08.015

CN:	51-1196/TP
Abstract (English):	New word detection, as a basic research topic in natural language processing, has gained extensive attention from the academic and business communities. This paper transformed the new word detection problem into a word boundary determination problem. First, it segmented the corpus and collected statistics on "the scattered words" in the corpus. Then, it proposed a new word detection method based on the inner combination degree and boundary freedom degree of words. Experimental results on a large-scale corpus verify the effectiveness of this method. Future research will focus on how to effectively identify low-frequency new words and improve the overall performance of the system.
Keywords:	new word detection; inner combination degree; boundary freedom degree
Authors:	Li Wenkun, Zhang Yangsen, Chen Ruoyu (Institute of Intelligence Information Processing, Beijing Information Science & Technology University, Beijing 100192, China)