Construction of a Chinese Event Relevance Corpus and a Recognition Method

Bibliographic Details
Published in 计算机工程与科学 (Computer Engineering & Science) Vol. 37; no. 12; pp. 2306-2311
Main Authors HUANG Yi-long (黄一龙), LI Pei-feng (李培峰), ZHU Qiao-ming (朱巧明)
Format Journal Article
Language Chinese
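Published School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; Province Key Lab of Computer Information Processing Technology of Jiangsu, Suzhou, Jiangsu 215006; 2015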
ISSN 1007-130X
DOI 10.3969/j.issn.1007-130X.2015.12.017

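Summary: Events tend to unfold around a topic and are related to one another. In the era of big data, filtering the events related to a given topic out of massive amounts of information benefits natural language processing tasks such as information extraction, text summarization, and text generation. The paper first proposes an annotation method for relevant events and uses it to annotate a Chinese event relevance corpus. It then presents a preliminary method for recognizing relevant events based on multiple features. Experiments on the annotated corpus show an F1 improvement of 4.08% over the baseline system.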
Bibliography: relevant event corpus; annotation; relevance; event relation
43-1258/TP
HUANG Yi-long, LI Pei-feng, ZHU Qiao-ming (1. School of Computer Science and Technology, Soochow University, Suzhou 215006; 2. Province Key Lab of Computer Information Processing Technology of Jiangsu, Suzhou 215006, China)
There are many relevant events concerning a topic. In the era of big data, extracting those events which are relevant to a specific topic is helpful for many natural language processing applications, such as information extraction, text summarization, and text generation. We propose a method to annotate relevant events and construct a Chinese relevant event corpus. We then put forward a relevant event recognition approach based on various distances and semantic features. Experimental results on the annotated corpus show that the proposed approach outperforms the baseline by 4.08% in F1-measure.