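A Preliminary Study on Automatic Recognition of Absent Topics in Chinese Punctuation Clauses Based on the Maximum Entropy Model (基于最大熵模型的汉语标点句缺失话题自动识别初探)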
| Published in | 计算机工程与科学 Vol. 37; no. 12; pp. 2282 - 2293 |
|---|---|
| Main Author | LU Da-wei |
| Format | Journal Article |
| Language | Chinese |
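| Published | Department of Chinese Language and Literature, Peking University, Beijing 100871; Institute of Language Information Processing, Beijing Language and Culture University, Beijing 100083, 2015 |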
| Subjects | |
| ISSN | 1007-130X |
| DOI | 10.3969/j.issn.1007-130X.2015.12.014 |
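| Summary: | This paper addresses the task of determining whether the absent topic of a punctuation clause is the subject or the object of the preceding sentence, and takes this task as the entry point for research on the automatic recognition of absent topics in punctuation clauses. A set of literal and semantic features for this task is first summarized; rules are then combined with a maximum entropy model in automatic classification experiments. The results show an F-score of 82% for specific categories of verbs. Analysis of the results indicates that verb features and semantic features contribute most to the task, that neither the rule-based nor the statistical method can be neglected, and that fine-grained knowledge has an important effect on classification performance. |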
|---|---|
| Bibliography: | generalized topic structure; new branch topic; automatic recognition; maximum entropy model 43-1258/TP We focus on an automatic recognition task: identifying whether the absent topic of a punctuation clause is the subject or the object of the previous sentence. We regard this task as the entry point for the automatic recognition of absent topics in Chinese punctuation clauses. Several literal and semantic features are summarized, and the task is carried out by combining rules with the maximum entropy model. Experimental results show that the F-score of this approach reaches 82% for samples of some specific verbs. Analysis of the experimental results shows that verb features and semantic features play the most important role in the recognition process; neither rules nor statistics can be neglected, and refined knowledge has a great influence on recognition performance. LU Da-wei, SONG Rou (1. Department of Chinese Language and Literature, Peking University, Beijing 100871; 2. Institute of Lan |