Chinese word segmentation scene library updating method and system


Bibliographic Details
Format: Patent
Language: Chinese
Published: 04.01.2019
Summary: An embodiment of the invention provides a method and a system for updating a scene library used in Chinese word segmentation. The method comprises the following steps: selecting a segmented word from the correct word segmentation result of a sentence; constructing features of the segmented word based on the correct word segmentation result; for each subject dictionary in the scene library, calculating a maximum entropy model score of the features; comparing the maximum of these scores with a first preset threshold value; and, if the maximum is larger than the first preset threshold value, adding the segmented word to the subject dictionary corresponding to that maximum. The embodiment brings scene information into the segmentation and supports multiple update modes.
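The steps in the summary can be sketched roughly as follows. This is an illustration only, not the patented implementation: the record does not say which features are built or how the maximum entropy model is parameterised, so the context features (the word and its neighbours), the per-dictionary weight tables, and every function name below are assumptions made for this example. The score is taken here to be the usual log-linear conditional probability normalised across subject dictionaries.

```python
import math
from typing import Dict, List, Optional, Set


def build_features(tokens: List[str], index: int) -> List[str]:
    """Build context features for tokens[index] from the correct segmentation.

    The feature set (word, previous word, next word) is an assumption.
    """
    prev_word = tokens[index - 1] if index > 0 else "<BOS>"
    next_word = tokens[index + 1] if index + 1 < len(tokens) else "<EOS>"
    return [f"word={tokens[index]}", f"prev={prev_word}", f"next={next_word}"]


def maxent_scores(
    features: List[str], weights: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Score the features against each subject dictionary.

    Assumed form: p(subject | features) = exp(sum of weights) / normaliser.
    """
    raw = {
        subject: sum(table.get(feature, 0.0) for feature in features)
        for subject, table in weights.items()
    }
    peak = max(raw.values())  # subtract the peak for numerical stability
    exps = {subject: math.exp(value - peak) for subject, value in raw.items()}
    total = sum(exps.values())
    return {subject: value / total for subject, value in exps.items()}


def update_scene_library(
    tokens: List[str],
    index: int,
    scene_library: Dict[str, Set[str]],
    weights: Dict[str, Dict[str, float]],
    threshold: float,
) -> Optional[str]:
    """Add tokens[index] to the best-scoring subject dictionary.

    The word is added only if the maximum score exceeds the threshold.
    Returns the chosen subject, or None if nothing was added.
    """
    scores = maxent_scores(build_features(tokens, index), weights)
    best_subject = max(scores, key=scores.get)
    if scores[best_subject] > threshold:
        scene_library[best_subject].add(tokens[index])
        return best_subject
    return None


# Hypothetical toy data: two subject dictionaries and hand-set weights.
library = {"sports": set(), "finance": set()}
toy_weights = {"sports": {"prev=踢": 2.0}, "finance": {"prev=买": 2.0}}
chosen = update_scene_library(["他", "踢", "足球"], 2, library, toy_weights, 0.6)
```

Normalising across dictionaries makes the threshold a probability cut-off; if the scores were instead computed by independent per-dictionary models, the comparison step would stay the same but the threshold would need a different scale.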
Bibliography: Application Number: CN20161597548