A Survey of CRF Algorithm Based Knowledge Extraction of Elementary Mathematics in Chinese


Bibliographic Details
Published in: Mobile Networks and Applications, Vol. 26, No. 5, pp. 1891-1903
Main Authors: Liu, Shuai; He, Tenghui; Dai, Jianhua
Format: Journal Article
Language: English
Published: New York, Springer US, 01.10.2021
Springer Nature B.V.
ISSN: 1383-469X, 1572-8153
DOI: 10.1007/s11036-020-01725-x

Summary: Chinese word segmentation is an important research direction in elementary mathematics knowledge extraction. The speed of segmentation directly affects downstream applications, and its accuracy directly affects each subsequent research step. Among the machine learning methods for extracting elementary mathematics knowledge points, the Conditional Random Field (CRF) model performs new-word discovery well and is increasingly used for knowledge extraction in elementary mathematics. This article first introduces the traditional CRF process for named entity recognition. It then proposes CRF++, an improved algorithm for the conditional random field model. Because named entity recognition based on traditional machine learning methods has a relatively low recognition rate, the article proposes a post-processing method for entity recognition that generates a dictionary automatically. After mathematical entities are identified, a pruning strategy that combines the Viterbi algorithm with rules is proposed to achieve a higher recognition rate for elementary mathematical entities. Finally, several methods for disambiguation after entity recognition are introduced.
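The two post-recognition steps in the summary (Viterbi decoding pruned by rules, and dictionary-based correction) can be illustrated with a minimal sketch. It assumes a character-level linear-chain CRF with a hypothetical BIO tag set (`O`, `B-MATH`, `I-MATH`) whose emission and transition scores are already learned; the scores used below are invented. The pruning rule shown (an `I-` tag may only follow a `B-` or `I-` tag of the same type) is a common BIO constraint, not necessarily the paper's rule set. The paper's dictionary is generated automatically; here a hand-written lexicon stands in for it, and forward maximum matching is swapped in as the matching technique. All function names are hypothetical, and this is not the authors' implementation or the CRF++ toolkit's API.

```python
import math

# Hypothetical BIO tag set for a single entity type (elementary math terms).
TAGS = ["O", "B-MATH", "I-MATH"]


def allowed(prev, cur):
    """Rule: an I- tag may only follow a B- or I- tag of the same type."""
    if cur.startswith("I-"):
        return prev != "O" and prev[2:] == cur[2:]
    return True


def constrained_viterbi(emissions, transitions):
    """Best tag sequence for a linear-chain CRF, with rule-violating paths pruned.

    emissions[t][j]: score of tag j at position t (assumed precomputed).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n, k = len(emissions), len(TAGS)
    neg = -math.inf
    # A sequence cannot start inside an entity, so I- start states are pruned.
    score = [neg if TAGS[j].startswith("I-") else emissions[0][j] for j in range(k)]
    back = []  # back[t-1][j]: best previous tag index for tag j at position t
    for t in range(1, n):
        new_score, ptr = [neg] * k, [0] * k
        for j in range(k):
            for i in range(k):
                # Skip unreachable states and transitions forbidden by the rule.
                if score[i] == neg or not allowed(TAGS[i], TAGS[j]):
                    continue
                s = score[i] + transitions[i][j] + emissions[t][j]
                if s > new_score[j]:
                    new_score[j], ptr[j] = s, i
        score = new_score
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(range(k), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return [TAGS[j] for j in reversed(path)]


def dictionary_postprocess(chars, tags, lexicon, max_len=6):
    """Forward maximum matching over the characters: tag lexicon terms the CRF
    missed. Only spans tagged entirely "O" are relabelled, so existing entities
    are never split. Single-character terms are ignored (lengths >= 2 only)."""
    tags = list(tags)
    i = 0
    while i < len(chars):
        for length in range(min(max_len, len(chars) - i), 1, -1):
            span = "".join(chars[i:i + length])
            if span in lexicon and all(tag == "O" for tag in tags[i:i + length]):
                tags[i:i + length] = ["B-MATH"] + ["I-MATH"] * (length - 1)
                i += length
                break
        else:
            i += 1  # no match starting here; move one character on
    return tags


if __name__ == "__main__":
    chars = list("求三角形的面积")  # "find the area of the triangle"
    crf_tags = ["O"] * len(chars)  # pretend the CRF missed both terms
    print(dictionary_postprocess(chars, crf_tags, {"三角形", "面积"}))
    # ['O', 'B-MATH', 'I-MATH', 'I-MATH', 'O', 'B-MATH', 'I-MATH']
```

Pruning inside the Viterbi loop, rather than filtering the decoded output afterwards, guarantees the returned sequence is the highest-scoring one among rule-consistent sequences instead of a repaired version of an invalid one.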