Text Classification Method Based on an Improved Incremental Fuzzy Grammar Algorithm

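Many existing text classification algorithms must go through a train-test-retrain cycle, and general-purpose models have poor grammatical expressiveness. To address these problems, this paper proposes an improved fuzzy grammar algorithm (IFGA). A learning model is built from a set of selected text fragments. To accommodate slight variations, an incremental model converts the selected text fragments into an underlying structure, forming a fuzzy grammar. A fuzzy union operation then merges the grammars of the individual text fragments and transforms the learned fragments into a more general representation. Two groups of comparative experiments were carried out against the decision table algorithm, an improved naive Bayes algorithm, and other methods. The first experiment shows no significant performance difference between IFGA and the other machine learning algorithms. The second shows that the incremental learning algorithm has an advantage over standard machine learning algorithms: its performance is more stable and less affected by data size. The proposed algorithm also requires less time to retrain the model.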
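The abstract describes the pipeline only in outline, so the following Python sketch is a hypothetical illustration, not the authors' IFGA. It assumes that each text fragment becomes a single fuzzy production rule over coarse terminal classes (the `<NUM>` and `<CAP>` symbols and the helper names are invented here), that the abstract's fuzzy joint (union) operation is the standard max-based fuzzy union, and that new text is scored by positional rule similarity. What it does show is why an incremental design avoids full retraining: learning a new fragment only unions one rule into the existing grammar of its class.

```python
from __future__ import annotations

from collections.abc import Iterable

# Assumed representation: production rule (tuple of terminal symbols) -> membership in [0, 1].
Grammar = dict[tuple[str, ...], float]


def generalize(token: str) -> str:
    """Map a raw token to a coarser terminal symbol (hypothetical classes)."""
    if token.isdigit():
        return "<NUM>"
    if token[:1].isupper():
        return "<CAP>"
    return token.lower()


def to_rule(text: str) -> tuple[str, ...]:
    """Turn a text fragment into a sequence of generalized terminals."""
    return tuple(generalize(tok) for tok in text.split())


def fragment_grammar(fragment: str, degree: float = 1.0) -> Grammar:
    """Build a one-rule fuzzy grammar for a single fragment."""
    return {to_rule(fragment): degree}


def fuzzy_union(g1: Grammar, g2: Grammar) -> Grammar:
    """Standard fuzzy union (max): each rule keeps its highest membership degree."""
    merged = dict(g1)
    for rule, mu in g2.items():
        merged[rule] = max(merged.get(rule, 0.0), mu)
    return merged


def rule_similarity(a: tuple[str, ...], b: tuple[str, ...]) -> float:
    """Fraction of positions where two equal-length rules agree; 0 otherwise."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


class IncrementalFuzzyGrammar:
    """Per-class fuzzy grammars that grow by union instead of being retrained."""

    def __init__(self) -> None:
        self.grammars: dict[str, Grammar] = {}

    def learn(self, label: str, fragments: Iterable[str]) -> None:
        """Union new fragment grammars into one class; other classes are untouched."""
        grammar = self.grammars.get(label, {})
        for frag in fragments:
            grammar = fuzzy_union(grammar, fragment_grammar(frag))
        self.grammars[label] = grammar

    def membership(self, label: str, text: str) -> float:
        """Graded membership: best rule degree weighted by positional similarity."""
        rule = to_rule(text)
        grammar = self.grammars.get(label, {})
        return max((mu * rule_similarity(r, rule) for r, mu in grammar.items()), default=0.0)

    def classify(self, text: str) -> str | None:
        """Return the class with the highest nonzero membership, or None."""
        scores = {label: self.membership(label, text) for label in self.grammars}
        best = max(scores, key=scores.get, default=None)
        return best if best is not None and scores[best] > 0.0 else None


if __name__ == "__main__":
    model = IncrementalFuzzyGrammar()
    model.learn("date", ["Born on 12 May 1990"])
    model.learn("price", ["costs 30 dollars"])
    print(model.classify("Died on 3 June 2001"))  # date
    print(model.classify("costs 45 euros"))       # price (partial match, degree 2/3)
    model.learn("price", ["costs 45 euros"])      # incremental update, no retraining
```

Because `learn` only touches the grammar of the class being updated, adding a fragment costs one union rather than a pass over all stored training data; a real fuzzy grammar learner would generalize rules far more aggressively than this exact-length positional match.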

Bibliographic Details
Published in: 计算机应用研究 (Application Research of Computers), Vol. 34, no. 11, pp. 3355-3358
Authors: Gong Jing (龚静), Huang Xinyang (黄欣阳)
Format: Journal Article
Language: Chinese
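Affiliations: Department of Information Technology, Hunan Polytechnic of Environment and Biology, Hengyang, Hunan 421001; School of Computer Science, University of South China, Hengyang, Hunan 421001
Published: 2017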
ISSN: 1001-3695
DOI: 10.3969/j.issn.1001-3695.2017.11.034

CN: 51-1196/TP
Keywords: text classification; machine learning; incremental; fuzzy grammar; retraining
Many text classification algorithms require a training-testing-retraining cycle, and general models have poor grammatical expressiveness. To address these problems, this paper proposed an improved fuzzy grammar algorithm (IFGA). First, the method built a learning model from selected text segments. Then, to accommodate slight changes, an incremental model transformed the selected text segments into an underlying structure, the fuzzy grammar. Finally, it combined the grammars of the individual text fragments with a fuzzy joint operation and transformed the learned text fragments into a more general representation. Two groups of experiments compared IFGA with the decision table algorithm, an improved naive Bayes algorithm and some other algorithms. The first experiment shows no significant difference between IFGA and the other machine learning algorithms. The second experiment shows that the incremental learning algorithm has advantages over standard machine learning algorithms: its performance is more stable and less affected by data size. The proposed algorithm requires less model retraining time.