Research on TFIDF Algorithm Based on Weighting of Distribution Factors

Bibliographic Details
Published in Journal of Physics: Conference Series, Vol. 1621, no. 1, pp. 12007-12014
Main Authors Zhang, Xinming; Shi, Yuanbo; Wei, Haiping
Format Journal Article
Language English
Published Bristol IOP Publishing 01.08.2020
ISSN 1742-6588, 1742-6596
DOI 10.1088/1742-6596/1621/1/012007

Summary: The current TFIDF (Term Frequency and Inverted Document Frequency) algorithm cannot effectively reflect the relationship between the importance of a word and its distribution. This paper proposes a Class Variance-Term Frequency and Inverted Document Frequency algorithm, which improves the TFIDF algorithm based on three distribution factors: category, inter-class and variance. To measure the optimization effect of this method, three algorithms were compared: the original algorithm, the improved algorithm, and a TFIDF algorithm based on a dual parallel calculation model. Experiments show that the improved algorithm significantly improves recall, accuracy, and F-measure values compared with the original algorithm, and also improves on the TFIDF algorithm based on the dual parallel calculation model. The improved algorithm therefore adapts well to the demands of feature word extraction and gives better text classification performance.