Research on TFIDF Algorithm Based on Weighting of Distribution Factors
| Published in | Journal of Physics: Conference Series, Vol. 1621, no. 1, pp. 12007-12014 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Bristol: IOP Publishing, 01.08.2020 |
| ISSN | 1742-6588, 1742-6596 |
| DOI | 10.1088/1742-6596/1621/1/012007 |
| Summary: | The current TFIDF (Term Frequency and Inverted Document Frequency) algorithm cannot effectively reflect the relationship between the importance of a word and its distribution. This paper proposes a Class Variance-Term Frequency and Inverted Document Frequency algorithm, which improves the TFIDF algorithm using three distribution factors: category, inter-class and variance. To measure the optimization effect of this method, three algorithms were compared: the original TFIDF algorithm, the improved algorithm, and a TFIDF algorithm based on a dual parallel calculation model. Experiments show that the improved algorithm achieves significantly higher recall, accuracy, and F-measure values than the original algorithm, and also improves on the TFIDF algorithm based on the dual parallel calculation model. The improved algorithm is therefore well suited to feature word extraction and delivers better text classification performance. |
|---|---|
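For orientation, the baseline that the summary calls the original TFIDF algorithm can be sketched as follows. This is a minimal illustration of the common textbook form (relative term frequency multiplied by the log of inverse document frequency), not the paper's implementation. The function name `tfidf` and the toy documents are hypothetical. The three distribution factors of the proposed algorithm are not specified in this record, so they are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed textbook form: tf = count / document length,
    idf = log(N / df). The paper may use a different variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["stock", "market", "price"],
    ["football", "match", "score"],
    ["stock", "price", "score"],
]
weights = tfidf(docs)
```

The weight of a term here depends only on how many documents contain it, never on which class those documents belong to. That is the limitation the summary attributes to the original algorithm.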