A feature selection model for document classification using Tom and Jerry Optimization algorithm


Bibliographic Details
Published in Multimedia Tools and Applications, Vol. 83, no. 4, pp. 10273-10295
Main Authors Thirumoorthy, K; Britto, J Jerold John
Format Journal Article
Language English
Published New York Springer US 01.01.2024
ISSN 1380-7501, 1573-7721
DOI 10.1007/s11042-023-15828-6

More Information
Summary: Over the last decade, high-dimensional data has become increasingly common in document mining fields such as text summarization, text clustering, and text classification. The curse of dimensionality degrades the performance of classification models, and feature selection is an effective way to mitigate it. In this work, we present the Tom and Jerry Optimization (TJO) technique for feature subset selection. The proposed work uses the classifier error rate and the rate of features chosen to measure each candidate's fitness. The performance of the proposed scheme is examined on two popular benchmark text corpora and compared with five metaheuristic approaches. The best success rate obtained by the proposed scheme is 95.77%, with a best precision of 0.9509, recall of 0.9577, and F1-score of 0.9541. According to the comparison results, the proposed feature subset selection scheme outperforms the standard strategy.
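
The summary names the two ingredients of the fitness measure but does not say how they are combined. The sketch below illustrates one common choice in wrapper-based feature selection, a weighted sum in which lower values are better. The function name `fitness`, the weight `alpha`, and its default of 0.9 are assumptions made for illustration and are not taken from the article.

```python
from typing import Sequence


def fitness(error_rate: float, mask: Sequence[bool], alpha: float = 0.9) -> float:
    """Score a candidate feature subset; lower is better.

    error_rate: classifier error on the selected features, in [0, 1].
    mask:       one boolean per feature, True where the feature is chosen.
    alpha:      assumed trade-off weight (hypothetical, not from the article).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(mask) == 0:
        raise ValueError("mask must contain at least one feature")

    # Fraction of the full feature set that this candidate keeps.
    chosen_rate = sum(1 for chosen in mask if chosen) / len(mask)

    # Assumed combination: weighted sum of the two terms.
    return alpha * error_rate + (1.0 - alpha) * chosen_rate


# A candidate that keeps 3 of 10 features and misclassifies 5% of documents.
candidate = [True, False, True, False, False, True, False, False, False, False]
score = fitness(0.05, candidate)
```

Under this assumed weighting, a candidate is rewarded both for lowering the classifier error and for keeping fewer features, with `alpha` controlling which term dominates.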