Comparative Analysis of Resampling Techniques on Class Imbalance in Body Shaming Phrase Detection

Bibliographic Details
Published in: Moratuwa Engineering Research Conference, pp. 49-54
Main Authors: Puvanendran, Rukshani; Wijikumar, Pirunthavi; Rathnayaka, Tharuni; Thilakarathna, Kaushila; Jayasiri, Pasindu; Roopasinghe, Hashanthi; Kavishan, Madhan; Jeyamohan, Manoharan; Thurshikan, Kanesalingam
Format: Conference Proceeding
Language: English
Published: IEEE, 08.08.2024
ISSN: 2691-364X
DOI: 10.1109/MERCon63886.2024.10688774

Summary: In the present era, body shaming has pervasive and detrimental effects on individuals' psychological and physical well-being. Hence, methods that can identify and address text-based body shaming phrases are of profound significance. Natural Language Processing (NLP) and machine learning are employed to detect body shaming phrases. This research paper presents a machine learning methodology for the detection and classification of body shaming phrases. Standard TF-IDF is employed for feature extraction. The SVM classifier exhibited an exceptional accuracy of 98.5%. To improve performance and address the issue of overfitting, the dataset is subjected to resampling procedures, which are carefully applied to ensure appropriate model selection. The methodology encompasses four resampling strategies for achieving data balance, namely SMOTE, ADASYN, SMOTE-Tomek and SMOTE-ENN. The efficacy of the suggested methodology is evaluated by comparing it across five machine learning classifiers. The findings suggest that SVM exhibits superior performance compared to the other models on both the SMOTE-balanced and SMOTE-Tomek-balanced datasets. Specifically, SVM achieves accuracy, recall, precision, and F1-score of 99.1%, 99.59%, 99.18%, and 99.38%, respectively. The study concludes that resampling approaches, specifically SMOTE and SMOTE-Tomek, enhance the model's performance.
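The SMOTE resampling step named in the summary can be illustrated with a minimal, dependency-free sketch. Each synthetic minority-class sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function names `smote`, `balance` and `f1_score`, the Euclidean distance, the default k = 5, the binary-label setup and the dense list-of-floats features are all assumptions made for illustration; real TF-IDF matrices are sparse and high-dimensional. ADASYN, SMOTE-Tomek and SMOTE-ENN are not shown. ADASYN concentrates generation on minority samples that are harder to learn, while SMOTE-Tomek and SMOTE-ENN follow SMOTE with a cleaning step that removes Tomek links (pairs of opposite-class mutual nearest neighbours) or samples whose label disagrees with most of their nearest neighbours. The `f1_score` helper checks that the reported F1-score is the harmonic mean of the reported precision and recall.

```python
import math
import random


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units, e.g. percent)."""
    return 2 * precision * recall / (precision + recall)


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical minimal SMOTE: create n_new synthetic points by interpolating
    between a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    k = min(k, len(minority) - 1)
    # Precompute the indices of the k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> y
        x, y = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic


def balance(features, labels, minority_label, k=5, seed=0):
    """Hypothetical helper: oversample the minority class of a binary dataset
    until both classes have the same number of samples."""
    minority = [f for f, lab in zip(features, labels) if lab == minority_label]
    n_new = (len(labels) - len(minority)) - len(minority)
    if n_new <= 0:
        return list(features), list(labels)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return features + new_points, labels + [minority_label] * len(new_points)


# Toy example: 3 minority samples (label 1) versus 5 majority samples (label 0).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [5.0, 5.0], [5.1, 5.0], [4.9, 5.2], [5.0, 4.8], [5.2, 5.1]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
Xb, yb = balance(X, y, minority_label=1, k=2)
print(yb.count(0), yb.count(1))  # 5 5
print(round(f1_score(99.18, 99.59), 2))  # 99.38
```

In practice, resampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data. The imbalanced-learn package provides maintained implementations of all four strategies (`SMOTE`, `ADASYN`, `SMOTETomek`, `SMOTEENN`).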