Comparative Analysis of Resampling Techniques on Class Imbalance in Body Shaming Phrase Detection

Bibliographic Details
Published in	Moratuwa Engineering Research Conference, pp. 49-54
Main Authors	Puvanendran, Rukshani; Wijikumar, Pirunthavi; Rathnayaka, Tharuni; Thilakarathna, Kaushila; Jayasiri, Pasindu; Roopasinghe, Hashanthi; Kavishan, Madhan; Jeyamohan, Manoharan; Thurshikan, Kanesalingam
Format	Conference Proceeding
Language	English
Published	IEEE 08.08.2024
Subjects	Accuracy; Adaptation models; bodyshaming detection; Feature extraction; Machine learning; Natural language processing; Performance evaluation; Psychology; resampling; SMOTE; Support vector machines; SVM; Training
ISSN	2691-364X
DOI	10.1109/MERCon63886.2024.10688774

Summary:	Body shaming has pervasive and detrimental effects on individuals' psychological and physical well-being, so detecting text-based body shaming phrases is of considerable significance. Natural Language Processing (NLP) and machine learning are employed to detect body shaming phrases. This research paper presents a machine learning methodology for the detection and classification of body shaming phrases. Standard TF-IDF is employed for feature extraction. The SVM classifier achieved an accuracy of 98.5%. To improve performance and address overfitting, the dataset is subjected to resampling procedures, which are carefully applied to ensure appropriate model selection. The methodology encompasses four resampling strategies for achieving data balance: SMOTE, ADASYN, SMOTE-Tomek, and SMOTE-ENN. The efficacy of the methodology is evaluated by comparing five machine learning classifiers. The findings suggest that SVM outperforms the other models on both the SMOTE and SMOTE-Tomek balanced datasets, achieving accuracy, recall, precision, and F1-score of 99.1%, 99.59%, 99.18%, and 99.38%, respectively. The study concludes that resampling approaches, specifically SMOTE and SMOTE-Tomek, enhance the model's performance.
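
The summary names SMOTE as one of the resampling strategies but this record does not include the authors' implementation. As a rough illustration only, the core SMOTE idea (creating synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours) might be sketched in plain Python as below. The function name `smote`, its parameters, and the toy two-dimensional data are assumptions made for this example; the paper's experiments were presumably run on TF-IDF vectors, possibly with a library implementation, and that detail is not stated here.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.
    Hypothetical helper for illustration; needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 4 minority samples versus 10 majority samples.
minority = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
majority = [[1.0, 0.0] for _ in range(10)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Variants such as SMOTE-Tomek and SMOTE-ENN add a cleaning step after oversampling, which this sketch does not attempt.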