Comparative Analysis of Resampling Techniques on Class Imbalance in Body Shaming Phrase Detection
Published in | Moratuwa Engineering Research Conference pp. 49 - 54 |
---|---|
Main Authors | , , , , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 08.08.2024 |
Subjects | |
ISSN | 2691-364X |
DOI | 10.1109/MERCon63886.2024.10688774 |
Summary: | Body shaming has pervasive and detrimental effects on individuals' psychological and physical well-being, which makes identifying and addressing text-based body shaming phrases important. This paper presents a machine learning methodology, built on Natural Language Processing (NLP), for detecting and classifying body shaming phrases. Standard TF-IDF is used for feature extraction. The SVM classifier achieved an accuracy of 98.5%. To improve performance and address overfitting, the dataset is resampled, with care taken to support appropriate model selection. Four resampling strategies are used to balance the data: SMOTE, ADASYN, SMOTE-Tomek, and SMOTE-ENN. The methodology is evaluated by comparison across five machine learning classifiers. The findings suggest that SVM outperforms the other models on both the SMOTE and SMOTE-Tomek balanced datasets, achieving accuracy, recall, precision, and F1-score of 99.1%, 99.59%, 99.18%, and 99.38%, respectively. The study concludes that resampling, specifically with SMOTE and SMOTE-Tomek, enhances the model's performance. |
---|---|
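The summary names SMOTE as one of the resampling strategies but does not describe how it works. As a rough illustration only, the sketch below shows the core SMOTE idea in plain Python: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. The function name `smote_oversample`, the toy feature vectors, and the parameter values are assumptions made for this example. They are not taken from the paper, and this record does not say which implementation the authors used.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D vectors standing in for TF-IDF rows (made-up data).
majority = [[0.0, 1.0], [0.1, 0.9], [0.2, 1.1], [0.0, 0.8], [0.3, 1.0], [0.1, 1.2]]
minority = [[1.0, 0.0], [0.9, 0.2], [1.1, 0.1]]

# Generate just enough synthetic minority points to match the majority count.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, a library such as imbalanced-learn provides SMOTE, ADASYN, SMOTETomek, and SMOTEENN classes. The plain-Python version here is meant only to show the interpolation step.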