Harnessing the Power of Text Mining for the Detection of Abusive Content in Social Media

Bibliographic Details
Published in	Advances in Computational Intelligence Systems Vol. 513; pp. 187 - 205
Main Authors	Chen, Hao; McKeever, Susan; Delany, Sarah Jane
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2016
Series	Advances in Intelligent Systems and Computing
Subjects	Artificial intelligence; Automatic control engineering; Detection Accuracy; Document Frequency; Feature Selection; Image processing; Minority Class; Singular Value Decomposition
ISBN	3319465619 9783319465616
ISSN	2194-5357 2194-5365
DOI	10.1007/978-3-319-46562-3_12

Summary:	Cyberbullying and online harassment have received considerable coverage in recent years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouTube, MySpace, Kongregate, Formspring and Slashdot. Using supervised machine learning, we compare alternative text representations and dimension reduction approaches, including feature selection and feature enhancement, and demonstrate the impact of these techniques on detection accuracy. In addition, we investigate the need for sampling on imbalanced datasets. Our conclusions are: (1) dataset balancing significantly boosts accuracy for abusive content detection in social media; (2) feature reduction, which is important for the large feature sets typical of social media datasets, improves efficiency whilst maintaining detection accuracy; (3) generic structural features common to all our datasets proved to be of limited use in the automatic detection of abusive content. Our findings can support practitioners in selecting appropriate text mining strategies in this area.
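
Illustration:	The summary mentions two preprocessing steps, balancing an imbalanced dataset and reducing the feature set by document frequency. The following is a minimal sketch of what such steps might look like, not the authors' implementation. It assumes random undersampling as the balancing method and a simple whitespace tokenizer; the function names, the `min_df` threshold, and the toy comments are all hypothetical and do not come from the chapter.

```python
import random
from collections import Counter


def undersample(texts, labels, seed=0):
    """Randomly drop examples so every class is as small as the smallest class.

    Assumption: random undersampling stands in for whatever sampling
    scheme the chapter actually evaluates.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    smallest = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for text in rng.sample(items, smallest):
            balanced.append((text, label))
    rng.shuffle(balanced)
    return balanced


def select_by_document_frequency(texts, min_df=2):
    """Keep terms that occur in at least min_df documents (hypothetical threshold)."""
    document_frequency = Counter()
    for text in texts:
        # set() so a term is counted once per document
        document_frequency.update(set(text.lower().split()))
    return sorted(t for t, n in document_frequency.items() if n >= min_df)


def bag_of_words(text, vocabulary):
    """Represent a text as term counts over the selected vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


# Toy, made-up comments: label 1 = abusive (minority class), 0 = not abusive.
texts = [
    "you are an idiot",
    "nice video thanks",
    "thanks for sharing",
    "great video",
    "you idiot",
    "love this video",
]
labels = [1, 0, 0, 0, 1, 0]

balanced = undersample(texts, labels)
vocabulary = select_by_document_frequency(texts, min_df=2)
vector = bag_of_words("you idiot", vocabulary)
```

On this toy data the balanced set holds two examples per class, and the vocabulary shrinks from twelve distinct terms to four, which shows the kind of efficiency gain from feature reduction that the summary refers to.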