Abusive words Detection in Persian tweets using machine learning and deep learning techniques

Bibliographic Details
Published in	2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS) pp. 1 - 5
Main Authors	Dehghani, Mohammad; Dehkordy, Diyana Tehrany; Bahrani, Mohammad
Format	Conference Proceeding
Language	English
Published	IEEE 29.12.2021
Subjects	abusive comments; Bert; Blogs; Deep learning; machine learning; Neural networks; Persian tweets; Social networking (online); Solid modeling; Transfer learning
DOI	10.1109/ICSPIS54653.2021.9729390

Abstract	With the growth of the web and of user interaction, users now publish their opinions on a wide range of topics. In recent years, detecting abusive language in user-generated online content has become a necessity. Twitter is a platform on which users share short text messages, expressing opinions on many topics in a variety of linguistic styles, some of which include abusive words. On the one hand, abusive comments can be derogatory and harmful to the users who share content. On the other hand, filtering such comments in languages other than English is difficult and time-consuming. Most social media platforms are still looking for more efficient ways to filter comments, because manual moderation is expensive, slow, and risky. Automation makes it easier to identify and filter abusive comments and improves user safety. This article presents a deep learning method for detecting abusive words in Persian tweets. Because suitable Persian data were lacking, we built a dataset of 33,338 Persian tweets, 10% of which contain abusive words and 90% of which do not. Perhaps the simplest approach is to filter comments against a fixed word list, so a list of 648 Persian abusive words was compiled and evaluated on the dataset, reaching an accuracy of 76%. Finally, a deep neural network based on the BERT language model was implemented to detect abusive words; it performed best, with an accuracy of 97.7%.
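The fixed-list baseline mentioned in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: `ABUSIVE_LEXICON`, `TOKEN_RE`, `is_abusive`, and `accuracy` are hypothetical names, the lexicon entries are placeholders standing in for the 648-word Persian list, and the tokenization rule (Unicode word characters plus the zero-width non-joiner, U+200C, which Persian uses inside many words) is an assumption. A real Persian pipeline would likely also need character normalization, for example unifying the Arabic and Persian forms of letters such as ی and ک.

```python
from __future__ import annotations

import re

# Hypothetical placeholder lexicon. The 648-word Persian list described in
# the abstract is not reproduced here; swap in a real list to use this.
ABUSIVE_LEXICON = {"badword1", "badword2", "badword3"}

# Assumed tokenizer: runs of Unicode word characters, also allowing the
# zero-width non-joiner (U+200C) so Persian words containing it stay whole.
TOKEN_RE = re.compile(r"[\w\u200c]+")


def is_abusive(tweet: str, lexicon: set[str]) -> bool:
    """Flag a tweet as abusive if any of its tokens is in the lexicon."""
    return any(token in lexicon for token in TOKEN_RE.findall(tweet.lower()))


def accuracy(predictions: list[bool], labels: list[bool]) -> float:
    """Return the fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    tweets = ["this is fine", "you badword1", "BadWord2 again", "neutral text"]
    gold = [False, True, True, False]
    preds = [is_abusive(t, ABUSIVE_LEXICON) for t in tweets]
    print(preds)                   # [False, True, True, False]
    print(accuracy(preds, gold))   # 1.0

    # With 10% abusive and 90% non-abusive labels, predicting
    # "non-abusive" for every tweet already scores 0.9.
    labels = [True] + [False] * 9
    print(accuracy([False] * 10, labels))  # 0.9
```

A lookup like this flags a tweet whenever any token matches, regardless of context. The last lines of the example show why accuracy figures are best read against the class split: on data that is 90% non-abusive, a classifier that never flags anything already reaches 90% accuracy.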
Affiliations	Dehghani, Mohammad (mohamad.dehqani@modares.ac.ir): Tarbiat Modares University, Department of Industrial and Systems Engineering, Tehran, Iran; Dehkordy, Diyana Tehrany (d.tehrany@mail.um.ac.ir): Ferdowsi University of Mashhad, Department of Computer Engineering, Mashhad, Iran; Bahrani, Mohammad (bahrani@atu.ac.ir): Allameh Tabataba'i University, Faculty of Statistics, Mathematics and Computer, Tehran, Iran
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN	9781665409384 (ISBN-13); 166540938X (ISBN-10)
ExternalDocumentID	9729390
Genre	orig-research
IsPeerReviewed	false
IsScholarly	false
PageCount	5
PublicationTitleAbbrev	ICSPIS
URI	https://ieeexplore.ieee.org/document/9729390