Abusive words Detection in Persian tweets using machine learning and deep learning techniques

Bibliographic Details
Published in 2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS), pp. 1-5
Main Authors Dehghani, Mohammad; Dehkordy, Diyana Tehrany; Bahrani, Mohammad
Format Conference Proceeding
Language English
Published IEEE, 29.12.2021
DOI 10.1109/ICSPIS54653.2021.9729390

Abstract As the web has grown and user interaction has increased, users express opinions on a wide range of topics online. In recent years, detecting abusive language in user-generated content has become a necessity. Twitter is a platform on which users share text messages. People there express opinions on different topics in different styles of language, some of which include abusive words. On the one hand, abusive comments can be derogatory and harmful to those who share content. On the other hand, filtering these comments in languages other than English is difficult and time-consuming. Most social media platforms are still looking for more efficient ways to filter comments because manual moderation is expensive, slow, and risky. Automation helps to identify and filter abusive comments and to increase user safety. This article presents a deep learning method for detecting abusive words in Persian tweets. Because suitable Persian data was lacking, we created a dataset of 33,338 Persian tweets, of which 10% contained abusive words and 90% did not. Perhaps the simplest approach is to filter comments against a fixed word list, so a list of 648 Persian abusive words was prepared and tested on the dataset (accuracy of 76%). Finally, a deep neural network based on the BERT language model was implemented to detect abusive words; it performed best, with an accuracy of 97.7%.
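The fixed-list baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the sample texts, and the function names `contains_abusive` and `accuracy` are hypothetical placeholders, and the paper's 648-word Persian list is not reproduced here. The sketch assumes simple whitespace-and-punctuation tokenization; real Persian text would likely need normalization first.

```python
import re

# Hypothetical placeholder lexicon (assumption); stands in for the
# paper's 648-word Persian list, which is not reproduced here.
ABUSIVE_WORDS = {"badword", "slur"}

# \w matches Unicode letters in Python 3, so Persian letters are covered.
TOKEN_RE = re.compile(r"\w+")


def contains_abusive(text, lexicon=ABUSIVE_WORDS):
    """Return True if any whole token of the text is in the lexicon."""
    tokens = TOKEN_RE.findall(text.lower())
    return any(tok in lexicon for tok in tokens)


def accuracy(predictions, labels):
    """Fraction of predictions that equal the gold labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy labelled examples invented for illustration: (text, is_abusive).
SAMPLES = [
    ("you are a badword", True),
    ("have a nice day", False),
    ("that slur again", True),
    ("subtle insult without listed terms", True),  # missed by the list
    ("the word badwordish is not listed", False),
]

preds = [contains_abusive(text) for text, _ in SAMPLES]
labels = [label for _, label in SAMPLES]
print(accuracy(preds, labels))
```

The fourth sample shows the typical weakness of a fixed list: abusive text that uses no listed term passes the filter. Note also that with a 90%/10% class split, a filter that never flags anything would already reach 90% accuracy, so accuracy figures on such data are best read alongside per-class measures.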
Author Affiliations
Dehghani, Mohammad (mohamad.dehqani@modares.ac.ir), Tarbiat Modares University, Department of Industrial and Systems Engineering, Tehran, Iran
Dehkordy, Diyana Tehrany (d.tehrany@mail.um.ac.ir), Ferdowsi University of Mashhad, Department of Computer Engineering, Mashhad, Iran
Bahrani, Mohammad (bahrani@atu.ac.ir), Allameh Tabataba'i University, Faculty of Statistics, Mathematics and Computer, Tehran, Iran
EISBN 9781665409384; 166540938X
Subject Terms abusive comments; BERT; Blogs; Deep learning; machine learning; Neural networks; Persian tweets; Social networking (online); Solid modeling; Transfer learning
URI https://ieeexplore.ieee.org/document/9729390