Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts

Bibliographic Details
Published in 2020 Moratuwa Engineering Research Conference (MERCon), pp. 272-276
Main Authors Thavareesan, Sajeetha; Mahesan, Sinnathamby
Format Conference Proceeding
Language English
Published IEEE 01.07.2020
DOI 10.1109/MERCon50084.2020.9185369

Abstract Sentiment Analysis is the process of identifying the sentiments expressed in a text and categorising them as positive or negative. The words that carry sentiment are the key to sentiment prediction. SentiWordNet is a sentiment lexicon used to determine the sentiment of texts. A huge number of sentiment terms are missing from SentiWordNet, which limits the performance of Sentiment Analysis. Gathering and grouping such sentiment words manually is a tedious task. In this paper we propose a sentiment lexicon expansion method using Word2vec and fastText word embeddings, along with a rule-based Sentiment Analysis method. We expand the sentiment lexicon from an initial seed list of 2951 positive and 5598 negative words in two steps: (i) gathering related words using Word2vec word embedding and (ii) gathering lexically similar words using fastText word embedding. Our final lexicons, UJ_Lex_Pos and UJ_Lex_Neg, ended up with 10537 positive and 12664 negative words respectively, which are labelled using Word2vec word embedding. Furthermore, the rule-based Sentiment Analysis method uses the expanded lexicons (UJ_Lex_Pos and UJ_Lex_Neg) together with lists of conjunctions and negation words to predict the sentiments expressed in Tamil texts. The method is evaluated on UJ_MovieReviews and an accuracy of 88 0.14% is obtained.
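The general idea in the abstract (grow a seed lexicon through embedding similarity, then classify text with lexicon lookups and negation handling) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy two-dimensional vectors, the English placeholder words, the similarity threshold of 0.9, and the helper names `expand_lexicon` and `predict` are all assumptions made for illustration. A real setup would presumably load trained Word2vec vectors for the related-word step and fastText vectors for the lexically-similar step, and would also apply the conjunction rules, which are omitted here.

```python
from math import sqrt

# Toy 2-d "embeddings" (hypothetical values); real vectors would come
# from trained Word2vec or fastText models over a Tamil corpus.
TOY_VECTORS = {
    "good": (1.0, 0.1),
    "great": (0.9, 0.2),
    "fine": (0.8, 0.3),
    "bad": (-1.0, 0.1),
    "awful": (-0.9, 0.2),
    "table": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, threshold=0.9):
    """Return the seeds plus every vocabulary word whose cosine
    similarity to at least one seed reaches the threshold."""
    expanded = set(seeds)
    for word, vec in vectors.items():
        if any(s in vectors and cosine(vec, vectors[s]) >= threshold
               for s in seeds):
            expanded.add(word)
    return expanded


def predict(tokens, pos_lex, neg_lex, negations):
    """Sum lexicon hits; a negation word flips the polarity of the
    next sentiment word (a simplified rule, assumed for this sketch)."""
    score, flip = 0, False
    for tok in tokens:
        if tok in negations:
            flip = True
            continue
        polarity = 1 if tok in pos_lex else -1 if tok in neg_lex else 0
        if polarity:
            score += -polarity if flip else polarity
            flip = False
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"


pos = expand_lexicon({"good"}, TOY_VECTORS)
neg = expand_lexicon({"bad"}, TOY_VECTORS)
print(sorted(pos), sorted(neg))
print(predict(["not", "great"], pos, neg, negations={"not"}))
```

In this sketch a single seed per polarity pulls in its near neighbours while the unrelated word stays out of both lexicons. The threshold controls the trade-off between lexicon coverage and the risk of admitting words with the wrong polarity.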
Author_xml – sequence: 1
  givenname: Sajeetha
  surname: Thavareesan
  fullname: Thavareesan, Sajeetha
  organization: Eastern University,Dept. of Mathematics,Sri Lanka
– sequence: 2
  givenname: Sinnathamby
  surname: Mahesan
  fullname: Mahesan, Sinnathamby
  organization: University of Jaffna,Dept. of Computer Science,Sri Lanka
EISBN 9781728199757
1728199751
Subjects Analytical models; Classification algorithms; Computational modeling; conjunction; grammar rule; lexicon; Motion pictures; Predictive models; Sentiment analysis; Tamil; Task analysis
URI https://ieeexplore.ieee.org/document/9185369