Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts
Published in | 2020 Moratuwa Engineering Research Conference (MERCon) pp. 272 - 276 |
---|---|
Main Authors | Thavareesan, Sajeetha; Mahesan, Sinnathamby |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.07.2020 |
DOI | 10.1109/MERCon50084.2020.9185369 |
Abstract | Sentiment Analysis is the process of identifying the sentiments expressed in a text and categorising them as positive or negative. Words that carry sentiment are the key to sentiment prediction. SentiWordNet is the sentiment lexicon used to determine the sentiment of texts, but a huge number of sentiment terms are missing from it, which limits the performance of Sentiment Analysis. Gathering and grouping such sentiment words manually is a tedious task. In this paper we propose a sentiment lexicon expansion method using Word2vec and fastText word embeddings, together with a rule-based Sentiment Analysis method. We expand the sentiment lexicon from an initial seed list of 2951 positive and 5598 negative words in two steps: (i) gathering related words using the Word2vec word embedding and (ii) gathering lexically similar words using the fastText word embedding. Our final lexicons, UJ_Lex_Pos and UJ_Lex_Neg, contain 10537 positive and 12664 negative words respectively, labelled using the Word2vec word embedding. The rule-based Sentiment Analysis method then uses the expanded lexicons (UJ_Lex_Pos and UJ_Lex_Neg) together with lists of conjunctions and negation words to predict the sentiments expressed in Tamil texts. The method is evaluated on UJ_MovieReviews and achieves an accuracy of 88.14%. |
---|---|
Author | Mahesan, Sinnathamby; Thavareesan, Sajeetha |
Author_xml | 1. Thavareesan, Sajeetha (Eastern University, Dept. of Mathematics, Sri Lanka); 2. Mahesan, Sinnathamby (University of Jaffna, Dept. of Computer Science, Sri Lanka) |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 9781728199757 1728199751 |
EndPage | 276 |
ExternalDocumentID | 9185369 |
Genre | orig-research |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
ID | FETCH-LOGICAL-i203t-11c440e4789d77feb38a1ae9a7d6a880605baecf6211a64d5ce52e0757f5a7813 |
IEDL.DBID | RIE |
IngestDate | Mon Jul 08 05:38:35 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i203t-11c440e4789d77feb38a1ae9a7d6a880605baecf6211a64d5ce52e0757f5a7813 |
PageCount | 5 |
ParticipantIDs | ieee_primary_9185369 |
PublicationCentury | 2000 |
PublicationDate | 2020-July |
PublicationDateYYYYMMDD | 2020-07-01 |
PublicationDate_xml | – month: 07 year: 2020 text: 2020-July |
PublicationDecade | 2020 |
PublicationTitle | 2020 Moratuwa Engineering Research Conference (MERCon) |
PublicationTitleAbbrev | MERCon |
PublicationYear | 2020 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 272 |
SubjectTerms | Analytical models; Classification algorithms; Computational modeling; conjunction; grammar rule; lexicon; Motion pictures; Predictive models; Sentiment analysis; Tamil; Task analysis |
URI | https://ieeexplore.ieee.org/document/9185369 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | IEEE |
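
The abstract describes a two-step lexicon expansion followed by rule-based prediction. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The two-dimensional word vectors, the English stand-in tokens, the neighbour count `topn`, the `threshold`, and the negation and conjunction rules are all hypothetical. Step (i) takes cosine nearest neighbours of the seed words over precomputed vectors, as one would query a trained Word2vec model, and labels each new word by its mean similarity to the positive versus negative seeds. Step (ii) swaps in Jaccard overlap of fastText-style character 3-grams for real fastText subword embeddings, so morphological variants inherit the polarity of a lexicon word they resemble.

```python
import math

# Hypothetical stand-ins: English toy tokens replace Tamil words; these are
# NOT the paper's conjunction and negation lists.
NEGATIONS = {"not"}
CONTRAST = {"but"}

# Hypothetical 2-D vectors standing in for a trained Word2vec model.
VECTORS = {
    "good": [1.0, 0.1], "great": [0.9, 0.2], "superb": [0.95, 0.05],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2], "boring": [-0.8, 0.3],
    "movie": [0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(word, vectors, topn):
    """The `topn` words most cosine-similar to `word`, excluding itself."""
    scored = sorted(
        ((cosine(vectors[word], vec), w) for w, vec in vectors.items() if w != word),
        reverse=True,
    )
    return [w for _, w in scored[:topn]]

def mean_sim(word, seeds, vectors):
    """Average cosine similarity of `word` to the seed words that have vectors."""
    sims = [cosine(vectors[word], vectors[s]) for s in seeds if s in vectors]
    return sum(sims) / len(sims) if sims else 0.0

def expand_related(pos_seeds, neg_seeds, vectors, topn=2):
    """Step (i) sketch: gather neighbours of every seed, then label each new
    word by whichever seed set it is closer to on average (ties skipped)."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    candidates = set()
    for seed in pos | neg:
        if seed in vectors:
            candidates.update(nearest(seed, vectors, topn))
    for word in candidates - pos - neg:
        p = mean_sim(word, pos_seeds, vectors)
        n = mean_sim(word, neg_seeds, vectors)
        if p > n:
            pos.add(word)
        elif n > p:
            neg.add(word)
    return pos, neg

def char_ngrams(word, n=3):
    """Character n-grams with fastText-style '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def jaccard(a, b):
    """Jaccard overlap of the character 3-gram sets of two words."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)

def expand_lexical(pos, neg, vocab, threshold=0.45):
    """Step (ii) stand-in: a vocabulary word joins a lexicon if it shares
    enough 3-grams with one of its words and resembles only one class."""
    new_pos, new_neg = set(pos), set(neg)
    for cand in vocab:
        if cand in pos or cand in neg:
            continue
        near_pos = any(jaccard(cand, w) >= threshold for w in pos)
        near_neg = any(jaccard(cand, w) >= threshold for w in neg)
        if near_pos and not near_neg:
            new_pos.add(cand)
        elif near_neg and not near_pos:
            new_neg.add(cand)
    return new_pos, new_neg

def predict(tokens, pos, neg):
    """Rule-based polarity (assumed rules): only the clause after the last
    contrastive conjunction counts; a negation word flips the next word."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in CONTRAST:
            tokens = tokens[i + 1:]
            break
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in pos else -1 if tok in neg else 0
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

pos, neg = expand_related({"good"}, {"bad"}, VECTORS)
pos, neg = expand_lexical(pos, neg, ["superbly", "awfully", "movie"])
print(predict("great acting but boring story".split(), pos, neg))  # negative
```

With real data, the neighbour queries would go to trained models (for example `most_similar` on gensim `KeyedVectors` from Word2vec or fastText training) and the word lists would be Tamil; for Tamil, where a negation word such as இல்லை typically follows the word it negates, the flip direction would likely need to change. Scoring only the clause after the last contrastive conjunction is a common heuristic, not necessarily the rule used in the paper.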