Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts

Bibliographic Details
Published in	2020 Moratuwa Engineering Research Conference (MERCon) pp. 272 - 276
Main Authors	Thavareesan, Sajeetha; Mahesan, Sinnathamby
Format	Conference Proceeding
Language	English
Published	IEEE 01.07.2020
Subjects	Analytical models; Classification algorithms; Computational modeling; conjunction; grammar rule; lexicon; Motion pictures; Predictive models; Sentiment analysis; Tamil; Task analysis
DOI	10.1109/MERCon50084.2020.9185369

Abstract	Sentiment Analysis is the process of identifying the sentiments expressed in a text and categorising them as positive or negative. The words that carry sentiment are the key to sentiment prediction. SentiWordNet is the sentiment lexicon used to determine the sentiment of texts. A large number of sentiment terms are missing from SentiWordNet, which limits the performance of Sentiment Analysis. Gathering and grouping such sentiment words manually is a tedious task. In this paper we propose a sentiment lexicon expansion method that uses Word2vec and fastText word embeddings together with a rule-based Sentiment Analysis method. We expand the sentiment lexicon from an initial seed list of 2951 positive and 5598 negative words in two steps: (i) gathering related words using Word2vec word embedding and (ii) gathering lexically similar words using fastText word embedding. Our final lexicons, UJ_Lex_Pos and UJ_Lex_Neg, contain 10537 positive and 12664 negative words respectively, labelled using Word2vec word embedding. Furthermore, the rule-based Sentiment Analysis method uses the expanded lexicons (UJ_Lex_Pos and UJ_Lex_Neg) together with lists of conjunctions and negation words to predict the sentiments expressed in Tamil texts. The method is evaluated on UJ_MovieReviews and an accuracy of 88 0.14% is obtained.
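Illustration	The two expansion steps and the rule-based prediction described in the abstract might be sketched roughly as follows. This is not the authors' implementation. The toy English vocabulary, the hand-written 2-d vectors (standing in for trained Word2vec vectors), the character n-gram overlap (standing in for fastText subword similarity), both thresholds, the function names, and the specific negation and conjunction rules are all assumptions made for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def char_ngrams(word, n=3):
    """Character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_overlap(a, b, n=3):
    """Jaccard overlap of character n-grams (assumed stand-in for fastText)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)

def expand(seeds, vectors, sem_threshold=0.9, lex_threshold=0.3):
    """Two-step expansion: (i) related words, (ii) lexically similar words."""
    known = [s for s in seeds if s in vectors]
    # Step (i): add words whose vector is close to any seed vector.
    related = set(seeds)
    for word, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= sem_threshold for s in known):
            related.add(word)
    # Step (ii): add words that share enough character n-grams with the
    # lexicon produced by step (i). A snapshot is used to avoid chaining.
    expanded = set(related)
    for word in vectors:
        if any(ngram_overlap(word, r) >= lex_threshold for r in related):
            expanded.add(word)
    return expanded

def predict(tokens, pos_lex, neg_lex, negations, contrastive):
    """Lexicon vote with two assumed rules: a preceding negation word flips
    polarity, and only the clause after the last contrastive conjunction
    is scored. Ties are arbitrarily reported as negative."""
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in contrastive:
            tokens = tokens[i + 1:]
            break
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in pos_lex) - (tok in neg_lex)
        if polarity and i > 0 and tokens[i - 1] in negations:
            polarity = -polarity
        score += polarity
    return "positive" if score > 0 else "negative"

# Toy data (hypothetical): 2-d vectors in place of trained embeddings.
toy_vectors = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "goodness": [0.2, 0.8],
    "bad": [-1.0, 0.0],
}
expanded = expand({"good"}, toy_vectors)
label_negated = predict("not good".split(), expanded, {"bad"}, {"not"}, {"but"})
label_contrast = predict("bad but great".split(), expanded, {"bad"}, {"not"}, {"but"})
```

In this toy run, "great" enters the lexicon through vector similarity and "goodness" through shared character n-grams, while "bad" is left out. With trained models, the two similarity functions would be replaced by queries against the Word2vec and fastText embeddings.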
Author affiliations	Thavareesan, Sajeetha (Eastern University, Dept. of Mathematics, Sri Lanka); Mahesan, Sinnathamby (University of Jaffna, Dept. of Computer Science, Sri Lanka)
EISBN	9781728199757 1728199751
Genre	orig-research
IsPeerReviewed	false
IsScholarly	false
PageCount	5
URI	https://ieeexplore.ieee.org/document/9185369