Low-resource YouTube comment encoding for Luganda sentiment classification performance

Bibliographic Details
Published in	Journal of Digital Contents Society Vol. 21; no. 5; pp. 951-958
Main Authors	Ssentumbwe, Abdul Male; Jung, YuChul; Lee, Hyunah; Kim, Byeong Man
Format	Journal Article
Language	English
Published	Korea Digital Contents Society (한국디지털콘텐츠학회), 31 May 2020
Subjects	Computer Science; Luganda; Low-resource language; Opinion Mining; YouTube Comments; Sentiment Analysis
ISSN	1598-2009, 2287-738X
DOI	10.9728/dcs.2020.21.5.951

Summary:	The recent boom in social network usage has generated multilingual opinion data for low-resource languages. Luganda is one of the major languages of Uganda, yet it is a low-resource language, and Luganda corpora for sentiment analysis, especially YouTube corpora, are not readily available. In this paper, we propose assumptions to guide the collection of Luganda comments from Luganda YouTube videos for sentiment analysis. We evaluate the suitability of our cleaned dataset of 158 YouTube comments for sentiment analysis using selected machine learning and deep learning classification algorithms. Given the low-resource setting, the dataset performs best with Gaussian Naive Bayes among the machine learning algorithms (55%) and with a Multilayer Perceptron sequential model among the deep learning algorithms (68.8%), when 10% of the dataset is held out as the test set and Luganda comment segmentation is applied.
KCI Citation Count:	0
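The machine learning evaluation in the summary (a Gaussian Naive Bayes classifier scored on a 10% held-out test split) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: bag-of-words count features and plain whitespace tokenization are assumptions, since the record does not state which features or segmentation method were used, and the comments in `DATA` are hypothetical placeholder tokens, not Luganda data from the paper. The `var_smoothing` floor is modeled loosely on scikit-learn's `GaussianNB`; the Multilayer Perceptron model is not shown.

```python
import math
import random
from collections import defaultdict

# Hypothetical placeholder comments: tokens p1..p3 stand in for positive cues,
# n1..n3 for negative cues, and x for a neutral shared word. Not data from the paper.
DATA = [
    ("p1 p2 x", "pos"), ("p2 p3 x", "pos"), ("p1 p3", "pos"),
    ("p1 p2 p3", "pos"), ("p2 x", "pos"),
    ("n1 n2 x", "neg"), ("n2 n3 x", "neg"), ("n1 n3", "neg"),
    ("n1 n2 n3", "neg"), ("n2 x", "neg"),
]


def tokenize(text):
    # Lowercased whitespace split; the paper's Luganda segmentation is not reproduced.
    return text.lower().split()


def build_vocab(texts):
    # Map each distinct token to a column index, in first-seen order.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    # Bag-of-words counts; tokens missing from the vocabulary are ignored.
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def mean_var(rows):
    # Per-column mean and population variance.
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return means, variances


class GaussianNB:
    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing
        self.params = {}

    def fit(self, X, y):
        # Variance floor scaled by the largest feature variance, similar to scikit-learn.
        _, overall_var = mean_var(X)
        eps = self.var_smoothing * (max(overall_var) or 1.0)
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        for label, rows in groups.items():
            means, variances = mean_var(rows)
            prior = len(rows) / len(X)
            self.params[label] = (prior, means, [v + eps for v in variances])
        return self

    def _log_posterior(self, x, label):
        # Log prior plus per-feature Gaussian log densities (unnormalized posterior).
        prior, means, variances = self.params[label]
        score = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            score -= 0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
        return score

    def predict(self, X):
        # Pick the class with the highest unnormalized log posterior for each row.
        return [max(self.params, key=lambda c: self._log_posterior(x, c)) for x in X]


def train_test_split(items, test_size=0.1, seed=0):
    # Seeded shuffle, then hold out test_size of the items (at least one) for testing.
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = max(1, round(len(items) * test_size))
    return items[n_test:], items[:n_test]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    train, test = train_test_split(DATA, test_size=0.1)
    vocab = build_vocab(text for text, _ in train)
    model = GaussianNB().fit([vectorize(t, vocab) for t, _ in train],
                             [label for _, label in train])
    preds = model.predict([vectorize(t, vocab) for t, _ in test])
    print("held-out accuracy:", accuracy([label for _, label in test], preds))
```

With a 158-comment dataset, a 10% split holds out 16 comments, so a single misclassification moves accuracy by about 6 percentage points; that granularity is worth keeping in mind when reading the reported scores.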