Low-resource YouTube comment encoding for Luganda sentiment classification performance
Published in | Journal of Digital Contents Society Vol. 21; no. 5; pp. 951 - 958 |
---|---|
Format | Journal Article |
Language | English |
Published | 한국디지털콘텐츠학회 (Korea Digital Contents Society) 31.05.2020 |
ISSN | 1598-2009 2287-738X |
DOI | 10.9728/dcs.2020.21.5.951 |
Summary: | The recent boom in social network usage has generated some multilingual opinion data for low-resource languages. Luganda is one of the major languages of Uganda, yet it remains a low-resource language, and Luganda corpora for sentiment analysis, especially from YouTube, are not readily available. In this paper, we propose assumptions to guide the collection of Luganda comments from Luganda YouTube videos for sentiment analysis. We evaluate the suitability of our cleaned dataset of 158 YouTube comments for sentiment analysis using selected machine learning and deep learning classification algorithms. Given the low-resource setting, the dataset performs best with Gaussian Naive Bayes among the machine learning models (55%) and with a sequential Multilayer Perceptron among the deep learning models (68.8%), when 10% of the dataset is held out as the test set and Luganda comment segmentation is applied. KCI Citation Count: 0 |
---|---|
Bibliography: | http://dx.doi.org/10.9728/dcs.2020.21.5.951 |