Overall Rating Prediction from Review Texts using Category-oriented Japanese Sentiment Polarity Dictionary

Bibliographic Details
Published in	International Journal of Networking and Computing Vol. 14; no. 1; pp. 93 - 106
Main Authors	Morimoto, Yasuhiko; Kamei, Sayaka; Kusunoki, Zaku
Format	Journal Article
Language	English
Published	IJNC Editorial Committee 2024
Subjects	BERT; Natural language processing; Rating prediction; Sentiment analysis
ISSN	2185-2839 2185-2847
DOI	10.15803/ijnc.14.1_93

Abstract	Hotel booking sites provide evaluations, including textual reviews and numerical ratings by hotel guests. However, some evaluations do not include numerical ratings, and in some evaluations the textual review and the numerical rating are inconsistent (i.e., a positive review text is posted along with a low rating, or vice versa). Such evaluations may need to be clarified for site users. To resolve such problems, we propose three highly accurate methods to predict an overall numerical rating from a textual review. Our new proposal is to use Category-oriented Sentiment Polarity Dictionaries (CSPD), which are automatically compiled for each category using a Rakuten Travel review database. The CSPD gives the sentiment polarity value (i.e., the positivity/negativity value) of each sentiment word for each category. Our proposed methods first predict category ratings from the BERT vector for the review and the CSPD. After that, based on the predicted category ratings and the BERT vector, our methods predict the overall rating. We conducted evaluation experiments using the Rakuten Travel review dataset for 2014-2019. Our experimental results show that our methods achieve higher accuracy than using only BERT vectors and successfully detect inconsistent evaluations.
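As a rough illustration of the ideas in the abstract, the following minimal sketch shows how a category-oriented polarity dictionary can assign the same word different polarities in different categories, and how a predicted rating can be compared with a posted rating to flag an inconsistent evaluation. Everything in it is an assumption made for illustration: the dictionary entries, the category names, the 1 to 5 rating scale, the averaging rule, the linear mapping to a rating, and the threshold are hypothetical. The lookup-and-average step is a simple stand-in and does not reproduce the BERT-based predictors described in the abstract.

```python
# Toy illustration only: dictionary entries, categories, the 1-5 scale, and the
# scoring rules are hypothetical and are not taken from the paper.
from statistics import mean

# Hypothetical category-oriented polarity dictionary:
# category -> sentiment word -> polarity in [-1, 1].
# Note that "small" carries a different polarity in each category.
CSPD = {
    "room": {"clean": 0.8, "small": -0.4, "noisy": -0.7},
    "meal": {"delicious": 0.9, "small": -0.6, "cold": -0.5},
}


def category_polarity(tokens, category):
    """Average polarity of the tokens listed for this category (0.0 if none match)."""
    values = [CSPD[category][t] for t in tokens if t in CSPD[category]]
    return mean(values) if values else 0.0


def polarity_features(tokens):
    """One polarity feature per category, in sorted category order."""
    return [category_polarity(tokens, c) for c in sorted(CSPD)]


def to_rating(polarity, low=1, high=5):
    """Map a polarity in [-1, 1] linearly onto the rating scale, then round and clamp."""
    scaled = low + (polarity + 1) / 2 * (high - low)
    return max(low, min(high, round(scaled)))


def is_inconsistent(predicted, posted, threshold=2):
    """Flag an evaluation whose posted rating is far from the predicted one."""
    return abs(predicted - posted) >= threshold


if __name__ == "__main__":
    tokens = ["clean", "small"]  # already-tokenized review text
    features = polarity_features(tokens)
    predicted = [to_rating(p) for p in features]
    print(dict(zip(sorted(CSPD), predicted)))
    print(is_inconsistent(predicted=5, posted=1))
```

In a fuller pipeline, features like these would presumably be combined with a sentence embedding and passed to trained classifiers rather than mapped to ratings by a fixed formula.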
Author	Kusunoki, Zaku; Kamei, Sayaka; Morimoto, Yasuhiko
Author_xml	– sequence: 1 fullname: Morimoto, Yasuhiko organization: Hiroshima University – sequence: 1 fullname: Kamei, Sayaka organization: Hiroshima University – sequence: 1 fullname: Kusunoki, Zaku organization: Hiroshima University
Cites_doi	10.1111/j.2517-6161.1958.tb00292.x 10.1007/BF01589116 10.3115/1219840.1219857 10.1109/CANDARW51189.2020.00090 10.1109/CANDAR.2016.0129 10.1037/h0031619 10.1109/CANDAR57322.2022.00024 10.1145/3357384.3358086 10.1137/0916069 10.1002/asi.21416
ContentType	Journal Article
Copyright	2024 International Journal of Networking and Computing
Copyright_xml	– notice: 2024 International Journal of Networking and Computing
DBID	AAYXX CITATION ADTOC UNPAY
DOI	10.15803/ijnc.14.1_93
DatabaseName	CrossRef Unpaywall for CDI: Periodical Content Unpaywall
DatabaseTitle	CrossRef
DatabaseTitleList
Database_xml	– sequence: 1 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISSN	2185-2847
EndPage	106
ExternalDocumentID	10.15803/ijnc.14.1_93 10_15803_ijnc_14_1_93 article_ijnc_14_1_14_93_article_char_en
GroupedDBID	7.U ALMA_UNASSIGNED_HOLDINGS JSF JSH KQ8 KWQ OK1 RJT RZJ AAYXX CITATION ISHAI ADTOC UNPAY
ID	FETCH-LOGICAL-c2303-9d8bb8a9cc4d91b2dee7631ce2714ddd04dabdd774c494774495bc6012e4cb763
IEDL.DBID	UNPAY
ISSN	2185-2839 2185-2847
IngestDate	Tue Aug 19 16:27:45 EDT 2025 Wed Oct 01 01:51:15 EDT 2025 Wed Sep 03 06:30:31 EDT 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	1
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c2303-9d8bb8a9cc4d91b2dee7631ce2714ddd04dabdd774c494774495bc6012e4cb763
OpenAccessLink	https://proxy.k.utb.cz/login?url=https://www.jstage.jst.go.jp/article/ijnc/14/1/14_93/_pdf
PageCount	14
ParticipantIDs	unpaywall_primary_10_15803_ijnc_14_1_93 crossref_primary_10_15803_ijnc_14_1_93 jstage_primary_article_ijnc_14_1_14_93_article_char_en
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2024 2024-00-00
PublicationDateYYYYMMDD	2024-01-01
PublicationDate_xml	– year: 2024 text: 2024
PublicationDecade	2020
PublicationTitle	International Journal of Networking and Computing
PublicationTitleAlternate	IJNC
PublicationYear	2024
Publisher	IJNC Editorial Committee
Publisher_xml	– name: IJNC Editorial Committee
SourceID	unpaywall crossref jstage
SourceType	Open Access Repository Index Database Publisher
StartPage	93
SubjectTerms	BERT; Natural language processing; Rating prediction; Sentiment analysis
Title	Overall Rating Prediction from Review Texts using Category-oriented Japanese Sentiment Polarity Dictionary
URI	https://www.jstage.jst.go.jp/article/ijnc/14/1/14_93/_article/-char/en https://www.jstage.jst.go.jp/article/ijnc/14/1/14_93/_pdf
UnpaywallVersion	publishedVersion
Volume	14
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
ispartofPNX	International Journal of Networking and Computing, 2024, Vol.14(1), pp.93-106
journalDatabaseRights	– providerCode: PRVAFT databaseName: Open Access Digital Library customDbUrl: eissn: 2185-2847 dateEnd: 99991231 omitProxy: true ssIdentifier: ssj0000779023 issn: 2185-2847 databaseCode: KQ8 dateStart: 20110101 isFulltext: true titleUrlDefault: http://grweb.coalliance.org/oadl/oadl.html providerName: Colorado Alliance of Research Libraries