Overall Rating Prediction from Review Texts using Category-oriented Japanese Sentiment Polarity Dictionary

Bibliographic Details
Published in International Journal of Networking and Computing Vol. 14; no. 1; pp. 93-106
Main Authors Morimoto, Yasuhiko; Kamei, Sayaka; Kusunoki, Zaku
Format Journal Article
Language English
Published IJNC Editorial Committee 2024
ISSN 2185-2839
EISSN 2185-2847
DOI 10.15803/ijnc.14.1_93

Abstract Hotel booking sites provide evaluations, including textual reviews and numerical ratings by hotel guests. However, some evaluations do not include numerical ratings, and in some evaluations the textual review and the numerical rating are inconsistent (i.e., a positive review text is posted along with a low rating, or vice versa). Such evaluations may need to be clarified for site users. To resolve such problems, we propose three highly accurate methods to predict an overall numerical rating from a textual review. Our new proposal is to use Category-oriented Sentiment Polarity Dictionaries (CSPD), which are automatically compiled for each category using a Rakuten Travel review database. The CSPD gives the sentiment polarity value (i.e., the positivity/negativity value) of each sentiment word for each category. Our proposed methods first predict category ratings from the BERT vector for the review and the CSPD. After that, based on the predicted category ratings and the BERT vector, our methods predict the overall rating. We conducted evaluation experiments using the Rakuten Travel review dataset for 2014-2019. Our experimental results show that our methods achieve higher accuracy than using only BERT vectors and successfully detect inconsistent evaluations.
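The two-stage idea in the abstract (category ratings first, then an overall rating, then a consistency check against the posted rating) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionary TOY_CSPD, the linear polarity-to-rating mapping, the plain average used for the overall rating, and the threshold in is_inconsistent are hypothetical and are not taken from the paper. The sketch also leaves out the BERT vector entirely, which the authors' methods combine with the CSPD at both stages.

```python
from statistics import mean

# Hypothetical toy CSPD: category -> {sentiment word -> polarity in [-1, 1]}.
# The dictionaries described in the abstract are compiled automatically from
# Rakuten Travel reviews; these entries are invented for illustration only.
TOY_CSPD = {
    "room": {"clean": 0.8, "spacious": 0.6, "noisy": -0.7},
    "meal": {"delicious": 0.9, "cold": -0.5},
}


def category_polarity(tokens, cspd):
    """Average the polarity of dictionary words found in `tokens`, per category."""
    scores = {}
    for category, lexicon in cspd.items():
        hits = [lexicon[t] for t in tokens if t in lexicon]
        # A category with no matching sentiment word is treated as neutral.
        scores[category] = mean(hits) if hits else 0.0
    return scores


def polarity_to_rating(polarity, low=1.0, high=5.0):
    """Map a polarity in [-1, 1] linearly onto [low, high].

    Assumed stand-in for a learned stage-1 model; not the paper's predictor.
    """
    return low + (polarity + 1.0) / 2.0 * (high - low)


def predict_overall(tokens, cspd):
    """Stage 1: per-category ratings. Stage 2: overall rating (plain average, assumed)."""
    polarities = category_polarity(tokens, cspd)
    category_ratings = {c: polarity_to_rating(p) for c, p in polarities.items()}
    overall = mean(list(category_ratings.values()))
    return category_ratings, overall


def is_inconsistent(posted, predicted, threshold=2.0):
    """Flag a review whose posted rating is far from the predicted one (threshold assumed)."""
    return abs(posted - predicted) >= threshold


if __name__ == "__main__":
    review_tokens = ["clean", "spacious", "delicious"]
    ratings, overall = predict_overall(review_tokens, TOY_CSPD)
    print(ratings, overall, is_inconsistent(1, overall))
```

In practice the tokens would come from a Japanese morphological analyzer rather than a hand-written list, and both stages would be trained models; the sketch only shows how per-category polarity values can feed a category rating, an overall rating, and an inconsistency flag.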
Author Affiliations Morimoto, Yasuhiko (Hiroshima University)
Kamei, Sayaka (Hiroshima University)
Kusunoki, Zaku (Hiroshima University)
Copyright 2024 International Journal of Networking and Computing
Discipline Computer Science
Open Access Yes
Peer Reviewed Yes
Subject Terms BERT; Natural language processing; Rating prediction; Sentiment analysis