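Penerapan Metode CRISP-DM dalam Klasifikasi Data Ulasan Pengunjung Destinasi Danau Toba Menggunakan Algoritma Naïve Bayes Classifier (NBC) dan Decision Tree (DT) [Application of the CRISP-DM Method to Classifying Visitor Review Data for the Lake Toba Destination Using the Naïve Bayes Classifier (NBC) and Decision Tree (DT) Algorithms]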
| Published in | JURNAL MEDIA INFORMATIKA BUDIDARMA Vol. 7; no. 3; p. 1551 |
|---|---|
| Main Author | |
| Format | Journal Article |
| Language | English |
| Published | 31.07.2023 |
| ISSN | 2614-5278, 2548-8368 |
| DOI | 10.30865/mib.v7i3.6461 |
| Summary: | This study aims to classify Lake Toba visitor review text using the Naïve Bayes Classifier (NBC) and Decision Tree (DT) algorithms, following the Cross Industry Standard Process for Data Mining (CRISP-DM). The methodology comprises six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. In the business understanding stage, the context is the tourism sector, specifically tourist perceptions of the quality of products and services at Lake Toba tourist destinations. In the data understanding stage, the review data were taken from the Tripadvisor website, which held 858 reviews with the following rating distribution: 8 abysmal, 22 poor, 81 neutral, 304 good, and 443 excellent. In the data preparation stage, data cleansing left 382 records for processing, split into 70 percent training data and 30 percent test data. In the modeling stage, the performance of the NBC and DT algorithms was evaluated with and without the SMOTE upsampling operator. The comparison shows that the best-performing model is DT with SMOTE upsampling, with an accuracy of 98.27 percent, precision of 98.83 percent, recall of 97.71 percent, F-measure of 98.26 percent, and AUC of 98.27 percent (0.982). In the evaluation stage, ranking the five most frequent terms in the Lake Toba visitor review data highlighted the importance of excellent service (quality human resources) and supporting infrastructure (tourism facilities and infrastructure). In the deployment stage, the development of attractions, accessibility, lodging, and tourism-supporting amenities needs to be balanced to generate visit intention and revisit motivation for Lake Toba. |
|---|---|
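
The figures quoted in the summary can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the names `RATING_COUNTS`, `CLEANED_RECORDS`, `class_shares`, `split_sizes`, and `f_measure` are hypothetical, and rounding the training-set size to the nearest integer is an assumption, since the record does not state how the 70/30 split was rounded.

```python
# Figures quoted from the summary; all names and helpers here are illustrative.
RATING_COUNTS = {"abysmal": 8, "poor": 22, "neutral": 81, "good": 304, "excellent": 443}
CLEANED_RECORDS = 382


def class_shares(counts):
    """Return each class's fraction of the total number of reviews."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def split_sizes(n_records, train_fraction=0.7):
    """Return (train, test) sizes; rounding the training size is an assumption."""
    n_train = round(n_records * train_fraction)
    return n_train, n_records - n_train


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


total_reviews = sum(RATING_COUNTS.values())
shares = class_shares(RATING_COUNTS)
n_train, n_test = split_sizes(CLEANED_RECORDS)
f1 = f_measure(98.83, 97.71)

print(f"total reviews: {total_reviews}")
for label, share in shares.items():
    print(f"  {label:<9} {share:6.1%}")
print(f"train/test sizes for {CLEANED_RECORDS} records: {n_train}/{n_test}")
print(f"F-measure from reported precision and recall: {f1:.2f}")
```

The raw ratings are heavily skewed (just over half are excellent, under 1 percent are abysmal), which is the kind of imbalance that SMOTE upsampling is meant to address; the class distribution of the 382 cleaned records is not given in the summary. The F-measure recomputed from the reported precision and recall agrees with the reported 98.26 percent to within rounding.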