Performance Comparison of Scikit-learn Supervised Machine Learning Models for Public Bicycle Demand Prediction

Bibliographic Details
Published in	디지털콘텐츠학회논문지 Vol. 24; no. 6; pp. 1305 - 1315
Main Authors	권혜진(Hye-Jin Kwon), 하진영(Jin-Young Ha)
Format	Journal Article
Language	Korean
Published	한국디지털콘텐츠학회 01.06.2023
Subjects	Computer science; Data analysis; Public bicycles; Demand prediction; Python; Machine learning; Scikit-learn
ISSN	1598-2009 2287-738X
DOI	10.9728/dcs.2023.24.6.1305

Summary:	This study compares the performance of machine learning models provided by scikit-learn for predicting public bicycle demand. The experiments used reliable, publicly released data: the "Seoul public bicycle usage information" dataset provided by the Seoul Metropolitan Government and "weather information" provided by the Korea Meteorological Administration. Four supervised learning models in scikit-learn were evaluated (random forest, gradient boosting, decision tree, and linear regression), with RMSE, R², RMSLE, and accuracy as the evaluation metrics. The random forest model performed best, with an RMSE of 347.37, R² of 0.74, RMSLE of 0.51, and accuracy of 67.61%. Gradient boosting and decision tree performed somewhat worse than random forest, whereas linear regression performed markedly worse, as expected. Comparing multiple models in this way can help select the best model and reduce demand prediction error. KCI Citation Count: 0
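For readers unfamiliar with the metrics named in the summary, the following is a minimal sketch of how RMSE, R², and RMSLE are conventionally computed. It uses only the Python standard library rather than scikit-learn, the rental counts are made-up illustrative values, and the function names are hypothetical; none of it is taken from the paper. The summary does not say how "accuracy" is defined for this regression task, so that metric is left out.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, computed on log(1 + value)."""
    n = len(y_true)
    squared_log_errors = (
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    )
    return math.sqrt(sum(squared_log_errors) / n)


# Hypothetical rental counts (illustrative only, not data from the study).
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 390]

print("RMSE :", rmse(y_true, y_pred))
print("R2   :", r2(y_true, y_pred))
print("RMSLE:", rmsle(y_true, y_pred))
```

RMSE is expressed in the same units as the target (rentals), whereas RMSLE penalizes relative rather than absolute error, which is one reason a large RMSE and a moderate RMSLE can appear together for count data.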
Bibliography:	http://dx.doi.org/10.9728/dcs.2023.24.6.1305