Evaluating the Performance of Explainable Machine Learning Models in Traffic Accidents Prediction in California

Bibliographic Details
Published in	2020 39th International Conference of the Chilean Computer Science Society (SCCC), pp. 1-8
Main Authors	Parra, Camilo; Ponce, Carlos; Rodrigo, Salas F.
Format	Conference Proceeding
Language	English
Published	IEEE 16.11.2020
Subjects	Decision Trees; Geolocation data; Gradient Boosted Trees; Random Forest; Traffic Accidents
DOI	10.1109/SCCC51225.2020.9281196

More Information
Summary:	Reducing and preventing road traffic accidents is a major public health problem and a priority for many nations. In this paper, we explore the performance of explainable machine learning models applied to the prediction of road traffic crashes, using a dataset containing nearly three million records of this type of event and the conditions under which they occurred. To achieve this, we use the dataset US Accidents - A Countrywide Traffic Accident Dataset. First, we clean, standardize, and reduce the data. We then transform the time and location values using a geohashing library developed by Uber. Next, we augment the dataset with events classified as 'not an accident', obtained by applying web scraping techniques to the data sources used by the original authors of the dataset. We then evaluate the performance of different implementations of Random Forest and decision trees, obtaining an F1 score above 70% for these models. Finally, we conclude that weather conditions are strongly related to car accidents.
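
For readers unfamiliar with the F1 score cited in the summary, the following is a minimal Python sketch of how it is computed for a binary accident / 'not an accident' classifier. The function name `f1_score_binary` and the label lists are hypothetical illustrations and are not taken from the paper or its dataset.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: define F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = accident, 0 = 'not an accident'.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy example, precision and recall are both 3/4, so the F1 score is also 0.75. An F1 score above 70%, as reported in the summary, means the harmonic mean of precision and recall exceeds 0.70.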