Integrating scientific knowledge into machine learning using interactive decision trees

Bibliographic Details
Published in: Computers & Geosciences, Vol. 170, p. 105248
Main Authors: Sarailidis, Georgios; Wagener, Thorsten; Pianosi, Francesca
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.01.2023
ISSN: 0098-3004, 1873-7803
DOI: 10.1016/j.cageo.2022.105248

Summary: Decision Trees (DT) are a type of machine learning method that has been widely used in the geosciences to automatically extract patterns from complex and high-dimensional data. However, as with any data-based method, the application of DT is hindered by data limitations, such as significant biases, which can lead to physically unrealistic results. We develop interactive DT (iDT) that put humans in the loop to combine experts' scientific knowledge with the power of algorithms to automatically learn patterns from large datasets. We created an open-source Python toolbox that implements the iDT framework. Users can interactively create new composite variables, change the variable and threshold used for a split, prune the tree, and group variables based on their physical meaning. We demonstrate with three case studies how iDT overcomes problems with current DT, thus achieving higher interpretability and robustness of the results.

• We propose a framework for building Decision Trees that puts humans in the loop.
• The framework compensates for dataset issues encountered in standard Decision Trees.
• Interactive Decision Trees enhance interpretability and physical consistency.
• We developed an open-source toolbox for constructing Interactive Decision Trees.
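
As a rough illustration of what "changing the threshold to split" can mean in practice, the sketch below compares a purely data-driven split threshold with one imposed by an expert, using weighted Gini impurity as the split score. It does not use or reproduce the iDT toolbox: the function names, the toy data, and the expert threshold of 1.0 are all assumptions made for this example, and Gini impurity is simply one common splitting criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting samples on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Data-driven choice: the midpoint between sorted unique values
    that gives the lowest weighted impurity."""
    uniq = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


# Hypothetical toy sample: one predictor (an aridity index) and a class label.
aridity = [0.4, 0.6, 0.8, 0.9, 1.1, 1.3, 1.6, 2.0]
label = ["wet", "wet", "wet", "wet", "wet", "dry", "dry", "dry"]

auto_t = best_threshold(aridity, label)  # threshold the algorithm would pick
expert_t = 1.0  # assumed expert choice, taken here as physically meaningful

auto_score = split_impurity(aridity, label, auto_t)
expert_score = split_impurity(aridity, label, expert_t)

print(f"data-driven threshold: {auto_t:.2f}, impurity: {auto_score:.4f}")
print(f"expert threshold:      {expert_t:.2f}, impurity: {expert_score:.4f}")
```

Reporting both scores side by side makes the trade-off explicit: the expert's threshold may fit this particular sample slightly worse, but it may be preferred when the data-driven value is suspected to be an artifact of a small or biased dataset.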