Which method to use? An assessment of data mining methods in Environmental Data Science

Bibliographic Details
Published in Environmental modelling & software : with environment data news Vol. 110; pp. 3 - 27
Main Authors Gibert, Karina; Izquierdo, Joaquín; Sànchez-Marrè, Miquel; Hamilton, Serena H.; Rodríguez-Roda, Ignasi; Holmes, Geoff
Format Journal Article Publication
Language English
Published Oxford Elsevier Ltd 01.12.2018
Elsevier Science Ltd
Elsevier
ISSN 1364-8152
1873-6726
DOI 10.1016/j.envsoft.2018.09.021

Summary: Data Mining (DM) is a fundamental component of the Data Science process. Over recent years a huge library of DM algorithms has been developed to tackle a variety of problems in fields such as medical imaging and traffic analysis. Many DM techniques are far more flexible than more classical numerical simulation or statistical modelling approaches, and could be usefully applied to data-rich environmental problems. Certain techniques, such as artificial neural networks, clustering, case-based reasoning or Bayesian networks, have been applied in environmental modelling, while other methods, such as support vector machines, have yet to be taken up on a wide scale. There is greater scope for many lesser-known techniques to be applied in environmental research, with the potential to help address some of the current open environmental challenges. However, selecting the best DM technique for a given environmental problem is not a simple decision, and there is a lack of guidelines and criteria to help data scientists and environmental scientists ensure effective knowledge extraction from data. This paper provides a broad introduction to the use of DM in Data Science processes for environmental researchers. Data Science contains three main steps (pre-processing, data mining and post-processing). This paper provides a conceptualization of Environmental Systems and a conceptualization of DM methods, which form the core step of the Data Science process. These two elements define a conceptual framework that forms the basis of a new methodology proposed for relating the characteristics of a given environmental problem to a family of Data Mining methods. The paper provides a general overview of DM techniques and guidelines for a non-expert user, who can decide with this support which technique is most suitable for the problem at hand. The decision depends on the two-dimensional relationship between the type of environmental system and the type of DM method. An illustrative two-way table containing references for each Environmental System-Data Mining method pair is presented and discussed. Some examples of how the proposed methodology is used to support DM method selection are also presented, and challenges and future trends are identified.

• A methodology to support Data Mining (DM) method choice in environmental sciences.
• DMMCM is a conceptual map providing an overview of common Data Mining methods.
• DMMT provides templates of main DM families, to simplify use by practitioners.
• Extensive real environmental applications and 5 detailed case studies are shown.
• Existing and future challenges in DM for Data Science are postulated.