Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information

Bibliographic Details
Published in	Computers & Geosciences, Vol. 63, pp. 22-33
Main Authors	Cracknell, Matthew J., Reading, Anya M.
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.02.2014
Subjects	automation, cartography, computers, Geological mapping, geophysics, Machine learning, neural networks, prediction, Remote sensing, Spatial clustering, spatial data, Spatial information, Supervised classification, support vector machines
ISSN	0098-3004, 1873-7803
DOI	10.1016/j.cageo.2013.10.008

Summary:	Machine learning algorithms (MLAs) are a powerful group of data-driven inference tools that offer an automated means of recognizing patterns in high-dimensional data. Hence, there is much scope for the application of MLAs to the rapidly increasing volumes of remotely sensed geophysical data for geological mapping problems. We carry out a rigorous comparison of five MLAs (Naive Bayes, k-Nearest Neighbors, Random Forests, Support Vector Machines, and Artificial Neural Networks) in the context of a supervised lithology classification task using widely available and spatially constrained remotely sensed geophysical data. We make a further comparison of MLAs based on their sensitivity to variations in the degree of spatial clustering of training data, and their response to the inclusion of explicit spatial information (spatial coordinates). Our work identifies Random Forests as a good first-choice algorithm for the supervised classification of lithology using remotely sensed geophysical data. Random Forests is straightforward to train, computationally efficient, highly stable with respect to variations in classification model parameter values, and as accurate as, or substantially more accurate than, the other MLAs trialed. The results of our study indicate that as training data become increasingly dispersed across the region under investigation, MLA predictive accuracy improves dramatically. Explicit spatial information yields accurate lithology predictions but should be used in conjunction with geophysical data in order to generate geologically plausible predictions. MLAs, such as Random Forests, are valuable tools for generating reliable first-pass predictions for practical geological mapping applications that combine widely available geophysical data.
• Robust comparison of five machine learning strategies.
• Random Forests is a good first choice for classifying lithology.
• Spatial distribution of training data has considerable influence on predictions.
• Using coordinates and geophysics data generates accurate and plausible predictions.
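
The summary's point about the spatial distribution of training data can be illustrated with a toy experiment. The sketch below is not the authors' workflow or data. It assumes a hypothetical unit-square region with three lithology bands, two synthetic "geophysical" feature channels with Gaussian noise, and a plain k-Nearest Neighbors classifier written with the Python standard library only. All names (`lithology`, `CLASS_MEANS`, `sample`, `knn_predict`, `accuracy`) and all parameter values are invented for illustration.

```python
import random
from collections import Counter

# Hypothetical class-dependent feature means (two synthetic geophysical channels).
CLASS_MEANS = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (0.0, 2.0)}


def lithology(x, y):
    """Assumed ground truth: three vertical bands of rock type across a unit square."""
    return min(int(x * 3), 2)


def sample(rng, n, x_max=1.0, y_max=1.0):
    """Draw n locations in [0, x_max) x [0, y_max) and attach noisy features."""
    rows = []
    for _ in range(n):
        x, y = rng.random() * x_max, rng.random() * y_max
        label = lithology(x, y)
        mean_a, mean_b = CLASS_MEANS[label]
        features = (rng.gauss(mean_a, 0.6), rng.gauss(mean_b, 0.6))
        rows.append((features, label))
    return rows


def knn_predict(train, features, k=5):
    """Majority vote among the k training samples closest in feature space."""
    def squared_distance(row):
        (a, b), _ = row
        return (a - features[0]) ** 2 + (b - features[1]) ** 2

    nearest = sorted(train, key=squared_distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test samples whose predicted class matches the true class."""
    hits = sum(knn_predict(train, features, k) == label for features, label in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable
test = sample(rng, 600)  # test locations spread over the whole region

# Clustered training data: every sample comes from one corner of the region,
# so the third lithology band is never observed during training.
clustered = sample(rng, 150, x_max=0.4, y_max=0.4)

# Dispersed training data: the same number of samples, spread across the region.
dispersed = sample(rng, 150)

acc_clustered = accuracy(clustered, test)
acc_dispersed = accuracy(dispersed, test)
print(f"clustered training accuracy: {acc_clustered:.2f}")
print(f"dispersed training accuracy: {acc_dispersed:.2f}")
```

In this toy setup the clustered training set never contains the third class, so the classifier cannot predict it anywhere in the region, while the dispersed set samples all three bands. Swapping in other classifiers or appending the coordinates to the feature tuple would be one way to extend the sketch toward the comparisons the summary describes.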