Spatial cross-validation is not the right way to evaluate map accuracy

Bibliographic Details
Published in	Ecological modelling Vol. 457; p. 109692
Main Authors	Wadoux, Alexandre M.J.-C., Heuvelink, Gerard B.M., de Bruin, Sytze, Brus, Dick J.
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.10.2021
Subjects	Above-ground biomass; autocorrelation; data collection; Design-based; Design-unbiased; Map quality; Model performance; Model-based; probability; Random forest; Sampling theory; spatial data
ISSN	0304-3800 1872-7026
DOI	10.1016/j.ecolmodel.2021.109692

Summary:	For decades scientists have produced maps of biological, ecological and environmental variables. These studies commonly evaluate the map accuracy through cross-validation with the data used for calibrating the underlying mapping model. Recent studies, however, have argued that cross-validation statistics of most mapping studies are optimistically biased. They attribute these overoptimistic results to a supposed serious methodological flaw in standard cross-validation methods, namely that these methods ignore spatial autocorrelation in the data. They argue that spatial cross-validation should be used instead, and contend that standard cross-validation methods are inherently invalid in a geospatial context because of the autocorrelation present in most spatial data. Here we argue that these studies propagate a widespread misconception of statistical validation of maps. We explain that unbiased estimates of map accuracy indices can be obtained by probability sampling and design-based inference and illustrate this with a numerical experiment on large-scale above-ground biomass mapping. In our experiment, standard cross-validation (i.e., ignoring autocorrelation) led to smaller bias than spatial cross-validation. Standard cross-validation was deficient in the case of a strongly clustered dataset that had large differences in sampling density, but less so than spatial cross-validation. We conclude that spatial cross-validation methods have no theoretical underpinning and should not be used for assessing map accuracy, while standard cross-validation is deficient in the case of clustered data. Model-free, design-unbiased and valid accuracy assessment is achieved with probability sampling and design-based inference. It is valid without the need to explicitly incorporate or adjust for spatial autocorrelation and is perfectly suited for the validation of large-scale biological, ecological and environmental maps.
• Both standard and spatial cross-validation methods may provide biased estimates of map accuracy.
• Unbiased estimates of map accuracy indices can be obtained by probability sampling and design-based inference.
• Spatial cross-validation techniques should not be used for map accuracy assessment.
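
As a rough illustration of the design-based inference mentioned in the summary, the Python sketch below estimates a map's mean squared error (MSE) from a simple random sample of validation locations. It is not the authors' experiment: the synthetic population, the function names `srs_estimate_mse` and `demo`, and all parameter values are assumptions chosen for illustration. The point it tries to show is that, under simple random sampling, the unweighted sample mean of squared errors is design-unbiased for the population MSE even when the errors are spatially structured.

```python
import math
import random


def srs_estimate_mse(observed, predicted):
    """Design-based estimate of map MSE from a simple random sample.

    Under simple random sampling every location has the same inclusion
    probability, so the unweighted mean of the squared errors is a
    design-unbiased estimator of the population MSE.
    Returns (mse_estimate, standard_error).
    """
    sq_err = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    n = len(sq_err)
    mse = sum(sq_err) / n
    # Standard error from the sample variance of the squared errors
    # (finite-population correction ignored: small sampling fraction assumed).
    var = sum((e - mse) ** 2 for e in sq_err) / (n - 1)
    return mse, math.sqrt(var / n)


def demo(seed=0, n_cells=10_000, n_sample=200, n_repeats=500):
    """Compare the average SRS estimate with the true population MSE."""
    rng = random.Random(seed)
    # Hypothetical population: the map error grows with the cell index,
    # which stands in for a spatially structured error pattern.
    truth = [rng.gauss(100.0, 20.0) for _ in range(n_cells)]
    pred = [t + rng.gauss(0.0, 1.0 + 10.0 * i / n_cells)
            for i, t in enumerate(truth)]
    true_mse = sum((t - p) ** 2 for t, p in zip(truth, pred)) / n_cells

    estimates = []
    for _ in range(n_repeats):
        # Simple random sample of cells, without replacement.
        idx = rng.sample(range(n_cells), n_sample)
        est, _ = srs_estimate_mse([truth[i] for i in idx],
                                  [pred[i] for i in idx])
        estimates.append(est)
    return true_mse, sum(estimates) / n_repeats


if __name__ == "__main__":
    true_mse, mean_estimate = demo()
    print(f"true MSE: {true_mse:.2f}, mean of SRS estimates: {mean_estimate:.2f}")
```

Averaged over repeated samples, the estimate should sit close to the true population MSE without any adjustment for spatial structure. Other probability sampling designs (for example stratified sampling) would need the corresponding inclusion-probability weights rather than the unweighted mean used here.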