Improving soil property measurement with geometric-spectral coupling-based spectral data augmentation using conditional variational autoencoders

Bibliographic Details
Published in Measurement : journal of the International Measurement Confederation, Vol. 257; p. 118875
Main Authors Tang, Jiaze; Liu, Dan; Wang, Qisong; Li, Junbao; Sun, Jinwei
Format Journal Article
Language English
Published Elsevier Ltd 15.01.2026
ISSN 0263-2241
DOI 10.1016/j.measurement.2025.118875

Summary: Soil spectral data are essential for rapid, nondestructive measurement of soil properties. Because such samples are costly to acquire, data augmentation is widely used to generate synthetic soil spectra, enriching training databases and improving model robustness. Existing spectral augmentation algorithms, however, typically rely on detailed information about the chemical element content of the soil as the basis for generating new pseudo-spectra. Without this information, generating valid synthetic soil spectra becomes challenging, which limits the practical effectiveness of existing methods. To address this issue, we propose a data-driven framework, the geometric-spectral coupling conditional variational autoencoder (GSC-cVAE), designed to generate pseudo-spectral data, enrich soil spectral datasets, and mitigate data scarcity. In a purpose-built darkroom, we repeatedly image chemically uniform soils, ensuring that any pixel-level variation arises solely from micro-scale surface roughness. Through differential processing and spectral clustering, these variations are compressed into a high-density geometric-disturbance knowledge base, which serves as the conditional prior for the cVAE. Guided by this physically explicit prior, the network generates pseudo-spectra that accurately reproduce real geometric perturbations without requiring chemical-element labels. When these synthetic samples are combined with the original data, R² increases from 0.57 to 0.64 for the support vector machine (SVM) and from 0.652 to 0.71 for the random forest (RF); RMSE decreases from 0.094 to 0.088 for SVM and from 0.086 to 0.078 for RF on our self-built dataset (paired t-test, p<0.05).
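The summary reports R², RMSE, and a paired t-test. As a reminder of what these quantities measure, here is a minimal standard-library sketch of their usual definitions. The function names are illustrative and are not taken from the paper. The record does not say how the authors paired their observations, so pairing scores by matched evaluation run is an assumption.

```python
import math
from statistics import mean, stdev

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples (assumed here: one score per matched run).

    Compare the result against a t distribution with n - 1 degrees of
    freedom to obtain a p-value.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

A model that always predicts the mean of the measured values scores R² = 0, which puts the reported improvements (for example 0.57 to 0.64 for SVM) on a scale from that baseline toward a perfect fit at 1.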
• Formalizes micro-roughness effects as a knowledge base for geometric–spectral coupling.
• Uses geometric roughness priors to guide the cVAE in generating physically valid pseudo-spectra.
• This physics-guided approach improves SVM and RF performance across datasets.
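The idea of a geometric-disturbance knowledge base can be illustrated with a rough, simplified sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes that "differential processing" means subtracting the band-wise mean of repeated measurements of one chemically uniform sample. It uses plain k-means as a simpler stand-in for spectral clustering, and it adds a stored disturbance prototype to a spectrum in place of the cVAE generator.

```python
import random

def disturbance_residuals(repeats):
    """Subtract the band-wise mean from repeated spectra of one uniform sample.

    Assumption: what remains is attributed to surface-roughness effects.
    """
    n_bands = len(repeats[0])
    mean_spec = [sum(s[b] for s in repeats) / len(repeats) for b in range(n_bands)]
    return [[s[b] - mean_spec[b] for b in range(n_bands)] for s in repeats]

def kmeans_prototypes(vectors, k, iters=20, seed=0):
    """Plain k-means (stand-in for spectral clustering); returns k centroids."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            # Assign each residual to its nearest centroid (squared distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
            groups[nearest].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def augment(spectrum, prototypes, rng):
    """Add one randomly chosen disturbance prototype to a spectrum.

    Simplification: the paper conditions a cVAE on the knowledge base
    rather than adding prototypes directly.
    """
    disturbance = rng.choice(prototypes)
    return [x + d for x, d in zip(spectrum, disturbance)]
```

In this simplified form the prototypes play the role of the knowledge base. A pseudo-spectrum keeps the chemistry of the source spectrum and changes only by a roughness-like offset, which is why no chemical-element labels are needed.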