Improving soil property measurement with geometric-spectral coupling-based spectral data augmentation using conditional variational autoencoders
| Published in | Measurement : journal of the International Measurement Confederation Vol. 257; p. 118875 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 15.01.2026 |
| ISSN | 0263-2241 |
| DOI | 10.1016/j.measurement.2025.118875 |
| Summary: | Soil spectral data are essential for rapid and nondestructive measurement of soil properties. However, the high cost of acquiring such samples has made data augmentation a widely used approach for generating synthetic soil spectra, thereby enriching training databases and improving model robustness. Existing spectral augmentation algorithms typically rely on detailed information about the chemical element content of the soil as the basis for generating new pseudo-spectra. Without this information, generating valid synthetic soil spectra becomes challenging, which limits the practical effectiveness of existing methods. To address this issue, we propose a data-driven framework, the geometric-spectral coupling conditional variational autoencoder (GSC-cVAE), designed to generate pseudo-spectral data, enrich soil spectral datasets, and mitigate data scarcity. In a purpose-built darkroom, we repeatedly image chemically uniform soils, ensuring that any pixel-level variation arises solely from micro-scale surface roughness. Through differential processing and spectral clustering, these variations are compressed into a high-density geometric-disturbance knowledge base, which serves as the conditional prior for the cVAE. Guided by this physically explicit prior, the network generates pseudo-spectra that accurately reproduce real geometric perturbations without requiring chemical-element labels. When these synthetic samples are combined with the original data, R² increases from 0.57 to 0.64 for the support vector machine (SVM) and from 0.652 to 0.71 for the random forest (RF); additionally, RMSE decreases from 0.094 to 0.088 for SVM and from 0.086 to 0.078 for RF on our self-built dataset (paired t-test, p < 0.05).
• Formalizes micro-roughness effects as a knowledge base for geometric–spectral coupling. • Uses geometric roughness priors to guide the cVAE in generating physically valid pseudo-spectra. • This physics-guided approach improves SVM and RF performance across datasets. |
|---|---|
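The summary describes isolating roughness-induced variation by differencing repeated measurements of a chemically uniform soil, then using that variation to guide the generation of pseudo-spectra. The sketch below illustrates only the differencing idea, under loud assumptions: it is not the authors' GSC-cVAE, it omits the spectral clustering step, and it replaces the conditional generative model with plain resampling of the observed residuals. The function names `disturbance_residuals` and `augment`, and the toy reflectance values, are hypothetical.

```python
import random
from statistics import fmean


def disturbance_residuals(repeats):
    """Differential processing (assumed form): subtract the per-band mean
    spectrum from each repeated measurement of the same soil."""
    n_bands = len(repeats[0])
    mean_spectrum = [fmean(r[b] for r in repeats) for b in range(n_bands)]
    residuals = [[r[b] - mean_spectrum[b] for b in range(n_bands)] for r in repeats]
    return mean_spectrum, residuals


def augment(spectrum, residuals, rng, n_new):
    """Stand-in for the generative model: add a randomly drawn residual
    to a measured spectrum to produce each pseudo-spectrum."""
    return [
        [s + d for s, d in zip(spectrum, rng.choice(residuals))]
        for _ in range(n_new)
    ]


# Toy data: three repeated measurements over three bands (made-up values).
repeats = [
    [0.10, 0.20, 0.30],
    [0.12, 0.18, 0.33],
    [0.08, 0.22, 0.27],
]
mean_spectrum, residuals = disturbance_residuals(repeats)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic = augment([0.15, 0.25, 0.35], residuals, rng, n_new=5)
```

Because the soil is assumed chemically uniform, the residuals in each band sum to zero and carry only the measurement-to-measurement variation. In the paper's framework that variation conditions a cVAE rather than being added back directly.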