Data mining techniques for LULC analysis using sparse labels and multisource data integration for the hilly terrain of Nilgiris district, Tamil Nadu, India


Bibliographic Details
Published in Earth science informatics Vol. 18; no. 1; p. 13
Main Authors Kumaraperumal, Ramalingam; Raj, Moorthi Nivas; Pazhanivelan, Sellaperumal; Jagadesh, M.; Selvi, Duraisamy; Muthumanickam, Dhanaraju; Jagadeeswaran, Ramasamy; Karthikkumar, A.; Kanna, S. Kamalesh
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.01.2025
Springer Nature B.V
ISSN 1865-0473
1865-0481
DOI 10.1007/s12145-024-01586-y

Summary: Accurate, quantitative assessment of Land Use and Land Cover (LULC) change is crucial for understanding spatial dynamics and environmental impacts within a region. In hilly terrain such as the Nilgiris district in Tamil Nadu, India, these assessments are particularly challenging because of the complex topography, especially when classification relies on sparse ground truth labels. As numerous data mining algorithms have been validated for earth observation applications, there is growing demand for guidance on selecting the best classifier for LULC mapping. Widely used pixel-based data mining classifiers, namely Random Forest (RF), Support Vector Machine (SVM), C5.0 Decision Trees (C50), Naive Bayes (NB), Multinomial Logistic Regression (MLR), AdaBoost, Bagged CART, Nearest Shrunken Centroids (NSC), Genetic Algorithm based CART (Evetree), Neural Networks with PCA (NNPCA), k-Nearest Neighbours (k-NN), Multi-Layer Perceptron (MLP), and 1-Dimensional Convolutional Neural Networks (1DCNN), were evaluated by integrating different auxiliary variables with sparse ground truth labels (n = 391). Prediction accuracy was assessed on the validation datasets using Overall Accuracy (OA), Kappa, and disagreement measures. Permutation Feature Importance (PFI) analysis identified the Digital Elevation Model (DEM) as the most influential auxiliary variable for the classification. Based on the validation measures and a visual assessment of each algorithm's output, the Support Vector Machine with a linear kernel (SVM-L) performed best (OA 88%, Kappa 0.84), followed by Random Forest (OA 85%, Kappa 0.82). These two algorithms also yielded the lowest disagreement measures.
The findings demonstrate the effective performance of the SVM and RF algorithms for classifying LULC at 10 m resolution through multisource data integration under limited sampling and parameterization conditions. The derived statistics indicate a 4.3% decrease in forest area and a 7.2% increase in agricultural area over the last 2 years, and a 6.6% increase in tea plantation area over the last 5 years.
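The validation measures named in the summary (OA, Kappa, and disagreement) can all be derived from a single confusion matrix. The sketch below is illustrative only and is not the authors' code: the record does not say which disagreement measures were used, so it assumes the common split into quantity and allocation disagreement, computed directly from sample counts. The function name accuracy_measures and the 3-class matrix are hypothetical.

```python
import numpy as np


def accuracy_measures(confusion):
    """Return OA, Kappa, and disagreement components from a confusion matrix.

    confusion[i, j] = number of validation samples whose reference class is i
    and whose predicted class is j (assumed layout, not taken from the paper).
    """
    cm = np.asarray(confusion, dtype=float)
    p = cm / cm.sum()              # cell proportions
    reference = p.sum(axis=1)      # share of each class in the reference data
    predicted = p.sum(axis=0)      # share of each class in the predicted map

    overall_accuracy = np.trace(p)
    chance_agreement = np.sum(reference * predicted)
    kappa = (overall_accuracy - chance_agreement) / (1.0 - chance_agreement)

    # Quantity disagreement: mismatch in class proportions.
    quantity = 0.5 * np.abs(reference - predicted).sum()
    # Allocation disagreement: the remaining error, due to spatial misplacement.
    allocation = (1.0 - overall_accuracy) - quantity

    return {
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "quantity_disagreement": quantity,
        "allocation_disagreement": allocation,
    }


# Hypothetical 3-class confusion matrix (made-up counts, not study data).
example = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 39],
]
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Quantity and allocation disagreement sum to 1 - OA by construction, so a classifier with the highest OA also has the lowest total disagreement; the split only shows whether the remaining error comes from wrong class proportions or from misplaced pixels.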