A Novel Approach for High-Resolution Coastal Areas and Land Use Recognition From Remote Sensing Images Based on Multimodal Network-Level Fusion of SRAN3 and Lightweight Four Encoders ViT

Bibliographic Details
Published in	IEEE journal of selected topics in applied earth observations and remote sensing Vol. 18; pp. 6844 - 6858
Main Authors	Bhatti, Muhammad Kashif; Khan, Muhammad Attique; Shaheen, Saima; Hamza, Ameer; Arishi, Ali; AlHammadi, Dina Abdulaziz; Algamdi, Shabbab Ali; Nam, Yunyoung
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
Subjects	Accuracy; Agricultural drought; Bayesian analysis; Biological system modeling; Classification; Climate change; Coastal zone; Coders; Convolutional neural networks; Customize vision transformer; Datasets; Deep learning; Disaster recovery; Drought; Enforcement; Environmental monitoring; Feature extraction; High resolution; Image resolution; Land cover; Land surface; Land use; Land use planning; Machine learning; network level fusion; Neural networks; Predictive models; Probability theory; Remote sensing; remote sensing (RS); residual self-attention CNN; Satellite imagery; Satellite images; SRAN3; super resolution; Superresolution; Surveillance; Urban areas; Urbanization; Vision transformers
ISSN	1939-1404 2151-1535
DOI	10.1109/JSTARS.2025.3542194

More Information
Summary:	Land use and land cover classification from satellite (remote sensing) images has attracted considerable research effort over the last decade, driven by ecological surveillance, rapid urbanization, law enforcement, climate change, agricultural drought, and disaster recovery. Low-resolution remote sensing images degrade prediction accuracy; therefore, a high-resolution deep learning architecture is required. This article proposes a new deep network-level fusion approach that merges a stacked residual self-attention CNN (SRAN3) with a lightweight 4-encoder ViT to improve model performance while reducing computational cost. The SRAN3 model extracts prominent features, while the 4-encoder ViT enables effective learning with reduced computation time. The two networks are fused by depth concatenation, which integrates the strengths of both architectures. The hyperparameters of the fused model are selected through Bayesian optimization, which significantly improves the learning process. In the testing phase, features are extracted from the depth-concatenation layer of the trained model and fed to neural network classifiers to obtain the final prediction. Two publicly available datasets, EuroSAT and NWPU_RESIS45, are used to evaluate testing and validation accuracy. The proposed SRAN3 + WNN (wide neural network) and 4-encoder ViT + WNN obtained accuracies of 96.9% and 92.6%, respectively, whereas the proposed fused network + WNN achieved the highest accuracy: 98.4% on EuroSAT and 94.7% on NWPU_RESIS45. The fused model is also interpreted using an explainable artificial intelligence (XAI) technique, which has shown improved land use and land cover classification.
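Note:	The summary says the two networks are fused by depth concatenation. As a rough illustration only, and not the authors' implementation, the sketch below stacks two feature maps along the channel (depth) axis in plain Python. The function name `depth_concatenate`, the variable names, and the tiny 2x2 feature maps are all hypothetical; the record does not state the real layer sizes, so the sketch simply assumes both branches produce maps of the same spatial size.

```python
from typing import List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def depth_concatenate(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps along the channel (depth) axis.

    Assumption: both inputs share the same spatial size (rows x cols);
    only the number of channels may differ.
    """
    if not a or not b:
        raise ValueError("both feature maps need at least one channel")
    shape_a = (len(a[0]), len(a[0][0]))
    shape_b = (len(b[0]), len(b[0][0]))
    if shape_a != shape_b:
        raise ValueError(f"spatial sizes differ: {shape_a} vs {shape_b}")
    # Copy every row so the fused map does not alias the inputs.
    return [[row[:] for row in channel] for channel in a + b]


# Hypothetical outputs of the two branches (values are made up).
cnn_features = [[[1.0, 2.0], [3.0, 4.0]],
                [[5.0, 6.0], [7.0, 8.0]]]    # 2 channels of size 2x2
vit_features = [[[9.0, 10.0], [11.0, 12.0]]]  # 1 channel of size 2x2

fused = depth_concatenate(cnn_features, vit_features)
print(len(fused))  # 3 channels: 2 from the first branch + 1 from the second
```

In a deep learning framework the same operation is a concatenation along the channel axis of a tensor; a downstream classifier then consumes the fused features.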