A Novel Approach for High-Resolution Coastal Areas and Land Use Recognition From Remote Sensing Images Based on Multimodal Network-Level Fusion of SRAN3 and Lightweight Four Encoders ViT

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 18, pp. 6844-6858
Main Authors: Bhatti, Muhammad Kashif; Khan, Muhammad Attique; Shaheen, Saima; Hamza, Ameer; Arishi, Ali; AlHammadi, Dina Abdulaziz; Algamdi, Shabbab Ali; Nam, Yunyoung
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 2025
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1939-1404, 2151-1535
DOI: 10.1109/JSTARS.2025.3542194

Summary: Land use and land cover (LULC) classification from satellite (remote sensing) images has attracted considerable research effort over the last decade, driven by applications in ecological surveillance, rapid urbanization, law enforcement, climate change, agricultural drought, and disaster recovery. Low-resolution remote sensing images reduce prediction accuracy; therefore, deep learning architectures suited to high-resolution imagery are needed. This article proposes a new network-level fusion approach that merges a stacked residual self-attention CNN (SRAN3) with a lightweight four-encoder ViT to improve model performance while reducing computational cost. The SRAN3 model extracts rich, discriminative features, while the four-encoder ViT enables effective learning with reduced computation time. The two networks are fused through depth concatenation, which integrates the strengths of both architectures. The hyperparameters of the fused model are selected through Bayesian optimization, which significantly improves the learning process. In the testing phase, features are extracted from the depth-concatenation layer of the trained model and fed to neural network classifiers, which produce the final prediction. Two publicly available datasets, EuroSAT and NWPU_RESIS45, are used for evaluation. The proposed SRAN3 + WNN (wide neural network) and four-encoder ViT + WNN obtained accuracies of 96.9% and 92.6%, respectively, whereas the proposed fused network + WNN achieved the highest accuracies: 98.4% on EuroSAT and 94.7% on NWPU_RESIS45. Finally, the fused model is interpreted using an explainable artificial intelligence (XAI) technique, and the resulting explanations support its improved land use and land cover classification.
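The fusion-then-classify pipeline described in the summary (depth concatenation of two branches, feature extraction from the concatenation layer, then a wide neural network classifier) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes both branches emit spatial feature maps with the same height and width (how the article aligns ViT tokens with SRAN3 feature maps is not stated in this record), it swaps in global average pooling as the feature-extraction step, and the helper names `depth_concat`, `global_avg_pool`, and `wide_nn_predict`, the random untrained weights, the hidden width of 100, and the channel counts are all hypothetical choices for illustration.

```python
import math
import random


def depth_concat(fmap_a, fmap_b):
    """Concatenate two H x W x C feature maps along the channel (depth) axis.

    Each map is a nested list: rows -> columns -> list of channel values.
    """
    if len(fmap_a) != len(fmap_b) or len(fmap_a[0]) != len(fmap_b[0]):
        raise ValueError("spatial sizes must match for depth concatenation")
    # Joining the per-pixel channel lists stacks the channels of both branches.
    return [[pa + pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(fmap_a, fmap_b)]


def global_avg_pool(fmap):
    """Average each channel over all spatial positions -> 1D feature vector."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]


def wide_nn_predict(x, w1, b1, w2, b2):
    """Hypothetical wide classifier: one wide ReLU hidden layer, then softmax."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(wi * hi for wi, hi in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random (untrained) weights, for illustration only."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Illustrative sizes: 10 classes matches EuroSAT's class count; the rest are assumptions.
rng = random.Random(0)
H, W, C_CNN, C_VIT, HIDDEN, CLASSES = 4, 4, 8, 6, 100, 10
cnn_map = [[[rng.random() for _ in range(C_CNN)] for _ in range(W)] for _ in range(H)]
vit_map = [[[rng.random() for _ in range(C_VIT)] for _ in range(W)] for _ in range(H)]

fused = depth_concat(cnn_map, vit_map)   # H x W x (C_CNN + C_VIT)
features = global_avg_pool(fused)        # vector of length C_CNN + C_VIT
probs = wide_nn_predict(
    features,
    random_matrix(HIDDEN, len(features), rng), [0.0] * HIDDEN,
    random_matrix(CLASSES, HIDDEN, rng), [0.0] * CLASSES,
)
print(len(features), len(probs))  # 14 10
```

The depth dimension of the fused map is simply the sum of the two branches' channel counts, which is why the pooled feature vector has 8 + 6 = 14 entries here. In a trained system the classifier weights would be learned (and, per the summary, the fused network's hyperparameters tuned with Bayesian optimization) rather than drawn at random.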