DTT-CGINet: A Dual Temporal Transformer Network with Multi-Scale Contour-Guided Graph Interaction for Change Detection

Bibliographic Details
Published in: Remote Sensing (Basel, Switzerland), Vol. 16, No. 5, p. 844
Main Authors: Chen, Ming; Jiang, Wanshou; Zhou, Yuan
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.02.2024
ISSN: 2072-4292
DOI: 10.3390/rs16050844

Summary: Deep learning has dramatically enhanced remote sensing change detection. However, existing neural network models often suffer from false positives and missed detections caused by factors such as lighting changes, scale differences, and noise interference. In addition, change detection results often fail to capture target contours accurately. To address these issues, we propose a novel transformer-based hybrid network. We analyze the structural relationship in bi-temporal images and introduce a cross-attention-based transformer to model it. First, a tokenizer compresses the high-level features of each bi-temporal image into a small set of semantic tokens. A dual temporal transformer (DTT) encoder then captures dense spatiotemporal contextual relationships among the tokens, and the coarse-scale features it extracts are refined into finer detail by the DTT decoder. Concurrently, the backbone's low-level features are fed into a contour-guided graph interaction module (CGIM) that uses joint attention to capture semantic relationships between object regions and their contours. A feature pyramid decoder then integrates the multi-scale outputs of the CGIM, and convolutional block attention modules (CBAMs) apply channel and spatial attention to reweight the feature maps. Finally, the classifier discriminates changed pixels from the difference feature map and generates the final change map. Experiments demonstrate that our model offers significant advantages over other methods in efficiency, accuracy, and visual quality.
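
To make the pipeline described in the abstract more concrete, the two sketches below illustrate its core attention mechanisms in PyTorch. They are minimal illustrations assembled from the abstract's description, not the authors' released implementation; every class and parameter name (DualTemporalCrossAttention, num_heads, reduction, and so on) is an assumption made here for illustration only.

```python
import torch
import torch.nn as nn

class DualTemporalCrossAttention(nn.Module):
    """Hypothetical sketch of the cross-attention idea in the DTT encoder:
    the semantic tokens of each temporal image attend to those of the other,
    so both token sets are enriched with the other date's context."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn_1to2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_2to1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens_t1, tokens_t2):
        # tokens_t*: (batch, num_tokens, dim) semantic tokens per image.
        # Queries from time 1 attend to keys/values from time 2, and vice versa.
        upd1, _ = self.attn_1to2(tokens_t1, tokens_t2, tokens_t2)
        upd2, _ = self.attn_2to1(tokens_t2, tokens_t1, tokens_t1)
        return self.norm1(tokens_t1 + upd1), self.norm2(tokens_t2 + upd2)

# Usage: 4 semantic tokens of width 64 for each temporal image.
t1, t2 = torch.randn(2, 4, 64), torch.randn(2, 4, 64)
out1, out2 = DualTemporalCrossAttention(dim=64)(t1, t2)
print(out1.shape, out2.shape)  # torch.Size([2, 4, 64]) twice
```

The second sketch shows the standard CBAM pattern the abstract refers to: channel attention followed by spatial attention, each producing a sigmoid gate that reweights the feature map.

```python
class CBAM(nn.Module):
    """Sketch of a convolutional block attention module: channel attention
    from pooled descriptors, then spatial attention from channel statistics."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel gate from global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial gate from channel-wise mean and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(2, 32, 64, 64)
print(CBAM(32)(feat).shape)  # torch.Size([2, 32, 64, 64])
```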