DTT-CGINet: A Dual Temporal Transformer Network with Multi-Scale Contour-Guided Graph Interaction for Change Detection

Bibliographic Details
Published in: Remote Sensing (Basel, Switzerland), Vol. 16, No. 5, p. 844
Main Authors: Chen, Ming; Jiang, Wanshou; Zhou, Yuan
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.02.2024
ISSN: 2072-4292
DOI: 10.3390/rs16050844

Summary: Deep learning has dramatically enhanced remote sensing change detection. However, existing neural network models often suffer from false positives and missed detections caused by factors such as lighting changes, scale differences, and noise interference. In addition, change detection results often fail to capture target contours accurately. To address these issues, we propose a novel transformer-based hybrid network. We analyze the structural relationship in bi-temporal images and introduce a cross-attention-based transformer to model it. First, a tokenizer compresses the high-level features of each bi-temporal image into a small set of semantic tokens. A dual temporal transformer (DTT) encoder then captures dense spatiotemporal contextual relationships among the tokens, and the coarse-scale features it extracts are refined into finer detail by the DTT decoder. Concurrently, the backbone's low-level features are fed into a contour-guided graph interaction module (CGIM) that uses joint attention to capture semantic relationships between object regions and their contours. A feature pyramid decoder then integrates the multi-scale outputs of the CGIM, and convolutional block attention modules (CBAMs) apply channel and spatial attention to reweight the feature maps. Finally, the classifier discriminates changed pixels from the difference feature map and generates the final change map. Experiments demonstrate that our model offers significant advantages over other methods in efficiency, accuracy, and visual quality.
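
To make the pipeline described in the abstract more concrete, the two sketches below illustrate its core attention mechanisms in PyTorch. They are minimal illustrations assembled from the abstract's description, not the authors' released implementation; every class and parameter name (DualTemporalCrossAttention, num_heads, reduction, and so on) is an assumption made here for illustration only.

```python
import torch
import torch.nn as nn

class DualTemporalCrossAttention(nn.Module):
    """Hypothetical sketch of the cross-attention idea in the DTT encoder:
    the semantic tokens of each temporal image attend to those of the other,
    so both token sets are enriched with the other date's context."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn_1to2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_2to1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens_t1, tokens_t2):
        # tokens_t*: (batch, num_tokens, dim) semantic tokens per image.
        # Queries from time 1 attend to keys/values from time 2, and vice versa.
        upd1, _ = self.attn_1to2(tokens_t1, tokens_t2, tokens_t2)
        upd2, _ = self.attn_2to1(tokens_t2, tokens_t1, tokens_t1)
        return self.norm1(tokens_t1 + upd1), self.norm2(tokens_t2 + upd2)

# Usage: 4 semantic tokens of width 64 for each temporal image.
t1, t2 = torch.randn(2, 4, 64), torch.randn(2, 4, 64)
out1, out2 = DualTemporalCrossAttention(dim=64)(t1, t2)
print(out1.shape, out2.shape)  # torch.Size([2, 4, 64]) twice
```

The second sketch shows the standard CBAM pattern the abstract refers to: channel attention followed by spatial attention, each producing a sigmoid gate that reweights the feature map.

```python
class CBAM(nn.Module):
    """Sketch of a convolutional block attention module: channel attention
    from pooled descriptors, then spatial attention from channel statistics."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel gate from global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial gate from channel-wise mean and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(2, 32, 64, 64)
print(CBAM(32)(feat).shape)  # torch.Size([2, 32, 64, 64])
```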