A Spectral-Spatial Fusion Transformer Network for Hyperspectral Image Classification

Bibliographic Details
Published in	IEEE Transactions on Geoscience and Remote Sensing, Vol. 61, p. 1
Main Authors	Liao, Diling; Shi, Cuiping; Wang, Liguo
Format	Journal Article
Language	English
Published	New York: IEEE, 01.01.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Artificial neural networks; Classification; Convolution; Deep learning; Distance; Feature extraction; Fusion; Hyperspectral image; Hyperspectral imaging; Image classification; Long-distance dependence; Machine learning; Modules; Neural networks; Principal component analysis; Receptive field; Semantics; Task analysis; Transformers
ISSN	0196-2892 1558-0644
DOI	10.1109/TGRS.2023.3286950

Summary:	Deep learning (DL) has been widely used for hyperspectral image classification. Convolutional neural networks (CNNs), among the most popular DL frameworks, use a fixed-size receptive field (RF) to extract spectral and spatial features from hyperspectral images (HSIs) and show strong feature extraction capability. However, convolution, with its local extraction and global parameter-sharing mechanism, pays more attention to spatial content information, which alters the spectral sequence information in the learned features. In addition, CNNs have difficulty describing long-distance correlations between HSI pixels and bands. To solve these problems, a spectral-spatial fusion Transformer network (S²FTNet) is proposed for HSI classification. Specifically, S²FTNet adopts the Transformer framework to build a spatial Transformer module (SpaFormer) and a spectral Transformer module (SpeFormer) that capture spatial and spectral long-distance dependencies in the image. In addition, an adaptive spectral-spatial fusion mechanism (AS²FM) is proposed to effectively fuse the resulting high-level semantic features. Finally, extensive experiments on four datasets (Indian Pines, Pavia, Salinas, and WHU-Hi-LongKou) verify that the proposed S²FTNet provides better classification performance than other state-of-the-art networks.
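
The two-branch idea in the summary (attention across pixels, attention across bands, then a weighted fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no architectural details, so the identity query/key/value projections, the fixed fusion logits standing in for learned adaptive weights, and all function names below are assumptions chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys, and values are the tokens themselves
    (no learned projections), to keep the sketch short.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def two_branch_features(patch, fusion_logits=(0.0, 0.0)):
    """patch: list of pixels, each a list of band values (pixels x bands).

    Spatial branch: every pixel is a token, so attention relates pixels.
    Spectral branch: every band is a token, so attention relates bands.
    Fusion: convex combination with softmax-normalised weights; the
    logits are a hypothetical stand-in for learned fusion parameters.
    """
    spatial = self_attention(patch)
    spectral = transpose(self_attention(transpose(patch)))
    w_spa, w_spe = softmax(list(fusion_logits))
    fused = [
        [w_spa * a + w_spe * b for a, b in zip(row_spa, row_spe)]
        for row_spa, row_spe in zip(spatial, spectral)
    ]
    return spatial, spectral, fused


if __name__ == "__main__":
    # Toy patch: 2 pixels, 3 spectral bands (made-up values).
    toy_patch = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    _, _, fused = two_branch_features(toy_patch)
    print(fused)
```

Because both branches return a pixels-by-bands array, the fusion reduces to an element-wise weighted sum; a real network would typically learn the projections and fusion weights and add a classification head.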