Fine-Grained Ship Classification by Combining CNN and Swin Transformer
Published in | Remote Sensing (Basel, Switzerland), Vol. 14, No. 13, p. 3087
Main Authors | , , ,
Format | Journal Article
Language | English
Published | Basel: MDPI AG, 01.07.2022
Subjects |
ISSN | 2072-4292
DOI | 10.3390/rs14133087 |
Summary | The mainstream algorithms used for ship classification and detection can be improved based on convolutional neural networks (CNNs). By analyzing the characteristics of ship images, we found that the difficulty in ship image classification lies in distinguishing ships with similar hull structures but different equipment and superstructures. To extract features such as ship superstructures, this paper introduces a transformer architecture with self-attention into ship classification and detection, and a combined CNN and Swin transformer model (CNN-Swin model) is proposed for ship image classification and detection. The main contributions of this study are as follows: (1) The proposed approach attends to features at different scales in ship image classification and detection, introduces a transformer architecture with self-attention into ship classification and detection for the first time, and uses a parallel network of a CNN and a transformer to extract image features. (2) To exploit the CNN's performance while avoiding overfitting as much as possible, a multi-branch CNN-Block is designed and used to construct a simple, accessible CNN backbone for feature extraction. (3) The performance of the CNN-Swin model is validated on the open FGSC-23 dataset and on a dataset of typical military ship categories built from open-source images. The results show that the model achieved accuracies of 90.9% and 91.9% on the FGSC-23 dataset and the military ship dataset, respectively, outperforming nine existing state-of-the-art approaches. (4) The CNN-Swin model's effectiveness at extracting ship features is further validated by using it as the backbone of three state-of-the-art detection methods on the open datasets HRSC2016 and FAIR1M. The results show the great potential of the CNN-Swin backbone with self-attention in ship detection.
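As a rough illustration of the architecture the summary describes, the sketch below pairs a small multi-branch CNN backbone with a Swin-T branch in parallel and fuses their pooled features for classification. This is a minimal sketch under stated assumptions: the module names (MultiBranchBlock, CNNSwinClassifier), the exact branch layouts, and the concatenation-based fusion are illustrative guesses, not the authors' released implementation.

```python
# Hypothetical sketch of a parallel CNN + Swin Transformer classifier
# (requires torch and torchvision >= 0.13 for torchvision.models.swin_t).
import torch
import torch.nn as nn
from torchvision.models import swin_t


class MultiBranchBlock(nn.Module):
    """Simple multi-branch convolutional block: a 3x3 branch and a 1x1 branch
    whose outputs are summed (an assumption inspired by the paper's
    'multi-branch CNN-Block', not its exact design)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.branch3x3(x) + self.branch1x1(x))


class CNNSwinClassifier(nn.Module):
    """Parallel CNN and Swin-T branches; pooled features from both branches
    are concatenated and passed to a linear classifier."""

    def __init__(self, num_classes: int = 23):
        super().__init__()
        # CNN branch built from the multi-branch blocks above.
        self.cnn = nn.Sequential(
            MultiBranchBlock(3, 64, stride=2),
            MultiBranchBlock(64, 128, stride=2),
            MultiBranchBlock(128, 256, stride=2),
            MultiBranchBlock(256, 512, stride=2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Swin-T branch from torchvision; replacing the head with Identity
        # makes the branch return a 768-d pooled feature vector.
        self.swin = swin_t(weights=None)
        self.swin.head = nn.Identity()
        self.classifier = nn.Linear(512 + 768, num_classes)

    def forward(self, x):
        f_cnn = self.cnn(x)    # (B, 512)
        f_swin = self.swin(x)  # (B, 768)
        return self.classifier(torch.cat([f_cnn, f_swin], dim=1))


if __name__ == "__main__":
    # 23 classes, matching the FGSC-23 dataset mentioned in the summary.
    model = CNNSwinClassifier(num_classes=23)
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 23])
```

The sketch only shows the parallel-branch idea and a simple feature-concatenation fusion; the paper's block design and fusion strategy may differ.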