Learning sparse filters-based convolutional networks without offline training for robust visual tracking

Bibliographic Details
Published in: Applied Intelligence (Dordrecht, Netherlands), Vol. 55, no. 7, p. 459
Main Authors: Xu, Qi; Xu, Zhuoming; Chen, Zhe; Chen, Yun; Wang, Huabin; Tao, Liang
Format: Journal Article
Language: English
Published: Boston, Springer Nature B.V., 01.05.2025
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-025-06350-3

Summary: Because training samples are scarce in visual tracking, almost all existing deep tracking algorithms based on Convolutional Neural Networks (CNNs) rely heavily on large auxiliary datasets to train the tracking model offline. However, such offline training has two inevitable disadvantages: (1) the learned generic features may be less discriminative for tracking specific objects; (2) the training process demands substantial computational power from high-performance graphics processing units (GPUs), which are not always available in practical applications. Learning effective generic features without offline training is therefore a necessary and challenging task for robust visual tracking. This paper tackles the task by proposing the Sparse Filters-based Convolutional Network (SFCN), a fully feed-forward convolutional network with a lightweight structure of two convolutional layers. Its convolutional kernels are a set of sparse filters learned and updated online from local patches using sparse dictionary learning. With these learned sparse filters, SFCN learns effective generic features by exploiting both the discriminative information between the foreground and background of the target region and the hierarchical layout information among the local patches inside each target candidate region. Furthermore, a dynamic model updating strategy is adopted to alleviate the drift problem. Extensive experiments on five large-scale benchmark datasets show that the proposed method performs favorably against several state-of-the-art tracking algorithms.
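
The general idea of learning convolution kernels from local patches by sparse dictionary learning can be illustrated with a small NumPy sketch. This is not the authors' SFCN implementation: the function names, the patch size, the number of filters, the sparsity weight `lam`, and the choice of ISTA with a least-squares dictionary update are all assumptions made for illustration, and only a single convolutional layer is shown, without the online update or the foreground/background handling described in the summary.

```python
import numpy as np


def extract_patches(image, size, stride):
    """Collect flattened size x size patches on a regular grid."""
    h, w = image.shape
    patches = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]
    return np.asarray(patches, dtype=np.float64)


def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def learn_sparse_filters(patches, n_filters, lam=0.1, n_outer=10, n_ista=20, seed=0):
    """Alternate ISTA sparse coding with a least-squares dictionary update.

    Hypothetical helper: the solver and all hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    X = patches - patches.mean(axis=1, keepdims=True)  # remove per-patch mean
    D = rng.standard_normal((n_filters, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)      # unit-norm atoms
    for _ in range(n_outer):
        # Sparse coding: minimise 0.5 * ||X - A D||^2 + lam * ||A||_1 over A.
        lipschitz = np.linalg.norm(D @ D.T, 2) + 1e-12
        A = np.zeros((X.shape[0], n_filters))
        for _ in range(n_ista):
            grad = (A @ D - X) @ D.T
            A = soft_threshold(A - grad / lipschitz, lam / lipschitz)
        # Dictionary update: least squares in D, then renormalise used atoms.
        D_new = np.linalg.lstsq(A, X, rcond=None)[0]
        norms = np.linalg.norm(D_new, axis=1, keepdims=True)
        used = norms[:, 0] > 1e-8
        D[used] = D_new[used] / norms[used]
    return D


def conv_layer(image, filters, size):
    """Valid cross-correlation of the image with each learned filter."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (size, size))
    kernels = filters.reshape(-1, size, size)
    return np.einsum("ijkl,fkl->fij", windows, kernels)


# Toy usage on a random "frame"; real input would be a target region.
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
patches = extract_patches(frame, size=5, stride=3)
filters = learn_sparse_filters(patches, n_filters=8)
maps = conv_layer(frame, filters, size=5)
print(filters.shape, maps.shape)
```

Because the filters come directly from patches of the current frame, no offline training set or GPU is needed in this sketch, which is the property the summary emphasises.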