Learning sparse filters-based convolutional networks without offline training for robust visual tracking

Due to the scarcity of training samples in the visual tracking task, almost all existing Convolutional Neural Networks (CNNs) based deep tracking algorithms rely heavily on large auxiliary datasets to train the tracking model offline. However, such offline training has two inevitable disadvantages:...

Full description

Saved in:

Bibliographic Details
Published in	Applied intelligence (Dordrecht, Netherlands) Vol. 55; no. 7; p. 459
Main Authors	Xu, Qi, Xu, Zhuoming, Chen, Zhe, Chen, Yun, Wang, Huabin, Tao, Liang
Format	Journal Article
Language	English
Published	Boston Springer Nature B.V 01.05.2025
Subjects	Algorithms Artificial neural networks Datasets Dynamic models Graphics processing units Internal layout Learning Model updating Neural networks Optical tracking Robustness Visual discrimination Visual tasks
Online Access	Get full text
ISSN	0924-669X 1573-7497
DOI	10.1007/s10489-025-06350-3

Cover

More Information
Summary:	Due to the scarcity of training samples in the visual tracking task, almost all existing Convolutional Neural Networks (CNNs) based deep tracking algorithms rely heavily on large auxiliary datasets to train the tracking model offline. However, such offline training has two inevitable disadvantages: (1) the learned generic features may be less discriminative for tracking specific objects; (2) the training process demands huge computational power provided by high-performance graphics processing units (GPUs), which is not always available in many practical applications. Therefore, learning effective generic features without offline training for robust visual tracking is a necessary and challenging task. This paper tackles this task by proposing the Sparse Filters-based Convolutional Network (SFCN), which is a fully feed-forward convolutional network with a lightweight structure including two convolutional layers. Its convolutional kernels are a set of sparse filters learned and updated online from local patches using sparse dictionary learning. Benefiting from the learned sparse filters, SFCN learns effective generic features by exploiting both the discriminative information between the foreground and background of the target region and the hierarchical layout information among the local patches inside each target candidate region. Furthermore, a dynamic model updating strategy is adopted to alleviate the drift problem. Extensive experiments on five large-scale benchmark datasets show that the proposed method performs favorably against several state-of-the-art tracking algorithms.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-025-06350-3