A Lightweight Neural Learning Algorithm for Real-Time Facial Feature Tracking System via Split-Attention and Heterogeneous Convolution

Bibliographic Details
Published in	Neural Processing Letters Vol. 55; no. 2; pp. 1555–1580
Main Authors	Ma, Yuandong, Song, Qing, Hu, Mengjie, Zhu, Xiaotong
Format	Journal Article
Language	English
Published	New York: Springer US, 01.04.2023; Springer Nature B.V.
Subjects	Accuracy Algorithms Artificial Intelligence Complex Systems Computational Intelligence Computer networks Computer Science Convolution Cross correlation Datasets Deep learning Distance learning Efficiency Feature extraction Feature maps Lightweight Machine learning Neural networks Parameters Tracking systems Siamese networks Attention mechanism Face tracking Lightweight network
ISSN	1370-4621 1573-773X
DOI	10.1007/s11063-022-10951-1

Summary:	Object tracking has made remarkable progress in the past few years, but most advanced trackers are becoming more computationally expensive, which limits their deployment on resource-constrained mobile devices. In addition, currently popular trackers perform similarity learning through feature correlation between multiple branches. Some of these cross-correlation methods lose much facial information, while others introduce a great deal of unfavorable background information. Motivated by these problems, this paper aims to reduce the number of algorithm parameters while enhancing feature extraction. Heterogeneous convolution is introduced into the backbone network to reduce the number of convolution kernel parameters, and a search box mechanism is added to dynamically adjust the network's receptive field and generate more feature maps with cheap operations. Furthermore, we integrate the split-attention mechanism into the backbone network to standardize the arrangement of heterogeneous convolution. To evaluate the model, we conducted experiments on the challenging VTB datasets and on datasets of real recorded footage, which contain 82,351 facial features. Experimental results show that our method achieves a distance precision (DP) of 93.5% and an overlap success precision (OP) of 67.5%, which are comparable with state-of-the-art object tracking methods, while reducing the number of parameters by about one-third. We also examine the feature maps of each convolution module and give an interpretation of lightweight convolution.
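The summary reports results as distance precision (DP) and overlap success precision (OP). The following is a minimal sketch of how these metrics are commonly computed in OTB-style tracking benchmarks, not the authors' evaluation code. The 20-pixel center-error threshold, the 0.5 overlap threshold, and the `(x, y, w, h)` box format are assumptions, and the reported OP may instead be the area under the success plot. All function names are hypothetical.

```python
import math


def center_error(pred, gt):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    px, py = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx, gy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    return math.hypot(px - gx, py - gy)


def iou(pred, gt):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1 = max(pred[0], gt[0])
    y1 = max(pred[1], gt[1])
    x2 = min(pred[0] + pred[2], gt[0] + gt[2])
    y2 = min(pred[1] + pred[3], gt[1] + gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0


def distance_precision(preds, gts, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels (assumed 20)."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def overlap_precision(preds, gts, threshold=0.5):
    """Fraction of frames whose IoU exceeds `threshold` (assumed 0.5)."""
    hits = sum(iou(p, g) > threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Toy sequence: four frames with a fixed ground-truth box.
gts = [(0, 0, 10, 10)] * 4
preds = [(0, 0, 10, 10), (2, 0, 10, 10), (30, 30, 10, 10), (5, 5, 10, 10)]
print(distance_precision(preds, gts))  # 0.75
print(overlap_precision(preds, gts))   # 0.5
```

In the toy sequence, the fourth frame counts toward DP (center error of about 7 px) but not toward OP (IoU of about 0.14). This shows why the two metrics can diverge, as the reported 93.5% and 67.5% do.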