Multi-scale structural kernel representation for object detection

Bibliographic Details
Published in	Pattern Recognition Vol. 110; p. 107593
Main Authors	Wang, Hao, Wang, Qilong, Li, Peihua, Zuo, Wangmeng
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.02.2021
Subjects	High-order statistics; Matrix power normalization; Object detection; Polynomial kernel
ISSN	0031-3203 1873-5142
DOI	10.1016/j.patcog.2020.107593

Summary:	• The first attempt to integrate high-order statistics into deep CNNs for effective object detection.
• The proposed high-order statistical module preserves spatial information while taking into account the special geometric structure of high-order statistics.
• Performs favorably against state-of-the-art methods and shows good generalization to other dense prediction tasks.

Existing high-performance object detection methods benefit greatly from the powerful representation ability of deep convolutional neural networks (CNNs). Recent research shows that integrating high-order statistics remarkably improves the representation ability of deep CNNs. However, applying high-order statistics to object detection faces two challenges. First, previous methods insert high-order statistics into deep CNNs as global representations, which discard the spatial information of the inputs and are therefore not applicable to object detection. Second, high-order statistics have special geometric structures, which must be considered for their proper use. To overcome these challenges, this paper proposes a Multi-scale Structural Kernel Representation (MSKR) for improving object detection performance. MSKR is built on polynomial kernel approximation, which not only incorporates high-order statistics but also preserves the spatial information of the input. To account for the geometric structure of high-order representations, a feature power normalization method is introduced before the kernel representation is computed. Compared with the first-order statistics most commonly used in existing CNN-based detectors, MSKR generates more discriminative representations and can be flexibly integrated into deep CNNs to improve object detection performance.
By applying the proposed MSKR to existing object detection methods (i.e., Faster R-CNN, FPN, Mask R-CNN and RetinaNet), the authors obtain clear improvements on three widely used benchmarks, with performance very competitive with state-of-the-art methods.
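The abstract does not give MSKR's exact formulation, so the following is only a minimal sketch of the general idea it describes: power-normalize each feature vector, then compute low-rank polynomial (high-order) terms at every spatial location instead of pooling them into one global descriptor. The signed power form sign(v)|v|^alpha, the rank-one tensor projection scheme, and the names `power_normalize`, `order_k_term` and `structural_kernel_map` are hypothetical illustrative choices, not the authors' implementation; the multi-scale part is not modeled.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # D rows, each with C entries


def power_normalize(x: Vector, alpha: float = 0.5) -> Vector:
    """Element-wise signed power normalization: sign(v) * |v|**alpha (assumed form)."""
    return [math.copysign(abs(v) ** alpha, v) for v in x]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product W x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def order_k_term(x: Vector, projections: List[Matrix]) -> Vector:
    """Low-rank approximation of the order-k term, with k = len(projections).

    Component d is prod_j (w_{j,d} . x), which equals the inner product of the
    rank-one tensor w_{1,d} ⊗ ... ⊗ w_{k,d} with x ⊗ ... ⊗ x (k times), so the
    full k-th order tensor is never materialized.
    """
    out = [1.0] * len(projections[0])
    for w in projections:
        out = [o * p for o, p in zip(out, matvec(w, x))]
    return out


def structural_kernel_map(
    feature_map: List[Vector], params: List[List[Matrix]], alpha: float = 0.5
) -> List[Vector]:
    """Normalize, then stack polynomial terms of orders 1..R at every location.

    feature_map holds H*W positions, each a C-dim vector; params[k-1] holds the
    k projection matrices of order k. One output vector is produced per input
    position, so the spatial layout survives (unlike global pooling).
    """
    result = []
    for x in feature_map:
        xn = power_normalize(x, alpha)
        z: Vector = []
        for projections in params:
            z.extend(order_k_term(xn, projections))
        result.append(z)
    return result
```

For example, `structural_kernel_map(fmap, [[w1], [a, b]])` on a 2x3 map with C = 3 channels returns six vectors, each holding the first-order and second-order terms side by side. In a real detector the per-location projections would typically be realized as learned 1x1 convolutions over the whole feature map; the pure-Python loops here only keep the sketch dependency-free.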