Multi-scale structural kernel representation for object detection

Bibliographic Details
Published in: Pattern Recognition, Vol. 110, p. 107593
Main Authors: Wang, Hao; Wang, Qilong; Li, Peihua; Zuo, Wangmeng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.02.2021
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2020.107593

Summary:
• The first attempt to integrate high-order statistics into deep CNNs for effective object detection.
• The proposed high-order statistical module preserves spatial information while taking into account the special geometric structure of high-order statistics.
• Performs favorably in comparison to state-of-the-art methods and generalizes well to other dense prediction tasks.

Existing high-performance object detection methods benefit greatly from the powerful representation ability of deep convolutional neural networks (CNNs). Recent research shows that integrating high-order statistics remarkably improves the representation ability of deep CNNs. However, applying high-order statistics to object detection faces two challenges. First, previous methods insert high-order statistics into deep CNNs as global representations, which lose the spatial information of the inputs and so are not applicable to object detection. Second, high-order statistics have special structures, which should be considered if they are to be used properly. To overcome these challenges, this paper proposes a Multi-scale Structural Kernel Representation (MSKR) for improving the performance of object detection. MSKR is based on polynomial kernel approximation, which not only introduces high-order statistics but also preserves the spatial information of the input. To account for the geometric structure of high-order representations, a feature power normalization method is applied before the kernel representation is computed. Compared with the first-order statistics most commonly used in existing CNN-based detectors, MSKR generates more discriminative representations and can be flexibly integrated into deep CNNs to improve object detection performance.
When the proposed MSKR is integrated into existing object detection methods (i.e., Faster R-CNN, FPN, Mask R-CNN and RetinaNet), it achieves clear improvements on three widely used benchmarks while remaining very competitive with state-of-the-art methods.
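
The two ingredients named in the summary, power normalization followed by a polynomial kernel approximation that keeps the spatial layout, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary gives no formulas, so the sketch assumes signed element-wise power normalization and a common low-rank approximation in which the order-r term at each location is the element-wise product of r linear projections (equivalent to 1×1 convolutions). The function names, the exponent `alpha`, the projection size `d` and the random projections are all hypothetical.

```python
import numpy as np


def power_normalize(x, alpha=0.5):
    """Signed element-wise power normalization (assumed form, alpha is illustrative)."""
    return np.sign(x) * np.abs(x) ** alpha


def make_projections(rng, c, d, max_order):
    """Hypothetical helper: for each order r = 1..max_order, draw r random (d, c) matrices.

    In a trained detector these would be learned 1x1 convolution weights.
    """
    return [[rng.standard_normal((d, c)) for _ in range(r)]
            for r in range(1, max_order + 1)]


def polynomial_kernel_map(x, projections, alpha=0.5):
    """Approximate polynomial kernel representation that preserves spatial layout.

    x           : feature map of shape (C, H, W)
    projections : output of make_projections
    returns     : array of shape (max_order * d, H, W)
    """
    x = power_normalize(x, alpha)            # normalization before the kernel terms
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # each column is one spatial location
    terms = []
    for mats in projections:                 # one entry per polynomial order
        term = np.ones((mats[0].shape[0], h * w))
        for m in mats:                       # product of r projections = order-r term
            term = term * (m @ flat)
        terms.append(term)
    out = np.concatenate(terms, axis=0)      # stack orders along the channel axis
    return out.reshape(-1, h, w)             # spatial dimensions are unchanged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((8, 5, 7))            # C=8, H=5, W=7
    projections = make_projections(rng, c=8, d=4, max_order=3)
    rep = polynomial_kernel_map(features, projections)
    print(rep.shape)                                     # 3 orders x 4 channels, same H and W
```

Because every term is computed per location, the output has the same height and width as the input, so it could in principle feed a detection head in place of the first-order feature map; a global high-order pooling would instead collapse these dimensions.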