Second-order Attention Guided Convolutional Activations for Visual Recognition


Bibliographic Details
Published in: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 3071 - 3076
Main Authors: Chen, Shannan; Wang, Qian; Sun, Qiule; Liu, Bin; Zhang, Jianxin; Zhang, Qiang
Format: Conference Proceeding
Language: English
Published: IEEE, 10.01.2021
DOI: 10.1109/ICPR48806.2021.9412350


Summary: Recently, modeling deep convolutional activations by global second-order pooling has shown great advances on visual recognition tasks. However, most existing deep second-order statistical models compute second-order statistics only over the activations of the last convolutional layer to form image representations, and seldom introduce second-order statistics into earlier layers to better fit the network topology, which limits their representational ability to a certain extent. Motivated by the flexibility of attention blocks that are commonly plugged into intermediate layers of deep convolutional networks (ConvNets), this work combines deep second-order statistics with attention mechanisms in ConvNets and proposes a novel Second-order Attention Guided Network (SoAG-Net) for visual recognition. More specifically, SoAG-Net involves several SoAG modules seamlessly inserted into intermediate layers of the network; each SoAG module collects second-order statistics of convolutional activations by polynomial kernel approximation to predict channel-wise attention maps, which guide the learning of convolutional activations through tensor scaling along the channel dimension. SoAG improves the nonlinearity of ConvNets and enables them to fit more complicated distributions of convolutional activations. Experimental results on three commonly used datasets demonstrate that SoAG-Net outperforms its counterparts and achieves competitive performance with state-of-the-art models under the same backbone.
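
The channel-attention mechanism described in the summary (second-order statistics of activations predicting channel-wise attention maps that rescale the tensor along the channel dimension) can be sketched in a few lines of PyTorch. The following is a minimal illustration, not the authors' implementation: the module name SoAGModule, the reduction ratio, and the use of a plain channel covariance matrix as the second-order statistic (standing in for the paper's polynomial kernel approximation) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class SoAGModule(nn.Module):
    """Sketch of a second-order attention guided block (hypothetical names).

    Computes a channel covariance matrix over spatial positions (a
    second-order statistic), summarizes each channel's covariance row into
    a descriptor, and maps it to channel-wise attention weights used to
    rescale the input activations along the channel dimension.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 4)
        # Small bottleneck MLP producing one attention weight per channel.
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        feat = x.reshape(b, c, h * w)                  # B x C x HW
        feat = feat - feat.mean(dim=2, keepdim=True)   # center per channel
        # Channel covariance: second-order statistic of the activations.
        # (The paper instead uses a polynomial kernel approximation.)
        cov = torch.bmm(feat, feat.transpose(1, 2)) / (h * w)  # B x C x C
        desc = cov.mean(dim=2)                         # B x C channel descriptor
        attn = self.fc(desc).view(b, c, 1, 1)          # channel-wise attention map
        return x * attn                                # tensor scaling along channels
```

Such a module could be inserted after intermediate stages of a backbone, e.g. `y = SoAGModule(256)(torch.randn(2, 256, 14, 14))`; because the output keeps the input shape, it drops into a residual network without structural changes.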