Multi-Level Feature Abstraction from Convolutional Neural Networks for Multimodal Biometric Identification

Bibliographic Details
Published in: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3469 - 3476
Main Authors: Soleymani, Sobhan; Dabouei, Ali; Kazemi, Hadi; Dawson, Jeremy; Nasrabadi, Nasser M.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2018
DOI: 10.1109/ICPR.2018.8545061

Summary: In this paper, we propose a deep multimodal fusion network to fuse multiple modalities (face, iris, and fingerprint) for person identification. The proposed deep multimodal fusion algorithm consists of multiple streams of modality-specific Convolutional Neural Networks (CNNs), which are jointly optimized at multiple feature abstraction levels. Multiple features are extracted at several different convolutional layers of each modality-specific CNN for joint feature fusion, optimization, and classification. Features extracted at different convolutional layers of a modality-specific CNN represent the input at several different levels of abstraction. We demonstrate that efficient multimodal classification can be accomplished with a significant reduction in the number of network parameters by exploiting these multi-level abstract representations from all the modality-specific CNNs. We demonstrate an increase in multimodal person identification performance when the proposed multi-level abstract representations are used in our multimodal fusion, rather than only the features from the last layer of each modality-specific CNN. We show that our deep multimodal CNNs, with fusion at several different levels of feature abstraction, significantly outperform unimodal representations in accuracy. We also demonstrate that the joint optimization of all the modality-specific CNNs outperforms score- and decision-level fusion of independently optimized CNNs.
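
The abstract's core idea is compact: one CNN stream per modality, feature taps at several convolutional depths, and a single jointly trained fusion classifier. Below is a minimal PyTorch sketch of that idea; the stream depth, layer widths, global-average pooling, input resolutions, and the fusion head are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of multi-level, multi-stream feature fusion.
# All layer sizes and input shapes here are assumptions for illustration.
import torch
import torch.nn as nn


class ModalityStream(nn.Module):
    """One modality-specific CNN; returns features from several conv levels."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block3 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # Global average pooling turns each level's feature map into a
        # fixed-size vector, regardless of spatial resolution.
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        f3 = self.block3(f2)
        # Multi-level abstraction: keep features from every block,
        # not just the last convolutional layer.
        return [self.pool(f).flatten(1) for f in (f1, f2, f3)]


class MultimodalFusionNet(nn.Module):
    """Face, iris, and fingerprint streams fused at multiple feature levels."""

    def __init__(self, num_identities: int):
        super().__init__()
        self.face = ModalityStream(3)    # RGB face crop (assumed)
        self.iris = ModalityStream(1)    # grayscale iris image (assumed)
        self.finger = ModalityStream(1)  # grayscale fingerprint (assumed)
        fused_dim = 3 * (32 + 64 + 128)  # 3 modalities x 3 feature levels
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(),
            nn.Linear(256, num_identities))

    def forward(self, face, iris, finger):
        # Concatenate every level from every stream; one loss then trains
        # all streams jointly, unlike score- or decision-level fusion of
        # independently optimized CNNs.
        feats = self.face(face) + self.iris(iris) + self.finger(finger)
        return self.classifier(torch.cat(feats, dim=1))


# Smoke test with dummy inputs.
net = MultimodalFusionNet(num_identities=100)
logits = net(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64),
             torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 100])
```

Training this network with a single cross-entropy loss backpropagates through all three streams at once, which is the distinction the abstract draws between joint optimization and score- or decision-level fusion of separately trained CNNs.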