Multi-Level Feature Abstraction from Convolutional Neural Networks for Multimodal Biometric Identification

Bibliographic Details
Published in: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3469 - 3476
Main Authors: Soleymani, Sobhan; Dabouei, Ali; Kazemi, Hadi; Dawson, Jeremy; Nasrabadi, Nasser M.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2018
DOI: 10.1109/ICPR.2018.8545061

Summary: In this paper, we propose a deep multimodal fusion network to fuse multiple modalities (face, iris, and fingerprint) for person identification. The proposed deep multimodal fusion algorithm consists of multiple streams of modality-specific Convolutional Neural Networks (CNNs), which are jointly optimized at multiple feature abstraction levels. Multiple features are extracted at several different convolutional layers of each modality-specific CNN for joint feature fusion, optimization, and classification. Features extracted at different convolutional layers of a modality-specific CNN represent the input at several different levels of abstraction. We demonstrate that efficient multimodal classification can be accomplished with a significant reduction in the number of network parameters by exploiting these multi-level abstract representations from all the modality-specific CNNs. We demonstrate an increase in multimodal person identification performance when the proposed multi-level abstract representations are used in our multimodal fusion, rather than only the features from the last layer of each modality-specific CNN. We show that our deep multimodal CNNs, with fusion at several different levels of feature abstraction, significantly outperform unimodal representations in accuracy. We also demonstrate that the joint optimization of all the modality-specific CNNs outperforms score- and decision-level fusion of independently optimized CNNs.
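
The abstract's core idea is compact: one CNN stream per modality, feature taps at several convolutional depths, and a single jointly trained fusion classifier. Below is a minimal PyTorch sketch of that idea; the stream depth, layer widths, global-average pooling, input resolutions, and the fusion head are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of multi-level, multi-stream feature fusion.
# All layer sizes and input shapes here are assumptions for illustration.
import torch
import torch.nn as nn


class ModalityStream(nn.Module):
    """One modality-specific CNN; returns features from several conv levels."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block3 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # Global average pooling turns each level's feature map into a
        # fixed-size vector, regardless of spatial resolution.
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        f3 = self.block3(f2)
        # Multi-level abstraction: keep features from every block,
        # not just the last convolutional layer.
        return [self.pool(f).flatten(1) for f in (f1, f2, f3)]


class MultimodalFusionNet(nn.Module):
    """Face, iris, and fingerprint streams fused at multiple feature levels."""

    def __init__(self, num_identities: int):
        super().__init__()
        self.face = ModalityStream(3)    # RGB face crop (assumed)
        self.iris = ModalityStream(1)    # grayscale iris image (assumed)
        self.finger = ModalityStream(1)  # grayscale fingerprint (assumed)
        fused_dim = 3 * (32 + 64 + 128)  # 3 modalities x 3 feature levels
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(),
            nn.Linear(256, num_identities))

    def forward(self, face, iris, finger):
        # Concatenate every level from every stream; one loss then trains
        # all streams jointly, unlike score- or decision-level fusion of
        # independently optimized CNNs.
        feats = self.face(face) + self.iris(iris) + self.finger(finger)
        return self.classifier(torch.cat(feats, dim=1))


# Smoke test with dummy inputs.
net = MultimodalFusionNet(num_identities=100)
logits = net(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64),
             torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 100])
```

Training this network with a single cross-entropy loss backpropagates through all three streams at once, which is the distinction the abstract draws between joint optimization and score- or decision-level fusion of separately trained CNNs.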