Non-contact PPG signal and heart rate estimation with multi-hierarchical convolutional network

Bibliographic Details
Published in Pattern Recognition Vol. 139; p. 109421
Main Authors Li, Bin; Zhang, Panpan; Peng, Jinye; Fu, Hong
Format Journal Article
Language English
Published Elsevier Ltd 01.07.2023
ISSN 0031-3203, 1873-5142
DOI 10.1016/j.patcog.2023.109421

Summary:
The main contributions of this work are:
• An efficient end-to-end 3D spatio-temporal convolutional network with an MHFF-based attention model is proposed. A skin-map label generated from sparse optical flow mitigates the influence of background noise and head movement.
• Face video clips of only 15 s are needed for efficient reconstruction of the rPPG signal and accurate estimation of HR.
• Experiments are conducted on three datasets to verify the effectiveness of the proposed network.
• A new face-video physiological-parameter dataset with annotated PPG and HR signals is presented, containing 300 videos from 300 subjects.

Heartbeat rhythm and heart rate (HR) are important physiological parameters of the human body. This study presents an efficient multi-hierarchical spatio-temporal convolutional network that can quickly estimate the remote photoplethysmography (rPPG) signal and HR from face video clips. First, the facial color distribution characteristics are extracted using a low-level face feature generation (LFFG) module. Then, the three-dimensional (3D) spatio-temporal stack convolution (STSC) module and the multi-hierarchical feature fusion (MHFF) module are used to strengthen the spatio-temporal correlation of multi-channel features. In the MHFF module, sparse optical flow is used to capture the tiny motion of the face between frames and to generate a self-adaptive region-of-interest (ROI) skin mask. Finally, the signal prediction (SP) module is used to extract the estimated rPPG signal. The heart rate estimation results show that the proposed network outperforms state-of-the-art methods on three datasets: on UBFC-RPPG, COHFACE, and the authors' own dataset, it achieves a mean absolute error (MAE) of 2.15, 5.57, and 1.75 beats per minute (bpm), respectively.
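
The summary reports HR estimated from a reconstructed rPPG signal and scored by MAE in bpm. The record does not say how HR is read out from the signal, so the following is only a minimal sketch of one common approach (picking the strongest frequency in a plausible heart-rate band), not the authors' method. The function names, the 42 to 240 bpm search band, the 1 bpm resolution, and the synthetic 30 fps clip are all assumptions made for illustration.

```python
import math


def estimate_hr_bpm(signal, fps, lo_bpm=42, hi_bpm=240):
    """Return the integer bpm in [lo_bpm, hi_bpm] with the highest spectral power.

    Hypothetical helper: scans candidate heart rates in 1 bpm steps and
    evaluates the discrete Fourier sum of the signal at each candidate.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    best_bpm, best_power = lo_bpm, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        omega = 2.0 * math.pi * (bpm / 60.0) / fps  # radians per frame
        re = sum(x * math.cos(omega * t) for t, x in enumerate(centered))
        im = sum(x * math.sin(omega * t) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


def mean_absolute_error(predicted, reference):
    """MAE between predicted and reference heart rates, in bpm."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Assumed test input: a synthetic 15 s "rPPG" clip at 30 fps pulsing at 1.2 Hz.
fps = 30
clip = [math.sin(2.0 * math.pi * 1.2 * t / fps) for t in range(15 * fps)]
hr = estimate_hr_bpm(clip, fps)
```

A real rPPG signal would normally be detrended and band-pass filtered before this step; the sketch skips that to stay short. Per-clip estimates from `estimate_hr_bpm` can be passed, together with reference values, to `mean_absolute_error` to obtain a bpm score of the kind quoted in the summary.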