WhisperNet: Deep Siamese Network For Emotion and Speech Tempo Invariant Visual-Only Lip-Based Biometric

Bibliographic Details
Published in: 2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS), pp. 1-5
Main Authors: Zakeri, Abdollah; Hassanpour, Hamid
Format: Conference Proceeding
Language: English
Published: IEEE, 29.12.2021
DOI: 10.1109/ICSPIS54653.2021.9729394

Abstract: In the last decade, the field of biometrics has been revolutionized by the rise of deep learning. Many improvements have been made to older biometric methods, reducing security concerns. Before biometric person-verification methods such as facial recognition, an impostor could access a person's vital information simply by obtaining their password, for example by installing a keylogger on their system. Thanks to deep learning, safer biometric approaches to person verification and person re-identification, such as visual and audio-visual authentication, have become possible and are applicable on many devices, including smartphones and laptops. Unfortunately, some people consider facial recognition a threat to personal privacy. Additionally, biometric methods that use the audio modality are not always applicable, for example because of background noise in the environment. Lip-based biometric authentication (LBBA) is the process of authenticating a person using a video of their lip movements while they talk. To address these concerns about other biometric authentication methods, a visual-only LBBA method can be used. Since people may be in different emotional states that could affect their utterance and speech tempo, a visual-only LBBA method must be able to produce an emotion- and speech-tempo-invariant embedding of the input utterance video. In this article, we propose a network inspired by the Siamese architecture that learns to produce emotion- and speech-tempo-invariant representations of input utterance videos. We trained and tested the proposed network on the CREMA-D dataset and achieved 95.41% accuracy on the validation set.
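
The abstract describes verification with a network inspired by the Siamese architecture. As a rough illustration only, the Python sketch below shows the general Siamese verification pattern: one shared encoder embeds both the enrolled clip and the probe clip, and the pair is accepted when the embedding distance falls below a threshold. It is not the authors' WhisperNet, and every name in it is a hypothetical assumption: clips are assumed to be pre-extracted per-frame lip feature vectors, `mean_pool_encoder` (a plain average over frames) stands in for a learned network, `contrastive_loss` is the standard contrastive objective commonly used to train Siamese networks (this record does not say which loss the paper uses), and `pair_accuracy` is just one possible way to compute a verification accuracy like the reported 95.41%.

```python
import math
from typing import Callable, List, Sequence, Tuple

# HYPOTHETICAL data layout: a clip is a sequence of per-frame lip feature vectors.
Frame = Sequence[float]
Clip = Sequence[Frame]
Encoder = Callable[[Clip], List[float]]


def mean_pool_encoder(clip: Clip) -> List[float]:
    """Toy stand-in for a learned encoder (assumption, not WhisperNet).

    Averages frame features over time. Repeating every frame k times
    (a uniformly slower utterance) leaves the average unchanged.
    """
    n = len(clip)
    dim = len(clip[0])
    return [sum(frame[i] for frame in clip) / n for i in range(dim)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance: float, same_person: bool, margin: float = 1.0) -> float:
    """Standard contrastive loss: pull genuine pairs together,
    push impostor pairs at least `margin` apart."""
    if same_person:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def verify(encoder: Encoder, probe: Clip, enrolled: Clip, threshold: float) -> bool:
    """Siamese-style check: the SAME encoder embeds both clips,
    and the pair is accepted if the embeddings are closer than `threshold`."""
    return euclidean(encoder(probe), encoder(enrolled)) < threshold


def pair_accuracy(
    encoder: Encoder, pairs: Sequence[Tuple[Clip, Clip, bool]], threshold: float
) -> float:
    """Fraction of (clip_a, clip_b, same_person) pairs decided correctly."""
    correct = sum(verify(encoder, a, b, threshold) == same for a, b, same in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    alice = [[0.0, 1.0], [0.2, 0.8]]
    alice_slow = [f for f in alice for _ in range(2)]  # each frame repeated: slower tempo
    bob = [[1.0, 0.0], [0.9, 0.1]]
    pairs = [(alice, alice_slow, True), (alice, bob, False)]
    print(pair_accuracy(mean_pool_encoder, pairs, threshold=0.5))  # prints 1.0
```

Mean pooling is unchanged when every frame is repeated the same number of times, which is only a crude form of tempo invariance; a learned encoder would have to acquire invariance to realistic tempo and emotion changes from training data.
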
Author Affiliations
Zakeri, Abdollah (a.zakeri@shahroodut.ac.ir), Shahrood University of Technology, Faculty of Computer Engineering
Hassanpour, Hamid (h.hassanpour@shahroodut.ac.ir), Shahrood University of Technology, Faculty of Computer Engineering
EISBN: 9781665409384; 166540938X
Subject Terms: Authentication; Biometrics; Biometrics (access control); Deep learning; Deep Siamese Network; Face recognition; Lip-Based Biometrics; Privacy; Signal processing; Video Processing; Visualization
URL: https://ieeexplore.ieee.org/document/9729394