A Multi-Modal Emotion Recognition System Based on CNN-Transformer Deep Learning Technique

Bibliographic Details
Published in 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), pp. 145-150
Main Authors Karatay, Busra; Bestepe, Deniz; Sailunaz, Kashfia; Ozyer, Tansel; Alhajj, Reda
Format Conference Proceeding
Language English
Published IEEE 01.03.2022
DOI 10.1109/CDMA54072.2022.00029

Abstract Emotion analysis has long been studied by researchers from various fields, and emotion detection methods have been developed for the text, audio, image, and video domains. Automated emotion detection from videos and pictures using machine learning and deep learning models has attracted particular interest. This paper proposes a deep learning framework that combines CNN and Transformer models to classify emotions using facial and body features extracted from videos. Facial and body features were extracted using OpenPose, and two data preprocessing operations, new video creation and frame selection, were tried. The experiments were conducted on two datasets, FABO and CK+. Our framework outperformed similar deep learning models with 99% classification accuracy on the FABO dataset, and most versions of the framework achieved over 90% accuracy on both the FABO and CK+ datasets.
Author Karatay, Busra
Ozyer, Tansel
Sailunaz, Kashfia
Alhajj, Reda
Bestepe, Deniz
Author_xml – sequence: 1
  givenname: Busra
  surname: Karatay
  fullname: Karatay, Busra
  email: bkaratay@etu.edu.tr
  organization: TOBB University of Economics and Technology,Department of Computer Engineering,Ankara,Turkey
– sequence: 2
  givenname: Deniz
  surname: Bestepe
  fullname: Bestepe, Deniz
  email: dbestepe@etu.edu.tr
  organization: University of Calgary,Department of Computer Science,Calgary,AB,Canada
– sequence: 3
  givenname: Kashfia
  surname: Sailunaz
  fullname: Sailunaz, Kashfia
  email: kashfia.sailunaz@ucalgary.ca
  organization: Istanbul Medipol University,Department of Computer Engineering,Istanbul,Turkey
– sequence: 4
  givenname: Tansel
  surname: Ozyer
  fullname: Ozyer, Tansel
  email: ozyer@etu.edu.tr
  organization: Ankara Medipol University,Department of Computer Engineering,Ankara,Turkey
– sequence: 5
  givenname: Reda
  surname: Alhajj
  fullname: Alhajj, Reda
  email: alhajj@ucalgary.ca
  organization: University of Southern Denmark,Department of Health Informatics,Odense,Denmark
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CDMA54072.2022.00029
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1665410140
9781665410144
EndPage 150
ExternalDocumentID 9736361
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
IEDL.DBID RIE
IngestDate Thu Jun 29 18:36:58 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 6
ParticipantIDs ieee_primary_9736361
PublicationCentury 2000
PublicationDate 2022-March
PublicationDateYYYYMMDD 2022-03-01
PublicationDate_xml – month: 03
  year: 2022
  text: 2022-March
PublicationDecade 2020
PublicationTitle 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)
PublicationTitleAbbrev CDMA
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.833869
SourceID ieee
SourceType Publisher
StartPage 145
SubjectTerms CNN
Deep learning
emotion
emotion classification
Emotion recognition
Face recognition
Feature extraction
Streaming media
Text analysis
Transformer
Transformers
Title A Multi-Modal Emotion Recognition System Based on CNN-Transformer Deep Learning Technique
URI https://ieeexplore.ieee.org/document/9736361
hasFullText 1
inHoldings 1
isFullTextHit
isPrint