Multi-modal Emotion Recognition Using Canonical Correlations and Acoustic Features

Bibliographic Details
Published in 2010 20th International Conference on Pattern Recognition pp. 4133 - 4136
Main Authors Gajsek, Rok, Struc, Vitomir, Mihelic, France
Format Conference Proceeding
Language English
Published IEEE 01.08.2010
ISBN 1424475422
9781424475421
ISSN 1051-4651
DOI 10.1109/ICPR.2010.1005

Abstract Information about the psycho-physical state of the subject is becoming a valuable addition to modern audio and video recognition systems. Besides enabling a better user experience, it can also improve the recognition accuracy of the base system. In this article, we present our approach to a multi-modal (audio-video) emotion recognition system. For the audio sub-system, a feature set comprising prosodic, spectral, and cepstral features is selected, and a support vector classifier is used to produce the scores for each emotional category. For the video sub-system, a novel approach is presented that does not rely on tracking specific facial landmarks and thus avoids the problems that usually arise when the tracking algorithm fails to detect the correct area. The system is evaluated on the eNTERFACE database, and the recognition accuracy of our audio-video fusion is compared with results published in the literature.
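
The abstract says that each sub-system yields scores per emotional category and that audio and video are then fused, but it does not state the fusion rule. The sketch below is only an illustration under stated assumptions: the weighted-sum rule, the `audio_weight` parameter, the function names, and the six-label set are hypothetical and are not taken from the paper.

```python
from typing import Dict

# Hypothetical label set; the abstract does not list the emotion categories.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def fuse_scores(audio: Dict[str, float],
                video: Dict[str, float],
                audio_weight: float = 0.5) -> Dict[str, float]:
    """Combine per-emotion scores from two modalities with a weighted sum.

    This is an assumed fusion rule for illustration, not the authors' method.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    return {e: audio_weight * audio[e] + video_weight * video[e]
            for e in EMOTIONS}


def predict(fused: Dict[str, float]) -> str:
    """Return the emotion with the highest fused score."""
    return max(fused, key=fused.get)


# Made-up scores: the audio classifier favours anger, the video one happiness.
audio_scores = {"anger": 0.6, "disgust": 0.05, "fear": 0.05,
                "happiness": 0.2, "sadness": 0.05, "surprise": 0.05}
video_scores = {"anger": 0.1, "disgust": 0.05, "fear": 0.05,
                "happiness": 0.7, "sadness": 0.05, "surprise": 0.05}

label_equal = predict(fuse_scores(audio_scores, video_scores))
label_audio_heavy = predict(fuse_scores(audio_scores, video_scores,
                                        audio_weight=0.9))
```

With equal weights the video evidence dominates in this made-up example, while a weight of 0.9 on audio flips the decision; in practice such a weight would be tuned on held-out data.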
Author Struc, Vitomir
Gajsek, Rok
Mihelic, France
Author_xml – sequence: 1
  givenname: Rok
  surname: Gajsek
  fullname: Gajsek, Rok
  email: rok.gajsek@fe.uni-lj.si
  organization: Fac. of Electr. Eng., Univ. of Ljubljana, Ljubljana, Slovenia
– sequence: 2
  givenname: Vitomir
  surname: Struc
  fullname: Struc, Vitomir
– sequence: 3
  givenname: France
  surname: Mihelic
  fullname: Mihelic, France
  email: france.mihelic@fe.uni-lj.si
  organization: Fac. of Electr. Eng., Univ. of Ljubljana, Ljubljana, Slovenia
ContentType Conference Proceeding
Discipline Engineering
Computer Science
EISBN 9781424475414
9780769541099
1424475414
0769541097
EndPage 4136
ExternalDocumentID 5597732
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 4
ParticipantIDs ieee_primary_5597732
PublicationCentury 2000
PublicationDate 2010-Aug.
PublicationDateYYYYMMDD 2010-08-01
PublicationDate_xml – month: 08
  year: 2010
  text: 2010-Aug.
PublicationDecade 2010
PublicationTitle 2010 20th International Conference on Pattern Recognition
PublicationTitleAbbrev ICPR
PublicationYear 2010
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 4133
SubjectTerms Correlation
Emotion recognition
Face
Feature extraction
multi-modal emotions
Support vector machines
Video sequences
Title Multi-modal Emotion Recognition Using Canonical Correlations and Acoustic Features
URI https://ieeexplore.ieee.org/document/5597732