Speech Emotion Recognition Using Gammatone Cepstral Coefficients and Deep Learning Features

Bibliographic Details
Published in: 2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), pp. 1-4
Main Author: Sharan, Roneel V.
Format: Conference Proceeding
Language: English
Published: IEEE, 14 December 2023
DOI: 10.1109/ICMLANT59547.2023.10372986

Abstract: Speech emotion recognition has various applications, such as enhancing human-computer interaction and aiding remote mental health monitoring. This work proposes a method for speech emotion recognition that combines handcrafted and deep learning features. In particular, it studies gammatone cepstral coefficients, which are computed with gammatone filters that model the human auditory filters, together with deep learning feature embeddings extracted from a pretrained network for audio analysis. A multilayer perceptron classifies the combined feature set, with feature selection performed using one-way analysis of variance. The proposed method is evaluated on a dataset of 535 speech recordings covering 7 types of emotions from 10 subjects. It achieves an average accuracy of 0.7631 in classifying the emotions from speech in leave-one-subject-out cross-validation. Analysis of the results shows that gammatone cepstral coefficients improve classification accuracy over the conventional mel-frequency cepstral coefficients, and that accuracy improves further when they are combined with deep learning features.
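
The evaluation protocol named in the abstract (one-way analysis of variance for feature selection, leave-one-subject-out cross-validation) can be illustrated with a minimal sketch. This record does not include the paper's implementation, so everything below is an assumption made for illustration: the function names, the toy feature matrix, the emotion labels, and the subject IDs are hypothetical, and ranking features on the training fold only is a common way to avoid leakage rather than a confirmed detail of the paper. Feature extraction and the multilayer perceptron are omitted.

```python
from collections import defaultdict
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across the classes."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough classes or samples to compute F
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(X, y, top_k):
    """Indices of the top_k features, ranked by F score (highest first)."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k]


def leave_one_subject_out(subjects):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for s in sorted(set(subjects)):
        train = [i for i, subj in enumerate(subjects) if subj != s]
        test = [i for i, subj in enumerate(subjects) if subj == s]
        yield s, train, test


# Hypothetical toy data: 6 recordings, 3 features, 2 emotions, 3 subjects.
X = [
    [0.10, 5.0, 1.0],
    [0.20, 4.0, 2.0],
    [0.90, 5.0, 1.5],
    [1.00, 4.0, 2.5],
    [0.15, 4.5, 1.2],
    [0.95, 4.5, 2.2],
]
y = ["neutral", "neutral", "angry", "angry", "neutral", "angry"]
subjects = ["s1", "s2", "s1", "s2", "s3", "s3"]

for held_out, train, test in leave_one_subject_out(subjects):
    # Rank features on the training subjects only (assumption, see above).
    keep = select_top_features([X[i] for i in train], [y[i] for i in train], 1)
    print(held_out, "train:", train, "test:", test, "selected:", keep)
```

In a full pipeline, a classifier would be trained on the selected columns of each training fold and scored on the held-out subject, and the per-subject accuracies would then be averaged to give a single figure like the one the abstract reports.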
Author_xml – sequence: 1
  givenname: Roneel V.
  surname: Sharan
  fullname: Sharan, Roneel V.
  email: roneel.sharan@mq.edu.au
  organization: Macquarie University, Australian Institute of Health Innovation, Sydney, NSW, Australia, 2109
EISBN 9798350303919
Genre orig-research
IsPeerReviewed false
IsScholarly false
SubjectTerms Deep learning
deep learning features
Emotion recognition
Feature extraction
feature selection
gammatone cepstral coefficients
mel-spectrogram
Mental health
multilayer perceptron
Multilayer perceptrons
Speech enhancement
Speech recognition
URI https://ieeexplore.ieee.org/document/10372986