Speech Emotion Recognition Using Gammatone Cepstral Coefficients and Deep Learning Features

Bibliographic Details
Published in	2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT) pp. 1 - 4
Main Author	Sharan, Roneel V.
Format	Conference Proceeding
Language	English
Published	IEEE 14.12.2023
Subjects	Deep learning; deep learning features; Emotion recognition; Feature extraction; feature selection; gammatone cepstral coefficients; mel-spectrogram; Mental health; multilayer perceptron; Multilayer perceptrons; Speech enhancement; Speech recognition
DOI	10.1109/ICMLANT59547.2023.10372986

Abstract	Speech emotion recognition has applications such as enhancing human-computer interaction and aiding remote mental health monitoring. This work proposes a method for speech emotion recognition that combines handcrafted and deep learning features. In particular, it studies gammatone cepstral coefficients, which are computed with gammatone filters that model the human auditory filters, together with deep learning feature embeddings extracted from a pretrained network for audio analysis. A multilayer perceptron classifies the combined feature set, with feature selection performed using one-way analysis of variance. The proposed method is evaluated on a dataset of 535 speech recordings containing 7 types of emotions from 10 subjects. It achieves an average accuracy of 0.7631 in classifying the emotions from speech under leave-one-subject-out cross-validation. Analysis of the results shows that gammatone cepstral coefficients improve classification accuracy over the conventional mel-frequency cepstral coefficients, and that accuracy improves when they are combined with deep learning features.
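The classification stage described in the abstract (combined features, one-way ANOVA feature selection, a multilayer perceptron, leave-one-subject-out cross-validation) could be sketched roughly as follows with scikit-learn. This is only an illustration under stated assumptions, not the author's implementation: the features are random placeholders standing in for gammatone cepstral coefficients and pretrained-network embeddings, and the feature dimensions, the number of selected features (`k_best`), the hidden layer size, and the helper name `loso_accuracy` are all hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loso_accuracy(features, labels, subjects, k_best=20, seed=0):
    """Return one accuracy per held-out subject (hypothetical helper)."""
    model = make_pipeline(
        StandardScaler(),
        # f_classif computes the one-way ANOVA F-statistic per feature.
        SelectKBest(score_func=f_classif, k=k_best),
        # Assumed architecture: a single hidden layer of 32 units.
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=seed),
    )
    # Each fold holds out every recording of exactly one subject.
    return cross_val_score(
        model, features, labels, groups=subjects, cv=LeaveOneGroupOut()
    )


# Synthetic stand-in data: 10 subjects, 7 emotion classes, 21 clips each.
rng = np.random.default_rng(0)
n_subjects, per_subject, n_classes = 10, 21, 7
n_samples = n_subjects * per_subject
subjects = np.repeat(np.arange(n_subjects), per_subject)
labels = np.tile(np.arange(n_classes), n_samples // n_classes)

handcrafted = rng.normal(size=(n_samples, 13))  # placeholder for cepstral features
deep = rng.normal(size=(n_samples, 32))         # placeholder for network embeddings
combined = np.hstack([handcrafted, deep])       # feature-level combination
combined[:, 0] += labels                        # inject a weak class signal

scores = loso_accuracy(combined, labels, subjects)
print("mean leave-one-subject-out accuracy:", scores.mean())
```

Placing the scaler and the feature selector inside the pipeline means both are refit on the training subjects of each fold, so the held-out subject never influences which features are kept.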
Author affiliation	Sharan, Roneel V. (roneel.sharan@mq.edu.au), Macquarie University, Australian Institute of Health Innovation, Sydney, NSW, Australia, 2109
EISBN	9798350303919
URI	https://ieeexplore.ieee.org/document/10372986