Speech Emotion Recognition Using Gammatone Cepstral Coefficients and Deep Learning Features
Speech emotion recognition finds various applications, such as enhancing human-computer interaction and aiding remote mental health monitoring. This work proposes a method for speech emotion recognition using a combination of handcrafted and deep learning features. In particular, it studies the use...
| Published in | 2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), pp. 1-4 |
|---|---|
| Main Author | Sharan, Roneel V. |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 14.12.2023 |
| Subjects | |
| DOI | 10.1109/ICMLANT59547.2023.10372986 |
| Abstract | Speech emotion recognition has applications such as enhancing human-computer interaction and aiding remote mental health monitoring. This work proposes a method for speech emotion recognition that combines handcrafted and deep learning features. In particular, it studies gammatone cepstral coefficients, which are computed with gammatone filters that model the human auditory filters, and deep learning feature embeddings extracted from a pretrained network for audio analysis. A multilayer perceptron classifies the combined feature set, with feature selection performed using one-way analysis of variance. The proposed method is evaluated on a dataset of 535 speech recordings containing 7 types of emotions from 10 subjects. It achieves an average accuracy of 0.7631 in classifying the emotions in leave-one-subject-out cross-validation. Analysis of the results shows that gammatone cepstral coefficients improve classification accuracy over the conventional mel-frequency cepstral coefficients, and that accuracy improves further when they are combined with deep learning features. |
|---|---|
| Author | Sharan, Roneel V. |
| Author_xml | sequence: 1; givenname: Roneel V.; surname: Sharan; fullname: Sharan, Roneel V.; email: roneel.sharan@mq.edu.au; organization: Macquarie University, Australian Institute of Health Innovation, Sydney, NSW, Australia, 2109 |
| ContentType | Conference Proceeding |
| EISBN | 9798350303919 |
| EndPage | 4 |
| ExternalDocumentID | 10372986 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 4 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Dec.-14 |
| PublicationDateYYYYMMDD | 2023-12-14 |
| PublicationDate_xml | – month: 12 year: 2023 text: 2023-Dec.-14 day: 14 |
| PublicationDecade | 2020 |
| PublicationTitle | 2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT) |
| PublicationTitleAbbrev | ICMLANT |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 1 |
| SubjectTerms | Deep learning; deep learning features; Emotion recognition; Feature extraction; feature selection; gammatone cepstral coefficients; mel-spectrogram; Mental health; multilayer perceptron; Multilayer perceptrons; Speech enhancement; Speech recognition |
| Title | Speech Emotion Recognition Using Gammatone Cepstral Coefficients and Deep Learning Features |
| URI | https://ieeexplore.ieee.org/document/10372986 |
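
The abstract names two evaluation steps that are easy to get subtly wrong: feature selection by one-way analysis of variance and leave-one-subject-out cross-validation. The sketch below illustrates both in plain Python, under stated assumptions. It is not the paper's implementation: the function names (`anova_f`, `select_top_features`, `leave_one_subject_out`) are hypothetical, the toy feature values are invented, and the gammatone cepstral coefficients, pretrained embeddings, and multilayer perceptron are deliberately left out.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # F is undefined here; treat the feature as uninformative
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_features(X, y, num_keep):
    """Sorted column indices of the num_keep features with the largest F."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:num_keep])


def leave_one_subject_out(subjects):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test


# Toy data (invented): 6 recordings, 3 features, 2 emotions, 3 speakers.
X = [[0.10, 5.0, 1.0], [0.20, 5.1, 3.0], [0.90, 5.0, 2.0],
     [1.00, 4.9, 1.5], [0.15, 5.2, 2.5], [0.95, 5.1, 2.2]]
y = ["neutral", "neutral", "angry", "angry", "neutral", "angry"]
subjects = [1, 1, 2, 2, 3, 3]

for held_out, train, test in leave_one_subject_out(subjects):
    # Rank features on the training speakers only, so the held-out
    # speaker never influences which features are kept.
    keep = select_top_features([X[i] for i in train], [y[i] for i in train], 1)
    print(f"held-out subject {held_out}: keep features {keep}, test rows {test}")
```

Ranking features inside each fold, rather than once on the full dataset, is a design choice made here to keep the held-out speaker out of the selection step; the abstract does not state where in the pipeline the selection is performed.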