Speech Emotion Recognition Using Convolutional and Recurrent Neural Networks

Bibliographic Details
Published in	2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) pp. 1 - 4
Main Authors	Wootaek Lim, Daeyoung Jang, Taejin Lee
Format	Conference Proceeding
Language	English
Published	Asia Pacific Signal and Information Processing Association 01.12.2016
Subjects	Convolution Emotion recognition Recurrent neural networks Speech Speech recognition
DOI	10.1109/APSIPA.2016.7820699

Abstract	With rapid developments in the design of deep architecture models and learning algorithms, methods referred to as deep learning have come to be widely used in a variety of research areas such as pattern recognition, classification, and signal processing. Deep learning methods are being applied in various recognition tasks such as image, speech, and music recognition. Convolutional Neural Networks (CNNs) especially show remarkable recognition performance for computer vision tasks. In addition, Recurrent Neural Networks (RNNs) show considerable success in many sequential data processing tasks. In this study, we investigate the result of the Speech Emotion Recognition (SER) algorithm based on CNNs and RNNs trained using an emotional speech database. The main goal of our work is to propose a SER method based on concatenated CNNs and RNNs without using any traditional hand-crafted features. By applying the proposed methods to an emotional speech database, the classification result was verified to have better accuracy than that achieved using conventional classification methods.
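The abstract describes feeding speech into concatenated CNNs and RNNs without hand-crafted features, but gives no layer details. The following is only a minimal sketch of that general idea, not the authors' model: it assumes a spectrogram-like 2-D input, one convolutional layer, a plain tanh recurrent layer (swapped in for brevity; the paper's recurrent unit is not stated here), and a softmax over a hypothetical set of 7 emotion classes. All sizes, names, and the random untrained weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_relu(x, kernels):
    """Valid 2-D cross-correlation of a (freq, time) map with (n_k, kh, kw) kernels, then ReLU."""
    n_k, kh, kw = kernels.shape
    # windows has shape (freq - kh + 1, time - kw + 1, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(x, (kh, kw))
    out = np.einsum("ftij,kij->kft", windows, kernels)
    return np.maximum(out, 0.0)

def rnn_last_state(seq, w_in, w_rec, b):
    """Run a plain tanh recurrent layer over seq (time, feat); return the final hidden state."""
    h = np.zeros(w_rec.shape[0])
    for x_t in seq:
        h = np.tanh(w_in @ x_t + w_rec @ h + b)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def init_params(n_freq, n_kernels=4, kh=3, kw=3, hidden=8, n_classes=7):
    """Random, untrained weights; every size here is a hypothetical choice."""
    feat = n_kernels * (n_freq - kh + 1)  # CNN features per time step
    return {
        "kernels": rng.normal(0, 0.1, (n_kernels, kh, kw)),
        "w_in": rng.normal(0, 0.1, (hidden, feat)),
        "w_rec": rng.normal(0, 0.1, (hidden, hidden)),
        "b": np.zeros(hidden),
        "w_out": rng.normal(0, 0.1, (n_classes, hidden)),
        "b_out": np.zeros(n_classes),
    }

def predict(spectrogram, params):
    """CNN feature maps are read as a time sequence by the recurrent layer, then classified."""
    fmap = conv2d_relu(spectrogram, params["kernels"])   # (n_k, F', T')
    n_k, f, t = fmap.shape
    seq = fmap.transpose(2, 0, 1).reshape(t, n_k * f)    # one feature vector per time step
    h = rnn_last_state(seq, params["w_in"], params["w_rec"], params["b"])
    return softmax(params["w_out"] @ h + params["b_out"])

# Stand-in input: 16 frequency bins by 40 time frames of random values.
spec = rng.normal(size=(16, 40))
params = init_params(n_freq=16)
probs = predict(spec, params)
```

Here `probs` holds one probability per assumed emotion class. With untrained weights the values carry no meaning; the sketch only shows how convolutional features over the frequency axis can be handed to a recurrent layer along the time axis.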
Author affiliations	Wootaek Lim (wtlim@etri.re.kr), Daeyoung Jang, Taejin Lee: Audio & Acoust. Res. Sect., ETRI, Daejeon, South Korea
EISBN	9881476828 9789881476821
URI	https://ieeexplore.ieee.org/document/7820699