Speech Emotion Recognition Using Convolutional and Recurrent Neural Networks

Bibliographic Details
Published in 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1-4
Main Authors Wootaek Lim, Daeyoung Jang, Taejin Lee
Format Conference Proceeding
Language English
Published Asia Pacific Signal and Information Processing Association 01.12.2016
DOI 10.1109/APSIPA.2016.7820699

Abstract With rapid developments in the design of deep architecture models and learning algorithms, methods referred to as deep learning have come to be widely used in a variety of research areas such as pattern recognition, classification, and signal processing. Deep learning methods are being applied in various recognition tasks such as image, speech, and music recognition. Convolutional Neural Networks (CNNs) especially show remarkable recognition performance for computer vision tasks. In addition, Recurrent Neural Networks (RNNs) show considerable success in many sequential data processing tasks. In this study, we investigate the result of the Speech Emotion Recognition (SER) algorithm based on CNNs and RNNs trained using an emotional speech database. The main goal of our work is to propose an SER method based on concatenated CNNs and RNNs without using any traditional hand-crafted features. By applying the proposed methods to an emotional speech database, the classification result was verified to have better accuracy than that achieved using conventional classification methods.
Author Affiliations Wootaek Lim (wtlim@etri.re.kr), Audio & Acoust. Res. Sect., ETRI, Daejeon, South Korea
Daeyoung Jang, Audio & Acoust. Res. Sect., ETRI, Daejeon, South Korea
Taejin Lee, Audio & Acoust. Res. Sect., ETRI, Daejeon, South Korea
EISBN 9881476828
9789881476821
Subject Terms Convolution, Emotion recognition, Recurrent neural networks, Speech, Speech recognition
URI https://ieeexplore.ieee.org/document/7820699