End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments

Bibliographic Details
Published in Proceedings of the XXth Conference of Open Innovations Association FRUCT, pp. 533-539
Main Authors Zinemanas, Pablo, Cancela, Pablo, Rocamora, Martin
Format Conference Proceeding
Language English
Published FRUCT 01.04.2019
ISSN 2305-7254
DOI 10.23919/FRUCT.2019.8711906

Abstract We present a novel approach to the problem of sound event detection (SED) in urban environments using end-to-end convolutional neural networks (CNN). It consists of a 1D CNN that extracts the energy in mel-frequency bands from the audio signal based on a simple filter bank, followed by a 2D CNN for the classification task. The main goal of this two-stage architecture is to bring more interpretability to the first layers of the network and to allow their reuse in other problems in the same domain. We present a novel model to calculate the mel-spectrogram using a neural network that outperforms existing work in both its simplicity and its matching performance. We also implement a recently proposed approach to normalizing the energy of the mel-spectrogram (per-channel energy normalization, PCEN) as a layer of the neural network. We show how the parameters of this normalization can be learned by the network and why this is useful for SED in urban environments. We study how training modifies the filter bank as well as the PCEN parameters. The resulting system achieves classification results comparable to the state of the art while decreasing the number of parameters involved.
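The PCEN step mentioned in the abstract can be illustrated with a short sketch. The NumPy code below applies the published PCEN formula, PCEN(t, f) = (E(t, f) / (eps + M(t, f))^alpha + delta)^r - delta^r, where M is a first-order smoothed version of the mel energy E, using fixed parameters. It is not the authors' trainable layer: the function name `pcen`, the default parameter values, and the first-frame initialisation of the smoother are all assumptions made for illustration.

```python
import numpy as np


def pcen(mel_energy, alpha=0.98, delta=2.0, r=0.5, s=0.025, eps=1e-6):
    """Per-channel energy normalization with fixed (not learned) parameters.

    mel_energy: non-negative array of shape (n_frames, n_bands).
    All default values are illustrative assumptions.
    """
    E = np.asarray(mel_energy, dtype=np.float64)
    M = np.empty_like(E)
    # Assumption: start the smoother from the first frame.
    M[0] = E[0]
    # First-order IIR smoothing along time, independently per mel band.
    for t in range(1, E.shape[0]):
        M[t] = (1.0 - s) * M[t - 1] + s * E[t]
    # Adaptive gain control followed by root compression.
    gain = (eps + M) ** (-alpha)
    return (E * gain + delta) ** r - delta ** r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_mel = rng.random((100, 40))  # placeholder data, not real audio
    print(pcen(fake_mel).shape)
```

In a learned variant such as the one the abstract describes, alpha, delta, r and possibly s would be trainable parameters of a network layer rather than fixed arguments.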
Author Rocamora, Martin
Cancela, Pablo
Zinemanas, Pablo
Author_xml – sequence: 1
  givenname: Pablo
  surname: Zinemanas
  fullname: Zinemanas, Pablo
  organization: Facultad de Ingenieria, Universidad de la República, Montevideo, Uruguay
– sequence: 2
  givenname: Pablo
  surname: Cancela
  fullname: Cancela, Pablo
  organization: Facultad de Ingenieria, Universidad de la República, Montevideo, Uruguay
– sequence: 3
  givenname: Martin
  surname: Rocamora
  fullname: Rocamora, Martin
  organization: Facultad de Ingenieria, Universidad de la República, Montevideo, Uruguay
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.23919/FRUCT.2019.8711906
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9789526865386
9526865391
9526865383
9789526865393
EISSN 2305-7254
EndPage 539
ExternalDocumentID 8711906
Genre orig-research
GroupedDBID 6IE
6IF
6IK
6IL
6IN
AAJGR
AAWTH
ABLEC
ADBBV
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BCNDV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
IPNFZ
OCL
RIE
RIG
RIL
ID FETCH-LOGICAL-i175t-38545a7409436b65bfd434506cb660b0daaf828c9e53bc6ece9f6f51ef37f7123
IEDL.DBID RIE
IngestDate Wed Aug 27 02:47:09 EDT 2025
IsPeerReviewed true
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i175t-38545a7409436b65bfd434506cb660b0daaf828c9e53bc6ece9f6f51ef37f7123
PageCount 7
ParticipantIDs ieee_primary_8711906
PublicationCentury 2000
PublicationDate 2019-April
PublicationDateYYYYMMDD 2019-04-01
PublicationDate_xml – month: 04
  year: 2019
  text: 2019-April
PublicationDecade 2010
PublicationTitle Proceedings of the XXth Conference of Open Innovations Association FRUCT
PublicationTitleAbbrev FRUCT
PublicationYear 2019
Publisher FRUCT
Publisher_xml – name: FRUCT
SSID ssj0002856997
Score 2.2119577
SourceID ieee
SourceType Publisher
StartPage 533
SubjectTerms Convolution
Event detection
Indexes
Neural networks
Task analysis
Training
Urban areas
Title End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
URI https://ieeexplore.ieee.org/document/8711906