End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
| Published in | Proceedings of the XXth Conference of Open Innovations Association FRUCT, pp. 533–539 |
|---|---|
| Main Authors | Zinemanas, Pablo; Cancela, Pablo; Rocamora, Martin |
| Format | Conference Proceeding | 
| Language | English | 
| Published | FRUCT, 2019-04-01 |
| ISSN | 2305-7254 | 
| DOI | 10.23919/FRUCT.2019.8711906 | 
| Abstract | We present a novel approach to sound event detection (SED) in urban environments using end-to-end convolutional neural networks (CNNs). It consists of a 1D CNN that extracts the energy in mel-frequency bands from the audio signal using a simple filter bank, followed by a 2D CNN for classification. The main goal of this two-stage architecture is to make the first layers of the network more interpretable and to allow them to be reused in other problems of the same domain. We present a novel model that computes the mel-spectrogram with a neural network and outperforms an existing approach in both simplicity and matching performance. We also implement a recently proposed method for normalizing the energy of the mel-spectrogram, per-channel energy normalization (PCEN), as a layer of the neural network. We show how the parameters of this normalization can be learned by the network and why this is useful for SED in urban environments. We study how training modifies the filter bank as well as the PCEN parameters. The resulting system achieves classification results comparable to the state of the art while using fewer parameters. |
| Affiliation | Pablo Zinemanas, Pablo Cancela, and Martin Rocamora: Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay |
| EISBN | 9789526865386, 9789526865393 (ISBN-10: 9526865383, 9526865391) |
| Peer reviewed | Yes |
| Subjects | Convolution; Event detection; Indexes; Neural networks; Task analysis; Training; Urban areas |
| URI | https://ieeexplore.ieee.org/document/8711906 | 
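
The abstract describes implementing per-channel energy normalization (PCEN) as a network layer with learnable parameters. For reference, PCEN in its usual formulation first smooths each band's energy over time, M(t, f) = (1 − s)·M(t − 1, f) + s·E(t, f), and then computes PCEN(t, f) = (E(t, f) / (ε + M(t, f))^α + δ)^r − δ^r. The following is a minimal forward-pass sketch in plain Python, not the authors' implementation: the function name `pcen`, the list-of-frames input layout, initializing the smoother at the first frame, and the default values for s, α, δ, r and ε are all assumptions made for illustration.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Forward pass of per-channel energy normalization (illustrative sketch).

    energy: list of frames; each frame is a list of non-negative mel-band
            energies E(t, f) for one time step t.
    Returns a list of frames of the same shape holding PCEN(t, f).
    The parameter defaults are assumed values, not taken from the paper.
    """
    # Assumed initialisation: start the smoother at the first frame,
    # so that M(0, f) = E(0, f).
    smoother = list(energy[0])
    output = []
    for frame in energy:
        normalized = []
        for f, e in enumerate(frame):
            # First-order IIR low-pass per band:
            # M(t, f) = (1 - s) * M(t-1, f) + s * E(t, f)
            smoother[f] = (1.0 - s) * smoother[f] + s * e
            # Adaptive gain control: divide by the smoothed energy raised to alpha
            agc = e / (eps + smoother[f]) ** alpha
            # Root compression with offset: (x + delta)^r - delta^r
            normalized.append((agc + delta) ** r - delta ** r)
        output.append(normalized)
    return output
```

Because the gain term divides each band by its own smoothed history, a stationary background maps to roughly the same output regardless of its absolute level (exactly so when α = 1), while sudden onsets stand out. In a trainable layer, s, α, δ and r (often one set per band) would be parameters updated by backpropagation rather than fixed constants; this sketch shows only the forward computation.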