Intelligent recognition of audio scene based on hybrid attention and parallel deep feature processing under genetic evolutionary computing

Bibliographic Details
Published in: Neural Computing & Applications, Vol. 35, No. 36, pp. 25013-25026
Main Authors: Li, Danyang; Jia, Chunlei
Format: Journal Article
Language: English
Published: London: Springer London, 01.12.2023 (Springer Nature B.V.)
ISSN: 0941-0643; 1433-3058
DOI: 10.1007/s00521-023-08351-0

Summary: Intelligent audio scene recognition aims to analyze the environmental information carried by an audio signal with a computer, which is of considerable research significance. Audio scene recognition methods extract features from the input acoustic feature representation and use these acoustic features to classify the scene type. The most common feature extraction method for audio signals is the Mel-frequency cepstral coefficient (MFCC). Although this method captures the most recognizable parts of the audio data, it only analyzes the short-term characteristics of the signal, which is often insufficient to fully describe the structural characteristics of the entire recording. With the development of computer technology and high-performance processors, deep learning allows audio scene recognition to model high-dimensional, multi-class, complex relationships. In this work, we propose an audio scene recognition network, called IGA-HA-CNN-BiGRU, that combines deep learning with a genetic algorithm. First, this work combines CNN and BiGRU networks to build a parallel deep feature extraction network; the parallel network has strong spatial and temporal feature learning ability and can effectively extract audio feature parameters. Second, this work combines time-domain attention with channel-domain attention to design a hybrid attention mechanism, which processes the extracted features to enhance the discriminability of the audio representation. Third, in view of the shortcomings of random initialization in deep neural networks, this work uses an improved genetic algorithm to optimize the initialization and improve model performance. Finally, extensive experiments on the proposed method demonstrate its reliability.
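
Since the abstract only describes the IGA-HA-CNN-BiGRU architecture in prose, the sketch below illustrates the general idea as a minimal PyTorch model: a parallel CNN + BiGRU feature extractor over an MFCC sequence, followed by a hybrid (channel-domain plus time-domain) attention block and a classifier. All layer sizes, the MFCC dimensionality, the attention formulation, and the class count are illustrative assumptions, not the authors' published implementation; the improved genetic algorithm used to optimize the initialization is not shown.

```python
# Minimal sketch of a parallel CNN + BiGRU audio scene classifier with hybrid attention.
# Hyperparameters and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    """Combine channel-domain and time-domain attention over a (batch, time, channels) tensor."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel attention: average over time, then re-weight channels.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )
        # Temporal attention: score each time step from its feature vector.
        self.time_fc = nn.Sequential(nn.Linear(channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        ch_weights = self.channel_fc(x.mean(dim=1))       # (B, C)
        x = x * ch_weights.unsqueeze(1)                   # re-weight channels
        t_weights = self.time_fc(x)                       # (B, T, 1)
        return x * t_weights                              # re-weight time steps


class ParallelCNNBiGRU(nn.Module):
    """Parallel CNN (spatial) and BiGRU (temporal) branches whose outputs are fused."""

    def __init__(self, n_mfcc: int = 40, n_classes: int = 10, hidden: int = 64):
        super().__init__()
        # CNN branch treats the (time, mfcc) matrix as a one-channel image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        # BiGRU branch runs over the MFCC frame sequence.
        self.bigru = nn.GRU(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.attn = HybridAttention(2 * hidden)
        self.classifier = nn.Linear(32 + 2 * hidden, n_classes)

    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:  # mfcc: (B, T, n_mfcc)
        cnn_feat = self.cnn(mfcc.unsqueeze(1)).flatten(1)    # (B, 32)
        gru_out, _ = self.bigru(mfcc)                        # (B, T, 2*hidden)
        attended = self.attn(gru_out).mean(dim=1)            # (B, 2*hidden)
        return self.classifier(torch.cat([cnn_feat, attended], dim=1))


if __name__ == "__main__":
    model = ParallelCNNBiGRU()
    dummy = torch.randn(8, 100, 40)   # batch of 8 clips, 100 frames, 40 MFCCs
    print(model(dummy).shape)         # torch.Size([8, 10])
```

In a genetic-algorithm-assisted variant of this sketch, the initial weights of such a model would be encoded as chromosomes and evolved before gradient training; the exact encoding and fitness function used by the paper are not specified in the abstract.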