Intelligent recognition of audio scene based on hybrid attention and parallel deep feature processing under genetic evolutionary computing
| Published in | Neural computing & applications Vol. 35; no. 36; pp. 25013 - 25026 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | London, Springer London, 01.12.2023, Springer Nature B.V |
| ISSN | 0941-0643 1433-3058 |
| DOI | 10.1007/s00521-023-08351-0 |
| Summary: | Intelligent audio scene recognition aims to use a computer to analyze the environmental information carried by an audio signal, and it has important research significance. Audio scene recognition methods extract features from the input acoustic representation and use those features to classify the scene type. A common feature extraction method for audio signals is Mel-frequency cepstral coefficients (MFCC). Although this method can capture the most recognizable part of the audio data, it can only analyze the short-term characteristics of the signal, which is often not enough to fully describe the structural characteristics of the entire recording. With the development of computer technology and high-performance processors, audio scene recognition via deep learning can model complex high-dimensional, multi-class relationships. In this work, we propose an audio scene recognition network, called IGA-HA-CNN-BiGRU, that combines deep learning with a genetic algorithm. First, this work combines CNN and BiGRU networks to build a parallel deep feature extraction network. The parallel network has a strong ability to learn spatial and temporal features and can effectively extract audio feature parameters. Second, this work combines time-domain attention with channel-domain attention to design a hybrid attention mechanism, which processes the features to make them more discriminative. Third, to address the weakness of deep neural network initialization, this work uses an improved genetic algorithm to optimize the initialization and improve model performance. Finally, this work reports various experiments on the proposed method, and the experimental results support its reliability. |
|---|---|
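
The summary describes a hybrid attention mechanism that combines time-domain and channel-domain attention, but this record does not give the formulation. The following is only a minimal illustrative sketch, assuming a feature map laid out as channels by frames, a sigmoid gate per channel, and a softmax over frames. The function name `hybrid_attention` and these weighting choices are hypothetical and are not taken from the article.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_attention(features):
    """Reweight a C x T feature map (list of channels, each a list of frames).

    Assumed scheme (not the article's): one weight per channel and one
    weight per frame, applied multiplicatively.
    """
    n_channels = len(features)
    n_frames = len(features[0])

    # Channel-domain attention: gate each channel by its mean activation.
    channel_w = [sigmoid(sum(channel) / n_frames) for channel in features]

    # Time-domain attention: softmax over the channel-averaged frame scores.
    frame_scores = [
        sum(features[c][t] for c in range(n_channels)) / n_channels
        for t in range(n_frames)
    ]
    time_w = softmax(frame_scores)

    # Combine both sets of weights on every element of the feature map.
    weighted = [
        [features[c][t] * channel_w[c] * time_w[t] for t in range(n_frames)]
        for c in range(n_channels)
    ]
    return weighted, channel_w, time_w


if __name__ == "__main__":
    demo = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
    out, cw, tw = hybrid_attention(demo)
    print("channel weights:", cw)
    print("time weights:", tw)
    print("weighted features:", out)
```

In a trained network the two sets of weights would normally come from learned layers rather than plain averages. The sketch only shows how the two attention axes can be combined on one feature map.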