Intelligent recognition of audio scene based on hybrid attention and parallel deep feature processing under genetic evolutionary computing

Bibliographic Details
Published in: Neural Computing & Applications, Vol. 35, No. 36, pp. 25013-25026
Main Authors: Li, Danyang; Jia, Chunlei
Format: Journal Article
Language: English
Published: London: Springer London, 01.12.2023 (Springer Nature B.V.)
ISSN: 0941-0643; 1433-3058
DOI: 10.1007/s00521-023-08351-0

Summary: Intelligent audio scene recognition aims to analyze the environmental information carried by an audio signal with a computer, which is of considerable research significance. Audio scene recognition methods extract features from the input acoustic feature representation and use these acoustic features to classify the scene type. The most common feature extraction method for audio signals is the Mel-frequency cepstral coefficient (MFCC). Although this method captures the most recognizable parts of the audio data, it only analyzes the short-term characteristics of the signal, which is often insufficient to fully describe the structural characteristics of the entire recording. With the development of computer technology and high-performance processors, deep learning allows audio scene recognition to model high-dimensional, multi-class, complex relationships. In this work, we propose an audio scene recognition network, called IGA-HA-CNN-BiGRU, that combines deep learning with a genetic algorithm. First, this work combines CNN and BiGRU networks to build a parallel deep feature extraction network; the parallel network has strong spatial and temporal feature learning ability and can effectively extract audio feature parameters. Second, this work combines time-domain attention with channel-domain attention to design a hybrid attention mechanism, which processes the extracted features to enhance the discriminability of the audio representation. Third, in view of the shortcomings of random initialization in deep neural networks, this work uses an improved genetic algorithm to optimize the initialization and improve model performance. Finally, extensive experiments on the proposed method demonstrate its reliability.
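
Since the abstract only describes the IGA-HA-CNN-BiGRU architecture in prose, the sketch below illustrates the general idea as a minimal PyTorch model: a parallel CNN + BiGRU feature extractor over an MFCC sequence, followed by a hybrid (channel-domain plus time-domain) attention block and a classifier. All layer sizes, the MFCC dimensionality, the attention formulation, and the class count are illustrative assumptions, not the authors' published implementation; the improved genetic algorithm used to optimize the initialization is not shown.

```python
# Minimal sketch of a parallel CNN + BiGRU audio scene classifier with hybrid attention.
# Hyperparameters and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    """Combine channel-domain and time-domain attention over a (batch, time, channels) tensor."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel attention: average over time, then re-weight channels.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )
        # Temporal attention: score each time step from its feature vector.
        self.time_fc = nn.Sequential(nn.Linear(channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        ch_weights = self.channel_fc(x.mean(dim=1))       # (B, C)
        x = x * ch_weights.unsqueeze(1)                   # re-weight channels
        t_weights = self.time_fc(x)                       # (B, T, 1)
        return x * t_weights                              # re-weight time steps


class ParallelCNNBiGRU(nn.Module):
    """Parallel CNN (spatial) and BiGRU (temporal) branches whose outputs are fused."""

    def __init__(self, n_mfcc: int = 40, n_classes: int = 10, hidden: int = 64):
        super().__init__()
        # CNN branch treats the (time, mfcc) matrix as a one-channel image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        # BiGRU branch runs over the MFCC frame sequence.
        self.bigru = nn.GRU(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.attn = HybridAttention(2 * hidden)
        self.classifier = nn.Linear(32 + 2 * hidden, n_classes)

    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:  # mfcc: (B, T, n_mfcc)
        cnn_feat = self.cnn(mfcc.unsqueeze(1)).flatten(1)    # (B, 32)
        gru_out, _ = self.bigru(mfcc)                        # (B, T, 2*hidden)
        attended = self.attn(gru_out).mean(dim=1)            # (B, 2*hidden)
        return self.classifier(torch.cat([cnn_feat, attended], dim=1))


if __name__ == "__main__":
    model = ParallelCNNBiGRU()
    dummy = torch.randn(8, 100, 40)   # batch of 8 clips, 100 frames, 40 MFCCs
    print(model(dummy).shape)         # torch.Size([8, 10])
```

In a genetic-algorithm-assisted variant of this sketch, the initial weights of such a model would be encoded as chromosomes and evolved before gradient training; the exact encoding and fitness function used by the paper are not specified in the abstract.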