A Re-thinking ASR Modeling Framework using Attention Mechanisms
| Published in | 2021 IEEE International Conference on Big Data (Big Data), pp. 4530-4536 |
|---|---|
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 15.12.2021 |
| DOI | 10.1109/BigData52589.2021.9671417 |
| Summary: | Several reasons have led to the widespread adoption of neural-based algorithms for end-to-end automatic speech recognition (ASR), including their high performance, elegant model designs, and parallel computing capabilities. Numerous ASR models have been proposed to improve the recognition results, but the gains are still insufficient. This paper proposes a re-thinking ASR model, which aims to bridge the gap by rethinking the regularities of a given hypothesis and the relationship between text-level and acoustic-level characteristics of the input speech utterance. For the re-thinking ASR model, a mixed attention mechanism, a self-and-mixed attention mechanism, and a deep acoustic feature extractor are meticulously designed to enable the notion to be realized. A publicly available benchmark corpus is used to evaluate the proposed model. As the experimental results demonstrate, the proposed re-thinking ASR model can provide significant and consistent improvements over popular baseline systems. |
|---|---|
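The abstract does not detail how the proposed mixed attention mechanism relates text-level and acoustic-level features. As a rough, illustrative sketch only (the function name, toy features, and dimensions below are hypothetical and not taken from the paper), one common way to "mix" the two modalities is cross-attention, where text-token queries attend over acoustic key/value frames:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query vector attends over all
    key/value pairs and returns a weighted mixture of the values."""
    d = len(keys[0])  # key dimension, used for score scaling
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the value vectors.
        ctx = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        out.append(ctx)
    return out

# Hypothetical toy example: 2 text-token queries attend over 3 acoustic frames.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
acoustic_keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
acoustic_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
mixed = cross_attention(text_queries, acoustic_keys, acoustic_values)
```

Here each text query ends up with a context vector dominated by the acoustic frames whose keys it most resembles; the paper's actual mechanism (and its self-and-mixed variant) may differ in detail.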