A Re-thinking ASR Modeling Framework using Attention Mechanisms


Bibliographic Details
Published in: 2021 IEEE International Conference on Big Data (Big Data), pp. 4530-4536
Main Authors: Yang, Chih-Ying; Chen, Kuan-Yu
Format: Conference Proceeding
Language: English
Published: IEEE, 15.12.2021
DOI: 10.1109/BigData52589.2021.9671417

More Information
Summary: Several reasons have led to the widespread adoption of neural-based algorithms for end-to-end automatic speech recognition (ASR), including their high performance, elegant model designs, and parallel computing capabilities. Numerous ASR models have been proposed to improve recognition results, but the gains remain insufficient. This paper proposes a re-thinking ASR model, which aims to bridge the gap by reconsidering the regularities of a given hypothesis and the relationship between the text-level and acoustic-level characteristics of the input speech utterance. To realize this idea, a mixed attention mechanism, a self-and-mixed attention mechanism, and a deep acoustic feature extractor are meticulously designed. A publicly available benchmark corpus is used to evaluate the proposed model. The experimental results demonstrate that the re-thinking ASR model provides significant and consistent improvements over popular baseline systems.
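
The summary names the components but not their exact formulation, so the sketch below is only a rough illustration of the mixed-attention idea: text-level (hypothesis) queries attending over a memory that combines text-level and acoustic-level representations. The class name MixedAttention, the PyTorch dependency, and the tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    """Hypothetical sketch: hypothesis (text-level) queries attend over a
    concatenation of text-level and acoustic-level memories, so each token
    of a first-pass hypothesis can be re-checked against both modalities."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hyp_emb: torch.Tensor, acoustic_feats: torch.Tensor) -> torch.Tensor:
        # hyp_emb:        (batch, hyp_len, d_model)  embedded hypothesis tokens
        # acoustic_feats: (batch, frames,  d_model)  acoustic encoder outputs
        # Mixed memory: text-level and acoustic-level features along the time axis.
        memory = torch.cat([hyp_emb, acoustic_feats], dim=1)
        out, _ = self.attn(query=hyp_emb, key=memory, value=memory)
        return out


# Usage example with random tensors: refine a 10-token hypothesis
# against 50 acoustic frames for a batch of 2 utterances.
mixed = MixedAttention(d_model=256)
hyp = torch.randn(2, 10, 256)
feats = torch.randn(2, 50, 256)
refined = mixed(hyp, feats)   # shape: (2, 10, 256)
```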