LAS-Transformer: An Enhanced Transformer Based on the Local Attention Mechanism for Speech Recognition
Authors
Abstract
Recently, Transformer-based models have shown promising results in automatic speech recognition (ASR), outperforming models based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs). However, directly applying a Transformer to the ASR task does not exploit the correlation among speech frames effectively, leaving the model trapped in a sub-optimal solution. To this end, we propose a local attention Transformer for speech recognition that exploits the high correlation among frames. Specifically, we first use relative positional embedding, rather than absolute positional embedding, to improve the generalization of the model to sequences of different lengths. Secondly, we add parametric local relations to the self-attention module to explicitly incorporate prior knowledge into it, which makes the training process insensitive to hyperparameters and thus improves performance. Experiments carried out on the LibriSpeech dataset show that our proposed approach achieves a word error rate of 2.3%/5.5% with language model fusion, without any external data, and reduces the word error rate by 17.8%/9.8% compared to the baseline. The results are also close to, or better than, other state-of-the-art end-to-end models.
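The two modifications the abstract names, a relative positional bias and a parametric locality prior added to the self-attention scores, can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the Gaussian form of the locality term, the single-head layout, and all names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(q, k, v, rel_bias, sigma):
    """Single-head self-attention with a relative-position bias and a
    parametric (Gaussian) locality prior on the attention scores.

    q, k, v  : (T, d) query/key/value matrices for one sequence
    rel_bias : (2T - 1,) learned bias indexed by relative offset i - j
    sigma    : width of the locality prior (a learnable scalar in spirit)
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # content term
    # relative positional bias: offset i - j ranges over -(T-1)..T-1
    idx = np.arange(T)[:, None] - np.arange(T)[None, :]
    scores = scores + rel_bias[idx + T - 1]
    # locality prior: quadratic penalty on frame distance |i - j|
    scores = scores - idx.astype(float) ** 2 / (2.0 * sigma ** 2)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

With a small sigma the prior dominates and each frame attends mostly to its neighbours, which is the intended bias toward locally correlated speech frames.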
Related Resources
Local Monotonic Attention Mechanism for End-to-End Speech Recognition
Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism, which allows the model to learn alignments between the source and the target sequence. Most attentional mechanisms used today are based on a global attention property, which requires a computation of a weighted summarization of the who...
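The "weighted summarization" that global attention computes over the whole encoder sequence can be sketched as below. This is a minimal NumPy sketch using dot-product scoring (one of several common scoring functions); all names and dimensions are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(dec_state, enc_states):
    """Global attention: score every encoder state against the current
    decoder state, then return the alignment-weighted sum (context vector).

    dec_state  : (d,)   current decoder hidden state
    enc_states : (T, d) all encoder hidden states
    """
    scores = enc_states @ dec_state    # (T,) dot-product alignment scores
    alpha = softmax(scores)            # weights over ALL source frames
    context = alpha @ enc_states       # weighted summarization, shape (d,)
    return context, alpha
```

The cost of this summarization over the full source sequence at every decoding step is what motivates local (and monotonic) variants that restrict attention to a window.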
Tied Spatial Transformer Networks for Character Recognition
This paper reports a new approach applied to convolutional neural networks (CNNs), which uses spatial transformer networks (STNs). It consists in training an architecture which combines a localization CNN and a classification CNN, for which most of the weights are tied, which from here on we will name Tied Spatial Transformer Networks (TSTNs). The localization CNN is used for predicting the bes...
Developing a Pattern Based on Speech Acts and Language Functions for Developing Materials for the Course "The Study of Islamic Texts Translation"
The aim of the present study is to propose a model based on speech acts and language functions for developing materials for the course "The Study of Islamic Texts Translation". In the new model, in order to develop better and more engaging materials, and unlike existing books, Austin's (1962) model of speech levels, Searle's (1976) classification of speech acts, and Halliday's (1978) language functions were employed. For this purpose, 57 verses were randomly selected from different parts of the Qur'an...
Laboratory Current Transformer Based on Rogowski Coil
This paper covers the analysis and construction of a current-to-voltage transducer based on a Rogowski coil which satisfies the requirements of high-accuracy measurement of AC current (up to 20 A at power supply frequency, with a target uncertainty of 100 parts per million). The primary source of AC current measurement uncertainty for this type of transducer is the nonuniform density of turns which, in case of ...
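For an ideal Rogowski coil the output voltage follows v(t) = -M dI/dt, so a sinusoidal primary current with peak I0 at frequency f induces a peak voltage of M · 2πf · I0, with mutual inductance M = μ0 · n · A for turn density n and turn area A. A small sketch under these idealized assumptions (uniform turn density; all numeric values illustrative, not taken from the paper):

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m

def rogowski_peak_voltage(turns_per_m, turn_area_m2, i_peak_a, freq_hz):
    """Peak induced voltage of an ideal Rogowski coil for a sinusoidal
    primary current: v_peak = M * 2*pi*f * I_peak, with M = mu0 * n * A."""
    m = MU0 * turns_per_m * turn_area_m2   # mutual inductance, H
    return m * 2.0 * math.pi * freq_hz * i_peak_a

# e.g. 1000 turns/m, 1 cm^2 turn area, 20 A peak at 50 Hz
v_peak = rogowski_peak_voltage(1000, 1e-4, 20.0, 50.0)
```

The sub-millivolt output this yields for the paper's 20 A range illustrates why such transducers need careful integration and calibration, and why turn-density nonuniformity matters at the 100 ppm level.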
An Object Oriented Model Transformer Framework based on Stereotypes
MDA modelers, like programmers in general, will develop and reuse libraries. Some of these libraries will hide details of the platforms, so the mapping from a PIM to a PSM will have to transform libraries as well. Some libraries provide common object services while others provide domain specific functionalities. These libraries will not just be class libraries, but also profiles containing ster...
Journal
Journal title: Information
Year: 2022
ISSN: 2078-2489
DOI: https://doi.org/10.3390/info13050250