Search results for: task based speech

Number of results: 3,190,162

2002
Stephen J. Melnikoff Steven F. Quigley Martin J. Russell

Speech recognition is a computationally demanding task, particularly the stage that uses Viterbi decoding for converting pre-processed speech data into words or sub-word units. We present an FPGA implementation of a decoder based on continuous hidden Markov models (HMMs) representing monophones, and demonstrate that it can process speech 75 times faster than real time, using 45% of the slices of a Xili...
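The Viterbi decoding step this abstract refers to can be sketched in software over a toy discrete-observation HMM; the two states, probabilities, and observation symbols below are illustrative assumptions, not the paper's monophone models, and the FPGA implementation itself is not reproduced here:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for obs (log-domain DP)."""
    # V[t][s]: best log-probability of any state path ending in s at time t.
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            best = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            V[t][s] = V[t - 1][best] + log_trans[best][s] + log_emit[s][obs[t]]
            back[t][s] = best
    # Trace back the best path from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy two-state model (illustrative numbers only).
lg = math.log
states = ["h1", "h2"]
log_start = {"h1": lg(0.6), "h2": lg(0.4)}
log_trans = {"h1": {"h1": lg(0.7), "h2": lg(0.3)},
             "h2": {"h1": lg(0.4), "h2": lg(0.6)}}
log_emit = {"h1": {"a": lg(0.9), "b": lg(0.1)},
            "h2": {"a": lg(0.2), "b": lg(0.8)}}
best_path = viterbi(["a", "a", "b"], states, log_start, log_trans, log_emit)
```

Working in the log domain avoids numerical underflow over long observation sequences, which is also why hardware decoders typically accumulate log-likelihoods.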

2007
Gernot Kubin Harald Trost

This thesis investigates the application of language models based on semantic similarity to Automatic Speech Recognition for meetings. We consider data-driven models based on Latent Semantic Analysis (LSA) and knowledge-driven WordNet-based models. The LSA-based models are trained for several background domains, and it is shown that all background models reduce perplexity compared to the n-gram...

2017
Suman Samui Indrajit Chakrabarti Soumya K. Ghosh

This paper presents a single-channel speech separation method implemented with a deep recurrent neural network (DRNN) using recurrent temporal restricted Boltzmann machines (RTRBM). Although deep neural network (DNN) based speech separation (denoising task) methods perform quite well compared to the conventional statistical model based speech enhancement techniques, in DNN-based methods, the te...

2015
Zhuo Chen Shinji Watanabe Hakan Erdogan John R. Hershey

Long Short-Term Memory (LSTM) recurrent neural networks have proven effective in modeling speech and have achieved outstanding performance in both speech enhancement (SE) and automatic speech recognition (ASR). To further improve the performance of noise-robust speech recognition, a combination of speech enhancement and recognition was shown to be promising in earlier work. This paper aims to expl...

2015
Michal Borsky Petr Mizera Petr Pollák

The performance of speech recognition systems can be significantly degraded if the speech spectrum is distorted. This includes situations such as the use of an improper recording device, enhancement technique, or speech coder. This paper presents a front-end compensation method called spectrally selective dithering, aimed at reconstructing the spectral characteristics of nonlinearly distorted s...

2015
Andrew L. Maas Ziang Xie Daniel Jurafsky Andrew Y. Ng

We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure. This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding ...
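The decoding procedure this abstract describes, per-frame character probabilities rescored by a character-level language model under beam search, can be sketched as follows; the frames, characters, interpolation weight, and the trivial uniform language model are illustrative toys, not the authors' actual models:

```python
import math

def beam_search(frame_logprobs, lm_logprob, beam_width=3, alpha=0.5):
    """Decode a character string from per-frame character log-probabilities.

    frame_logprobs: list of {char: acoustic log-prob} dicts, one per frame.
    lm_logprob: scores a character prefix under a language model.
    alpha: language-model interpolation weight (illustrative value).
    """
    beams = [("", 0.0)]  # (prefix, combined score)
    for frame in frame_logprobs:
        candidates = []
        for prefix, score in beams:
            for ch, lp in frame.items():
                new = prefix + ch
                candidates.append((new, score + lp + alpha * lm_logprob(new)))
        # Keep only the top-scoring prefixes.
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
    return beams[0][0]

# Toy acoustic output for three frames (illustrative numbers only).
frames = [{"c": math.log(0.9), "k": math.log(0.1)},
          {"a": math.log(0.6), "o": math.log(0.4)},
          {"t": math.log(0.7), "b": math.log(0.3)}]
decoded = beam_search(frames, lm_logprob=lambda s: 0.0)  # uniform "LM"
```

A real character-level system would additionally handle blank/repeat symbols (as in CTC) and an actual trained language model; this sketch only shows the prefix-beam bookkeeping.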

2014
Hazrat Ali Nasir Ahmad Xianwei Zhou Khalid Iqbal Sahibzada Muhammad Ali

This paper presents work on Automatic Speech Recognition of the Urdu language, using a comparative analysis of Discrete Wavelet Transform (DWT) based features and Mel Frequency Cepstral Coefficients (MFCC). These features have been extracted for one hundred isolated words of Urdu, each word uttered by ten different speakers. The words have been selected from the most frequently used words of ...
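As a toy illustration of the wavelet side of this comparison, a single-level Haar decomposition splits a frame of samples into approximation (low-pass) and detail (high-pass) coefficients; the Haar wavelet is chosen here only for simplicity, since the abstract does not state which wavelet family the authors used:

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: pairwise sums/differences, scaled by 1/sqrt(2)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

# A constant-pairs toy frame: all energy lands in the approximation band.
approx, detail = haar_dwt([1.0, 1.0, 2.0, 2.0])
```

DWT-based speech features typically recurse this split on the approximation band and derive statistics (e.g. sub-band energies) per level, while MFCCs instead use a mel filterbank over the Fourier spectrum.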

Background: Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is, however, known about localization training vis-à-vis speech perception in noise based on interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program in speech-in-noise perception a...

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

The cross-speaker emotion transfer task in text-to-speech (TTS) synthesis aims to synthesize speech for a target speaker with the emotion transferred from a reference recording by another (source) speaker. During this process, identity information of the source speaker could also affect the synthesized results, leading to a speaker-leakage issue, i.e., the synthetic speech may have the source speaker's voice rather than the target's. This paper proposes a new method aim ...

Journal: Cognition, 2005
Juan M Toro Scott Sinnett Salvador Soto-Faraco

We addressed the hypothesis that word segmentation based on statistical regularities occurs without the need for attention. Participants were presented with a stream of artificial speech in which the only cue for extracting the words was the presence of statistical regularities between syllables. Half of the participants were asked to passively listen to the speech stream, while the other half were ...
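The statistical regularity exploited in such streams is usually formalized as the transitional probability between adjacent syllables, which is high inside words and dips at word boundaries. A minimal sketch of this segmentation principle follows; the syllables, word inventory, and threshold are illustrative assumptions, not the study's actual materials:

```python
from collections import Counter

def transition_probs(syllables):
    """Estimate P(next | current) from bigram counts over the stream."""
    pairs = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])
    return {(a, b): c / firsts[a] for (a, b), c in pairs.items()}

def segment(syllables, tps, threshold=0.9):
    """Insert a word boundary wherever transitional probability dips."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy stream built from two nonsense words, "tupiro" and "golabu".
stream = (["tu", "pi", "ro"] * 2 + ["go", "la", "bu"]
          + ["tu", "pi", "ro"] + ["go", "la", "bu"] * 2)
words = segment(stream, transition_probs(stream))
```

Within-word transitions here always have probability 1.0, while cross-word transitions are lower, so thresholding recovers the word boundaries without any pause or stress cues.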

[Chart: number of search results per publication year]