Search results for: task based speech

Number of results: 3,190,162

2002
Stephen J. Melnikoff Steven F. Quigley Martin J. Russell

Speech recognition is a computationally demanding task, particularly the stage that uses Viterbi decoding for converting pre-processed speech data into words or sub-word units. We present an FPGA implementation of a decoder based on continuous hidden Markov models (HMMs) representing monophones, and demonstrate that it can process speech 75 times faster than real time, using 45% of the slices of a Xili...
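The Viterbi decoding step this abstract refers to can be sketched in software over a toy discrete-observation HMM; the two states, probabilities, and observation symbols below are illustrative assumptions, not the paper's monophone models, and the FPGA implementation itself is not reproduced here:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for obs (log-domain DP)."""
    # V[t][s]: best log-probability of any state path ending in s at time t.
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            best = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            V[t][s] = V[t - 1][best] + log_trans[best][s] + log_emit[s][obs[t]]
            back[t][s] = best
    # Trace back the best path from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy two-state model (illustrative numbers only).
lg = math.log
states = ["h1", "h2"]
log_start = {"h1": lg(0.6), "h2": lg(0.4)}
log_trans = {"h1": {"h1": lg(0.7), "h2": lg(0.3)},
             "h2": {"h1": lg(0.4), "h2": lg(0.6)}}
log_emit = {"h1": {"a": lg(0.9), "b": lg(0.1)},
            "h2": {"a": lg(0.2), "b": lg(0.8)}}
best_path = viterbi(["a", "a", "b"], states, log_start, log_trans, log_emit)
```

Working in the log domain avoids numerical underflow over long observation sequences, which is also why hardware decoders typically accumulate log-likelihoods.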

2007
Gernot Kubin Harald Trost

This thesis investigates the application of language models based on semantic similarity to Automatic Speech Recognition for meetings. We consider data-driven models based on Latent Semantic Analysis (LSA) and knowledge-driven WordNet-based models. The LSA-based models are trained for several background domains, and it is shown that all background models reduce perplexity compared to the n-gram...

2017
Suman Samui Indrajit Chakrabarti Soumya K. Ghosh

This paper presents a single-channel speech separation method implemented with a deep recurrent neural network (DRNN) using recurrent temporal restricted Boltzmann machines (RTRBM). Although deep neural network (DNN) based speech separation (denoising task) methods perform quite well compared to the conventional statistical model based speech enhancement techniques, in DNN-based methods, the te...

2015
Zhuo Chen Shinji Watanabe Hakan Erdogan John R. Hershey

Long Short-Term Memory (LSTM) recurrent neural networks have proven effective in modeling speech and have achieved outstanding performance in both speech enhancement (SE) and automatic speech recognition (ASR). To further improve the performance of noise-robust speech recognition, a combination of speech enhancement and recognition was shown to be promising in earlier work. This paper aims to expl...

2015
Michal Borsky Petr Mizera Petr Pollák

The performance of speech recognition systems can be significantly degraded if the speech spectrum is distorted. This includes situations such as the use of an improper recording device, enhancement technique, or speech coder. This paper presents a front-end compensation method called spectrally selective dithering, aimed at reconstructing the spectral characteristics of nonlinearly distorted s...

2015
Andrew L. Maas Ziang Xie Daniel Jurafsky Andrew Y. Ng

We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure. This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding ...
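The decoding procedure this abstract describes, per-frame character probabilities rescored by a character-level language model under beam search, can be sketched as follows; the frames, characters, interpolation weight, and the trivial uniform language model are illustrative toys, not the authors' actual models:

```python
import math

def beam_search(frame_logprobs, lm_logprob, beam_width=3, alpha=0.5):
    """Decode a character string from per-frame character log-probabilities.

    frame_logprobs: list of {char: acoustic log-prob} dicts, one per frame.
    lm_logprob: scores a character prefix under a language model.
    alpha: language-model interpolation weight (illustrative value).
    """
    beams = [("", 0.0)]  # (prefix, combined score)
    for frame in frame_logprobs:
        candidates = []
        for prefix, score in beams:
            for ch, lp in frame.items():
                new = prefix + ch
                candidates.append((new, score + lp + alpha * lm_logprob(new)))
        # Keep only the top-scoring prefixes.
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
    return beams[0][0]

# Toy acoustic output for three frames (illustrative numbers only).
frames = [{"c": math.log(0.9), "k": math.log(0.1)},
          {"a": math.log(0.6), "o": math.log(0.4)},
          {"t": math.log(0.7), "b": math.log(0.3)}]
decoded = beam_search(frames, lm_logprob=lambda s: 0.0)  # uniform "LM"
```

A real character-level system would additionally handle blank/repeat symbols (as in CTC) and an actual trained language model; this sketch only shows the prefix-beam bookkeeping.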

2014
Hazrat Ali Nasir Ahmad Xianwei Zhou Khalid Iqbal Sahibzada Muhammad Ali

This paper presents work on Automatic Speech Recognition of the Urdu language, using a comparative analysis of Discrete Wavelet Transform (DWT) based features and Mel Frequency Cepstral Coefficients (MFCC). These features have been extracted for one hundred isolated words of Urdu, each word uttered by ten different speakers. The words have been selected from the most frequently used words of ...
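As a toy illustration of the wavelet side of this comparison, a single-level Haar decomposition splits a frame of samples into approximation (low-pass) and detail (high-pass) coefficients; the Haar wavelet is chosen here only for simplicity, since the abstract does not state which wavelet family the authors used:

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: pairwise sums/differences, scaled by 1/sqrt(2)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

# A constant-pairs toy frame: all energy lands in the approximation band.
approx, detail = haar_dwt([1.0, 1.0, 2.0, 2.0])
```

DWT-based speech features typically recurse this split on the approximation band and derive statistics (e.g. sub-band energies) per level, while MFCCs instead use a mel filterbank over the Fourier spectrum.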

Background: Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is, however, known about localization training vis-à-vis speech perception in noise based on interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program in speech-in-noise perception a...

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

The cross-speaker emotion transfer task in text-to-speech (TTS) synthesis aims to synthesize speech for a target speaker with the emotion transferred from a reference recording by another (source) speaker. During this process, identity information of the source speaker could also affect the synthesized results, leading to a speaker-leakage issue, i.e., the synthetic speech may have the source speaker's voice rather than the target's. This paper proposes a new method aim ...

Journal: Cognition, 2005
Juan M Toro Scott Sinnett Salvador Soto-Faraco

We addressed the hypothesis that word segmentation based on statistical regularities occurs without the need for attention. Participants were presented with a stream of artificial speech in which the only cue for extracting the words was the presence of statistical regularities between syllables. Half of the participants were asked to passively listen to the speech stream, while the other half were ...
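The statistical regularity exploited in such streams is usually formalized as the transitional probability between adjacent syllables, which is high inside words and dips at word boundaries. A minimal sketch of this segmentation principle follows; the syllables, word inventory, and threshold are illustrative assumptions, not the study's actual materials:

```python
from collections import Counter

def transition_probs(syllables):
    """Estimate P(next | current) from bigram counts over the stream."""
    pairs = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])
    return {(a, b): c / firsts[a] for (a, b), c in pairs.items()}

def segment(syllables, tps, threshold=0.9):
    """Insert a word boundary wherever transitional probability dips."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy stream built from two nonsense words, "tupiro" and "golabu".
stream = (["tu", "pi", "ro"] * 2 + ["go", "la", "bu"]
          + ["tu", "pi", "ro"] + ["go", "la", "bu"] * 2)
words = segment(stream, transition_probs(stream))
```

Within-word transitions here always have probability 1.0, while cross-word transitions are lower, so thresholding recovers the word boundaries without any pause or stress cues.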

[Chart: number of search results per publication year]