Large vocabulary children's speech recognition with DNN-HMM and SGMM acoustic modeling
نویسندگان
چکیده
In this paper, large vocabulary children’s speech recognition is investigated by using the Deep Neural Network Hidden Markov Model (DNN-HMM) hybrid and the Subspace Gaussian Mixture Model (SGMM) acoustic modeling approach. In the investigated scenario training data is limited to about 7 hours of speech from children in the age range 7-13 and testing data consists in read clean speech from children in the same age range. To tackle inter-speaker acoustic variability, speaker adaptive training, based on feature space maximum likelihood linear regression, as well as vocal tract length normalization are adopted. Experimental results show that with both DNNHMM and SGMM systems very good recognition results can be achieved although best results are obtained with the DNNHMM system.
منابع مشابه
Tper Hcaeser Pidi Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios
This paper describes experimental results of applying Subspace Gaussian Mixture Models (SGMMs) in two completely diverse acoustic scenarios: (a) for Large Vocabulary Continuous Speech Recognition (LVCSR) task over (well-resourced) English meeting data and, (b) for acoustic modeling of underresourced Afrikaans telephone data. In both cases, the performance of SGMM models is compared with a conve...
متن کاملA study on LVCSR and keyword search for tagalog
We describe a state-of-the-art large vocabulary continuous speech recognition (LVCSR) and keyword search (KWS) system trained on roughly 70 hours of conversational telephone speech. Using the Kaldi speech recognition toolkit, we investigate several aspects: for the acoustic front-end, we analyze the use of mel-frequency cepstral coefficients (MFCC), pitch and probability-of-voicing (PoV), and d...
متن کاملA scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR
We present a new scalable approach to using deep neural network (DNN) derived features in Gaussian mixture density hidden Markov model (GMM-HMM) based acoustic modeling for large vocabulary continuous speech recognition (LVCSR). The DNN-based feature extractor is trained from a subset of training data to mitigate the scalability issue of DNN training, while GMM-HMMs are trained by using state-o...
متن کاملComparing computation in Gaussian mixture and neural network based large-vocabulary speech recognition
In this paper we look at real-time computing issues in large vocabulary speech recognition. We use the French broadcast audio transcription task from ETAPE 2011 for this evaluation. We compare word error rate (WER) versus overall computing time for hidden Markov models with Gaussian mixtures (GMM-HMM) and deep neural networks (DNN-HMM). We show that for a similar computing during recognition, t...
متن کاملBuilding DNN acoustic models for large vocabulary speech recognition
Understanding architectural choices for deep neural networks (DNNs) is crucial to improving state-of-the-art speech recognition systems. We investigate which aspects of DNN acoustic model design are most important for speech recognition system performance, focusing on feed-forward networks. We study the effects of parameters like model size (number of layers, total parameters), architecture (co...
متن کامل