phoneme classification

Modular neural networks for low-complex phoneme recognition

1998

Axel Glaeser

We present a Modular Neural Network (MNN) for phoneme recognition within the framework of a hybrid system (neural networks and HMMs) for speaker-independent single word recognition. With this approach, we take computational effort into account as an additional criterion for assessing system performance. The main idea of the proposed MNN is the distribution of the comp...

Vocabulary Independent OOV D Vector Mach

2002

Tommi Lahti

In this paper, a novel Out-of-Vocabulary (OOV) word detection method relying on phoneme-level acoustic measures and Support Vector Machines (SVM) is proposed. Word-level OOV scores are computed from the phoneme-level in-vocabulary (IV) and OOV information provided by an HMM-based speech recognizer. The OOV word decision is based on the confidence feature vector which is processed by an SVM class...

Improved Tonal Language Speech Recognition by Integrating Spectro-Temporal Evidence and Pitch Information with Properly Chosen Tonal Acoustic Units

2011

Shang-wen Li, Yow-Bang Wang, Liang-Che Sun, Lin-Shan Lee

We propose an improved Tandem system for tonal language speech recognition. Three different types of features, cepstral, spectro-temporal and pitch features, are integrated for modeling tone and phoneme variation simultaneously. Tonal phonemes (or tonemes) are used for MLP posterior estimation, and tonal acoustic units for HMM recognition. In our experiments conducted on Mandarin broadcast news...

Applying Representative Uninorms for Phonetic Classifier Combination

2014

Gábor Gosztolya, József Dombi

When combining classifiers, we aggregate the output of different machine learning methods, and base our decision on the aggregated probability values instead of the individual ones. In the phoneme classification task of speech recognition, small excerpts of speech need to be identified as one of the pre-defined phonemes; but the probability value assigned to each possible phoneme also holds valu...
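
To illustrate the idea of deciding on aggregated rather than individual probabilities, here is a minimal sketch. It assumes the 3-Pi operator, U(x, y) = xy / (xy + (1-x)(1-y)), a well-known uninorm with neutral element 0.5; the truncated abstract does not say which uninorm family the authors use, so this choice, the function names `three_pi` and `combine`, and the example posteriors are all illustrative assumptions.

```python
from functools import reduce


def three_pi(x, y):
    """3-Pi uninorm on [0, 1]; 0.5 is its neutral element (assumed operator)."""
    num = x * y
    den = num + (1.0 - x) * (1.0 - y)
    if den == 0.0:
        # Total conflict (one input is 0, the other 1): use the conjunctive convention.
        return 0.0
    return num / den


def combine(posteriors):
    """Aggregate per-classifier posteriors phoneme by phoneme.

    posteriors: one list of class probabilities per classifier (hypothetical layout).
    Returns the index of the winning phoneme and the aggregated scores.
    """
    n_classes = len(posteriors[0])
    scores = [
        reduce(three_pi, (p[k] for p in posteriors)) for k in range(n_classes)
    ]
    best = max(range(n_classes), key=scores.__getitem__)
    return best, scores


# Two hypothetical classifiers scoring three candidate phonemes.
best, scores = combine([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
```

Values above 0.5 reinforce each other and values below 0.5 pull the aggregate down, which is the property that distinguishes a uninorm from a plain product or average.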

Applying Many-to-Many Alignments and Hidden Markov Models to Letter-to-Phoneme Conversion

2007

Sittichai Jiampojamarn, Grzegorz Kondrak, Tarek Sherif

Letter-to-phoneme conversion generally requires aligned training data of letters and phonemes. Typically, the alignments are limited to one-to-one alignments. We present a novel technique of training with many-to-many alignments. A letter chunking bigram prediction manages double letters and double phonemes automatically as opposed to preprocessing with fixed lists. We also apply an HMM method ...

Recognition of Greek Phonemes Using Support Vector Machines

2006

Iosif Mporas, Todor Ganchev, Panagiotis Zervas, Nikos Fakotakis

In the present work we study the applicability of Support Vector Machines (SVMs) to the phoneme recognition task. Specifically, the Least Squares version of the algorithm (LS-SVM) is employed to recognize Greek phonemes in the framework of a telephone-driven voice-enabled information service. The N-best candidate phonemes are identified and subsequently fed to the speech and language re...

Improving the Robustness of Phoneme Classification Using Hybrid Features

2010

Jibran Yousafzai, Zoran Cvetković, Peter Sollich

This work is concerned with improving the robustness of phoneme classification to additive noise with hybrid features using support vector machines (SVMs). In particular, the cepstral features are combined with local energy features of acoustic waveform segments to form a hybrid representation. The local energy features are taken into account separately in the SVM kernel, and a simple subtracti...

Topic Identification for Speech Without ASR

2017

Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur

Modern topic identification (topic ID) systems for speech use automatic speech recognition (ASR) to produce speech transcripts, and perform supervised classification on such ASR outputs. However, under resource-limited conditions, the manually transcribed speech required to develop standard ASR systems can be severely limited or unavailable. In this paper, we investigate alternative unsupervise...

Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors

2013

Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura

The current state-of-the-art approach in grapheme-to-phoneme (g2p) conversion is structured learning based on the Margin Infused Relaxed Algorithm (MIRA), which is an online discriminative training method for multiclass classification. However, it is known that the aggressive weight update method of MIRA is prone to overfitting, even if the current example is an outlier or noisy. Adaptive Regul...
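
As a rough illustration of how adaptive regularization of weight vectors (AROW) tempers its updates, the sketch below shows one update step of the standard binary AROW algorithm. This is not the structured g2p variant the paper studies; the function name `arow_update`, the list-based data layout, and the default `r=1.0` are assumptions made for the example.

```python
def arow_update(w, sigma, x, y, r=1.0):
    """One binary AROW step (illustrative sketch, not the paper's structured version).

    w:     mean weight vector (list of floats)
    sigma: covariance matrix, i.e. per-feature confidence (list of lists, symmetric)
    x:     feature vector (list of floats)
    y:     label, +1 or -1
    r:     regularization parameter (assumed default)
    """
    n = len(w)
    # sx = sigma @ x
    sx = [sum(sigma[i][j] * x[j] for j in range(n)) for i in range(n)]
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin >= 1.0:
        return w, sigma  # margin already satisfied: no update
    v = sum(xi * si for xi, si in zip(x, sx))  # variance of the margin, x^T sigma x
    beta = 1.0 / (v + r)
    alpha = (1.0 - margin) * beta
    # Move the mean in proportion to the remaining uncertainty along x.
    new_w = [wi + alpha * y * si for wi, si in zip(w, sx)]
    # Shrink the covariance so later updates along x become smaller.
    new_sigma = [
        [sigma[i][j] - beta * sx[i] * sx[j] for j in range(n)] for i in range(n)
    ]
    return new_w, new_sigma


w, sigma = arow_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], 1)
```

Because the step size scales with the remaining variance rather than forcing the margin constraint to hold exactly, a single noisy example moves the weights less than it would under an aggressive MIRA-style update.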

A Pointwise Approach to Pronunciation Estimation for a TTS Front-End

2011

Shinsuke Mori, Graham Neubig

In this paper, we propose a pointwise approach to the Japanese TTS front-end. In this approach, phoneme sequence estimation of sentences is decomposed into two tasks: word segmentation of the input sentence and phoneme estimation of each word. Then these two tasks are solved by pointwise classifiers without referring to the neighboring classification results. In contrast to existing sequence-ba...
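
The pointwise idea for the segmentation step can be sketched as follows: each candidate word boundary is scored from the input characters alone, never from the decisions made at neighboring boundaries. The feature templates, the linear scoring rule, and the names `boundary_features` and `segment` are hypothetical; the abstract does not specify the classifier or its features, and the per-word phoneme estimation step is not shown.

```python
def boundary_features(text, i):
    """Features for the boundary between text[i-1] and text[i].

    Only input characters are used; no neighboring decisions (hypothetical templates).
    """
    left, right = text[i - 1], text[i]
    return [f"L:{left}", f"R:{right}", f"LR:{left}{right}"]


def segment(text, weights, bias=0.0):
    """Split text into words with independent, per-boundary linear decisions."""
    words, start = [], 0
    for i in range(1, len(text)):
        score = bias + sum(weights.get(f, 0.0) for f in boundary_features(text, i))
        if score > 0:  # this decision does not depend on any other boundary
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


# Toy weights: a boundary is placed only between "a" and "b".
words = segment("aabba", {"LR:ab": 1.0})
```

Since no decision feeds into another, the boundaries could be evaluated in any order, which is what separates this setup from sequence-based models.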
