Connected Text Recognition Using Layered HMMs and Token Passing
Abstract
We present a novel approach to lexical error recovery on textual input. We have implemented a robust tokenizer that not only corrects spelling mistakes but also recovers from segmentation errors. In addition to orthographic evidence, the tokenizer uses linguistic expectations extracted from a training corpus. The idea is to arrange Hidden Markov Models (HMMs) in multiple layers, where the HMMs in each layer are responsible for a different aspect of processing the input. We report on experimental evaluations with alternative probabilistic language models to guide the lexical error recovery process.
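As a rough illustration only, the sketch below shows a Viterbi-style token-passing search that recovers missing word boundaries. It is not the paper's layered-HMM tokenizer: it does no spelling correction, it uses a single layer, and the lexicon, its probabilities, and the names `LEXICON` and `segment` are hypothetical values chosen for the example.

```python
import math

# Hypothetical toy lexicon with made-up unigram probabilities (not from the paper).
LEXICON = {"the": 0.4, "cat": 0.2, "sat": 0.2, "at": 0.1, "he": 0.1}


def segment(text, lexicon=LEXICON, max_word_len=10):
    """Return the most probable split of `text` into lexicon words, or None."""
    # best[i] is the best token that has reached position i:
    # a pair (log probability, list of words so far), or None if unreachable.
    best = [None] * (len(text) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            token = best[start]
            word = text[start:end]
            if token is None or word not in lexicon:
                continue
            # Pass the token across the word and add the word's log probability.
            score = token[0] + math.log(lexicon[word])
            # Keep only the highest-scoring token at each position.
            if best[end] is None or score > best[end][0]:
                best[end] = (score, token[1] + [word])
    return best[-1][1] if best[-1] is not None else None
```

With this toy lexicon, `segment("thecatsat")` returns `['the', 'cat', 'sat']`, and an input that cannot be covered by lexicon words, such as `"xyz"`, returns `None`. A unigram model is the simplest choice here; the abstract's "alternative probabilistic language models" would replace the per-word score.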
Similar resources
Data driven subword unit modeling for speech recognition and its application to interactive reading tutors
This paper proposes a novel token-passing search architecture for supporting subword unit based speech recognition and a corresponding algorithm based on the well-known LZW text compression method to determine a vocabulary of subword units in an unsupervised manner. We compare our subword unit selection algorithm to an existing approach based on Minimum Description Length (MDL) modeling and als...
A Robust Text Processing Technique Applied to Lexical Error Recovery
This thesis addresses automatic lexical error recovery and tokenization of corrupt text input. We propose a technique that can automatically correct mis-spellings, segmentation errors and real-word errors in a unified framework that uses both a model of language production and a model of the typing behavior, and which makes tokenization part of the recovery process. The typing process is modele...
An Improved Token-Based and Starvation Free Distributed Mutual Exclusion Algorithm
Distributed mutual exclusion is a fundamental problem in distributed systems: coordinating access to critical shared resources. It concerns how the various distributed processes access the shared resources in a mutually exclusive manner. This paper presents a fully distributed, improved token-based mutual exclusion algorithm for distributed systems. In this algorithm, a process which...
Triphone Based Continuous Speech Recognition System for Turkish Language Using Hidden Markov Model
This paper introduces a system designed to perform relatively accurate transcription of speech, in particular continuous speech recognition based on a triphone model for the Turkish language. Turkish differs from Indo-European languages (English, Spanish, French, German, etc.) in its agglutinative, suffixing morphology. Therefore vocabulary growth rate is very high and...
Text-constrained speaker recognition on a text-independent task
We present an approach to speaker recognition in the text-independent domain of conversational telephone speech using a text-constrained system designed to employ select high-frequency keywords in the speech stream. The system uses speaker word models generated via Hidden Markov Models (HMMs), a departure from the traditional Gaussian Mixture Model (GMM) approach dominant in text-independent wor...
Journal: CoRR
Volume: cmp-lg/9607036
Publication date: 1996