Structured Language Modeling for Speech
Abstract
We present revised Wall Street Journal (WSJ) lattice rescoring experiments using the structured language model (SLM).

1 Experiments

We repeated the WSJ lattice rescoring experiments reported in [1] in a standard setup. We chose to work on the DARPA'93 evaluation HUB1 test set: 213 utterances, 3446 words. The 20k-word open vocabulary and the baseline 3-gram model are the standard ones provided by NIST.

As a first step, we evaluated the perplexity (PPL) of the SLM relative to that of a deleted-interpolation 3-gram model trained under the same conditions: a training set of 20M words (a subset of the training data used for the baseline 3-gram model) and the standard HUB1 open vocabulary of 20k words. Both the training data and the vocabulary were re-tokenized to conform to the UPenn Treebank tokenization. We linearly interpolated the SLM with this 3-gram model,

$P(\cdot) = \lambda \cdot P_{3gram}(\cdot) + (1 - \lambda) \cdot P_{SLM}(\cdot)$,

which yields a 10% relative reduction in perplexity over the 3-gram model alone. The results are presented in Table 1. The SLM parameter reestimation procedure¹ reduces the PPL by 5% (2% after interpolation with the 3-gram model). Most of the PPL reduction, however, comes from interpolating with the 3-gram model, which shows that although the two models overlap, they complement each other. The interpolation weight was determined on a held-out set to be $\lambda = 0.4$. Both language models operate on the UPenn Treebank text tokenization.

A second batch of experiments evaluated the performance of the SLM for 3-gram² lattice decoding. The lattices were generated using the standard baseline 3-gram language model

¹ Because the parameter reestimation procedure for the SLM is computationally expensive, we ran only a single iteration.
² In the previous experiments reported on WSJ, we accidentally used bigram lattices.
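The interpolation and perplexity arithmetic above can be made concrete with a minimal Python sketch. The function names and the per-word probabilities in the demo are hypothetical. Using EM to tune λ on held-out data is an assumption, because the text says only that λ was determined on a held-out set and does not say how. The sketch also assumes that each model's per-word probabilities have already been computed and are strictly positive. It does not implement the SLM itself.

```python
import math
from typing import Sequence


def interpolate(p_3gram: float, p_slm: float, lam: float) -> float:
    """Linear interpolation: P = lam * P_3gram + (1 - lam) * P_SLM."""
    return lam * p_3gram + (1.0 - lam) * p_slm


def perplexity(probs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum_i log p_i) over N word predictions (all p_i > 0)."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def relative_reduction(ppl_base: float, ppl_new: float) -> float:
    """Fractional PPL reduction, e.g. 0.10 for a 10% relative reduction."""
    return (ppl_base - ppl_new) / ppl_base


def estimate_lambda(p_3gram: Sequence[float], p_slm: Sequence[float],
                    lam: float = 0.5, iters: int = 50) -> float:
    """Assumed method (hypothetical): EM re-estimation of the 3-gram weight."""
    for _ in range(iters):
        # E-step: posterior probability that the 3-gram component produced each word.
        post = [lam * a / interpolate(a, b, lam) for a, b in zip(p_3gram, p_slm)]
        # M-step: the new weight is the average posterior over the held-out words.
        lam = sum(post) / len(post)
    return lam


if __name__ == "__main__":
    # Hypothetical per-word probabilities on a tiny held-out set (not from the paper).
    held_3gram = [0.20, 0.05, 0.30, 0.01]
    held_slm = [0.10, 0.15, 0.25, 0.04]

    lam = estimate_lambda(held_3gram, held_slm)
    mixed = [interpolate(a, b, lam) for a, b in zip(held_3gram, held_slm)]

    ppl_3gram = perplexity(held_3gram)
    ppl_mixed = perplexity(mixed)
    print(f"lambda = {lam:.3f}")
    print(f"3-gram PPL = {ppl_3gram:.2f}, interpolated PPL = {ppl_mixed:.2f}")
    print(f"relative reduction = {relative_reduction(ppl_3gram, ppl_mixed):.1%}")
```

The log-likelihood of held-out data is concave in λ, so a single-weight EM like this one moves steadily toward the best mixture weight. A plain grid search over λ in [0, 1] would serve equally well for a single parameter.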
Similar resources
Advancing the Systems Analysis and Design Curriculum
Computer Information System and related programs are expected to produce students who possess a broad and contemporary understanding of analysis and design for information systems. An empirical analysis of the state of practice in systems analysis and design education revealed an emphasis on structured design in the majority of schools. The opportunity to transition to object-oriented analysis ...
Structured queries, language modeling, and relevance modeling in cross-language information retrieval
Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in a...
Structured Modeling Language for Automated Modeling in Causal Networks
The paper presents a structured modeling language (SML) and a relational database framework for specification and automated generation of causal models. The framework describes a relational database scheme for encoding a library of causal network templates modeling the basic components in a modeling domain. SML provides a formal language for specifying models as structured components th...
Extending SOFL to Support Both Top-Down and Bottom-Up Approaches
This paper presents an integrated approach to support both top-down and bottom-up design of software systems by combining UML (Unified Modeling Language) and the Formal Engineering Method SOFL (Structured Object-oriented Formal Language). We demonstrate by examples that the top-down principle used in conventional Structured Design can be effectively utilized to carry out object-oriented design th...
Semantic structured language models
In this study, we propose two novel semantic language modeling techniques for spoken dialog systems. These methods are called semantic concept based language modeling and semantic structured language modeling. In the concept based language modeling, we propose to use long span semantic units to model meaning sequences in spoken utterances. In the latter technique, we use statistical semantic pa...
Publication date: 1999