Posterior Decoding for Generative Constituent-Context Grammar Induction

Author

  • Chuong Do
Abstract

In this project, we study the problem of natural language grammar induction from a database of sentence part-of-speech (POS) tags. We then present an implementation of the EM-based generative constituent-context model by Klein and Manning. We also present two posterior decoding approaches to be used in conjunction with the constituent-context model and evaluate their performance against regular Viterbi parsing on a subset of the sentences from the Penn Treebank.
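The contrast between Viterbi parsing (pick the single highest-probability tree) and posterior decoding (pick the tree whose spans have the highest total posterior probability of being constituents) can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes span posteriors (as would be obtained from inside-outside under a constituent-context model) are already given, and finds the binary bracketing maximizing their sum with a CKY-style dynamic program.

```python
def posterior_decode(n, post):
    """n: sentence length; post[(i, j)]: posterior that span [i, j) is a constituent.

    Returns the binary bracketing over positions 0..n-1 that maximizes the
    sum of span posteriors, plus that maximal score. Viterbi decoding would
    instead maximize the probability of one whole tree; posterior decoding
    optimizes expected per-constituent accuracy.
    """
    best = {}    # best[(i, j)] = max total posterior for a binary tree over [i, j)
    split = {}   # chosen split point for each span
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            score = post.get((i, j), 0.0)
            if length == 1:
                best[(i, j)] = score
                continue
            # Try every split point k and keep the one maximizing the two halves.
            k_best = max(range(i + 1, j),
                         key=lambda k: best[(i, k)] + best[(k, j)])
            best[(i, j)] = score + best[(i, k_best)] + best[(k_best, j)]
            split[(i, j)] = k_best

    def build(i, j):
        # Recursively read off the bracketing from the stored split points.
        if j - i == 1:
            return i
        k = split[(i, j)]
        return (build(i, k), build(k, j))

    return build(0, n), best[(0, n)]
```

For example, with three words and a strong posterior on span (0, 2), the decoder brackets the first two words together; a Viterbi parse under a different tree distribution could disagree.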


Similar resources

Natural Language Grammar Induction Using a Constituent-Context Model

This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This me...


A Generative Constituent-Context Model for Improved Grammar Induction

We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter search with EM produces higher quality analyses than previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn treebank sentences of comparable ...


Mildly context sensitive grammar induction and variational bayesian inference

We define a generative model for a minimalist grammar formalism. We present a generalized algorithm for the application of variational Bayesian inference to lexicalized mildly context sensitive grammars. We apply this algorithm to the minimalist grammar model.


Three Generative, Lexicalised Models for Statistical Parsing

In this paper we first propose a new statistical parsing model, which is a generative model of lexicalised context-free grammar. We then extend the model to include a probabilistic treatment of both subcategorisation and wh-movement. Results on Wall Street Journal text show that the parser performs at 88.1/87.5% constituent precision/recall, an average improvement of 2.3% over (Collins 96).




Publication date: 2003