Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing

نویسندگان

  • Slav Petrov
  • Dan Klein
چکیده

We present a discriminative, latent variable approach to syntactic parsing in which rules exist at multiple scales of refinement. The model is formally a latent variable CRF grammar over trees, learned by iteratively splitting grammar productions (not categories). Different regions of the grammar are refined to different degrees, yielding grammars which are three orders of magnitude smaller than the single-scale baseline and 20 times smaller than the split-and-merge grammars of Petrov et al. (2006). In addition, our discriminative approach integrally admits features beyond local tree configurations. We present a multiscale training method along with an efficient CKY-style dynamic program. On a variety of domains and languages, this method produces the best published parsing accuracies with the smallest reported grammars.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generative and Discriminative Latent Variable Grammars

Latent variable grammars take an observed (coarse) treebank and induce more fine-grained grammar categories, that are better suited for modeling the syntax of natural languages. Estimation can be done in a generative or a discriminative framework, and results in the best published parsing accuracies over a wide range of syntactically divergent languages and domains. In this paper we highlight t...

متن کامل

Discriminative Log-Linear Grammars with Latent Variables

. Grammars with Latent Variables Given a treebank over a set of categories learn an optimally refined grammar for parsing. The parameters can be learned with an Expectation Maximization algorithm. The E-Step involves computing expectations over derivations corresponding to the observed trees. These expectations are normalized in the M-Step to update the rewrite probabilities: Grammar Learning T...

متن کامل

Self-Training with Products of Latent Variable Grammars

We study self-training with products of latent variable grammars in this paper. We show that increasing the quality of the automatically parsed data used for self-training gives higher accuracy self-trained grammars. Our generative self-trained grammars reach F scores of 91.6 on the WSJ test set and surpass even discriminative reranking systems without selftraining. Additionally, we show that m...

متن کامل

Parsing German with Latent Variable Grammars

We describe experiments on learning latent variable grammars for various German treebanks, using a language-agnostic statistical approach. In our method, a minimal initial grammar is hierarchically refined using an adaptive split-and-merge EM procedure, giving compact, accurate grammars. The learning procedure directly maximizes the likelihood of the training treebank, without the use of any la...

متن کامل

Latent-Variable PCFGs: Background and Applications

Latent-variable probabilistic context-free grammars are latent-variable models that are based on context-free grammars. Nonterminals are associated with latent states that provide contextual information during the top-down rewriting process of the grammar. We survey a few of the techniques used to estimate such grammars and to parse text with them. We also give an overview of what the latent st...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008