A Framework for Unsupervised Dependency Parsing using a Soft-EM Algorithm and Bilexical Grammars
Authors
Abstract
Unsupervised dependency parsing is gaining relevance in Natural Language Processing as the number of utterances available on the Internet keeps growing. Most current work is based on the Dependency Model with Valence (DMV) [12] or Extended Valence Grammars (EVGs) [11]; in both cases, the dependencies between words are modeled using a fixed automata structure. We present a framework for unsupervised induction of dependency structures based on CYK parsing that uses a simple technique for rewriting the training material. Our model is implemented by means of a k-best CYK parser, an inductor for Probabilistic Bilexical Grammars (PBGs) [8], and a simple technique that rewrites the treebank from k trees and their probabilities. An important contribution of our work is that the framework accepts any existing automata induction algorithm, making the automata structure fully modifiable. Our experiments showed that training size influences parameterization in a predictable manner. This flexibility produced good results in 8 different languages, in some cases comparable to the state of the art.
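The idea of rewriting a treebank from k trees and their probabilities can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`normalize_kbest`, `soft_counts`), the head-array tree encoding, the `<ROOT>` symbol, and the toy probabilities are all assumptions made for illustration. The sketch renormalizes the k probabilities of one sentence and accumulates fractional head-dependent counts, which is the generic soft-EM step that a grammar inductor could then consume.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed encoding: heads[i] is the index of the head of word i,
# or -1 when word i is the root of the sentence.
Tree = Sequence[int]


def normalize_kbest(probs: Sequence[float]) -> List[float]:
    """Renormalize the k-best probabilities so that they sum to 1."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("k-best probabilities must have a positive sum")
    return [p / total for p in probs]


def soft_counts(
    words: Sequence[str],
    kbest: Sequence[Tuple[Tree, float]],
) -> Dict[Tuple[str, str, str], float]:
    """Accumulate fractional (head, direction, dependent) counts.

    Each of the k trees contributes its normalized weight instead of a
    hard count of 1, so less likely analyses still inform the estimate.
    """
    weights = normalize_kbest([p for _, p in kbest])
    counts: Dict[Tuple[str, str, str], float] = defaultdict(float)
    for (heads, _), weight in zip(kbest, weights):
        for dep, head in enumerate(heads):
            if head < 0:
                key = ("<ROOT>", "R", words[dep])
            else:
                # "L" means the dependent precedes its head.
                direction = "L" if dep < head else "R"
                key = (words[head], direction, words[dep])
            counts[key] += weight
    return dict(counts)


# Toy input (hypothetical numbers): two candidate trees for one sentence.
words = ["dogs", "chase", "cats"]
kbest = [
    ([1, -1, 1], 0.06),  # "chase" is the root and heads both nouns
    ([-1, 0, 1], 0.02),  # right-branching chain rooted at "dogs"
]
counts = soft_counts(words, kbest)
```

In this toy input, both trees attach "cats" to the right of "chase", so that event accumulates the full normalized weight, while events found in only one tree receive that tree's share.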
Similar resources
Unsupervised Bayesian Parameter Estimation for Dependency Parsing
We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal prior as a prior over the grammar parameters. We derive a variational EM algorithm for that model...
Bilexical Grammars and Their Cubic-time Parsing Algorithms
This chapter introduces weighted bilexical grammars, a formalism in which individual lexical items, such as verbs and their arguments, can have idiosyncratic selectional influences on each other. Such ‘bilexicalism’ has been a theme of much current work in parsing. The new formalism can be used to describe bilexical approaches to both dependency and phrase-structure grammars, and a slight modif...
Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold
This paper shows how to use the Unfold-Fold transformation to transform Projective Bilexical Dependency Grammars (PBDGs) into ambiguity-preserving weakly equivalent Context-Free Grammars (CFGs). These CFGs can be parsed in O(n^3) time using a CKY algorithm with appropriate indexing, rather than the O(n^5) time required by a naive encoding. Informally, using the CKY algorithm with such a CFG mimics t...
Efficient Parsing for Bilexical Context-free Grammars and Head Automaton Grammars
Several recent stochastic parsers use bilexical grammars, where each word type idiosyncratically prefers particular complements with particular head words. We present O(n^4) parsing algorithms for two bilexical formalisms, improving the previous upper bounds of O(n^5). Also, for a common special ...
Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for tha...
Journal:
- Research in Computing Science
Volume 65, Issue
Pages -
Publication date 2013