Smoothing a Probabilistic Lexicon via Syntactic Transformations

نویسنده

  • Jason Michael Eisner
چکیده

SMOOTHING A PROBABILISTIC LEXICON VIA SYNTACTIC TRANSFORMATIONS Jason Michael Eisner Supervisor: Professor Mitch Marcus Probabilistic parsing requires a lexicon that specifies each word’s syntactic preferences in terms of probabilities. To estimate these probabilities for words that were poorly observed during training, this thesis assumes the existence of arbitrarily powerful transformations (also known to linguists as lexical redundancy rules or metarules) that can add, delete, retype or reorder the argument and adjunct positions specified by a lexical entry. In a given language, some transformations apply frequently and others rarely. We describe how to estimate the rates of the transformations from a sample of lexical entries. More deeply, we learn which properties of a transformation increase or decrease its rate in the language. As a result, we can smooth the probabilities of lexical entries. Given enough direct evidence about a lexical entry’s probability, our Bayesian approach trusts the evidence; but when less evidence or no evidence is available, it relies more on the transformations’ rates to guess how often the entry will be derived from related entries. Abstractly, the proposed “transformation models” are probability distributions that arise from graph random walks with a log-linear parameterization. A domain expert constructs the parameterized graph, and a vertex is likely according to whether random walks tend to halt at it. Transformation models are suited to any domain where “related” events (as defined by the graph) may have positively covarying probabilities. Such models admit

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discovering syntactic deep structure via Bayesian statistics

In the Bayesian framework, a language learner should seek a grammar that explains observed data well and is also a priori probable. This paper proposes such a measure of prior probability. Indeed it develops a full statistical framework for lexicalized syntax. The learner’s job is to discover the system of probabilistic transformations (often called lexical redundancy rules) that underlies the ...

متن کامل

Integration of Data from a Syntactic Lexicon into Generative and Discriminative Probabilistic Parsers

This article evaluates the integration of data extracted from a syntactic lexicon, namely the Lexicon-Grammar, into several probabilistic parsers for French. We show that by modifying the Part-ofSpeech tags of verbs and verbal nouns of a treebank, we obtain accurate performances with a parser based on Probabilistic Context-Free Grammars (Petrov et al., 2006) and a discriminative parser based on...

متن کامل

French parsing enhanced with a word clustering method based on a syntactic lexicon

This article evaluates the integration of data extracted from a French syntactic lexicon, the Lexicon-Grammar (Gross, 1994), into a probabilistic parser. We show that by applying clustering methods on verbs of the French Treebank (Abeillé et al., 2003), we obtain accurate performances on French with a parser based on a Probabilistic Context-Free Grammar (Petrov et al., 2006).

متن کامل

Assigning XTAG Trees to VerbNet

This paper presents the mappings between the syntactic information of our broad-coverage domain-independent verb lexicon, VerbNet, to Xtag trees. This mapping between complementary resources allowed us to increase the syntactic coverage of our verb lexicon by capturing transformations of the basic syntactic description of the verbs present in VerbNet. In addition, having these two resources map...

متن کامل

Towards Accurate Probabilistic Lexicons for Lexicalized Grammars

This paper proposes a method of constructing an accurate probabilistic subcategorization (SCF) lexicon for a lexicalized grammar extracted from a treebank. We employ a latent variable model to smooth co-occurrence probabilities between verbs and SCF types in the extracted lexicalized grammar. We applied our method to a verb SCF lexicon of an HPSG grammar acquired from the Penn Treebank. Experim...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001