Inducing Constraint-based Grammars from a Small Semantic Treebank
Abstract
We present a relational learning framework for grammar induction that learns meaning as well as syntax. We introduce a type of constraint-based grammar, Lexicalized Well-Founded Grammar (LWFG), and we prove that, under a set of assumptions, it can always be learned from a small set of semantically annotated examples. The chosen semantic representation allows us to learn the constraints together with the grammar rules, as well as an ontology-based semantic interpretation. Our experiments show that several fragments of natural language can be covered by an LWFG and that the representative examples can be chosen heuristically, based on linguistic knowledge.
Related resources
Closing the Gap Between Stochastic and Rule-based LFG Grammars
Developing large-scale deep grammars in a constraint-based framework such as Lexical Functional Grammar (LFG) is time-consuming and requires significant linguistic insight. Recently, treebank-based constraint-grammar acquisition approaches have been developed as an alternative to hand-crafting such resources. While treebank-based approaches are wide coverage and robust and achieve competitive e...
Transfer Learning for Constituency-Based Grammars
In this paper, we consider the problem of cross-formalism transfer in parsing. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific to the target formalism, and a large quantity of coarse CFG annotations from the Penn Treebank. While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they ...
Refining Grammars for Parsing with Hierarchical Semantic Knowledge
This paper proposes a novel method to refine the grammars in parsing by utilizing semantic knowledge from HowNet. Based on the hierarchical state-split approach, which can refine grammars automatically in a data-driven manner, this study introduces semantic knowledge into the splitting process at two steps. Firstly, each part-of-speech node will be annotated with a semantic tag of its terminal ...
Comparing and integrating Tree Adjoining Grammars
Grammars are core elements of many NLP applications. Grammars can be developed in two ways: built by hand or extracted from corpora. In this paper, we compare a handcrafted grammar with a Treebank grammar. We contend that recognizing substructures of the grammars' basic units is necessary tures and semantic information which are rarely represented in the corpora. It would be ideal if we could c...
Treebank-Based Acquisition of Chinese LFG Resources for Parsing and Generation
This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena an...
Publication date: 2004