Experiments with Using Semantical Categories in Parsing Systems

Authors

  • W. R. Hogenhout
  • Yuji Matsumoto
Abstract

One widely used method in syntactical analysis is the statistical training of a handwritten grammar. A well-known problem of statistical selection is that it ignores lexical and semantical preferences in disambiguation. We have developed a technique for using semantical preferences in statistical training, and we present experimental results. We compare the results obtained with different sources of semantical information.

1 Background

Recent work in the field of broad-coverage parsing shows two key developments. First, traditional stochastical grammars, in which the probability of a production depends only on its left-hand-side nonterminal, are now considered too simple because they use only very local information. Second, rather than focusing on the learning algorithm alone, researchers increasingly recognize that selecting the right information to learn from is at least as important. This has been argued in general [5], in relation to context-free grammars [2], and in relation to parsing methods that do not require a grammar [3, 10].

The fundamental problem of stochastical grammars is easy to see. In a simple sentence such as "He asked about climbing professionally." it is impossible to decide what was done professionally by looking only at the structure (compare "He asked about paying impatiently."). The two sentences have the same structure, yet the adverb modifies "climbing" in the first and most plausibly "asked" in the second. Any method that performs well must therefore take words (or particular properties of words) into account.

We have developed a method to train and use stochastical grammars with richer stochastical models, which allows the words in a sentence and their properties to influence the parsing results. The method has been described before, for example in [7, 6]; in this paper we generalize it and give experimental results.

2 The Algorithm

The main equations used by the algorithm are given below. The method has been described in previous work [7, 6], but the algorithm presented here is more general.
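The limitation described in the Background can be checked on a toy probabilistic grammar. The sketch below is an illustration, not the authors' code: the grammar, its rule probabilities, and the helpers `derivations`, `tree_prob` and `attachment_ratio` are all hypothetical. Each tree is scored as the product of its rule probabilities, and each probability is conditioned only on the rule's left-hand side.

```python
from math import isclose, prod

# Hypothetical toy PCFG with illustrative numbers (not estimated from data):
# each value is P(rule | left-hand side), so the rules for every LHS sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("He",)): 1.0,
    ("VP", ("V", "PP")): 0.7,
    ("VP", ("VP", "ADV")): 0.3,
    ("V", ("asked",)): 1.0,
    ("PP", ("P", "GP")): 1.0,
    ("P", ("about",)): 1.0,
    ("GP", ("G", "ADV")): 0.4,
    ("GP", ("G",)): 0.6,
    ("G", ("climbing",)): 0.5,
    ("G", ("paying",)): 0.5,
    ("ADV", ("professionally",)): 0.5,
    ("ADV", ("impatiently",)): 0.5,
}

def derivations(gerund, adverb):
    """Hypothetical helper: both attachment readings as lists of rules."""
    shared = [("S", ("NP", "VP")), ("NP", ("He",)), ("V", ("asked",)),
              ("PP", ("P", "GP")), ("P", ("about",)),
              ("G", (gerund,)), ("ADV", (adverb,))]
    # Reading 1: the adverb modifies the gerund ("climbing professionally").
    low = shared + [("VP", ("V", "PP")), ("GP", ("G", "ADV"))]
    # Reading 2: the adverb modifies the main verb ("asked ... impatiently").
    high = shared + [("VP", ("VP", "ADV")), ("VP", ("V", "PP")), ("GP", ("G",))]
    return low, high

def tree_prob(rules):
    """A PCFG scores a tree as the product of its rule probabilities."""
    return prod(PCFG[r] for r in rules)

def attachment_ratio(gerund, adverb):
    """How strongly the grammar prefers attaching the adverb to the gerund."""
    low, high = derivations(gerund, adverb)
    return tree_prob(low) / tree_prob(high)

r1 = attachment_ratio("climbing", "professionally")
r2 = attachment_ratio("paying", "impatiently")
# The word-specific rules occur in both readings and cancel in the ratio,
# so the grammar prefers the same attachment for both sentences.
print(isclose(r1, r2))  # True
```

Because the lexical rules appear in both readings, the grammar prefers the gerund attachment for both sentences, which is plausible for the first and not for the second. Only a model in which the words, or properties of the words such as their semantical categories, can affect the score is able to separate the two cases; that is the motivation for the more general algorithm described in this section.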
The algorithm is based on the Inside-Outside algorithm [1, 9] with chart parsing, but it uses extra information in the edges. We number the rules in the grammar and write rule number $x$ as $R_x$; when the left-hand-side nonterminal of rule $x$ is $p$, we write $R^p_x$. We also number the edges and write edge number $k$ as $e_k$; when edge $e_k$ was produced with rule $x$, we write $e^x_k$. We assume every edge is …
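As a rough sketch of the inside half of the standard Inside-Outside computation on a chart, using the rule and edge notation of this section, the code below assumes a grammar in Chomsky normal form and computes $\beta(p,i,j)=\sum_{x:\,R^p_x = p \to q\,r}\ \sum_{i<m<j} P(R_x)\,\beta(q,i,m)\,\beta(r,m,j)$, where lexical rules $R^p_x = p \to w$ contribute $P(R_x)$ on single-word spans. It does not model the extra edge information that distinguishes the authors' generalized algorithm, whose details are not given in this excerpt; the `Rule` and `Edge` classes, the `inside` function and the toy grammar are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    number: int    # x in R_x
    lhs: str       # p in R^p_x
    rhs: tuple     # one terminal, or two nonterminals (CNF)
    prob: float    # P(R_x), conditioned on the left-hand side

@dataclass(frozen=True)
class Edge:
    number: int      # k in e_k
    rule: int        # x in e^x_k: the rule that produced this edge
    start: int
    end: int
    children: tuple  # (label, start, end) keys of the combined constituents

def inside(words, rules):
    """Inside probabilities beta[(p, i, j)] over spans [i, j) for a CNF PCFG,
    plus every chart edge, each tagged with the number of the rule that built it."""
    n = len(words)
    beta = defaultdict(float)
    edges = []
    # Single-word spans: lexical rules p -> w.
    for i, w in enumerate(words):
        for r in rules:
            if r.rhs == (w,):
                edges.append(Edge(len(edges), r.number, i, i + 1, ()))
                beta[(r.lhs, i, i + 1)] += r.prob
    # Longer spans, shortest first: binary rules p -> q r at every split m.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for m in range(i + 1, j):
                for r in rules:
                    if len(r.rhs) != 2:
                        continue
                    q, s = r.rhs
                    left = beta.get((q, i, m), 0.0)
                    right = beta.get((s, m, j), 0.0)
                    if left and right:
                        edges.append(Edge(len(edges), r.number, i, j,
                                          ((q, i, m), (s, m, j))))
                        beta[(r.lhs, i, j)] += r.prob * left * right
    return beta, edges

# Hypothetical toy grammar; rule numbers play the role of x in R_x.
RULES = [
    Rule(1, "S", ("NP", "VP"), 1.0),
    Rule(2, "VP", ("V", "NP"), 1.0),
    Rule(3, "NP", ("he",), 0.5),
    Rule(4, "NP", ("fish",), 0.5),
    Rule(5, "V", ("eats",), 1.0),
]
beta, edges = inside(["he", "eats", "fish"], RULES)
print(beta[("S", 0, 3)])  # 0.25, the sentence probability
```

Keeping the producing rule number on each edge mirrors the $e^x_k$ notation. An outside pass (not shown) would be needed to turn these inside values into the expected rule counts used for re-estimation.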

Similar resources

The Impact of Morphology on Persian Dependency Parsing

Data-driven systems can be adapted to different languages and domains easily. Applying this trend to dependency parsing has led to the introduction of data-driven approaches. The existence of appropriate corpora containing sentences and their associated dependency trees is the only prerequisite for data-driven approaches. Despite obtaining highly accurate results for the dependency parsing task in English langu...

Generalized Parsing and Term Rewriting: Semantics Driven Disambiguation

Generalized parsing technology provides the power and flexibility to attack real-world parsing applications. However, many programming languages have syntactical ambiguities that can only be solved using semantical analysis. In this paper we propose to apply the paradigm of term rewriting to filter ambiguities based on semantical information. We start with the definition of a representation of a...

Prosodical sentence structure inference for natural conversational speech understanding

In order to develop a system capable of understanding natural conversational speech, along with the current developments in technology for phonetic information processing, a technology must be developed that will utilize the prosodic information of natural speech. We propose here an algorithm for generating a parsing tree that represents the semantical relationships between phrases, based on an...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences and creates a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

Parsing CCGbank with the Lambek Calculus

This paper will analyze CCGbank, a corpus of CCG derivations, for use with the Lambek calculus. We also present a Java implementation of the parsing algorithm for the Lambek calculus presented in Fowler (2009) and the results of experiments using that algorithm to parse the categories in CCGbank. We conclude that the Lambek calculus is computationally tractable for this task and provide insight...

Publication date: 2007