Performance-oriented dependency parsing

Author

  • Alexander Volokh
Abstract

Dependency parsing has become very popular among researchers from all NLP areas because dependency representations contain valuable, easy-to-use information. In the last decade many dependency parsers have been developed, each with its own unique characteristics. In the course of this thesis I have developed yet another parser, MDParser. In this work I discuss the main properties of dependency parsers and motivate MDParser’s development. I present the state of the art in the field of dependency parsing and discuss the shortcomings of current developments. In my view, the main problem of current parsers is that the task of dependency parsing is treated independently of what happens before and after it, so the preprocessing steps and the embedding in applications are neglected. In practice, however, parsing is rarely done for its own sake, but rather in order to use the results in a follow-up application. Additionally, current parsers are accuracy-oriented and focus only on the quality of the results, neglecting other important properties, especially efficiency. The design of MDParser tries to counter all these drawbacks. The evaluation of some NLP technologies is sometimes as difficult as the task itself. For dependency parsing this was long thought not to be the case; however, some recent work shows that the current evaluation possibilities are limited. In this thesis I broadly present and discuss both intrinsic and extrinsic evaluation methodologies for dependency parsing. Both approaches have numerous disadvantages, which I demonstrate in my work.
The attachment scores, which are the most widely used metric of intrinsic evaluation, do not differentiate between different dependency types and have been computed on the same portions of treebanks for many years. They thus often promote overfitting to this particular kind of data, which is especially dangerous because the data contains a certain amount of inconsistencies. The extrinsic evaluation, which eval-


Similar resources

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...


The Effect of Morphology on Dependency Parsing of Persian

Data-driven systems can be adapted easily to different languages and domains. Applying this trend to dependency parsing has led to the introduction of data-driven approaches. The only prerequisite for data-driven approaches is the existence of appropriate corpora that contain sentences and their associated dependency trees. Despite obtaining highly accurate results for the dependency parsing task in English langu...


Restricted Parallelism in Object-Oriented Lexical Parsing

We present an approach to parallel natural language parsing which is based on a concurrent, object-oriented model of computation. A depth-first, yet incomplete parsing algorithm for a dependency grammar is specified and several restrictions on the degree of its parallelization are discussed.


Numbat: Abolishing Privileges When Licensing New Constituents In Constraint-Oriented Parsing

The constraint-oriented approaches to language processing step back from the generative theory and make it possible, in theory, to deal with all types of linguistic relationships (e.g. dependency, linear precedence or immediate dominance) with the same importance when parsing an input utterance. Yet in practice, all implemented constraint-oriented parsing strategies still need to discriminate b...


Coarse-grained Parallelism in Natural Language Understanding: Parsing as Message Passing

A framework for concurrent, object-oriented natural language parsing is introduced. The underlying grammar model is fully lexicalized, head-driven, dependency-oriented, and structured along multiple inheritance hierarchies. The computation model relies upon the actor paradigm, with concurrency entering through asynchronous message passing. Protocols for establishing basic dependency relations an...




Publication date: 2013