Universal Dependencies are Hard to Parse - or are They?

Authors

  • Ines Rehbein
  • Julius Steen
  • Bich-Ngoc Do
  • Anette Frank
Abstract

Universal Dependency (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In this paper, we ask what exactly causes the decrease in parsing accuracy when training a parser on UD-style annotations and whether the effect is similarly strong for all languages. We conduct a series of experiments in which we systematically modify individual annotation decisions taken in the UD scheme and show that this increases accuracy for most, but not all, languages. We show that the encoding in the UD scheme, in particular the decision to encode content words as heads, causes an increase in dependency length for nearly all treebanks and an increase in arc direction entropy for many languages, and we evaluate the effect this has on parsing accuracy.
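
The two treebank statistics named in the abstract, dependency length and arc direction entropy, can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so the ones below are assumptions: dependency length is taken as the absolute distance between a token and its head, and arc direction entropy as the entropy of the head direction given the relation label, weighted by label frequency. The function names and the toy sentence are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Each token is (index, head, deprel); indices start at 1, head 0 marks the root.
# Toy tree for "She reads books" (illustrative, not taken from any treebank).
SENTENCE = [
    (1, 2, "nsubj"),
    (2, 0, "root"),
    (3, 2, "obj"),
]


def mean_dependency_length(sentences):
    """Average |index - head| over all non-root arcs (assumed definition)."""
    lengths = [abs(i - h) for sent in sentences for i, h, _ in sent if h != 0]
    return sum(lengths) / len(lengths) if lengths else 0.0


def arc_direction_entropy(sentences):
    """Entropy in bits of the head direction given the relation label,
    averaged over labels by relative frequency (assumed definition)."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for i, h, rel in sent:
            if h != 0:  # the root has no incoming arc direction
                counts[rel]["head_right" if h > i else "head_left"] += 1

    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for c in counts.values():
        n = sum(c.values())
        h_rel = -sum((k / n) * math.log2(k / n) for k in c.values())
        entropy += (n / total) * h_rel
    return entropy


if __name__ == "__main__":
    print(mean_dependency_length([SENTENCE]))
    print(arc_direction_entropy([SENTENCE]))
```

Under these assumed definitions, a scheme that attaches function words to distant content-word heads would raise the first value, and a relation whose dependents appear on both sides of the head would raise the second.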

Similar resources

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

What is hard in Universal Dependency Parsing?

Verifying (or at least attempting to falsify) claims about one language being harder to parse than another, or about one parser being applicable to a maximally wide range of languages, used to quickly devolve into an apples-and-oranges comparison, since different languages (i.e., different treebanks) also mean different annotation schemes and different sources of text. In this paper, we use two ...

Simulating Dependencies to Improve Parse Error Detection

We improve parse error detection, weighting dependency information on the basis of simulated parses. Such simulations extend the training grammar, and, although the simulations are not wholly correct or incorrect—as observed from the results with different weightings for small treebanks—they help to determine whether a new parse fits the training grammar.

Parse Reranking Based on Higher-Order Lexical Dependencies

Existing work shows that lexical dependencies are helpful for constituent tree parsing. However, only first-order lexical dependencies have been employed and investigated in previous work. In this paper, we propose a method for employing higher-order lexical dependencies in constituent tree evaluation. Our method is based on a parse reranking framework, which provides a constrained search space...

First step immersion in interval linear programming with linear dependencies

We consider a linear programming problem in a general form and suppose that all coefficients may vary in some prescribed intervals. Contrary to classical models, where parameters can attain any value from the interval domains independently, we study problems with linear dependencies between the parameters. We present a class of problems that are easily solved by reduction to the classi...

Publication date: 2017