Feature Engineering in Persian Dependency Parser

Authors

  • S. Lazemi Department of Computer Eng., University of Kashan, Kashan, Iran.
Abstract:

A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser for Persian is developed with the help of a phrase-structure parser. The feature space defined for a parser is one of the important factors in its success. Our goal is to generate and extract features appropriate for dependency parsing of Persian sentences. To achieve this goal, new semantic and syntactic features are defined and added to MSTParser by the stacking method. Semantic features are obtained using word clustering algorithms based on syntagmatic analysis; syntactic features are obtained using the Persian phrase-structure parser and are encoded as bit strings. Experiments were carried out on the Persian Dependency Treebank (PerDT) and the Uppsala Persian Dependency Treebank (UPDT). The results indicate that the new features improve the performance of the dependency parser for Persian. The unlabeled attachment scores achieved on PerDT and UPDT are 89.17% and 88.96%, respectively.
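The unlabeled attachment score (UAS) reported in the abstract is the proportion of tokens whose predicted head matches the gold-standard head, ignoring the relation label. The following is a minimal sketch of that metric only, not the paper's evaluation script: the function name, the head-index encoding (0 for the root), and the example sentence are all hypothetical, and details such as punctuation handling are assumptions the abstract does not specify.

```python
from typing import Sequence


def unlabeled_attachment_score(gold_heads: Sequence[int],
                               predicted_heads: Sequence[int]) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head.

    Each sequence holds one head index per token (assumed encoding: 0 = root).
    Relation labels are deliberately ignored, which is what makes the score
    "unlabeled".
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold_heads:
        raise ValueError("cannot score an empty token sequence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Hypothetical 5-token sentence: the parser attaches token 4 to the wrong head.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]
uas = unlabeled_attachment_score(gold, pred)
print(f"UAS = {uas:.2%}")
```

In practice the score is usually accumulated over all tokens of a test set rather than averaged per sentence; concatenating the head sequences of every sentence before calling the function gives that corpus-level figure.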


Similar resources

Feature Engineering in Maximum Spanning Tree Dependency Parser

In this paper we present the results of our experiments with modifications of the feature set used in the Czech mutation of the Maximum Spanning Tree parser. First we show how new feature templates improve the parsing accuracy and second we decrease the dimensionality of the feature space to make the parsing process more effective without sacrificing accuracy.


ParsPer: A Dependency Parser for Persian

We present a dependency parser for Persian, called ParsPer, developed using the graph-based parser in the Mate Tools. The parser is trained on the entire Uppsala Persian Dependency Treebank with a specific configuration that was selected by MaltParser as the best performing parsing representation. The treebank’s syntactic annotation scheme is based on Stanford Typed Dependencies with extensions...


Memory-Based Re-Engineering of a Knowledge-Based Dependency Parser

The emulation of a knowledge-based dependency parser for Dutch by a fast approximation of a memory-based learning algorithm is described. During the development of the original parser, hand-parsed test sentences were collected to offer stochastic guidance in the parsing process. Training a memory-based parser directly on these collections yields a reasonable but not very accurate emulation....


Dependency Parsers for Persian

We present two dependency parsers for Persian, MaltParser and MSTParser, trained on the Uppsala PErsian Dependency Treebank. The treebank currently consists of 1,000 sentences. Its annotation scheme is based on Stanford Typed Dependencies (STD) extended for Persian with regard to object marking and light verb constructions. The parsers and the treebank are developed simultaneously in a bootstrapping ...


Dependency parser demo

Introduction. We are concerned with surface-syntactic parsing of running text. Our main goal is to describe a syntactic analysis of sentences using dependency links that show the head-dependent relations between words. The new dependency parser (Tapanainen and Järvinen, 1997; Järvinen and Tapanainen, 1997) belongs to a continuous effort to apply rule-based methods to natural languages. It ...


A Dependency Parser for Tweets

We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled a...




Journal title

Volume 7, Issue 3

Pages 467-474

Publication date 2019-07-01
