Reference Data for Czech Collocation Extraction

Author

  • Pavel Pecina
Abstract

We introduce three reference data sets provided for the MWE 2008 evaluation campaign, which focused on ranking MWE (multiword expression) candidates. The data sets comprise bigrams extracted from the Prague Dependency Treebank and the Czech National Corpus. Each extracted bigram is annotated as collocational or non-collocational and is provided with corpus frequency information.
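Because the campaign evaluates how well candidates are ranked, a typical use of such data is to score each bigram from its frequencies, sort, and compare the ranking with the gold annotation. The sketch below assumes a hypothetical record layout (`Candidate` with joint frequency, both marginal frequencies and a binary label); it is not the data sets' actual file format. Pointwise mutual information stands in as one example association measure, and non-interpolated average precision as one example ranking metric. The toy frequencies are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical layout: one bigram, its corpus frequencies and its gold label.
    w1: str
    w2: str
    f_xy: int             # joint frequency of the bigram (w1, w2)
    f_x: int              # frequency of w1 (assumed non-zero)
    f_y: int              # frequency of w2 (assumed non-zero)
    is_collocation: bool  # annotation: collocational or not


def pmi(c: Candidate, n: int) -> float:
    """Pointwise mutual information: log2(N * f(xy) / (f(x) * f(y)))."""
    return math.log2(n * c.f_xy / (c.f_x * c.f_y))


def average_precision(ranked_labels: list[bool]) -> float:
    """Mean of the precision values at each rank holding a true collocation."""
    hits = 0
    total = 0.0
    for rank, is_pos in enumerate(ranked_labels, start=1):
        if is_pos:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def rank_and_score(cands: list[Candidate], n: int) -> tuple[list[Candidate], float]:
    """Sort candidates by descending PMI and evaluate against the annotation."""
    ranked = sorted(cands, key=lambda c: pmi(c, n), reverse=True)
    return ranked, average_precision([c.is_collocation for c in ranked])


# Invented toy data; N is the (hypothetical) total number of bigrams.
N = 1000
toy = [
    Candidate("a", "p", f_xy=10, f_x=20, f_y=25, is_collocation=True),
    Candidate("b", "q", f_xy=5, f_x=100, f_y=100, is_collocation=False),
    Candidate("c", "r", f_xy=8, f_x=40, f_y=50, is_collocation=True),
    Candidate("d", "s", f_xy=6, f_x=10, f_y=100, is_collocation=False),
]

ranked, ap = rank_and_score(toy, N)
print(" ".join(f"{c.w1}_{c.w2}" for c in ranked))
print(f"{ap:.3f}")  # 0.833
```

Swapping in a different association measure only requires changing the sort key; the evaluation step stays the same, which is what makes a shared annotated candidate set useful for comparing measures.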

Similar resources

Semi-automatic Building of Swedish Collocation Lexicon

This work focuses on semi-automatic extraction of verb-noun collocations from a corpus, performed to provide lexical evidence for the manual lexicographical processing of Support Verb Constructions (SVCs) in the Swedish-Czech Combinatorial Valency Lexicon of Predicate Nouns. Efficiency of pure manual extraction procedure is significantly improved by utilization of automatic statistical methods ...

A Comparative Evaluation of Collocation Extraction Techniques

This paper describes an experiment that attempts to compare a range of existing collocation extraction techniques as well as the implementation of a new technique based on tests for lexical substitutability. After a description of the experiment details, the techniques are discussed with particular emphasis on any adaptations that are required in order to evaluate it in the way propose...

Experiments on Candidate Data for Collocation Extraction

The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).

Multilingual Collocation Extraction: Issues And Solutions

Although traditionally seen as a language-independent task, collocation extraction relies nowadays more and more on the linguistic preprocessing of texts (e.g., lemmatization, POS tagging, chunking or parsing) prior to the application of statistical measures. This paper provides a language-oriented review of the existing extraction work. It points out several language-specific issues related to ...

A Tool for Multi-Word Collocation Extraction and Visualization in Multilingual Corpora

This document describes an implemented system of collocation extraction which is designed as an aid to translation and which will be used in a real translation environment. Its main functionalities are: retrieving multi-word collocations from an existing corpus of documents in a given language (only French and English are supported for the time being); visualizing the list of extracted terms and t...

Publication date: 2008