Collocation Extraction beyond the Independence Assumption

نویسنده

  • Gerlof Bouma
چکیده

In this paper we start to explore two-part collocation extraction association measures that do not estimate expected probabilities on the basis of the independence assumption. We propose two new measures based upon the well-known measures of mutual information and pointwise mutual information. Expected probabilities are derived from automatically trained Aggregate Markov Models. On three collocation gold standards, we find the new association measures vary in their effectiveness.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Determining Intercoder Agreement for a Collocation Identification Task

In this paper, we describe an alternative to the kappa statistic for measuring intercoder agreement. We present a model based on the assumption that the observed surface agreement can be divided into (unknown amounts of) true agreement and chance agreement. This model leads to confidence interval estimates for the proportion of true agreement, which turn out to be comparable to confidence inter...

متن کامل

Nonparametric Collocation ODE Parameter Estimation: Application in Biochemical Pathway Modelling

Parameter estimation of non-linear differential equations has long been an active and challenge research area. Conventionally methods are computationally intensive and often poorly conditioned. In the context of biochemical pathway modeling, a new method focused on this paper is the so-called “collocation” method, which is a nonparametric data smoothing based approach. The statistical property ...

متن کامل

The effects of the violation of local independence assumption on the person measures under the Rasch model

Local independence of test items is an assumption in all Item Response Theory (IRT) models. That is, the items in a test should not be related to each other. Sharing a common passage, which is prevalent in reading comprehension tests, cloze tests and C-Tests, can be a potential source of local item dependence (LID). It is argued in the literature that LID results in biased parameter estimation ...

متن کامل

Collocation Translation Acquisition Using Monolingual Corpora

Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...

متن کامل

You Can't Beat Frequency (Unless You Use Linguistic Knowledge) - A Qualitative Evaluation of Association Measures for Collocation and Term Extraction

In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of cooccurrence frequencies. We here explicitly test this assumption. By way of fo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010