Collocation Extraction beyond the Independence Assumption
Author
Abstract
In this paper, we begin to explore association measures for two-part collocation extraction that do not estimate expected probabilities on the basis of the independence assumption. We propose two new measures based on the well-known measures of mutual information and pointwise mutual information. Their expected probabilities are derived from automatically trained Aggregate Markov Models. On three collocation gold standards, we find that the new association measures vary in their effectiveness.
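The abstract does not give the formulas, so the following is only a minimal sketch under stated assumptions of how an expected probability from an Aggregate Markov Model (AMM) could replace the independence baseline in pointwise mutual information. It assumes a bigram AMM of the form `p(y|x) = sum_c p(c|x) p(y|c)` with a small number of latent classes, trained by EM on observed (x, y) pairs, and takes the expected joint probability of a pair to be `p(x) * p_AMM(y|x)` instead of `p(x) * p(y)`. The helper names `train_amm` and `association_scores`, the class count, the iteration count, the random initialisation and the base-2 logarithm are all hypothetical choices; the paper's exact definitions, and its mutual-information-based measure, may differ and are not reproduced here.

```python
import math
import random
from collections import Counter


def train_amm(pairs, n_classes=2, n_iter=50, seed=0):
    """Fit a bigram Aggregate Markov Model p(y|x) = sum_c p(c|x) p(y|c) with EM.

    Hypothetical helper: returns (p_c_x, p_y_c) as nested dicts.
    """
    rng = random.Random(seed)
    counts = Counter(pairs)
    xs = sorted({x for x, _ in counts})
    ys = sorted({y for _, y in counts})
    classes = range(n_classes)

    def normalise(table):
        total = sum(table.values())
        return {key: value / total for key, value in table.items()}

    # Random positive initialisation breaks the symmetry between latent classes.
    p_c_x = {x: normalise({c: rng.random() + 0.1 for c in classes}) for x in xs}
    p_y_c = {c: normalise({y: rng.random() + 0.1 for y in ys}) for c in classes}

    for _ in range(n_iter):
        exp_cx = {x: dict.fromkeys(classes, 0.0) for x in xs}
        exp_yc = {c: dict.fromkeys(ys, 0.0) for c in classes}
        for (x, y), n in counts.items():
            # E-step: posterior over the latent class given this (x, y) pair.
            joint = {c: p_c_x[x][c] * p_y_c[c][y] for c in classes}
            z = sum(joint.values())
            for c in classes:
                exp_cx[x][c] += n * joint[c] / z
                exp_yc[c][y] += n * joint[c] / z
        # M-step: re-estimate both conditional tables from the expected counts.
        p_c_x = {x: normalise(exp_cx[x]) for x in xs}
        p_y_c = {c: normalise(exp_yc[c]) for c in classes}
    return p_c_x, p_y_c


def association_scores(pairs, n_classes=2, **amm_options):
    """Return {(x, y): (pmi_independence, pmi_amm)} for every observed pair."""
    n = len(pairs)
    counts = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    p_c_x, p_y_c = train_amm(pairs, n_classes, **amm_options)
    scores = {}
    for (x, y), n_xy in counts.items():
        p_xy = n_xy / n
        p_x, p_y = count_x[x] / n, count_y[y] / n
        # Baseline: expected joint probability under the independence assumption.
        expected_indep = p_x * p_y
        # Assumption: AMM-based expectation p(x) * sum_c p(c|x) p(y|c).
        p_y_given_x = sum(p_c_x[x][c] * p_y_c[c][y] for c in range(n_classes))
        expected_amm = p_x * p_y_given_x
        scores[(x, y)] = (math.log2(p_xy / expected_indep),
                          math.log2(p_xy / expected_amm))
    return scores


if __name__ == "__main__":
    # Toy, made-up adjective-noun pairs purely for illustration.
    pairs = ([("strong", "tea")] * 5 + [("powerful", "car")] * 5
             + [("strong", "car"), ("powerful", "tea")])
    for pair, (pmi_indep, pmi_amm) in association_scores(pairs).items():
        print(pair, round(pmi_indep, 3), round(pmi_amm, 3))
```

With a single latent class the AMM conditional collapses to the marginal `p(y)`, so the two scores coincide; additional classes give the model capacity to absorb more of the pair-specific dependency into the "expected" probability, which is why the number of classes is a design choice that would need tuning.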
Similar resources
Determining Intercoder Agreement for a Collocation Identification Task
In this paper, we describe an alternative to the kappa statistic for measuring intercoder agreement. We present a model based on the assumption that the observed surface agreement can be divided into (unknown amounts of) true agreement and chance agreement. This model leads to confidence interval estimates for the proportion of true agreement, which turn out to be comparable to confidence inter...
Nonparametric Collocation ODE Parameter Estimation: Application in Biochemical Pathway Modelling
Parameter estimation for non-linear differential equations has long been an active and challenging research area. Conventional methods are computationally intensive and often poorly conditioned. In the context of biochemical pathway modeling, this paper focuses on a new method, the so-called “collocation” method, which is a nonparametric approach based on data smoothing. The statistical property ...
The effects of the violation of local independence assumption on the person measures under the Rasch model
Local independence of test items is an assumption in all Item Response Theory (IRT) models. That is, the items in a test should not be related to each other. Sharing a common passage, which is prevalent in reading comprehension tests, cloze tests and C-Tests, can be a potential source of local item dependence (LID). It is argued in the literature that LID results in biased parameter estimation ...
Collocation Translation Acquisition Using Monolingual Corpora
Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...
You Can't Beat Frequency (Unless You Use Linguistic Knowledge) - A Qualitative Evaluation of Association Measures for Collocation and Term Extraction
In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of cooccurrence frequencies. We here explicitly test this assumption. By way of fo...