Combining Association Measures for Collocation Extraction
Abstract
We introduce the possibility of combining lexical association measures and present empirical results of several methods employed in automatic collocation extraction. First, we present a comprehensive overview of association measures and their performance on manually annotated data, evaluated by precision-recall curves and mean average precision. Second, we describe several classification methods for combining association measures, followed by their evaluation and comparison with individual measures. Finally, we propose a feature selection algorithm that significantly reduces the number of combined measures with only a small degradation in performance.
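To make the evaluation setup concrete, here is a minimal sketch under stated assumptions. It scores bigram candidates with two standard association measures, pointwise mutual information (PMI) and the t-score, computed from co-occurrence counts. It then ranks the candidates by each measure and computes the average precision of each ranking against manual labels. Mean average precision is the mean of this quantity over several data samples. The candidate counts, labels and corpus size are hypothetical. The combination step (averaging standardized scores) is an unsupervised stand-in chosen for illustration. It is not one of the paper's classification methods, which learn from the annotated data.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: (f_xy, f_x, f_y, is_collocation) for six bigram
# candidates in an assumed corpus of N bigram tokens; the boolean labels
# stand in for manual annotation.
N = 10_000
CANDIDATES = [
    (30, 40, 50, True),
    (5, 1000, 800, False),    # co-occurs far less often than chance predicts
    (3, 3, 4, True),          # rare but exclusive pair
    (50, 2000, 1500, False),
    (10, 500, 100, False),
    (200, 2000, 500, False),  # frequent pair above chance, not a collocation
]

def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information: log2 of observed vs. expected co-occurrence."""
    return math.log2(f_xy * n / (f_x * f_y))

def t_score(f_xy, f_x, f_y, n):
    """t-score, approximating the variance of the observed count by f_xy."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)

def average_precision(scores, labels):
    """Average precision of the ranking induced by sorting scores in descending order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, is_colloc) in enumerate(ranked, start=1):
        if is_colloc:
            hits += 1
            total += hits / rank  # precision at the rank of this true collocation
    return total / hits if hits else 0.0

def zscores(values):
    """Standardize scores so that measures on different scales can be averaged."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

LABELS = [c[3] for c in CANDIDATES]
PMI_SCORES = [pmi(fxy, fx, fy, N) for fxy, fx, fy, _ in CANDIDATES]
T_SCORES = [t_score(fxy, fx, fy, N) for fxy, fx, fy, _ in CANDIDATES]
# Naive combination (an assumption, not the paper's method): the mean of the
# two standardized scores for each candidate.
COMBINED = [(a + b) / 2 for a, b in zip(zscores(PMI_SCORES), zscores(T_SCORES))]

for name, scores in [("PMI", PMI_SCORES), ("t-score", T_SCORES), ("combined", COMBINED)]:
    print(name, round(average_precision(scores, LABELS), 3))
# PMI 1.0
# t-score 0.583
# combined 1.0
```

On this toy data, the t-score places the frequent non-collocation first, which lowers its average precision. PMI ranks the rare, exclusive pair highest. A supervised combination, as studied in the paper, would instead fit the contribution of each measure to the annotated labels rather than weighting them equally.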