Distributional Similarity of Multi-Word Expressions
نویسندگان
چکیده
Most existing systems for automatically extracting lexical-semantic resources neglect multi-word expressions (MWEs), even though approximately 30% of gold-standard thesauri entries are MWEs. We present a distributional similarity system that identifies synonyms for MWEs. We extend Grefenstette’s SEXTANT shallow parser to first identify bigram MWEs using collocation statistics from the Google WEB1T corpus. We extract contexts from WEB1T to increase coverage on the sparser bigrams.
منابع مشابه
A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions
This paper presents the first attempt to use word embeddings to predict the compositionality of multiword expressions. We consider both singleand multi-prototype word embeddings. Experimental results show that, in combination with a back-off method based on string similarity, word embeddings outperform a method using count-based distributional similarity. Our best results are competitive with, ...
متن کاملUsing Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality
We predict the compositionality of multiword expressions using distributional similarity between each component word and the overall expression, based on translations into multiple languages. We evaluate the method over English noun compounds, English verb particle constructions and German noun compounds. We show that the estimation of compositionality is improved when using translations into m...
متن کاملTowards Dynamic Word Sense Discrimination with Random Indexing
Most distributional models of word similarity represent a word type by a single vector of contextual features, even though, words commonly have more than one sense. The multiple senses can be captured by employing several vectors per word in a multi-prototype distributional model, prototypes that can be obtained by first constructing all the context vectors for the word and then clustering simi...
متن کاملDistributional Memory: A General Framework for Corpus-Based Semantics
Research into corpus-based semantics has focused on the development of ad hoc models that treat single tasks, or sets of closely related tasks, as unrelated challenges to be tackled by extracting different kinds of distributional information from the corpus. As an alternative to this “one task, one model” approach, the Distributional Memory framework extracts distributional information once and...
متن کاملLinked Disambiguated Distributional Semantic Networks
We present a new hybrid lexical knowledge base that combines the contextual information of distributional models with the conciseness and precision of manually constructed lexical networks. The computation of our countbased distributional model includes the induction of word senses for single-word and multi-word terms, the disambiguation of word similarity lists, taxonomic relations extracted b...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007