Distributional Similarity of Multi-Word Expressions

نویسندگان

  • Laura Ingram
  • James Curran
چکیده

Most existing systems for automatically extracting lexical-semantic resources neglect multi-word expressions (MWEs), even though approximately 30% of gold-standard thesauri entries are MWEs. We present a distributional similarity system that identifies synonyms for MWEs. We extend Grefenstette’s SEXTANT shallow parser to first identify bigram MWEs using collocation statistics from the Google WEB1T corpus. We extract contexts from WEB1T to increase coverage on the sparser bigrams.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions

This paper presents the first attempt to use word embeddings to predict the compositionality of multiword expressions. We consider both singleand multi-prototype word embeddings. Experimental results show that, in combination with a back-off method based on string similarity, word embeddings outperform a method using count-based distributional similarity. Our best results are competitive with, ...

متن کامل

Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality

We predict the compositionality of multiword expressions using distributional similarity between each component word and the overall expression, based on translations into multiple languages. We evaluate the method over English noun compounds, English verb particle constructions and German noun compounds. We show that the estimation of compositionality is improved when using translations into m...

متن کامل

Towards Dynamic Word Sense Discrimination with Random Indexing

Most distributional models of word similarity represent a word type by a single vector of contextual features, even though, words commonly have more than one sense. The multiple senses can be captured by employing several vectors per word in a multi-prototype distributional model, prototypes that can be obtained by first constructing all the context vectors for the word and then clustering simi...

متن کامل

Distributional Memory: A General Framework for Corpus-Based Semantics

Research into corpus-based semantics has focused on the development of ad hoc models that treat single tasks, or sets of closely related tasks, as unrelated challenges to be tackled by extracting different kinds of distributional information from the corpus. As an alternative to this “one task, one model” approach, the Distributional Memory framework extracts distributional information once and...

متن کامل

Linked Disambiguated Distributional Semantic Networks

We present a new hybrid lexical knowledge base that combines the contextual information of distributional models with the conciseness and precision of manually constructed lexical networks. The computation of our countbased distributional model includes the induction of word senses for single-word and multi-word terms, the disambiguation of word similarity lists, taxonomic relations extracted b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007