Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Authors

  • Sanjeev Arora
  • Yuanzhi Li
  • Yingyu Liang
  • Tengyu Ma
  • Andrej Risteski
Abstract

Word embeddings are ubiquitous in NLP and information retrieval, but it is unclear what they represent when the word is polysemous, i.e., has multiple senses. Here it is shown that multiple word senses reside in linear superposition within the word embedding and can be recovered by simple sparse coding. The success of the method, which applies to several embedding methods including word2vec, is mathematically explained using the random walk on discourses model (Arora et al., 2015). A novel aspect of our technique is that each word sense is also accompanied by one of about 2000 “discourse atoms” that give a succinct description of which other words co-occur with that word sense. Discourse atoms seem of independent interest, and make the method potentially more useful than the traditional clustering-based approaches to polysemy.
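
The idea of senses sitting in linear superposition and being recovered by sparse coding can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name a solver, so plain greedy matching pursuit is swapped in here, and the atoms, the example word vector, and the sparsity level `k` are all hypothetical toy values (the real method uses about 2000 atoms learned from data).

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(vector, atoms, k):
    """Greedily approximate `vector` with at most k unit-norm atoms.

    Returns a dict mapping atom index -> coefficient.
    """
    residual = list(vector)
    coeffs = {}
    for _ in range(k):
        # Pick the atom most correlated with the still-unexplained part.
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        if abs(scores[best]) < 1e-12:
            break  # nothing left to explain
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs


# Hypothetical toy "discourse atoms" (unit vectors); real atoms would be learned.
atoms = [
    [1.0, 0.0, 0.0, 0.0],  # stand-in for, say, a "clothing" discourse
    [0.0, 1.0, 0.0, 0.0],  # stand-in for, say, a "legal" discourse
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# A polysemous word modeled as a superposition of two senses plus small noise.
word_vector = [0.7, 0.5, 0.01, -0.02]

senses = matching_pursuit(word_vector, atoms, k=2)
strongest = sorted(senses, key=lambda j: -abs(senses[j]))
```

Here `senses` holds the few atoms that explain most of the word vector, and `strongest` orders them by coefficient magnitude. The toy atoms are orthonormal, which makes greedy selection exact; with a learned, non-orthogonal dictionary the recovery would only be approximate.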

Similar resources

Rules, Radical Pragmatics and Restrictions on Regular Polysemy

Although regular polysemy [e.g. producer for product (John read Dickens) or container for contents (John drank the bottle)] has been extensively studied, there has been little work on why certain polysemy patterns are more acceptable than others. We take an empirical approach to the question, in particular evaluating an account based on rules against a gradient account of polysemy that is based...

Regular polysemy: from sense vectors to sense patterns

Regular polysemy has been extensively investigated in lexical semantics, but this phenomenon has been very little studied in distributional semantics. We propose a model for regular polysemy detection that is based on sense vectors and allows working directly with senses in semantic vector space. Our method is able to detect polysemous words that have the same regular sense alternation as in a given...

First Language Activation during Second Language Lexical Processing in a Sentential Context

Lexicalization patterns, the way words are mapped onto concepts, differ from one language to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...

Semi-automatic Induction of Systematic Polysemy from WordNet

This paper describes a semi-automatic method of inducing underspecified semantic classes from WordNet verbs and nouns. An underspecified semantic class is an abstract semantic class which encodes systematic polysemy, a set of word senses that are related in systematic and predictable ways. We show the usefulness of the induced classes in the semantic interpretations and contextual inferences o...

Taxonomy Learning Using Word Sense Induction

Taxonomies are an important resource for a variety of Natural Language Processing (NLP) applications. Despite this, the current state-of-the-art methods in taxonomy learning have disregarded word polysemy, in effect, developing taxonomies that conflate word senses. In this paper, we present an unsupervised method that builds a taxonomy of senses learned automatically from an unlabelled corpus. O...

Journal title:
  • CoRR

Volume abs/1601.03764  Issue

Pages  -

Publication date 2016