Emergence of Linguistic Representations by Independent Component Analysis

Authors

  • Timo Honkela
  • Aapo Hyvärinen
  • Jaakko J. Väyrynen
Abstract

Our aim is to find syntactic and semantic relationships and roles of words based on the analysis of corpora. We study three methods for analyzing words in contexts as potential methods for solving this task. The methods are latent semantic analysis, self-organizing map and independent component analysis. Latent semantic analysis is a simple method for automatic generation of concepts that are useful, e.g., in encoding documents for information retrieval purposes. However, these concepts cannot easily be interpreted by humans. Self-organizing maps can be used to generate an explicit diagram which characterizes the relationships between
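All three methods named in the abstract start from the same kind of input: a record of which words occur in the contexts of which other words. The following is a minimal sketch of building such word-context counts, not the authors' implementation. The toy corpus, the one-word window, and the helper names `context_counts` and `to_matrix` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def context_counts(sentences, window=1):
    """For each word, count the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[word][tokens[j]] += 1
    return counts


def to_matrix(counts):
    """Turn the nested counts into a dense words-by-contexts matrix."""
    words = sorted(counts)
    contexts = sorted({c for ctr in counts.values() for c in ctr})
    matrix = [[counts[w][c] for c in contexts] for w in words]
    return words, contexts, matrix


# Hypothetical toy corpus; a real study would use a large text corpus.
corpus = ["the cat sleeps", "the dog sleeps", "a cat eats", "a dog eats"]
counts = context_counts(corpus, window=1)
words, contexts, matrix = to_matrix(counts)

for word, row in zip(words, matrix):
    print(f"{word:>7}", row)
```

In this toy corpus "cat" and "dog" share every context, so their rows are identical. A matrix of this kind is what a decomposition such as singular value decomposition (for latent semantic analysis) or independent component analysis would then be applied to.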

Similar resources

Latent Linguistic Codes for Morphemes Using Independent Component Analysis

We study properties of morphemes by analyzing their use in a large Finnish text corpus using Independent Component Analysis (ICA). As a result, we obtain emergent linguistic representations for the morphemes. On a coarse level, main syntactic categories are observed. On a more detailed level, the components depict potential thematic roles of the morphemes. An interesting question is whether the...

WordICA - emergence of linguistic representations for words by independent component analysis

We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), widely used in statistical text analysis, in general, and specifically in latent semantic analysis (LSA). However, the representations found...

Italian Political Communication and Gender Bias: Press Representations of Men/Women Presidents of the Houses of Parliament (1979, 1994, and 2013)

The study considers mass media communication as intertwined with social norms, as assumed by the perspective of social representations. It explores the Italian press communication by focusing on three pairs of men and women politicians with different political orientations and all serving as presidents of the Houses of Parliament in three legislatures. The article concentrates on five newspaper...

Unsupervised Decomposition of Morphology a Distributed Representation of the Italian Verb System

The paper presents a morphological learning process simulated by an ICA (Independent Component Analysis) algorithm. In an unsupervised manner, the algorithm is able to discover emergent morphologically-motivated features from a representative corpus of Italian verbs. The discovered features can be assumed as non-discrete and distributed representations of the morphological data. Final results a...

Emergence of Linguistic Features: Independent Component Analysis of Contexts

We show that independent component analysis (ICA) (Hyvärinen et al. 2001) applied on word context data gives distinct features that reflect syntactic and semantic categories. The analysis gives features or categories that are both explicit and can easily be interpreted by humans. This result can be obtained without any human supervision or tagged corpora that would have some predetermined morph...

Publication date: 2003