Emergence of Linguistic Representations by Independent Component Analysis
Authors
Abstract
Our aim is to find syntactic and semantic relationships and roles of words based on the analysis of corpora. We study three methods for analyzing words in contexts as potential methods for solving this task. The methods are latent semantic analysis, self-organizing map and independent component analysis. Latent semantic analysis is a simple method for automatic generation of concepts that are useful, e.g., in encoding documents for information retrieval purposes. However, these concepts cannot easily be interpreted by humans. Self-organizing maps can be used to generate an explicit diagram which characterizes the relationships between
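As a rough illustration of what "analyzing words in contexts" can mean in practice, the sketch below counts how often each word co-occurs with its neighbours in a token sequence. A matrix of this kind is a common input for methods such as latent semantic analysis, self-organizing maps, and independent component analysis. The function name `context_matrix`, the one-word window, and the toy sentence are assumptions made for this example and are not taken from the paper.

```python
def context_matrix(tokens, window=1):
    """Count, for each word, how often every other word occurs
    within `window` positions of it (hypothetical helper)."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in vocab]
    for pos, word in enumerate(tokens):
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:  # a word is not its own context
                counts[index[word]][index[tokens[ctx_pos]]] += 1
    return vocab, counts


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
vocab, counts = context_matrix(tokens)
for word, row in zip(vocab, counts):
    print(word, row)
```

Each row is the context vector of one word. In a real setting the rows would come from a large corpus, and the matrix would typically be normalized before any decomposition is applied.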
Similar resources
Latent Linguistic Codes for Morphemes Using Independent Component Analysis
We study properties of morphemes by analyzing their use in a large Finnish text corpus using Independent Component Analysis (ICA). As a result, we obtain emergent linguistic representations for the morphemes. On a coarse level, main syntactic categories are observed. On a more detailed level, the components depict potential thematic roles of the morphemes. An interesting question is whether the...
WordICA - emergence of linguistic representations for words by independent component analysis
We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), widely used in statistical text analysis, in general, and specifically in latent semantic analysis (LSA). However, the representations found...
Italian Political Communication and Gender Bias: Press Representations of Men/Women Presidents of the Houses of Parliament (1979, 1994, and 2013)
The study considers mass media communication as intertwined with social norms, as assumed by the perspective of social representations. It explores the Italian press communication by focusing on three pairs of men and women politicians with different political orientations and all serving as presidents of the Houses of Parliament in three legislatures. The article concentrates on five newspaper...
Unsupervised Decomposition of Morphology a Distributed Representation of the Italian Verb System
The paper presents a morphological learning process simulated by an ICA (Independent Component Analysis) algorithm. In an unsupervised manner, the algorithm is able to discover emergent morphologically-motivated features from a representative corpus of Italian verbs. The discovered features can be assumed as non-discrete and distributed representations of the morphological data. Final results a...
Emergence of Linguistic Features: Independent Component Analysis of Contexts
We show that independent component analysis (ICA) (Hyvärinen et al. 2001) applied on word context data gives distinct features that reflect syntactic and semantic categories. The analysis gives features or categories that are both explicit and can easily be interpreted by humans. This result can be obtained without any human supervision or tagged corpora that would have some predetermined morph...
Journal title:
Volume, issue:
Pages: -
Publication date: 2003