Manifold Learning for Semantic Visualization

Authors

  • Tuan M. V. Le
  • Hady W. Lauw

Abstract

Visualization of high-dimensional data, such as text documents, is useful for mapping out the similarities among data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the size of the vocabulary. Classical approaches to document visualization reduce this representation directly to two or three visualizable dimensions, using techniques such as multidimensional scaling. Recent approaches introduce an intermediate representation in topic space, between word space and visualization space, which preserves semantics through topic modeling. While these approaches aim for a good fit between the model parameters and the observed data, they have not considered local consistency among data instances in terms of the intrinsic geometric structure of the document manifold. We address semantic visualization by jointly modeling topics and visualization on the intrinsic document manifold, so that each document has both a topic distribution and a visualization coordinate. Specifically, we propose an unsupervised probabilistic model, called Semafore, which aims to preserve the manifold in the lower-dimensional spaces through a regularization framework designed for the semantic visualization task. To validate the efficacy of Semafore, we conduct comprehensive experiments on several real-life text datasets of news articles and web pages; the results show that the proposed methods outperform the state-of-the-art baselines on objective evaluation metrics.
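To make "local consistency on the document manifold" concrete, here is a minimal Python sketch of a generic graph-Laplacian smoothness penalty, the kind of term used in manifold regularization (as in Laplacian eigenmaps). It is not Semafore's actual regularizer, whose exact form the abstract does not give. Assumptions, all hypothetical: documents are whitespace-tokenized bags of words, neighbors are chosen by cosine similarity on raw counts in a symmetric k-nearest-neighbor graph, and the helper names `bag_of_words`, `knn_graph`, and `manifold_penalty` are illustrative only. The penalty is small when documents that are neighbors in word space are placed close together in the 2-D visualization space.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Hypothetical helper: map each document to a sparse word-count vector."""
    return [Counter(doc.lower().split()) for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_graph(vectors, k):
    """Symmetric k-NN adjacency (assumed construction): w[i][j] = 1 if j is
    among the k documents most similar to i, or i is among those of j."""
    n = len(vectors)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(vectors[i], vectors[j]),
            reverse=True,
        )
        for j in others[:k]:
            w[i][j] = w[j][i] = 1
    return w


def manifold_penalty(coords, w):
    """Graph-Laplacian smoothness term: 0.5 * sum_ij w_ij * ||x_i - x_j||^2.
    Each undirected edge is visited twice, hence the factor 0.5."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if w[i][j]:
                total += sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
    return 0.5 * total


docs = [
    "stock market prices rise",
    "market prices fall on stock news",
    "team wins football match",
    "football team loses match",
]
w = knn_graph(bag_of_words(docs), k=1)  # links doc 0 with 1, and doc 2 with 3

good = [(0, 0), (0, 1), (5, 5), (5, 6)]  # neighbors placed close together
bad = [(0, 0), (5, 5), (0, 1), (5, 6)]   # neighbors placed far apart
print(manifold_penalty(good, w))  # 2.0
print(manifold_penalty(bad, w))   # 100.0
```

In a full model, a term like this would typically be weighted by a hyperparameter and combined with the data-fitting objective, so that coordinates (and, analogously, topic distributions) agree with the neighborhood graph; the exact form and weighting Semafore uses are not stated in this abstract.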

Related resources

Semantic Mining Technologies for Multimedia Databases

“This publication details how current semantic mining tasks play an important role in many fields including random sampling techniques and support vector machine for human computer interaction, manifold learning and subspace methods for data visualization, discriminant analysis for feature selection, and classification trees for data indexing.” Dacheng Tao, Hong Kong Polytechnic University, Hong...

Robust cartogram visualization of outliers in manifold learning

Most real data sets contain atypical observations, often referred to as outliers. Their presence may have a negative impact in data modeling using machine learning. This is particularly the case in data density estimation approaches. Manifold learning techniques provide low-dimensional data representations, often oriented towards visualization. The visualization provided by density estimation m...

Manifold Learning for Jointly Modeling Topic and Visualization

Classical approaches to visualization directly reduce a document’s high-dimensional representation into visualizable two or three dimensions, using techniques such as multidimensional scaling. More recent approaches consider an intermediate representation in topic space, between word space and visualization space, which preserves the semantics by topic modeling. We call the latter semantic visu...

Semantic Visualization with Neighborhood Graph Regularization

Visualization of high-dimensional data, such as text documents, is useful to map out the similarities among various data points. In the high-dimensional space, documents are commonly represented as bags of words, with dimensionality equal to the vocabulary size. Classical approaches to document visualization directly reduce this into visualizable two or three dimensions. Recent approaches consi...

Nonlinear Dimensionality Reduction

The visual interpretation of data is an essential step to guide any further processing or decision making. Dimensionality reduction (or manifold learning) tools may be used for visualization if the resulting dimension is constrained to be 2 or 3. The field of machine learning has developed numerous nonlinear dimensionality reduction tools in the last decades. However, the diversity of methods r...

Publication date: 2015