Visualizing Data using t-SNE

Authors

  • Laurens van der Maaten
  • Geoffrey Hinton
  • Yoshua Bengio
Abstract

We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets.
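The abstract names the ingredients of t-SNE without spelling out the mechanics. The following is a minimal NumPy sketch of the procedure as it is commonly described: Gaussian input affinities whose per-point bandwidth is found by binary search to match a target perplexity, a heavy-tailed Student-t kernel (one degree of freedom) in the map, which is what counters the crowding toward the center, and momentum gradient descent on the KL divergence between the two distributions. The function names (`joint_probabilities`, `tsne`), the optimizer settings (early exaggeration of 4 for 50 iterations, momentum 0.5 then 0.8, learning rate 100, Jacobs-style adaptive gains) and the defaults are assumptions loosely modeled on common practice, not the authors' code. It uses O(n²) memory, so it suits only small data sets, and it does not cover the random-walk variant for large data sets.

```python
import numpy as np


def _row_probs(dist_sq, beta):
    """Gaussian conditionals p_{j|i} for one point and their entropy in nats.

    beta is the precision 1 / (2 * sigma_i^2). Subtracting the minimum distance
    before exponentiating avoids underflow without changing the normalized result.
    """
    p = np.exp(-(dist_sq - dist_sq.min()) * beta)
    p /= p.sum()
    entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
    return p, entropy


def joint_probabilities(X, perplexity=30.0, tol=1e-5, max_steps=50):
    """Symmetric input affinities p_ij; each sigma_i is set by binary search on perplexity."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    P = np.zeros((n, n))
    target = np.log(perplexity)  # perplexity = exp(entropy in nats)
    for i in range(n):
        d = np.delete(D[i], i)
        beta, lo, hi = 1.0, 0.0, np.inf
        for _ in range(max_steps):
            p, entropy = _row_probs(d, beta)
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # too flat: raise the precision (shrink sigma_i)
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:  # too peaked: lower the precision (grow sigma_i)
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    P = (P + P.T) / (2.0 * n)  # p_ij = (p_{j|i} + p_{i|j}) / 2n
    return np.maximum(P, 1e-12)


def tsne(X, n_components=2, perplexity=30.0, n_iter=1000, learning_rate=100.0, seed=0):
    """Hypothetical minimal t-SNE: minimizes KL(P || Q) with a Student-t kernel in the map."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    P = joint_probabilities(X, perplexity)
    Y = rng.normal(scale=1e-4, size=(n, n_components))  # small random initial map
    velocity = np.zeros_like(Y)
    gains = np.ones_like(Y)
    for it in range(n_iter):
        exaggeration = 4.0 if it < 50 else 1.0  # early exaggeration of the p_ij (assumed setting)
        momentum = 0.5 if it < 250 else 0.8
        sq = np.sum(Y ** 2, axis=1)
        # Student-t numerators (1 + ||y_i - y_j||^2)^-1 with the diagonal excluded
        num = 1.0 / (1.0 + sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        W = (exaggeration * P - Q) * num
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) (1 + ||y_i - y_j||^2)^-1 (y_i - y_j)
        grad = 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)
        # Per-parameter adaptive gains in the spirit of Jacobs (1988)
        gains = np.where(np.sign(grad) != np.sign(velocity), gains + 0.2, gains * 0.8)
        gains = np.maximum(gains, 0.01)
        velocity = momentum * velocity - learning_rate * gains * grad
        Y = Y + velocity
    return Y
```

For instance, `tsne(X, perplexity=5.0, n_iter=500)` on a 20 × 5 array returns a 20 × 2 map. Perplexity must stay well below the number of points. The design choice that matters most is the kernel: because the Student-t density has much heavier tails than a Gaussian, moderately dissimilar points can be placed far apart in the map without a large cost, which is what keeps distinct groups from collapsing into the center.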

Similar resources

Visualizing Time-Dependent Data Using Dynamic t-SNE

Many interesting processes can be represented as time-dependent datasets. We define a time-dependent dataset as a sequence of datasets captured at particular time steps. In such a sequence, each dataset is composed of observations (high-dimensional real vectors), and each observation has a corresponding observation across time steps. Dimensionality reduction provides a scalable alternative to c...

Visualizing breast cancer data with t-SNE

One in eight women will get breast cancer in her lifetime, and in 2008 it caused 458,503 deaths worldwide [15]. Although technology has improved considerably over the last decades, there is still room for further advances. A technique that could contribute to this field is t-SNE [24]. The aim of this thesis is to investigate whether t-SNE is able to present the breast ca...

Towards Meaningful Maps of Polish Case Law

In this work, we analyze the utility of two-dimensional document maps for exploratory analysis of Polish case law. Such maps reflect the structure of the analyzed collection by grouping similar documents in neighbouring regions of 2D space. This visual aid could be useful for browsing and searching, finding anomalous documents or quickly gaining synthetic knowledge about large corpora. We started...

Graph Layouts by t-SNE

We propose a new graph layout method based on a modification of the t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction technique. Although t-SNE is one of the best techniques for visualizing high-dimensional data as 2D scatterplots, t-SNE has not been used in the context of classical graph layout. We propose a new graph layout method, tsNET, based on representing a gra...

Supplemental Material for Visualizing Data using t-SNE

In this supplementary material, we present the results of our experiments that compare the visualizations produced by t-SNE with those produced by seven other dimensionality reduction techniques on five datasets from a variety of domains. Some of these results were already presented in the paper, however, we present the results here in a different form. The five datasets we employed in our expe...

Publication date: 2008