Topological Data Analysis with Bregman Divergences
نویسندگان
چکیده
Given a finite set in a metric space, the topological analysis generalizes hierarchical clustering using a 1-parameter family of homology groups to quantify connectivity in all dimensions. Going beyond Euclidean distance and really beyond metrics, we show that the tools of topological data analysis also apply when we measure distance with Bregman divergences. While these divergences violate two of the three axioms of a metric, they have been found suitable for high-dimensional data. Examples are the Kullback–Leibler divergence, which is commonly used for text and images, and the Itakura–Saito divergence, which is popular for speech and sound. 1998 ACM Subject Classification F.2.2 Nonnumerical Algorithms and Problems
منابع مشابه
Non-flat Clusteringwhith Alpha-divergences
The scope of the well-known k-means algorithm has been broadly extended with some recent results: first, the kmeans++ initialization method gives some approximation guarantees; second, the Bregman k-means algorithm generalizes the classical algorithm to the large family of Bregman divergences. The Bregman seeding framework combines approximation guarantees with Bregman divergences. We present h...
متن کاملWorst-Case and Smoothed Analysis of the k-Means Method with Bregman Divergences
The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice despite its exponential worst-case running-time. To narrow the gap between theory and practice, k-means has been studied in the semi-random input model of smoothed analysis, which often leads to more realistic conclusions than mere worst-case analysis. For the case tha...
متن کاملClustering with Bregman Divergences: an Asymptotic Analysis
Clustering, in particular k-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of k-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters k is large. We establish quantization rates and describe the lim...
متن کاملMatrix Nearness Problems with Bregman Divergences
This paper discusses a new class of matrix nearness problems that measure approximation error using a directed distance measure called a Bregman divergence. Bregman divergences offer an important generalization of the squared Frobenius norm and relative entropy, and they all share fundamental geometric properties. In addition, these divergences are intimately connected with exponential families...
متن کاملWorst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences
The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice. Most of the theoretical work is restricted to the case that squared Euclidean distances are used as similarity measure. In many applications, however, data is to be clustered with respect to other measures like, e.g., relative entropy, which is commonly used to cluste...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017