MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension
Abstract
Given a dataset of points in a metric space and an integer k, a diversity maximization problem requires determining a subset of k points maximizing some diversity objective measure, e.g., the minimum or the average distance between a pair of points in the subset. Diversity maximization problems are computationally hard, hence only approximate solutions can be hoped for. Although diversity maximization is applied mostly in massive data analysis, most past research on it has concentrated on the standard sequential setting. Thus, there is a need for efficient algorithms in computational settings that can handle very large datasets, such as those underlying the MapReduce and Streaming models. In this work we provide algorithms for these models in the special case of metric spaces of bounded doubling dimension, which include the important family of Euclidean spaces of constant dimension. Our results show that, despite the inherent space constraints of the two models, for a variety of diversity objective functions we can achieve efficient MapReduce or Streaming algorithms yielding an (α + ε)-approximation ratio, for any constant ε > 0, where α is the best approximation ratio achieved by a standard polynomial-time, linear-space sequential algorithm for the same diversity criterion. As with other approaches in the literature, our algorithms revolve around the determination of a high-quality core-set, that is, a (much) smaller subset of the input dataset which contains a good approximation to the optimal solution for the whole dataset.
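To make the core-set idea concrete, here is a minimal Python sketch under assumptions that go beyond the abstract. It targets only the remote-edge objective (maximize the minimum pairwise distance), uses Euclidean points, and builds each local core-set with Gonzalez's farthest-first traversal (GMM), a standard sequential 2-approximation for that objective. The function names (farthest_first, mapreduce_diversity, min_pairwise) and the core-set size parameter k_prime are hypothetical: the paper's analysis relates the core-set size to ε and the doubling dimension, whereas here k_prime is simply set by hand. The two MapReduce rounds are simulated sequentially rather than run on a real MapReduce framework.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(p: Point, q: Point) -> float:
    # Euclidean distance (assumption: any metric of bounded doubling dimension could be used).
    return math.dist(p, q)


def farthest_first(points: Sequence[Point], m: int) -> List[Point]:
    """Gonzalez's farthest-first traversal (GMM): start from an arbitrary point,
    then repeatedly add the point farthest from the ones already chosen."""
    if m <= 0 or not points:
        return []
    chosen = [points[0]]
    # d_min[i] holds the distance from points[i] to its nearest chosen point.
    d_min = [dist(p, points[0]) for p in points]
    while len(chosen) < min(m, len(points)):
        i = max(range(len(points)), key=d_min.__getitem__)
        chosen.append(points[i])
        d_min = [min(d, dist(p, points[i])) for d, p in zip(d_min, points)]
    return chosen


def min_pairwise(points: Sequence[Point]) -> float:
    # Remote-edge objective: the smallest distance between two selected points.
    return min(dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])


def mapreduce_diversity(points: List[Point], k: int, k_prime: int,
                        num_parts: int) -> List[Point]:
    # Hypothetical helper simulating two MapReduce rounds sequentially.
    # Round 1: split the input into num_parts parts and extract a local
    # core-set of k_prime points from each part (parallel on a real cluster).
    parts = [points[i::num_parts] for i in range(num_parts)]
    core_set = [c for part in parts for c in farthest_first(part, k_prime)]
    # Round 2: run the sequential algorithm (GMM again) on the small union.
    return farthest_first(core_set, k)


# Usage on synthetic 2-D data (values are random, so no output is claimed here).
random.seed(0)
data = [(random.random(), random.random()) for _ in range(2000)]
sol = mapreduce_diversity(data, k=5, k_prime=20, num_parts=4)
print(len(sol), round(min_pairwise(sol), 3))
```

The point of this structure is that each part's core-set is computed independently, so the first round parallelizes trivially and only the union of the local core-sets (num_parts × k_prime points) has to fit in the memory of a single reducer.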
Similar resources
Completeness in Probabilistic Metric Spaces
The idea of a probabilistic metric space was introduced by Menger, who showed that probabilistic metric spaces are generalizations of metric spaces. Thus, in this paper, we prove some of the important properties, theorems, and conclusions that are found in metric spaces. At the beginning of this paper, the distance distribution functions are proposed. These functions are essential in defining p...
Clustering High Dimensional Dynamic Data Streams
We present data streaming algorithms for the k-median problem in high-dimensional dynamic geometric data streams, i.e., streams allowing both insertions and deletions of points from a discrete Euclidean space {1, 2, ..., ∆}^d. Our algorithms use k ε^(−2) poly(d log ∆) space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of k centers the cost of...
Distributed Spanner Construction in Doubling Metric Spaces
This paper presents a distributed algorithm that runs on an n-node unit ball graph (UBG) G residing in a metric space of constant doubling dimension, and constructs, for any ε > 0, a (1 + ε)-spanner H of G with maximum degree bounded above by a constant. In addition, we show that H is “lightweight”, in the following sense. Let ∆ denote the aspect ratio of G, that is, the ratio of the length of ...
The Weak Gap Property in Metric Spaces of Bounded Doubling Dimension
We introduce the weak gap property for directed graphs whose vertex set S is a metric space of size n. We prove that, if the doubling dimension of S is a constant, any directed graph satisfying the weak gap property has O(n) edges and total weight O(log n) · wt(MST(S)), where wt(MST(S)) denotes the weight of a minimum spanning tree of S. We show that 2-optimal TSP tours and greedy spanners sa...
On some combinatorial problems in metric spaces of bounded doubling dimension
A metric space has doubling dimension d if for every ρ > 0, every ball of radius ρ can be covered by at most 2^d balls of radius ρ/2. This generalizes the Euclidean dimension, because the doubling dimension of Euclidean space R^d is proportional to d. The following results are shown, for any d ≥ 1 and any metric space of size n and doubling dimension d: First, the maximum number of diametral pair...
Journal: PVLDB
Volume: 10
Publication date: 2017