MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

Authors

  • Matteo Ceccarello
  • Andrea Pietracaprina
  • Geppino Pucci
  • Eli Upfal
Abstract

Given a dataset of points in a metric space and an integer k, a diversity maximization problem requires determining a subset of k points maximizing some diversity objective measure, e.g., the minimum or the average distance between a pair of points in the subset. Diversity maximization problems are computationally hard, hence only approximate solutions can be hoped for. Although their applications are mostly in massive data analysis, most of the past research on diversity maximization has concentrated on the standard sequential setting. Thus, there is a need for efficient algorithms in computational settings that can handle very large datasets, such as those at the base of the MapReduce and the Streaming models. In this work we provide algorithms for these models in the special case of metric spaces of bounded doubling dimension, which include the important family of Euclidean spaces of constant dimension. Our results show that, despite the inherent space constraints of the two models, for a variety of diversity objective functions we can achieve efficient MapReduce or Streaming algorithms yielding an (α + ε)-approximation ratio, for any constant ε > 0, where α is the best approximation ratio achieved by a standard polynomial-time, linear-space sequential algorithm for the same diversity criterion. As with other approaches in the literature, our algorithms revolve around the determination of a high-quality core-set, that is, a (much) smaller subset of the input dataset which contains a good approximation to the optimal solution for the whole dataset.
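The core-set idea described in the abstract can be illustrated with a small sketch for the minimum-distance objective. This is not the authors' algorithm: it assumes Euclidean points, uses the classic greedy farthest-first traversal as a stand-in selection rule, and simulates the partitions of a MapReduce round with plain list slicing. The names `farthest_first`, `coreset_diversity`, `k_prime`, and `num_parts` are hypothetical.

```python
import math

def min_pairwise_distance(points):
    """Minimum-distance diversity: smallest distance between any two points."""
    return min(math.dist(p, q)
               for i, p in enumerate(points)
               for q in points[i + 1:])

def farthest_first(points, k):
    """Greedy farthest-first traversal: start from the first point, then
    repeatedly add the point farthest from those already chosen."""
    chosen = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen point
    dist = [math.dist(p, chosen[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        idx = max(range(len(points)), key=dist.__getitem__)
        chosen.append(points[idx])
        dist = [min(d, math.dist(p, points[idx]))
                for d, p in zip(dist, points)]
    return chosen

def coreset_diversity(points, k, k_prime, num_parts):
    """Two-round sketch (assumed structure, not taken from the paper):
    round 1 extracts k_prime points from each partition as a core-set,
    round 2 runs the sequential greedy on the union of the core-sets."""
    parts = [points[i::num_parts] for i in range(num_parts)]
    coreset = [p for part in parts for p in farthest_first(part, k_prime)]
    return farthest_first(coreset, k)

points = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 0), (0, 10)]
solution = coreset_diversity(points, k=4, k_prime=4, num_parts=2)
print(solution, min_pairwise_distance(solution))
```

Choosing `k_prime` larger than `k` makes each partition's core-set a finer summary of its points, at the price of more memory in the second round. How large it must be for a given ε is the kind of question the paper addresses and is not derived here.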

Similar resources

Completeness in Probabilistic Metric Spaces

The idea of probabilistic metric space was introduced by Menger and he showed that probabilistic metric spaces are generalizations of metric spaces. Thus, in this paper, we prove some of the important features and theorems and conclusions that are found in metric spaces. At the beginning of this paper, the distance distribution functions are proposed. These functions are essential in defining p...

Clustering High Dimensional Dynamic Data Streams

We present data streaming algorithms for the k-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space {1, 2, ..., ∆}^d. Our algorithms use k ε^(−2) poly(d log ∆) space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of k centers the cost of...

Distributed Spanner Construction in Doubling Metric Spaces

This paper presents a distributed algorithm that runs on an n-node unit ball graph (UBG) G residing in a metric space of constant doubling dimension, and constructs, for any ε > 0, a (1 + ε)-spanner H of G with maximum degree bounded above by a constant. In addition, we show that H is “lightweight”, in the following sense. Let ∆ denote the aspect ratio of G, that is, the ratio of the length of ...

The Weak Gap Property in Metric Spaces of Bounded Doubling Dimension

We introduce the weak gap property for directed graphs whose vertex set S is a metric space of size n. We prove that, if the doubling dimension of S is a constant, any directed graph satisfying the weak gap property has O(n) edges and total weight O(log n) · wt(MST(S)), where wt(MST(S)) denotes the weight of a minimum spanning tree of S. We show that 2-optimal TSP tours and greedy spanners sa...

On some combinatorial problems in metric spaces of bounded doubling dimension

A metric space has doubling dimension d if for every ρ > 0, every ball of radius ρ can be covered by at most 2^d balls of radius ρ/2. This generalizes the Euclidean dimension, because the doubling dimension of Euclidean space R^d is proportional to d. The following results are shown, for any d ≥ 1 and any metric space of size n and doubling dimension d: First, the maximum number of diametral pair...

Journal:
  • PVLDB

Volume 10  Issue

Pages  -

Publication date 2017