Distributed Online Aggregation

Authors

  • Sai Wu
  • Shouxu Jiang
  • Beng Chin Ooi
  • Kian-Lee Tan
Abstract

In many decision-making applications, users typically issue aggregate queries. To evaluate these computationally expensive queries, online aggregation has been developed to provide approximate answers (with their respective confidence intervals) quickly, and to continuously refine the answers. In this paper, we extend the online aggregation technique to a distributed context where sites are maintained in a DHT (Distributed Hash Table) network. Our Distributed Online Aggregation (DoA) scheme iteratively and progressively produces approximate aggregate answers as follows: in each iteration, a small set of random samples is retrieved from the data sites and distributed to the processing sites; at each processing site, a local aggregate is computed based on the allocated samples; at a coordinator site, these local aggregates are combined into a global aggregate. DoA adaptively grows the number of processing nodes as the sample size increases. To further reduce the sampling overhead, the samples are retained as a precomputed synopsis over the network to be used for processing future queries. We also study how these synopses can be maintained incrementally. We have conducted extensive experiments on PlanetLab. The results show that our DoA scheme significantly reduces the initial waiting time and progressively provides high-quality approximate answers with running confidence intervals.
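The iteration described in the abstract (sample, compute local aggregates, combine at a coordinator) can be illustrated with a minimal single-process sketch. This is not the paper's implementation: the names `LocalAggregate`, `estimate_mean`, and `run_iterations` are hypothetical, the "sites" are simulated in memory rather than on a DHT, the number of processing sites is fixed rather than grown adaptively, and the confidence interval uses a plain normal approximation for an AVG query, which is an assumption made here for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LocalAggregate:
    """Partial state a (hypothetical) processing site sends to the coordinator."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, value: float) -> None:
        # Fold one sampled value into the running count, sum, and sum of squares.
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def merge(self, other: "LocalAggregate") -> "LocalAggregate":
        # Combining two partial states is plain component-wise addition.
        return LocalAggregate(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )


def estimate_mean(agg: LocalAggregate, z: float = 1.96) -> tuple[float, float]:
    """Return (mean estimate, half-width of a normal-approximation interval)."""
    if agg.count < 2:
        raise ValueError("need at least two samples")
    mean = agg.total / agg.count
    # Sample variance from the sufficient statistics; clamp tiny negatives.
    variance = max((agg.total_sq - agg.count * mean * mean) / (agg.count - 1), 0.0)
    return mean, z * math.sqrt(variance / agg.count)


def run_iterations(data, batch_size, iterations, num_sites, seed=0):
    """Simulate several sampling rounds and return the running estimates."""
    rng = random.Random(seed)
    global_agg = LocalAggregate()
    history = []
    for _ in range(iterations):
        # Draw a small random batch (with replacement) from the data.
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Allocate the batch round-robin to the simulated processing sites.
        sites = [LocalAggregate() for _ in range(num_sites)]
        for i, value in enumerate(batch):
            sites[i % num_sites].add(value)
        # Coordinator step: fold every local aggregate into the global one.
        for site in sites:
            global_agg = global_agg.merge(site)
        history.append(estimate_mean(global_agg))
    return history


if __name__ == "__main__":
    values = [float(v) for v in range(1000)]
    for mean, half_width in run_iterations(values, 50, 5, 4):
        print(f"AVG ~ {mean:.1f} +/- {half_width:.1f}")
```

Because each local aggregate carries only a count, a sum, and a sum of squares, the coordinator can merge partial results in any order, and the interval narrows as more samples accumulate across iterations.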

Similar resources

Online Aggregation of Coherent Generators Based on Electrical Parameters of Synchronous Generators

This paper proposes a novel approach for coherent generators online clustering in a large power system following a wide area disturbance. An interconnected power system may become unstable due to severe contingency when it is operated close to the stability boundaries. Hence, the bulk power system controlled islanding is the last resort to prevent catastrophic cascading outages and wide area bl...

odNEAT: An Algorithm for Distributed Online, Onboard Evolution of Robot Behaviours

We propose and evaluate a novel approach called Online Distributed NeuroEvolution of Augmenting Topologies (odNEAT). odNEAT is a completely distributed evolutionary algorithm for online learning in groups of embodied agents such as robots. While previous approaches to online distributed evolution of neural controllers have been limited to the optimisation of weights, odNEAT evolves both weights...

LOOM: Optimal Aggregation Overlays for In-Memory Big Data Processing

Aggregation underlies the distillation of information from big data. Many well-known basic operations including top-k matching and word count hinge on fast aggregation across large data-sets. Common frameworks including MapReduce support aggregation, but do not explicitly consider or optimize it. Optimizing aggregation however becomes yet more relevant in recent “online” approaches to expressiv...

AVCOL: Availability-aware information aggregation in large distributed systems under uncollaborative behavior

Aggregation of system-wide information in large-scale distributed systems, such as p2p systems and Grids, can be unfairly influenced by nodes that are selfish, colluding with each other, or are offline most of the time. We present AVCOL, which uses probabilistic and gossip-style techniques to provide availability-aware aggregation. Concretely, AVCOL is the first aggregation system that: (1) imp...

Efficient Data Aggregation and Management in Integrated Network Control Environments

Due to the emerging growth of computer networks, broadly based measurements, monitoring and management become necessary, for example, to solve occurring problems. Lots of different concepts exist for each of the mentioned functionality. Therefore, distributed network control architectures integrating all of these functionalities are in the focus of current research. To take advantage of this ar...

Journal:
  • PVLDB

Volume 2

Publication date: 2009