Top-k vectorial aggregation queries in a distributed environment

Authors

  • Guy Sagy
  • Izchak Sharfman
  • Daniel Keren
  • Assaf Schuster

Abstract

Given a large set of objects in a distributed database, the goal of a top-k query is to determine the top-k scoring objects and return them to the user. Efficient top-k ranking over distributed databases has been the focus of recent research, with most current algorithms operating on the assumption that each node holds a single numerical attribute, or a small subset of the numerical attributes, of each object. However, in many important setups each node might instead hold a full d-dimensional vector of numerical attributes for each object. Examples include website activity in distributed servers, sales statistics for a retail chain, or share price information in different stock markets. For these setups, we define a novel ranking problem, top-k vectorial aggregation queries, where each object's score is determined by first aggregating the attribute vectors held for it and then applying the scoring function over the aggregated vector. Our communication-efficient algorithm uses a blend of geometric and skyline-related machinery, some of which is newly developed, as well as an algorithmic framework for defining generic local constraints. Previous algorithms reduced data sharing by defining local thresholds for each attribute, but such tailored solutions might perform poorly. Experimental results on real-world data demonstrate that our algorithm maintains low latency, with a communication cost up to four orders of magnitude lower than that of existing solutions. © 2010 Elsevier Inc. All rights reserved.
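To make the problem definition concrete, the following is a minimal centralized sketch in Python. It assumes that vectors are aggregated by element-wise summation and uses the Euclidean norm as a stand-in scoring function; the function names, the sample data, and both of these choices are illustrative assumptions. It is a naive baseline that gathers every vector in one place, not the communication-efficient algorithm the abstract describes.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
NodeData = Dict[str, Vector]  # object id -> local d-dimensional vector at one node


def aggregate_vectors(nodes: Sequence[NodeData]) -> Dict[str, Vector]:
    """Element-wise sum of each object's vectors across all nodes.

    Assumption: aggregation is summation, and an object absent from a node
    contributes nothing (equivalent to a zero vector).
    """
    totals: Dict[str, List[float]] = {}
    for node in nodes:
        for obj, vec in node.items():
            acc = totals.setdefault(obj, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += x
    return {obj: tuple(acc) for obj, acc in totals.items()}


def euclidean(v: Vector) -> float:
    """Hypothetical scoring function: the Euclidean norm of the vector."""
    return math.sqrt(sum(x * x for x in v))


def top_k_vectorial(nodes: Sequence[NodeData],
                    score: Callable[[Vector], float], k: int) -> List[str]:
    """Naive centralized baseline: aggregate first, then score, then rank."""
    agg = aggregate_vectors(nodes)
    return heapq.nlargest(k, agg, key=lambda obj: score(agg[obj]))


if __name__ == "__main__":
    nodes = [
        {"x": (3.0, 0.0), "y": (2.0, 2.0), "z": (0.0, 1.0)},   # node A
        {"x": (-3.0, 0.0), "y": (1.0, 1.0), "z": (0.0, 1.0)},  # node B
    ]
    print(top_k_vectorial(nodes, euclidean, 2))  # ['y', 'z']
    # Summing per-node scores instead ranks "x" first (3.0 + 3.0 = 6.0),
    # even though its vectors cancel out to (0.0, 0.0).
    local_sum = {o: sum(euclidean(n[o]) for n in nodes) for o in ("x", "y", "z")}
    print(max(local_sum, key=local_sum.get))  # x
```

With a nonlinear score such as the norm, ranking by the sum of per-node scores can disagree with ranking by the score of the aggregated vector: object x scores 3 at each node, yet its vectors cancel to (0, 0), so it falls out of the true top-2. This illustrates why the score has to be applied after the vectors are aggregated.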

Similar resources

Authenticated Top-K Aggregation in Distributed and ...

Top-k queries have attracted interest in many different areas like network and system monitoring, information retrieval, sensor networks, and so on. Since today many applications issue top-k queries on distributed and outsourced databases, authentication of top-k query results becomes more important. This paper addresses the problem of authenticated top-k aggregation queries (e.g. “find the k o...

Search for the Best but Expect the Worst - Distributed Top-k Queries over Decreasing Aggregated Scores

We consider distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers. In contrast to existing work, we exclusively consider distributed top-k queries over decreasing aggregated values. State-of-the-art distributed top-k algorithms usually depend on threshold propagation to reduce expen...

Efficient Processing of Distributed Top-k Queries

Ranking-aware queries, or top-k queries, have received much attention recently in various contexts such as web, multimedia retrieval, relational databases, and distributed systems. Top-k queries play a critical role in many decision-making related activities such as, identifying interesting objects, network monitoring, load balancing, etc. In this paper, we study the ranking aggregation problem...

Top-k aggregation queries in large-scale distributed systems

Distributed top-k query processing has become an essential functionality in a large number of emerging application classes like Internet traffic monitoring and Peer-to-Peer Web search. This work addresses efficient algorithms for distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers.

Research Issues in Supporting Data Intensive Applications within an Exascale System

Analyzing large graphs is crucial to a variety of application domains, like personalized recommendations in social networks, search engines, communication networks, computational biology, etc. In these domains, there is a need to process aggregation queries over large graphs. Existing approaches for aggregation are not suitable for large graphs, as they involve multi-way relational over gigan...

Journal title:
  • J. Parallel Distrib. Comput.

Volume 71

Publication date: 2011