A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses

Authors

George Candea

Neoklis Polyzotis

Radek Vingralek

Abstract

Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention, because the physical plans, unaware of each other, compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly when multiple complex queries run at the same time. We describe an augmentation of traditional query engines that improves join throughput in large-scale concurrent data warehouses. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that can share I/O, computation, and tuple storage across all in-flight join queries. We use an "always-on" pipeline of non-blocking operators, coupled with a controller that continuously examines the current query mix and performs run-time optimizations. Our design allows the query engine to scale gracefully to large data sets, provide predictable execution times, and reduce contention. In our empirical evaluation, we found that our prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.
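The idea of one plan serving all in-flight join queries can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple star schema with one fact table and one dimension table, and the function name `shared_join_scan`, the row layouts, and the per-query bitmask are hypothetical choices made only for illustration. The point it shows is that the fact table is scanned once, and the hash table of dimension rows is built once, no matter how many queries are active.

```python
def shared_join_scan(fact_rows, dim_table, queries):
    """Answer every query in `queries` with a single pass over `fact_rows`.

    fact_rows: iterable of (dim_key, measure) tuples (hypothetical layout).
    dim_table: dict mapping dim_key -> dimension attribute value.
    queries:   dict mapping query_id -> predicate over the dimension value.
    Returns a dict mapping query_id -> sum of measures of the joining rows.
    """
    query_ids = list(queries)

    # Build ONE shared hash table: for each dimension key, a bitmask whose
    # bit i is set when query i's predicate accepts that dimension row.
    mask_by_key = {}
    for key, attr in dim_table.items():
        mask = 0
        for bit, qid in enumerate(query_ids):
            if queries[qid](attr):
                mask |= 1 << bit
        if mask:  # rows no query wants are never stored
            mask_by_key[key] = mask

    # Scan the fact table ONCE; each tuple is probed once and then routed
    # to every query whose bit is set.
    totals = {qid: 0 for qid in query_ids}
    for dim_key, measure in fact_rows:
        mask = mask_by_key.get(dim_key, 0)
        for bit, qid in enumerate(query_ids):
            if mask & (1 << bit):
                totals[qid] += measure
    return totals


# Usage: two concurrent queries share the same scan and the same probe.
dim = {1: "EU", 2: "US", 3: "EU"}
facts = [(1, 10), (2, 20), (3, 5), (4, 7)]  # key 4 has no dimension row
queries = {
    "q_eu": lambda region: region == "EU",
    "q_all": lambda region: True,
}
result = shared_join_scan(facts, dim, queries)
print(result)  # {'q_eu': 15, 'q_all': 35}
```

In a query-at-a-time engine, each of the two queries would scan `facts` and build its own hash table. Here, adding a query adds one bit to the mask rather than another scan, which is one way to picture why contention for I/O can drop as concurrency grows.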

Similar resources

Adaptive Prejoin Approach for Performance Optimization in MapReduce-based Warehouses

MapReduce-based warehousing solutions (e.g. Hive) for big data analytics with the capabilities of storing and analyzing high volume of both structured and unstructured data in a scalable file system have emerged recently. Their efficient data loading features enable a so-called near real-time warehousing solution in contrast to those offered by conventional data warehouses with complex, long-ru...

Adaptive Approach for Joining and Submissive View of Data in Data Warehouse Using Etl

Data warehouses have emerged as a new business intelligence paradigm where data is stored and maintained concurrently. Modifications are required in the implementation of Extract Transform Load (ETL) operations, which now need to be executed in an online fashion. The adaptive approach has two phases: the extraction phase and the joining phase. The Extraction phase recognition of the subset of s...

Scalable and Adaptive Online Joins

Scalable join processing in a parallel shared-nothing environment requires a partitioning policy that evenly distributes the processing load while minimizing the size of state maintained and number of messages communicated. Previous research proposes static partitioning schemes that require statistics beforehand. In an online or streaming environment in which no statistics about the workload ar...

A Join Index for XML Data Warehouses

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently offer limited performance, and it is necessary to research ways to optimize them. In this paper, we propose a new join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminat...

Evaluation of view maintenance with complex joins in a data warehouse environment

Data warehouse maintenance and maintenance cost have been well studied in the literature. Integrating data sources in a data warehouse environment may often need data cleaning, transformation, or some other function applied to the data in order to integrate it. The impact on view maintenance, when data is integrated with other comparison operators than defined in theta join, has, however, not b...

Journal title:

PVLDB

Volume 2

Publication date: 2009
