Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

Authors

  • Edmon Begoli
  • Jesús Camacho Rodríguez
  • Julian Hyde
  • Michael J. Mior
  • Daniel Lemire

Abstract

Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite’s architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.
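
The rule-based optimization described in the abstract can be illustrated with a small toy sketch. The code below is not Calcite's API and does not reproduce any of its built-in rules; the node classes (`Scan`, `Filter`, `Project`), the two rewrite rules, and the `optimize` and `execute` helpers are hypothetical names chosen for illustration. It assumes every plan is a single chain of operators and that an outer projection only references columns kept by the inner one.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical relational-algebra nodes (illustration only, not Calcite classes).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    column: str
    value: object
    child: object


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: object


def push_filter_below_project(node):
    """Filter(Project(x)) -> Project(Filter(x)) if the filter column is projected."""
    if isinstance(node, Filter) and isinstance(node.child, Project):
        proj = node.child
        if node.column in proj.columns:
            pushed = Filter(node.column, node.value, proj.child)
            return Project(proj.columns, pushed)
    return None


def merge_projects(node):
    """Project(Project(x)) -> Project(x), keeping the outer column list."""
    if isinstance(node, Project) and isinstance(node.child, Project):
        return Project(node.columns, node.child.child)
    return None


# The rule set is just a tuple, so new rules can be added without touching the engine.
RULES = (push_filter_below_project, merge_projects)


def rewrite_once(node):
    """Apply the first matching rule at this node, else try the child."""
    for rule in RULES:
        result = rule(node)
        if result is not None:
            return result, True
    if isinstance(node, (Filter, Project)):
        child, changed = rewrite_once(node.child)
        if changed:
            if isinstance(node, Filter):
                return Filter(node.column, node.value, child), True
            return Project(node.columns, child), True
    return node, False


def optimize(plan):
    """Rewrite repeatedly until no rule fires (a fixpoint)."""
    changed = True
    while changed:
        plan, changed = rewrite_once(plan)
    return plan


def execute(plan, tables):
    """Naive interpreter over in-memory rows, used to check that rewrites preserve results."""
    if isinstance(plan, Scan):
        return [dict(row) for row in tables[plan.table]]
    rows = execute(plan.child, tables)
    if isinstance(plan, Filter):
        return [r for r in rows if r[plan.column] == plan.value]
    return [{c: r[c] for c in plan.columns} for r in rows]
```

Calling `optimize` on a plan such as `Filter("dept", "eng", Project(("name", "dept"), Scan("emp")))` moves the filter next to the scan, and running `execute` on both the original and the rewritten plan should return the same rows. A production optimizer would also weigh alternative plans by cost, which this sketch leaves out.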

Similar resources

On Using Distributed Extended XQuery for Web Data Sources as Services

DeXIN (Distributed extended XQuery for data INtegration) integrates multiple, heterogeneous, highly distributed and rapidly changing web data sources in different formats, e.g. XML, RDF and relational data. DeXIN is a RESTful data integration web service which integrates heterogeneous distributed data sources, including data services (DaaS – data as a service). At the heart of DeXIN is an XQuer...

Query Processing and Optimisation in Integrated Heterogeneous Grid Resources

The performance of Grid computing technologies for distributed data access and query processing has been investigated in a number of studies. However, different Grid data sources may have schema conflicts which require fine-grained resolution through the use of data integration technologies that are not supported by the current generation of Grid data access and querying middleware. This is par...

On the analysis of big data indexing execution strategies

Efficient response to search queries is very crucial for data analysts to obtain timely results from big data spanned over heterogeneous machines. Currently, a number of big-data processing frameworks are available in which search operations are performed in distributed and parallel manner. However, implementation of indexing mechanism results in noticeable reduction of overall query processing...

GenoMetric Query Language: a novel approach to large-scale genomic data management

MOTIVATION: Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction lev...

A Distributed Event Stream Processing Framework for Materialized Views over Heterogeneous Data Sources

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely coupled distributed environment, providing integrated access to heterogeneous structured data sources such as relational databases and XML data. This paper provides the foundation for defining a framework for materialized views over heterogeneous data sources in an event st...

Journal:
  • CoRR

Volume: abs/1802.10233

Publication year: 2018