Automated Translation of Functional Big Data Queries to SQL

Abstract

Big data analytics frameworks like Apache Spark and Flink enable users to implement queries over large, distributed databases using functional APIs. In recent years, these APIs have grown in popularity because their interfaces abstract away much of the minutiae of programming required by traditional query languages like SQL. However, this convenience comes at a cost: functional queries are often less efficient than their SQL counterparts. Motivated by this observation, we present a new technique for automatically transpiling functional queries to SQL. While our approach is based on the standard paradigm of counterexample-guided inductive synthesis, it uses a novel column-wise decomposition to split the synthesis task into smaller subquery synthesis problems. We implemented this approach as a tool called RDD2SQL for translating RDD queries to SQL, and we empirically evaluate its effectiveness on a set of real-world queries. Our results show that (1) most queries can be translated to SQL, (2) the tool is very effective at automating this translation, and (3) performing this translation offers significant performance benefits.
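To make the idea of translating a functional query to SQL concrete, the following is a minimal sketch, not the RDD2SQL tool itself. It writes one small query twice, once as a filter-then-aggregate pipeline in plain Python (standing in for RDD-style operators) and once as SQL run through the standard-library sqlite3 module, then checks that both agree on a sample input. The table name, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample table: (customer, amount) rows.
ORDERS = [("alice", 30), ("bob", 5), ("alice", 20), ("carol", 12), ("bob", 50)]


def functional_query(rows):
    """Functional-style pipeline: keep large orders, then sum per customer."""
    big = filter(lambda r: r[1] >= 10, rows)   # plays the role of a filter operator
    totals = defaultdict(int)
    for customer, amount in big:               # plays the role of a reduce-by-key
        totals[customer] += amount
    return sorted(totals.items())


# Candidate SQL translation of the pipeline above.
SQL_QUERY = """
    SELECT customer, SUM(amount)
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    ORDER BY customer
"""


def sql_query(rows):
    """Load the rows into an in-memory SQLite table and run the SQL candidate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return [tuple(r) for r in conn.execute(SQL_QUERY).fetchall()]
    finally:
        conn.close()


functional_result = functional_query(ORDERS)
sql_result = sql_query(ORDERS)
print(functional_result == sql_result)
```

Agreement on one input is only a test, not a proof of equivalence. In a counterexample-guided loop, an input on which the two queries disagree would serve as a counterexample that rules out the candidate SQL query.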

Similar resources

Sphinx: Empowering Impala for Efficient Execution of SQL Queries on Big Spatial Data

This paper presents Sphinx, a full-fledged open-source system for big spatial data which overcomes the limitations of existing systems by adopting a standard SQL interface and by providing a highly efficient core built inside the core of the Apache Impala system. Sphinx is composed of four main layers, namely, query parser, indexer, query planner, and query executor. The query parser injects spa...

Translation of SQL Queries Containing Nested Predicates into Pseudonatural Language

An approach to support query verification by translating SQL queries into easy-to-read pseudonatural language expressions is proposed. The method discussed here converts various types of SQL queries, including simple and composite nested ones. Since it employs no complex natural language processing technique, it is feasible even on small computers.

Partial Marking for Automated Grading of SQL Queries

The XData system, currently being developed at IIT Bombay, provides an automated and interactive platform for grading student SQL queries, as well as for learning SQL. Prior work on the XData system focused on generating query specific test cases to catch common errors in queries. These test cases are used to check whether the student queries are correct or not. For grading student assignments,...

Evolving SQL Queries for Data Mining

This paper presents a methodology for applying the principles of evolutionary computation to knowledge discovery in databases by evolving SQL queries that describe datasets. In our system, the fittest queries are rewarded by giving their attributes a higher probability of surviving in subsequent queries. The advantages of using SQL queries include their readability for non-experts a...

Big-Data-Anwendungsentwicklung mit SQL und NoSQL

When processing and analyzing big data, classical relational databases reach their limits. Storing data in the multi-terabyte range is possible with them, but it is generally uneconomical because of license and storage costs. Data analyses that must read large ranges of data additionally require cluster-capable systems for scaling.

Journal

Journal title: Proceedings of the ACM on Programming Languages

Year: 2023

ISSN: 2475-1421

DOI: https://doi.org/10.1145/3586047