Automated Translation of Functional Big Data Queries to SQL
Authors
Abstract
Big data analytics frameworks like Apache Spark and Flink enable users to implement queries over large, distributed databases using functional APIs. In recent years, these APIs have grown in popularity because their functional interfaces abstract away much of the minutiae of distributed programming required by traditional query languages such as SQL. However, this convenience comes at a cost: functional queries are often less efficient than their SQL counterparts. Motivated by this observation, we present a new technique for automatically transpiling functional queries to SQL. While our approach is based on the standard paradigm of counterexample-guided inductive synthesis, it uses a novel column-wise decomposition technique to split the synthesis task into smaller subquery synthesis problems. We implemented this approach as a tool called RDD2SQL for translating Spark RDD queries to SQL and empirically evaluated its effectiveness on a set of real-world queries. Our results show that (1) most of the functional queries can be translated to SQL, (2) our approach is very effective at automating this translation, and (3) performing this translation offers significant performance benefits.
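To make the two ideas in the abstract concrete, here is a toy Python sketch of a counterexample-guided inductive synthesis (CEGIS) loop combined with a per-column split of the synthesis task. It is not RDD2SQL's algorithm, and everything in it is an assumption for illustration. The "functional query" is modeled as a plain Python function that maps one input row to a tuple of output values (a single map step, not a Spark RDD). The SQL grammar contains only column references, `+` and `*`. The table name `t` is made up. The verifier is random testing with a fixed seed rather than a sound equivalence check. The helper names (`candidate_exprs`, `synthesize_column`, `find_counterexample`, `cegis_columnwise`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, int]
Expr = Tuple[str, Callable[[Row], int]]  # (SQL text, Python evaluator)


def candidate_exprs(columns: List[str]) -> List[Expr]:
    """Hypothetical toy grammar: a column, or col + col, or col * col."""
    cands: List[Expr] = [(c, lambda r, c=c: r[c]) for c in columns]
    for a in columns:
        for b in columns:
            cands.append((f"{a} + {b}", lambda r, a=a, b=b: r[a] + r[b]))
            cands.append((f"{a} * {b}", lambda r, a=a, b=b: r[a] * r[b]))
    return cands


def synthesize_column(examples, idx: int, cands: List[Expr]) -> Optional[Expr]:
    """Inductive step: first candidate matching output column idx on every example."""
    for sql, fn in cands:
        if all(fn(row) == out[idx] for row, out in examples):
            return sql, fn
    return None


def find_counterexample(query, fn, idx: int, columns: List[str],
                        trials: int = 200, seed: int = 0) -> Optional[Row]:
    """Stand-in verifier: random testing with a fixed seed, not a sound proof."""
    rng = random.Random(seed)
    for _ in range(trials):
        row = {c: rng.randint(-5, 5) for c in columns}
        if fn(row) != query(row)[idx]:
            return row
    return None


def cegis_columnwise(query, columns: List[str], n_out: int) -> Optional[str]:
    cands = candidate_exprs(columns)
    select: List[str] = []
    # Column-wise decomposition: one small synthesis problem per output column.
    for idx in range(n_out):
        examples: List[Tuple[Row, tuple]] = []
        while True:
            found = synthesize_column(examples, idx, cands)
            if found is None:
                return None  # no expression in the toy grammar fits this column
            sql, fn = found
            cex = find_counterexample(query, fn, idx, columns)
            if cex is None:
                select.append(sql)
                break
            # Counterexample-guided: the new example rules out the current candidate.
            examples.append((cex, query(cex)))
    return "SELECT " + ", ".join(select) + " FROM t"
```

For example, `cegis_columnwise(lambda r: (r["price"] * r["qty"], r["id"]), ["id", "price", "qty"], 2)` returns `"SELECT price * qty, id FROM t"`. Because each output column searches the candidate list on its own, the search cost in this toy grows linearly with the number of columns instead of with the number of column combinations. If a column needs an operator outside the toy grammar (such as subtraction), the function returns `None`.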
Similar resources
Sphinx: Empowering Impala for Efficient Execution of SQL Queries on Big Spatial Data
This paper presents Sphinx, a full-fledged open-source system for big spatial data which overcomes the limitations of existing systems by adopting a standard SQL interface, and by providing a highly efficient core built inside the core of the Apache Impala system. Sphinx is composed of four main layers, namely, query parser, indexer, query planner, and query executor. The query parser injects spa...
Translation of SQL Queries Containing Nested Predicates into Pseudonatural Language
An approach to support query verification by translating SQL queries into easy-to-read pseudonatural language expressions is proposed. The method discussed here converts various types of SQL queries, including simple and composite nested ones. Since it employs no complex natural language processing technique, it is feasible even on small computers.
Partial Marking for Automated Grading of SQL Queries
The XData system, currently being developed at IIT Bombay, provides an automated and interactive platform for grading student SQL queries, as well as for learning SQL. Prior work on the XData system focused on generating query specific test cases to catch common errors in queries. These test cases are used to check whether the student queries are correct or not. For grading student assignments,...
Evolving SQL Queries for Data Mining
This paper presents a methodology for applying the principles of evolutionary computation to knowledge discovery in databases by evolving SQL queries that describe datasets. In our system, the fittest queries are rewarded by having their attributes being given a higher probability of surviving in subsequent queries. The advantages of using SQL queries include their readability for non-experts a...
Big-Data-Anwendungsentwicklung mit SQL und NoSQL
When processing and analyzing big data, classic relational databases reach their limits. Storing data in the multi-terabyte range is possible with them, but usually uneconomical because of licensing and storage costs. Data analyses that must read large ranges of data additionally require cluster-capable systems for scaling.
Journal
Journal title: Proceedings of the ACM on Programming Languages
Year: 2023
ISSN: 2475-1421
DOI: https://doi.org/10.1145/3586047