Go to page

Bibliographic Metadata

Optimizing Big-Data Queries Using Program Synthesis
AuthorSchlaipfer, Matthias ; Rajan, Kaushik ; Lal, Akash ; Samak, Malavika
Published in
Proceedings of the 26th Symposium on Operating Systems Principles, New York, NY, USA, 2017, page 631-646
Accepted version
Document typeArticle in a collected edition
Keywords (EN)Program Synthesis / Query Optimization / User-Defined Operators
Project-/ReportnumberAustrian Science Fund (FWF): S11403-N23
Project-/ReportnumberAustrian Science Fund (FWF): W1255-N23
URNurn:nbn:at:at-ubtuw:3-3341 Persistent Identifier (URN)
 The work is publicly available
Optimizing Big-Data Queries Using Program Synthesis [0.88 mb]
Abstract (English)

Classical query optimization relies on a predefined set of rewrite rules to re-order and substitute SQL operators at a logical level. This paper proposes Blitz, a system that can synthesize efficient query-specific operators using automated program reasoning. Blitz uses static analysis to identify sub-queries as potential targets for optimization. For each sub-query, it constructs a template that defines a large space of possible operator implementations, all restricted to have linear time and space complexity. Blitz then employs program synthesis to instantiate the template and obtain a data-parallel operator implementation that is functionally equivalent to the original sub-query up to a bound on the input size.

Program synthesis is an undecidable problem in general and often difficult to scale, even for bounded inputs. Blitz therefore uses a series of analyses to judiciously use program synthesis and incrementally construct complex operators.

We integrated Blitz with existing big-data query languages by embedding the synthesized operators back into the query as User Defined Operators. We evaluated Blitz on several production queries from Microsoft running on two state-of-the-art query engines: SparkSQL as well as Scope, the big-data engine of Microsoft. Blitz produces correct optimizations despite the synthesis being bounded. The resulting queries have much more succinct query plans and demonstrate significant performance improvements on both big-data systems (1.3x --- 4.7x).

The PDF-Document has been downloaded 43 times.