Optimizing Big-Data Queries Using Program Synthesis

Schlaipfer, Matthias; Rajan, Kaushik; Lal, Akash; Samak, Malavika

doi:10.1145/3132747.3132773

Record link:

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:3-3341
http://hdl.handle.net/20.500.12708/885

Title:

Optimizing Big-Data Queries Using Program Synthesis

Citation:

Schlaipfer, M., Rajan, K., Lal, A., & Samak, M. (2017). Optimizing Big-Data Queries Using Program Synthesis. In Proceedings of the 26th Symposium on Operating Systems Principles. https://doi.org/10.1145/3132747.3132773

CatalogPlus:

AC11365316

Publisher DOI:

10.1145/3132747.3132773

Publication Type:

Inproceedings - Full-Paper Contribution

Language:

English

Authors:

Schlaipfer, Matthias
Rajan, Kaushik
Lal, Akash
Samak, Malavika

Organisational Unit:

E192 - Institut für Informationssysteme

ISBN:

9781450350853

Date (published):

2017

Keywords:

Program Synthesis; Query Optimization; User-Defined Operators

Abstract:

Classical query optimization relies on a predefined set of rewrite rules to re-order and substitute SQL operators at a logical level. This paper proposes Blitz, a system that can synthesize efficient query-specific operators using automated program reasoning. Blitz uses static analysis to identify sub-queries as potential targets for optimization. For each sub-query, it constructs a template that defines a large space of possible operator implementations, all restricted to have linear time and space complexity. Blitz then employs program synthesis to instantiate the template and obtain a data-parallel operator implementation that is functionally equivalent to the original sub-query up to a bound on the input size.

Program synthesis is an undecidable problem in general and often difficult to scale, even for bounded inputs. Blitz therefore uses a series of analyses to judiciously use program synthesis and incrementally construct complex operators.

We integrated Blitz with existing big-data query languages by embedding the synthesized operators back into the query as User Defined Operators. We evaluated Blitz on several production queries from Microsoft running on two state-of-the-art query engines: SparkSQL as well as Scope, the big-data engine of Microsoft. Blitz produces correct optimizations despite the synthesis being bounded. The resulting queries have much more succinct query plans and demonstrate significant performance improvements on both big-data systems (1.3x to 4.7x).

Additional information:

The final publication is available via https://doi.org/10.1145/3132747.3132773.

License:

In Copyright

Appears in Collections:

Conference Paper