2015 IEEE 31st International Conference on Data Engineering (ICDE) (2015)
Seoul, South Korea
April 13, 2015 to April 17, 2015
ISBN: 978-1-4799-7964-6
pp: 1304-1315
Aditi Pandit, Teradata Aster, USA
Derrick Kondo, Teradata Aster, USA
David Simmen, Teradata Aster, USA
Anjali Norwood, Teradata Aster, USA
Tongxin Bai, Teradata Aster, USA
ABSTRACT
The volume, velocity, and variety of Big Data necessitate the development of new and innovative data processing software. A multitude of SQL implementations on distributed systems have emerged in recent years to enable large-scale data analysis. User-defined table operators (written in procedural languages) embedded in these SQL implementations are a powerful mechanism to succinctly express and perform the analytic operations typical of Big Data discovery workloads. Table operators can be easily customized to implement different processing models such as map, reduce, and graph execution. Despite an inherently parallel execution model, the performance and scalability of these table operators are greatly restricted because they appear as black boxes to a typical SQL query optimizer. The optimizer cannot infer even the basic properties of a table operator, which prevents it from applying optimization rules and strategies. In this paper, we introduce the concept of "Collaborative Planning", which removes redundant operations and yields a better arrangement of query plan operators. Query optimization proceeds through a collaborative exchange in which the optimizer and the table operator share plan properties and context information about the surrounding query plan operations. Knowing these properties also allows the author of a table operator to optimize its embedded logic. Our main contribution is the design and implementation of Collaborative Planning in the Teradata Aster 6 system. Using real-world workloads, we show that Collaborative Planning reduces query execution times by as much as 90.0% in common use cases, resulting in a 24x speedup.
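The abstract describes the planner/operator exchange only at a high level. The Python sketch below illustrates one way such an exchange could work, assuming a deliberately simplified model in which the only tracked plan property is sort order. The planner passes the operator a context (input ordering and the columns needed downstream), the operator returns a contract (output ordering and the columns it will produce), and the planner uses that contract to drop a sort that has become redundant. Every class, method, and column name here (`Context`, `Contract`, `negotiate`, `SessionizeOperator`, `plan_collaboratively`) is hypothetical and does not reflect the actual Teradata Aster 6 interface.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# All names below are hypothetical; they model the idea of Collaborative
# Planning, not the Teradata Aster 6 API.

@dataclass(frozen=True)
class Properties:
    """Physical properties of a row stream that the planner tracks."""
    sorted_by: Tuple[str, ...] = ()

@dataclass(frozen=True)
class Context:
    """What the planner tells a table operator about its surroundings."""
    input_props: Properties
    columns_needed_downstream: FrozenSet[str]

@dataclass(frozen=True)
class Contract:
    """What the table operator promises back to the planner."""
    output_props: Properties
    output_columns: FrozenSet[str]

class SessionizeOperator:
    """Hypothetical operator: appends a session_id and emits rows in input order."""
    PRODUCES = frozenset({"user_id", "ts", "url", "session_id"})

    def negotiate(self, ctx: Context) -> Contract:
        # Rows are emitted in input order, so the input ordering carries over,
        # and only columns that a later step consumes need to be produced.
        return Contract(
            output_props=Properties(sorted_by=ctx.input_props.sorted_by),
            output_columns=self.PRODUCES & ctx.columns_needed_downstream,
        )

@dataclass(frozen=True)
class Step:
    kind: str                         # "scan", "sort", "table_op" or "project"
    keys: Tuple[str, ...] = ()        # sort keys or projected columns
    op: Optional[SessionizeOperator] = None

def plan_collaboratively(steps: List[Step]) -> Tuple[List[Step], List[Contract]]:
    props = Properties()
    optimized: List[Step] = []
    contracts: List[Contract] = []
    for i, step in enumerate(steps):
        if step.kind == "scan":
            props = Properties()  # assume a base table has no known order
        elif step.kind == "sort":
            if props.sorted_by[:len(step.keys)] == step.keys:
                continue  # order already guaranteed upstream: drop the sort
            props = Properties(sorted_by=step.keys)
        elif step.kind == "table_op":
            # Conservative: this includes keys of later sorts that may still be dropped.
            downstream = frozenset(k for s in steps[i + 1:] for k in s.keys)
            contract = step.op.negotiate(Context(props, downstream))
            contracts.append(contract)
            props = contract.output_props
        optimized.append(step)
    return optimized, contracts

plan = [
    Step("scan"),
    Step("sort", ("user_id", "ts")),
    Step("table_op", op=SessionizeOperator()),
    Step("sort", ("user_id", "ts")),  # redundant once the operator's contract is known
    Step("project", ("user_id", "session_id")),
]
optimized, contracts = plan_collaboratively(plan)
print([s.kind for s in optimized])            # ['scan', 'sort', 'table_op', 'project']
print(sorted(contracts[0].output_columns))    # ['session_id', 'ts', 'user_id']
```

The planner drops the second sort only because the operator's contract guarantees the ordering; a black-box operator that returned no contract would force the planner to keep every sort, which is the limitation the abstract describes. The returned column set likewise lets the operator skip producing `url`, a simple instance of the operator optimizing its own logic once it knows its context.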
INDEX TERMS
Planning, Contracts, Collaboration, Context, Optimization, Runtime, Big data
CITATION
Aditi Pandit, Derrick Kondo, David Simmen, Anjali Norwood, Tongxin Bai, "Accelerating Big Data analytics with Collaborative Planning in Teradata Aster 6", 2015 IEEE 31st International Conference on Data Engineering (ICDE), pp. 1304-1315, 2015, doi:10.1109/ICDE.2015.7113378