2013 IEEE 29th International Conference on Data Engineering (ICDE) (2013)
Brisbane, Australia Australia
Apr. 8, 2013 to Apr. 12, 2013
A. Simitsis , HP Labs., Palo Alto, CA, USA
K. Wilkinson , HP Labs., Palo Alto, CA, USA
U. Dayal , HP Labs., Palo Alto, CA, USA
Meichun Hsu , HP Labs., Palo Alto, CA, USA
To remain competitive, enterprises are evolving their business intelligence systems to provide dynamic, near realtime views of business activities. To enable this, they deploy complex workflows of analytic data flows that access multiple storage repositories and execution engines and that span the enterprise and even outside the enterprise. We call these multi-engine flows hybrid flows. Designing and optimizing hybrid flows is a challenging task. Managing a workload of hybrid flows is even more challenging since their execution engines are likely under different administrative domains and there is no single point of control. To address these needs, we present a Hybrid Flow Management System (HFMS). It is an independent software layer over a number of independent execution engines and storage repositories. It simplifies the design of analytic data flows and includes optimization and executor modules to produce optimized executable flows that can run across multiple execution engines. HFMS dispatches flows for execution and monitors their progress. To meet service level objectives for a workload, it may dynamically change a flow's execution plan to avoid processing bottlenecks in the computing infrastructure. We present the architecture of HFMS and describe its components. To demonstrate its potential benefit, we describe performance results for running sample batch workloads with and without HFMS. The ability to monitor multiple execution engines and to dynamically adjust plans enables HFMS to provide better service guarantees and better system utilization.
Engines, Optimization, Databases, Monitoring, Business, Connectors, Fault tolerance
A. Simitsis, K. Wilkinson, U. Dayal and Meichun Hsu, "HFMS: Managing the lifecycle and complexity of hybrid analytic data flows," 2013 29th IEEE International Conference on Data Engineering (ICDE 2013)(ICDE), Brisbane, QLD, 2013, pp. 1174-1185.