2016 IEEE 9th International Conference on Cloud Computing (2016)
San Francisco, California, USA
June 27, 2016 to July 2, 2016
ISSN: 2159-6190
ISBN: 978-1-5090-2619-7
pp: 108-115
Distributed data stream processing has become an increasingly popular computational framework because many emerging applications, such as dynamic content delivery and security event analysis, require real-time processing of data. These applications are often run on shared, multi-tenant clusters as companies consolidate from dedicated clusters for each application (batch and streaming) to a single cluster managed by a global cluster manager such as Hadoop YARN. In shared cluster environments, meeting quality-of-service constraints on throughput and response time for both stream processing and batch applications is a significant challenge. Stream processing applications often face elastic demand, where the input rate can vary drastically. The typical way to handle workload elasticity is to guarantee the application enough resources, but this is not possible when resources are shared among multiple applications. In this paper, we present an approach for supporting elastic scaling of distributed data stream processing applications and for efficiently scheduling and coordinating stream processing with batch processing in shared clusters. Our solution consists of a congestion detection monitor, which detects bottlenecks in the streaming system, and a global state manager, which performs non-disruptive, stateful scaling of streaming applications. We implemented our solution using Storm, a popular stream processing framework, and tested our implementation on a Hadoop YARN cluster using a real-time security event processing workload. Our experimental results show that our solution improves stream processing application throughput by 49% over default Storm while decreasing average request response times by 58%.
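The abstract does not say how the congestion detection monitor decides that an operator is a bottleneck. As a rough illustration only, and not the authors' method, the sketch below assumes one plausible rule: an operator is flagged when it receives more tuples than it processes in every interval of a short sliding window. The names `Sample` and `CongestionMonitor` and the `window` parameter are hypothetical and do not come from the paper or from Storm.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    arrived: int    # tuples received by the operator during one interval
    processed: int  # tuples the operator finished during the same interval


class CongestionMonitor:
    """Hypothetical detector: flags an operator whose backlog keeps growing."""

    def __init__(self, window: int = 3):
        self.window = window
        self.backlog = 0                     # estimated number of queued tuples
        self.growth = deque(maxlen=window)   # did the backlog grow in each recent interval?

    def observe(self, sample: Sample) -> bool:
        delta = sample.arrived - sample.processed
        self.backlog = max(0, self.backlog + delta)
        self.growth.append(delta > 0)
        # Report congestion only when the backlog grew in every interval of a full window,
        # so that a single short burst does not trigger scaling.
        return len(self.growth) == self.window and all(self.growth)
```

In a setup like the one described, a positive result from such a check would be the signal to request more resources for the congested operator, while the state manager handles the stateful scale-out.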
Storms, Real-time systems, Monitoring, Distributed databases, Yarn, Security, Face

J. Li, C. Pu, Y. Chen, D. Gmach and D. Milojicic, "Enabling Elastic Stream Processing in Shared Clusters," 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), San Francisco, California, USA, 2016, pp. 108-115.