Issue No.12 - Dec. (2013 vol.24)
pp: 2344-2354
Yuzhe Tang , Georgia Institute of Technology, Atlanta
Bugra Gedik , Bilkent University, Ankara
Stream processing applications use online analytics to ingest high-rate data sources, process them on-the-fly, and generate live results in a timely manner. The data flow graph representation of these applications makes stream computing tasks easy to specify, and also lends itself to possible runtime exploitation of parallelization on multicore processors. While the data flow graphs naturally contain a rich set of parallelization opportunities, exploiting them is challenging due to the combinatorial number of possible configurations. Furthermore, the best configuration is dynamic in nature; it can differ across multiple runs of the application, and even during different phases of the same run. In this paper, we propose an autopipelining solution that can take advantage of multicore processors to improve throughput of streaming applications, in an effective and transparent way. The solution is effective in the sense that it provides good utilization of resources by dynamically finding and exploiting sources of pipeline parallelism in streaming applications. It is transparent in the sense that it does not require any hints from the application developers. As a part of our solution, we describe a lightweight runtime profiling scheme to learn resource usage of operators comprising the application, an optimization algorithm to locate the best places in the data flow graph to explore additional parallelism, and an adaptive control scheme to find the right level of parallelism. We have implemented our solution in an industrial-strength stream processing system. Our experimental evaluation based on microbenchmarks, synthetic workloads, as well as real-world applications confirms that our design is effective in optimizing the throughput of stream processing applications without requiring any changes to the application code.
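To make the idea of pipeline parallelism in an operator chain concrete, here is a minimal sketch. It is not the authors' implementation: the names `run_fused`, `run_pipelined`, `split_at`, and `capacity` are hypothetical, and the split position is chosen by hand rather than by the profiling, optimization, and adaptive control schemes the abstract describes. It only shows what inserting a thread and a queue at one point of a chain looks like.

```python
import queue
import threading

_DONE = object()  # sentinel marking end of stream (illustrative convention)


def run_fused(operators, tuples):
    """Run every operator on each tuple in the caller's thread (no pipelining)."""
    out = []
    for t in tuples:
        for op in operators:
            t = op(t)
        out.append(t)
    return out


def run_pipelined(operators, tuples, split_at, capacity=64):
    """Run operators[:split_at] in the caller's thread and operators[split_at:]
    in a second thread, connected by a bounded FIFO queue.

    `split_at` is a hand-picked position (an assumption of this sketch); a real
    system would choose it from runtime measurements.
    """
    q = queue.Queue(maxsize=capacity)
    out = []

    def downstream():
        # Second pipeline stage: drain the queue until the sentinel arrives.
        while True:
            t = q.get()
            if t is _DONE:
                return
            for op in operators[split_at:]:
                t = op(t)
            out.append(t)

    worker = threading.Thread(target=downstream)
    worker.start()
    # First pipeline stage: runs concurrently with the worker thread.
    for t in tuples:
        for op in operators[:split_at]:
            t = op(t)
        q.put(t)
    q.put(_DONE)
    worker.join()
    return out


if __name__ == "__main__":
    ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    data = list(range(5))
    # Both execution modes produce the same results in the same order.
    assert run_fused(ops, data) == run_pipelined(ops, data, split_at=1)
```

With a single consumer and a FIFO queue, tuple order is preserved, so the pipelined run is a drop-in replacement for the fused one. In standard CPython the global interpreter lock limits the speedup for CPU-bound operators, so the sketch illustrates the structure rather than the throughput gain.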
Runtime, Throughput, Instruction sets, Parallel processing, Streaming media, Multicore processing, Autopipelining, Stream processing, Parallelization
Yuzhe Tang, Bugra Gedik, "Autopipelining for Data Stream Processing", IEEE Transactions on Parallel & Distributed Systems, vol.24, no. 12, pp. 2344-2354, Dec. 2013, doi:10.1109/TPDS.2012.333