The Community for Technology Leaders
RSS Icon
Subscribe
Issue No.10 - October (2009 vol.20)
pp: 1439-1453
Sangeetha Seshadri , Georgia Institute of Technology, Atlanta
Vibhore Kumar , IBM T.J Watson Research Center, Hawthorne
Brian Cooper , Yahoo! Research, Santa Clara
Ling Liu , Georgia Institute of Technology, Atlanta
ABSTRACT
This paper addresses the problem of optimizing multiple distributed stream queries that are executing simultaneously in distributed data stream systems. We argue that the static query optimization approach of "plan, then deployment” is inadequate for handling distributed queries involving multiple streams and node dynamics faced in distributed data stream systems and applications. Thus, the selection of an optimal execution plan in such dynamic and networked computing systems must consider operator ordering, reuse, network placement, and search space reduction. We propose to use hierarchical network partitions to exploit various opportunities for operator-level reuse while utilizing network characteristics to maintain a manageable search space during query planning and deployment. We develop top-down, bottom-up, and hybrid algorithms for exploiting operator-level reuse through hierarchical network partitions. Formal analysis is presented to establish the bounds on the search space and suboptimality of our algorithms. We have implemented our algorithms in the IFLOW [CHECK END OF SENTENCE] system, an adaptive distributed stream management system. Through simulations and experiments using a prototype deployed on Emulab [CHECK END OF SENTENCE], we demonstrate the effectiveness of our framework and our algorithms.
INDEX TERMS
Computer-communication networks, distributed systems, distributed databases, distributed applications, database management, systems, query processing.
CITATION
Sangeetha Seshadri, Vibhore Kumar, Brian Cooper, Ling Liu, "A Distributed Stream Query Optimization Framework through Integrated Planning and Deployment", IEEE Transactions on Parallel & Distributed Systems, vol.20, no. 10, pp. 1439-1453, October 2009, doi:10.1109/TPDS.2008.232
REFERENCES
[1] V. Kumar et al., “Implementing Diverse Messaging Models with Self-Managing Properties Using IFLOW,” Proc. Third IEEE Int'l Conf. Autonomic Computing (ICAC), 2006.
[2] Emulab Network Testbed, http:/www.emulab.net/, 2008.
[3] Y. Yao and J. Gehrke, “The Cougar Approach to In-Network Query Processing in Sensor Networks,” SIGMOD Record, 2002.
[4] S. Madden, M.J. Franklin, J.M. Hellerstein, and W. Hong, “TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks,” Proc. Fifth Symp. Operating Systems Design and Implementation (OSDI), 2002.
[5] Y. Ahmad and U. Cetintemel, “Network-Aware Query Processing for Stream-Based Applications,” Proc. 30th Int'l Conf. Very Large Data Bases (VLDB), 2004.
[6] C. Olston, J. Jiang, and J. Widom, “Adaptive Filters for Continuous Queries over Distributed Data Streams,” Proc. ACM SIGMOD, 2003.
[7] D.J. Abadi et al., “The Design of the Borealis Stream Processing Engine,” Proc. Second Biennial Conf. Innovative Data Systems Research (CIDR), 2005.
[8] M.A. Shah, J.M. Hellerstein, S. Chandrasekaran, and M.J. Franklin, “Flux: An Adaptive Partitioning Operator for Continuous Query Systems,” Proc. 19th Int'l Conf. Data Eng. (ICDE), 2003.
[9] P. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos, M. Welsh, and M. Seltzer, “Network-Aware Operator Placement for Stream-Processing Systems,” Proc. 22nd Int'l Conf. Data Eng. (ICDE), 2006.
[10] V. Oleson, K. Schwan, G. Eisenhauer, B. Plale, C. Pu, and D. Amin, “Operational Information Systems—An Example from the Airline Industry,” Proc. First Workshop Industrial Experiences with Systems Software (WIESS), 2000.
[11] “Amazon ec2,” Amazon Elastic Computing Cloud, aws.amazon. comec2, 2008.
[12] IBM Websphere, http://www-306.ibm.com/softwarewebsphere /, 2008.
[13] B. Plale and K. Schwan, “Dynamic Querying of Streaming Data with the dQUOB System,” IEEE Trans. Parallel and Distributed Systems, vol. 14, no. 4, 2003.
[14] Terascale Supernova Initiative, http://www.phy.ornl.govtsi/, 2005.
[15] S. Xiang, H.B. Lim, K.-L. Tan, and Y. Zhou, “Two-Tier Multiple Query Optimization for Sensor Networks,” Proc. IEEE Int'l Conf. Distributed Computing Systems (ICDCS), 2007
[16] L. Luo, Q. Cao, C. Huang, T. Abdelzaher, J.A. Stankovic, and M. Ward, “Enviromic: Towards Cooperative Storage and Retrieval in Audio Sensor Networks,” Proc. IEEE Int'l Conf. Distributed Computing Systems (ICDCS), 2007.
[17] Akamai, http:/akamai.com/, 2008.
[18] TIBCO, http:/www.tibco.com/, 2008.
[19] IBM Unveils Enterprise Stream Processing System, http://www. hpcwire.com/hpc1623603.html, 2008.
[20] L. Chen, K. Reddy, and G. Agrawal, “GATES: A Grid-Based Middleware for Processing Distributed Data Streams,” Proc. 13th IEEE Int'l Symp. High Performance Distributed Computing (HPDC), 2004.
[21] Z. Cai, V. Kumar, and K. Schwan, “IQ-Paths: Self-Regulating Data Streams Across Network Overlays,” Proc. 15th IEEE Int'l Symp. High Performance Distributed Computing (HPDC '06), citeseer.ist. psu.edu745491.html, 2006.
[22] S. Ganguly, M. Garofalakis, R. Rastogi, and K. Sabnani, “Streaming Algorithms for Robust, Real-Time Detection of DDOs Attacks,” Proc. IEEE Int'l Conf. Distributed Computing Systems (ICDCS), 2007.
[23] X. Li et al., “Mind: A Distributed Multi-Dimensional Indexing System for Network Diagnosis,” Proc. IEEE INFOCOM, 2006.
[24] D. Kossmann, “The State of the Art in Distributed Query Processing,” ACM Computing Surveys, 2000.
[25] J. Moy, “OSPF Version 2,” IETF RFC 2328, 1998.
[26] J. Beaver and M.A. Sharaf, “Location-Aware Routing for Data Aggregation for Sensor Networks,” Proc. Geo Sensor Networks Workshop, 2003.
[27] E.W. Zegura, K.L. Calvert, and S. Bhattacharjee, “How to Model an Internetwork,” Proc. IEEE INFOCOM, 1996.
[28] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Prentice-Hall, Inc., 1988.
[29] G. Eisenhauer, F.E. Bustamante, and K. Schwan, “Event Services for High Performance Computing,” Proc. Ninth IEEE Int'l Symp. High Performance Distributed Computing (HPDC), 2000.
[30] R. Cox, F. Dabek, F. Kaashoek, J. Li, and R. Morris, “Practical, Distributed Network Coordinates,” Proc. Second Workshop Hot Topics in Networks (HotNets '03), Nov. 2003.
[31] R. Williams et al., ${\rm R}^{\ast}$ : An Overview of the Architecture, 2008.
[32] M. Stonebraker, “The Design and Implementation of Distributed INGRES,” The INGRES Papers: Anatomy of a Relational Database System, 1986.
[33] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and Issues in Data Stream Systems,” Proc. ACM Symp. Principles of Database Systems (PODS), 2002.
[34] S. Chandrasekaran et al., “TELEGRAPHCQ: Continuous Dataflow Processing for an Uncertain World,” Proc. First Biennial Conf. Innovative Data Systems Research (CIDR), 2003.
[35] J. Chen, D.J. DeWitt, and J.F. Naughton, “Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries,” Proc. 18th Int'l Conf. Data Eng. (ICDE), 2002.
[36] S. Babu, R. Motwani, K. Munagala, I. Nishizawa, and J. Widom, “Adaptive Ordering of Pipelined Stream Filters,” Proc. ACM SIGMOD, 2004.
[37] R. Avnur and J.M. Hellerstein, “Eddies: Continuously Adaptive Query Processing,” Proc. ACM SIGMOD, 2000.
[38] U. Srivastava, K. Munagala, and J. Widom, “Operator Placement for In-Network Stream Query Processing,” Proc. ACM Symp. Principles of Database Systems (PODS), 2005.
32 ms
(Ver 2.0)

Marketing Automation Platform Marketing Automation Tool