Dynamic Querying of Streaming Data with the dQUOB System
April 2003 (vol. 14 no. 4)
pp. 422-432

Abstract: Data streaming has established itself as a viable communication abstraction in data-intensive parallel and distributed computations, occurring in applications such as scientific visualization, performance monitoring, and large-scale data transfer. A known problem in large-scale event communication is tailoring the data received at the consumer. This is the general problem of extracting data of interest from a data source, a problem that the database community has successfully addressed with SQL queries, a time-tested, user-friendly way for non-computer scientists to access data. By leveraging the efficiency of query processing provided by relational queries, the dQUOB system provides a conceptual relational data model and SQL query access over streaming data. Queries can be used to extract data, combine streams, and create new streams. The language augments queries with an action to enable more complex data transformations such as Fourier transforms. The dQUOB system has been applied to two large-scale distributed applications: a safety-critical autonomous robotics simulation and scientific software visualization for global atmospheric transport modeling. In this paper, we present the dQUOB system and the results of a performance evaluation undertaken to assess its applicability in data-intensive wide-area computations, where the benefit of portable data transformation must be evaluated against the cost of continuous query evaluation.
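
The idea of a query with an attached action, evaluated continuously over a stream, can be illustrated with a minimal Python sketch. This is not the dQUOB API or its query language: the function `continuous_query`, the event fields (`lat`, `level`, `temp`), the baseline value, and the `add_anomaly` action are all hypothetical names chosen for illustration, and the action shown is a simple arithmetic one rather than a Fourier transform.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, float]  # one stream event, modeled as a relational tuple


def continuous_query(
    stream: Iterable[Event],
    where: Callable[[Event], bool],
    select: Iterable[str],
    action: Callable[[Event], Event] = lambda e: e,
) -> Iterator[Event]:
    """Produce a new stream: filter with `where`, project `select`, apply `action`."""
    fields: List[str] = list(select)
    for event in stream:
        if where(event):  # selection: keep only events of interest
            projected = {name: event[name] for name in fields}  # projection
            yield action(projected)  # user-supplied transformation


def add_anomaly(event: Event) -> Event:
    # Hypothetical action: add the deviation from an assumed baseline of 288.0.
    return {**event, "anomaly": event["temp"] - 288.0}


# A small in-memory list stands in for a live event stream.
source = [
    {"lat": 10.0, "level": 1.0, "temp": 300.0},
    {"lat": 50.0, "level": 1.0, "temp": 280.0},
    {"lat": 20.0, "level": 2.0, "temp": 290.0},
]

# Roughly: SELECT lat, temp FROM source WHERE lat < 30, then run the action.
result = list(
    continuous_query(
        source,
        where=lambda e: e["lat"] < 30.0,
        select=["lat", "temp"],
        action=add_anomaly,
    )
)
```

Because the query is a generator, each event is evaluated as it arrives and the consumer receives only the reduced, transformed stream, which is the tailoring problem the abstract describes.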

Index Terms:
Data-intensive computations, grid computing, data streams, publish-subscribe event channels, SQL, relational data model, database query processing.
Beth Plale, Karsten Schwan, "Dynamic Querying of Streaming Data with the dQUOB System," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 4, pp. 422-432, April 2003, doi:10.1109/TPDS.2003.1195413