Query Planning for Continuous Aggregation Queries over a Network of Data Aggregators
June 2012 (vol. 24 no. 6)
pp. 1065-1079
Rajeev Gupta, IBM Research, New Delhi
Krithi Ramamritham, Indian Institute of Technology (IIT) Bombay, Mumbai
Continuous queries are used to monitor changes to time-varying data and to provide results useful for online decision making. Typically, a user wants the value of some aggregation function over distributed data items, for example, the value of a client's portfolio or the AVG of temperatures sensed by a set of sensors. In these queries, a client specifies a coherency requirement as part of the query. We present a low-cost, scalable technique to answer continuous aggregation queries using a network of aggregators of dynamic data items. In such a network, each data aggregator serves a set of data items at specific coherencies. Just as various fragments of a dynamic webpage are served by one or more nodes of a content distribution network, our technique decomposes a client query into subqueries and executes them on judiciously chosen data aggregators, each with its own subquery incoherency bound. We provide a technique for obtaining the optimal set of subqueries, with their incoherency bounds, that satisfies the client query's coherency requirement with the least number of refresh messages sent from aggregators to the client. We build a query cost model that estimates the number of refresh messages required to satisfy the client-specified incoherency bound. Performance results using real-world traces show that our cost-based query planning leads to queries being executed using less than one third the number of messages required by existing schemes.
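
As a rough illustration of the decomposition idea, the sketch below checks whether a candidate plan is admissible for a weighted-sum query such as a portfolio value. It is not the paper's planning algorithm or cost model. It assumes that the subqueries partition the query's data items and that, for an additive aggregate, the subquery incoherency bounds may sum to at most the client's bound. The names `SubQuery`, `plan_is_valid`, and the aggregator labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubQuery:
    aggregator: str   # hypothetical label of the data aggregator that runs this subquery
    weights: dict     # data item -> its weight in the client query
    bound: float      # incoherency bound assigned to this subquery


def plan_is_valid(query_weights, subqueries, client_bound):
    """Return True if the subqueries cover each query item exactly once
    and their incoherency bounds add up to at most the client's bound.

    Assumption: the query is a weighted sum, so if every subquery result is
    within its own bound of the true partial sum, the combined result is
    within the sum of those bounds of the true query value.
    """
    covered = {}
    for sq in subqueries:
        for item, weight in sq.weights.items():
            if item in covered:
                return False  # an item is served by two subqueries
            covered[item] = weight
    if covered != query_weights:
        return False  # an item is missing, extra, or has the wrong weight
    return sum(sq.bound for sq in subqueries) <= client_bound


# Example: a three-stock portfolio split across two aggregators.
portfolio = {"stockA": 100, "stockB": 50, "stockC": 20}
plan = [
    SubQuery("aggregator1", {"stockA": 100, "stockB": 50}, bound=30.0),
    SubQuery("aggregator2", {"stockC": 20}, bound=20.0),
]
print(plan_is_valid(portfolio, plan, client_bound=50.0))
```

A planner along the lines described in the abstract would additionally compare admissible plans by the estimated number of refresh messages; this check only filters out plans that cannot meet the client's coherency requirement.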


Index Terms:
Algorithms, continuous queries, distributed query processing, data dissemination, coherency, performance.
Citation:
Rajeev Gupta, Krithi Ramamritham, "Query Planning for Continuous Aggregation Queries over a Network of Data Aggregators," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1065-1079, June 2012, doi:10.1109/TKDE.2011.12