Achieving Communication Efficiency through Push-Pull Partitioning of Semantic Spaces to Disseminate Dynamic Information
October 2006 (vol. 18 no. 10)
pp. 1352-1367
Many database applications that need to disseminate dynamic information from a server to various clients can suffer from heavy communication costs. Data caching at a client can help mitigate these costs, particularly when individual PUSH-PULL decisions are made for the different semantic regions in the data space. The server is responsible for notifying the client about updates in the PUSH regions. The client needs to contact the server for queries that ask for data in the PULL regions. We call the idea of partitioning the data space into PUSH-PULL regions to minimize communication cost "data gerrymandering." In this paper, we present solutions to technical challenges in adopting this simple but powerful idea. We give a provably optimal-cost dynamic programming algorithm for gerrymandering on a single query attribute. We propose a family of efficient heuristics for gerrymandering on multiple query attributes. We handle the dynamic case in which the workloads of queries and updates evolve over time. We validate our methods through extensive experiments on real and synthetic data sets.
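To make the PUSH-PULL trade-off on a single query attribute concrete, the following is a minimal sketch of one way such a partitioning could be computed with dynamic programming. It is not the algorithm from the paper, whose cost model is not given in this abstract. The sketch assumes a simplified, hypothetical model: the attribute domain is pre-split into cells, a PUSH cell costs one message per update, a PULL cell costs one message per query, and the client can track at most `max_regions` contiguous regions. The names `partition`, `queries`, `updates`, and `max_regions` are illustrative only.

```python
from math import inf

PUSH, PULL = 0, 1


def partition(queries, updates, max_regions):
    """Label each cell PUSH or PULL using at most max_regions contiguous regions.

    Assumed (hypothetical) cost model: a PUSH cell costs updates[i] messages,
    a PULL cell costs queries[i] messages. Returns (total_cost, labels).
    """
    n = len(queries)
    if n != len(updates):
        raise ValueError("queries and updates must have the same length")
    if max_regions < 1:
        raise ValueError("max_regions must be at least 1")
    if n == 0:
        return 0, []

    # cost[i][label] is the message cost of giving cell i that label.
    cost = [(updates[i], queries[i]) for i in range(n)]

    # best[r][label]: cheapest cost of the cells seen so far, using exactly
    # r regions, where the most recent cell carries `label`.
    best = [[inf, inf] for _ in range(max_regions + 1)]
    best[1][PUSH] = cost[0][PUSH]
    best[1][PULL] = cost[0][PULL]
    back_pointers = []  # one table per cell after the first

    for i in range(1, n):
        new = [[inf, inf] for _ in range(max_regions + 1)]
        back = [[None, None] for _ in range(max_regions + 1)]
        for r in range(1, max_regions + 1):
            for label in (PUSH, PULL):
                stay = best[r][label]            # extend the current region
                switch = best[r - 1][1 - label]  # start a new region
                if stay <= switch:
                    prev_cost, prev_state = stay, (r, label)
                else:
                    prev_cost, prev_state = switch, (r - 1, 1 - label)
                if prev_cost < inf:
                    new[r][label] = prev_cost + cost[i][label]
                    back[r][label] = prev_state
        best = new
        back_pointers.append(back)

    # Pick the cheapest final state, then walk the back pointers.
    total, r, label = min(
        (best[r][label], r, label)
        for r in range(1, max_regions + 1)
        for label in (PUSH, PULL)
    )
    labels = [label]
    for back in reversed(back_pointers):
        r, label = back[r][label]
        labels.append(label)
    labels.reverse()
    return total, ["PUSH" if l == PUSH else "PULL" for l in labels]


if __name__ == "__main__":
    # Hypothetical workload: per-cell query and update counts.
    print(partition([9, 8, 1, 1, 7], [1, 2, 6, 5, 2], max_regions=2))
```

Under these assumptions, heavily queried but rarely updated cells end up in PUSH regions and the reverse in PULL regions. Lowering `max_regions` forces neighboring cells to share a decision, which is where the partitioning becomes more than a per-cell comparison.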


Index Terms:
Data communications, dissemination, data gerrymandering.
Amitabha Bagchi, Amitabh Chaudhary, Michael T. Goodrich, Chen Li, Michal Shmueli-Scheuer, "Achieving Communication Efficiency through Push-Pull Partitioning of Semantic Spaces to Disseminate Dynamic Information," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, pp. 1352-1367, Oct. 2006, doi:10.1109/TKDE.2006.153