On Producing High and Early Result Throughput in Multijoin Query Plans
December 2011 (vol. 23 no. 12)
pp. 1888-1902
Justin J. Levandoski, University of Minnesota, Minneapolis
Mohamed E. Khalefa, University of Minnesota, Minneapolis
Mohamed F. Mokbel, University of Minnesota, Minneapolis
This paper introduces an efficient framework for producing high and early result throughput in multijoin query plans. While most previous research focuses on optimizing for cases involving a single join operator, this work takes a radical step by addressing query plans with multiple join operators. The proposed framework consists of two main methods: a flush algorithm and an operator state manager. The framework assumes a symmetric hash join, a common method for producing early results, when processing incoming data. In this way, our methods can be applied to a group of previous join operators (optimized for single-join queries) when taking part in multijoin query plans. Specifically, our framework can be applied by 1) employing a new flushing policy to write in-memory data to disk, once the memory allotment is exhausted, in a way that helps increase the probability of producing early result throughput in multijoin queries, and 2) employing a state manager that adaptively switches operators in the plan between joining in-memory data and disk-resident data in order to positively affect the early result throughput. Extensive experimental results show that the proposed methods outperform the state-of-the-art join operators optimized for both single and multijoin query plans.
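
The abstract names the symmetric hash join as the building block for early results. As a rough illustration only, the following is a minimal in-memory sketch of a generic symmetric hash join in Python. It is not the paper's flush algorithm or state manager, it assumes all data fits in memory, and the names `symmetric_hash_join`, `arrivals`, `key_left`, and `key_right` are hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable, Iterator, Tuple

Row = Tuple[Any, ...]


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Row]],
    key_left: Callable[[Row], Hashable],
    key_right: Callable[[Row], Hashable],
) -> Iterator[Tuple[Row, Row]]:
    """Generic in-memory symmetric hash join (illustrative sketch).

    `arrivals` is an interleaved stream of (side, row) pairs, where side
    is "L" or "R". Each input keeps its own hash table. An arriving row
    is inserted into its own table and immediately probes the other
    table, so a result is emitted as soon as both matching rows are seen.
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)

    for side, row in arrivals:
        if side == "L":
            k = key_left(row)
            left_table[k].append(row)            # build on own side
            for match in right_table.get(k, ()):  # probe the other side
                yield (row, match)
        elif side == "R":
            k = key_right(row)
            right_table[k].append(row)
            for match in left_table.get(k, ()):
                yield (match, row)
        else:
            raise ValueError(f"unknown side: {side!r}")


# Hypothetical interleaved input: rows join on their first field.
arrivals = [
    ("L", (1, "a")),
    ("R", (1, "x")),
    ("R", (2, "y")),
    ("L", (2, "b")),
    ("L", (1, "c")),
]
results = list(symmetric_hash_join(arrivals, lambda r: r[0], lambda r: r[0]))
```

Because neither input blocks the other, the first result here is available after only two arrivals. The sketch omits what happens when memory runs out, which is the case the abstract's flushing policy and state manager are meant to address.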

Index Terms:
Database management, systems, query processing.
Citation:
Justin J. Levandoski, Mohamed E. Khalefa, Mohamed F. Mokbel, "On Producing High and Early Result Throughput in Multijoin Query Plans," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 12, pp. 1888-1902, Dec. 2011, doi:10.1109/TKDE.2010.182