A Graph Theoretical Approach to Determine a Join Reducer Sequence in Distributed Query Processing
February 1994 (vol. 6 no. 1)
pp. 152-165

Semijoins have traditionally been relied upon to reduce the cost of data transmission in distributed query processing. However, judiciously applying join operations as reducers can further reduce the amount of data that must be transmitted. In view of this fact, we explore the approach of using join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed into that of finding a specific type of set of cuts of the corresponding query graph, where a cut of a graph is a partition of the nodes of that graph. In light of this concept, we then prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, which justifies the use of heuristic approaches to solve this problem. By mapping the problem of determining a sequence of join reducers onto that of finding a set of cuts, we develop efficient heuristic algorithms, for tree query graphs and for general query graphs respectively, to determine a join reducer sequence for distributed query processing. The algorithms developed are based on the concept of divide and conquer and are of polynomial time complexity. Simulations are performed to evaluate these algorithms.
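
The correspondence between cuts and join sequences can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes a tree query graph, represents a cut as the two node sets obtained by removing one edge, and uses a placeholder rule (cut at the edge with the largest weight) where the paper would apply its own heuristic. The function names, the edge weights, and the relations `R1` to `R4` are hypothetical.

```python
from collections import defaultdict


def components(nodes, edges):
    """Connected components of the subgraph induced by `nodes`."""
    adj = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def join_sequence(nodes, weights):
    """Divide and conquer on a tree query graph.

    `weights` maps each edge (u, v) to a number. Each recursion step picks
    one cut (a partition of `nodes` into two connected parts), orders the
    joins inside each part, then joins the two parts last.
    Returns a list of (left_set, right_set) joins in execution order.
    """
    nodes = frozenset(nodes)
    if len(nodes) == 1:
        return []
    inner = {e: w for e, w in weights.items()
             if e[0] in nodes and e[1] in nodes}
    # Placeholder assumption: cut at the heaviest edge (ties broken by name).
    cut_edge = max(sorted(inner), key=inner.get)
    remaining = [e for e in inner if e != cut_edge]
    # Removing one edge from a tree leaves exactly two components.
    left, right = components(nodes, remaining)
    return (join_sequence(left, weights)
            + join_sequence(right, weights)
            + [(left, right)])


# Hypothetical chain query R1 - R2 - R3 - R4 with made-up edge weights.
weights = {("R1", "R2"): 0.1, ("R2", "R3"): 0.5, ("R3", "R4"): 0.2}
seq = join_sequence({"R1", "R2", "R3", "R4"}, weights)
for left, right in seq:
    print(sorted(left), "JOIN", sorted(right))
```

Each recursion level chooses one cut, so a query graph with n relations yields n - 1 cuts and therefore n - 1 joins. The sketch makes only a polynomial number of graph traversals, whereas exhaustively comparing all possible sets of cuts would not.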

Index Terms:
distributed databases; query processing; graph theory; database theory; computational complexity; graph theoretical approach; join reducer sequence; distributed query processing; semijoin; data transmission; join operations; query graph; optimal sequence; exponential complexity; heuristic approaches; heuristic algorithms; polynomial time complexity
Citation:
M.-S. Chen, P.S. Yu, "A Graph Theoretical Approach to Determine a Join Reducer Sequence in Distributed Query Processing," IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 1, pp. 152-165, Feb. 1994, doi:10.1109/69.273034