On the Complexity of Distributed Query Optimization
August 1996 (vol. 8 no. 4)
pp. 650-662

Abstract: While a significant amount of research has been reported on developing algorithms, based on joins and semijoins, for distributed query processing, relatively little progress has been made toward exploring the complexity of the problems studied. As a result, proving NP-hardness of, or devising polynomial-time algorithms for, certain distributed query optimization problems has been pursued by many researchers. However, due to its inherent difficulty, the complexity of the majority of distributed query optimization problems remains unknown. In this paper we give a general characterization of distributed query optimization problems and provide a framework for exploring their complexity. As will be shown, most distributed query optimization problems can be transformed into an optimization problem comprising a set of binary decisions, termed the Sum Product Optimization (SPO) problem. We first prove that SPO is NP-hard in light of the NP-completeness of a well-known problem, Knapsack (KNAP). Then, using this result as a basis, we prove that five classes of distributed query optimization problems, which cover the majority of distributed query optimization problems previously studied in the literature, are NP-hard by polynomially reducing SPO to each of them. The details of each problem transformation are derived. We not only prove the conjecture that many prior studies relied upon, but also provide a framework for future related studies.
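
To make the phrase "a set of binary decisions" concrete, the following is a minimal sketch of the Knapsack (KNAP) decision problem in its standard textbook form: given item sizes and values, a capacity, and a target value, decide whether some subset fits within the capacity and reaches the target. It is not the paper's SPO formulation or its reduction, which the abstract does not spell out. The function name `knapsack_decision` and its parameters are hypothetical, chosen only for illustration, and the exhaustive search is exponential in the number of items.

```python
from itertools import product


def knapsack_decision(sizes, values, capacity, target):
    """Return a 0/1 decision vector meeting both bounds, or None.

    Illustrative brute force only: it tries every assignment of the
    binary decisions (take item i or not), so it runs in O(2^n) time.
    """
    n = len(sizes)
    for choice in product((0, 1), repeat=n):
        # Sum size and value over the items whose decision bit is 1.
        total_size = sum(s for s, c in zip(sizes, choice) if c)
        total_value = sum(v for v, c in zip(values, choice) if c)
        if total_size <= capacity and total_value >= target:
            return choice
    return None


if __name__ == "__main__":
    # Hypothetical instance: three items, capacity 7, target value 9.
    print(knapsack_decision([3, 4, 5], [4, 5, 6], capacity=7, target=9))
```

Each position in the returned tuple is one binary decision, which is the shape of search space that the abstract attributes to SPO.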

Index Terms:
Distributed query optimization, semijoin processing, complexity, NP-hard problems, distributed databases.
Citation:
Chihping Wang, Ming-Syan Chen, "On the Complexity of Distributed Query Optimization," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 4, pp. 650-662, Aug. 1996, doi:10.1109/69.536256