A Pipeline N-Way Join Algorithm Based on the 2-Way Semijoin Program
December 1991 (vol. 3 no. 4)
pp. 486-495

The semijoin has been used as an effective operator for reducing data transmission and processing over a network: it allows forward size reduction of relations and of intermediate results generated during the processing of a distributed query. The authors propose a relational operator, the two-way semijoin, which enhances the semijoin with a backward size reduction capability for more cost-effective query processing. A pipeline N-way join algorithm for joining the reduced relations residing on N sites is introduced. The main advantage of this algorithm is that it eliminates the need to transfer and store intermediate results among the sites. A set of experiments is included, showing that the proposed algorithm outperforms all known conventional join algorithms that generate intermediate results.
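
The difference between forward and backward reduction can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the operator's formal definition, so the function names (`semijoin`, `two_way_semijoin`), the in-memory list-of-dicts representation, and the choice to send the matched join values back are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def semijoin(r: List[Row], s: List[Row], attr: str) -> List[Row]:
    """Forward reduction only: keep the rows of r whose join value occurs in s."""
    s_keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in s_keys]


def two_way_semijoin(
    r: List[Row], s: List[Row], attr: str
) -> Tuple[List[Row], List[Row]]:
    """Illustrative two-way semijoin (assumed semantics): reduce s, then r."""
    # Forward step: the site holding r ships its distinct join values to s's site.
    r_keys = {row[attr] for row in r}
    s_reduced = [row for row in s if row[attr] in r_keys]
    # Backward step (assumption): s's site ships the matched values back,
    # so r is reduced as well without any joined tuples being transferred.
    matched = {row[attr] for row in s_reduced}
    r_reduced = [row for row in r if row[attr] in matched]
    return r_reduced, s_reduced


# Hypothetical relations on two sites, joined on attribute "k".
r = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}, {"k": 3, "a": "z"}]
s = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}, {"k": 4, "b": "r"}]

r_reduced, s_reduced = two_way_semijoin(r, s, "k")
```

In this sketch both relations end up containing only the rows that participate in the join, whereas `semijoin(s, r, "k")` alone would reduce `s` and leave `r` untouched.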

Index Terms:
2-way semijoin program; data transmission; network; forward size reduction; intermediate results; distributed query; relational operator; backward size reduction; query processing; pipeline N-way join algorithm; sites; database theory; distributed databases; parallel algorithms; pipeline processing; programming theory; relational databases
Citation:
N. Roussopoulos, H. Kang, "A Pipeline N-Way Join Algorithm Based on the 2-Way Semijoin Program," IEEE Transactions on Knowledge and Data Engineering, vol. 3, no. 4, pp. 486-495, Dec. 1991, doi:10.1109/69.109109