This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Deciding to Correct Distributed Query Processing
June 1992 (vol. 4 no. 3)
pp. 253-265

Most algorithms for determining query processing strategies in distributed databases are static in nature; that is, the strategy is completely determined on the basis of a priori estimates of the size of intermediate results, and it remains unchanged throughout its execution. The static approach may be far from optimal because it denies the opportunity to reschedule operations if size estimates are found to be inaccurate. Adaptive query execution may be used to alleviate this problem. A low overhead delay method is proposed to decide when to correct a strategy. Sampling is used to estimate the size of relations, and alternative heuristic strategies prepared in a background mode are used to decide when to correct. Evaluation using a model of a distributed database indicates that the heuristic strategies are near optimal. Moreover, it also suggests that it is usually correct to abort creation of an intermediate relation which is much larger than predicted.

[1] P. Apers, A. Hevner and S. B. Yao, "Algorithms for distributed queries,"IEEE Trans Software Eng., vol. SE-9, pp. 57-68, Jan. 1983.
[2] D. A. Bell, D. H. O. Ling, and S. McClean, "Pragmatic estimation of join sizes and attribute correlations," inProc Fifth Int. Conf. on Data Engineering, 1989, 76-84.
[3] P. A. Bernstein, N. Goodman, E. Wong, G. L. Reeve, and J. Rothmie, "Query processing in a system for distributed database (SDD-I),"ACM Trans. Database Syst., vol. 6, Dec. 1981.
[4] P. Bodorik and J. S. Riordon, "Distributed query processing optimization objectives," inProc. Fourth Int. Conf. on Data Engineering, Los Angeles, CA, Feb. 2-4, 1988, pp. 320-329.
[5] P. Bodorik and J. S. Riordon, "A threshold mechanism for distributed processing of queries," inProc. ACM CSC'88 Conf., Atlanta, GA, Feb. 23-25, 1988, pp. 616-625.
[6] P. Bodorik and J. S. Riordon, "Evaluating dynamic processing of distributed queries," inProc. 1988 Int. Conf. on Distributed Computing Systems, San Jose, CA, June 13-17, 1988, pp. 510-519.
[7] P. Bodorik and J. S. Riordon, "Heuristic algorithms for distributed query processing," inProc. First Int. Conf. on Databases in Parallel and Distributed Systems, Austin, TX, Dec. 5-8, 1988, pp. 144-155.
[8] P. Bodorik, J. S. Riordon, and C. Jacob, "Dynamic distributed query query processing techniques," inProc. ACM CSC'89 Conf., Louisville, KY, Feb. 1989, pp. 348-357.
[9] P. Bodorik, J. Pyra, and J. S. Riordon, "Correcting execution of distributed queries," inProc. Second Int. Symp. on Databases in Parallel and Distributed Systems, July 1990, Dublin, Ireland, pp. 192-201.
[10] M. Carey, M. Livny, and L. Hongjun, "Dynamic task allocation in a distributed database system," inProc. 1985 Int. Conf. on Distributed Computing Systems, 1985, pp. 282-291.
[11] W. Cellary and D. Meyer, "A multi-query approach to distributed processing in a relational distributed database management system," inDistributed Data Bases: Proceedings of the International Symposium on Distributed Data Bases, C. Delobel and W. Litwin, eds. Amsterdam, The Netherlands: North Holland, Mar. 1980.
[12] S. Ceri and G. Gottlob, "Optimizing joins between two partitioned relations in distributed databases,"J. Parallel and Distributed Comput., vol. 3, pp. 183-205, 1986.
[13] T. Chao and C. J. Egyhazy, "Estimating temporary file sizes in distributed relational database systems," inProc. Second Int. Conf. on Data Engineering, 1986, pp. 4-12.
[14] S. Christodoulakis, "Implications of certain assumptions in database performance evaluation,"ACM Trans. Database Syst.vol. 9, no. 2, pp. 163-186, June 1984.
[15] W. Chu and P. Hurley, "Optimal query processing for distributed database systems,"IEEE Trans. Comput., vol. C-31, pp. 835-850, Sept. 1982.
[16] D. Danielset al., "An introduction to distributed query compilation in system R," inDistributed Data Bases, H. J. Schneider, ed. Amsterdam, The Netherlands: North Holland, 1982, pp. 247-290.
[17] C. Egyhazy and K. Triantis, "A query processing algorithm for distributed relational database system,"Comput. J., vol. 31, no. 1, pp. 34-40, 1988.
[18] M. El-Sharkawi and Y. Kambayashi, "Efficient processing of distributed set queries," inProc. PARBASE-90 Conf., FL, Mar, 1990, pp. 6-13.
[19] R. Epstein and M. Stonebraker, "Analysis of distributed data base processing strategies," inProc. 6th Conf. on VLDB, Montreal, P.Q., Canada, 1980, pp. 92-101.
[20] R. Gagliardi, M. Caneve and G. Oldano, "An operational approach to the integration of distributed heterogeneous environments," inProc. PARBASE-90 Conf., FL, Mar. 1990, pp. 368-377.
[21] B. Gavish and A. Segev, "Set query optimization on distr. data database Systems,"ACM TODS, vol. 11, no. 3, pp. 265-293, 1986.
[22] A. Hevner and S. B. Yao, "Query processing in distributed data base systems,"IEEE Trans. Software Eng., vol. SE-5, pp. 177-187, May 1979.
[23] H-Y Hwang and Y-T Yu, "An analytical method for estimating and interpreting query time," inProc. 13th Conf. on VLDB, 1987, pp. 347-358.
[24] T. Ibaraki and T. Kameda, "On the optimal nesting order for computing N-relational joins,"ACM TODS, vol. 9, no. 3, pp. 482-502, Sept. 1984.
[25] A. Ijbema and H. Blanken, "Estimating bucket accesses: A practical approach," inProc. Conf. Data Eng. (COMPDEC), Los Angeles, CA, Feb. 1986, pp. 30-37.
[26] A. Jhingran, "A performance study of query optimization algorithms on a database system supporting procedures," inProc. 14th Conf. on VLDB, 1988, pp. 88-99.
[27] W. Kim, "Global optimization of relational queries: A first step," inQuery Processing in Distributed Data Base Systems, W. Kim, D. Reiner, and D. Batory, eds. New York: Springer-Verlag, 1985, pp. 207-216.
[28] Y. Kambayashi, "Processing cyclic queries," inQuery Processing in Distributed Data Base Systems, Kim, Reiner, and Batory, eds. New York: Springer-Verlag, 1985, pp. 62-78.
[29] L. Kerschberg, P. D. Ting, and S. B. Yao, "Query optimization in star computer networks,"ACM Trans. Database Syst., vol. 7, Dec. 1982.
[30] A. Kumar and M. Stonebraker, "The effect of join selectivities on optimal nesting order,"Sigmod Rec., vol. 16, no. 1, Mar. 1987.
[31] S. Lafortune and E. Wong, "A state transition model for distributed query processing,"ACM Trans. Database Syst., vol. 11, pp. 294- 322, Sept. 1986.
[32] L. Lamport, "Time, clocks, and the ordering of events in a distributed system,"Commun. ACM, vol. 21, no. 7, pp. 558-565, July 1978.
[33] G. M. Lohman, C. Mohan, L. Haas, D. Daniels, B. Linday, P. Selinger, and P. Wilms, "Query processing in R*," inQuery Processing in Database Systems, W. Kim, D. Reiner, D. Batory, eds. New York: Springer Verlag, 1985, pp. 31-47.
[34] L. F. Mackert and G. M. Lohman, "R*optimizer validation and performance evaluation for distributed queries," inProc. 12th Int. Conf. Very Large Data Bases, Kyoto, Japan, 1986, pp. 149-159.
[35] L. F. Mackert, and G. M. Lehman, "R*optimizer validation and performance evaluation for local queries," inProc. 1986 ACM SIGMOD Conf., 1986, pp. 84-95.
[36] S. A. Mahmoud, J. S. Riordon, and K. C. Toth, "Distributed database partitioning and query processing," inProc. IFIP-TC-2 Conf., Venice, Italy, 1979, pp. 32-51.
[37] S. Masuyama, T. Ibaraki, S. Nishio, and T. Hasegawa, "Shortest semi-join schedule for a local area distributed database system,"IEEE Trans. Software Eng., vol. SE-13, May 1987, pp. 602-606.
[38] N. G. Nguyen, "Distributed query management for a local network," inProc. 2nd Int. Conf. on Distributed Computing Systems, Paris, France, Apr. 1981, pp. 188-196.
[39] F. Olken and D. Rotem, "Simple random sampling from relational databases," inProc. Int Conf. VLDB, Aug. 1986.
[40] E. Otoo, N. Santoro, and D. Rotem, "Improving semi-join evaluation in distributed query processing," inProc. 1987 Int. Conf. on Distributed Computing Systems, 1987, pp. 554-561.
[41] E. Ounegbe, S. Rahimi and A. Hevner, "Local query translation and optimization in a distributed system," inProc. NCC, July 1983, pp. 229-239.
[42] J. Pyra, "Dynamic query processing in distributed database systems," M.Comp.Sci. thesis, Techn. Univ. of Nova Scotia, Halifax, N.S., Canada, Dec. 1988.
[43] S. Salza and M. Terranova, "Evaluating the size of queries on relational databases with non-uniform distribution and stochastic dependence," inProc. 1989 ACM SIGMOD Conf., Portland,OR, pp. 8-14.
[44] P. G. Selinger and M. Adiba, "Access path selection in distributed data base management systems," inProc. First Int. Conf. on Data Bases, Aberdeen, 1980.
[45] T. Sellis, "Multiple-query optimization,"ACM Trans. Database Syst., vol. 13, no. 1, Mar. 1988.
[46] S. H. Son, "An environment for prototyping real-time distributed data-base," inProc. Int. Conf. Syst. Integration, Apr. 1990.
[47] M. Stonebraker,The INGRES Papers: Anatomy of a Relational Database System. Reading, MA: Addison-Wesley, 1986.
[48] S. Y. W. Su, K. P. Mikkilineni, R. Liurzi, and R. Chow, "A distributed query processing strategy based on decomposition, pipelining, intermediate result sharing techniques," inProc. COMDEC-86.
[49] K. C. Toth, S. A. Mahmoud, and J. S. Riordon, "Query processing strategies in a distributed DB architecture," inDistributed Data Sharing SystemsAmsterdam, The Netherlands: North Holland, 1982, pp. 117-134.
[50] A. Tzvieli and S. J. Cunnigham, "Query processing for integrated svstems," inProc. Systems Integration' 90 Conf., Morristown, NJ, Apr. 1990, pp. 528-537.
[51] B. Vander Zander, H. Taylor, and D. Bilton, "Estimating block accesses when attributes are correlated," inProc. 12th Int. Conf. Very Large Databases, Kyoto, Japan, Aug. 1986, pp. 119-127.
[52] J. S. Vitter, "Random sampling with a reservoir,"ACM TMS, vol. 11, no. 1, pp. 37-57, Mar. 1985.
[53] E. Wong, "A statistical approach to incomplete information in database systems," inACM TODS, vol. 7, no. 3, pp. 470-488, Sept. 1982.
[54] H. Z. Yang and P.-A. Larson, "Query transformation for PSJ-queries," inProc. 13th Int. Conf. on VLDB, 1987, pp. 245-254.
[55] C. Yu and Y. C. Lin, "Some estimation problems in distributed query processing," inProc. First Int. Conf. on-Data Engineering, 1982, pp. 13-19.
[56] C. Yu, L. Lilien, K. Guh and M. Templeton, "On the design of a distributed query processing strategy," inProc. ACM SIGMOD'83 Conf., May 1983, pp. 30-39.
[57] C. T. Yu, "Distributed database query processing," inQuery Processing in Database Systems, W. Kim, D. Reiner, and D. Batory, eds. New York: Springer Verlag, 1985, pp. 48-61.
[58] C. T. Yu, L. Lilien, K. Guh, M. Templeton, D. Brill, and A. Chen, "Adaptive techniques for distributed query processing," inProc. IEEE Int. Conf. Data Eng., 1986, pp. 86-93.
[59] C. Yu et al., "Algorithms to Process Distributed Queries in Fast Local Networks,"IEEE Trans. Computers, Oct. 1987, pp. 1153-1164.
[60] C. Yu and L. Chengwen, "Experiences with distributed query processing," inProc. Sixth Int. Conf. on Data Engineering, 1990, pp. 192-199.

Index Terms:
adaptive query execution; sampling; distributed query processing; distributed databases; a priori estimates; static approach; low overhead delay; heuristic strategies; distributed databases; heuristic programming; information retrieval
Citation:
P. Bodorik, J.S. Riordan, J.S. Pyra, "Deciding to Correct Distributed Query Processing," IEEE Transactions on Knowledge and Data Engineering, vol. 4, no. 3, pp. 253-265, June 1992, doi:10.1109/69.142016
Usage of this product signifies your acceptance of the Terms of Use.