
M. Abdelguerfi, A.K. Sood, "Computational Complexity of Sorting and Joining Relations with Duplicates," IEEE Transactions on Knowledge and Data Engineering, vol. 3, no. 4, pp. 496-503, December 1991, doi: 10.1109/69.109110.
It is shown that duplicate values in some attribute columns have a significant impact on the computational complexity of the sorting and joining operations, especially when the number of distinct tuple values is a small fraction of the total number of tuples. The authors characterize a multirelation M(n, L) by its cardinality n and the number L of distinct elements it contains. Under this characterization, they investigate the worst-case time complexity of sorting such a multirelation with binary comparisons as the basic operation, and establish upper and lower bounds on the number of three-branch comparisons needed to sort it. The same methodology is then applied to the natural join operation, showing that duplicate values in the join attribute columns can be exploited to reduce its computational complexity.
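The abstract does not spell out the algorithms behind its bounds. As a rough illustration of why duplicates help, the following Python sketch (not the authors' method) sorts a multirelation by placing each tuple with a three-branch binary search over the distinct key values seen so far. An "=" outcome ends the search early, so each of the n tuples costs at most floor(log2 L) + 1 comparisons, giving O(n log L) in total rather than O(n log n). The names `CountingComparator`, `group_sort`, and `natural_join` are hypothetical. The join is an ordinary sort-merge equi-join over the grouped relations; it returns matching tuple pairs and does not project away the repeated join column.

```python
from typing import Any, Callable, List, Tuple

Group = Tuple[Any, List[Any]]  # (key value, all rows carrying that key)


class CountingComparator:
    """Three-branch comparison (<, =, >) that counts how many times it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: Any, b: Any) -> int:
        self.calls += 1
        return (a > b) - (a < b)  # -1, 0 or +1


def group_sort(rows: List[Any], key: Callable[[Any], Any],
               cmp: CountingComparator) -> List[Group]:
    """Sort rows by key, returning (key value, rows) groups in ascending key order.

    Each row is placed by a three-branch binary search over the distinct keys seen
    so far; an '=' outcome ends the search early. With at most L distinct keys this
    uses at most n * (floor(log2 L) + 1) comparisons.
    """
    groups: List[Group] = []
    for row in rows:
        k = key(row)
        lo, hi = 0, len(groups)
        while lo < hi:
            mid = (lo + hi) // 2
            c = cmp(k, groups[mid][0])
            if c == 0:
                groups[mid][1].append(row)  # duplicate key: add to its group
                break
            if c < 0:
                hi = mid
            else:
                lo = mid + 1
        else:
            groups.insert(lo, (k, [row]))  # new distinct key at its sorted position
    return groups


def natural_join(r: List[Any], s: List[Any],
                 r_key: Callable[[Any], Any], s_key: Callable[[Any], Any],
                 cmp: CountingComparator) -> List[Tuple[Any, Any]]:
    """Equi-join r and s by merging their grouped, sorted forms.

    The merge compares distinct key values only, so it needs at most
    L_r + L_s - 1 comparisons however many tuples share each key.
    """
    gr = group_sort(r, r_key, cmp)
    gs = group_sort(s, s_key, cmp)
    out: List[Tuple[Any, Any]] = []
    i = j = 0
    while i < len(gr) and j < len(gs):
        c = cmp(gr[i][0], gs[j][0])
        if c == 0:
            # every r tuple in this group matches every s tuple in the other group
            out.extend((a, b) for a in gr[i][1] for b in gs[j][1])
            i += 1
            j += 1
        elif c < 0:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    cmp = CountingComparator()
    data = [i % 4 for i in range(1000)]  # n = 1000 tuples, L = 4 distinct values
    groups = group_sort(data, lambda x: x, cmp)
    print(len(groups), cmp.calls <= len(data) * (4).bit_length())  # prints: 4 True
```

Because the merge phase never compares two tuples that share a key value, heavy duplication shrinks both the sorting and the merging work. Hash-based grouping would avoid comparisons altogether, but it falls outside the comparison model the abstract uses.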