Computational Complexity of Sorting and Joining Relations with Duplicates
December 1991 (vol. 3 no. 4)
pp. 496-503

The existence of duplicate values in some attribute columns is shown to have a significant impact on the computational complexity of the sorting and joining operations, especially when the number of distinct tuple values is a small fraction of the total number of tuples. The authors characterize a multirelation M(n, L) by its cardinality n and the number L of distinct elements it contains. Under this characterization, the worst-case time complexity of sorting such a multirelation, with binary comparisons as the basic operations, is investigated. Upper and lower bounds on the number of three-branch comparisons needed to sort such a multirelation are established. The methodology used to study the complexity of sorting is then applied to the natural join operation. It is shown that duplicate values in the join attribute columns can be exploited to reduce the computational complexity of the natural join.
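As an illustration of how duplicates can reduce comparison counts, the following is a minimal sketch and not the authors' algorithm. It assumes a standard binary-insertion scheme over the distinct values seen so far, with each three-branch comparison (less, equal, greater) counted once. The names `three_way`, `group_sorted`, `sort_multirelation`, and `natural_join` are hypothetical, and the join returns matching tuple pairs rather than merged tuples, which is a simplification.

```python
def three_way(a, b, counter):
    """Three-branch comparison: returns -1, 0, or 1 and counts one comparison."""
    counter[0] += 1
    if a < b:
        return -1
    if a > b:
        return 1
    return 0


def group_sorted(rows, key, counter):
    """Group rows by key, keeping the distinct keys in ascending order.

    Each row is located by binary search over the distinct keys seen so far,
    so a row costs at most floor(log2(L)) + 1 three-branch comparisons,
    where L is the number of distinct keys.
    """
    keys, groups = [], []
    for row in rows:
        k = key(row)
        lo, hi = 0, len(keys)
        while lo < hi:
            mid = (lo + hi) // 2
            c = three_way(k, keys[mid], counter)
            if c == 0:
                groups[mid].append(row)  # duplicate value: join its group
                break
            if c < 0:
                hi = mid
            else:
                lo = mid + 1
        else:
            # Search ended without a match: a new distinct value.
            keys.insert(lo, k)
            groups.insert(lo, [row])
    return keys, groups


def sort_multirelation(rows):
    """Sort a multirelation; returns (sorted rows, comparisons used)."""
    counter = [0]
    _, groups = group_sorted(rows, lambda r: r, counter)
    return [r for g in groups for r in g], counter[0]


def natural_join(r, s, r_key, s_key):
    """Join r and s on equal key values; returns (pairs, comparisons used).

    Only the distinct key values are merged, so the merge phase costs at
    most L_r + L_s comparisons regardless of how many duplicates exist.
    """
    counter = [0]
    rk, rg = group_sorted(r, r_key, counter)
    sk, sg = group_sorted(s, s_key, counter)
    out, i, j = [], 0, 0
    while i < len(rk) and j < len(sk):
        c = three_way(rk[i], sk[j], counter)
        if c == 0:
            out.extend((a, b) for a in rg[i] for b in sg[j])
            i += 1
            j += 1
        elif c < 0:
            i += 1
        else:
            j += 1
    return out, counter[0]


if __name__ == "__main__":
    rows = [3, 1, 2] * 100  # n = 300, L = 3
    ordered, used = sort_multirelation(rows)
    print(ordered == sorted(rows), used)
```

In this sketch the comparison count grows with n log L rather than n log n, which is where the saving comes from when L is much smaller than n. The bounds established in the paper itself are not reproduced here.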

Index Terms:
upper bounds; sorting; joining; relations; duplicate values; attribute columns; computational complexity; distinct tuple values; multirelation; cardinality; distinct elements; worst time complexity; binary comparisons; lower bounds; three-branch comparisons; natural join operation; database theory; relational databases
M. Abdelguerfi, A.K. Sood, "Computational Complexity of Sorting and Joining Relations with Duplicates," IEEE Transactions on Knowledge and Data Engineering, vol. 3, no. 4, pp. 496-503, Dec. 1991, doi:10.1109/69.109110