Optimal Semijoins for Distributed Database Systems
May 1990 (vol. 16 no. 5)
pp. 558-560

A Bloom-filter-based semijoin algorithm for distributed database systems is presented. The algorithm uses a filter approach to reduce, as far as possible, the communication cost of processing a distributed natural join. An optimal filter is built up in pieces. Filter information is used both to recognize when the semijoin will cease to be effective and to process the semijoin optimally. An ineffective semijoin is recognized quickly and cheaply; an effective semijoin uses all of the transmitted bits optimally.
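
The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea of a Bloom-filter semijoin whose filter is sent in pieces. It is not the paper's algorithm. The names `bit_index`, `build_piece`, and `piecewise_semijoin`, the parameters `m`, `max_pieces`, and `min_drop`, the use of SHA-256 as the hash, and the stopping rule are all assumptions introduced for illustration. Site A sends one small bit vector at a time; site B discards rows whose join value hashes to an unset bit, and stops asking for further pieces once a piece removes too few rows to be worth its cost.

```python
import hashlib


def bit_index(value, seed, m):
    """Map a join-attribute value to a bit position in [0, m) with a seeded hash."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def build_piece(join_values, seed, m):
    """Site A: build one m-bit filter piece (a single-hash Bloom filter)."""
    bits = [False] * m
    for v in join_values:
        bits[bit_index(v, seed, m)] = True
    return bits


def piecewise_semijoin(site_a_values, site_b_rows, key,
                       m=64, max_pieces=8, min_drop=0.05):
    """Reduce site_b_rows using successive filter pieces built at site A.

    Stops early when a piece removes less than `min_drop` of the rows that
    were still alive (a hypothetical stopping rule, not the paper's criterion).
    Returns (surviving_rows, bits_sent).
    """
    survivors = list(site_b_rows)
    bits_sent = 0
    for seed in range(max_pieces):
        if not survivors:
            break
        piece = build_piece(site_a_values, seed, m)  # "transmitted" from A to B
        bits_sent += m
        before = len(survivors)
        # Keep only rows whose join value hits a set bit in this piece.
        survivors = [r for r in survivors if piece[bit_index(r[key], seed, m)]]
        if before - len(survivors) < min_drop * before:
            break  # this piece was ineffective; stop paying for more bits
    return survivors, bits_sent
```

A Bloom filter never produces false negatives, so every row at site B that has a join partner at site A survives the reduction. Rows that survive by collision (false positives) are removed later by the actual join.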

Index Terms:
optimal semijoins; filter information; distributed database systems; Bloom-filter-based semijoin algorithm; communications costs; distributed natural join; optimal filter; recognize; transmitted bits; database theory; distributed databases.
J.K. Mullin, "Optimal Semijoins for Distributed Database Systems," IEEE Transactions on Software Engineering, vol. 16, no. 5, pp. 558-560, May 1990, doi:10.1109/32.52778