Absolute Bounds on Set Intersection and Union Sizes from Distribution Information
July 1988 (vol. 14 no. 7)
pp. 1033-1048

A catalog of quick closed-form bounds on set intersection and union sizes is presented; the bounds can be expressed as rules and managed by a rule-based system architecture. These methods use a variety of statistics precomputed on the data and exploit homomorphisms (onto mappings) of the data items onto distributions that can be more easily analyzed. The methods can be applied at any time but tend to work best when the data contain strong or complex correlations, a circumstance that the standard independence-assumption and distributional-assumption estimates handle poorly.
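
As a rough illustration of what a closed-form bound of this kind can look like, the following Python sketch computes two elementary bounds. It is not taken from the article's catalog: the function names, the sample numbers, and the choice of a per-value histogram as the "distribution information" are all assumptions made for this example. The first function uses only the set sizes and the size of the enclosing universe; the second uses precomputed counts of some attribute value within each set, relying on the fact that the intersection cannot contain more items with a given value than either set does.

```python
def size_only_bounds(size_a, size_b, universe_size):
    """Bounds on |A ∩ B| and |A ∪ B| from the sizes alone (hypothetical helper)."""
    # Pigeonhole: A and B must overlap if together they exceed the universe.
    lower_int = max(0, size_a + size_b - universe_size)
    # The intersection cannot be larger than the smaller set.
    upper_int = min(size_a, size_b)
    # |A ∪ B| = |A| + |B| - |A ∩ B|, so the intersection bounds flip for the union.
    lower_union = size_a + size_b - upper_int
    upper_union = size_a + size_b - lower_int
    return (lower_int, upper_int), (lower_union, upper_union)


def histogram_intersection_upper_bound(hist_a, hist_b):
    """Upper bound on |A ∩ B| from per-value counts (hypothetical helper).

    hist_a and hist_b map each value of some attribute to the number of
    items in A (respectively B) having that value.
    """
    # For each value, at most min(count in A, count in B) items can be shared.
    return sum(min(count, hist_b.get(value, 0)) for value, count in hist_a.items())


if __name__ == "__main__":
    # Illustrative numbers only: |A| = 60, |B| = 70, universe of 200 items.
    print(size_only_bounds(60, 70, 200))
    # A is mostly "x", B is mostly "y", so the two sets can share few items.
    print(histogram_intersection_upper_bound({"x": 50, "y": 10}, {"x": 5, "y": 65}))
```

With these sample numbers the sizes alone allow an intersection of up to 60 items, while the histogram bound tightens that to 15, which shows informally why distribution information helps most when the data are strongly correlated.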

Index Terms:
Boolean algebra; file organisation; database access; set theory; statistical analysis; set intersection; union sizes; distribution information; closed-form bounds; rule-based system architecture; database theory
N.C. Rowe, "Absolute Bounds on Set Intersection and Union Sizes from Distribution Information," IEEE Transactions on Software Engineering, vol. 14, no. 7, pp. 1033-1048, July 1988, doi:10.1109/32.42743