Handling Discovered Structure in Database Systems
April 1996 (vol. 8 no. 2)
pp. 227-240

Abstract: Most database systems research assumes that the database schema is determined by a database administrator. With the recent increase in interest in knowledge discovery from databases, and the predicted increase in the volume of data to be stored, it is appropriate to reexamine this assumption and investigate how derived or induced structure, rather than structure supplied by a database administrator, can be accommodated and used by database systems. This paper investigates some of the characteristics of inductive learning and knowledge discovery as they pertain to database systems, and discusses the constraints that would be imposed on appropriate inductive learning algorithms. A formal method of defining induced dependencies (both static and temporal) is proposed as the inductive analogue to functional dependencies. The Boswell database system, which exemplifies some of these characteristics, is also briefly discussed.

Index Terms:
Inductive data models, knowledge discovery, temporal inference, Boswell.
Citation:
John F. Roddick, Noel G. Craske, Thomas J. Richards, "Handling Discovered Structure in Database Systems," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 2, pp. 227-240, April 1996, doi:10.1109/69.494163