A Conceptual Clustering Algorithm for Database Schema Design
June 1994 (vol. 6 no. 3)
pp. 396-411

Conceptual clustering techniques based on current theories of categorization provide a way to design database schemas that more accurately represent classes. An approach is presented in which classes are treated as complex clusters of concepts rather than as simple predicates. An important service provided by the database is determining whether a particular instance is a member of a class. A conceptual clustering algorithm based on theories of categorization aids in building classes by grouping related instances and developing class descriptions. The resulting database schema addresses a number of properties of categories, including default values and prototypes, analogical reasoning, exception handling, and family resemblance. Class cohesion results from trying to resolve conflicts between building generalized class descriptions and accommodating members of the class that deviate from these descriptions. This is achieved by combining techniques from machine learning, specifically explanation-based learning and case-based reasoning. A subsumption function is used to compare two class descriptions. A realization function is used to determine whether an instance meets an existing class description. A new function, INTERSECT, is introduced to compare the similarity of two instances. INTERSECT is used in defining an exception condition. Exception handling results in schema modification. This approach is applied to the database problems of schema integration, schema generation, query processing, and view creation.
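The three functions named in the abstract can be illustrated with a minimal sketch. The abstract does not give their definitions, so everything below is an assumption made for illustration: instances are modeled as flat attribute-value dictionaries, class descriptions as sets of allowed values per attribute, and the names `realizes`, `subsumes`, and `intersect` are hypothetical stand-ins rather than the paper's actual formulation.

```python
from typing import Dict, Set

# Assumed representations (not taken from the paper):
Instance = Dict[str, str]          # attribute -> single value
Description = Dict[str, Set[str]]  # attribute -> set of allowed values


def realizes(instance: Instance, description: Description) -> bool:
    """Realization: the instance has an allowed value for every described attribute."""
    return all(instance.get(attr) in allowed for attr, allowed in description.items())


def subsumes(general: Description, specific: Description) -> bool:
    """Subsumption: every constraint in `general` is imposed at least as tightly by `specific`."""
    return all(
        attr in specific and specific[attr] <= allowed
        for attr, allowed in general.items()
    )


def intersect(a: Instance, b: Instance) -> Instance:
    """Similarity of two instances, taken here as their shared attribute-value pairs."""
    return {attr: val for attr, val in a.items() if b.get(attr) == val}


# Hypothetical example data.
feathered: Description = {"covering": {"feathers"}}
bird: Description = {"covering": {"feathers"}, "locomotion": {"flies"}}
robin: Instance = {"covering": "feathers", "locomotion": "flies", "color": "red"}
penguin: Instance = {"covering": "feathers", "locomotion": "swims", "color": "black"}

robin_is_bird = realizes(robin, bird)
penguin_is_bird = realizes(penguin, bird)
shared = intersect(robin, penguin)
```

In this toy setting the penguin fails the `bird` description yet still shares a feature with the robin, which is the kind of deviation the abstract's exception condition is meant to handle; how the paper actually defines that condition is not stated in the abstract.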

Index Terms:
data structures; database management systems; database theory; query processing; exception handling; operating systems (computers); learning (artificial intelligence); case-based reasoning; conceptual clustering algorithm; database schema design; categorization; complex clusters; class descriptions; default values; analogical reasoning; family resemblance; class cohesion; machine learning; explanation-based learning; subsumption function; realization function; INTERSECT; exception condition; schema modification; schema integration; schema generation; view creation
H.W. Beck, T. Anwar, S.B. Navathe, "A Conceptual Clustering Algorithm for Database Schema Design," IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 3, pp. 396-411, June 1994, doi:10.1109/69.334862