Efficient Attribute-Oriented Generalization for Knowledge Discovery from Large Databases
March/April 1998 (vol. 10 no. 2)
pp. 193-208

Abstract: We present GDBR (Generalize DataBase Relation) and FIGR (Fast, Incremental Generalization and Regeneralization), two enhancements of Attribute-Oriented Generalization, a well-known technique for knowledge discovery from databases. Both GDBR and FIGR run in O(n) time and are therefore optimal. GDBR is an on-line algorithm and requires only a small, constant amount of space. FIGR also requires a constant amount of space, which is generally reasonable but may grow large under certain circumstances. FIGR is incremental: changes to the database can be reflected in the generalization results without rereading the input data. FIGR also allows fast regeneralization to both higher and lower levels of generality without rereading the input. We compare GDBR and FIGR to two previous algorithms, LCHR and AOI, which are O(n log n) and O(np), respectively, where n is the number of input tuples and p is the number of tuples in the generalized relation. Both LCHR and AOI require O(n) space, which causes memory problems for large inputs. We implemented all four algorithms and ran empirical tests, and we found that GDBR and FIGR are faster. In addition, their runtimes increase only linearly as input size increases, while the runtimes of LCHR and AOI increase greatly when input size exceeds memory limitations.
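
To make the idea of attribute-oriented generalization concrete, the following is a minimal Python sketch, not the authors' GDBR or FIGR. It assumes the generalization level for each attribute has already been chosen, and it replaces each value by an ancestor in a concept hierarchy while merging identical generalized tuples in a hash table. The hierarchies, attribute names, and the functions `climb`, `generalize`, and `regeneralize` are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchies: each maps a value to its parent concept.
HIERARCHIES = {
    "city": {"Regina": "Saskatchewan", "Saskatoon": "Saskatchewan",
             "Calgary": "Alberta", "Saskatchewan": "Canada", "Alberta": "Canada"},
    "major": {"Physics": "Science", "Biology": "Science", "History": "Arts"},
}

def climb(attribute, value, levels):
    """Return the ancestor of `value` that is `levels` steps up the hierarchy."""
    parents = HIERARCHIES[attribute]
    for _ in range(levels):
        value = parents.get(value, value)  # stay put once the top is reached
    return value

def generalize(rows, attributes, levels):
    """One pass over the input: generalize each tuple and count duplicates."""
    counts = Counter()
    for row in rows:
        key = tuple(climb(a, v, levels[a]) for a, v in zip(attributes, row))
        counts[key] += 1
    return counts

def regeneralize(counts, attributes, extra_levels):
    """Generalize further from the stored counts, without rereading the rows."""
    result = Counter()
    for key, n in counts.items():
        new_key = tuple(climb(a, v, extra_levels[a]) for a, v in zip(attributes, key))
        result[new_key] += n
    return result

attributes = ("city", "major")
rows = [("Regina", "Physics"), ("Saskatoon", "Biology"),
        ("Calgary", "History"), ("Regina", "History")]
generalized = generalize(rows, attributes, {"city": 1, "major": 1})
coarser = regeneralize(generalized, attributes, {"city": 1, "major": 0})
```

In this sketch the single pass performs one hash-table update per input tuple, and the table holds only the generalized tuples, not the input. Moving to a lower level of generality would require retaining finer-grained information than this sketch keeps; how FIGR handles that is not described in the abstract.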

Index Terms:
Knowledge discovery from databases, data mining, attribute-oriented induction.
Citation:
Colin L. Carter, Howard J. Hamilton, "Efficient Attribute-Oriented Generalization for Knowledge Discovery from Large Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, pp. 193-208, March-April 1998, doi:10.1109/69.683752