Data Mining: An Overview from a Database Perspective
December 1996 (vol. 8 no. 6)
pp. 866-883

Abstract: Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity for major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase business opportunities. In response to this demand, this article provides a survey, from a database researcher's point of view, of recently developed data mining techniques. A classification of the available data mining techniques is provided, and a comparative study of these techniques is presented.

Index Terms:
Data mining, knowledge discovery, association rules, classification, data clustering, pattern matching algorithms, data generalization and characterization, data cubes, multiple-dimensional databases.
Citation:
Ming-Syan Chen, Jiawei Han, Philip S. Yu, "Data Mining: An Overview from a Database Perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866-883, Dec. 1996, doi:10.1109/69.553155