Visualization Techniques for Mining Large Databases: A Comparison
December 1996 (vol. 8 no. 6)
pp. 923-938

Abstract: Visual data mining techniques have proven to be of high value in exploratory data analysis, and they also have high potential for mining large databases. In this article, we describe and evaluate a new visualization-based approach to mining large databases. The basic idea of our visual data mining techniques is to represent as many data items as possible on the screen at the same time by mapping each data value to a pixel of the screen and arranging the pixels adequately. The major goal of this article is to evaluate our visual data mining techniques and to compare them to other well-known visualization techniques for multidimensional data: the parallel coordinate and stick figure visualization techniques. In evaluating visual data mining techniques, what matters most is how well properties of the data can be perceived; CPU time and the number of secondary storage accesses are of secondary importance. In addition to testing the visualization techniques using real data, we developed a testing environment for database visualizations similar to the benchmark approach used for comparing the performance of database systems. The testing environment allows the generation of test data sets with predefined data characteristics, which are important for comparing the perceptual abilities of visual data mining techniques.
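The basic idea stated in the abstract, one pixel per data value, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `values_to_pixel_grid`, the linear gray-scale mapping, the simple row-by-row arrangement, and the `-1` padding marker are all assumptions made for illustration. The abstract does not specify the arrangement or color mapping the article actually uses.

```python
from typing import List, Sequence


def values_to_pixel_grid(values: Sequence[float], width: int) -> List[List[int]]:
    """Map each data value to one pixel and arrange the pixels row by row.

    Hypothetical helper: the gray-level mapping (0..255) and the row-major
    layout are illustrative assumptions, not the arrangement from the article.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not values:
        return []

    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant data: avoid division by zero

    # One gray level per data value, scaled linearly into 0..255.
    levels = [round(255 * (v - lo) / span) for v in values]

    # Cut the pixel sequence into rows of the requested width.
    rows = [levels[i:i + width] for i in range(0, len(levels), width)]

    # Pad the last row with -1 so that "no data" differs from any gray level.
    if len(rows[-1]) < width:
        rows[-1] = rows[-1] + [-1] * (width - len(rows[-1]))
    return rows


if __name__ == "__main__":
    for row in values_to_pixel_grid([0, 10, 20, 30], width=3):
        print(row)
```

With one such grid per dimension, a screen of a given resolution can show roughly as many data values as it has pixels, which is the capacity argument the abstract makes for pixel-oriented displays.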

Index Terms:
Data mining, explorative data analysis, visualizing large databases, visualizing multidimensional, multivariate data.
Citation:
Daniel A. Keim, Hans-Peter Kriegel, "Visualization Techniques for Mining Large Databases: A Comparison," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 923-938, Dec. 1996, doi:10.1109/69.553159