HD-Eye: Visual Mining of High-Dimensional Data
September/October 1999 (vol. 19 no. 5)
pp. 22-31
Most automated clustering algorithms do not work effectively on high-dimensional data; they are likely to miss clusters with certain unexpected characteristics. For example, the so-called "curse of dimensionality" makes it difficult to find the parameters needed to tune a clustering algorithm to a specific application. We propose novel visual mining techniques to overcome these problems. The idea is to support the critical steps of an advanced automated clustering algorithm with visualization techniques. The automated clustering algorithm uses projections of the point density of the high-dimensional data to find good separators between the clusters. The visualization techniques we use allow easy identification of important data characteristics. Since the number of interesting projections may become large, we provide different visualization techniques, ranging from abstract iconic representations of the separation potential to pixel-oriented overview plots of the multidimensional projections. The visualizations also allow users to specify complex hyper-polygonal separators directly within the visualization. This permits finding clusters that no automatic algorithm can determine. We integrated all the visualization techniques using a tree-like visualization of the projection and separator hierarchy. In experiments, we applied our new visualization techniques to a real application from molecular biology.
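The idea of using density projections to find separators can be illustrated with a minimal sketch. It is not the HD-Eye implementation: it assumes axis-parallel one-dimensional projections, a plain equal-width histogram as the density estimate, and a hypothetical score (the height of the lower of the two flanking peaks minus the valley height) standing in for the "separation potential". The function names and the synthetic data are likewise illustrative assumptions.

```python
import random


def histogram(values, bins, lo, hi):
    """Count values into equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # clamp v == hi into last bin
        counts[i] += 1
    return counts


def best_separator(values, bins=20):
    """Return (position, score) of the deepest density valley in a 1D projection.

    Score (an assumed, illustrative measure): the smaller of the highest
    bin count on each side of the candidate bin, minus the candidate's count.
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate projection: nothing to separate
        return lo, 0
    counts = histogram(values, bins, lo, hi)
    width = (hi - lo) / bins
    best_pos, best_score = lo, 0
    for i in range(1, bins - 1):  # interior bins only
        score = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if score > best_score:
            best_pos, best_score = lo + (i + 0.5) * width, score
    return best_pos, best_score


def rank_projections(points, bins=20):
    """Rank axis-parallel projections by separator score, best first."""
    results = []
    for d in range(len(points[0])):
        pos, score = best_separator([p[d] for p in points], bins)
        results.append((score, d, pos))
    return sorted(results, reverse=True)


# Synthetic data (assumption): two clusters that differ only in dimension 0.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
points += [(rng.gauss(10, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]

ranking = rank_projections(points)
top_score, top_dim, top_pos = ranking[0]
print(f"best projection: dimension {top_dim}, separator near {top_pos:.2f}")
```

In this sketch the ranking plays the role that the abstract assigns to the overview visualizations: it narrows many candidate projections down to the few with a pronounced density valley, which a user could then inspect and refine interactively.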

Index Terms:
Visual Support of the Clustering Process, Visual Data Mining, Pixel-oriented Visualization Techniques, Iconic Visualization Techniques
Alexander Hinneburg, Daniel A. Keim, Markus Wawryniuk, "HD-Eye: Visual Mining of High-Dimensional Data," IEEE Computer Graphics and Applications, vol. 19, no. 5, pp. 22-31, Sept.-Oct. 1999, doi:10.1109/38.788795