Issue No.05 - September/October (1999 vol.19)
pp: 22-31
Most automated clustering algorithms do not work effectively on high-dimensional data and are likely to miss clusters with certain unexpected characteristics. For example, the so-called "curse of dimensionality" makes it difficult to find the parameters needed to tune a clustering algorithm to a specific application. We propose novel visual mining techniques to overcome these problems. The idea is to support the critical steps of an advanced automated clustering algorithm with visualization techniques. The automated clustering algorithm uses projections of the point density of the high-dimensional data to find good separators between the clusters. The visualization techniques we use allow easy identification of important data characteristics. Since the number of interesting projections may become large, we provide different visualization techniques, ranging from abstract iconic representations of the separation potential to pixel-oriented overview plots of the multi-dimensional projections. The visualizations also allow the user to specify complex hyper-polygonal separators directly within the visualization, which permits finding clusters that no automatic algorithm can determine. We integrated all the visualization techniques using a tree-like visualization of the projection and separator hierarchy. In our experiments, we applied the new visualization techniques to a real application from molecular biology.
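To make the idea of finding separators in density projections concrete, the following is a minimal sketch and not the HD-Eye implementation. It assumes axis-parallel projections only, estimates the density with a plain equal-width histogram, and scores each candidate cut by how far its bin count drops below the lower of the two flanking peaks. The names `density_histogram`, `best_separator`, and `rank_projections`, the bin count, and the scoring rule are all hypothetical choices made for illustration; the abstract does not specify the measure of separation potential.

```python
import random


def density_histogram(values, bins, lo, hi):
    """Count how many projected values fall into each equal-width bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def best_separator(points, dim, bins=20):
    """Return (cut position, score) for the projection onto axis `dim`.

    Assumed score: lower of the two flanking peak counts minus the
    count in the candidate bin. Returns None for a constant axis.
    """
    values = [p[dim] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:
        return None
    counts = density_histogram(values, bins, lo, hi)
    width = (hi - lo) / bins
    scored = []
    for i in range(1, bins - 1):
        flank = min(max(counts[:i]), max(counts[i + 1:]))
        scored.append((flank - counts[i], i))
    top = max(score for score, _ in scored)
    # Several bins may tie (e.g. an empty gap); cut in the middle of them.
    ties = [i for score, i in scored if score == top]
    mid = ties[len(ties) // 2]
    return lo + (mid + 0.5) * width, top


def rank_projections(points, bins=20):
    """Rank all axes by the score of their best separator, best first."""
    results = []
    for d in range(len(points[0])):
        sep = best_separator(points, d, bins)
        if sep is not None:
            results.append((d, sep[0], sep[1]))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Synthetic data: two 5-dimensional clusters that differ only on axis 2.
rng = random.Random(0)
points = []
for center in (0.0, 10.0):
    for _ in range(200):
        p = [rng.gauss(0.0, 1.0) for _ in range(5)]
        p[2] += center
        points.append(p)

ranking = rank_projections(points)
print(ranking[0])  # (axis, cut position, score) of the best projection
```

A ranking like this corresponds loosely to the overview role of the iconic and pixel-oriented displays: it points to the projections worth inspecting, while the choice of the actual separator, including non-axis-parallel or hyper-polygonal ones, is left to the user.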
Visual Support of the Clustering Process, Visual Data Mining, Pixel-oriented Visualization Techniques, Iconic Visualization Techniques
Alexander Hinneburg, Daniel A. Keim, Markus Wawryniuk, "HD-Eye: Visual Mining of High-Dimensional Data", IEEE Computer Graphics and Applications, vol.19, no. 5, pp. 22-31, September/October 1999, doi:10.1109/38.788795