Issue No.03 - March (2014 vol.20)
pp: 351-364
Jenny Hyunjung Lee, Dept. of Comput. Sci., Stony Brook Univ., Stony Brook, NY, USA
Kevin T. McDonnell, Dept. of Math. & Comput. Sci., Dowling Coll., Oakdale, NY, USA
Alla Zelenyuk, Pacific Northwest Nat. Lab., Richland, WA, USA
Dan Imre, Imre Consulting, Richland, WA, USA
Klaus Mueller, Dept. of Comput. Sci., Stony Brook Univ., Stony Brook, NY, USA
ABSTRACT
Although the Euclidean distance measures distances well within high-dimensional clusters, it gauges intercluster distances poorly. This significantly degrades the quality of global, low-dimensional embedding procedures such as the popular multidimensional scaling (MDS), where one can often observe nonintuitive layouts. We were inspired by the perceptual processes evoked in the method of parallel coordinates, which enables users to visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and suggest a metric that captures this structure directly in high-dimensional space. This allows us to better gauge the distances between spatially distant data constellations and so achieve data aggregations in MDS plots that are more cognizant of existing high-dimensional structure similarities. Our biscale framework distinguishes far distances from near distances. The coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.
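The biscale idea can be illustrated with a minimal sketch. The abstract does not give the structural metric's formula, so everything specific here is an assumption made for illustration, not the authors' method: structure is approximated by the Pearson correlation between cluster-centroid profiles across the dimension axes, far distances are offset by a hypothetical `far_scale` so they never fall below the largest within-cluster distance, and the function names `biscale_distances` and `classical_mds` are invented.

```python
import numpy as np

def biscale_distances(X, labels, far_scale=None):
    """Build a distance matrix with two scales (illustrative assumption only).

    Near scale: plain Euclidean distance for points in the same cluster.
    Far scale: 1 - Pearson correlation between the two cluster centroids,
    treating each centroid as a polyline across the dimension axes.
    Assumes at least two clusters and centroids that are not constant
    across dimensions (otherwise the correlation is undefined).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    eucl = np.sqrt((diff ** 2).sum(axis=-1))

    # One centroid per cluster; np.unique returns the classes sorted.
    classes = np.unique(labels)
    centroids = np.stack([X[labels == c].mean(axis=0) for c in classes])

    # Structural dissimilarity between centroid profiles, in [0, 2].
    struct = 1.0 - np.corrcoef(centroids)

    same = labels[:, None] == labels[None, :]
    if far_scale is None:
        # Assumed choice: largest within-cluster Euclidean distance.
        far_scale = eucl[same].max()

    # Map every point to the row index of its cluster centroid.
    idx = np.searchsorted(classes, labels)
    far = far_scale * (1.0 + struct[idx[:, None], idx[None, :]])

    # Euclidean within clusters, structure-based between clusters.
    return np.where(same, eucl, far)

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS of a distance matrix into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    w, v = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:k]              # k largest eigenvalues
    return v[:, order] * np.sqrt(np.clip(w[order], 0.0, None))

# Toy data: clusters 0 and 1 both rise across the axes (similar structure,
# far apart in Euclidean terms); cluster 2 falls (opposite structure).
X = np.array([
    [0.0, 1.0, 2.0, 3.0], [0.1, 1.1, 2.0, 3.1],
    [10.0, 11.0, 12.0, 13.5], [10.2, 11.0, 12.1, 13.0],
    [3.0, 2.0, 1.0, 0.0], [3.1, 2.0, 1.1, 0.0],
])
labels = np.array([0, 0, 1, 1, 2, 2])
D = biscale_distances(X, labels)
Y = classical_mds(D, k=2)
```

In this toy setup the two rising clusters end up closer to each other than either is to the falling cluster, even though the falling cluster is nearer in Euclidean terms. That is the kind of structure-aware layout the abstract describes, under the stated assumptions.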
INDEX TERMS
Layout, Euclidean distance, Correlation, Indexes, Data visualization, Extraterrestrial measurements, visual analytics, Information visualization, multivariate visualization, clustering, high-dimensional data
CITATION
Jenny Hyunjung Lee, Kevin T. McDonnell, Alla Zelenyuk, Dan Imre, Klaus Mueller, "A Structure-Based Distance Metric for High-Dimensional Space Exploration with Multidimensional Scaling", IEEE Transactions on Visualization & Computer Graphics, vol.20, no. 3, pp. 351-364, March 2014, doi:10.1109/TVCG.2013.101