Issue No.06 - November/December (2010 vol.16)

pp: 1281-1290

Fernando V. Paulovich, Universidade de São Paulo (USP)

Luis G. Nonato, Universidade de São Paulo (USP)

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TVCG.2010.207

ABSTRACT

Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.
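
The abstract does not spell out how the two phases are computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that (1) a small random subset of representative samples is placed in 2D from their pairwise distances alone, here with classical (Torgerson) MDS, and (2) a linear map is then fitted by least squares from the samples' Cartesian coordinates to their 2D positions and applied to every instance. The function names, the random sampling, the centering step, and the choice of classical MDS are all assumptions made for illustration.

```python
import numpy as np


def classical_mds(dist, dim=2):
    """Embed points from a pairwise distance matrix (Torgerson scaling)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]    # indices of the largest ones
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def two_phase_projection(data, n_samples, dim=2, seed=0):
    """Hypothetical two-phase projection; returns (projected, phi, mean)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=n_samples, replace=False)
    samples = data[idx]

    # Phase 1 (assumed): distances are computed only between the samples,
    # and the samples are placed in the visual space from those distances.
    diff = samples[:, None, :] - samples[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    sample_2d = classical_mds(dist, dim)

    # Phase 2 (assumed): fit a linear map phi with
    # (samples - mean) @ phi ~= sample_2d, then apply it to all instances
    # directly from their coordinates, with no further distance computations.
    mean = samples.mean(axis=0)
    phi, *_ = np.linalg.lstsq(samples - mean, sample_2d, rcond=None)
    return (data - mean) @ phi, phi, mean
```

Once `phi` and `mean` are known, a new instance `x` can be mapped on its own as `(x - mean) @ phi`, which is the property that makes a streaming use plausible. Note that with classical MDS on Euclidean distances the fitted map coincides with PCA on the samples; a nonlinear placement of the samples (for example a force-based scheme) would make the least-squares step a genuine approximation.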

INDEX TERMS

Dimensionality Reduction; Projection Methods; Visual Data Mining; Streaming Technique

CITATION

Fernando V. Paulovich, Luis G. Nonato, "Two-Phase Mapping for Projecting Massive Data Sets", *IEEE Transactions on Visualization & Computer Graphics*, vol.16, no. 6, pp. 1281-1290, November/December 2010, doi:10.1109/TVCG.2010.207