Issue No. 03 - May/June 2008 (vol. 14)

pp. 564-575

ABSTRACT

The problem of projecting multidimensional data into lower dimensions has been pursued by many researchers due to its potential application to data analysis of various kinds. This paper presents a novel multidimensional projection technique based on least square approximations. The approximations compute the coordinates of a set of projected points based on the coordinates of a reduced number of control points with defined geometry. We name the technique Least Square Projections (LSP). From an initial projection of the control points, LSP defines the positioning of their neighboring points through a numerical solution that aims at preserving a similarity relationship between the points given by a metric in $mD$. In order to perform the projection, a small number of distance calculations is necessary and no repositioning of the points is required to obtain a final solution with satisfactory precision. The results show the capability of the technique to form groups of points by degree of similarity in $2D$. We illustrate that capability through its application to mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high quality methods, particularly where it was mostly tested, that is, for mapping text sets.
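The idea summarized in the abstract (fix a few control points in 2D, then place every other point relative to its neighbors by solving one least-squares system) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lsp_sketch`, the parameters `control_idx`, `control_pos`, and `k`, the Euclidean metric, and the equal-weight neighbor averaging are all hypothetical choices made here for clarity. For brevity the sketch also computes all pairwise distances, whereas the abstract states that only a small number of distance calculations is needed.

```python
import numpy as np

def lsp_sketch(X, control_idx, control_pos, k=5):
    """Project the rows of X to 2D from the fixed 2D positions of a few control points.

    Illustrative only: the neighbor weighting and metric are assumptions.
    """
    n = X.shape[0]

    # Pairwise Euclidean distances in the original space (assumed metric).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]

    # Neighborhood rows: each point should lie at the centroid of its k neighbors.
    L = np.eye(n)
    for i in range(n):
        L[i, neighbors[i]] = -1.0 / k

    # Control rows: pin each control point to its given 2D position.
    m = len(control_idx)
    C = np.zeros((m, n))
    C[np.arange(m), control_idx] = 1.0

    # Stack both sets of equations and solve in the least-squares sense.
    A = np.vstack([L, C])
    b = np.vstack([np.zeros((n, 2)), control_pos])
    Y, *_ = np.linalg.lstsq(A, b, rcond=None)
    return Y
```

A call such as `lsp_sketch(X, np.array([0, 4]), np.array([[0.0, 0.0], [10.0, 0.0]]), k=3)` returns one 2D coordinate per row of `X`. The dense solve is used here only to keep the sketch short; a sparse matrix with an iterative solver would be the natural choice for large collections.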

INDEX TERMS

Multivariate visualization, Data and knowledge visualization, Information visualization

CITATION

Luis G. Nonato, Rosane Minghim, Haim Levkowitz, "Least Square Projection: A Fast High-Precision Multidimensional Projection Technique and Its Application to Document Mapping",

*IEEE Transactions on Visualization & Computer Graphics*, vol. 14, no. 3, pp. 564-575, May/June 2008, doi:10.1109/TVCG.2007.70443