Diffusion Maps and Coarse-Graining: A Unified Framework for Dimensionality Reduction, Graph Partitioning, and Data Set Parameterization
September 2006 (vol. 28 no. 9)
pp. 1393-1403
We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
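The construction the abstract describes — a Gaussian-kernel Markov random walk on the data, whose leading nontrivial eigenvectors (scaled by powers of their eigenvalues) give the diffusion coordinates — can be sketched in a few lines of NumPy. This is a minimal illustration under assumed choices (a Gaussian kernel with bandwidth `epsilon`, eigendecomposition through the symmetric conjugate matrix), not the authors' reference implementation; the function name and parameters are ours.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=2, t=1):
    """Embed the rows of X (n x d) into n_coords diffusion coordinates.

    Builds a Gaussian-kernel Markov random walk on the data and returns
    the top nontrivial right eigenvectors of the transition matrix,
    scaled by the t-th power of their eigenvalues.
    """
    # Pairwise squared distances and Gaussian affinities
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / epsilon)
    d = W.sum(axis=1)
    # Symmetric conjugate S = D^{-1/2} W D^{-1/2} has the same spectrum
    # as the transition matrix P = D^{-1} W, but eigh is numerically stable
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Recover right eigenvectors of P; column 0 is the trivial constant
    # eigenvector (eigenvalue 1), so it is skipped
    psi = d_inv_sqrt[:, None] * vecs
    return (vals[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]
```

Running ordinary k-means on the rows of the returned embedding is then the quantization-in-diffusion-space step whose distortion, by the abstract's main result, bounds the error of compressing the diffusion operator.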

Index Terms:
Machine learning, text analysis, knowledge retrieval, quantization, graph-theoretic methods, compression (coding), clustering, clustering similarity measures, information visualization, Markov processes, graph algorithms.
Citation:
Stéphane Lafon, Ann B. Lee, "Diffusion Maps and Coarse-Graining: A Unified Framework for Dimensionality Reduction, Graph Partitioning, and Data Set Parameterization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1393-1403, Sept. 2006, doi:10.1109/TPAMI.2006.184