Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression
September 2007 (vol. 29 no. 9)
pp. 1546-1562
In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and rate distortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm which depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phase-transition-like behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.
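
The idea of segmenting by minimizing coding length at a fixed distortion can be illustrated with a minimal sketch. The abstract does not state the coding-length formula or the search procedure, so everything below is an assumption made for illustration: the bit count uses a rate-distortion-style estimate for Gaussian data, the search is a greedy bottom-up merge, and the names `coding_length`, `group_cost`, `greedy_segment`, and `eps` are hypothetical.

```python
import numpy as np


def coding_length(W, eps):
    """Estimated bits to code the columns of W (n x m) up to distortion eps**2.

    Assumed form: (m + n)/2 * log2 det(I + n/(eps^2 m) * Wc Wc^T)
    plus n/2 * log2(1 + |mu|^2 / eps^2) for the group mean mu.
    """
    n, m = W.shape
    mu = W.mean(axis=1, keepdims=True)
    Wc = W - mu  # centered data
    _, logdet = np.linalg.slogdet(np.eye(n) + (n / (eps**2 * m)) * (Wc @ Wc.T))
    bits_data = 0.5 * (m + n) * logdet / np.log(2.0)
    bits_mean = 0.5 * n * np.log2(1.0 + (mu**2).sum() / eps**2)
    return bits_data + bits_mean


def group_cost(X, idx, eps):
    """Bits for one group: its data plus the labels saying who belongs to it."""
    m = X.shape[1]
    k = len(idx)
    return coding_length(X[:, idx], eps) - k * np.log2(k / m)


def greedy_segment(X, eps):
    """Greedy bottom-up merging (an assumed search strategy).

    Start with every sample in its own group and repeatedly merge the pair
    whose merge reduces the total coding length the most. Stop when no
    merge reduces it. The only parameter is the distortion eps.
    """
    m = X.shape[1]
    groups = [[j] for j in range(m)]
    costs = [group_cost(X, g, eps) for g in groups]
    while len(groups) > 1:
        best_delta, best_pair, best_cost = 0.0, None, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                merged = group_cost(X, groups[a] + groups[b], eps)
                delta = merged - costs[a] - costs[b]
                if delta < best_delta:
                    best_delta, best_pair, best_cost = delta, (a, b), merged
        if best_pair is None:
            break  # no merge shortens the code
        a, b = best_pair
        groups[a] = groups[a] + groups[b]
        costs[a] = best_cost
        del groups[b]
        del costs[b]
    return groups, sum(costs)
```

In this sketch the number of groups is never supplied: it falls out of where merging stops, which is the behavior the abstract describes for the single distortion parameter. The pairwise search is quadratic in the number of groups per step, so it is only practical for small data sets.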

Index Terms:
Multivariate Mixed Data, Data Segmentation, Data Clustering, Rate Distortion, Lossy Coding, Lossy Compression, Image Segmentation, Microarray Data Clustering
Yi Ma, Harm Derksen, Wei Hong, John Wright, "Segmentation of Multivariate Mixed Data via Lossy Data Coding and Compression," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 9, pp. 1546-1562, Sept. 2007, doi:10.1109/TPAMI.2007.1085