Pairwise Data Clustering by Deterministic Annealing
January 1997 (vol. 19 no. 1)
pp. 1-14

Abstract: Partitioning a data set and extracting hidden structure from the data are tasks that arise in many application areas of pattern recognition and of speech and image processing. Pairwise data clustering is a combinatorial optimization method for data grouping that extracts hidden structure from proximity data. We describe a deterministic annealing approach to pairwise clustering that shares the robustness properties of maximum entropy inference. The resulting Gibbs probability distributions are estimated by mean-field approximation. We also discuss a new structure-preserving algorithm that clusters dissimilarity data and simultaneously embeds them in a Euclidean vector space; it can be used for dimensionality reduction and data visualization. The suggested embedding algorithm, which outperforms conventional approaches, has been implemented to analyze dissimilarity data from protein analysis and from linguistics. The algorithm for pairwise data clustering is used to segment textured images.
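
To make the idea of annealed, mean-field pairwise clustering concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a simplified mean-field update in which the cost of assigning object i to a cluster is its average dissimilarity to that cluster minus half the cluster's average internal dissimilarity, with Gibbs (softmax) assignments at temperature T. The function name `pairwise_da`, the parameters `t_end`, `cooling`, `n_inner`, and `seed`, the starting temperature, and the noise level are all illustrative assumptions.

```python
import numpy as np


def pairwise_da(D, n_clusters, t_end=1e-3, cooling=0.9, n_inner=30, seed=0):
    """Soft cluster assignments from a symmetric dissimilarity matrix D (n x n).

    Simplified mean-field sketch (an assumption, not the paper's exact equations).
    """
    rng = np.random.default_rng(seed)
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from uniform soft assignments; each row sums to 1.
    M = np.full((n, n_clusters), 1.0 / n_clusters)
    T = float(D.mean())  # assumed starting temperature
    while T > t_end:
        # Small noise lets the assignments leave the symmetric fixed point.
        M = M + 1e-3 * rng.random(M.shape)
        M /= M.sum(axis=1, keepdims=True)
        for _ in range(n_inner):
            size = M.sum(axis=0) + 1e-12          # soft cluster sizes
            DM = D @ M                            # sum_k D[i, k] * M[k, v]
            to_cluster = DM / size                # avg dissimilarity of i to cluster v
            within = np.einsum("jv,jv->v", M, DM) / size**2  # avg within cluster v
            E = to_cluster - 0.5 * within         # assumed mean-field assignment cost
            logits = -E / T
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            M = np.exp(logits)
            M /= M.sum(axis=1, keepdims=True)     # Gibbs distribution per object
        T *= cooling                              # geometric cooling schedule
    return M
```

Calling `pairwise_da(D, 2)` on a matrix of squared Euclidean distances returns an n x 2 array of assignment probabilities; taking `argmax` along each row yields hard labels once the temperature is low. For squared Euclidean dissimilarities this cost reduces to the squared distance to the soft cluster centroid, so the sketch behaves like annealed soft k-means in that special case.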

Index Terms:
pairwise data clustering, deterministic annealing, maximum entropy method, multidimensional scaling, texture segmentation, exploratory data analysis, nonlinear dimensionality reduction.
Citation:
Thomas Hofmann, Joachim M. Buhmann, "Pairwise Data Clustering by Deterministic Annealing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 1, pp. 1-14, Jan. 1997, doi:10.1109/34.566806