Transformation-Invariant Clustering Using the EM Algorithm
January 2003 (vol. 25 no. 1)
pp. 1-17

Abstract: Clustering is a simple, effective way to derive useful representations of data, such as images and videos. Clustering explains the input as one of several prototypes, plus noise. In situations where each input has been randomly transformed (e.g., by translation, rotation, and shearing in images and videos), clustering techniques tend to extract cluster centers that account for variations in the input due to transformations, instead of more interesting and potentially useful structure. For example, if images from a video sequence of a person walking across a cluttered background are clustered, it would be more useful for the different clusters to represent different poses and expressions, instead of different positions of the person and different configurations of the background clutter. We describe a way to add transformation invariance to mixture models by approximating the nonlinear transformation manifold with a discrete set of points. We show how the expectation maximization algorithm can be used to learn clusters while simultaneously inferring the transformation associated with each input. We compare this technique with other methods on three tasks: filtering noisy images obtained from a scanning electron microscope, clustering images from videos of faces into different categories of identification and pose, and removing foreground obstructions from video. We also demonstrate that the new technique is quite insensitive to initial conditions and works better than standard techniques, even when the standard techniques are provided with extra data.
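
The idea in the abstract, a mixture model whose inputs are each passed through one of a discrete set of transformations, with EM inferring both the cluster and the transformation, can be illustrated with a small sketch. This is not the authors' implementation. It assumes 1-D signals, uses cyclic shifts as the discrete transformation set, places a uniform prior over shifts, and shares a single isotropic noise variance across clusters; the names `tic_em` and `shifted_sq_dists` are hypothetical.

```python
import numpy as np

def shifted_sq_dists(X, mu):
    """Squared distance from every input to every shifted prototype.

    Returns an array of shape (N, C, L): input n, cluster c, shift l.
    """
    D = X.shape[1]
    # shifted[c, l] is prototype c cyclically shifted by l positions.
    shifted = np.stack([np.roll(mu, l, axis=1) for l in range(D)], axis=1)
    diff = X[:, None, None, :] - shifted[None]          # (N, C, L, D)
    return (diff ** 2).sum(axis=-1)

def tic_em(X, n_clusters, n_iters=30, seed=0):
    """EM for a Gaussian mixture with a latent cyclic shift per input.

    Assumed model: x = roll(mu_c, l) + isotropic Gaussian noise,
    with the shift l drawn uniformly from {0, ..., D-1}.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialise prototypes from randomly chosen inputs.
    mu = X[rng.choice(N, n_clusters, replace=False)].copy()
    pi = np.full(n_clusters, 1.0 / n_clusters)
    var = max(X.var(), 1e-6)
    # unshifted[n, l] undoes shift l on input n.
    unshifted = np.stack([np.roll(X, -l, axis=1) for l in range(D)], axis=1)

    for _ in range(n_iters):
        # E-step: joint posterior over (cluster, shift) for each input.
        sq = shifted_sq_dists(X, mu)
        log_r = np.log(pi)[None, :, None] - 0.5 * sq / var
        log_r -= log_r.max(axis=(1, 2), keepdims=True)   # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=(1, 2), keepdims=True)

        # M-step: average the inputs after undoing each candidate shift,
        # weighted by the posterior of that (cluster, shift) pair.
        weight = r.sum(axis=(0, 2))                      # (C,)
        mu = np.einsum('ncl,nld->cd', r, unshifted) / (weight[:, None] + 1e-12)
        pi = np.maximum(weight / N, 1e-12)
        # Noise variance from residuals under the updated prototypes,
        # floored to keep the E-step well defined on noiseless data.
        var = max((r * shifted_sq_dists(X, mu)).sum() / (N * D), 1e-6)

    return mu, pi, var, r
```

The returned array `r` holds the joint posterior over cluster and shift for each input, so the most probable transformation for input `n` can be read off as the arg-max over `r[n]`. Enumerating every shift costs memory proportional to N × C × D², which is acceptable for a sketch but would not scale to large images.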

Index Terms:
Generative models, transformation, transformation-invariance, clustering, video summary, filtering, EM algorithm, probability model.
Brendan J. Frey, Nebojsa Jojic, "Transformation-Invariant Clustering Using the EM Algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 1, pp. 1-17, Jan. 2003, doi:10.1109/TPAMI.2003.1159942