Issue No. 09 - September (2011 vol. 23)

ISSN: 1041-4347

pp: 1406-1418

DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.259

Jiawei Han, University of Illinois at Urbana-Champaign, Urbana

Xiaofei He, Zhejiang University, Hangzhou

Yuanlong Shao, Zhejiang University, Hangzhou

Deng Cai, Zhejiang University, Hangzhou

Hujun Bao, Zhejiang University, Hangzhou

ABSTRACT

Gaussian Mixture Models (GMMs) are among the most statistically mature methods for clustering. Each cluster is represented by a Gaussian distribution, so clustering reduces to estimating the parameters of the Gaussian mixture, usually with the Expectation-Maximization (EM) algorithm. In this paper, we consider the case where the probability distribution that generates the data is supported on a submanifold of the ambient space. It is then natural to assume that if two points are close in the intrinsic geometry of the probability distribution, their conditional probability distributions are similar. Building on this assumption, we introduce a regularized probabilistic model for data clustering that exploits the manifold structure, called the Laplacian regularized Gaussian Mixture Model (LapGMM). The data manifold is modeled by a nearest neighbor graph, and the graph structure is incorporated into the maximum likelihood objective function. As a result, the obtained conditional probability distribution varies smoothly along the geodesics of the data manifold. Experimental results on real data sets demonstrate the effectiveness of the proposed approach.
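The abstract names two ingredients: EM for a Gaussian mixture, and a nearest neighbor graph whose structure is folded into the likelihood so that posteriors vary smoothly over the graph. The NumPy sketch below illustrates those ingredients under stated assumptions; it is not the authors' LapGMM algorithm. The penalty `trace(P^T L P)` is a generic graph-smoothness measure on the matrix of posteriors, chosen for illustration because the abstract does not spell out the exact regularizer. The `gamma` step, which blends each point's posterior with its neighbors' average, is a hypothetical heuristic. All function names (`knn_graph`, `graph_laplacian`, `smoothness`, `em_gmm`) and parameters (`p`, `gamma`, `reg`) are assumptions.

```python
import numpy as np

def knn_graph(X, p=5):
    """Symmetric 0/1 adjacency: i~j if either is among the other's p nearest neighbors."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :p]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def smoothness(P, L):
    """sum_k f_k^T L f_k = 1/2 sum_ij W_ij ||P_i - P_j||^2; small when neighbors agree."""
    return float(np.trace(P.T @ L @ P))

def gaussian_logpdf(X, mu, cov):
    """Log density of N(mu, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = (diff * np.linalg.solve(cov, diff.T).T).sum(axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_gmm(X, K, W=None, gamma=0.0, n_iter=50, seed=0, reg=1e-6):
    """EM for a full-covariance GMM; gamma > 0 adds a hypothetical graph-smoothing step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-first initialization of the means (an illustrative choice).
    idx = [int(rng.integers(n))]
    for _ in range(K - 1):
        gap = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(gap.argmax()))
    mu = X[idx].copy()
    # Start every component at the data's per-feature variance.
    cov = np.array([np.diag(X.var(axis=0)) + reg * np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posteriors P(k | x_i), computed in log space for stability.
        logp = np.stack([np.log(pi[k]) + gaussian_logpdf(X, mu[k], cov[k])
                         for k in range(K)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        if W is not None and gamma > 0:
            # Blend each row with its neighbors' mean posterior; isolated nodes keep
            # their own row, so every row still sums to 1.
            deg = W.sum(axis=1, keepdims=True)
            avg = np.divide(W @ R, deg, out=R.copy(), where=deg > 0)
            R = (1 - gamma) * R + gamma * avg
        # M-step: weighted maximum-likelihood updates.
        Nk = R.sum(axis=0)
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (R[:, k:k + 1] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return R, mu
```

With `gamma=0` (or `W=None`) this is ordinary EM for a GMM. The dense pairwise-distance matrix keeps the sketch short but costs O(n^2) memory, so a spatial index would be the usual choice for larger data sets.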

INDEX TERMS

Gaussian mixture model, clustering, graph Laplacian, manifold structure.

CITATION

Jiawei Han, Xiaofei He, Yuanlong Shao, Deng Cai, Hujun Bao, "Laplacian Regularized Gaussian Mixture Model for Data Clustering," *IEEE Transactions on Knowledge and Data Engineering*, vol. 23, no. 9, pp. 1406-1418, September 2011, doi:10.1109/TKDE.2010.259