Issue No.07 - July (2009 vol.31)
pp: 1325-1331
Darío García-García, University Carlos III of Madrid, Madrid
Emilio Parrado Hernández, University Carlos III of Madrid, Madrid
Fernando Díaz-de María, University Carlos III of Madrid, Madrid
ABSTRACT
We review the existing alternatives for defining model-based distances for clustering sequences and propose a new one based on the Kullback-Leibler divergence. This distance is shown to be especially useful in combination with spectral clustering. For improved performance in real-world scenarios, a model selection scheme is also proposed.
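As a rough illustration of the ingredients named in the abstract, the sketch below computes a symmetrized Kullback-Leibler divergence between discrete distributions and turns the pairwise values into a Gaussian affinity matrix of the kind a spectral clustering routine takes as input. It is not the authors' construction, which the abstract does not spell out. The toy distributions, the `eps` smoothing constant, the `sigma` bandwidth, and all function names are assumptions made for this example.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is an assumed smoothing constant that avoids log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetrized divergence KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def affinity_matrix(dists, sigma=1.0):
    """Gaussian affinities exp(-d^2 / (2 sigma^2)) from pairwise symmetric KL."""
    n = len(dists)
    return [[math.exp(-symmetric_kl(dists[i], dists[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)]
            for i in range(n)]

# Hypothetical toy data: each row stands in for one sequence, summarized
# as a distribution over three symbols.
dists = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
A = affinity_matrix(dists)
```

Here the first two rows are close in the KL sense and receive a high affinity, while the third row is far from both. A spectral clustering step would then operate on the eigenvectors of a normalized version of `A`.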
INDEX TERMS
Clustering, sequential data, similarity measures.
CITATION
Darío García-García, Emilio Parrado Hernández, Fernando Díaz-de María, "A New Distance Measure for Model-Based Sequence Clustering", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 7, pp. 1325-1331, July 2009, doi:10.1109/TPAMI.2008.268