Bagging for Path-Based Clustering
November 2003 (vol. 25 no. 11)
pp. 1411-1415

Abstract—A resampling scheme for clustering, similar to bootstrap aggregation (bagging), is presented. Bagging is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise-robust way. The results of an agglomerative optimization method are sensitive to small fluctuations in the input data. To increase the reliability of clustering solutions, a stochastic resampling method is developed to infer consensus clusters. A related reliability measure, based on the stability of an optimized clustering solution under resampling, allows us to estimate the number of clusters. The quality of path-based clustering with resampling is evaluated on a large image data set with human segmentations.
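
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It computes the minimax path distance that path-based clustering builds on (the paper's cost restricts paths to members of the same cluster; this sketch uses all points). It then clusters bootstrap resamples and aggregates pairwise co-membership into a co-association matrix. All function names are hypothetical. `single_linkage` is a simple stand-in for the paper's agglomerative optimizer. The co-association consensus is an evidence-accumulation style scheme, not necessarily the paper's. The `reliability` score is an illustrative assumption, not the stability measure defined in the paper.

```python
import numpy as np


def effective_dissimilarity(D):
    """Minimax path distance: for each pair (i, j), the smallest possible
    largest edge along any path from i to j. Floyd-Warshall with (min, max)
    in place of (min, +)."""
    E = np.asarray(D, dtype=float).copy()
    for k in range(E.shape[0]):
        # Bottleneck of the best path routed through point k.
        E = np.minimum(E, np.maximum(E[:, k:k + 1], E[k:k + 1, :]))
    return E


def single_linkage(E, n_clusters):
    """Kruskal-style single linkage: merge the closest pairs until n_clusters
    components remain. Hypothetical stand-in for the paper's optimizer."""
    n = E.shape[0]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    iu, ju = np.triu_indices(n, k=1)
    components = n
    for idx in np.argsort(E[iu, ju], kind="stable"):
        if components <= n_clusters:
            break
        ra, rb = find(iu[idx]), find(ju[idx])
        if ra != rb:
            parent[ra] = rb
            components -= 1
    # Relabel component roots as consecutive integers starting at 0.
    _, labels = np.unique([find(a) for a in range(n)], return_inverse=True)
    return labels


def bagged_consensus(D, n_clusters, n_resamples=50, seed=0):
    """Bootstrap the points, cluster each resample, and count how often each
    pair lands in the same cluster (co-association matrix C)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    together = np.zeros((n, n))
    present = np.zeros((n, n))
    for _ in range(n_resamples):
        # Bootstrap draw with replacement; duplicate points are dropped.
        idx = np.unique(rng.integers(0, n, size=n))
        sub = effective_dissimilarity(D[np.ix_(idx, idx)])
        sub_labels = single_linkage(sub, n_clusters)
        same = (sub_labels[:, None] == sub_labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        present[np.ix_(idx, idx)] += 1.0
    # Fraction of joint draws in which i and j shared a cluster (0 if never co-sampled).
    C = np.divide(together, present, out=np.zeros((n, n)), where=present > 0)
    consensus = single_linkage(1.0 - C, n_clusters)
    # Hypothetical reliability score: 1.0 when every resample agrees on every pair.
    mask = np.triu(present > 0, k=1)
    reliability = float(np.mean(np.abs(2.0 * C[mask] - 1.0)))
    return consensus, C, reliability


# Usage: two well-separated groups on a line.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4])
D = np.abs(x[:, None] - x[None, :])
labels, C, reliability = bagged_consensus(D, n_clusters=2)
print(labels, round(reliability, 3))
```

Dividing by `present` rather than by `n_resamples` avoids penalizing pairs that a bootstrap draw happened to omit. Single linkage chains through noise points, so this stand-in does not inherit the noise robustness the abstract claims for path-based clustering. It only shows where bagging plugs in.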

Index Terms:
Clustering, resampling, color segmentation.
Bernd Fischer, Joachim M. Buhmann, "Bagging for Path-Based Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 11, pp. 1411-1415, Nov. 2003, doi:10.1109/TPAMI.2003.1240115