Issue No.11 - November (2009 vol.31)
pp: 2083-2087
Edward K.F. Dang, The Hong Kong Polytechnic University, Hong Kong
D.L. Lee, Hong Kong University of Science and Technology, Hong Kong
K.S. Ho, The Hong Kong Polytechnic University, Hong Kong
Stephen C.F. Chan, The Hong Kong Polytechnic University, Hong Kong
Given a set of clusters, we consider an optimization problem that seeks a subset of clusters maximizing the microaverage F-measure. This optimal value can be used as an evaluation measure of the goodness of clustering. For arbitrarily overlapping clusters, finding the optimal value is NP-hard. We claim that a greedy approximation algorithm yields the globally optimal solution for clusters that overlap only by nesting. We present a mathematical proof of this claim by induction. For a family of n clusters containing a total of N objects, this algorithm has O(n^2) time complexity and O(N) space complexity.
Clustering, classification, performance evaluation, optimization.
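The optimization problem stated in the abstract can be illustrated on a tiny nested family. The sketch below is not the paper's greedy algorithm, which the abstract does not spell out; it is an exhaustive search, usable only for very small n. It assumes, for illustration, that objects carry binary relevance labels and that the F-measure of a selection is computed from the precision and recall of the union of the selected clusters. The names is_nested, f_measure, and best_subset are hypothetical.

```python
from itertools import combinations


def is_nested(clusters):
    """True if every pair of clusters is disjoint or one contains the other."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(clusters, 2))


def f_measure(selected, relevant):
    """F-measure of the union of the selected clusters (assumed definition).

    With precision P = hits/|union| and recall R = hits/|relevant|,
    2PR/(P+R) simplifies to 2*hits / (|union| + |relevant|).
    """
    union = set().union(*selected)
    hits = len(union & relevant)
    if hits == 0:
        return 0.0
    return 2 * hits / (len(union) + len(relevant))


def best_subset(clusters, relevant):
    """Exhaustive search over all non-empty subsets; exponential in n."""
    best, best_f = (), 0.0
    for k in range(1, len(clusters) + 1):
        for combo in combinations(clusters, k):
            f = f_measure(combo, relevant)
            if f > best_f:
                best, best_f = combo, f
    return best, best_f


# Toy data (made up for illustration): the third cluster contains the first.
clusters = [frozenset({1, 2}), frozenset({3, 4, 5}), frozenset({1, 2, 6, 7, 8})]
relevant = {1, 2, 3, 4}

nested = is_nested(clusters)
best, best_f = best_subset(clusters, relevant)
print(nested, [sorted(c) for c in best], round(best_f, 3))
```

On this toy family the best selection combines the two small clusters and leaves out the large enclosing one, since adding it brings in only non-relevant objects. The exhaustive search serves as a reference optimum against which a faster method could be checked on small inputs.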
Edward K.F. Dang, D.L. Lee, K.S. Ho, Stephen C.F. Chan, "Optimal Combination of Nested Clusters by a Greedy Approximation Algorithm", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.31, no. 11, pp. 2083-2087, November 2009, doi:10.1109/TPAMI.2009.75