Issue No. 11, November 2009 (vol. 31)
pp: 2083-2087
Edward K.F. Dang , The Hong Kong Polytechnic University, Hong Kong
Robert W.P. Luk , The Hong Kong Polytechnic University, Hong Kong
D.L. Lee , Hong Kong University of Science and Technology, Hong Kong
K.S. Ho , The Hong Kong Polytechnic University, Hong Kong
Stephen C.F. Chan , The Hong Kong Polytechnic University, Hong Kong
ABSTRACT
Given a set of clusters, we consider an optimization problem that seeks a subset of clusters maximizing the microaverage F-measure. The optimal value can serve as an evaluation measure of clustering quality. For arbitrarily overlapping clusters, finding the optimal value is NP-hard. We claim that a greedy approximation algorithm yields the globally optimal solution for clusters that overlap only by nesting, and we prove this claim by induction. For a family of n clusters containing a total of N objects, the algorithm has O(n^2) time complexity and O(N) space complexity.
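As an illustration only, the sketch below assumes that the F-measure is computed against a target set T of relevant objects (an assumption; the abstract does not define T). For the union U of the selected clusters, P = |U ∩ T|/|U| and R = |U ∩ T|/|T|, so F = 2PR/(P + R) = 2|U ∩ T|/(|U| + |T|). The greedy rule used here (repeatedly add the cluster giving the largest strict increase in F, and stop when none helps) is a hypothetical stand-in, not necessarily the procedure analyzed in the paper. Its optimality on nested families is not claimed, and this naive version does not attain the stated O(n^2) bound. An exhaustive solver is included so the two can be cross-checked on small inputs. All function names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Cluster = FrozenSet[int]


def f_measure(selected: Sequence[Cluster], target: Cluster) -> float:
    """Microaverage F-measure of the union of `selected` against `target` (assumed T)."""
    union = set().union(*selected)
    denom = len(union) + len(target)
    # 2PR/(P+R) simplifies to 2|U & T| / (|U| + |T|).
    return 2 * len(union & target) / denom if denom else 0.0


def is_nested_family(clusters: Sequence[Cluster]) -> bool:
    """True if every pair of clusters is disjoint or one contains the other."""
    return all(
        not (a & b) or a <= b or b <= a
        for a, b in combinations(clusters, 2)
    )


def greedy_select(clusters: Sequence[Cluster], target: Cluster) -> Tuple[List[int], float]:
    """Hypothetical greedy rule: add the cluster that most increases F until none helps."""
    chosen: List[int] = []
    best = 0.0
    remaining = list(range(len(clusters)))
    while remaining:
        # Score every one-cluster extension of the current selection.
        score, idx = max(
            (f_measure([clusters[j] for j in chosen] + [clusters[i]], target), i)
            for i in remaining
        )
        if score <= best:  # no strict improvement, so stop
            break
        best = score
        chosen.append(idx)
        remaining.remove(idx)
    return sorted(chosen), best


def brute_force_select(clusters: Sequence[Cluster], target: Cluster) -> Tuple[List[int], float]:
    """Exact optimum by exhaustive search over all non-empty subsets (exponential time)."""
    best_subset: Tuple[int, ...] = ()
    best = 0.0
    for k in range(1, len(clusters) + 1):
        for subset in combinations(range(len(clusters)), k):
            score = f_measure([clusters[i] for i in subset], target)
            if score > best:
                best_subset, best = subset, score
    return list(best_subset), best


if __name__ == "__main__":
    # A small nested family: cluster 0 lies inside cluster 1, cluster 2 inside cluster 3.
    clusters = [frozenset({0, 1}), frozenset({0, 1, 2, 4}), frozenset({3}),
                frozenset({3, 6, 7}), frozenset({8, 9})]
    target = frozenset({0, 1, 2, 3})
    assert is_nested_family(clusters)
    print(greedy_select(clusters, target))       # ([1, 2], 0.8888888888888888)
    print(brute_force_select(clusters, target))  # ([1, 2], 0.8888888888888888)
```

On this toy input both solvers select clusters 1 and 2, whose union covers all four target objects plus one extra object, giving F = 8/9. Agreement on one example is not a proof; it only shows how the exhaustive solver can be used to spot-check a greedy rule on small nested families.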
INDEX TERMS
Clustering, classification, performance evaluation, optimization.
CITATION
Edward K.F. Dang, Robert W.P. Luk, D.L. Lee, K.S. Ho, Stephen C.F. Chan, "Optimal Combination of Nested Clusters by a Greedy Approximation Algorithm", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 11, pp. 2083-2087, November 2009, doi:10.1109/TPAMI.2009.75