Issue No. 05, May 2013 (vol. 25)

pp. 1070-1082

Vikas K. Garg, Toyota Technological Institute at Chicago, Chicago

Y. Narahari, Indian Institute of Science (IISc), Bangalore

M. Narasimha Murty, Indian Institute of Science (IISc), Bangalore

DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.73

ABSTRACT

We propose a new approach to clustering. Our idea is to map cluster formation to coalition formation in cooperative games and to use the Shapley values of the patterns to identify clusters and cluster representatives. We show that the underlying game is convex, and this leads to an efficient biobjective clustering algorithm that we call BiGC. The algorithm yields high-quality clusterings with respect to both the average point-to-center distance (potential) and the average intracluster point-to-point distance (scatter). Through detailed experiments on several benchmark data sets, using standard cluster validity criteria, we demonstrate the superiority of BiGC over state-of-the-art clustering algorithms, including center-based and multiobjective techniques. We also show that BiGC satisfies key clustering properties such as order independence, scale invariance, and richness.
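The abstract does not spell out the game, so the following is only a minimal sketch under stated assumptions. It assumes a pairwise-similarity characteristic function, v(C) = sum over unordered pairs {i, j} in C of f(d(x_i, x_j)), with a hypothetical similarity f(d) = exp(-d / scale). Because every pairwise weight is nonnegative, such a game is convex, and its Shapley value has the closed form phi_i = 1/2 * sum over j != i of f(d(x_i, x_j)); the brute-force enumeration below checks that. The names `similarity`, `coalition_value`, `shapley_clusters`, `potential`, and `scatter` are illustrative, and `shapley_clusters` is a simplified greedy illustration, not the published BiGC procedure. `potential` and `scatter` follow the abstract's verbal definitions, assuming each cluster's representative serves as its center.

```python
from itertools import combinations
from math import dist, exp, factorial

def similarity(p, q, scale=1.0):
    # Assumed similarity f(d) = exp(-d / scale); any nonnegative f keeps the game convex.
    return exp(-dist(p, q) / scale)

def coalition_value(coalition, points, scale=1.0):
    # Assumed v(C): sum of similarities over unordered pairs inside coalition C.
    return sum(similarity(points[i], points[j], scale)
               for i, j in combinations(sorted(coalition), 2))

def shapley_closed_form(points, scale=1.0):
    # Each pair's similarity is split equally between its two members.
    n = len(points)
    return [0.5 * sum(similarity(points[i], points[j], scale) for j in range(n) if j != i)
            for i in range(n)]

def shapley_brute_force(points, scale=1.0):
    # Exact Shapley value by enumerating all coalitions (exponential, small n only).
    n = len(points)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = set(subset)
                phi[i] += weight * (coalition_value(s | {i}, points, scale)
                                    - coalition_value(s, points, scale))
    return phi

def shapley_clusters(points, threshold, scale=1.0):
    # Hypothetical greedy scheme: the unassigned point with the highest Shapley value
    # becomes a representative and absorbs unassigned points within `threshold`.
    phi = shapley_closed_form(points, scale)
    unassigned = set(range(len(points)))
    clusters = []
    while unassigned:
        rep = max(unassigned, key=lambda i: (phi[i], -i))  # ties go to the lower index
        members = {j for j in unassigned if dist(points[rep], points[j]) <= threshold}
        clusters.append((rep, sorted(members)))
        unassigned -= members
    return clusters

def potential(points, clusters):
    # Average distance from each point to its cluster's representative.
    total = sum(dist(points[j], points[rep]) for rep, members in clusters for j in members)
    return total / len(points)

def scatter(points, clusters):
    # Average distance over all unordered point pairs that share a cluster.
    pairs = [(i, j) for _, members in clusters for i, j in combinations(members, 2)]
    if not pairs:
        return 0.0
    return sum(dist(points[i], points[j]) for i, j in pairs) / len(pairs)
```

On two well-separated groups, for example `shapley_clusters([(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9)], threshold=1.0)`, the densest point of each group is chosen as its representative, and `potential` and `scatter` then score the resulting partition on the two objectives. The closed form makes the Shapley values quadratic rather than exponential to compute, which is the kind of saving a convex, pairwise game permits.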

INDEX TERMS

Games, Clustering algorithms, Resource management, Data models, Analytical models, Heuristic algorithms, Game theory, $k$-means, Cooperative game theory, Shapley value, clustering, multiobjective optimization

CITATION

Vikas K. Garg, Y. Narahari, and M. Narasimha Murty, "Novel Biobjective Clustering (BiGC) Based on Cooperative Game Theory," *IEEE Transactions on Knowledge & Data Engineering*, vol. 25, no. 5, pp. 1070-1082, May 2013, doi:10.1109/TKDE.2012.73