Issue No.07 - July (2009 vol.21)
pp: 1014-1026
Min Chen , Tongji University, Shanghai
Pawan Lingras , Saint Mary's University, Halifax
The quality of clustering is an important issue in the application of clustering techniques. Most traditional cluster validity indices are geometry-based measures of cluster quality. This paper proposes a cluster validity index based on the decision-theoretic rough set model, which takes various loss functions into account. Experiments with synthetic, standard, and real-world retail data show the usefulness of the proposed validity index for evaluating both rough and crisp clustering. The measure is shown to help determine the optimal number of clusters, as well as the threshold, an important parameter in rough clustering. Experiments with a promotional campaign for the retail data illustrate the ability of the proposed measure to incorporate financial considerations when evaluating the quality of a clustering scheme. This ability to deal with monetary values distinguishes the proposed decision-theoretic measure from distance-based measures. The proposed validity index can also be extended to evaluate other clustering algorithms, such as fuzzy clustering.
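The abstract does not give the formula for the index, so the following is only a minimal sketch of the general idea of scoring a clustering by expected loss rather than by geometry alone. It is not the authors' index. Everything in it is an assumption made for illustration: the function names, the inverse-distance model for the probability that an object belongs to a cluster, and the zero-one loss table. Under these assumptions, the risk of assigning object x to cluster i is the sum over clusters j of loss[i][j] times P(c_j | x), and a lower total risk indicates a better clustering.

```python
import math

def membership_probabilities(point, centroids):
    """Assumed model: P(c_j | x) is proportional to 1 / distance(x, c_j)."""
    dists = [math.dist(point, c) for c in centroids]
    if any(d == 0.0 for d in dists):
        # The point coincides with one or more centroids: share mass among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    inv = [1.0 / d for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def assignment_risk(probs, loss, i):
    """Expected loss of assigning to cluster i: sum_j loss[i][j] * P(c_j | x)."""
    return sum(loss[i][j] * p for j, p in enumerate(probs))

def clustering_risk(points, labels, centroids, loss):
    """Total expected loss of a crisp clustering (lower is better)."""
    return sum(
        assignment_risk(membership_probabilities(x, centroids), loss, k)
        for x, k in zip(points, labels)
    )

# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centroids = [(0.0, 0.5), (10.0, 0.5)]

# Zero-one loss: loss[i][j] is the cost of choosing cluster i when j is correct.
# A monetary loss table (for example, the cost of a wasted promotion) could be
# substituted here without changing the rest of the code.
zero_one = [[0.0, 1.0],
            [1.0, 0.0]]

good = clustering_risk(points, [0, 0, 1, 1], centroids, zero_one)
bad = clustering_risk(points, [0, 1, 0, 1], centroids, zero_one)
print(good < bad)
```

Because the loss table is an explicit input, replacing its entries with monetary costs changes the ranking of clustering schemes without touching the distance computation. That is the kind of flexibility the abstract attributes to a decision-theoretic measure.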
Cluster validity, decision theory, loss functions, rough-set-based clustering, k-means clustering.
Min Chen, Pawan Lingras, "Rough Cluster Quality Index Based on Decision Theory", IEEE Transactions on Knowledge & Data Engineering, vol. 21, no. 7, pp. 1014-1026, July 2009, doi:10.1109/TKDE.2008.236