Issue No.09 - September (2010 vol.22)
pp: 1219-1233
Ben Kao, The University of Hong Kong, Hong Kong
Sau Dan Lee, The University of Hong Kong, Hong Kong
Foris K.F. Lee, The University of Hong Kong, Hong Kong
David Wai-lok Cheung, The University of Hong Kong, Hong Kong
Wai-Shing Ho, The University of Hong Kong, Hong Kong
ABSTRACT
We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdfs). We show that the UK-means algorithm, which generalizes the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (EDs) between objects and cluster representatives. For arbitrary pdfs, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculations. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previously known in the literature. We then introduce an R-tree index to organize the uncertain objects so as to reduce pruning overheads. We conduct experiments to evaluate the effectiveness of our novel techniques. We show that our techniques are additive and, when used in combination, significantly outperform previously known methods.
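To make the pruning idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each uncertain object is given as a finite set of samples drawn from its pdf, so the expected distance becomes an average rather than a numerical integration. It also assumes a simple perpendicular-bisector test on the object's bounding box as a stand-in for Voronoi-based pruning: if every corner of the box is closer to one cluster representative than to another, the second representative cannot have the smaller expected distance and its ED computation can be skipped. All function names (`expected_distance`, `bounding_box`, `box_closer_to`, `assign`) are hypothetical.

```python
import math
import random
from itertools import product


def expected_distance(samples, center):
    """Approximate the expected distance as the mean over pdf samples (assumption)."""
    return sum(math.dist(s, center) for s in samples) / len(samples)


def bounding_box(samples):
    """Axis-aligned box (lo, hi) that encloses all samples of one object."""
    dims = range(len(samples[0]))
    lo = tuple(min(s[d] for s in samples) for d in dims)
    hi = tuple(max(s[d] for s in samples) for d in dims)
    return lo, hi


def box_closer_to(box, p, q):
    """True if every point of the box is strictly closer to p than to q.

    The set of points closer to p than to q is a half-space, which is convex,
    so checking the corners of the box is sufficient.
    """
    lo, hi = box
    corners = product(*zip(lo, hi))
    return all(math.dist(c, p) < math.dist(c, q) for c in corners)


def assign(samples, centers):
    """Return (index of the nearest center by expected distance, number of EDs computed)."""
    box = bounding_box(samples)
    alive = set(range(len(centers)))
    for p in range(len(centers)):
        if p not in alive:
            continue
        for q in list(alive):
            # If the whole box lies on p's side of the bisector, q is dominated.
            if q != p and box_closer_to(box, centers[p], centers[q]):
                alive.discard(q)
    # Expected distances are computed only for the surviving candidates.
    eds = {j: expected_distance(samples, centers[j]) for j in alive}
    best = min(eds, key=eds.get)
    return best, len(eds)


random.seed(0)
obj = [(random.random(), random.random()) for _ in range(200)]
centers = [(0.5, 0.5), (10.0, 10.0), (-10.0, 5.0)]
best, n_eds = assign(obj, centers)
print(best, n_eds)
```

In this toy setup the two distant representatives are eliminated by the geometric test alone, so only one expected distance is evaluated. When representatives are close together and the box straddles their bisector, nothing is pruned and the sketch falls back to computing every expected distance.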
INDEX TERMS
Uncertainty, clustering, object hierarchies, indexing methods.
CITATION
Ben Kao, Sau Dan Lee, Foris K.F. Lee, David Wai-lok Cheung, Wai-Shing Ho, "Clustering Uncertain Data Using Voronoi Diagrams and R-Tree Index", IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 9, pp. 1219-1233, September 2010, doi:10.1109/TKDE.2010.82