Issue No.09 - September (2008 vol.20)
pp: 1205-1216
Zhenjie Zhang, NUS, Singapore
Yin Yang, Hong Kong University of Science and Technology, Hong Kong
Anthony K.H. Tung, National University of Singapore (NUS), Singapore
Dimitris Papadias, Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
Given a dataset P, a k-means query returns k points in space (called centers), such that the average squared distance between each point in P and its nearest center is minimized. Since this problem is NP-hard, several approximate algorithms have been proposed and used in practice. In this paper, we study continuous k-means computation at a server that monitors a set of moving objects. Re-evaluating k-means every time there is an object update imposes a heavy burden on the server (for computing the centers from scratch) and the clients (for continuously sending location updates). We overcome these problems with a novel approach that significantly reduces the computation and communication costs, while guaranteeing that the quality of the solution, with respect to the re-evaluation approach, is bounded by a user-defined tolerance. The proposed method assigns each moving object a threshold (i.e., range) such that the object sends a location update only when it crosses the range boundary. First, we develop an efficient technique for maintaining the k-means. Then, we present mathematical formulae and algorithms for deriving the individual thresholds. Finally, we justify our performance claims with extensive experiments.
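The two ideas in the abstract, the k-means objective and threshold-based reporting, can be illustrated with a minimal sketch. It is not the paper's algorithm: the circular range, the `MovingObject` class, and the threshold values are assumptions made for illustration only, and the paper's formulae for deriving the individual thresholds are not reproduced here.

```python
import math


def kmeans_cost(points, centers):
    """Average squared distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest center.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total / len(points)


class MovingObject:
    """Hypothetical client that reports its location only after leaving its range."""

    def __init__(self, location, threshold):
        self.reported = location    # last location sent to the server
        self.threshold = threshold  # assumed radius of a circular range

    def move(self, new_location):
        """Return the new location if an update must be sent, else None."""
        if math.dist(self.reported, new_location) > self.threshold:
            self.reported = new_location
            return new_location
        return None


if __name__ == "__main__":
    points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
    centers = [(1.0, 0.0), (11.0, 0.0)]
    print(kmeans_cost(points, centers))

    obj = MovingObject((0.0, 0.0), threshold=1.0)
    print(obj.move((0.5, 0.0)))  # still inside the range, so no update
    print(obj.move((2.0, 0.0)))  # crossed the boundary, so an update is sent
```

In this sketch the server would recompute or adjust the centers only when `move` returns a location, which is what reduces communication compared with sending every position change.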
INDEX TERMS
Data mining, Spatial databases and GIS
CITATION
Zhenjie Zhang, Yin Yang, Anthony K.H. Tung, Dimitris Papadias, "Continuous k-Means Monitoring over Moving Objects", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 9, pp. 1205-1216, September 2008, doi:10.1109/TKDE.2008.54