Continuous K-Means Monitoring with Low Reporting Cost in Sensor Networks
December 2009 (vol. 21 no. 12)
pp. 1679-1691
Ming Hua, Simon Fraser University, Burnaby
Man Ki Lau, MacDonald, Dettwiler and Associates Ltd., Richmond
Jian Pei, Simon Fraser University, Burnaby
Kui Wu, University of Victoria, Victoria
In this paper, we study an interesting problem: continuously monitoring the k-means clustering of sensor readings in a large sensor network. Given a set of sensors whose readings evolve over time, we want to maintain the k-means of the readings continuously. The optimization goal is to reduce the reporting cost in the network, that is, to have as few sensors as possible report their current readings to the data center in the course of maintenance. To tackle the problem, we propose the reading reporting tree, a hierarchical data collection and analysis framework. Moreover, we develop several methods that use reading reporting trees to keep the reporting cost of continuous k-means monitoring low. First, a uniform sampling method using a reading reporting tree can achieve a good-quality approximation of the k-means. Second, we propose a reporting threshold method that can guarantee the approximation quality. Finally, we explore a lazy approach that can substantially reduce the intermediate computation. We conduct a systematic simulation evaluation using synthetic data sets to examine the characteristics of the proposed methods.
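To make the idea of threshold-based reporting concrete, here is a minimal, hypothetical sketch. It is not the authors' reading reporting tree algorithm: it has no tree and no in-network aggregation, and it assumes 1-D readings. Each sensor reports only when its reading has drifted more than a threshold `delta` from the value it last reported, and the data center runs plain Lloyd's k-means on its cached values. The names `ThresholdMonitor`, `lloyd_kmeans`, and `delta` are illustrative assumptions. Because every cached value stays within `delta` of the true reading, for any fixed set of centers each sensor's distance to its nearest center changes by at most `delta` (triangle inequality).

```python
import random


def lloyd_kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on 1-D readings; returns the centers in sorted order."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct input positions
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each reading to its nearest current center.
            j = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[j].append(p)
        # Recompute each center as its cluster mean; keep the old center if a cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


class ThresholdMonitor:
    """Hypothetical data-center cache fed by threshold-based sensor reports."""

    def __init__(self, initial, delta):
        self.delta = delta
        self.cache = list(initial)   # last reported reading of each sensor
        self.reports = len(initial)  # the initial full collection costs one report per sensor

    def step(self, readings):
        """Apply one time step; only sensors that drifted more than delta report."""
        for i, r in enumerate(readings):
            if abs(r - self.cache[i]) > self.delta:
                self.cache[i] = r
                self.reports += 1
        return self.cache
```

For example, with six sensors and `delta=0.5`, a step in which every reading moves by at most 0.2 triggers no reports, while a step in which one sensor jumps by 1.0 triggers exactly one. Larger values of `delta` save more reports at the cost of looser error bounds on the maintained clustering.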

Index Terms:
Sensor networks, clustering, k-means, low reporting cost.
Ming Hua, Man Ki Lau, Jian Pei, Kui Wu, "Continuous K-Means Monitoring with Low Reporting Cost in Sensor Networks," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 12, pp. 1679-1691, Dec. 2009, doi:10.1109/TKDE.2009.41