Issue No. 08 - Aug. 2012 (vol. 24)

pp: 1520-1535

Izchak Sharfman, Technion, Haifa

Assaf Schuster, Technion, Haifa

Daniel Keren, Haifa University, Haifa

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.102

ABSTRACT

An important problem in distributed, dynamic databases is to continuously monitor the value of a function defined on the nodes and to check that it satisfies some threshold constraint. We introduce a monitoring method, based on a geometric interpretation of the problem, which makes it possible to define local constraints at the nodes. It is guaranteed that as long as none of these constraints is violated, the value of the function has not crossed the threshold. We generalize previous work on geometric monitoring and solve two problems that seriously hampered its performance. First, whereas the constraints used so far depend only on the current values of the local data, here we also incorporate their temporal behavior. Second, the new constraints are tailored to the geometric properties of the specific monitored function. In addition, we extend the concept of safe zones for the monitoring problem, and show that previous work on geometric monitoring is a special case of the proposed extension. Experimental results on real data reveal that the new approach reduces communication by up to three orders of magnitude in comparison to existing approaches, and considerably narrows the gap between achievable results and a newly defined lower bound on communication complexity.
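The idea of local constraints can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the monitored function is evaluated at the average of the nodes' local vectors, and that a convex safe zone contained in the admissible region is already known. Each node tests only its own drift vector; because the average of the drift vectors equals the current global average, and a convex set contains the average of its points, no local violation implies the threshold has not been crossed. All names here (`drift_vector`, `local_violations`, `in_ball`) are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise average of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def drift_vector(estimate: Vector, current: Vector, last_reported: Vector) -> Vector:
    """Estimate shifted by the local change since the node last reported."""
    return [e + (c - r) for e, c, r in zip(estimate, current, last_reported)]


def in_ball(center: Vector, radius: float) -> Callable[[Vector], bool]:
    """Membership test for a closed ball, used here as an assumed convex safe zone."""
    def contains(x: Vector) -> bool:
        return sum((a - b) ** 2 for a, b in zip(x, center)) <= radius ** 2
    return contains


def local_violations(
    estimate: Vector,
    current: Sequence[Vector],
    last_reported: Sequence[Vector],
    safe_zone: Callable[[Vector], bool],
) -> List[int]:
    """Indices of nodes whose drift vector has left the safe zone."""
    return [
        i
        for i, (c, r) in enumerate(zip(current, last_reported))
        if not safe_zone(drift_vector(estimate, c, r))
    ]


# Assumed example: f(x) = x0^2 + x1^2 with threshold 4, so the admissible
# region is the disc of radius 2 around the origin, which is itself convex.
last_reported = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
estimate = mean(last_reported)
safe_zone = in_ball([0.0, 0.0], 2.0)

current = [[0.5, 0.5], [1.5, 0.0], [-1.0, 1.0]]
print(local_violations(estimate, current, last_reported, safe_zone))
```

A local violation is only a signal to communicate, not proof that the threshold was crossed: a node can leave the safe zone while the global average stays inside the admissible region.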

INDEX TERMS

Data streams, distributed systems, geometric monitoring, shape, data modeling.

CITATION

Izchak Sharfman, Assaf Schuster, Daniel Keren, "Shape Sensitive Geometric Monitoring",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 24, no. 8, pp. 1520-1535, Aug. 2012, doi:10.1109/TKDE.2011.102