Filtering Data Streams for Entity-Based Continuous Queries
February 2010 (vol. 22 no. 2)
pp. 234-248
Reynold Cheng, The University of Hong Kong, Hong Kong
Ben C.M. Kao, The University of Hong Kong, Hong Kong
Alan Kwan, The University of Hong Kong, Hong Kong
Sunil Prabhakar, Purdue University, West Lafayette
Yi-Cheng Tu, University of South Florida, Tampa
The idea of allowing query users to relax their correctness requirements in order to improve the performance of a data stream management system (e.g., for location-based services and sensor networks) has recently been studied. By exploiting the maximum error (or tolerance) allowed in query answers, algorithms for reducing the use of system resources have been developed. In most of these works, however, query tolerance is expressed as a numerical value, which may be difficult to specify. We observe that in many situations, users may not be concerned with the actual value of an answer, but rather with which object satisfies a query (e.g., “who is my nearest neighbor?”). In particular, an entity-based query returns only the names of objects that satisfy the query. For these queries, it is possible to specify a tolerance that is “nonvalue-based.” In this paper, we study fraction-based tolerance, a type of nonvalue-based tolerance, where a user specifies the maximum fractions of a query answer that can be false positives and false negatives. We develop fraction-based tolerance for two major classes of entity-based queries: 1) nonrank-based queries (e.g., range queries) and 2) rank-based queries (e.g., k-nearest-neighbor queries). These definitions provide users with an alternative way to specify the maximum tolerance allowed in their answers. We further investigate how these definitions can be exploited in a distributed stream environment. We design adaptive filter algorithms that allow updates to be dropped conditionally at the data stream sources without affecting the overall query correctness. Extensive experimental results show that our protocols significantly reduce the use of network and energy resources.
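
To make the notion of fraction-based tolerance concrete, the following minimal Python sketch checks an entity-based answer against user-specified limits. It assumes one plausible reading of the description above: the false-positive fraction is the share of the returned answer that does not belong to the true answer, and the false-negative fraction is the share of the true answer that is missing from the returned one. The function names, parameter names, and sample object names are hypothetical, and the paper's formal definitions may differ.

```python
def tolerance_fractions(returned, true_answer):
    """Return (false_positive_fraction, false_negative_fraction).

    Assumed definitions (not taken verbatim from the paper):
      false-positive fraction = |returned - true| / |returned|
      false-negative fraction = |true - returned| / |true|
    """
    returned, true_answer = set(returned), set(true_answer)
    false_pos = returned - true_answer   # reported, but do not satisfy the query
    false_neg = true_answer - returned   # satisfy the query, but not reported
    fp_frac = len(false_pos) / len(returned) if returned else 0.0
    fn_frac = len(false_neg) / len(true_answer) if true_answer else 0.0
    return fp_frac, fn_frac


def within_tolerance(returned, true_answer, max_fp, max_fn):
    """True if both fractions stay within the user-specified maxima."""
    fp_frac, fn_frac = tolerance_fractions(returned, true_answer)
    return fp_frac <= max_fp and fn_frac <= max_fn


# Hypothetical range query over four sensors: the system reports s1..s4,
# while the objects that actually satisfy the query are s1, s2, s3, and s5.
reported = {"s1", "s2", "s3", "s4"}
actual = {"s1", "s2", "s3", "s5"}
print(tolerance_fractions(reported, actual))
print(within_tolerance(reported, actual, max_fp=0.3, max_fn=0.3))
```

Under these assumed definitions, a user never has to pick a numerical error bound on the stream values themselves; the tolerance is stated purely in terms of which object names may be wrongly included or omitted.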

Index Terms:
Data streams, continuous queries, adaptive filters, fraction-based tolerance.
Reynold Cheng, Ben C.M. Kao, Alan Kwan, Sunil Prabhakar, Yi-Cheng Tu, "Filtering Data Streams for Entity-Based Continuous Queries," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 2, pp. 234-248, Feb. 2010, doi:10.1109/TKDE.2009.63