Maintaining Sliding Window Skylines on Data Streams
March 2006 (vol. 18 no. 3)
pp. 377-391
The skyline of a multidimensional data set contains the "best" tuples according to any preference function that is monotonic on each dimension. Although skyline computation has received considerable attention in conventional databases, the existing algorithms are inapplicable to stream applications because 1) they assume static data stored on disk (rather than continuously arriving and expiring), 2) they focus on "one-time" execution that returns a single skyline (in contrast to constantly tracking skyline changes), and 3) they aim at reducing I/O overhead (as opposed to minimizing CPU cost and main-memory consumption). This paper studies skyline computation in stream environments, where query processing takes into account only a "sliding window" covering the most recent tuples. We propose algorithms that continuously monitor the incoming data and maintain the skyline incrementally. Our techniques utilize several interesting properties of stream skylines to improve space/time efficiency by expunging data from the system as early as possible (i.e., before their expiration). Furthermore, we analyze the asymptotic performance of the proposed solutions and evaluate their efficiency with extensive experiments.
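
To make the abstract's terms concrete, the following is a minimal Python sketch of a count-based sliding-window skyline. It is not the paper's algorithm: it is a naive illustration under stated assumptions, namely that smaller values are preferred on every dimension and that the window holds the most recent N tuples. The names `dominates` and `SlidingWindowSkyline` are hypothetical. The early-expunging step relies on one simple observation: a tuple dominated by a later arrival can never re-enter the skyline, because its dominator expires after it does. Whether this is among the properties the paper exploits is an assumption here.

```python
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every dimension and strictly
    better on at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


class SlidingWindowSkyline:
    """Naive illustrative structure, not the authors' method."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.clock = 0  # number of tuples seen so far
        # (arrival index, point) pairs, oldest first; holds only tuples
        # that are not dominated by any later arrival.
        self.candidates: Deque[Tuple[int, Point]] = deque()

    def insert(self, p: Point) -> None:
        self.clock += 1
        # Drop tuples that have fallen out of the window.
        while self.candidates and self.candidates[0][0] <= self.clock - self.window_size:
            self.candidates.popleft()
        # Early expunging: older tuples dominated by p can never be
        # skyline members again, since p outlives them.
        self.candidates = deque(
            (t, q) for t, q in self.candidates if not dominates(p, q)
        )
        self.candidates.append((self.clock, p))

    def skyline(self) -> List[Point]:
        """Current skyline: candidates not dominated by another candidate."""
        pts = [q for _, q in self.candidates]
        return [q for q in pts if not any(dominates(r, q) for r in pts)]


if __name__ == "__main__":
    sw = SlidingWindowSkyline(window_size=3)
    for point in [(1, 5), (3, 3), (5, 1), (2, 2), (4, 4), (6, 6), (7, 7)]:
        sw.insert(point)
        print(point, "->", sw.skyline())
```

In this sketch, (4, 4) is retained while (2, 2) is still in the window, because it arrived later and becomes a skyline tuple once (2, 2) expires. By contrast, (3, 3) is discarded as soon as (2, 2) arrives. Each insertion scans the whole candidate list, so this is only a reference point for the definitions and makes no claim about the efficiency the paper analyzes.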

Index Terms:
Skyline, stream, database, algorithm.
Citation:
Yufei Tao, Dimitris Papadias, "Maintaining Sliding Window Skylines on Data Streams," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 377-391, March 2006, doi:10.1109/TKDE.2006.48