Issue No.02 - February (2008 vol.20)
pp: 216-229
ABSTRACT
We propose a novel approach, based on predictive quantization (PQ), for online summarization of multiple time-varying data streams. A synopsis over a sliding window of the most recent entries is computed in one pass and dynamically updated in constant time. The correlation between consecutive data elements is taken into account without the need for preprocessing. We extend PQ to multiple streams and propose structures for real-time summarization and querying of a massive number of streams. Queries on any subsequence of a sliding window over multiple streams are processed in real time. We examine the two components of the proposed approach, prediction and quantization, separately and investigate the space-accuracy trade-off for synopsis generation. Complementing the theoretical optimality of PQ-based approaches, we show that the proposed technique, even for very short prediction windows, significantly outperforms existing techniques for a wide variety of query types on both synthetic and real data sets.
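As a rough illustration of the idea of predictive quantization over a sliding window, the following minimal sketch predicts each new element from the previous reconstructed value, quantizes only the prediction error, and keeps the quantized codes for the most recent entries. It is not the authors' algorithm: the class name `SlidingPQSummary`, the first-order predictor, the uniform quantizer, and the parameters `window`, `step`, and `bits` are all assumptions made for this example.

```python
from collections import deque


class SlidingPQSummary:
    """Hypothetical sliding-window summary using a first-order predictor
    and a uniform quantizer (an illustrative assumption, not the paper's design)."""

    def __init__(self, window, step, bits=4):
        self.window = window              # number of most recent entries kept
        self.step = step                  # quantizer step size
        self.lo = -(1 << (bits - 1))      # smallest representable code
        self.hi = (1 << (bits - 1)) - 1   # largest representable code
        self.codes = deque()              # quantized prediction errors in the window
        self.base = 0.0                   # reconstructed value just before the window
        self.last = 0.0                   # most recent reconstructed value

    def push(self, x):
        """Add one element; every step here takes constant time."""
        # Predict x by the previous reconstruction and quantize the error.
        code = round((x - self.last) / self.step)
        code = max(self.lo, min(self.hi, code))
        # When the window is full, fold the oldest entry into the base value.
        if len(self.codes) == self.window:
            self.base += self.codes.popleft() * self.step
        self.codes.append(code)
        self.last += code * self.step

    def reconstruct(self):
        """Approximate values currently in the window, oldest first."""
        out, value = [], self.base
        for code in self.codes:
            value += code * self.step
            out.append(value)
        return out

    def range_sum(self, start, stop):
        """Approximate sum over positions [start, stop) of the window."""
        return sum(self.reconstruct()[start:stop])
```

In this sketch only a few bits per element are stored, and as long as a prediction error does not exceed the code range, each reconstructed value differs from the true value by at most half a quantizer step. A smaller `step` or a larger `bits` trades space for accuracy.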
INDEX TERMS
Multiple streams, prediction, quantization, summarization, online update
CITATION
Fatih Altiparmak, Ertem Tuncel, Hakan Ferhatosmanoglu, "Incremental Maintenance of Online Summaries Over Multiple Streams", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 2, pp. 216-229, February 2008, doi:10.1109/TKDE.2007.190693