
Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, Martin J. Strauss, "One-Pass Wavelet Decompositions of Data Streams," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 541-554, May/June 2003. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1041-4347. DOI: 10.1109/TKDE.2003.1198389.
Keywords: data streams, wavelets, randomized algorithms, approximate queries.
Abstract: We present techniques for computing small-space representations of massive data streams. These are inspired by traditional wavelet-based approximations that consist of specific linear projections of the underlying data. We present general “sketch”-based methods for capturing various linear projections and use them to provide pointwise and range-sum estimation of data streams. These methods use small amounts of space and per-item time while streaming through the data and provide accurate representations, as our experiments with real data streams show.
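To make the idea of a sketch that captures linear projections concrete, the following is a minimal illustration rather than the authors' algorithm. It assumes a stream of updates of the form "add `value` to position `i`" and keeps one counter per random ±1 projection. The names `LinearSketch`, `sign`, and `estimate_inner_product` are hypothetical, and the hash-seeded sign generator stands in for whatever limited-independence generator a real implementation would use. Averaging over rows is also a simplification; it gives an unbiased estimate, but robust variants usually combine several averages with a median.

```python
import random


def sign(seed, row, i):
    """Pseudorandom +1/-1 for position i in sketch row `row`.

    Illustrative generator (an assumption of this sketch): it is recomputed
    on demand so that no table of signs has to be stored.
    """
    return 1 if random.Random(f"{seed}:{row}:{i}").getrandbits(1) else -1


class LinearSketch:
    """Keeps `rows` counters, each a random +/-1 projection of the stream vector."""

    def __init__(self, rows, seed=0):
        self.rows = rows
        self.seed = seed
        self.counters = [0.0] * rows

    def update(self, i, value):
        """Process the stream item 'add value to position i' in O(rows) time."""
        for j in range(self.rows):
            self.counters[j] += value * sign(self.seed, j, i)

    def estimate_inner_product(self, vec):
        """Estimate <a, vec>, where vec is a sparse vector given as {index: weight}.

        Each row contributes counter * (projection of vec); the expectation of
        that product is the true inner product, so the rows are averaged.
        """
        total = 0.0
        for j in range(self.rows):
            proj = sum(w * sign(self.seed, j, i) for i, w in vec.items())
            total += self.counters[j] * proj
        return total / self.rows


def range_indicator(lo, hi):
    """Vector with weight 1 on positions lo..hi inclusive (a range-sum query)."""
    return {i: 1.0 for i in range(lo, hi + 1)}


if __name__ == "__main__":
    sk = LinearSketch(rows=500, seed=1)
    for pos in range(4):          # stream: add 4 to each of positions 0..3
        sk.update(pos, 4)
    print("estimated range sum over [0, 3]:",
          sk.estimate_inner_product(range_indicator(0, 3)))
    print("estimated point value at 2:",
          sk.estimate_inner_product({2: 1.0}))
```

Because every counter is a linear function of the data, the same sketch can answer any query that can be written as an inner product, including a point query, a range sum, or a single wavelet coefficient (by passing the corresponding wavelet basis vector as `vec`). The estimates are approximate, and their error shrinks as the number of rows grows.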