Proceedings 18th International Conference on Data Engineering (2002)
San Jose, California
Feb. 26, 2002 to Mar. 1, 2002
Liadan O'Callaghan, Stanford University
Adam Meyerson, Stanford University
Rajeev Motwani, Stanford University
Nina Mishra, Hewlett Packard Laboratories
Sudipto Guha, University of Pennsylvania
Streaming data analysis has recently attracted attention in numerous applications including telephone records, web documents and clickstreams. For such analysis, single-pass algorithms that consume a small amount of memory are critical. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.
S. Guha, L. O'Callaghan, R. Motwani, N. Mishra and A. Meyerson, "Streaming-Data Algorithms for High-Quality Clustering," Proceedings 18th International Conference on Data Engineering (ICDE), San Jose, California, 2002, pp. 0685.