41st Annual Symposium on Foundations of Computer Science
Clustering data streams
Redondo Beach, California
November 12-14, 2000
ISBN: 0-7695-0850-2
S. Guha, Dept. of Comput. Sci., Stanford Univ., CA, USA
N. Mishra, Dept. of Comput. Sci., Stanford Univ., CA, USA
R. Motwani, Dept. of Comput. Sci., Stanford Univ., CA, USA
L. O'Callaghan, Dept. of Comput. Sci., Stanford Univ., CA, USA
We study clustering in the data stream model of computation: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as Web click stream analysis and multimedia data analysis. We give single-pass, constant-factor approximation algorithms for the k-median problem in the data stream model. We also show negative results implying that our algorithms cannot be improved in a certain sense.
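The abstract does not spell out the algorithm itself. As an illustration only, the Python sketch below shows one common way to cluster a stream in a single pass with bounded memory. It splits the stream into chunks, reduces each chunk to k weighted centers, and re-clusters the accumulated weighted centers whenever they fill a buffer. The per-chunk subroutine (farthest-first seeding followed by single-swap local search), the helper names `weighted_kmedian` and `stream_kmedian`, and the `chunk_size` parameter are all assumptions made for this example. It is not presented as the authors' algorithm and carries no approximation guarantee.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Weighted = Tuple[Point, float]  # (point, weight)


def weighted_kmedian(items: Sequence[Weighted], k: int, rounds: int = 3) -> List[Weighted]:
    """Hypothetical helper: pick k centers from the items (discrete k-median)
    via farthest-first seeding plus single-swap local search, and return each
    center weighted by the total weight assigned to it."""
    pts = [p for p, _ in items]
    if len(pts) <= k:
        return list(items)

    # Farthest-first seeding: start at the first point, then repeatedly add
    # the point farthest from its nearest chosen center.
    centers = [pts[0]]
    while len(centers) < k:
        centers.append(max(pts, key=lambda p: min(math.dist(p, c) for c in centers)))

    def cost(cs: List[Point]) -> float:
        # Weighted sum of distances to the nearest center.
        return sum(w * min(math.dist(p, c) for c in cs) for p, w in items)

    # Single-swap local search: accept any swap that lowers the cost.
    best = cost(centers)
    for _ in range(rounds):
        improved = False
        for i in range(k):
            for cand in pts:
                if cand in centers:
                    continue
                trial = centers[:i] + [cand] + centers[i + 1:]
                c = cost(trial)
                if c < best:
                    centers, best, improved = trial, c, True
        if not improved:
            break

    # Weight each center by the mass of the items nearest to it.
    weights = [0.0] * k
    for p, w in items:
        j = min(range(k), key=lambda j: math.dist(p, centers[j]))
        weights[j] += w
    return list(zip(centers, weights))


def stream_kmedian(stream: Iterable[Point], k: int, chunk_size: int) -> List[Point]:
    """Hypothetical single-pass driver: summarize fixed-size chunks as k
    weighted centers and collapse the summary when it grows too large."""
    if chunk_size <= k:
        raise ValueError("chunk_size must exceed k so that chunks actually shrink")
    buffer: List[Weighted] = []
    summary: List[Weighted] = []
    for p in stream:
        buffer.append((p, 1.0))
        if len(buffer) == chunk_size:
            summary.extend(weighted_kmedian(buffer, k))
            buffer = []
            if len(summary) >= chunk_size:
                summary = weighted_kmedian(summary, k)
    summary.extend(buffer)  # leftover raw points, each with weight 1
    return [c for c, _ in weighted_kmedian(summary, k)]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)] * 5
    print(sorted(stream_kmedian(data, k=2, chunk_size=6)))
```

In this sketch the working memory stays on the order of `chunk_size + k` weighted points no matter how long the stream is, because the summary is collapsed back to k weighted centers each time it reaches `chunk_size`. The price of that simplification is that every collapse adds approximation error.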
Index Terms:
data analysis; pattern clustering; very large databases; computational complexity; deterministic algorithms; data stream clustering; point sequence; data stream model; massive data sets; Web click stream analysis; multimedia data analysis; constant-factor approximation algorithms; k-median problem
Citation:
S. Guha, N. Mishra, R. Motwani, L. O'Callaghan, "Clustering data streams," 41st Annual Symposium on Foundations of Computer Science (FOCS), p. 359, 2000.