This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Approximating Data with the Count-Min Sketch
January/February 2012 (vol. 29 no. 1)
pp. 64-69
Graham Cormode, AT&T Labs-Research
Muthu Muthukrishnan, Rutgers University
Faced with handling multiple large data sets in modern data-processing settings, researchers have proposed sketch data structures that capture salient properties while occupying little memory and that update or probe quickly. In particular, the Count-Min sketch has proven effective for a variety of applications. It concurrently tracks many item counts with surprisingly strong accuracy.

1. G. Cormode and S. Muthukrishnan, "An Improved Data Stream Summary: The Count-Min Sketch and Its Applications," J. Algorithms, vol. 55, no. 1, 2005, pp. 58–75.
2. T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, MIT Press, 1990.
3. M. Thorup, "Even Strongly Universal Hashing Is Pretty Fast," Proc. 11th Ann. ACM-SIAM Symp. Discrete Algorithms (SODA 00), SIAM, 2000, pp. 496–497.
4. Y.-K. Lai and G.T. Byrd, "High-Throughput Sketch Update on a Low-Power Stream Processor," Proc. ACM/IEEE Symp. Architecture for Networking and Communications Systems (ANCS 06), ACM Press, 2006, pp. 123–132.
5. D. Thomas et al., "On Efficient Query Processing of Stream Counts on the Cell Processor," Proc. 25th IEEE Int'l Conf. Data Eng. (ICDE 09), IEEE CS Press, 2009, pp. 748–759.
6. Y.-K. Lai et al., "Implementing On-line Sketch-Based Change Detection on a NetFPGA Platform," Proc. 1st Asia NetFPGA Developers Workshop, ACM, 2010.
7. A. Gilbert and P. Indyk, "Sparse Recovery Using Sparse Matrices," Proc. IEEE, vol. 98, no. 6, 2010, pp. 937–947.
8. A. Goyal et al., "Sketch Techniques for Scaling Distributional Similarity to the Web," Proc. Workshop Geometrical Models of Natural Language Semantics (GEMS 10), Assoc. Computational Linguistics, 2010, pp. 51–56.
1. G. Cormode and S. Muthukrishnan, "An Improved Data Stream Summary: The Count-Min Sketch and Its Applications," J. Algorithms, vol. 55, no. 1, 2005, pp. 58-75.
2. M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge Univ. Press, 2005.
3. G. Cormode and M. Hadjieleftheriou, "Finding the Frequent Items in Streams of Data," Comm. ACM, vol. 52, no. 10, 2009, pp. 97-105.
4. A. Broder and M. Mitzenmacher, "Network Applications of Bloom Filters: A Survey," Internet Mathematics, vol. 1, no. 4, 2005, pp. 485-509.
5. N. Alon et al., "Tracking Join and Self-Join Sizes in Limited Storage," Proc. 18th ACM SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems (PODS 99), ACM Press, 1999, pp. 10-20.
6. K. Beyer et al., "Distinct-Value Synopses for Multiset Operations," Comm. ACM, vol. 52, no. 10, 2009, pp. 87-95.

Index Terms:
Count-Min sketch, massive data, streaming algorithms, software engineering
Citation:
Graham Cormode, Muthu Muthukrishnan, "Approximating Data with the Count-Min Sketch," IEEE Software, vol. 29, no. 1, pp. 64-69, Jan.-Feb. 2012, doi:10.1109/MS.2011.127
Usage of this product signifies your acceptance of the Terms of Use.