Straggler Identification in Round-Trip Data Streams via Newton's Identities and Invertible Bloom Filters
February 2011 (vol. 23 no. 2)
pp. 297-306
David Eppstein, University of California, Irvine
Michael T. Goodrich, University of California, Irvine
In this paper, we study the straggler identification problem, in which an algorithm must determine the identities of the remaining members of a set after a large number of insertion and deletion operations have been performed on it, leaving relatively few remaining members. The goal is to do this in $o(n)$ space, where $n$ is the total number of identities. Straggler identification has applications, for example, in determining the unacknowledged packets in a high-bandwidth multicast data stream. We provide a deterministic solution to the straggler identification problem that uses only $O(d\log n)$ bits, based on a novel application of Newton's identities for symmetric polynomials. This solution can identify any subset of $d$ stragglers from a set of $n$ $O(\log n)$-bit identifiers, assuming that there are no false deletions of identities not already in the set. Indeed, we give a lower bound argument showing that no small-space deterministic solution to the straggler identification problem can be guaranteed to handle false deletions. Nevertheless, we provide a simple randomized solution, using $O(d\log n\log(1/\epsilon))$ bits, that can maintain a multiset and solve the straggler identification problem while tolerating false deletions, where $\epsilon > 0$ is a user-defined parameter bounding the probability of an incorrect response. This randomized solution is based on a new type of Bloom filter, which we call the invertible Bloom filter.
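The deterministic idea can be illustrated with power sums and Newton's identities. The following is a minimal Python sketch of that idea, not the paper's implementation. It assumes identifiers are non-negative integers smaller than a prime modulus `P` (here the hypothetical choice $2^{31}-1$, which must also exceed $d$). It keeps the count and the power sums $p_k = \sum_j x_j^k \bmod P$ for $k = 1, \dots, d$, recovers the elementary symmetric polynomials via $k e_k = \sum_{i=1}^{k} (-1)^{i-1} e_{k-i} p_i$, and then finds the roots of $\prod_j (z - x_j) = \sum_{j=0}^{m} (-1)^j e_j z^{m-j}$. The class and method names are illustrative. For brevity, roots are found by brute force over a caller-supplied universe; a practical version would instead use a polynomial root-finding algorithm over the finite field.

```python
P = 2**31 - 1  # hypothetical prime modulus; must exceed every identifier and d


class PowerSumSketch:
    """Illustrative straggler sketch: a count plus power sums p_1..p_d modulo P."""

    def __init__(self, d, prime=P):
        self.d = d
        self.prime = prime
        self.count = 0
        self.sums = [0] * (d + 1)  # sums[k] = sum of x**k mod prime; index 0 unused

    def _update(self, x, sign):
        self.count += sign
        for k in range(1, self.d + 1):
            self.sums[k] = (self.sums[k] + sign * pow(x, k, self.prime)) % self.prime

    def insert(self, x):
        self._update(x, +1)

    def delete(self, x):
        # Assumes x is currently in the set (no false deletions).
        self._update(x, -1)

    def elementary(self):
        """Recover e_0..e_m from the power sums via Newton's identities, m = count."""
        m = self.count
        if m < 0 or m > self.d:
            raise ValueError("need 0 <= remaining count <= d")
        e = [1] + [0] * m
        for k in range(1, m + 1):
            # k * e_k = sum_{i=1}^{k} (-1)^(i-1) * e_{k-i} * p_i
            acc = 0
            for i in range(1, k + 1):
                term = e[k - i] * self.sums[i]
                acc += term if i % 2 == 1 else -term
            e[k] = acc * pow(k, -1, self.prime) % self.prime  # divide by k mod prime
        return e

    def stragglers(self, universe):
        """Return identifiers in `universe` that are roots of prod_j (z - x_j)."""
        e = self.elementary()
        m = len(e) - 1
        # Coefficient of z^(m-j) is (-1)^j * e_j, listed from highest degree down.
        coeffs = [(-1) ** j * e[j] % self.prime for j in range(m + 1)]
        roots = []
        for x in universe:
            value = 0
            for c in coeffs:  # Horner evaluation modulo the prime
                value = (value * x + c) % self.prime
            if value == 0:
                roots.append(x)
        return roots


s = PowerSumSketch(d=3)
for x in range(1, 1001):
    s.insert(x)
for x in range(1, 1001):
    if x not in (17, 500, 999):
        s.delete(x)
print(s.stragglers(range(1, 1001)))  # [17, 500, 999]
```

The state is the count plus $d$ residues modulo `P`, each of $O(\log n)$ bits when `P` is chosen close to $n$. Deleting an identifier that was never inserted silently corrupts the power sums, which is consistent with the abstract's point that the deterministic approach assumes no false deletions. `pow(k, -1, prime)` requires Python 3.8 or later.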

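For the randomized setting, the following Python sketch illustrates a structure in the spirit of an invertible Bloom filter. The cell layout (a count, a sum of identifiers, and a sum of identifier hashes), the use of $k$ separate subtables, the BLAKE2b-based hashing, and all names are assumptions made for illustration and may differ from the construction in the paper. Insertions and deletions add to or subtract from $k$ cells. Listing repeatedly "peels" pure cells, meaning cells that hold copies of a single identifier, as verified by the hash sum. A falsely deleted identifier surfaces with a negative multiplicity rather than corrupting the result, and listing reports failure (with small probability) when it gets stuck with no pure cell left.

```python
import copy
import hashlib


def _hash(x, salt):
    """64-bit hash of identifier x, salted so each role gets an independent function."""
    data = f"{salt}:{x}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class InvertibleBloomSketch:
    """Hypothetical IBF-style sketch: each cell is [count, id_sum, hash_sum]."""

    def __init__(self, cells_per_table=50, k=3):
        self.k = k
        self.w = cells_per_table
        # k subtables of w cells each, so every identifier maps to k distinct cells.
        self.cells = [[0, 0, 0] for _ in range(k * cells_per_table)]

    def _positions(self, x):
        return [t * self.w + _hash(x, t) % self.w for t in range(self.k)]

    def _update(self, x, c):
        hx = _hash(x, "check")
        for pos in self._positions(x):
            cell = self.cells[pos]
            cell[0] += c
            cell[1] += c * x
            cell[2] += c * hx

    def insert(self, x):
        self._update(x, 1)

    def delete(self, x):
        self._update(x, -1)

    def _pure(self, cell):
        """Return (x, count) if the cell appears to hold only copies of one identifier x."""
        count, id_sum, hash_sum = cell
        if count == 0 or id_sum % count != 0:
            return None
        x = id_sum // count
        if hash_sum == count * _hash(x, "check"):
            return x, count
        return None

    def list_entries(self):
        """Peel pure cells on a copy; return ({id: net multiplicity}, success flag)."""
        work = copy.deepcopy(self)
        found = {}
        # Each productive round empties at least one cell, so this bound guards
        # against looping forever after a (very unlikely) hash false positive.
        for _ in range(len(work.cells) + 1):
            progress = False
            for cell in work.cells:
                hit = work._pure(cell)
                if hit is not None:
                    x, c = hit
                    found[x] = found.get(x, 0) + c
                    work._update(x, -c)  # remove all c copies of x from its k cells
                    progress = True
            if not progress:
                break
        complete = all(cell == [0, 0, 0] for cell in work.cells)
        return {x: c for x, c in found.items() if c != 0}, complete
```

For example, inserting the identifiers 1 through 10,000, deleting all of them except 42 and 777, and then falsely deleting 123456 should, with high probability, list `{42: 1, 777: 1, 123456: -1}` with the success flag set. Space grows with the number of cells rather than with the number of operations, which is the property the straggler setting needs.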
Index Terms:
Straggler identification, Newton's identities, Bloom filters, data streams.
David Eppstein, Michael T. Goodrich, "Straggler Identification in Round-Trip Data Streams via Newton's Identities and Invertible Bloom Filters," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 2, pp. 297-306, Feb. 2011, doi:10.1109/TKDE.2010.132