Using Evolutive Summary Counters for Efficient Cooperative Caching in Search Engines
April 2012 (vol. 23 no. 4)
pp. 776-784
David Dominguez-Sal, DAMA-UPC, Universitat Politecnica de Catalunya, Barcelona
Josep Aguilar-Saborit, Microsoft Corporation, Dana Point
Mihai Surdeanu, Lex Machina, Stanford Natural Language Processing Group, and Stanford University, Stanford
Josep Lluis Larriba-Pey, DAMA-UPC, Universitat Politecnica de Catalunya, Barcelona
We propose and analyze a distributed cooperative caching strategy based on Evolutive Summary Counters (ESC), a new data structure that stores an approximate record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries, which provide approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy: ESC-placement, which distributes the cache contents, and ESC-search, which searches for documents in the distributed cache. The former improves the hit rate of the system and keeps a large ratio of data accesses local; the latter reduces network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in hit rate, throughput, and location recall across multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.
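
The abstract does not specify the internal layout of the ESC, so the following is only a minimal sketch of the general idea, assuming a generic counting filter with periodic decay in place of the actual ESC. The class AccessCounters, the function candidate_nodes, the halving rule in age(), and the parameters size, hashes, and threshold are all hypothetical names chosen for illustration. The sketch shows how per-node approximate access counts could be reduced to a compact summary, and how such summaries could limit which nodes are queried for a document.

```python
import hashlib


class AccessCounters:
    """Hypothetical counting filter: approximate per-key access counts for one node.

    This is NOT the ESC from the paper, only a stand-in with similar goals.
    """

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.counts = [0] * size

    def _positions(self, key):
        # Derive `hashes` counter positions from salted SHA-256 digests.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big"
            ) % self.size
            for i in range(self.hashes)
        ]

    def record(self, key):
        # Register one access to `key` on this node.
        for p in self._positions(key):
            self.counts[p] += 1

    def estimate(self, key):
        # The minimum over the key's counters bounds its access count from
        # above; collisions with other keys can only inflate the estimate.
        return min(self.counts[p] for p in self._positions(key))

    def age(self):
        # Assumed decay rule: halve every counter so old accesses fade
        # and the counters follow the evolution of the access pattern.
        self.counts = [c // 2 for c in self.counts]

    def summary(self, threshold=1):
        # Compact bit-vector summary: which counters reach the threshold.
        return [c >= threshold for c in self.counts]


def candidate_nodes(key, summaries, size=1024, hashes=3):
    """Return the nodes whose summary suggests they have accessed `key`."""
    positions = AccessCounters(size, hashes)._positions(key)
    return [
        node for node, bits in summaries.items()
        if all(bits[p] for p in positions)
    ]
```

In this sketch a node would exchange only its summary() bit vector with its peers, and a lookup would contact just the nodes returned by candidate_nodes. False positives are possible, as with any Bloom-filter-style summary, so a contacted node may still miss.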


Index Terms:
Distributed systems, distributed caching, resource intensive applications, count filter.
David Dominguez-Sal, Josep Aguilar-Saborit, Mihai Surdeanu, Josep Lluis Larriba-Pey, "Using Evolutive Summary Counters for Efficient Cooperative Caching in Search Engines," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 4, pp. 776-784, April 2012, doi:10.1109/TPDS.2011.162