Issue No. 04, April 2012 (vol. 23)
pp: 776-784
David Dominguez-Sal, DAMA-UPC, Universitat Politecnica de Catalunya, Barcelona
Josep Aguilar-Saborit, Microsoft Corporation, Dana Point
Mihai Surdeanu, Lex Machina, Stanford Natural Language Processing Group, and Stanford University, Stanford
Josep Lluis Larriba-Pey, DAMA-UPC, Universitat Politecnica de Catalunya, Barcelona
ABSTRACT
We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximate record of the data accesses in each computing node of a search engine. The ESC capture both the frequency of accesses to the elements of a data collection and the evolution of each node's access patterns in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries, which provide approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy: ESC-placement, which distributes the cache contents, and ESC-search, which locates documents in the distributed cache. The former improves the hit rate of the system and keeps a large fraction of data accesses local; the latter reduces network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in hit rate, throughput, and location recall across multiple scenarios, i.e., different query distributions and systems of varying complexity.
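The abstract does not spell out how the ESC are laid out internally. The following Python sketch is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each node keeps a counting filter (consistent with the "count filter" index term) whose counters are periodically halved to model the evolution of access patterns, that an ESC-summary is a bit vector marking counters above a threshold, that ESC-search queries only nodes whose summary may contain the document, and that ESC-placement puts a document on the node with the highest estimated access count. All names (`EvolutiveSummaryCounters`, `counter_positions`, `esc_search`, `esc_placement`), the SHA-256 hashing, the halving policy, and the threshold are hypothetical.

```python
import hashlib
from typing import Dict, List


def counter_positions(doc_id: str, num_counters: int, num_hashes: int) -> List[int]:
    """Map a document id to num_hashes counter slots via salted SHA-256 (assumed hashing)."""
    positions = []
    for salt in range(num_hashes):
        digest = hashlib.sha256(f"{salt}:{doc_id}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_counters)
    return positions


class EvolutiveSummaryCounters:
    """Hypothetical per-node ESC: a counting filter whose counters are
    periodically halved so recent accesses outweigh old ones."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 3) -> None:
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _slots(self, doc_id: str) -> List[int]:
        return counter_positions(doc_id, self.num_counters, self.num_hashes)

    def record_access(self, doc_id: str) -> None:
        # Every access increments all of the document's counter slots.
        for slot in self._slots(doc_id):
            self.counters[slot] += 1

    def estimate(self, doc_id: str) -> int:
        # Minimum over the document's slots; before any aging this never
        # underestimates the true access count (hash collisions only add).
        return min(self.counters[slot] for slot in self._slots(doc_id))

    def age(self) -> None:
        # Assumed aging step: halve every counter so stale patterns fade.
        self.counters = [c >> 1 for c in self.counters]

    def summary(self, threshold: int = 1) -> List[bool]:
        # Assumed ESC-summary: one bit per counter, set if count >= threshold.
        return [c >= threshold for c in self.counters]


def esc_search(doc_id: str, summaries: Dict[str, List[bool]], num_hashes: int = 3) -> List[str]:
    """Return only the nodes whose summary may hold doc_id (false positives possible)."""
    candidates = []
    for node, bits in summaries.items():
        slots = counter_positions(doc_id, len(bits), num_hashes)
        if all(bits[s] for s in slots):
            candidates.append(node)
    return candidates


def esc_placement(doc_id: str, nodes: Dict[str, EvolutiveSummaryCounters]) -> str:
    """Hypothetical placement: pick the node with the highest estimated access count."""
    return max(nodes, key=lambda node: nodes[node].estimate(doc_id))


if __name__ == "__main__":
    nodes = {"A": EvolutiveSummaryCounters(), "B": EvolutiveSummaryCounters()}
    for _ in range(3):
        nodes["A"].record_access("doc1")
    nodes["B"].record_access("doc2")
    summaries = {name: esc.summary() for name, esc in nodes.items()}
    print(esc_search("doc1", summaries))  # candidate nodes to query for doc1
    print(esc_placement("doc1", nodes))   # node chosen to cache doc1
```

In this sketch, exchanging the compact bit-vector summaries instead of the full counters keeps the communication cost low, and restricting a lookup to the candidates returned by `esc_search` illustrates how querying fewer nodes can reduce network traffic, at the cost of occasional false-positive queries.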
INDEX TERMS
Distributed systems, distributed caching, resource intensive applications, count filter.
CITATION
David Dominguez-Sal, Josep Aguilar-Saborit, Mihai Surdeanu, Josep Lluis Larriba-Pey, "Using Evolutive Summary Counters for Efficient Cooperative Caching in Search Engines," IEEE Transactions on Parallel & Distributed Systems, vol. 23, no. 4, pp. 776-784, April 2012, doi:10.1109/TPDS.2011.162