Issue No. 04 - April (2012 vol. 23)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2011.162
David Dominguez-Sal , DAMA-UPC, Universitat Politecnica de Catalunya, Barcelona
Josep Aguilar-Saborit , Microsoft Corporation, Dana Point
Mihai Surdeanu , Lex Machina, Stanford Natural Language Processing Group, and Stanford University, Stanford
Josep Lluis Larriba-Pey , DAMA-UPC, Universitat Politecnica de Catalunya, Barcelona
We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximated record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy, one for the distribution of the cache contents, ESC-placement, and another one for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large ratio of data accesses local, the latter reduces the network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in both hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.
Distributed systems, distributed caching, resource intensive applications, count filter.
D. Dominguez-Sal, J. L. Larriba-Pey, J. Aguilar-Saborit and M. Surdeanu, "Using Evolutive Summary Counters for Efficient Cooperative Caching in Search Engines," in IEEE Transactions on Parallel & Distributed Systems, vol. 23, no. , pp. 776-784, 2011.