Issue No.08 - August (2009 vol.58)
pp: 1009-1025
Kaushik Rajan , SERC, Indian Institute of Science, Bangalore
Ramaswamy Govindarajan , SERC, Indian Institute of Science, Bangalore
Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. To process packets at line rate, high-performance routers must forward millions of packets every second, with each packet needing up to seven memory accesses. Earlier work shows that a single cache for the nodes of a trie can reduce the number of external memory accesses. We observe that the locality characteristics of the level-one nodes of a trie differ significantly from those of lower level nodes. Hence, we propose a heterogeneously segmented cache architecture (HSCA), which uses separate caches for level-one and lower level nodes, each with carefully chosen sizes. Besides reducing misses, segmenting the cache allows us to focus on optimizing the more frequently accessed level-one node segment. We find that, due to the nonuniform distribution of nodes among cache sets, the level-one nodes cache is susceptible to high conflict misses. We reduce conflict misses by introducing a novel two-level mapping-based cache placement framework. We also propose an elegant way to fit the modified placement function into the cache organization with minimal increase in access time. Further, we propose an attribute-preserving trace generation methodology that emulates real traces and can generate traces with varying locality. Performance results show that HSCA yields a 32 percent speedup in average memory access time over a unified nodes cache. HSCA also outperforms IHARC, a cache for lookup results, with up to a 10-fold speedup in average memory access time. Two-level mapping further enhances the performance of the base HSCA by up to 13 percent, leading to an overall improvement of up to 40 percent over the unified scheme.
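The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' design: the class names (`SetAssociativeCache`, `SegmentedNodeCache`), the LRU replacement policy, the modulo index function, and the cache sizes are all assumptions chosen for illustration. The sketch only shows how routing level-one and lower level trie nodes to separate segments keeps them from evicting each other, and how average memory access time follows from the standard relation hit time + miss rate × miss penalty.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tiny LRU set-associative cache keyed by node address (illustrative only)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, insert the address, evicting the LRU line."""
        lines = self.sets[address % self.num_sets]  # assumed index function
        if address in lines:
            lines.move_to_end(address)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict least recently used line
        lines[address] = True
        return False


class SegmentedNodeCache:
    """Hypothetical two-segment cache: one for level-one nodes, one for the rest."""

    def __init__(self, level_one, lower):
        self.level_one = level_one
        self.lower = lower

    def access(self, level, address):
        # Route the access by trie level so the two node classes never conflict.
        segment = self.level_one if level == 1 else self.lower
        return segment.access(address)


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Same total capacity (4 lines): unified 4x1 versus two 2x1 segments.
unified = SetAssociativeCache(num_sets=4, ways=1)
segmented = SegmentedNodeCache(SetAssociativeCache(2, 1), SetAssociativeCache(2, 1))

# A level-one node at address 4, a lower level node at address 8, then 4 again.
unified_result = [unified.access(a) for a in (4, 8, 4)]
segmented_result = [segmented.access(lvl, a) for lvl, a in ((1, 4), (2, 8), (1, 4))]
```

In this toy trace the unified cache maps both addresses to the same set, so the second reference to the level-one node misses, whereas the segmented cache hits. The sizes and addresses are contrived to show the effect and say nothing about the magnitudes reported in the paper.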
Special-purpose and application-based systems, design, performance, experimentation, cache architectures, network processors, synthetic trace generation, trace driven simulation.
Kaushik Rajan, Ramaswamy Govindarajan, "A Novel Cache Architecture and Placement Framework for Packet Forwarding Engines", IEEE Transactions on Computers, vol.58, no. 8, pp. 1009-1025, August 2009, doi:10.1109/TC.2009.18