Performance Models for Network Processor Design
June 2006 (vol. 17 no. 6)
pp. 548-561

Abstract: To provide a variety of new and advanced communications services, computer networks are required to perform increasingly complex packet processing. This processing typically takes place on network routers and their associated components. An increasingly central component in router design is a chip-multiprocessor (CMP) referred to as a "network processor" or NP. In addition to multiple processors, NPs have multiple forms of on-chip memory, various network and off-chip memory interfaces, and other specialized logic components such as CAMs (Content Addressable Memories). The design space for NPs (e.g., number of processors, number of caches, and cache sizes) is large due to the diverse workloads, application requirements, and system characteristics. System design constraints relate to the maximum chip area and power consumption that are permissible while achieving defined line rates and executing the required packet functions. In this paper, an analytic performance model that captures the processing performance, chip area, and power consumption of a prototypical NP is developed and used to provide quantitative insights into system design trade-offs. The model, parameterized with a networking application benchmark, provides the basis for the design of a scalable, high-performance network processor and presents insights into how best to configure the numerous design elements associated with NPs.

Index Terms:
Network processor design, performance model, design optimization, power optimization, network processor benchmark.
Tilman Wolf, Mark A. Franklin, "Performance Models for Network Processor Design," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 6, pp. 548-561, June 2006, doi:10.1109/TPDS.2006.75