Issue No.01 - Jan. (2013 vol.62)
pp: 6-15
Kyueun Yi, LG Electronics Inc., Gumchon-gu
Won W. Ro, Yonsei University, Seoul
Jean-Luc Gaudiot, University of California, Irvine
ABSTRACT
As the Internet and information technology have continued to develop, fast packet processing in computer networks has grown in importance. Emerging network applications require deep packet classification as well as security-related processing, and they must run at line rate. Because network speeds and the complexity of network applications will continue to increase, future network processors must simultaneously meet two requirements: high performance and high programmability. We show that the performance of single processors will not be sufficient to support future demands. Instead, we will have to turn to multicore processors, which can exploit the parallelism in network workloads. In this paper, we focus on cache coherence protocols, which are central to the design of multicore-based network processors. We investigate the effects of two main categories of cache coherence protocols, token and directory, on several network workloads running on multicore processors. Our simulation results show that token protocols deliver significantly higher performance than directory protocols. With an 8-core configuration, token protocols improve performance over directory protocols by a factor of nearly 4 on average.
INDEX TERMS
Program processors, Protocols, Coherence, Multicore processing, Parallel processing, Algorithm design and analysis, Data communications, Parallel processors, Cache memories, Multithreaded processors, Network communications
CITATION
Kyueun Yi, Won W. Ro, Jean-Luc Gaudiot, "Importance of Coherence Protocols with Network Applications on Multicore Processors", IEEE Transactions on Computers, vol. 62, no. 1, pp. 6-15, Jan. 2013, doi:10.1109/TC.2011.199