Issue No.12 - Dec. (2013 vol.24)
pp: 2513-2525
Petar Radojkovic, Barcelona Supercomputing Center, Barcelona
Vladimir Cakarevic, Barcelona Supercomputing Center, Barcelona
Javier Verdu, Universitat Politècnica de Catalunya, Barcelona
Alex Pajuelo, Universitat Politècnica de Catalunya, Barcelona
Francisco J. Cazorla, Spanish National Research Council (IIIA-CSIC) and Barcelona Supercomputing Center, Barcelona
Mario Nemirovsky, ICREA Research Professor and Barcelona Supercomputing Center, Barcelona
Mateo Valero, Universitat Politècnica de Catalunya and Barcelona Supercomputing Center, Barcelona
ABSTRACT
The introduction of multithreaded processors comprising a large number of cores with many shared resources makes thread scheduling, and in particular the optimal assignment of running threads to processor hardware contexts, one of the most promising ways to improve system performance. However, finding optimal thread assignments for workloads running on state-of-the-art multicore/multithreaded processors is an NP-complete problem. In this paper, we propose the BlackBox scheduler, a systematic method for thread assignment of multithreaded network applications running on multicore/multithreaded processors. The method requires minimal information about the target processor architecture and no data about the hardware requirements of the applications under study. The proposed method is evaluated with an industrial case study for a set of multithreaded network applications running on the UltraSPARC T2 processor. In most of the experiments, the proposed thread assignment method detected the best actual thread assignment in the evaluation sample. The method improved system performance by 5 to 48 percent with respect to the load balancing algorithms used in state-of-the-art OSs, and by up to 60 percent with respect to a naive thread assignment.
INDEX TERMS
Instruction sets, Multithread processing, Message systems, Interference, Resource management, Performance modeling, Chip multithreading (CMT), Process scheduling
CITATION
Petar Radojkovic, Vladimir Cakarevic, Javier Verdu, Alex Pajuelo, Francisco J. Cazorla, Mario Nemirovsky, Mateo Valero, "Thread Assignment of Multithreaded Network Applications in Multicore/Multithreaded Processors", IEEE Transactions on Parallel & Distributed Systems, vol.24, no. 12, pp. 2513-2525, Dec. 2013, doi:10.1109/TPDS.2012.311