2014 23rd International Conference on Parallel Architecture and Compilation (PACT) (2014)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
pp: 289-300
Kishore Kumar Pusukuri, Department of Computer Science and Engineering, University of California, Riverside, Riverside, USA 92521
Rajiv Gupta, Department of Computer Science and Engineering, University of California, Riverside, Riverside, USA 92521
Laxmi N. Bhuyan, Department of Computer Science and Engineering, University of California, Riverside, Riverside, USA 92521
ABSTRACT
On a cache-coherent multicore multiprocessor system, the performance of a multithreaded application with high lock contention is very sensitive to the distribution of application threads across multiple processors (or Sockets). This is because the distribution of threads affects the frequency of lock transfers between Sockets, which in turn affects the frequency of last-level cache (LLC) misses that lie on the critical path of execution. Because an LLC miss has high latency, an increase in LLC misses on the critical path increases both lock acquisition latency and critical section processing time. However, operating system thread schedulers, such as those in Solaris and Linux, are oblivious to the lock contention among the threads of an application and therefore fail to deliver high performance for multithreaded applications. To alleviate this problem, we propose a scheduling framework called Shuffling, which migrates threads of a multithreaded program across Sockets so that threads seeking locks are more likely to find the locks on the same Socket. Shuffling reduces the time threads spend acquiring locks and speeds up accesses to shared data in the critical section, ultimately reducing the execution time of the application. We have implemented Shuffling on a 64-core Supermicro server running Oracle Solaris 11™ and evaluated it using a wide variety of 20 multithreaded programs with high lock contention. Our experiments show that Shuffling reduces execution time by up to 54%, and by 13% on average. Moreover, it does not require any changes to the application source code or the OS kernel.
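The abstract does not specify how Shuffling decides which threads to place together, so the following is only an illustrative sketch of lock-contention-aware grouping, not the authors' implementation. It assumes that a per-thread lock wait time is available from some monitor, and that placing threads with similar lock wait times on the same Socket keeps lock transfers local. The function name `group_threads_by_lock_time` and its parameters are hypothetical.

```python
from typing import Dict, List


def group_threads_by_lock_time(lock_times: Dict[int, float],
                               num_sockets: int) -> List[List[int]]:
    """Return one list of thread IDs per socket.

    Assumption (not stated in the abstract): threads are ranked by how long
    they waited on locks, and neighbours in that ranking share a socket.
    """
    if num_sockets <= 0:
        raise ValueError("num_sockets must be positive")

    # Rank threads from most to least lock wait; break ties by thread ID
    # so the result is deterministic.
    ordered = sorted(lock_times, key=lambda tid: (-lock_times[tid], tid))

    # Split the ranking into num_sockets contiguous, near-equal groups.
    base, extra = divmod(len(ordered), num_sockets)
    groups: List[List[int]] = []
    start = 0
    for socket in range(num_sockets):
        size = base + (1 if socket < extra else 0)
        groups.append(ordered[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    # Hypothetical lock wait fractions for four threads on a 2-socket machine.
    sample = {1: 0.9, 2: 0.1, 3: 0.8, 4: 0.2}
    print(group_threads_by_lock_time(sample, num_sockets=2))
```

A real scheduler would then bind each group to the cores of its Socket using the operating system's affinity or processor-set facilities and repeat the measurement periodically; that step is omitted here because it is platform specific.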
INDEX TERMS
Instruction sets, Sockets, Multicore processing, Kernel, Image edge detection, Multiprocessing systems, last-level cache misses, Multicore, scheduling, thread migration, lock contention
CITATION
Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan, "Shuffling: A framework for lock contention aware thread scheduling for multicore multiprocessor systems", 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 289-300, 2014, doi:10.1145/2628071.2628074