2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) (2014)
July 13, 2014 to July 15, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PAAP.2014.51
Multiple threads running on a multi-core processor can improve the performance of a parallel application significantly. However, effective scaling of threads and cores plays a key role in achieving optimal performance, because performance does not necessarily improve as the number of cores increases. Multi-threaded applications suffer from thread synchronization and from negative interference in shared memory, including the last-level cache and main memory. Memory bandwidth also often limits the performance of a multi-threaded workload. In this paper we propose a method to achieve optimal scalability on a multi-core platform and to predict the bandwidth requirement of parallel workloads for a given number of threads. We employ the proposed method to improve the performance of bandwidth-limited parallel applications. We find that DRAM access has various phases, and we use the highest bandwidth among all phases to predict the performance of a given workload in a multi-threaded environment. We evaluate our proposed method using the Gem5 multi-core simulator, and the experimental results show that the phase-based bandwidth utilization method can estimate the optimal number of threads for a given parallel workload with low prediction error.
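The abstract does not give the prediction model itself, so the following is only a minimal sketch of the general idea of using the highest per-phase bandwidth to bound the thread count. It assumes, hypothetically, that per-thread bandwidth demand is measured per DRAM access phase and that total demand grows linearly with the number of threads; the function names, parameters, and example figures are illustrative and are not taken from the paper.

```python
def peak_phase_bandwidth(phase_bandwidths_gbps):
    """Return the highest bandwidth observed across all DRAM access phases."""
    if not phase_bandwidths_gbps:
        raise ValueError("need at least one phase measurement")
    return max(phase_bandwidths_gbps)


def estimate_thread_limit(phase_bandwidths_gbps, platform_bandwidth_gbps, max_cores):
    """Estimate how many threads fit before the peak phase saturates memory.

    Assumption (not from the paper): each added thread contributes the same
    per-thread bandwidth demand, so total demand scales linearly.
    """
    peak = peak_phase_bandwidth(phase_bandwidths_gbps)
    if peak <= 0:
        # No measurable memory traffic: bandwidth does not constrain scaling.
        return max_cores
    limit = int(platform_bandwidth_gbps // peak)
    # Always allow at least one thread, and never exceed the available cores.
    return max(1, min(max_cores, limit))


if __name__ == "__main__":
    # Hypothetical per-thread bandwidth (GB/s) for three phases of one workload.
    phases = [1.2, 3.5, 2.0]
    print(estimate_thread_limit(phases, platform_bandwidth_gbps=12.8, max_cores=8))
```

Sizing against the peak phase rather than the average is the conservative choice here: a thread count that fits the average can still saturate memory during the most demanding phase.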
Bandwidth, Instruction sets, Benchmark testing, Multicore processing, Radiation detectors, Hardware, Phase measurement
S. Manakkadu and S. Dutta, "Bandwidth Based Performance Optimization of Multi-threaded Applications," 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), Beijing, China, 2014, pp. 118-122.