The Community for Technology Leaders
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2011)
Galveston, Texas USA
Oct. 10, 2011 to Oct. 14, 2011
ISSN: 1089-795X
ISBN: 978-0-7695-4566-0
pp: 253-263
ABSTRACT
As parallel programming becomes the mainstream due to multicore processors, dynamic memory allocators used in C and C++ can suppress the performance of multi-threaded applications if they are not scalable. In this paper, we present a new dynamic memory allocator for multi-threaded applications. The allocator never uses any synchronization for common cases. It uses only lock-free synchronization mechanisms for uncommon cases. Each thread owns a private heap and handles memory requests on the heap. Our allocator is completely synchronization-free when a thread allocates a memory block and deal locates it by itself. Synchronization-free means that threads do not communicate with each other at all. On the other hand, if a thread allocates a block and another thread frees it, we use a lock-free stack to atomically add it to the owner thread's heap to avoid the memory blowup problem. Furthermore, our allocator exploits various memory block caching mechanisms to reduce the latency of memory management. Freed blocks or intermediate memory chunks are cached hierarchically in each thread's heap and they are used for future memory allocation. We compare the performance and scalability of our allocator to those of well-known existing multi-threaded memory allocators using eight benchmarks. Experimental results on a 48-core AMD system show that our approach achieves better performance than other allocators for all benchmarks and is highly scalable with a large number of threads.
INDEX TERMS
dynamic memory management, concurrent memory allocator, lock-free, synchronization-free, caching, manycore
CITATION
Sangmin Seo, Junghyun Kim, Jaejin Lee, "SFMalloc: A Lock-Free and Mostly Synchronization-Free Dynamic Memory Allocator for Manycores", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, vol. 00, no. , pp. 253-263, 2011, doi:10.1109/PACT.2011.57
184 ms
(Ver 3.3 (11022016))