Current generations of NUMA node clusters feature multicore or manycore processors. Programming such architectures efficiently is a challenge because numerous hardware characteristics have to be taken into account, especially the memory hierarchy. One appealing idea to improve the performance of parallel applications is to decrease their communication costs by matching the communication pattern to the underlying hardware architecture. In this paper, we detail the algorithm and techniques proposed to achieve such a result. First, we gather both the communication pattern information and the hardware details. Then, we compute a relevant reordering of the various process ranks of the application. Finally, those new ranks are used to reduce the communication costs of the application.
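The three steps above (gather the communication pattern and hardware details, compute a rank reordering, apply it) can be illustrated with a minimal Python sketch. It is not the algorithm detailed in the paper: it substitutes a simple greedy grouping heuristic, and it assumes a flat two-level machine (identical nodes with `cores_per_node` cores each) in place of a full memory hierarchy. The function names `comm_cost` and `greedy_placement`, the matrix `comm`, and the `intra`/`inter` cost weights are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def comm_cost(comm, placement, cores_per_node, intra=1, inter=10):
    """Sum each pair's traffic volume weighted by an assumed distance:
    `intra` if both processes sit on the same node, `inter` otherwise."""
    total = 0
    for i, j in combinations(range(len(comm)), 2):
        same_node = (placement[i] // cores_per_node
                     == placement[j] // cores_per_node)
        total += comm[i][j] * (intra if same_node else inter)
    return total


def greedy_placement(comm, cores_per_node):
    """Greedy heuristic (an illustrative stand-in, not the paper's method):
    fill one node at a time, seeding it with the unplaced process that has
    the most traffic to other unplaced processes, then repeatedly adding
    the process that exchanges the most data with the current group.
    Returns placement[rank] = core index."""
    n = len(comm)
    unplaced = set(range(n))
    order = []  # process ranks listed in core order
    while unplaced:
        # Ties are broken in favour of the lowest rank (hence the -p).
        seed = max(unplaced,
                   key=lambda p: (sum(comm[p][q] for q in unplaced), -p))
        group = [seed]
        unplaced.remove(seed)
        while len(group) < cores_per_node and unplaced:
            best = max(unplaced,
                       key=lambda p: (sum(comm[p][q] for q in group), -p))
            group.append(best)
            unplaced.remove(best)
        order.extend(group)
    placement = [0] * n
    for core, rank in enumerate(order):
        placement[rank] = core
    return placement


# Step 1 (assumed inputs): a symmetric communication matrix for four
# processes and a machine made of two-core nodes.
comm = [
    [0,   1,   100, 0],
    [1,   0,   0,   100],
    [100, 0,   0,   1],
    [0,   100, 1,   0],
]
cores_per_node = 2

# Step 2: compute the reordering.
identity = list(range(len(comm)))
reordered = greedy_placement(comm, cores_per_node)

# Step 3: compare the modelled cost before and after applying it.
cost_before = comm_cost(comm, identity, cores_per_node)
cost_after = comm_cost(comm, reordered, cores_per_node)
```

In this toy input, ranks 0 and 2 exchange most of the data, as do ranks 1 and 3, so the heuristic co-locates each pair on one node and the modelled cost falls. In an MPI application the resulting permutation would typically be applied by creating a reordered communicator; that step is omitted here.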
Trees, Computer Systems Organization, General, Modeling of computer architecture, Processor Architectures, Parallel Architectures, Multi-core/single-chip multiprocessors, Memory hierarchy, Communication/Networking and Information Technology, Interprocessor communications, Network Architecture and Design, Network topology, Software/Software Engineering, Programming Techniques, Concurrent Programming, Distributed programming, Software Engineering, Software Construction, Programming paradigms, Operating Systems, Performance, Measurements, Data, Data Structures, Graphs and networks
Emmanuel Jeannot, Guillaume Mercier, Francois Tessier, "Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques", IEEE Transactions on Parallel & Distributed Systems, vol. , no. , pp. 0, 5555, doi:10.1109/TPDS.2013.104