IEEE International Performance Computing and Communications Conference (2011)
Orlando, FL, USA
Nov. 17, 2011 to Nov. 19, 2011
Pablo Barrio , Dpto. Ingeniería Electrónica, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, Madrid, Spain
Carlos Carreras , Dpto. Ingeniería Electrónica, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, Madrid, Spain
Applications that operate on meshes are very popular in High Performance Computing (HPC) environments. In the past, many techniques have been developed in order to optimize the memory accesses for these datasets. Different loop transformations and domain decompositions are commonly used for structured meshes. However, unstructured grids are more challenging. The memory accesses, based on the mesh connectivity, do not map well to the usual linear memory model. This work presents a method to improve the memory performance which is suitable for HPC codes that operate on meshes. We develop a method to adjust the sequence in which the data are used inside the algorithm, by means of traversing and sorting the mesh. This sorted mesh can be transferred sequentially to the lower memory levels and allows for minimum data transfer requirements. The method also reduces the lower memory requirements dramatically: up to 63% of the L1 cache misses are removed in a traditional cache system. We have obtained speedups of up to 2.58 on memory operations as measured in a general-purpose CPU. An improvement is also observed with sequential access memories, where we have observed reductions of up to 99% in the required low-level memory size.
C. Carreras and P. Barrio, "Mesh traversal and sorting for efficient memory usage in scientific codes," IEEE International Performance Computing and Communications Conference(PCCC), Orlando, FL, USA, 2011, pp. 1-8.