MPI derived datatypes provide a powerful way to describe arbitrary collections of non-contiguous data in memory and to communicate such data in a single MPI function call. In this paper, we employed MPI datatypes in four NAS benchmarks (MG, LU, BT, and SP) to transfer non-contiguous data and carried out a comprehensive performance evaluation on two clusters: an Itanium-2 Myrinet cluster and a Xeon InfiniBand cluster. The results show that datatypes achieve performance comparable to the manual packing/unpacking in the original benchmarks, even though the MPI implementations studied also pack and unpack internally when communicating non-contiguous datatypes. In some cases, datatypes perform better because the cost of transferring non-contiguous data is reduced: the packing/unpacking routines inside the MPI implementations include optimizations that users can easily overlook when packing and unpacking manually.
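To make the contrast concrete, the following is a minimal plain-C sketch of the manual packing and unpacking step that a strided datatype description replaces. It is not code from the benchmarks. The struct vec_layout and the functions pack_vector and unpack_vector are hypothetical names introduced here for illustration; the three layout fields mirror the count, blocklength, and stride arguments of MPI_Type_vector, under the assumption that the non-contiguous region is a regular strided pattern of doubles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout descriptor (illustration only). The fields mirror the
 * count, blocklength, and stride arguments of MPI_Type_vector, all measured
 * in elements of type double. */
typedef struct {
    size_t count;     /* number of blocks */
    size_t blocklen;  /* elements per block */
    size_t stride;    /* elements between the starts of consecutive blocks */
} vec_layout;

/* Manual packing: gather the strided blocks of src into the contiguous
 * buffer dst. Returns the number of elements written to dst. */
size_t pack_vector(const double *src, vec_layout l, double *dst)
{
    assert(l.blocklen <= l.stride);  /* blocks must not overlap */
    for (size_t i = 0; i < l.count; i++) {
        memcpy(dst + i * l.blocklen, src + i * l.stride,
               l.blocklen * sizeof(double));
    }
    return l.count * l.blocklen;
}

/* Manual unpacking: scatter the contiguous buffer src back into the strided
 * blocks of dst. Returns the number of elements read from src. */
size_t unpack_vector(const double *src, vec_layout l, double *dst)
{
    assert(l.blocklen <= l.stride);
    for (size_t i = 0; i < l.count; i++) {
        memcpy(dst + i * l.stride, src + i * l.blocklen,
               l.blocklen * sizeof(double));
    }
    return l.count * l.blocklen;
}
```

With a derived datatype, the same three numbers would instead be passed once to MPI_Type_vector, the resulting type committed with MPI_Type_commit, and the strided region handed directly to the send or receive call with a count of one. The copy loops above then move inside the MPI library, where the implementation is free to optimize them.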
Our case study demonstrates that MPI datatypes simplify the implementation of non-contiguous communication and yield application code whose performance is portable. We expect that, with further improvements in datatype processing and datatype communication such as those in [10, 24], datatypes can outperform the conventional methods of non-contiguous data communication. Our modified NAS benchmarks can be used to evaluate datatype processing and datatype communication in MPI implementations.