<p><b>Abstract</b>: Dynamic data redistribution is used in many parallel scientific applications on distributed memory multicomputers to enhance data locality and algorithm performance by reducing interprocessor communication. Because the redistribution is performed at runtime, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a processor replacement scheme to minimize the cost of interprocessor data exchange at runtime. The main idea of the proposed technique is to develop a replacement function that reorders the logical processors in the destination phase. The replacement function yields a realigned sequence of destination processors, which is then used to perform data decomposition in the receiving phase. Combined with local matrix transposition and compressed <tmath>CRS</tmath> vector transposition schemes, interprocessor communication can be eliminated at runtime. The first contribution of this approach is that, for special cases, data can be realigned without any interprocessor communication. The second contribution is that the complicated generation of communication sets can be simplified by applying local matrix transposition, which significantly reduces the indexing cost. The proposed techniques can be applied to both dense and sparse applications. A generalized symmetric redistribution algorithm is also presented. Theoretical analysis shows that up to <tmath>(p-1)/p</tmath> of the data transmission cost can be saved. For general cases, the symmetric redistribution algorithm saves <tmath>1/p</tmath> of the communication overhead compared with the traditional method. Experimental results also show that the proposed techniques provide superior performance in most data redistribution instances.</p>
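The communication-free special case can be illustrated with a toy example. The sketch below is not the paper's algorithm: it assumes a dense symmetric matrix, a plain row-block source distribution and a column-block destination distribution over `p` processors, and every helper name is hypothetical. Because a symmetric matrix equals its own transpose, each processor can obtain its column block by transposing the row block it already holds. The counting helper also tallies how many elements a generic row-to-column redistribution would move between processors in this toy setting.

```python
def make_symmetric(n):
    """Build an n x n symmetric test matrix (a[i][j] == a[j][i])."""
    return [[i * j + i + j for j in range(n)] for i in range(n)]


def row_block(matrix, p, rank):
    """Rows owned by processor `rank` under a row-block distribution."""
    size = len(matrix) // p  # assumes p divides n evenly
    return [row[:] for row in matrix[rank * size:(rank + 1) * size]]


def column_block(matrix, p, rank):
    """Columns owned by processor `rank` under a column-block distribution."""
    size = len(matrix) // p
    return [row[rank * size:(rank + 1) * size] for row in matrix]


def local_transpose(block):
    """Transpose a local block without touching any other processor's data."""
    return [list(col) for col in zip(*block)]


def count_moved(n, p):
    """Elements whose owner changes in a generic row-block to column-block
    redistribution (source owner i // size, destination owner j // size)."""
    size = n // p
    return sum(1 for i in range(n) for j in range(n) if i // size != j // size)


if __name__ == "__main__":
    n, p = 6, 3
    a = make_symmetric(n)
    for rank in range(p):
        # The locally transposed row block matches the target column block.
        assert local_transpose(row_block(a, p, rank)) == column_block(a, p, rank)
    print(count_moved(n, p), "of", n * n, "elements would otherwise be sent")
```

In this toy setting the fraction of elements that would otherwise change owner works out to `(p-1)/p`, the same figure the abstract quotes as the upper bound on savings. Whether the paper derives its bound this way is an assumption.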
Processor replacement, communication free, data redistribution, symmetric matrix, CRS transposition, sparse matrix.

C. Hsu, C. Yang, K. Li and M. Chen, "Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers," in IEEE Transactions on Parallel & Distributed Systems, vol. 17, no. , pp. 1226-1241, 2006.