<p><b>Abstract</b>—Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed memory multicomputers. Since it is performed at run-time, there is a performance trade-off between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a <it>basic-cycle calculation</it> technique to efficiently perform <tt>BLOCK-CYCLIC(s)</tt> to <tt>BLOCK-CYCLIC(t)</tt> redistribution. The main idea of the basic-cycle calculation technique is, first, to develop closed forms for computing source/destination processors of some specific array elements in a basic-cycle, which is defined as <it>lcm</it>(<it>s</it>, <it>t</it>)/<it>gcd</it>(<it>s</it>, <it>t</it>). These closed forms are then used to efficiently determine the communication sets of a basic-cycle. From the source/destination processor/data sets of a basic-cycle, we can efficiently perform a <tt>BLOCK-CYCLIC(s)</tt> to <tt>BLOCK-CYCLIC(t)</tt> redistribution. To evaluate the performance of the basic-cycle calculation technique, we have implemented this technique on an IBM SP2 parallel machine, along with the <it>PITFALLS</it> method and the multiphase method. The cost models for these three methods are also presented. The experimental results show that the basic-cycle calculation technique outperforms the <it>PITFALLS</it> method and the multiphase method for most test samples.</p>
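To make the terms in the abstract concrete, here is a minimal sketch of block-cyclic ownership and of the basic-cycle quantity lcm(s, t)/gcd(s, t). It is a brute-force enumeration, not the closed forms developed in the paper. The function names (`owner`, `basic_cycle`, `communication_sets`), the 0-based global indexing, and the use of a single processor count for both source and destination distributions are assumptions made for illustration.

```python
from math import gcd


def owner(index, block_size, num_procs):
    """Processor that owns a 0-based global index under BLOCK-CYCLIC(block_size).

    Assumption: blocks of block_size consecutive elements are dealt to
    processors 0..num_procs-1 in round-robin order.
    """
    return (index // block_size) % num_procs


def basic_cycle(s, t):
    """The quantity lcm(s, t) / gcd(s, t) that the abstract calls a basic-cycle."""
    g = gcd(s, t)
    lcm = s * t // g
    return lcm // g


def communication_sets(n, s, t, num_procs):
    """Map (source, destination) processor pairs to the global indices they exchange.

    Naive O(n) enumeration for illustration only; it does not use the
    paper's closed forms.
    """
    sets = {}
    for i in range(n):
        key = (owner(i, s, num_procs), owner(i, t, num_procs))
        sets.setdefault(key, []).append(i)
    return sets


if __name__ == "__main__":
    print(basic_cycle(2, 3))
    for pair, indices in sorted(communication_sets(12, 2, 3, 2).items()):
        print(pair, indices)
```

In this sketch, the ownership pattern under BLOCK-CYCLIC(s) repeats every s·P elements and the pattern under BLOCK-CYCLIC(t) every t·P elements, so the joint source/destination pattern repeats every P·lcm(s, t) elements. This periodicity is what makes it possible to derive communication sets from a small representative portion of the array rather than from every element.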
Keywords: data redistribution, basic-cycle calculation technique, PITFALLS method, multiphase method, distributed memory multicomputers.

Y. Chung, C. Hsu and S. Bai, "A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution," in IEEE Transactions on Parallel & Distributed Systems, vol. 9, no. , pp. 359-377, 1998.