<p><b>Abstract</b>—In many scientific applications, dynamic array redistribution is usually required to enhance the performance of an algorithm. In this paper, we present a <it>generalized basic-cycle calculation</it> (<it>GBCC</it>) method that efficiently redistributes an array from a <tt>BLOCK-CYCLIC</tt>(<it>s</it>) distribution over <it>P</it> processors to a <tt>BLOCK-CYCLIC</tt>(<it>t</it>) distribution over <it>Q</it> processors. In the <it>GBCC</it> method, each processor first computes the source/destination processor/data sets of the array elements in the first generalized basic-cycle of the local array it owns. A generalized basic-cycle is defined as <tmath>$\mathrm{lcm}(sP,\;tQ)/(\gcd(s,t)\times P)$</tmath> in the source distribution and <tmath>$\mathrm{lcm}(sP,\;tQ)/(\gcd(s,t)\times Q)$</tmath> in the destination distribution. From these source/destination processor/data sets, a processor constructs packing/unpacking pattern tables that minimize data-movement operations. Because every generalized basic-cycle has the same communication pattern, a processor can then pack and unpack array elements efficiently by reusing these tables. To evaluate the performance of the <it>GBCC</it> method, we implemented it on an IBM SP2 parallel machine, along with the <it>PITFALLS</it> and <it>ScaLAPACK</it> methods, and we present cost models for all three methods. The experimental results show that the <it>GBCC</it> method outperforms both the <it>PITFALLS</it> and <it>ScaLAPACK</it> methods for all test samples. We also briefly describe how the <it>GBCC</it> method extends to multidimensional array redistributions.</p>
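The packing/unpacking idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' GBCC implementation: it simulates all processors in a single process instead of exchanging messages, uses 0-based global indices, and assumes (our reading of the formula) that a generalized basic-cycle is counted in chunks of gcd(s, t) contiguous elements, since ownership can change only at multiples of s or t. All helper names (`owner`, `local_index`, `pattern_table`, `pack`, `unpack`, `redistribute`) are hypothetical. Each processor builds its table once for the first cycle and replays it, shifted by lcm(sP, tQ)/P (source) or lcm(sP, tQ)/Q (destination) local elements, for every later cycle.

```python
from math import gcd

# Sketch only: 0-based global indices, single-process simulation of the
# P-to-Q exchange. All function names are hypothetical.


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def owner(i: int, blk: int, nprocs: int) -> int:
    """Processor owning global index i under BLOCK-CYCLIC(blk) over nprocs."""
    return (i // blk) % nprocs


def local_index(i: int, blk: int, nprocs: int) -> int:
    """Offset of global index i inside its owner's local array."""
    return (i // (blk * nprocs)) * blk + i % blk


def gbc_lengths(s: int, P: int, t: int, Q: int) -> tuple[int, int]:
    """Source/destination generalized basic-cycle lengths, in gcd(s, t)-element chunks."""
    period = lcm(s * P, t * Q)
    g = gcd(s, t)
    return period // (g * P), period // (g * Q)


def pattern_table(me, my_blk, my_n, other_blk, other_n, period):
    """(local_start, peer) for each gcd-sized chunk that processor `me` owns in the
    first generalized basic-cycle; chunks never straddle a peer boundary."""
    g = gcd(my_blk, other_blk)
    table = []
    for block in range(me, period // my_blk, my_n):  # my blocks in one period
        for off in range(0, my_blk, g):
            i = block * my_blk + off
            table.append((local_index(i, my_blk, my_n), owner(i, other_blk, other_n)))
    return table


def pack(me, local, s, P, t, Q):
    """Build one message per destination by replaying the first-cycle table."""
    period, g = lcm(s * P, t * Q), gcd(s, t)
    table = pattern_table(me, s, P, t, Q, period)
    msgs = {q: [] for q in range(Q)}
    for base in range(0, len(local), period // P):  # one iteration per cycle
        for start, q in table:
            lo = base + start
            msgs[q].extend(local[lo:lo + g])  # slice is cut off at the array end
    return msgs


def unpack(me, inbox, n_local, s, P, t, Q):
    """Scatter the received messages into the destination local array."""
    period, g = lcm(s * P, t * Q), gcd(s, t)
    table = pattern_table(me, t, Q, s, P, period)
    out = [None] * n_local
    cursor = {p: 0 for p in range(P)}  # next unread position in each message
    for base in range(0, n_local, period // Q):
        for start, p in table:
            lo = base + start
            hi = min(lo + g, n_local)
            if lo >= hi:
                continue
            k = cursor[p]
            out[lo:hi] = inbox[p][k:k + hi - lo]
            cursor[p] = k + hi - lo
    return out


def redistribute(N, s, P, t, Q):
    """Redistribute the array [0, 1, ..., N-1]; each value equals its global index."""
    src = [[i for i in range(N) if owner(i, s, P) == p] for p in range(P)]
    sent = [pack(p, src[p], s, P, t, Q) for p in range(P)]
    dst = []
    for q in range(Q):
        n_local = sum(1 for i in range(N) if owner(i, t, Q) == q)
        dst.append(unpack(q, {p: sent[p][q] for p in range(P)}, n_local, s, P, t, Q))
    return dst
```

For example, `redistribute(10, 2, 2, 3, 2)` returns `[[0, 1, 2, 6, 7, 8], [3, 4, 5, 9]]`: each destination processor ends up with exactly the global indices it owns under BLOCK-CYCLIC(3) over two processors. Because sender and receiver both traverse their chunks in increasing global order, no index information has to travel with the data in this sketch.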
Keywords: Redistribution, generalized basic-cycle calculation method, distributed memory multicomputers.

C. Hsu, Y. Chung, C. Yang and S. Bai, "A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution," in IEEE Transactions on Parallel & Distributed Systems, vol. 11, pp. 1201–1216, 2000.