<p><b>Abstract</b>—In this paper, we propose <it>Partition Scheduling with Prefetching (PSP)</it>, a method that combines loop pipelining with data prefetching. In PSP, the iteration space is first divided into regular partitions. A two-part schedule, consisting of an ALU part and a memory part, is then constructed and balanced to achieve high throughput. The two parts execute simultaneously, so remote memory latencies are overlapped with computation. We study the optimal partition shape and size so that a well-balanced overall schedule can be obtained. Experiments on DSP benchmarks show that the proposed methodology consistently produces optimal or near-optimal solutions.</p>
Prefetching, retiming, scheduling, partitioning, latency-hiding.
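
To make the balancing idea in the abstract concrete, the following is a minimal toy sketch, not the PSP algorithm itself. It assumes a hypothetical cost model in which the ALU part of a partition grows with its area and the memory part grows with its boundary, and it looks for the smallest square partition whose prefetch time is fully hidden behind computation. Every name and parameter here (`partition_times`, `smallest_balanced_square`, `alu_cycles_per_iter`, `prefetch_latency`, and so on) is an assumption made for illustration and does not come from the paper.

```python
from math import ceil


def partition_times(w, h, alu_cycles_per_iter=2, alus=2,
                    prefetch_latency=10, mem_units=1):
    """Return (alu_time, mem_time) in cycles for a w-by-h partition.

    Assumed toy model (not taken from the paper):
    - every iteration costs alu_cycles_per_iter cycles, spread over `alus` units;
    - the partition prefetches one remote operand per iteration along two
      of its edges, i.e. w + h prefetches, each costing prefetch_latency.
    """
    alu_time = ceil(w * h * alu_cycles_per_iter / alus)
    prefetches = w + h
    mem_time = ceil(prefetches * prefetch_latency / mem_units)
    return alu_time, mem_time


def smallest_balanced_square(max_side=64, **params):
    """Find the smallest square partition whose memory part is hidden.

    The two parts run simultaneously, so the partition's schedule length is
    max(alu_time, mem_time); memory latency is fully overlapped once
    alu_time >= mem_time.
    """
    for side in range(1, max_side + 1):
        alu_time, mem_time = partition_times(side, side, **params)
        if alu_time >= mem_time:
            return side, alu_time, mem_time
    return None  # no balanced partition within max_side


if __name__ == "__main__":
    print(partition_times(4, 4))        # small partition: memory-bound
    print(smallest_balanced_square())   # first size where ALU work covers prefetching
```

Under these assumed costs, small partitions are dominated by the memory part, and enlarging the partition shifts the balance toward the ALU part. The real method also has to account for partition shape, dependence vectors, and local memory capacity, none of which this sketch models.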

T. W. O'Neil, F. Chen and E. H. Sha, "Optimizing Overall Loop Schedules Using Prefetching and Partitioning," in IEEE Transactions on Parallel & Distributed Systems, vol. 11, pp. 604-614, 2000.