1997 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '97)
An Operation Placement and Scheduling Scheme for Cache and Communication Localities in Fine-Grain Parallel Architectures
Taipei, Taiwan
December 18-20, 1997
ISBN: 0-8186-8259-0
As on-chip hardware resources grow, concurrency offers a way to bridge the gap between the computational power that applications demand and the power that computer platforms provide. Although parallel systems are increasingly popular, they remain very difficult to program. In fact, most compilers require the programmer to specify how to partition data or how to map program code to the system's processors. For a program to run efficiently, cache locality is important because of the large speed gap between microprocessors and memory systems. It is also important to use local communication whenever possible, since it is cheaper, faster, and less power-hungry than global communication. To exploit these locality properties, we present a systematic operation placement and scheduling scheme for fine-grain parallel architectures. The key advantages are twofold: (1) The multiprojection method, which handles multidimensional parallelism systematically, can relieve the programmer of much of the burden of coding and data partitioning. (2) It addresses the memory/communication bandwidth bottleneck and can lead to faster program execution. In a design example for the motion estimation block-matching algorithm, which is the most computation- and memory-intensive part of video coding, our method reduces external memory accesses by two to three orders of magnitude.
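To make the memory-access concern concrete, the following is a minimal Python sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost. It is an illustration only and not the authors' placement and scheduling scheme. The function name `full_search`, the frame size, the block size `n`, the search range `p`, and the synthetic frame contents are all assumptions made for this example. The sketch counts how many reference-pixel reads a naive loop nest issues and how many distinct reference pixels it actually touches; the difference is the kind of redundancy that locality-aware placement aims to keep out of external memory.

```python
def full_search(cur, ref, bx, by, n, p):
    """Full-search block matching for one n x n block at column bx, row by.

    Returns (best_mv, naive_reads, unique_reads): the motion vector (dx, dy)
    with the lowest SAD, the number of reference-pixel fetches issued by the
    naive loop nest, and the number of distinct reference pixels touched.
    """
    h, w = len(ref), len(ref[0])
    best_sad, best_mv = None, None
    naive_reads = 0
    touched = set()
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            sad = 0
            for i in range(n):
                for j in range(n):
                    ry, rx = by + dy + i, bx + dx + j
                    sad += abs(cur[by + i][bx + j] - ref[ry][rx])
                    naive_reads += 1
                    touched.add((ry, rx))
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, naive_reads, len(touched)


# Synthetic 16 x 16 reference frame (hypothetical data for illustration).
SIZE = 16
ref = [[(31 * y + 17 * x) % 251 for x in range(SIZE)] for y in range(SIZE)]
# Current frame: the reference shifted by one column and two rows.
cur = [[ref[(y + 2) % SIZE][(x + 1) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]

mv, naive, unique = full_search(cur, ref, bx=6, by=6, n=4, p=2)
print(mv, naive, unique)
```

In this toy configuration every reference pixel in the search window is fetched several times on average. With the larger blocks and search ranges used in practical video coding, and with overlapping search windows of neighboring blocks, the redundancy grows considerably; the sketch does not attempt to reproduce the reduction reported in the abstract.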
Index Terms:
Parallel compiler, operation placement and scheduling, performance optimization of instruction-level parallelism, VLSI array processor design methodology, multi-dimensional projection, multiprojection.
Citation:
Yen-Kuang Chen, S. Y. Kung, "An Operation Placement and Scheduling Scheme for Cache and Communication Localities in Fine-Grain Parallel Architectures," ispan, pp. 390, 1997 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '97), 1997