Nikola Vujic, Marc Gonzàlez, Xavier Martorell, Eduard Ayguadé, "Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 4, pp. 494-505, Apr. 2010, doi:10.1109/TPDS.2009.97.
Keywords: multicore processor, local memories, software cache, prefetch code generation.