ASCII Text:
Nikola Vujic, Felipe Cabarcas, Marc Gonzalez, Alex Ramirez, Xavier Martorell, Eduard Ayguade, "DMA++: On the Fly Data Realignment for On-Chip Memories," IEEE Transactions on Computers, vol. 61, no. 2, pp. 237-250, February, 2012.
BibTeX:
@article{10.1109/TC.2010.255,
  author = {Nikola Vujic and Felipe Cabarcas and Marc Gonzalez and Alex Ramirez and Xavier Martorell and Eduard Ayguade},
  title = {DMA++: On the Fly Data Realignment for On-Chip Memories},
  journal = {IEEE Transactions on Computers},
  volume = {61},
  number = {2},
  issn = {0018-9340},
  year = {2012},
  pages = {237-250},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TC.2010.255},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks / ProCite / RefMan / EndNote:
TY - JOUR
JO - IEEE Transactions on Computers
TI - DMA++: On the Fly Data Realignment for On-Chip Memories
IS - 2
SN - 0018-9340
SP - 237
EP - 250
EPD - 237-250
A1 - Nikola Vujic
A1 - Felipe Cabarcas
A1 - Marc Gonzalez
A1 - Alex Ramirez
A1 - Xavier Martorell
A1 - Eduard Ayguade
PY - 2012
KW - DMA
KW - multicores
KW - alignment
KW - SIMD units
VL - 61
JA - IEEE Transactions on Computers
ER -
[1] T. Conte, P. Dubey, M. Jennings, R. Lee, A. Peleg, S. Rathnam, M. Schlansker, P. Song, and A. Wolfe, "Challenges to Combining General-Purpose and Multimedia Processors," Computer, vol. 30, no. 12, pp. 33-37, Dec. 1997.
[2] N. Slingerland and A.J. Smith, "Measuring the Performance of Multimedia Instruction Sets," IEEE Trans. Computers, vol. 51, no. 11, pp. 1317-1332, Nov. 2002.
[3] M. Alvarez, E. Salami, A. Ramirez, and M. Valero, "Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications," Proc. IEEE Int'l Symp. Performance Analysis of Systems and Software, pp. 62-71, 2007.
[4] A. Shahbahrami, B. Juurlink, and S. Vassiliadis, "Performance Impact of Misaligned Accesses in SIMD Extensions," Proc. 17th Ann. Workshop Circuits, Systems and Signal Processing, pp. 23-24, Nov. 2006.
[5] D. Boggs, A. Baktha, J. Hawkins, D. Marr, J. Miller, P. Roussel, R. Singhal, B. Toll, and K. Venkatraman, "The Microarchitecture of the Intel Pentium 4 Processor on 90nm Technology," Intel Technology J., vol. 8, no. 1, pp. 1-17, 2004.
[6] S. Stepanian, "SB-1 MIPS64 CPU Core," Embedded Processor Forum, 2000.
[7] User Manual SPRU732C: TMS320 C64x/C64x+ DSP CPU and Instruction Set Reference Guide, Texas Instruments, 2005.
[8] J. van de Waerdt et al., "The TM3270 Media-Processor," Proc. 38th Ann. IEEE/ACM Int'l Symp. Microarchitecture, pp. 331-342, 2005.
[9] A. Eichenberger, P. Wu, and K. O'Brien, "Vectorization for SIMD Architectures with Alignment Constraints," Proc. ACM SIGPLAN Conf. Programming Languages Design and Implementation, pp. 82-93, June 2004.
[10] P. Wu, A. Eichenberger, and A. Wang, "Efficient SIMD Code Generation for Runtime Alignment and Length Conversion," Proc. Int'l Symp. Code Generation and Optimization, pp. 153-164, 2005.
[11] A.E. Eichenberger, K. O'Brien, K. O'Brien, P. Wu, T. Chen, P.H. Oden, D.A. Prener, J.C. Shepherd, B. So, Z. Sura, A. Wang, T. Zhang, P. Zhao, and M. Gschwind, "Optimizing Compiler for the Cell Processor," Proc. 14th Int'l Conf. Parallel Architectures and Compilation Techniques, pp. 161-172, 2005.
[12] M. Gschwind, D. Erb, S. Manning, and M. Nutter, "An Open Source Environment for Cell Broadband Engine System Software," Computer, vol. 40, no. 6, pp. 37-47, June 2007.
[13] M. Gschwind, H.P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki, "Synergistic Processing in Cell's Multicore Architecture," IEEE Micro, vol. 26, no. 2, pp. 10-24, Mar./Apr. 2006.
[14] M. Kistler, M. Perrone, and F. Petrini, "Cell Multiprocessor Communication Network: Built for Speed," IEEE Micro, vol. 26, no. 3, pp. 10-23, May/June 2006.
[15] T. Ainsworth and T. Pinkston, "Characterizing the Cell EIB On-Chip Network," IEEE Micro, vol. 27, no. 5, pp. 6-14, Sept./Oct. 2007.
[16] A. Rico, F. Cabarcas, A. Quesada, M. Pavlovic, A.J. Vega, C. Villavieja, Y. Etsion, and A. Ramirez, "Scalable Simulation of Decoupled Accelerator Architectures," Technical Report UPC-DAC-RR-2010-14, UPC, http://gsi.ac.upc.edu/reports/2010/14tasksim.pdf, 2010.
[17] J. Kahle, M. Day, H. Hofstee, C. Johns, T. Maeurer, and D. Shippy, "Introduction to the Cell Multiprocessor," IBM J. Research and Development, vol. 49, nos. 4/5, pp. 589-604, 2005.
[18] H.P. Hofstee, "Power Efficient Processor Architecture and the Cell Processor," Proc. 11th Int'l Symp. High-Performance Computer Architecture, pp. 258-262, 2005.
[19] D. Hackenberg, "Fast Matrix Multiplication on Cell (SMP) Systems," http://www.tu-dresden.de/zih/cellmatmul, 2007.
[20] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, "LU Decomposition and Its Applications," Numerical Recipes in FORTRAN: The Art of Scientific Computing, pp. 34-42, Cambridge Univ. Press, 1992.
[21] A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, "A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures," Parallel Computing, vol. 35, no. 1, pp. 38-53, 2009.
[22] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, July 2002.
[23] P. Hart, "Nearest Neighbor Pattern Classification," IEEE Trans. Information Theory, vol. 13, no. 1, pp. 21-27, Jan. 1967.
[24] D. Bailey et al., "The NAS Parallel Benchmarks," Int'l J. High Performance Computing Applications, vol. 5, no. 3, pp. 63-73, 1991.
[25] P. Luszczek, D. Bailey, J. Dongarra, J. Kepner, R. Lucas, R. Rabenseifner, and D. Takahashi, "The HPC Challenge (HPCC) Benchmark Suite," Proc. ACM/IEEE Conf. Supercomputing, pp. 11-17, 2006.
[26] Block-Matching in Motion Estimation Algorithms Using Streaming SIMD Extensions 3, Intel Corporation, 2003.
[27] K. Diefendorff, P. Dubey, R. Hochsprung, and H. Scales, "AltiVec Extension to PowerPC Accelerates Media Processing," IEEE Micro, vol. 20, no. 2, pp. 85-95, Mar./Apr. 2000.
[28] D. Sweetman, See MIPS Run. Morgan Kaufmann, 2006.
[29] R. Sites, Alpha Architecture Reference Manual. Digital Press, 1998.
[30] D. Nuzman and R. Henderson, "Multi-Platform Auto-Vectorization," Proc. Int'l Symp. Code Generation and Optimization, pp. 281-294, 2006.
[31] G. Cheong and M. Lam, "An Optimizer for Multimedia Instruction Sets," Proc. Second SUIF Compiler Workshop, 1997.
[32] VAST-F/AltiVec: Automatic Fortran Vectorizer for PowerPC Vector Unit, Crescent Bay Software, 2004.
[33] S. Larsen, E. Witchel, and S. Amarasinghe, "Increasing and Detecting Memory Address Congruence," Proc. 11th Int'l Conf. Parallel Architectures and Compilation Techniques, pp. 18-29, 2002.
[34] G. Payá-Vayá, J. Martín-Langerwerf, S. Moch, and P. Pirsch, "An Enhanced DMA Controller in SIMD Processors for Video Applications," Proc. 22nd Int'l Conf. Architecture of Computing Systems, pp. 159-170, 2009.
[35] L. Seiler et al., "Larrabee: A Many-Core x86 Architecture for Visual Computing," Proc. Int'l Conf. Computer Graphics and Interactive Techniques, pp. 1-15, 2008.

