Automatic Code Mapping on an Intelligent Memory Architecture
November 2001 (vol. 50 no. 11)
pp. 1248-1266

Abstract: This paper presents an algorithm to automatically map code on a generic intelligent memory system that consists of a high-end host processor and a simpler memory processor. To achieve high performance with this type of architecture, the code needs to be partitioned and scheduled so that each section is assigned to the processor on which it runs most efficiently. In addition, the two processors should overlap their execution as much as possible. With our algorithm, applications are mapped fully automatically using both static and dynamic information. Using a set of standard applications and a simulated architecture, we obtain average speedups of 1.7 for numerical applications and 1.2 for nonnumerical applications over a single host with plain memory. These speedups are very close to, and often higher than, the ideal speedups on a more expensive multiprocessor system composed of two identical host processors. Our work shows that heterogeneity can be cost-effectively exploited and represents one step toward effectively mapping code on heterogeneous intelligent memory systems.
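
The partitioning idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes each code section already has an estimated run time on each processor, and it ignores dependences, data movement, and scheduling. The names `Section`, `map_sections`, and `speedup_bounds`, and all the timing numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    host_time: float  # assumed estimated time on the host processor
    mem_time: float   # assumed estimated time on the memory processor


def map_sections(sections):
    """Assign each section to the processor with the lower estimated time."""
    return {
        s.name: "host" if s.host_time <= s.mem_time else "memory"
        for s in sections
    }


def speedup_bounds(sections, mapping):
    """Return (speedup with no overlap, speedup with perfect overlap).

    The baseline is every section running on the host alone.
    Real overlap is limited by dependences, which this sketch ignores.
    """
    baseline = sum(s.host_time for s in sections)
    host_busy = sum(s.host_time for s in sections if mapping[s.name] == "host")
    mem_busy = sum(s.mem_time for s in sections if mapping[s.name] == "memory")
    no_overlap = baseline / (host_busy + mem_busy)
    full_overlap = baseline / max(host_busy, mem_busy)
    return no_overlap, full_overlap


# Hypothetical timings, chosen only to exercise the functions.
sections = [
    Section("init", host_time=4.0, mem_time=2.0),
    Section("solve", host_time=6.0, mem_time=9.0),
    Section("reduce", host_time=2.0, mem_time=1.0),
]
mapping = map_sections(sections)
no_overlap, full_overlap = speedup_bounds(sections, mapping)
print(mapping)
print(no_overlap, full_overlap)
```

The two returned values bracket what such a mapping could achieve under these assumptions: the first assumes the processors never run at the same time, the second assumes their work overlaps completely.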

Index Terms:
Intelligent memory architecture, processing-in-memory, compilers, adaptive execution, performance prediction, heterogeneous system.
Yan Solihin, Jaejin Lee, Josep Torrellas, "Automatic Code Mapping on an Intelligent Memory Architecture," IEEE Transactions on Computers, vol. 50, no. 11, pp. 1248-1266, Nov. 2001, doi:10.1109/12.966498