Compiler Support for Scalable and Efficient Memory Systems
November 2001 (vol. 50 no. 11)
pp. 1234-1247

Abstract: Technological trends require that future scalable microprocessors be decentralized. Applied to memory systems, these trends imply that the amount of cache accessible in a single cycle will shrink in future chip generations. A bank-exposed memory system composed of small, decentralized cache banks must therefore eventually replace the monolithic cache. This paper considers how to use such a memory system effectively for sequential programs. It presents Maps, the software technology central to bank-exposed architectures, that is, architectures with bank-exposed memory systems. Maps solves the problem of bank disambiguation: determining at compile time which bank a memory reference accesses. Bank disambiguation is important because it enables compile-time optimization for data locality, in which data is placed close to the computation that requires it. Two methods for bank disambiguation are presented: equivalence-class unification and modulo unrolling. Experimental results are presented using a compiler for the MIT Raw machine, a bank-exposed architecture that relies on the compiler to 1) manage its memory and 2) orchestrate its instruction-level parallelism and communication. Results on Raw using sequential codes demonstrate that bank disambiguation improves performance by a factor of 3 to 5 over using ILP alone.
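
As a rough illustration of the idea behind modulo unrolling, the sketch below models an array whose elements are interleaved across banks and compares which banks a loop's memory references touch before and after unrolling. It is not the paper's compiler implementation. The bank count, the low-order interleaving rule, the unit-stride loop, and the function names (`bank_of`, `banks_touched_rolled`, `banks_touched_unrolled`) are all assumptions chosen for illustration.

```python
NUM_BANKS = 4  # assumed number of cache banks (illustrative only)


def bank_of(index, num_banks=NUM_BANKS):
    """Assumed low-order interleaving: element i lives in bank i mod num_banks."""
    return index % num_banks


def banks_touched_rolled(n, num_banks=NUM_BANKS):
    """Banks touched by the single reference A[i] in the original loop
    `for i in range(n): use(A[i])`."""
    return {bank_of(i, num_banks) for i in range(n)}


def banks_touched_unrolled(n, num_banks=NUM_BANKS):
    """Banks touched by each reference after unrolling the loop by num_banks.

    The unrolled body contains the references A[i+0], ..., A[i+num_banks-1],
    with i advancing in steps of num_banks. Entry k of the result is the set
    of banks that reference A[i+k] touches over the whole loop.
    """
    per_reference = [set() for _ in range(num_banks)]
    for i in range(0, n, num_banks):
        for k in range(num_banks):
            if i + k < n:  # guard for trip counts not divisible by num_banks
                per_reference[k].add(bank_of(i + k, num_banks))
    return per_reference


if __name__ == "__main__":
    print(banks_touched_rolled(16))    # one reference, many banks
    print(banks_touched_unrolled(16))  # each reference maps to a single bank
```

In this model the original reference touches every bank, so its bank cannot be known statically. After unrolling by the bank count, each reference in the loop body always touches the same bank, which is the compile-time property that bank disambiguation needs.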


Index Terms:
Compiler, memory, bank disambiguation, memory parallelism, Maps, Raw.
Rajeev Barua, Walter Lee, Saman Amarasinghe, Anant Agarwal, "Compiler Support for Scalable and Efficient Memory Systems," IEEE Transactions on Computers, vol. 50, no. 11, pp. 1234-1247, Nov. 2001, doi:10.1109/12.966497