Compile-Time Techniques for Improving Scalar Access Performance in Parallel Memories
April 1991 (vol. 2 no. 2)
pp. 138-148

This paper presents compile-time techniques for allocating scalar values to memory modules so that run-time memory-access conflicts are limited. The allocation approach applies to those instruction operands that can be predicted at compile time. Here an instruction consists of multiple operations, together with their operands, that execute in parallel. Distributing values across modules cannot eliminate every conflict, so the paper also develops algorithms that schedule data transfers among memory modules to avoid the remaining conflicts. The techniques have been implemented in a compiler for a reconfigurable long instruction word (LIW) architecture. Experimental results show that a very high percentage of memory-access conflicts can be avoided by scheduling a very small number of data transfers.
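The allocation problem described in the abstract can be illustrated with a minimal sketch. It assumes a graph-coloring formulation: scalar values that the same long instruction accesses in parallel should sit in different memory modules, and modules play the role of colors. It also assumes a simple greedy heuristic that visits high-degree values first. The function names `build_conflict_graph`, `allocate`, and `residual_conflicts` and the greedy strategy itself are hypothetical illustrations, not the paper's algorithms. Conflicts the greedy pass cannot avoid are only reported here. These residual conflicts are the kind that, in the paper's approach, are handled by scheduling data transfers.

```python
from itertools import combinations


def build_conflict_graph(instructions):
    """Map each scalar to the set of scalars it is accessed alongside.

    Assumption (illustrative): each instruction is given as the collection
    of scalar operands that its parallel operations read together.
    """
    graph = {}
    for operands in instructions:
        unique = set(operands)
        for v in unique:
            graph.setdefault(v, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def allocate(instructions, num_modules):
    """Greedily assign each scalar to a memory module (hypothetical heuristic)."""
    graph = build_conflict_graph(instructions)
    module_of = {}
    # Visit highest-degree values first, breaking ties by name.
    for v in sorted(graph, key=lambda x: (-len(graph[x]), x)):
        # For each module, count already-placed neighbours that live there.
        clashes = [
            sum(1 for n in graph[v] if module_of.get(n) == m)
            for m in range(num_modules)
        ]
        # Pick the first module with the fewest clashes. A nonzero count
        # means the conflict is unavoidable by placement alone.
        module_of[v] = clashes.index(min(clashes))
    return module_of


def residual_conflicts(instructions, module_of):
    """Indices of instructions in which two distinct operands share a module."""
    bad = []
    for i, operands in enumerate(instructions):
        modules = [module_of[v] for v in set(operands)]
        if len(modules) != len(set(modules)):
            bad.append(i)
    return bad


if __name__ == "__main__":
    instrs = [("a", "b"), ("b", "c"), ("a", "c")]
    placement = allocate(instrs, 2)
    print(placement, residual_conflicts(instrs, placement))
```

With three values that are pairwise accessed together but only two modules, no placement avoids every conflict. The sketch reports one remaining conflicting instruction. With three modules, the same program has none.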


Index Terms: storage allocation; scalar values; data transfers; compiler; reconfigurable long instruction word architecture; memory access conflicts; parallel processing; program compilers
Citation:
R. Gupta, M.L. Soffa, "Compile-Time Techniques for Improving Scalar Access Performance in Parallel Memories," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 2, pp. 138-148, April 1991, doi:10.1109/71.89060