A VLIW Architecture for a Trace Scheduling Compiler
IEEE Transactions on Computers, August 1988 (vol. 37, no. 8), pp. 967-979
A VLIW (very long instruction word) architecture machine called the TRACE has been built along with its companion Trace Scheduling compacting compiler. This machine has three hardware configurations, capable of executing 7, 14, or 28 operations simultaneously. The 'seven-wide' achieves a performance improvement of a factor of five or six for a wide range of scientific code, compared to machines

Index Terms:
VLIW architecture; trace scheduling compiler; very long instruction word; TRACE; run-time resource usage; performance results; computer architecture; program compilers; scheduling.
R.P. Colwell, R.P. Nix, J.J. O'Donnell, D.P. Papworth, P.K. Rodman, "A VLIW Architecture for a Trace Scheduling Compiler," IEEE Transactions on Computers, vol. 37, no. 8, pp. 967-979, Aug. 1988, doi:10.1109/12.2247