Instruction Window Size Trade-Offs and Characterization of Program Parallelism
April 1994 (vol. 43 no. 4)
pp. 431-442

Detecting independent operations is a prime objective for computers that are capable of issuing and executing multiple operations simultaneously. The number of instructions that are simultaneously examined to find those that are independent is the scope of concurrency detection. The authors present an analytical model for predicting the performance impact of varying the scope of concurrency detection as a function of available resources, such as the number of pipelines in a superscalar architecture. The model can show where a performance bottleneck lies: insufficient resources to exploit discovered parallelism, insufficient instruction stream parallelism, or insufficient scope of concurrency detection. The cost associated with speculative execution is examined via a set of probability distributions that characterize the inherent parallelism in the instruction stream. These results were derived using traces from the Multiflow TRACE SCHEDULING compacting FORTRAN 77 and C compilers. The experiments provide misprediction delay estimates for 11 common application-level benchmarks under scope constraints, assuming speculative, out-of-order execution and run-time scheduling. The throughput prediction of the analytical model is shown to be close to the measured static throughput of the compiler output.
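
The interplay between scope of concurrency detection and available pipelines can be illustrated with a small simulation. The sketch below is not the authors' analytical model; it is a toy under stated assumptions: every instruction has unit latency, there are no branches and no speculation, and the names `simulate`, `TRACE`, `window`, and `pipelines` are hypothetical and chosen only for this illustration.

```python
def simulate(deps, window, pipelines):
    """Return instructions per cycle for a toy issue model.

    deps[i] lists the earlier instructions that instruction i depends on.
    Each cycle, only the first `window` un-issued instructions (in program
    order) are examined, and at most `pipelines` ready ones are issued.
    Assumption: unit latency, so a result is usable in the next cycle.
    """
    n = len(deps)
    issue_cycle = [None] * n      # cycle in which each instruction issued
    pending = list(range(n))      # un-issued instructions, program order
    cycle = 0
    while pending:
        cycle += 1
        issued = []
        for i in pending[:window]:            # scope of concurrency detection
            if len(issued) == pipelines:      # resource limit reached
                break
            # Ready only if every producer issued in an earlier cycle.
            if all(issue_cycle[d] is not None for d in deps[i]):
                issued.append(i)
        for i in issued:
            issue_cycle[i] = cycle
        pending = [i for i in pending if issue_cycle[i] is None]
    return n / cycle


# Hypothetical trace: two independent dependence chains of four instructions,
# 0 -> 1 -> 2 -> 3 followed in program order by 4 -> 5 -> 6 -> 7.
TRACE = [[], [0], [1], [2], [], [4], [5], [6]]

if __name__ == "__main__":
    for window, pipelines in [(1, 2), (4, 2), (8, 2), (8, 1), (8, 4)]:
        ipc = simulate(TRACE, window, pipelines)
        print(f"window={window} pipelines={pipelines} throughput={ipc:.2f}")
```

On this trace, each of the three bottlenecks named in the abstract shows up in turn. A window of 1 hides the second chain, so scope is the limit. A single pipeline wastes the parallelism that a window of 8 exposes, so resources are the limit. Widening to four pipelines gains nothing over two, because the trace contains only two independent chains.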

Index Terms:
program compilers; scheduling; concurrency control; parallel programming; performance evaluation; instruction window size; trade-offs; characterization; program parallelism; concurrency detection; performance impact; performance bottleneck; parallelism; instruction stream parallelism; probability distributions; inherent parallelism; Multiflow TRACE SCHEDULING; compilers; delay estimates; scope constraints; run time scheduling; throughput prediction.
Citation:
P.K. Dubey, G.B. Adams, III, M.J. Flynn, "Instruction Window Size Trade-Offs and Characterization of Program Parallelism," IEEE Transactions on Computers, vol. 43, no. 4, pp. 431-442, April 1994, doi:10.1109/12.278481