Requirements for Optimal Execution of Loops with Tests
September 1992 (vol. 3 no. 5)
pp. 573-581
Both the efficient execution of branch-intensive code and knowledge of the bounds on that execution are important issues in computing in general and in supercomputing in particular. Prior work has suggested that the hardware needed to execute code with branches optimally depends exponentially on the total number of dynamic branches executed, a number at least proportional to the number of loop iterations. For classes of code taking at least one cycle per iteration to execute, this is not the case. For loops containing one test (normally in the form of a Boolean recurrence of order one), it is shown that the necessary hardware varies from exponential to polynomial in the length L of the dependence cycle, while execution time varies from one time cycle per iteration to fewer than L time cycles per iteration; the variation depends on the specific code dependences. These results bring the eager evaluation of imperative code closer to fruition.
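
To make the abstract's vocabulary concrete, the following minimal sketch shows a loop with one test whose outcome is carried into the next iteration, which is one reading of a Boolean recurrence of order one. It is a hypothetical illustration, not the paper's construction: the function names, the XOR update rule, and the sign-flip branch body are all assumptions chosen for brevity. The eager variant computes each iteration for both possible incoming test values and then selects the real one, and the last helper counts the paths needed if every one of n dynamic branches were speculated down both sides, the exponential growth attributed above to prior work.

```python
def sequential(xs, threshold):
    """Loop with one test; the test result b feeds the next iteration's test."""
    b = False  # loop-carried Boolean: b_i depends only on b_(i-1) and x_i
    out = []
    for x in xs:
        b = (x > threshold) ^ b      # illustrative recurrence (assumption)
        out.append(x if b else -x)   # branch controlled by the test
    return out


def eager(xs, threshold):
    """Evaluate every iteration for both incoming values of b, then select."""
    # Phase 1: the per-iteration work is independent of earlier iterations,
    # so idealized hardware could perform all of it concurrently.
    table = []
    for x in xs:
        entry = {}
        for b_in in (False, True):
            b_out = (x > threshold) ^ b_in
            entry[b_in] = (b_out, x if b_out else -x)
        table.append(entry)
    # Phase 2: a serial chain of selections resolves the actual path.
    b = False
    out = []
    for entry in table:
        b, value = entry[b]
        out.append(value)
    return out


def paths_if_fully_speculated(n_branches):
    """Paths needed when both outcomes of each of n branches are followed."""
    return 2 ** n_branches
```

In this toy case the eager variant needs only two speculative copies per iteration because the single carried Boolean is the only dependence between iterations; how the required hardware scales for real code dependences is the subject of the article itself.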

Index Terms: loop iterations; loops with tests; branch intensive code; dynamic branches; Boolean recurrence; order one; dependence cycle; time cycle; imperative code; parallel programming
A.K. Uht, "Requirements for Optimal Execution of Loops with Tests," IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 5, pp. 573-581, Sept. 1992, doi:10.1109/71.159040