Exploiting Parallelism Across Program Execution: A Unification Technique and Its Analysis
October 1990 (vol. 1 no. 4)
pp. 399-414

It is shown that the transformed programs so generated provide significant speedups over the original program on vector processors and vector multiprocessors. The parallelism that arises when multiple instances of a program are executed on simultaneously available data sets is exploited. This is in contrast to the existing approaches that aim at detecting parallelism within a program. The analytic model is used to prove the optimality of the complete first policy for block selection for a class of program graphs known as nonregressive graphs. Analytic and simulation models of the technique clearly indicate the speedups that could be achieved when several data sets are available simultaneously, as is the case in many fields of interest.

Index Terms:
parallelism; unification; source-to-source transformation; sequential programs; vector processors; vector multiprocessors; optimality; program graphs; nonregressive graphs; parallel programming; programming theory
Citation:
V.J. Rego, A.P. Mathur, "Exploiting Parallelism Across Program Execution: A Unification Technique and Its Analysis," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 4, pp. 399-414, Oct. 1990, doi:10.1109/71.80170