
Haigeng Wang, Alexandru Nicolau, Stephen Keung, Kai-Yeung (Sunny) Siu, "Computing Programs Containing Band Linear Recurrences on Vector Supercomputers," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 8, pp. 769-782, August 1996.  
DOI: 10.1109/71.532109  
ISSN: 1045-9219  
Publisher: IEEE Computer Society, Los Alamitos, CA, USA  
Keywords: band linear recurrences (BLRs), parallel evaluation of BLRs with resource constraints, programs with BLRs, parallel programming, vector supercomputer.  
Abstract: Many large-scale scientific and engineering computations, e.g., some of the Grand Challenge problems [1], spend a major portion of execution time in their core loops computing band linear recurrences (BLRs). Conventional compiler parallelization techniques [4] cannot generate scalable parallel code for this type of computation because they respect loop-carried dependences (LCDs) in programs, and there is a limited amount of parallelism in a BLR with respect to LCDs. For many applications, using library routines to replace the core BLR requires the separation of BLR from its dependent computation, which usually incurs significant overhead. In this paper, we present a new scalable algorithm, called the
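
To make the loop-carried dependence concrete, here is a minimal Python sketch of a band linear recurrence of order m, x[i] = c[i] + a[i][0]*x[i-1] + ... + a[i][m-1]*x[i-m], with x[k] taken as 0 for k < 0. This is not the algorithm the paper presents. The function names, argument layout, and zero initial values are assumptions made for illustration. The second function shows the classic prefix-computation view of the first-order case (in the spirit of [19] and [21]): each step is an affine map, and composing affine maps is associative, which is what allows a tree-shaped parallel evaluation. `itertools.accumulate` still runs sequentially here; it only demonstrates the reformulation.

```python
from itertools import accumulate


def blr_sequential(a, c, m):
    """Evaluate x[i] = c[i] + sum_{j=1..m} a[i][j-1] * x[i-j], with x[k] = 0 for k < 0."""
    n = len(c)
    x = [0] * n
    for i in range(n):
        acc = c[i]
        for j in range(1, m + 1):
            if i - j >= 0:
                # Loop-carried dependence: x[i] needs the m previous results.
                acc += a[i][j - 1] * x[i - j]
        x[i] = acc
    return x


def compose(f, g):
    """Compose affine maps x -> a*x + c: apply f first, then g."""
    a1, c1 = f
    a2, c2 = g
    return (a2 * a1, a2 * c1 + c2)


def first_order_by_prefix(a, c):
    """First-order case (m = 1) written as a prefix computation over affine maps."""
    prefixes = accumulate(zip(a, c), compose)
    # Each prefix map is applied to the assumed initial value x[-1] = 0,
    # so only its constant term survives.
    return [const for (_, const) in prefixes]
```

For a first-order input, `first_order_by_prefix(a, c)` should agree with `blr_sequential([[v] for v in a], c, 1)`. Because `compose` is associative, the prefixes could in principle be computed by any parallel prefix scheme rather than left to right.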
[1] R.K. Agarwal, "Computational Fluid Dynamics on Parallel Processors," a tutorial at the 1992 Sixth Int'l Conf. Supercomputing, Washington, D.C., McDonnell Douglas Research Laboratories, July 1992.
[2] Z. Ammarguellat and W. Harrison III, "Automatic Recognition of Induction Variable and Recurrence Relations by Abstract Interpretation," Proc. ACM SIGPLAN 1990 Conf. Programming Language Design and Implementation, pp. 283-295, White Plains, New York, June 20-22, 1990.
[3] U. Banerjee, S.C. Chen, D. Kuck, and R. Towle, "Time and Parallel Processor Bounds for Fortran-Like Loops," IEEE Trans. Computers, vol. 28, no. 9, pp. 660-670, Sept. 1979.
[4] U. Banerjee, R. Eigenmann, A. Nicolau, and D.A. Padua, "Automatic Program Parallelization," Proc. IEEE, vol. 81, Feb. 1993.
[5] D. Callahan, "Recognizing and Parallelizing Bounded Recurrences," Lecture Notes in Computer Science: Languages and Compilers for Parallel Computing, pp. 169-185, Springer-Verlag, 1992.
[6] S.C. Chen, D. Kuck, and A.H. Sameh, "Practical Parallel Band Triangular System Solvers," ACM Trans. Mathematical Software, vol. 4, pp. 270-277, Sept. 1978.
[7] H. Conn and L. Podrazik, "Parallel Recurrence Solvers for Vector and SIMD Supercomputers," Proc. 1992 Int'l Conf. Parallel Processing, vol. 3, pp. 88-95, Aug. 17-21, 1992.
[8] Convex Computer Corp., Convex Architecture Reference, Richardson, Texas, 1991.
[9] Convex Computer Corp., Convex Theory of Operation (C200 Series), Document No. 081005030000, second edition, Richardson, Texas, Sept. 1990.
[10] Convex Computer Corp., Convex SCILIB User's Guide, Document No. 710013630001, first edition, Richardson, Texas, Aug. 1991.
[11] Convex Computer Corp., Convex VECLIB User's Guide, Document No. 710011030001, sixth edition, Richardson, Texas, Aug. 1991.
[12] J.J. Dongarra, C.B. Moler, J.R. Bunch, and G.W. Stewart, Linpack Users' Guide, Chapter 7, SIAM, Philadelphia, 1979.
[13] R. Eigenmann, J. Hoeflinger, G. Jaxon, Z. Li, and D. Padua, "Restructuring Fortran Programs for Cedar," Proc. ICPP, Aug. 1991.
[14] F.E. Fich, "New Bounds for Parallel Prefix Circuits," Proc. 15th ACM STOC, pp. 100-109, 1983.
[15] D. Gajski, "An Algorithm for Solving Linear Recurrence Systems on Parallel and Pipelined Machines," IEEE Trans. Computers, vol. 30, no. 3, Mar. 1981.
[16] K.A. Gallivan, R.J. Plemmons, and A.H. Sameh, "Parallel Algorithms for Dense Linear Algebra Computations," SIAM Rev., vol. 32, no. 1, pp. 54-135, Mar. 1990.
[17] L. Hyafil and H.T. Kung, "The Complexity of Parallel Evaluation of Linear Recurrence," J. ACM, vol. 24, no. 3, pp. 513-521, July 1977.
[18] R. Karp, R. Miller, and S. Winograd, "The Organization of Computations for Uniform Recurrence Equations," J. ACM, vol. 14, July 1967.
[19] P. Kogge and H. Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Trans. Computers, vol. 22, no. 8, Aug. 1973.
[20] D. Kuck, The Structure of Computers and Computations, vol. 1. New York: John Wiley and Sons, 1978.
[21] R.E. Ladner and M.J. Fischer, "Parallel Prefix Computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[22] F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. San Mateo, Calif.: Morgan Kaufmann, 1992.
[23] F.H. McMahon, "The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range," Lawrence Livermore National Laboratory, Livermore, Calif., UCRL-53745, Dec. 1986.
[24] A. Nicolau and H. Wang, "Optimal Schedules for Parallel Prefix Computation with Bounded Resources," SIGPLAN Notices and Proc. Third ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Williamsburg, Va., Apr. 21-24, 1991.
[25] T. Peters, "Livermore Loops Coded in C," Kendall Square Research Corp., Latest File Modification: Oct. 22, 1992, available at http://www.netlib.org/benchmark/livermorec.
[26] V.P. Roychowdhury, "Derivation, Extensions, and Parallel Implementation of Regular Iterative Algorithms," PhD thesis, Dept. of Electrical Eng., Stanford Univ., Stanford, Calif., Dec. 1988.
[27] S. Pinter and R. Pinter, "Program Optimization and Parallelization Using Idioms," Conf. Record 18th ACM Symp. Principles of Programming Languages, Jan. 1991.
[28] A. Sameh and R. Brent, "Solving Triangular Systems on a Parallel Computer," SIAM J. Numerical Analysis, vol. 14, pp. 1,101-1,113, 1977.
[29] W. Shang and J.A.B. Fortes, "On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays," IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 5, pp. 350-363, May 1992.
[30] W. Shang and J.A.B. Fortes, "Independent Partitioning of Algorithms with Uniform Dependencies," IEEE Trans. Computers, vol. 41, no. 2, pp. 190-206, Feb. 1992.
[31] W. Shang and J.A.B. Fortes, "Time Optimal Linear Schedules for Algorithms with Uniform Dependencies," IEEE Trans. Computers, vol. 40, June 1991.
[32] M. Snir, "Depth-Size Trade-Offs for Parallel Prefix Computation," J. Algorithms, vol. 7, pp. 185-201, 1986.
[33] Y. Tanaka, "Compiling Techniques for First-Order Linear Recurrence," J. Supercomputing, vol. 4, no. 1, pp. 63-82, Mar. 1990.
[34] N.K. Tsao, "Solving Triangular System in Parallel is Accurate," Numerical Linear Algebra, Digital Signal Processing and Parallel Algorithms, pp. 633-638, G. Golub and P. Van Dooren, eds., NATO Series F: Computer and Systems Sciences, vol. 70, Springer-Verlag, 1991.
[35] H. Wang and A. Nicolau, "Speedup of Band Linear Recurrences in the Presence of Resource Constraints," Proc. Sixth Int'l Conf. Supercomputing, pp. 466-477, Washington, D.C., July 19-23, 1992.
[36] H. Wang and A. Nicolau, "Computing Programs Containing Band Linear Recurrences on Vector Supercomputers," TR 92-113, Dept. of Computer Science, Univ. of California at Irvine, Dec. 1992.