The Strict Time Lower Bound and Optimal Schedules for Parallel Prefix with Resource Constraints
November 1996 (vol. 45 no. 11)
pp. 1257-1271

Abstract

Prefix computation is a basic operation at the core of many important applications, e.g., some of the Grand Challenge problems, circuit design, digital signal processing, graph optimizations, and computational geometry. In this paper, we present new and strict time-optimal parallel schedules for prefix computation with resource constraints under the concurrent-read-exclusive-write (CREW) parallel random access machine (PRAM) model. For prefix of N elements on p processors (p independent of N) when N > p(p + 1)/2, we derive Harmonic Schedules that achieve the strict optimal time (steps), $\left\lceil 2(N-1)/(p+1) \right\rceil$. We also derive Pipelined Schedules that have better program-space efficiency than the Harmonic Schedules, yet require only a small constant number of steps more than the optimal time achieved by the Harmonic Schedules. Both the Harmonic Schedules and the Pipelined Schedules are simple and easy to implement. For prefix of N elements on p processors (p independent of N) where N ≤ p(p + 1)/2, the Harmonic Schedules are not time-optimal. For these cases, we establish an optimization method for determining key parameters of time-optimal schedules, based on connections between the structure of parallel prefix and Pascal's triangle. Using the derived parameters, we devise an algorithm to construct such schedules. For a restricted class of values of N and p, we prove that the constructed schedules are strictly time-optimal. We also give strong empirical evidence that our algorithm constructs strict time-optimal schedules for all cases where N ≤ p(p + 1)/2.
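As a rough illustration of the quantities named in the abstract, the following Python sketch computes a sequential prefix (scan) over an associative operator and evaluates the stated step count $\left\lceil 2(N-1)/(p+1) \right\rceil$ for the case N > p(p + 1)/2. The function names `prefix` and `harmonic_steps` are hypothetical and chosen only for this example; the sketch does not construct the Harmonic or Pipelined Schedules themselves.

```python
from itertools import accumulate
from operator import add


def prefix(values, op=add):
    """Sequential prefix (scan): result[i] = values[0] op ... op values[i].

    `op` is assumed to be associative, as parallel prefix requires.
    """
    return list(accumulate(values, op))


def harmonic_steps(n, p):
    """Evaluate ceil(2(N-1)/(p+1)), the optimal step count quoted in the
    abstract for N elements on p processors when N > p(p+1)/2.

    This only evaluates the formula; it does not build a schedule.
    """
    if p < 1 or n < 1:
        raise ValueError("n and p must be positive integers")
    if n <= p * (p + 1) // 2:
        # The abstract states the formula only for N > p(p+1)/2.
        raise ValueError("formula is stated only for N > p(p+1)/2")
    # Integer ceiling division without floating point.
    return -(-2 * (n - 1) // (p + 1))


if __name__ == "__main__":
    print(prefix([1, 2, 3, 4]))   # running sums of the input
    print(harmonic_steps(16, 3))  # step count for N = 16, p = 3
```

For N = 16 and p = 3, the condition holds (16 > 6) and the formula gives ⌈30/4⌉ = 8 steps, compared with the 15 operations a single processor needs.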

[1] R.K. Agarwal, "Computational Fluid Dynamics on Parallel Processors," tutorial, McDonnell Douglas Research Laboratories, Proc. Sixth ACM SigArch Int'l Conf. Supercomputing, Washington, D.C., July 1992.
[2] A.V. Aho, R. Sethi, and J.D. Ullman, Compilers, Principles, Techniques and Tools. New York: Addison-Wesley, 1985.
[3] G. Almasi and A. Gottlieb, Highly Parallel Computing, chap. 4. Benjamin/Cummings, 1989.
[4] U. Banerjee, R. Eigenmann, A. Nicolau, and D.A. Padua, "Automatic Program Parallelization," Proc. IEEE, vol. 81, Feb. 1993.
[5] A. Bilgory and D. Gajski, "A Heuristic for Suffix Solutions," IEEE Trans. Computers, vol. 35, no. 1, Jan. 1986.
[6] R. Cole and U. Vishkin, "Faster Optimal Parallel Prefix Sums and List Ranking," Information and Computation, vol. 81, no. 3, pp. 334-352, 1989.
[7] O. Egecioglu and C.K. Koc, "Parallel Prefix Computation with Few Processors," Computers and Mathematics with Applications, vol. 24, no. 4, pp. 77-84, 1992.
[8] F.E. Fich, "New Bounds for Parallel Prefix Circuits," Proc. 15th ACM STOC, pp. 100-109, 1983.
[9] C. Kruskal, L. Rudolph, and M. Snir, "The Power of Parallel Prefix," IEEE Trans. Computers, vol. 34, no. 10, Oct. 1985.
[10] D. Kuck, The Structure of Computers and Computations, vol. 1. New York: John Wiley and Sons, 1978.
[11] P. Kogge and H. Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Trans. Computers, vol. 22, no. 8, Aug. 1973.
[12] R.E. Ladner and M.J. Fischer, "Parallel Prefix Computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[13] S. Lakshmivarahan and S.K. Dhall, Parallel Computing Using the Prefix Problem. Oxford Univ. Press, 1994.
[14] F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. San Mateo, Calif.: Morgan Kaufmann, 1992.
[15] Y. Muraoka, "Parallelism Exposure and Exploitation in Programs," PhD thesis, Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign, Report No. 424, Feb. 1971.
[16] A. Nicolau and H. Wang, "Optimal Schedules for Parallel Prefix Computation with Bounded Resources," SIGPLAN Notices and Proc. Third ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Williamsburg, Va., Apr. 21-24, 1991.
[17] Y. Ofman, "On the Algorithmic Complexity of Discrete Functions," Cybernetics and Control Theory, Soviet Physics Doklady, vol. 7, no. 7, pp. 589-591, Jan. 1963.
[18] M. Snir, "Depth-Size Trade-Offs for Parallel Prefix Computation," J. Algorithms, vol. 7, pp. 185-201, 1986.
[19] H. Wang, "Parallelization of Programs Containing Loop-Carried Dependences with Resource Constraints," PhD thesis, Dept. of Information and Computer Science, Univ. of California, Irvine, Sept. 1994.

Index Terms:
Parallel prefix computation, scan operator, resource-constrained parallel algorithms, strict time-optimal schedules, loop parallelization, loop-carried dependences, associative operations, tree-height reduction, Pascal's Triangle, combinatorial optimization.
Haigeng Wang, Alexandru Nicolau, Kai-Yeng S. Siu, "The Strict Time Lower Bound and Optimal Schedules for Parallel Prefix with Resource Constraints," IEEE Transactions on Computers, vol. 45, no. 11, pp. 1257-1271, Nov. 1996, doi:10.1109/12.544482