Ding-Kai Chen, Pen-Chung Yew, "On Effective Execution of Nonuniform DOACROSS Loops," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 5, pp. 463-476, May 1996. doi: 10.1109/71.503771
Abstract—It is extremely difficult to parallelize

