Pradeep K. Dubey, George B. Adams, Michael J. Flynn, "Evaluating Performance Tradeoffs Between Fine-Grained and Coarse-Grained Alternatives," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 1, pp. 17-27, January 1995. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/71.363414
The model shows that, as memory access time increases, the number of pipelines (or processors) that maximizes throughput becomes increasingly sensitive to the ratio of memory access time to network access delay. Further, optimum throughput varies nonlinearly with interiteration dependence distance, whereas the corresponding optimum number of processors varies linearly. The predictions of the analytical model agree with similar published results obtained using simulation-based techniques.

