Performance Tradeoffs in Multithreaded Processors
September 1992 (vol. 3 no. 5)
pp. 525-539
An analytical performance model for multithreaded processors that includes cache interference, network contention, context-switching overhead, and data-sharing effects is presented. The model is validated through the author's simulations and by comparison with previously published simulation results. The results indicate that processors can benefit substantially from multithreading, even in systems with small caches, provided sufficient network bandwidth exists. Caches that are much larger than the working-set sizes of individual processes yield close to full processor utilization with as few as two to four contexts. Smaller caches require more contexts to keep the processor busy, while caches that are comparable in size to the working sets of individual processes cannot achieve high utilization regardless of the number of contexts. Increased network contention due to multithreading has a major effect on performance. The available network bandwidth and the context-switching overhead limit the best possible utilization.
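The saturation behavior described in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's model: it assumes a fixed miss rate `m` (misses per useful cycle), a fixed miss latency `T`, and a fixed context-switch cost `C`, and it ignores the cache interference, network contention, and data-sharing effects that the paper accounts for. The function name and parameters are hypothetical.

```python
def utilization(p: int, m: float, T: float, C: float) -> float:
    """Toy estimate of processor utilization with p hardware contexts.

    Assumptions (not taken from the paper): m, T and C are constants
    that do not change as p grows.
    """
    if p < 1:
        raise ValueError("need at least one context")
    # Below saturation: each context does 1/m useful cycles, then waits T.
    linear = p / (1.0 + m * T)
    # At saturation: latency is fully hidden; only switch overhead remains.
    saturated = 1.0 / (1.0 + m * C)
    return min(linear, saturated)


if __name__ == "__main__":
    for contexts in range(1, 6):
        u = utilization(contexts, m=0.02, T=50.0, C=10.0)
        print(f"contexts={contexts} utilization={u:.3f}")
```

Under these assumptions, adding contexts raises utilization only until the context-switching term caps it, which mirrors the abstract's closing observation in a much cruder form.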

Index Terms:
multithreaded processors; cache interference; network contention; context-switching overhead; data-sharing; network bandwidth; caches; buffer storage; multiprocessing systems; multiprocessor interconnection networks; parallel algorithms; parallel programming; performance evaluation; storage management; switching theory
Citation:
A. Agarwal, "Performance Tradeoffs in Multithreaded Processors," IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 5, pp. 525-539, Sept. 1992, doi:10.1109/71.159037