ASCII Text
A. S. Cassidy, A. G. Andreou, "Beyond Amdahl's Law: An Objective Function That Links Multiprocessor Performance Gains to Delay and Energy," IEEE Transactions on Computers, vol. 61, no. 8, pp. 1110-1126, Aug. 2012.

BibTeX
@article{10.1109/TC.2011.169,
  author = {A. S. Cassidy and A. G. Andreou},
  title = {Beyond Amdahl's Law: An Objective Function That Links Multiprocessor Performance Gains to Delay and Energy},
  journal = {IEEE Transactions on Computers},
  volume = {61},
  number = {8},
  issn = {0018-9340},
  year = {2012},
  pages = {1110-1126},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TC.2011.169},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}

RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE Transactions on Computers
TI  - Beyond Amdahl's Law: An Objective Function That Links Multiprocessor Performance Gains to Delay and Energy
IS  - 8
SN  - 0018-9340
SP  - 1110
EP  - 1126
EPD - 1110-1126
A1  - A. S. Cassidy
A1  - A. G. Andreou
PY  - 2012
KW  - parallel processing
KW  - microprocessor chips
KW  - single die area
KW  - Amdahl law
KW  - general objective function
KW  - parallel processing performance
KW  - system level
KW  - subsystem microarchitecture structures
KW  - memories
KW  - communications networks
KW  - microarchitectural elements
KW  - global system performance
KW  - energy-delay cost
KW  - chip multiprocessor architecture exploration
KW  - architectural parameters
KW  - optimal CMP architecture
KW  - architectural optimization
KW  - computer architectures
KW  - Delay
KW  - Program processors
KW  - Computer architecture
KW  - Parallel processing
KW  - Hidden Markov models
KW  - Algorithm design and analysis
KW  - Optimization
KW  - chip-multiprocessor architecture
KW  - Modeling
KW  - evaluation
KW  - design exploration
KW  - and optimization of multiprocessor systems
VL  - 61
JA  - IEEE Transactions on Computers
ER  -

[1] G.M. Amdahl, "Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities," Proc. AFIPS Spring Joint Computer Conf., 1967.
[2] J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach, second ed. Morgan Kaufmann Publishers, 2002.
[3] R.G. Brown, "Maximizing Beowulf Performance," Proc. Fourth Ann. Linux Showcase and Conf., 2000.
[4] J. Gustafson, "Reevaluating Amdahl's Law," Comm. ACM, vol. 31, no. 5, pp. 532-533, May 1988.
[5] S. Krishnaprasad, "Uses and Abuses of Amdahl's Law," J. Computing Sciences in Colleges, vol. 17, no. 2, pp. 288-293, Dec. 2001.
[6] G. Bell, J. Gray, and A. Szalay, "Petascale Computational Systems," Computer, vol. 39, no. 1, pp. 110-112, Jan. 2006.
[7] A. Szalay and G. Bell, "GrayWulf: Scalable Clustered Architecture for Data Intensive Computing," Proc. Hawaii Int'l Conf. System Sciences (HICSS), 2009.
[8] S. Borkar, "Thousand Core Chips: A Technology Perspective," Proc. ACM/IEEE Design Automation Conf. (DAC), 2007.
[9] M. Hill and M. Marty, "Amdahl's Law in the Multicore Era," Computer, vol. 41, no. 7, pp. 33-38, July 2008.
[10] J.M. Paul and B.H. Meyer, "Amdahl's Law Revisited for Single Chip Systems," Int'l J. Parallel Programming, vol. 35, no. 2, Apr. 2007.
[11] D.H. Woo and H. Lee, "Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era," Computer, vol. 41, no. 12, pp. 24-31, Dec. 2008.
[12] K. Olukotun, B. Nayfeh, L. Hammond, K. Wilson, and K. Chang, "The Case for a Single-Chip Multiprocessor," Proc. Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), 1996.
[13] L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," Proc. Int'l Symp. Computer Architecture, 2000.
[14] C. Mead and L. Conway, Introduction to VLSI Systems. Addison-Wesley Publishers, 1979.
[15] A.S. Cassidy and A.G. Andreou, "Analytical Methods for the Design and Optimization of Chip-Multiprocessor Architectures," Proc. 43rd Ann. Conf. Information Sciences and Systems, 2009.
[16] A.S. Cassidy, K. Yu, H. Zhou, and A.G. Andreou, "A High-Level Analytical Model for Application Specific CMP Design Exploration," Proc. Conf. Design Automation and Test in Europe (DATE), 2011.
[17] S. Young et al., The HTK Book. Univ. of Cambridge, 2009.
[18] J.A. Rice, Mathematical Statistics and Data Analysis, third ed. Duxbury Press, 2006.
[19] P. Kogge et al., "ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems," DARPA IPTO report, 2008.
[20] R. Gonzalez and M. Horowitz, "Energy Dissipation in General Purpose Microprocessors," IEEE J. Solid-State Circuits, vol. 31, no. 9, pp. 1277-1284, Sept. 1996.
[21] S. Przybylski, M. Horowitz, and J. Hennessy, "Characteristics of Performance-Optimal Multi-Level Cache Hierarchies," Proc. Int'l Symp. Computer Architecture (ISCA), 1989.
[22] A. Hartstein, V. Srinivasan, T. Puzak, and P. Emma, "On the Nature of Cache Miss Behavior: Is It Square Root of 2?" J. Instruction-Level Parallelism, vol. 10, 2008.
[23] L. Codrescu, M. Deb-Pant, T. Taha, J. Eble, S. Wills, and J. Meindl, "Exploring Microprocessor Architectures for Gigascale Integration," Proc. 20th Anniversary Conf. Advanced Research in VLSI, 1999.
[24] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A Framework for Architectural-Level Power Analysis and Optimizations," Proc. Int'l Symp. Computer Architecture (ISCA), 2000.
[25] D. Brooks, P. Bose, V. Srinivasan, M. Gschwind, P. Emma, and M. Rosenfield, "New Methodology for Early-Stage, Microarchitecture-Level Power-Performance Analysis of Microprocessors," IBM J. Research and Development, 2003.
[26] N. Vijaykrishnan, M. Kandemir, M. Irwin, H. Kim, and W. Ye, "Energy-Driven Integrated Hardware-Software Optimizations Using SimplePower," ACM SIGARCH Computer Architecture News, vol. 28, no. 2, pp. 95-106, May 2000.
[27] S. Thoziyoor, N. Muralimanohar, and J. Ahn, "CACTI 5.1," Technical Report HPL-2008-20, HP Laboratories, 2008.
[28] C.-L. Su and A. Despain, "Cache Design Trade-Offs for Power and Performance Optimization: A Case Study," Proc. Int'l Symp. Low Power Design (ISLPED), 1995.
[29] M. Kamble and K. Ghose, "Analytical Energy Dissipation Models for Low Power Caches," Proc. Int'l Symp. Low Power Electronics and Design, 1997.
[30] M. Kamble and K. Ghose, "Energy-Efficiency of VLSI Caches: A Comparative Study," Proc. Int'l Conf. VLSI Design, 1997.
[31] F. Pollack, "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies (Keynote Address)(Abstract Only)," Proc. ACM/IEEE Int'l Symp. Microarchitecture (MICRO 32), 1999.
[32] J. Schutz and C. Webb, "A Scalable X86 CPU Design for 90 nm Process," Proc. IEEE Int'l Solid-State Circuits Conf. (ISSCC), 2004.
[33] H. McIntyre et al., "A 4-MB On-Chip L2 Cache for a 90-nm 1.6-GHz 64-Bit Microprocessor," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 52-59, Jan. 2005.
[34] J. Chang et al., "A 130-nm Triple-Vt 9-MB Third-Level On-Die Cache for the 1.7-GHz Itanium 2 Processor," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 195-203, Jan. 2005.
[35] C. McNairy and R. Bhatia, "Montecito: A Dual-Core, Dual-Thread Itanium Processor," IEEE MICRO, vol. 25, no. 2, pp. 10-20, Mar./Apr. 2005.
[36] AMD, "AMD Website," www.amd.com/, 2010.
[37] Intel, "Intel Website," www.intel.com, 2010.
[38] J. Tendler, J. Dodson, J. Fields, H. Le, and B. Sinharoy, "POWER4 System Microarchitecture," IBM J. Research and Development, vol. 46, no. 1, pp. 5-25, Jan. 2002.
[39] R. Kalla, B. Sinharoy, and J. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor," IEEE MICRO, vol. 24, no. 2, pp. 40-47, Mar./Apr. 2004.
[40] H. Le, W. Starke, J. Fields, F. O'Connell, D. Nguyen, B. Ronchetti, W. Sauer, E. Schwarz, and M. Vaden, "IBM Power6 Microarchitecture," IBM J. Research and Development, vol. 51, no. 6, Nov. 2007.
[41] P. Conway and B. Hughes, "The AMD Opteron Northbridge Architecture," IEEE MICRO, vol. 27, no. 2, pp. 10-21, Mar./Apr. 2007.
[42] B. Stackhouse et al., "A 65 nm 2-Billion Transistor Quad-Core Itanium Processor," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 18-31, Jan. 2009.
[43] P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-Way Multithreaded SPARC Processor," IEEE MICRO, vol. 25, no. 2, pp. 21-29, Mar./Apr. 2005.
[44] M. Shah et al., "UltraSPARC T2: A Highly-Treaded, Power-Efficient, SPARC SOC," Proc. IEEE Asian Solid-State Circuits Conf., 2007.
[45] U. Nawathe, M. Hassan, K. Yen, A. Kumar, A. Ramachandran, and D. Greenhill, "Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 6-20, Jan. 2008.
[46] G. Konstadinidis et al., "Architecture and Physical Implementation of a Third Generation 65nm, 16 Core, 32 Thread Chip-Multithreading SPARC Processor," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 7-17, Jan. 2009.
[47] Azul, "Azul Website," www.azulsystems.com, 2010.
[48] J. Andrews and N. Baker, "Xbox 360 System Architecture," IEEE MICRO, vol. 26, no. 2, pp. 25-37, Mar./Apr. 2006.
[49] D. Pham et al., "Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 179-196, Jan. 2006.
[50] M. Kistler, M. Perrone, and F. Petrini, "Cell Multiprocessor Communication Network: Built for Speed," IEEE MICRO, vol. 26, no. 3, pp. 10-23, May/June 2006.
[51] L. Seiler et al., "Larrabee: A Many-Core x86 Architecture for Visual Computing," ACM Trans. Graphics, vol. 27, no. 3, article 18, Aug. 2008.
[52] S. Vangal et al., "An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS," Proc. IEEE Int'l Solid-State Circuits Conf. (ISSCC), 2007.
[53] S. Bell et al., "TILE64 - Processor: A 64-Core SoC with Mesh Interconnect," Proc. IEEE Int'l Solid-State Circuits Conf. (ISSCC), 2008.
[54] Clearspeed, "ClearSpeed Website," www.clearspeed.com, 2010.
[55] J. Owens, M. Houston, D. Luebke, S. Green, J. Stone, and J. Phillips, "GPU Computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
[56] Nvidia, "NVIDIA Website," www.nvidia.com, 2010.
[57] B. Khailany, T. Williams, J. Lin, E. Long, M. Rygh, D. Tovey, and W. Dally, "A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 202-213, Jan. 2008.
[58] D. Burger and T. Austin, "The SimpleScalar Tool Set, Version 2.0," SIGARCH Computer Architecture News, 1997.
[59] L. Zhao, R. Iyer, J. Moses, R. Illikkal, S. Makineni, and D. Newell, "Exploring Large-Scale CMP Architectures Using ManySim," IEEE MICRO, vol. 27, no. 4, pp. 21-33, July/Aug. 2007.
[60] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, "Simics: A Full System Simulation Platform," Computer, vol. 35, no. 2, pp. 50-58, Feb. 2002.
[61] C. Hughes, V. Pai, P. Ranganathan, and S. Adve, "Rsim: Simulating Shared-Memory Multiprocessors with ILP Processors," Computer, vol. 35, no. 2, pp. 40-49, Feb. 2002.
[62] M. Rosenblum, E. Bugnion, S. Devine, and S. Herrod, "Using the SimOS Machine Simulator to Study Complex Computer Systems," ACM Trans. Modeling and Computer Simulation, vol. 7, no. 1, pp. 78-103, Jan. 1997.
[63] T. Oh, H. Lee, K. Lee, and S. Cho, "An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor," Proc. IEEE Computer Soc. Ann. Symp. VLSI (ISVLSI), 2009.
[64] Y. Li, B. Lee, D. Brooks, Z. Hu, and K. Skadron, "CMP Design Space Exploration Subject to Physical Constraints," Proc. Int'l Symp. High-Performance Computer Architecture (HPCA), 2006.
[65] M. Monchiero, R. Canal, and A. González, "Design Space Exploration for Multicore Architectures: A Power/Performance/Thermal View," Proc. Int'l Conf. Supercomputing (ICS), 2006.
[66] Maplesoft, "Maple," www.maplesoft.com/productsMaple/, 2010.
[67] Sagemath, "Sage," www.sagemath.org/, 2010.
[68] D. Chandra, F. Guo, S. Kim, and Y. Solihin, "Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture," Proc. Int'l Symp. High-Performance Computer Architecture (HPCA), 2005.
[69] X. Chen and T. Aamodt, "A First-Order Fine-Grained Multithreaded Throughput Model," Proc. Int'l Symp. High Performance Computer Architecture (HPCA), 2009.
[70] D.J. Sorin, V.S. Pai, S.V. Adve, M.K. Vernon, and D.A. Wood, "Analytic Evaluation of Shared-Memory Systems with ILP Processors," ACM SIGARCH Computer Architecture News, vol. 26, no. 3, pp. 380-391, June 1998.

