High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm
December 2005 (vol. 16 no. 12)
pp. 1132-1142
Abstract—This paper presents a cost-effective and high-performance dual-thread VLIW processor model. The dual-thread VLIW processor model is a low-cost subset of the Weld architecture paradigm. It supports one main thread and one speculative thread running simultaneously in a VLIW processor with a register file and a fetch unit per thread along with memory disambiguation hardware for speculative load and store operations. This paper analyzes the performance impact of the dual-thread VLIW processor, which includes analysis of migrating disambiguation hardware for speculative load operations to the compiler and of the sensitivity of the model to the variation of branch misprediction, second-level cache miss penalties, and register file copy time. Up to 34 percent improvement in performance can be attained using the dual-thread VLIW processor when compared to a single-threaded VLIW processor model.
Index Terms:
Multithreaded processors, VLIW architectures, modeling of computer architecture.
Citation:
Emre Özer, Thomas M. Conte, "High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 12, pp. 1132-1142, Dec. 2005, doi:10.1109/TPDS.2005.150