Bulldozer: An Approach to Multithreaded Compute Performance
March/April 2011 (vol. 31 no. 2)
pp. 6-15
Michael Butler, Advanced Micro Devices
Leslie Barnes, Advanced Micro Devices
Debjit Das Sarma, Advanced Micro Devices
Bob Gelinas, Advanced Micro Devices

AMD's Bulldozer module represents a new direction in microarchitecture and includes a number of firsts for AMD: its first multithreaded x86 processor, its first implementation of a shared Level 2 cache, and its first x86 processor to incorporate floating-point multiply-accumulate (FMAC). This article discusses the module's multithreading architecture, power-efficient microarchitecture, and subblocks, including the various microarchitectural latencies, bandwidths, and structure sizes.

1. T. Fischer et al., "Design Solutions for the Bulldozer 32-nm SOI 2-Core Processor Module in an 8-Core CPU," IEEE Int'l Solid State Circuits Conf., IEEE Press, 2011.
2. P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-Way Multithreaded SPARC Processor," IEEE Micro, vol. 25, no. 2, 2005, pp. 21-29.
3. J.D. Davis, J. Laudon, and K. Olukotun, "Maximizing CMP Throughput with Mediocre Cores," Proc. 14th Int'l Conf. Parallel Architectures and Compilation Techniques, IEEE CS Press, 2005, pp. 51-62.
4. "Hyper-Threading Technology," Intel Tech. J., vol. 6, no. 1, 2002, pp. 4-15.
5. H.M. Mathis et al., "Characterization of Simultaneous Multithreading (SMT) Efficiency in Power5," IBM J. Research and Development, Jul.-Sep. 2005, pp. 555-564.
6. J. Emer, "Simultaneous Multithreading: Multiplying Alpha Performance," Proc. Microprocessor Forum, Linley Group, 1999.
7. D. Tullsen, S. Eggers, and H. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," Proc. 22nd Ann. Int'l Symp. Computer Architecture, ACM Press, 1995, pp. 392-403.
8. G. Reinman, B. Calder, and T. Austin, "Optimizations Enabled by a Decoupled Front-End Architecture," IEEE Trans. Computers, vol. 50, no. 4, 2001, pp. 338-355.
9. R. Jotwani et al., "An x86-64 Core Implemented in 32-nm SOI CMOS," IEEE Int'l Solid State Circuits Conf., IEEE Press, 2010, pp. 106-107.
10. AMD64 Architecture Programmers Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions, AMD, 2009.
11. M. Golden et al., "40-Entry Unified Out-of-Order Integer Execution Unit for the AMD Bulldozer x86-64 Core," IEEE Int'l Solid State Circuits Conf., IEEE Press, 2011, pp. 80-81.
12. R.K. Montoye, E. Hokenek, and S.L. Runyon, "Design of the IBM RISC System/6000 Floating Point Execution Unit," IBM J. Research and Development, vol. 34, 1990, pp. 59-70.

Index Terms:
Microprocessors, microcomputers, microarchitecture implementation considerations, processor architectures, Bulldozer
Michael Butler, Leslie Barnes, Debjit Das Sarma, Bob Gelinas, "Bulldozer: An Approach to Multithreaded Compute Performance," IEEE Micro, vol. 31, no. 2, pp. 6-15, March-April 2011, doi:10.1109/MM.2011.23