Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions
September 2000 (vol. 49 no. 9)
pp. 934-946

Abstract: Three-dimensional (3D) graphics applications have become very important workloads on today's computer systems. A cost-effective graphics solution is to perform the geometry processing of 3D graphics on the host CPU and have specialized hardware handle the rendering task. In this paper, we analyze microarchitecture and SIMD instruction set enhancements to a RISC superscalar processor for exploiting parallelism in geometry processing for 3D computer graphics. Our results show that 3D geometry processing has inherent parallelism. Adding SIMD operations improves performance by 8 to 28 percent on a 4-issue dynamically scheduled processor that can issue at most two floating-point operations per cycle. In comparison, an 8-issue processor, ignoring cycle time effects, can achieve a 20 to 60 percent performance improvement over a 4-issue processor. If processor cycle time scales with the number of ports to the register file, then doubling only the floating-point issue width of a 4-issue processor with SIMD instructions gives the best performance among the architectural configurations that we examine (the most aggressive configuration is an 8-issue processor with SIMD instructions).
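As a rough illustration of the parallelism the abstract refers to, the sketch below transforms one vertex by a 4x4 matrix twice: once with scalar floating-point operations and once with emulated two-wide ("paired-single" style) operations. It is not taken from the paper. The function names (`transform_scalar`, `transform_paired`, `ps_mul`, `ps_add`) and the operation counters are hypothetical, and the two-wide operations are modeled with Python tuples rather than real SIMD instructions.

```python
# Hypothetical sketch: names and operation counting are assumptions made for
# illustration, not the paper's code or any real instruction set.

def transform_scalar(m, v):
    """Multiply a 4x4 matrix by a 4-vector, one FP operation at a time."""
    out, ops = [], 0
    for row in m:
        acc = row[0] * v[0]              # one multiply
        ops += 1
        for k in range(1, 4):
            acc = acc + row[k] * v[k]    # one multiply plus one add
            ops += 2
        out.append(acc)
    return out, ops

def ps_mul(a, b):
    """Emulated two-wide multiply: both lanes count as one operation."""
    return (a[0] * b[0], a[1] * b[1])

def ps_add(a, b):
    """Emulated two-wide add: both lanes count as one operation."""
    return (a[0] + b[0], a[1] + b[1])

def transform_paired(m, v):
    """Same product, computing two output components per operation."""
    out, ops = [], 0
    for r in (0, 2):                     # lanes hold rows r and r + 1
        acc = ps_mul((m[r][0], m[r + 1][0]), (v[0], v[0]))
        ops += 1
        for k in range(1, 4):
            prod = ps_mul((m[r][k], m[r + 1][k]), (v[k], v[k]))
            acc = ps_add(acc, prod)
            ops += 2
        out.extend(acc)
    return out, ops

# Translate the point (1, 2, 3) by (10, 20, 30) in homogeneous coordinates.
translate = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 20.0],
    [0.0, 0.0, 1.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
vertex = [1.0, 2.0, 3.0, 1.0]

scalar_result, scalar_ops = transform_scalar(translate, vertex)
paired_result, paired_ops = transform_paired(translate, vertex)
print(scalar_result, scalar_ops)
print(paired_result, paired_ops)
```

The four output components do not depend on one another, which is why they can be packed two per operation. Halving the floating-point operation count in this toy model does not imply halving execution time; the abstract reports gains of 8 to 28 percent from SIMD operations on the 4-issue configuration.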

Index Terms:
3D graphics, geometry pipeline, superscalar processors, SIMD instructions, paired-single instructions.
Citation:
Chia-Lin Yang, Barton Sano, Alvin R. Lebeck, "Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions," IEEE Transactions on Computers, vol. 49, no. 9, pp. 934-946, Sept. 2000, doi:10.1109/12.869324