
Donghyun Kim, Lee-Sup Kim, "A Floating-Point Unit for 4D Vector Inner Product with Reduced Latency," IEEE Transactions on Computers, vol. 58, no. 7, pp. 890-901, July 2009.
@article{ 10.1109/TC.2008.210,
  author = {Donghyun Kim and Lee-Sup Kim},
  title = {A Floating-Point Unit for 4D Vector Inner Product with Reduced Latency},
  journal = {IEEE Transactions on Computers},
  volume = {58},
  number = {7},
  issn = {0018-9340},
  year = {2009},
  pages = {890-901},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TC.2008.210},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
TY  - JOUR
JO  - IEEE Transactions on Computers
TI  - A Floating-Point Unit for 4D Vector Inner Product with Reduced Latency
IS  - 7
SN  - 0018-9340
SP  - 890
EP  - 901
A1  - Donghyun Kim
A1  - Lee-Sup Kim
PY  - 2009
KW  - Floating-point arithmetic
KW  - vector inner product
KW  - DP4
KW  - 3D graphics
VL  - 58
JA  - IEEE Transactions on Computers
ER  -
[1] E. Lindholm, M.J. Kilgard, and H. Moreton, “A User-Programmable Vertex Engine,” Proc. ACM SIGGRAPH '01, pp. 149-158, 2001.
[2] D. Kim, K. Chung, C.H. Yu, C.H. Kim, I. Lee, J. Bae, Y.J. Kim, J.H. Park, S. Kim, Y.H. Park, N.H. Seong, J.A. Lee, J. Park, S. Oh, S.W. Jeong, and L.S. Kim, “An SoC with 1.3 Gtexels/sec 3D Graphics Full Pipeline Engine for Consumer Applications,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 71-84, Jan. 2006.
[3] C.H. Yu, K. Chung, D. Kim, and L.S. Kim, “A 120Mvertices/s Multi-Threaded VLIW Vertex Processor for Mobile Multimedia Applications,” Proc. IEEE Int'l Solid-State Circuit Conf. (ISSCC '06), pp. 408-409, 2006.
[4] C.H. Yu, K. Chung, D. Kim, and L.S. Kim, “A 186Mvertices/s 161mW Floating-Point Vertex Processor for Mobile Graphics Systems,” Proc. IEEE Custom Integrated Circuits Conf. (CICC '07), pp. 579-582, 2007.
[5] D. Blythe, “The Direct3D 10 System,” ACM Trans. Graphics, vol. 25, no. 3, pp. 724-734, July 2006.
[6] P.M. Seidel and G. Even, “On the Design of Fast IEEE Floating-Point Adders,” Proc. 15th IEEE Symp. Computer Arithmetic (ARITH '01), pp. 184-194, 2001.
[7] M.R. Santoro, G. Bewick, and M.A. Horowitz, “Rounding Algorithms for IEEE Multipliers,” Proc. Ninth IEEE Symp. Computer Arithmetic (ARITH '89), pp. 176-183, 1989.
[8] P.M. Seidel and G. Even, “Delay-Optimized Implementation of IEEE Floating-Point Addition,” IEEE Trans. Computers, vol. 53, no. 2, pp. 99-113, Feb. 2004.
[9] P. Farmwald, Bifurcated Method and Apparatus for Floating-Point Addition with Decreased Latency Time, US Patent 4639887, 1987.
[10] K. Ng, Floating-Point ALU with Parallel Paths, US Patent 5136536, Weitek Corp., 1992.
[11] G. Even and W.J. Paul, “On the Design of IEEE Compliant Floating-Point Units,” IEEE Trans. Computers, vol. 49, no. 5, pp. 398-413, May 2000.
[12] G. Gerwig and M. Kroener, “Floating-Point Unit in Standard Cell Design with 116 Bit Wide Dataflow,” Proc. 14th IEEE Symp. Computer Arithmetic (ARITH '99), pp. 266-273, 1999.
[13] E. Hokenek, R.K. Montoye, and P.W. Cook, “Second-Generation RISC Floating Point with Multiply-Add Fused,” IEEE J. Solid-State Circuits, vol. 25, no. 5, pp. 1207-1213, Oct. 1990.
[14] T. Lang and J.D. Bruguera, “Floating-Point Multiply-Add-Fused with Reduced Latency,” IEEE Trans. Computers, vol. 53, no. 8, pp. 988-1003, Aug. 2004.
[15] G. Li and Z. Li, “Design of a Fully Pipelined Single-Precision Multiply-Add-Fused Unit,” Proc. 20th IEEE Int'l Conf. VLSI Design, pp. 318-323, 2007.
[16] S.H. Kim, J.S. Yoon, C.H. Yu, D. Kim, K. Chung, H.S. Lim, H.W. Park, and L.S. Kim, “36 fps SXGA 3D Display Processor with a Programmable 3D Graphics Rendering Engine,” Proc. IEEE Int'l Solid-State Circuit Conf. (ISSCC '07), pp. 276-277, 2007.
[17] S.M. Mueller, C. Jacobi, H.J. Oh, K.D. Tran, S.R. Cottier, B.W. Michael, H. Nishikawa, Y. Totsuka, T. Namatame, N. Yano, T. Machida, and S.H. Dhong, “The Vector Floating-Point Unit in a Synergistic Processor Element of a CELL Processor,” Proc. IEEE Symp. Computer Arithmetic, pp. 59-67, June 2005.
[18] IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754, 1985.
[19] S.F. Oberman and M.Y. Siu, “A High-Performance Area-Efficient Multifunction Interpolator,” Proc. 17th IEEE Symp. Computer Arithmetic (ARITH '05), pp. 272-279, June 2005.
[20] N.J. Rohrer, M. Canada, E. Cohen, M. Ringler, M. Mayfield, P. Sandon, P. Kartschoke, J. Heaslip, J. Allen, P. McCormick, T. Pfluger, J. Zimmerman, C. Lichtenau, T. Werner, G. Salem, M. Ross, D. Appenzeller, and D. Thygesen, “PowerPC 970 in 130 nm and 90 nm Technologies,” Proc. IEEE Int'l Solid-State Circuit Conf. (ISSCC '04), pp. 68-69, 2004.
[21] Nvidia Corp., FX Composer 2.0, http://developer.nvidia.com/objectfx_composer_home.html, 2008.
[22] J.M. Muller, “‘Partially Rounded’ Small-Order Approximations for Accurate, Hardware-Oriented, Table-Based Methods,” Proc. 16th IEEE Symp. Computer Arithmetic (ARITH '03), pp. 114-121, 2003.