
ASCII Text
Karl Papadantonakis, Nachiket Kapre, Stephanie Chan, André DeHon, "Pipelining Saturated Accumulation," IEEE Transactions on Computers, vol. 58, no. 2, pp. 208-219, February 2009.
BibTeX
@article{ 10.1109/TC.2008.110,
  author = {Karl Papadantonakis and Nachiket Kapre and Stephanie Chan and André DeHon},
  title = {Pipelining Saturated Accumulation},
  journal = {IEEE Transactions on Computers},
  volume = {58},
  number = {2},
  issn = {0018-9340},
  year = {2009},
  pages = {208-219},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TC.2008.110},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE Transactions on Computers
TI  - Pipelining Saturated Accumulation
IS  - 2
SN  - 0018-9340
SP  - 208
EP  - 219
EPD  - 208-219
A1  - Karl Papadantonakis,
A1  - Nachiket Kapre,
A1  - Stephanie Chan,
A1  - André DeHon,
PY  - 2009
KW  - High-speed arithmetic
KW  - pipeline and parallel arithmetic and logic structures
KW  - saturated arithmetic
KW  - accumulation
KW  - parallel prefix.
VL  - 58
JA  - IEEE Transactions on Computers
ER  -

[1] V. Agarwal, M.S. Hrishikesh, S.W. Keckler, and D. Burger, “Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures,” Proc. 27th Int'l Symp. Computer Architecture (ISCA '00), pp. 248-259, 2000.
[2] D. Chinnery and K. Keutzer, Closing the Gap between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design. Kluwer Academic Publishers, 2002.
[3] W. Tsu, K. Macy, A. Joshi, R. Huang, N. Walker, T. Tung, O. Rowhani, V. George, J. Wawrzynek, and A. DeHon, “HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array,” Proc. Int'l Symp. Field-Programmable Gate Arrays (FPGA '99), pp. 125-134, Feb. 1999.
[4] D.P. Singh and S.D. Brown, “The Case for Registered Routing Switches in Field Programmable Gate Arrays,” Proc. Int'l Symp. Field-Programmable Gate Arrays (FPGA '01), pp. 161-169, Feb. 2001.
[5] C. Leiserson, F. Rose, and J. Saxe, “Optimizing Synchronous Circuitry by Retiming,” Proc. Third Caltech Conf. VLSI, Mar. 1983.
[6] N. Weaver, Y. Markovskiy, Y. Patel, and J. Wawrzynek, “Post-Placement C-Slow Retiming for the Xilinx Virtex FPGA,” Proc. Int'l Symp. Field-Programmable Gate Arrays (FPGA '03), pp. 185-194, 2003.
[7] B. Smith, “Architecture and Applications of the HEP Multiprocessor Computer System,” Proc. Fourth Symp. Real-Time Signal Processing, pp. 241-248, 1981.
[8] D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, and R.L. Stamm, “Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor,” Proc. 23rd Int'l Symp. Computer Architecture (ISCA '96), pp. 191-202, 1996.
[9] Z. Luo and M. Martonosi, “Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques,” IEEE Trans. Computers, vol. 49, no. 3, pp. 208-218, Mar. 2000.
[10] Xilinx Spartan-3 FPGA Family Data Sheet, Xilinx, Inc., DS099, http://direct.xilinx.com/bvdocs/publications/ds099.pdf, Dec. 2004.
[11] K. Papadantonakis, N. Kapre, S. Chan, and A. DeHon, “Pipelining Saturated Accumulation,” Proc. IEEE Int'l Conf. Field-Programmable Technology (FPT '05), pp. 19-26, Dec. 2005.
[12] C. Lee, M. Potkonjak, and W.H. Mangione-Smith, “MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems,” Proc. 30th Ann. Int'l Symp. Microarchitecture (MICRO '97), pp. 330-335, 1997.
[13] R. Barua, W. Lee, S. Amarasinghe, and A. Agarwal, “Maps: A Compiler-Managed Memory System for Raw Machines,” Proc. 26th Int'l Symp. Computer Architecture (ISCA '99), pp. 4-15, 1999.
[14] W.D. Hillis and G.L. Steele, “Data Parallel Algorithms,” Comm. ACM, vol. 29, no. 12, pp. 1170-1183, Dec. 1986.
[15] R.P. Brent and H.T. Kung, “A Regular Layout for Parallel Adders,” IEEE Trans. Computers, vol. 31, no. 3, pp. 260-264, Mar. 1982.
[16] F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, 1992.
[17] B.D. de Dinechin, C. Monat, and F. Rastello, “Parallel Execution of the Saturated Reductions,” Proc. IEEE Workshop Signal Processing Systems (SiPS '01), pp. 373-384, 2001.
[18] M. Schulte, P. Balzola, J. Ruan, and J. Glossner, “Parallel Saturating Multioperand Adders,” Proc. Int'l Conf. Compilers, Architecture, and Synthesis for Embedded Systems (CASES '00), pp. 172-179, 2000.
[19] P.I. Balzola, M.J. Schulte, J. Ruan, J. Glossner, and E. Hokenek, “Design Alternatives for Parallel Saturating Multioperand Adders,” Proc. Int'l Conf. Computer Design (ICCD '01), pp. 172-177, Sept. 2001.
[20] J.H. Hubbard and B.B.H. Hubbard, Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach. Prentice Hall, 1999.
[21] S. Winograd, “On the Time Required to Perform Addition,” J. ACM, vol. 12, no. 2, pp. 277-285, Apr. 1965.
[22] M. Hrishikesh, N.P. Jouppi, K.I. Farkas, D. Burger, S.W. Keckler, and P. Shivakumar, “The Optimal Logic Depth Per Pipeline Stage Is 6 to 8 FO4 Inverter Delays,” Proc. 29th Int'l Symp. Computer Architecture (ISCA '02), pp. 14-24, 2002.