Karl Papadantonakis, Nachiket Kapre, Stephanie Chan, André DeHon, "Pipelining Saturated Accumulation," IEEE Transactions on Computers, vol. 58, no. 2, pp. 208-219, February 2009.
BibTeX
@article{ 10.1109/TC.2008.110,
  author = {Karl Papadantonakis and Nachiket Kapre and Stephanie Chan and André DeHon},
  title = {Pipelining Saturated Accumulation},
  journal = {IEEE Transactions on Computers},
  volume = {58},
  number = {2},
  issn = {0018-9340},
  year = {2009},
  pages = {208-219},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TC.2008.110},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY - JOUR
JO - IEEE Transactions on Computers
TI - Pipelining Saturated Accumulation
IS - 2
SN - 0018-9340
SP - 208
EP - 219
EPD - 208-219
A1 - Karl Papadantonakis
A1 - Nachiket Kapre
A1 - Stephanie Chan
A1 - André DeHon
PY - 2009
KW - High-speed arithmetic
KW - pipeline and parallel arithmetic and logic structures
KW - saturated arithmetic
KW - accumulation
KW - parallel prefix
VL - 58
JA - IEEE Transactions on Computers
ER -
[1] V. Agarwal, M.S. Hrishikesh, S.W. Keckler, and D. Burger, “Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures,” Proc. 27th Int'l Symp. Computer Architecture (ISCA '00), pp. 248-259, 2000.
[2] D. Chinnery and K. Keutzer, Closing the Gap between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design. Kluwer Academic Publishers, 2002.
[3] W. Tsu, K. Macy, A. Joshi, R. Huang, N. Walker, T. Tung, O. Rowhani, V. George, J. Wawrzynek, and A. DeHon, “HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array,” Proc. Int'l Symp. Field-Programmable Gate Arrays (FPGA '99), pp. 125-134, Feb. 1999.
[4] D.P. Singh and S.D. Brown, “The Case for Registered Routing Switches in Field Programmable Gate Arrays,” Proc. Int'l Symp. Field-Programmable Gate Arrays (FPGA '01), pp. 161-169, Feb. 2001.
[5] C. Leiserson, F. Rose, and J. Saxe, “Optimizing Synchronous Circuitry by Retiming,” Proc. Third Caltech Conf. VLSI, Mar. 1983.
[6] N. Weaver, Y. Markovskiy, Y. Patel, and J. Wawrzynek, “Post-Placement C-Slow Retiming for the Xilinx Virtex FPGA,” Proc. Int'l Symp. Field-Programmable Gate Arrays (FPGA '03), pp. 185-194, 2003.
[7] B. Smith, “Architecture and Applications of the HEP Multiprocessor Computer System,” Proc. Fourth Symp. Real-Time Signal Processing, pp. 241-248, 1981.
[8] D.M. Tullsen, S.J. Eggers, J.S. Emer, H.M. Levy, J.L. Lo, and R.L. Stamm, “Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor,” Proc. 23rd Int'l Symp. Computer Architecture (ISCA '96), pp. 191-202, 1996.
[9] Z. Luo and M. Martonosi, “Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques,” IEEE Trans. Computers, vol. 49, no. 3, pp. 208-218, Mar. 2000.
[10] Xilinx Spartan-3 FPGA Family Data Sheet, Xilinx, Inc., DS099, http://direct.xilinx.com/bvdocs/publications/ds099.pdf, Dec. 2004.
[11] K. Papadantonakis, N. Kapre, S. Chan, and A. DeHon, “Pipelining Saturated Accumulation,” Proc. IEEE Int'l Conf. Field-Programmable Technology (FPT '05), pp. 19-26, Dec. 2005.
[12] C. Lee, M. Potkonjak, and W.H. Mangione-Smith, “MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems,” Proc. 30th Ann. Int'l Symp. Microarchitecture (MICRO '97), pp. 330-335, 1997.
[13] R. Barua, W. Lee, S. Amarasinghe, and A. Agarwal, “Maps: A Compiler-Managed Memory System for Raw Machines,” Proc. 26th Int'l Symp. Computer Architecture (ISCA '99), pp. 4-15, 1999.
[14] W.D. Hillis and G.L. Steele, “Data Parallel Algorithms,” Comm. ACM, vol. 29, no. 12, pp. 1170-1183, Dec. 1986.
[15] R.P. Brent and H.T. Kung, “A Regular Layout for Parallel Adders,” IEEE Trans. Computers, vol. 31, no. 3, pp. 260-264, Mar. 1982.
[16] F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, 1992.
[17] B.D. de Dinechin, C. Monat, and F. Rastello, “Parallel Execution of the Saturated Reductions,” Proc. IEEE Workshop Signal Processing Systems (SiPS '01), pp. 373-384, 2001.
[18] M. Schulte, P. Balzola, J. Ruan, and J. Glossner, “Parallel Saturating Multioperand Adders,” Proc. Int'l Conf. Compilers, Architecture, and Synthesis for Embedded Systems (CASES '00), pp. 172-179, 2000.
[19] P.I. Balzola, M.J. Schulte, J. Ruan, J. Glossner, and E. Hokenek, “Design Alternatives for Parallel Saturating Multioperand Adders,” Proc. Int'l Conf. Computer Design (ICCD '01), pp. 172-177, Sept. 2001.
[20] J.H. Hubbard and B.B.H. Hubbard, Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach. Prentice Hall, 1999.
[21] S. Winograd, “On the Time Required to Perform Addition,” J. ACM, vol. 12, no. 2, pp. 277-285, Apr. 1965.
[22] M. Hrishikesh, N.P. Jouppi, K.I. Farkas, D. Burger, S.W. Keckler, and P. Shivakumar, “The Optimal Logic Depth Per Pipeline Stage Is 6 to 8 FO4 Inverter Delays,” Proc. 29th Int'l Symp. Computer Architecture (ISCA '02), pp. 14-24, 2002.

