Issue No. 08, August 2009 (vol. 20)
pp. 1142-1157
Xiaobo Yan , National University of Defense Technology, Changsha
Zuocheng Xing , National University of Defense Technology, Changsha
Yu Deng , National University of Defense Technology, Changsha
Jiang Jiang , National University of Defense Technology, Changsha
Jing Du , National University of Defense Technology, Changsha
Xuejun Yang , National University of Defense Technology, Changsha
ABSTRACT
The stream architecture is a novel microprocessor architecture with wide application potential, so it is critical to study how it can be used to accelerate scientific computing programs. However, existing stream processors and stream programming languages were not designed for scientific computing. To address this issue, we design and implement a 64-bit stream processor, Fei Teng 64 (FT64), with a peak performance of 16 Gflops. FT64 supports two kinds of communication, message passing and stream communication; building on both, we design an interconnection architecture for an FT64-based high-performance computer. This computer consists of multiple modules, each containing eight FT64s. We also design a novel stream programming language, Stream Fortran 95 (SF95), together with its compiler, SF95Compiler, to facilitate the development of scientific applications. To evaluate the design, we test nine typical scientific application kernels on the FT64 platform. The results demonstrate the effectiveness and efficiency of FT64 and its compiler for scientific computing.
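For scale, the figures stated above imply a per-module peak. This is only a back-of-the-envelope upper bound: it assumes all eight FT64s in a module sustain their individual 16 Gflops peak at the same time and ignores communication overhead, so it is not a measured result. The module count M is a hypothetical parameter, since the abstract says only that the computer contains multiple modules.

$$P_{\text{module}} = 8 \times 16\ \text{Gflops} = 128\ \text{Gflops}, \qquad P_{\text{system}} \le M \times 128\ \text{Gflops}$$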
INDEX TERMS
Microprocessors, computer languages, compilers, programming.
CITATION
Xiaobo Yan, Zuocheng Xing, Yu Deng, Jiang Jiang, Jing Du, Xuejun Yang, "Fei Teng 64 Stream Processing System: Architecture, Compiler, and Programming", IEEE Transactions on Parallel & Distributed Systems, vol.20, no. 8, pp. 1142-1157, August 2009, doi:10.1109/TPDS.2008.170