Issue No.12 - Dec. (2012 vol.23)
pp: 2338-2350
Haitao Wei, Huazhong University of Science and Technology, Wuhan
Junqing Yu, Huazhong University of Science and Technology, Wuhan
Huafei Yu, Huazhong University of Science and Technology, Wuhan
Mingkang Qin, Huazhong University of Science and Technology, Wuhan
Guang R. Gao, University of Delaware, Newark
ABSTRACT
The stream programming model has been productively applied to a number of important application domains, and software pipelining is an important code scheduling technique for stream programs. However, the multicore evolution presents a new dimension of challenges: how to orchestrate the best software pipelining schedule on resource-constrained architectures, where the number of cores, available memory, and bandwidth are limited. In this paper, we propose a new solution methodology to address this problem. Our main contributions are as follows. First, we propose a unified Integer Linear Programming (ILP) formulation that combines the requirements of rate-optimal software pipelining and the minimization of intercore communication overhead. Next, we extend the formulation to schedule under memory-size-constrained systems; it orchestrates rate-optimal software-pipelined execution of stream programs under strict memory, processor core, and communication constraints. Finally, we implement a solution testbed for the proposed problem formulations by extending the Brook programming environment with our software pipelining support, named DFBrook. An experimental study was conducted to verify the effectiveness of the proposed solutions.
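The trade-off named in the abstract, balancing per-core load while keeping intercore communication low, can be illustrated with a minimal sketch. This is not the paper's ILP formulation: it is a brute-force search over actor-to-core assignments, it ignores memory limits and pipeline stage assignment, and the function name, actor names, work estimates, and edge volumes are all hypothetical values chosen for illustration.

```python
from itertools import product


def best_assignment(work, edges, num_cores):
    """Exhaustively search actor-to-core assignments (illustrative only).

    work:  dict actor -> estimated work per steady-state iteration
    edges: dict (producer, consumer) -> data volume per iteration
    Returns (ii, comm, assignment), minimizing the heaviest core load `ii`
    first and the cross-core traffic `comm` second.
    """
    actors = sorted(work)
    best = None
    for cores in product(range(num_cores), repeat=len(actors)):
        assign = dict(zip(actors, cores))
        # Per-core load: the heaviest core bounds the initiation interval.
        load = [0] * num_cores
        for actor, core in assign.items():
            load[core] += work[actor]
        ii = max(load)
        # Only edges whose endpoints sit on different cores cost traffic.
        comm = sum(v for (p, q), v in edges.items() if assign[p] != assign[q])
        if best is None or (ii, comm) < best[:2]:
            best = (ii, comm, assign)
    return best


# Hypothetical four-actor pipeline A -> B -> C -> D on two cores.
work = {"A": 4, "B": 2, "C": 2, "D": 4}
edges = {("A", "B"): 8, ("B", "C"): 1, ("C", "D"): 8}
ii, comm, assign = best_assignment(work, edges, num_cores=2)
print(ii, comm, assign)
```

In this toy instance, two partitions balance the load equally, and the search prefers the one that cuts only the lightest edge. Exhaustive search grows exponentially with the number of actors, which is one reason an ILP solver is the more practical tool for realistic stream graphs.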
INDEX TERMS
Schedules, Resource management, Pipeline processing, Memory management, Multicore processing, Resource constrained, Multicore, Stream programs, Software pipelining
CITATION
Haitao Wei, Junqing Yu, Huafei Yu, Mingkang Qin, Guang R. Gao, "Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures", IEEE Transactions on Parallel & Distributed Systems, vol.23, no. 12, pp. 2338-2350, Dec. 2012, doi:10.1109/TPDS.2012.41