Issue No.01 - Jan.-June (2012 vol.11)
pp: 5-8
Ji Kong, Shanghai Jiaotong University, Shanghai
Peilin Liu, Shanghai Jiaotong University, Shanghai
Yu Zhang, IBM Research-China, Beijing
State-of-the-art fabrication technology, which integrates numerous hardware resources such as processors/DSPs and memory arrays into a single chip, has enabled the emergence of the Multiprocessor System-on-Chip (MPSoC). The stream programming paradigm on MPSoCs is highly efficient in single-functionality scenarios because of its dedicated and predictable data supply system. However, when memory traffic is heavily shared among parallel tasks in applications with multiple interrelated functionalities, performance suffers from task interference and shared-memory congestion, which lead to poor parallel speedup and poor memory bandwidth utilization. This paper proposes a framework for a stream-processing-based on-chip data supply system for task-parallel MPSoCs. In this framework, stream address generation and data computation are decoupled and parallelized to allow full utilization of on-chip resources. Task granularities are dynamically tuned to jointly optimize overall application performance. Experiments show that the proposed framework and the tuning scheme are effective for joint optimization in task-parallel MPSoCs.
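The decoupling of stream address generation from data computation can be illustrated with a minimal software sketch. This is not the paper's hardware design: the function names (`address_generator`, `compute_stage`, `run_stream`), the strided access pattern, the bounded queue standing in for an on-chip buffer, and the sum reduction are all assumptions chosen for illustration.

```python
import queue
import threading


def address_generator(base, stride, count, addr_queue):
    """Produce a strided address stream without waiting on the compute stage."""
    for i in range(count):
        addr_queue.put(base + i * stride)  # blocks only when the queue is full
    addr_queue.put(None)  # sentinel marking the end of the stream


def compute_stage(memory, addr_queue, results):
    """Consume addresses, load the addressed data, and accumulate a sum."""
    total = 0
    while True:
        addr = addr_queue.get()
        if addr is None:
            break
        total += memory[addr]
    results.append(total)


def run_stream(memory, base, stride, count, queue_depth=4):
    """Run both stages concurrently; queue_depth models a small on-chip buffer."""
    addr_queue = queue.Queue(maxsize=queue_depth)
    results = []
    producer = threading.Thread(
        target=address_generator, args=(base, stride, count, addr_queue)
    )
    consumer = threading.Thread(
        target=compute_stage, args=(memory, addr_queue, results)
    )
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results[0]
```

For example, with `memory = list(range(16))`, calling `run_stream(memory, 0, 2, 8)` sums the elements at addresses 0, 2, ..., 14. The bounded queue lets the address stage run ahead of the computation by at most `queue_depth` entries, which is one simple way to model a decoupled data supply.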
Multiple Data Stream Architectures (Multiprocessors), Multi-core/single-chip multiprocessors, Memory hierarchy, Application studies resulting in better multiple-processor systems
Ji Kong, Peilin Liu, Yu Zhang, "Atomic Streaming: A Framework of On-Chip Data Supply System for Task-Parallel MPSoCs", IEEE Computer Architecture Letters, vol.11, no. 1, pp. 5-8, Jan.-June 2012, doi:10.1109/L-CA.2011.21