Issue No.02 - July-Dec. (2012 vol.11)
pp: 37-40
Yaohua Wang, National University of Defence Technology, P.R. China
Shuming Chen, National University of Defence Technology, P.R. China
Kai Zhang, National University of Defence Technology, P.R. China
Jianghua Wan, National University of Defence Technology, P.R. China
Xiaowen Chen, National University of Defence Technology, P.R. China
Hu Chen, National University of Defence Technology, P.R. China
Haibo Wang, National University of Defence Technology, P.R. China
ABSTRACT
SIMD architectures are less efficient for applications with diverse control-flow behavior, mainly because all lanes are required to follow an identical control flow. In this paper, we propose a novel instruction shuffle scheme that features an efficient control-flow handling mechanism. Its cornerstones are a shuffle source instruction buffer array and an instruction shuffle unit. The shuffle unit can concurrently deliver instructions of multiple distinct control flows from the instruction buffer array to eligible SIMD lanes. The scheme combines the best attributes of the SIMD and MIMD execution paradigms. Experimental results show that an average performance improvement of 86% can be achieved at a cost of only 5.8% area overhead.
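The intuition behind the abstract can be illustrated with a toy cycle-count model. The sketch below is not the paper's hardware design or simulator: the function names, the path lengths, and the `max_flows` parameter are all hypothetical, and the model assumes an idealized shuffle with no delivery overhead. It only contrasts lockstep SIMD execution, where divergent paths are serialized, with concurrent delivery of several distinct control flows to the lanes that need them.

```python
from typing import Dict, List


def simd_cycles(lane_paths: List[str], path_len: Dict[str, int]) -> int:
    """Lockstep SIMD (assumed model): each distinct path taken by any lane
    is issued in turn while the other lanes are masked, so costs add up."""
    return sum(path_len[p] for p in set(lane_paths))


def shuffled_cycles(lane_paths: List[str], path_len: Dict[str, int],
                    max_flows: int) -> int:
    """Idealized shuffle (assumed model): up to `max_flows` distinct paths
    are issued concurrently, each to the lanes that follow it."""
    lengths = sorted((path_len[p] for p in set(lane_paths)), reverse=True)
    # Paths are grouped into rounds of `max_flows`; with lengths sorted in
    # descending order, each round costs the first (longest) path in it.
    return sum(lengths[i] for i in range(0, len(lengths), max_flows))


# Hypothetical workload: 8 lanes, 3 data-dependent paths of differing length.
lane_paths = ["A", "B", "A", "C", "B", "A", "C", "B"]
path_len = {"A": 10, "B": 6, "C": 4}

print(simd_cycles(lane_paths, path_len),
      shuffled_cycles(lane_paths, path_len, max_flows=3))  # prints "20 10"
```

With `max_flows=1` the two models coincide, which matches the intuition that a single instruction stream is the SIMD case. The numbers are illustrative only and are unrelated to the 86% improvement reported in the paper's experiments.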
INDEX TERMS
Instruction sets, Process control, Resource management, Vectors, Scalability, SIMD, Kernel, Arrays, data dependent control-flow, instruction shuffle, instruction buffer array
CITATION
Yaohua Wang, Shuming Chen, Kai Zhang, Jianghua Wan, Xiaowen Chen, Hu Chen, Haibo Wang, "Instruction Shuffle: Achieving MIMD-like Performance on SIMD Architectures", IEEE Computer Architecture Letters, vol.11, no. 2, pp. 37-40, July-Dec. 2012, doi:10.1109/L-CA.2011.34