A multiple SIMD, multiple data (MSMD) architecture: Parallel execution of dynamic and static SIMD fragments
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Yaohua Wang, Shuming Chen, Jianghua Wan, Jiayuan Meng, Kai Zhang, Wei Liu, Xi Ning
Issue Date:February 2013
pp. 603-614
The efficacy of widely used single instruction, multiple data architectures is often limited when handling divergent control flows and short vectors; both circumstances result in SIMD fragments that use only a subset of the available datapaths. This paper ...
Instruction Shuffle: Achieving MIMD-like Performance on SIMD Architectures
Found in: IEEE Computer Architecture Letters
By Yaohua Wang, Shuming Chen, Kai Zhang, Jianghua Wan, Xiaowen Chen, Hu Chen, Haibo Wang
Issue Date:July 2012
pp. 37-40
SIMD architectures are less efficient for applications with diverse control-flow behavior, which can be mainly attributed to the requirement of identical control flow. In this paper, we propose a novel instruction shuffle scheme that features an ef...
Architectural Implications for SIMD Processors in the Wireless Communication Domain
Found in: 2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)
By Yaohua Wang, Kai Zhang, Jianghua Wan, Sheng Liu, Xi Ning, Shuming Chen
Issue Date:June 2012
pp. 1199-1204
To further improve the performance of SIMD (Single Instruction Multiple Data) architectures, which are widely used in the wireless communication domain, the main components of the Long Term Evolution (LTE) protocol are analyzed. Performance investigation is ta...
Matrix Odd-Even Partition: A High Power-Efficient Solution to the Small Grain Data Shuffle
Found in: Networking, Architecture, and Storage, International Conference on
By Sheng Liu, ShuMing Chen, JiangHua Wan, HaiYan Chen, YaoHua Wang
Issue Date:July 2011
pp. 348-354
The shuffle operation is one of the bottlenecks in vector DSPs. The partitioning problem of the shuffle matrix will have a great effect on the design of the shuffle unit when dealing with the small grain data shuffle using a smaller-sized crossbar. The tra...
AIFSP: An Adaptive Instruction Flow Stream Processor
Found in: VLSI, IEEE Computer Society Annual Symposium on
By Yaohua Wang, Shuming Chen, Jianghua Wan, Kai Zhang, Shenggang Chen
Issue Date:July 2011
pp. 272-277
Stream processors are efficient for media applications because they exploit features of media processing such as data parallelism and producer-consumer locality. However, the loosely coupled structure between host and stream processor makes the communi...
FT-Matrix: A Coordination-aware Architecture for Signal Processing
Found in: IEEE Micro
By Shuming Chen, Yaohua Wang, Sheng Liu, Jianghua Wan, Haiyan Chen, Hengzhu Liu, Kai Zhang, Xiangyuan Liu, Xi Ning
Issue Date:December 2013
pp. 1
Vector-SIMD architectures have gained increasing attention due to their high performance in signal processing applications. However, the performance of existing vector-SIMD architectures is still limited due to their inefficiency in the coordinated exploit...