Auto-Tuning GEMV on Many-Core GPU
Found in: 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS)
By Weizhi Xu, Zhiyong Liu, Jun Wu, Xiaochun Ye, Shuai Jiao, Da Wang, Fenglong Song, Dongrui Fan
Issue Date: December 2012
pp. 30-36
GPUs provide powerful computing ability especially for data parallel algorithms. However, the complexity of the GPU system makes the optimization of even a simple algorithm difficult. Different parallel algorithms or optimization methods on a GPU often lea...
 
Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU
Found in: 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD)
By Weizhi Xu, Hao Zhang, Shuai Jiao, Da Wang, Fenglong Song, Zhiyong Liu
Issue Date: August 2012
pp. 231-235
It is an important task to tune performance for sparse matrix vector multiplication (SpMV), but it is also a difficult task because of its irregularity. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GP...
 
Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism
Found in: IEEE Micro
By Dongrui Fan, Hao Zhang, Da Wang, Xiaochun Ye, Fenglong Song, Guojie Li, Ninghui Sun
Publication Date: April 2012
pp. N/A
Godson-T is a research many-core processor designed for parallel scientific computing. It delivers efficient performance and flexible programmability simultaneously. On the one hand, Godson-T has many features to achieve high efficiency for on-chip resourc...
 
Godson-T: An Efficient Many-Core Processor Exploring Thread-Level Parallelism
Found in: IEEE Micro
By Dongrui Fan, Hao Zhang, Da Wang, Xiaochun Ye, Fenglong Song, Guojie Li, Ninghui Sun
Issue Date: March 2012
pp. 38-47
Godson-T is a research many-core processor designed for parallel scientific computing that delivers efficient performance and flexible programmability simultaneously. It also has many features to achieve high efficiency for on-chip resource utilization, su...
 
Design of New Hash Mapping Functions
Found in: Computer and Information Technology, International Conference on
By Fenglong Song, Zhiyong Liu, Dongrui Fan, Junchao Zhang, Lei Yu, Nan Yuan, Wei Lin
Issue Date: October 2009
pp. 45-50
Conflicts can severely degrade computer performance: bank conflicts reduce the bandwidth of interleaved multibank memory systems, and conflict misses reduce effective on-chip capacity, which in turn incurs further conflict misses. Conflicts can be avoid...
 
A Synchronization-Based Alternative to Directory Protocol
Found in: Parallel and Distributed Processing with Applications, International Symposium on
By He Huang, Lei Liu, Nan Yuan, Wei Lin, Fenglong Song, Junchao Zhang, Dongrui Fan
Issue Date: August 2009
pp. 175-181
The efficient support of cache coherence is extremely important to design and implement many-core processors. In this paper, we propose a synchronization-based coherence (SBC) protocol to efficiently support cache coherence for shared memory many-core arch...
 
Evaluation Method of Synchronization for Shared-Memory On-Chip Many-Core Processor
Found in: Parallel and Distributed Processing with Applications, International Symposium on
By Fenglong Song, Zhiyong Liu, Dongrui Fan, He Huang, Nan Yuan, Lei Yu, Junchao Zhang
Issue Date: August 2009
pp. 571-576
On-chip many-core architecture is an emerging and promising computation platform. High speed on-chip communication and abundant chipped resources are two outstanding advantages of this architecture, which provide an opportunity to implement efficient synch...
 
Study on Fine-Grained Synchronization in Many-Core Architecture
Found in: Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, ACIS International Conference on
By Lei Yu, Zhiyong Liu, Dongrui Fan, Fenglong Song, Junchao Zhang, Nan Yuan
Issue Date: May 2009
pp. 524-529
Synchronization between threads has a serious impact on the performance of many-core architectures. When communication is frequent, coarse-grained synchronization brings significant overhead. Thus, coarse-grained synchronization is not suitable for this s...
 
Performance analysis and optimization of molecular dynamics simulation on Godson-T many-core processor
Found in: Proceedings of the 8th ACM International Conference on Computing Frontiers (CF '11)
By Aiichiro Nakano, Dongrui Fan, Fenglong Song, Guangming Tan, Hao Zhang, Liu Peng, Priya Vashishta, Rajiv K. Kalia
Issue Date: May 2011
pp. 1-10
Molecular dynamics (MD) simulation has broad applications, but its irregular memory-access pattern makes performance optimization a challenge. This paper presents a joint application/architecture study to enhance on-chip parallelism of MD on Godson-T-like...
     