Software Transactional Memory for GPU Architectures
Found in: IEEE Computer Architecture Letters
By Yunlong Xu, Rui Wang, Nilanjan Goswami, Tao Li, Depei Qian
Issue Date: January 2014
pp. 1-1
To make applications with dynamic data sharing among threads benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). The major challenges include ensuring good scalability with respect to the ...
Power-performance co-optimization of throughput core architecture using resistive memory
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Nilanjan Goswami, Bingyi Cao, Tao Li
Issue Date: February 2013
pp. 342-353
Massively parallel computing on throughput computers such as GPUs requires myriad memory accesses to register files, on-chip scratchpad, caches, and off-chip DRAM. Unlike CPUs, these processors have a large register file and on-chip scratchpad memory, whic...
Hierarchically characterizing CUDA program behavior
Found in: IEEE Workload Characterization Symposium
By Zhibin Yu, Hai Jin, Nilanjan Goswami, Tao Li, Lizy K. John
Issue Date: November 2011
pp. 76
CUDA has become a very popular programming paradigm in parallel computing area. However, very little work has been done for characterizing CUDA kernels. In this work, we measure the thread level performance, collect the basic block level characteristics, a...
Analyzing soft-error vulnerability on GPGPU microarchitecture
Found in: IEEE Workload Characterization Symposium
By Jingweijia Tan, Nilanjan Goswami, Tao Li, Xin Fu
Issue Date: November 2011
pp. 226-235
The general-purpose computation on graphic processing units (GPGPU) becomes increasingly popular due to their high computational throughput for data parallel applications. Modern GPU architectures have limited capability for error detection and tolerance s...
Software Transactional Memory for GPU Architectures
Found in: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14)
By Depei Qian, Lan Gao, Nilanjan Goswami, Rui Wang, Tao Li, Yunlong Xu
Issue Date: February 2014
pp. 1-10
Modern GPUs have shown promising results in accelerating computation intensive and numerical workloads with limited dynamic data sharing. However, many real-world applications manifest ample amount of data sharing among concurrently executing threads. Ofte...
Accelerating GPGPU architecture simulation
Found in: Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems (SIGMETRICS '13)
By Chengzhong Xu, Hai Jin, Lizy John, Nilanjan Goswami, Zhibin Yu
Issue Date: June 2013
pp. 331-332
Recently, graphics processing units (GPUs) have opened up new opportunities for speeding up general-purpose parallel applications due to their massive computational power and up to hundreds of thousands of threads enabled by programming models such as CUDA...
Integrating nanophotonics in GPU microarchitecture
Found in: Proceedings of the 21st international conference on Parallel architectures and compilation techniques (PACT '12)
By Ajit Verma, Nilanjan Goswami, Ramkumar Shankar, Tao Li, Zhongqi Li
Issue Date: September 2012
pp. 425-426
As high-performance computing device, the GPU has exposed bandwidth and latency bottlenecks in on-chip interconnect and off-chip memory access. To eliminate such bottlenecks, we employ silicon nanophotonics and 3D stacking technologies in GPU microarchitec...