Kilo TM: Hardware Transactional Memory for GPU Architectures
May/June 2012 (vol. 32 no. 3)
pp. 7-16
Wilson W.L. Fung, University of British Columbia
Inderpreet Singh, University of British Columbia
Andrew Brownsword, Electronic Arts
Tor M. Aamodt, University of British Columbia
Programming GPUs is challenging for applications with irregular fine-grained communication between threads. To improve the programmability of GPUs and thus extend their usage to a wider range of applications, the authors propose to enable transactional memory (TM) on GPUs via Kilo TM, a novel hardware TM system that scales to thousands of concurrent transactions.

Index Terms:
SIMD processors, hardware-software interface, parallel processors, transactional memory, GPU, KILO TM, fine-grained communication
Citation:
Wilson W.L. Fung, Inderpreet Singh, Andrew Brownsword, Tor M. Aamodt, "Kilo TM: Hardware Transactional Memory for GPU Architectures," IEEE Micro, vol. 32, no. 3, pp. 7-16, May-June 2012, doi:10.1109/MM.2012.16