Issue No.01 - Jan. (2014 vol.25)
pp: 104-115
Mai Zheng, Ohio State University, Columbus
Vignesh T. Ravi, Advanced Micro Devices, Austin
Feng Qin, Ohio State University, Columbus
Gagan Agrawal, Ohio State University, Columbus
ABSTRACT
In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. While languages like CUDA and OpenCL have eased GPU programming for nongraphical applications, they are still explicitly parallel languages. All parallel programmers, particularly novices, need tools that can help ensure the correctness of their programs. As in any multithreaded environment, data races on GPUs can severely affect program reliability. In this paper, we propose GMRace, a new mechanism for detecting races in GPU programs. GMRace combines static analysis with a carefully designed dynamic checker for logging and analyzing information at runtime. Our design utilizes the GPU's memory hierarchy to log runtime data accesses efficiently. To improve performance, GMRace leverages static analysis to reduce the number of statements that need to be instrumented. Additionally, by exploiting knowledge of thread scheduling and the execution model in the underlying GPUs, GMRace can accurately detect data races with no false positives reported. Our experimental results show that, compared with previous approaches, GMRace is more effective in detecting races in the evaluated cases and incurs much less runtime and space overhead.
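The abstract does not spell out the checking algorithm, so the following is only a generic illustration of what a dynamic checker over logged accesses can look like, not GMRace's actual method. It assumes a hypothetical log format in which each record carries a thread id, an address, a read/write flag, and a barrier epoch (the number of barriers the thread had passed before the access). Two accesses are flagged when they touch the same address in the same epoch, come from different threads, and at least one is a write. The sketch ignores warp-level lockstep scheduling, which the abstract says GMRace exploits to avoid false positives, so it may over-report on real GPU traces.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """One logged memory access (hypothetical record layout)."""
    thread: int     # id of the thread that issued the access
    addr: int       # address that was read or written
    is_write: bool  # True for a write, False for a read
    epoch: int      # barriers passed by the thread before this access


def find_races(log):
    """Return sorted (addr, epoch, thread_a, thread_b) tuples for conflicts.

    Accesses conflict when they share an address and a barrier epoch,
    come from different threads, and at least one of them is a write.
    """
    # Group accesses so only same-address, same-epoch pairs are compared.
    buckets = defaultdict(list)
    for access in log:
        buckets[(access.addr, access.epoch)].append(access)

    races = set()
    for (addr, epoch), accesses in buckets.items():
        for i, first in enumerate(accesses):
            for second in accesses[i + 1:]:
                different_threads = first.thread != second.thread
                has_write = first.is_write or second.is_write
                if different_threads and has_write:
                    pair = sorted((first.thread, second.thread))
                    races.add((addr, epoch, pair[0], pair[1]))
    return sorted(races)


# Hypothetical trace: only the first pair is a conflict.
log = [
    Access(thread=0, addr=16, is_write=True, epoch=0),
    Access(thread=1, addr=16, is_write=False, epoch=0),  # write/read, same epoch
    Access(thread=0, addr=32, is_write=True, epoch=0),
    Access(thread=1, addr=32, is_write=False, epoch=1),  # separated by a barrier
    Access(thread=2, addr=48, is_write=False, epoch=0),
    Access(thread=3, addr=48, is_write=False, epoch=0),  # read/read, no conflict
]
races = find_races(log)
print(races)  # [(16, 0, 0, 1)]
```

Grouping by address and epoch keeps the pairwise comparison local to accesses that could actually conflict, which loosely mirrors why reducing the number of logged statements, as the abstract describes, lowers checking cost.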
INDEX TERMS
Instruction sets, Graphics processing units, Synchronization, Runtime, Kernel, Instruments, Message systems, multithreading, GPU, CUDA, data race, concurrency
CITATION
Mai Zheng, Vignesh T. Ravi, Feng Qin, Gagan Agrawal, "GMRace: Detecting Data Races in GPU Programs via a Low-Overhead Scheme", IEEE Transactions on Parallel & Distributed Systems, vol.25, no. 1, pp. 104-115, Jan. 2014, doi:10.1109/TPDS.2013.44