Alan Humphrey, University of Utah, Salt Lake City
Qingyu Meng, University of Utah, Salt Lake City
Martin Berzins, University of Utah, Salt Lake City
Diego Caminha B De Oliveira, University of Utah, Salt Lake City
Zvonimir Rakamaric, University of Utah, Salt Lake City
Ganesh C. Gopalakrishnan, University of Utah, Salt Lake City
Parallel computational frameworks for high performance computing (HPC) are central to the advancement of simulation-based studies in science and engineering. Unfortunately, finding and fixing bugs in these frameworks can be extremely time-consuming. Left unchecked, these bugs can drastically diminish the amount of new science that can be performed. This paper presents our systematic study of the Uintah computational framework and our approaches to debugging it more incisively. Our key insight is to leverage the modular structure of Uintah, which lends itself to systematic debugging. In particular, we have developed a new approach based on Coalesced Stack Trace Graphs (CSTGs), which summarize system behavior in terms of key control flows manifested through function invocation chains. We illustrate several scenarios in which CSTGs could help localize bugs efficiently, and present a case study of how we found and fixed a real Uintah bug using CSTGs.
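The general idea of coalescing stack traces into a graph can be illustrated with a minimal sketch. It assumes that each stack trace is recorded as a list of function names ordered from the outermost caller to the innermost callee, and it merges all traces into one graph whose directed caller-to-callee edges are weighted by how often they occur. The `diff` helper, which compares the graph of a passing run with that of a failing run, is our own assumption about one way such graphs might help localize a bug. All function and trace names here are hypothetical; this is not the authors' CSTG tool.

```python
from collections import Counter
from typing import Iterable, Sequence

# Assumption for this sketch: a stack trace is a sequence of function
# names ordered from the outermost caller to the innermost callee.
Trace = Sequence[str]


def coalesce(traces: Iterable[Trace]) -> Counter:
    """Merge many stack traces into one weighted call graph.

    Nodes are function names; each directed edge (caller, callee) is
    weighted by how many times that pair appears across all traces.
    """
    edges: Counter = Counter()
    for trace in traces:
        # Each adjacent pair in a trace is one caller->callee edge.
        for caller, callee in zip(trace, trace[1:]):
            edges[(caller, callee)] += 1
    return edges


def diff(good: Counter, bad: Counter) -> dict:
    """Hypothetical helper: edges whose weights differ between two runs.

    Positive values mean the edge was seen more often in the bad run.
    """
    keys = set(good) | set(bad)
    return {k: bad[k] - good[k] for k in keys if bad[k] != good[k]}


# Hypothetical traces captured at the same call site in two runs.
run_ok = [
    ["main", "execute", "runTask", "postSends", "send"],
    ["main", "execute", "runTask", "postSends", "send"],
]
run_bad = [
    ["main", "execute", "runTask", "postSends", "send"],
]
print(diff(coalesce(run_ok), coalesce(run_bad)))
```

Because identical call chains collapse onto shared edges, the graph stays small even when many traces are collected, and a discrepancy between runs shows up as a handful of edges with differing weights rather than as thousands of raw traces.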
Alan Humphrey, Qingyu Meng, Martin Berzins, Diego Caminha B De Oliveira, Zvonimir Rakamaric, Ganesh C. Gopalakrishnan, "Systematic Debugging Methods for Large Scale HPC Computational Frameworks", Computing in Science & Engineering, no. 1, pp. 1, PrePrints, doi:10.1109/MCSE.2014.11