pp: 1
Minh Ngoc Dinh, Monash University, Victoria
Chao Jin, Monash University, Victoria
ABSTRACT
Detecting and isolating bugs that arise only at high processor counts is challenging. Over a number of years, we have implemented a debugging method called "relative debugging", which supports debugging applications as they evolve or are ported to larger machines. It allows a user to compare the state of a suspect program against that of a reference version, even as the number of processors is increased. The key idea is to compare runtime data in order to reason about the state of the suspect program. Whilst powerful, a naïve implementation of the comparison phase does not scale to large problems running on large machines. In this paper, we propose two solutions: a hash-based scheme and a direct point-to-point scheme. We present the implementation, a case study, and the performance of our techniques on 20K cores of a Cray XE6 system.
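To make the hash-based idea concrete, the following is a minimal single-process sketch, not the scheme implemented in the paper. It assumes a one-dimensional array that is block-distributed over a different number of simulated ranks in the reference and suspect runs, and it assumes that rank boundaries fall on multiples of a fixed chunk size. The names `CHUNK`, `decompose`, `chunk_digests`, `collect`, and `differing_chunks` are hypothetical and chosen only for illustration. Each rank hashes its local chunks, so only small digests, rather than the full data, need to be brought together for comparison.

```python
import hashlib
import struct

CHUNK = 4  # elements per hashed chunk (hypothetical granularity)


def decompose(data, nprocs):
    """Block-distribute a global list over nprocs simulated ranks.

    Returns one (global_offset, local_values) pair per rank.
    """
    base, extra = divmod(len(data), nprocs)
    parts, start = [], 0
    for rank in range(nprocs):
        size = base + (1 if rank < extra else 0)
        parts.append((start, data[start:start + size]))
        start += size
    return parts


def chunk_digests(offset, values):
    """Hash each CHUNK-sized piece of one rank's local data.

    Digests are keyed by global chunk index so that runs with different
    decompositions can be matched up. Assumption: rank boundaries are
    aligned to CHUNK.
    """
    assert offset % CHUNK == 0 and len(values) % CHUNK == 0
    digests = {}
    for i in range(0, len(values), CHUNK):
        payload = struct.pack(f"<{CHUNK}d", *values[i:i + CHUNK])
        digests[(offset + i) // CHUNK] = hashlib.sha256(payload).digest()
    return digests


def collect(parts):
    """Merge the per-rank digest tables of one run."""
    merged = {}
    for offset, values in parts:
        merged.update(chunk_digests(offset, values))
    return merged


def differing_chunks(reference_parts, suspect_parts):
    """Return the sorted global chunk indices whose digests disagree."""
    ref = collect(reference_parts)
    sus = collect(suspect_parts)
    return sorted(k for k in ref.keys() | sus.keys() if ref.get(k) != sus.get(k))


reference = [float(i) for i in range(16)]
suspect = list(reference)
suspect[9] = -1.0  # inject a single wrong value into the suspect run

# Reference runs on 2 simulated ranks, suspect on 4.
print(differing_chunks(decompose(reference, 2), decompose(suspect, 4)))
```

Because the digests are exact, this sketch flags any bitwise difference; a comparison that tolerates floating-point rounding would need a different treatment, such as the direct exchange of the data itself.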
INDEX TERMS
Debugging aids, Software/Software Engineering, Software Engineering, Software/Program Verification, Assertion checkers, assertion languages, performance, Testing and Debugging, Distributed debugging
CITATION
Minh Ngoc Dinh, Chao Jin, "Scalable Relative Debugging", IEEE Transactions on Parallel & Distributed Systems, no. 1, pp. 1, PrePrints, doi:10.1109/TPDS.2013.86