2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN)
AutomaDeD: Automata-based debugging for dissimilar parallel tasks
Chicago, IL, USA
June 28-July 01
ISBN: 978-1-4244-7500-1
BibTeX:
@inproceedings{10.1109/DSN.2010.5544927,
  author = {Greg Bronevetsky and Ignacio Laguna and Saurabh Bagchi and Bronis R. de Supinski and Dong H. Ahn and Martin Schulz},
  title = {AutomaDeD: Automata-based debugging for dissimilar parallel tasks},
  booktitle = {IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2010)},
  year = {2010},
  isbn = {978-1-4244-7500-1},
  pages = {231--240},
  doi = {http://doi.ieeecomputersociety.org/10.1109/DSN.2010.5544927},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Today's largest systems have over 100,000 cores, with million-core systems expected over the next few years. This growing scale makes debugging the applications that run on them a daunting challenge. Few debugging tools perform well at this scale, and most provide an overload of information about the entire job. Developers need tools that quickly direct them to the root cause of the problem. This paper presents AutomaDeD, a tool that identifies which tasks of a large-scale application first manifest a bug, along with the specific code region and program execution point at which it appears. AutomaDeD statistically models the application's control-flow and timing behavior, grouping tasks and identifying deviations from normal execution, which significantly reduces debugging effort. In addition to a case study in which AutomaDeD locates a bug that occurred during development of MVAPICH, we evaluate AutomaDeD on a range of bugs injected into the NAS parallel benchmarks. Our results demonstrate that AutomaDeD detects the time period when a bug first manifested with 90% accuracy for stalls and hangs and 70% accuracy for interference faults. It identifies the subset of processes first affected by the fault with 80% accuracy for stalls and hangs and 70% accuracy for interference faults, and the code region where the fault first manifested with 90% and 50% accuracy, respectively.
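The abstract describes the approach only at a high level, so the following is a rough illustrative sketch rather than the authors' implementation. It assumes each task's execution is available as a list of code-region names, estimates a transition-probability model per task, and flags the task whose model lies farthest from the others. The function names, the L1 distance, and the sample traces are all hypothetical, and timing behavior is ignored for brevity.

```python
from collections import Counter


def transition_probs(trace):
    """Estimate edge probabilities of a task's control-flow model from its trace."""
    counts = Counter(zip(trace, trace[1:]))  # count (source, destination) pairs
    out_totals = Counter()
    for (src, _dst), n in counts.items():
        out_totals[src] += n
    return {edge: n / out_totals[edge[0]] for edge, n in counts.items()}


def distance(p, q):
    """L1 distance between two edge-probability maps (missing edges count as 0)."""
    return sum(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in set(p) | set(q))


def most_deviant_task(traces):
    """Return the id of the task whose model is farthest, on average, from the rest.

    Assumes at least two tasks; a real tool would also need a threshold to
    decide whether any task deviates at all.
    """
    models = {tid: transition_probs(t) for tid, t in traces.items()}

    def mean_dist(tid):
        others = [distance(models[tid], m) for o, m in models.items() if o != tid]
        return sum(others) / len(others)

    return max(models, key=mean_dist)


# Hypothetical traces: tasks 0-2 behave alike, task 3 gets stuck in "recv".
normal = ["init", "compute", "send", "recv", "compute", "send", "recv", "finalize"]
traces = {
    0: list(normal),
    1: list(normal),
    2: list(normal),
    3: ["init", "compute", "send", "recv", "recv", "recv", "recv", "recv"],
}
suspect = most_deviant_task(traces)
```

In this toy input, the three identical tasks sit at distance zero from each other, so the stalled task stands out as the one with the largest average distance.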
Citation:
Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, Martin Schulz, "AutomaDeD: Automata-based debugging for dissimilar parallel tasks," 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), pp. 231-240, 2010.
