27th International Conference on Distributed Computing Systems (ICDCS '07)
Embedded Gossip: Lightweight Online Measurement for Large-Scale Applications
Toronto, Canada
June 25-June 27
ISBN: 0-7695-2837-3
Wenbin Zhu, University of New Mexico
Patrick G. Bridges, University of New Mexico
Arthur B. Maccabe, University of New Mexico

For large-scale parallel applications, lightweight online monitoring can enable a wide range of online adaptations, including load balancing, power management, and progress monitoring. The processing and monitoring overhead of centralized global tracing techniques makes them unsuitable for such tasks. Purely local tools, on the other hand, cannot provide the global information that many desirable online adaptations of large-scale applications require.

In this paper, we describe a novel distributed online measurement method for large-scale applications called Embedded Gossip (EG). EG works by piggybacking performance information about application behavior on existing application messages and merging received information with previously known data in a fashion customized to the needs of a particular monitoring task. EG thus provides each process with both local and global views of application behavior with low overhead.
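
The piggyback-and-merge idea can be illustrated with a minimal sketch. This is not the authors' implementation: the class GossipProcess, the function merge_max_per_rank, and the choice of a per-rank dictionary as the gossiped data are all hypothetical, and the sketch assumes each process reports a monotonically increasing measurement (for example, cumulative compute time), so that taking the per-rank maximum keeps the most recent value.

```python
from typing import Any, Callable, Dict, Tuple

View = Dict[int, float]                  # rank -> latest known measurement
MergeFn = Callable[[View, View], View]   # task-specific merge policy


def merge_max_per_rank(known: View, received: View) -> View:
    """Hypothetical merge policy: keep the largest value seen for each rank."""
    merged = dict(known)
    for rank, value in received.items():
        merged[rank] = max(merged.get(rank, value), value)
    return merged


class GossipProcess:
    """Toy process that attaches its current view to every outgoing message."""

    def __init__(self, rank: int, merge: MergeFn) -> None:
        self.rank = rank
        self.merge = merge
        self.view: View = {}

    def record_local(self, value: float) -> None:
        # Fold a new local measurement into the view using the same policy.
        self.view = self.merge(self.view, {self.rank: value})

    def send(self, payload: Any) -> Tuple[Any, View]:
        # Piggyback a copy of the view on the existing application message.
        return (payload, dict(self.view))

    def receive(self, message: Tuple[Any, View]) -> Any:
        # Merge the piggybacked data, then hand the payload to the application.
        payload, gossip = message
        self.view = self.merge(self.view, gossip)
        return payload


if __name__ == "__main__":
    p0, p1, p2 = (GossipProcess(r, merge_max_per_rank) for r in range(3))
    p0.record_local(5.0)
    p1.record_local(7.0)
    p2.record_local(3.0)
    p1.receive(p0.send("halo exchange"))   # p1 learns about p0
    p2.receive(p1.send("halo exchange"))   # p2 learns about p0 via p1
    print(p2.view)
```

In this sketch, process 2 learns the measurement of process 0 without ever exchanging a message with it, because process 1 forwards what it has merged. Passing a different merge function to the constructor is one way to model the customization to a particular monitoring task that the abstract describes.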

To illustrate the capabilities of Embedded Gossip, we show that it disseminates global information in a timely fashion for a wide range of monitoring tasks, including critical path profiling, workload imbalance monitoring, and progress monitoring. This global information has many potential uses, including imbalance detection for load balancing and energy management tools, progress monitoring for batch schedulers, and other performance debugging and optimization techniques.

Citation:
Wenbin Zhu, Patrick G. Bridges, Arthur B. Maccabe, "Embedded Gossip: Lightweight Online Measurement for Large-Scale Applications," 27th International Conference on Distributed Computing Systems (ICDCS '07), p. 58, 2007