SC Conference (2002)
Nov. 16, 2002 to Nov. 22, 2002
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SC.2002.10047
Jason Lee, Lawrence Berkeley National Laboratory
Dan Gunter, Lawrence Berkeley National Laboratory
Martin Stoufer, Lawrence Berkeley National Laboratory
Brian Tierney, Lawrence Berkeley National Laboratory
Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. To determine the source of these performance problems, detailed end-to-end monitoring data from applications, networks, operating systems, and hardware must be correlated across time and space. Researchers need to be able to view and compare this very detailed monitoring data from a variety of angles. To address this problem, we propose a relational monitoring data archive that is designed to efficiently handle high-volume streams of monitoring data. In this paper we present an instrumentation and monitoring event archive service that can be used to collect and aggregate detailed end-to-end monitoring information from distributed applications. This archive service is designed to be scalable and fault tolerant. We also show how the archive is based on the "Grid Monitoring Architecture" defined by the Global Grid Forum.
D. Gunter, M. Stoufer, B. Tierney and J. Lee, "Monitoring Data Archives for Grid Environments," SC Conference (SC), Baltimore, Maryland, 2002, p. 66.