Cluster Computing and the Grid, IEEE International Symposium on (2008)
May 19, 2008 to May 22, 2008
Real-time monitoring is increasingly becoming important in various scenes of large scale, multi-site distributed/parallel computing, e.g, understanding behavior of systems, scheduling resources, and debugging applications. Dedicated networks on inter-site communications are rarely available for the monitoring purposes. Therefore, for real-time monitoring systems, reducing communication cost is important to handle a large number of nodes with limited network resources. We implemented a real-time Grid monitoring system called VGXP, with techniques for low cost data gathering. It tries to send only diffs to recent data, and adapts to the requested data freshness and tolerable errors to minimize required communication. We evaluate monitoring overheads of the proposed method on a distributed environment consisting of 8-sites with 500 nodes. In a realistic setting where the sampling interval is set to 0.5 seconds and the tolerable error to 2%, the CPU usage of the server to gather data from all nodes was 0.2% and the transfer rate was less than 5kbps. The transfer rate did not exceed 50kbps even if we gather a detailed per-process statistics.
Data Gathering, Monitoring, Real-time systems, Visualization, Distributed Conputing
Y. Kamoshida and K. Taura, "Scalable Data Gathering for Real-Time Monitoring Systems on Distributed Computing," 2008 8th International Symposium on Cluster Computing and the Grid (CCGRID '08)(CCGRID), Lyon, 2008, pp. 425-432.