Eighth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2007)
Grid Unit: A Self-Managing Building Block for Grid System
Adelaide, Australia
December 3-6, 2007
ISBN: 0-7695-3049-4
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PDCAT.2007.43
Grid system software is inherently complex and hard to build and maintain. In this paper, we propose a self-managing building block, Grid Unit, which facilitates constructing Grid systems with higher availability and lower management overhead. We present an agent organization as an autonomic management framework and propose a self-recovering protocol that removes most of the tough jobs from the system administrator's routine. The system has been deployed since 2004 on Dawning 4000A, the biggest node of the China Grid system. We have performed extensive experiments to evaluate Grid Unit, and the collected log data show that the availability of a Grid parallel process management service built on Grid Unit reaches 99.997%.
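To give a sense of scale for the 99.997% figure, the short sketch below converts an availability percentage into the downtime it implies over a period. The function name and the two observation windows are assumptions made for illustration; the abstract does not state the measurement window used in the experiments.

```python
def downtime_minutes(availability_percent: float, period_hours: float) -> float:
    """Return the downtime, in minutes, implied by an availability percentage over a period."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_hours * 60.0


# Hypothetical observation windows; the abstract does not say which one was measured.
per_year = downtime_minutes(99.997, 365 * 24)
per_30_days = downtime_minutes(99.997, 30 * 24)

print(f"{per_year:.1f} minutes per 365-day year")
print(f"{per_30_days:.1f} minutes per 30 days")
```

Under these assumed windows, 99.997% availability corresponds to roughly 16 minutes of downtime over a full year, or a little over one minute per 30 days.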
Citation:
Jianfeng Zhan, Lei Wang, Ming Zou, Hui Wang, Shuang Gao, Yulei Ding, "Grid Unit: A Self-Managing Building Block for Grid System," pdcat, pp. 303-310, Eighth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2007), 2007