Cluster Computing and the Grid, IEEE International Symposium on (2011)
Newport Beach, California USA
May 23, 2011 to May 26, 2011
ISBN: 978-0-7695-4395-6
pp: 134-143
ABSTRACT
Volunteer PC grids represent massive computation capacity at a low cost, but they are challenging to employ for parallel computing because their performance and availability are variable and unpredictable. To make continuous forward progress in such an unreliable environment, a communicating parallel program must employ either explicit redundancy or implicit redundancy through uncoordinated checkpoint-restart. A communication model based on one-sided Put/Get calls to an abstract global shared space is a good match, because processes can execute their communication operations independently and asynchronously. However, no existing system is designed for redundant communicating processes. The key problem is that a single logical operation that affects the global program state may be executed by different instances of the same process at different times, leading to semantic inconsistency. This paper presents the design, execution model, implementation, and usage of Volpex, a communication layer for robust execution on volunteer PC grids. The research leads to a practical way to employ idle PCs for latency-tolerant parallel computing applications.
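The consistency problem described in the abstract can be made concrete with a small sketch. The Python below is a hypothetical illustration, not the Volpex implementation: it assumes that every logical operation is identified by a pair (logical process id, per-process operation counter), so that all replicas of one logical process issue identical identifiers in the same order. Under that assumption, the shared space drops a Put it has already applied and replays to a late replica exactly the version its faster twin received from the same logical Get. The names `RedundantSharedSpace`, `OpId`, `put`, and `get` are invented for this example.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical identifier for one logical operation: (logical process id, per-process counter).
# Assumption: every replica of a logical process generates the same sequence of OpIds.
OpId = Tuple[str, int]


class RedundantSharedSpace:
    """Toy global shared space that tolerates duplicate operations from redundant replicas."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Any]] = {}         # all applied values per key, in put order
        self._applied_puts: Set[OpId] = set()             # logical puts already applied
        self._get_log: Dict[OpId, Tuple[str, int]] = {}   # logical get -> (key, version index) served

    def put(self, op: OpId, key: str, value: Any) -> None:
        # A slower replica repeating an already-applied logical put is silently dropped,
        # so it cannot overwrite a newer value written in the meantime.
        if op in self._applied_puts:
            return
        self._applied_puts.add(op)
        self._versions.setdefault(key, []).append(value)

    def get(self, op: OpId, key: str) -> Any:
        # A repeated logical get receives exactly the version its first replica saw,
        # even if the key has been updated since then.
        if op in self._get_log:
            logged_key, index = self._get_log[op]
            return self._versions[logged_key][index]
        versions = self._versions.get(key)
        if not versions:
            raise KeyError(key)
        index = len(versions) - 1  # the first execution of this logical get reads the latest version
        self._get_log[op] = (key, index)
        return versions[index]


if __name__ == "__main__":
    space = RedundantSharedSpace()
    space.put(("P0", 1), "x", 10)      # fast replica of P0 writes x = 10
    print(space.get(("P1", 1), "x"))   # fast replica of P1 reads x -> 10
    space.put(("P0", 2), "x", 20)      # P0 then writes x = 20
    space.put(("P0", 1), "x", 10)      # slow replica of P0 repeats its first put: ignored
    print(space.get(("P1", 1), "x"))   # slow replica of P1 repeats its first get -> 10
    print(space.get(("P1", 2), "x"))   # P1's next logical get -> 20
```

Without the duplicate check in `put`, the slow replica of P0 would append a stale 10 after 20, and P1's second logical read would return 10 instead of 20, which is the kind of semantic inconsistency the abstract describes. The sketch never discards old versions; a long-running system would presumably need some form of version garbage collection, which this illustration leaves out.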
INDEX TERMS
Volunteer Computing, Fault Tolerance, Redundant Computation, Desktop Grids, Parallel Execution
CITATION
Nagarajan Kanna, Eshwar Rohit, Qian Wang, Edgar Gabriel, David Anderson, Margaret S. Cheung, Hien Nguyen, Jaspal Subhlok, "A Robust Communication Framework for Parallel Execution on Volunteer PC Grids", Cluster Computing and the Grid, IEEE International Symposium on, pp. 134-143, 2011, doi:10.1109/CCGrid.2011.72