IEEE International Symposium on Cluster Computing and the Grid (2002)
May 21, 2002 to May 24, 2002
The Grid Datafarm (Gfarm) architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. Gfarm parallel I/O APIs and commands provide a single filesystem image and manipulate filesystem metadata consistently. Fault tolerance and load balancing are managed automatically through file duplication or through recomputation using a command history log. A preliminary performance evaluation has shown scalable disk I/O and network bandwidth on 64 nodes of the Presto III Athlon cluster. Gfarm parallel I/O write and read operations have achieved data transfer rates of 1.74 GB/s and 1.97 GB/s, respectively, using 64 cluster nodes. The Gfarm parallel file copy reached 443 MB/s with 23 parallel streams over Myrinet 2000. The Gfarm architecture is expected to enable petascale data-intensive Grid computing, with I/O bandwidth that scales to the TB/s range and scalable computational power.
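The abstract states that fault tolerance and load balancing come from file duplication or from recomputation using a command history log. The toy model below sketches that decision for a single file fragment under stated assumptions; it is not Gfarm's API. The class and function names (`FragmentMeta`, `ToyMetadata`, `locate_or_recompute`), the node names, the command string, and the "least-loaded live replica" rule are all hypothetical illustrations. The policy reads from a live replica when one exists and otherwise falls back to re-running the logged command that produced the fragment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FragmentMeta:
    """Hypothetical metadata record for one fragment of a distributed file."""
    replicas: List[str]              # nodes assumed to hold a copy of this fragment
    command: Optional[str] = None    # command-history entry that produced it, if logged


@dataclass
class ToyMetadata:
    """Toy stand-in for a metadata store, keyed by (path, fragment index)."""
    fragments: Dict[Tuple[str, int], FragmentMeta] = field(default_factory=dict)


def locate_or_recompute(meta: ToyMetadata, path: str, index: int,
                        node_load: Dict[str, float]) -> Tuple[str, str]:
    """Pick a source for one fragment (assumed policy, for illustration only).

    node_load maps each *live* node to its current load. The least-loaded live
    replica is preferred (a simple load-balancing rule). If no replica is live,
    fall back to re-running the logged command that produced the fragment.
    """
    frag = meta.fragments[(path, index)]
    live = [n for n in frag.replicas if n in node_load]
    if live:
        return ("read", min(live, key=lambda n: node_load[n]))
    if frag.command is not None:
        return ("recompute", frag.command)
    raise IOError(f"fragment {index} of {path} has no live replica and no history entry")


# Usage with made-up node names and a made-up command string.
meta = ToyMetadata()
meta.fragments[("out.dat", 0)] = FragmentMeta(["n1", "n2"], "./simulate --part 0")
print(locate_or_recompute(meta, "out.dat", 0, {"n1": 0.9, "n2": 0.5}))  # ('read', 'n2')
print(locate_or_recompute(meta, "out.dat", 0, {"n3": 0.1}))  # ('recompute', './simulate --part 0')
```

In this sketch, duplication is the fast path because reading a surviving copy is cheaper than recomputation. The history log is the last resort, and it only works for fragments whose producing command was recorded.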
N. Soda, Y. Morita, O. Tatebe, S. Sekiguchi and S. Matsuoka, "Grid Datafarm Architecture for Petascale Data Intensive Computing," 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID), Berlin, Germany, 2002, pp. 102.