High Performance Computing and Grid in Asia Pacific Region, International Conference on (2004)
Omiya Sonic City, Tokyo, Japan
July 20, 2004 to July 22, 2004
Jie Ma, Chinese Academy of Sciences
Zhihua Fan, Chinese Academy of Sciences
Jin Xiong, Chinese Academy of Sciences
Distributed metadata servers are required for a cluster file system to scale. However, how to distribute file system metadata among multiple metadata servers, and how to keep the file system reliable when servers fail, are two difficult problems. In this paper, we present a journal-based failure-recovery mechanism for distributed metadata servers in the Dawning Cluster File System (DCFS2). The DCFS2 metadata protocol uses a modified two-phase commit protocol that keeps metadata updates consistent across multiple metadata servers even if one server fails. We focus on the logging policy and the concurrency control policy for metadata updates, and on the failure-recovery policy. We compare the DCFS2 metadata protocol with the standard two-phase commit protocol and show some of its advantages. We also present results of performance experiments on our system.
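For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a textbook journal-based two-phase commit across two metadata servers. It is not the modified DCFS2 protocol, whose details the abstract does not give. The names `MetadataServer`, `two_phase_commit`, the `decisions` log, and the journal record layout are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical in-memory metadata server with an append-only journal."""
    name: str
    fail_on_prepare: bool = False                 # simulate a server that cannot vote "yes"
    journal: list = field(default_factory=list)   # records: (kind, txid, payload)
    metadata: dict = field(default_factory=dict)  # committed metadata state

    def prepare(self, txid, updates):
        """Phase 1: journal the intended update and vote."""
        if self.fail_on_prepare:
            return False
        self.journal.append(("PREPARE", txid, dict(updates)))
        return True

    def commit(self, txid):
        """Phase 2: journal the outcome, then apply the prepared update."""
        self.journal.append(("COMMIT", txid, None))
        for kind, t, payload in self.journal:
            if kind == "PREPARE" and t == txid:
                self.metadata.update(payload)

    def abort(self, txid):
        """Phase 2: journal the abort; the prepared update is never applied."""
        self.journal.append(("ABORT", txid, None))

    def recover(self, decisions):
        """Rebuild state from the journal after a crash.

        A transaction that was prepared but has no local outcome record is
        resolved from the coordinator's decision log (assumed durable here);
        if the coordinator has no record either, it is treated as aborted.
        """
        self.metadata = {}
        prepared, outcome = {}, {}
        for kind, txid, payload in self.journal:
            if kind == "PREPARE":
                prepared[txid] = payload
            else:
                outcome[txid] = kind
        for txid, payload in prepared.items():
            final = outcome.get(txid) or decisions.get(txid, "ABORT")
            if final == "COMMIT":
                self.metadata.update(payload)


def two_phase_commit(txid, plan, decisions):
    """Coordinator. `plan` is a list of (server, updates) pairs."""
    votes = [server.prepare(txid, updates) for server, updates in plan]
    decision = "COMMIT" if all(votes) else "ABORT"
    decisions[txid] = decision  # record the decision before telling participants
    for server, _ in plan:
        if decision == "COMMIT":
            server.commit(txid)
        else:
            server.abort(txid)
    return decision
```

In this sketch, an update that spans two servers (for example, a directory entry on one and an inode on the other) either becomes visible on both or on neither. A server that crashes between the two phases can settle its pending transaction by replaying its journal and consulting the coordinator's decision log.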
Jie Ma, Zhihua Fan, Jin Xiong, "A Failure Recovery Mechanism for Distributed Metadata Servers in DCFS2", High Performance Computing and Grid in Asia Pacific Region, International Conference on, pp. 2-8, 2004, doi:10.1109/HPCASIA.2004.1324010