Monterey, California
Apr. 11, 2005 to Apr. 14, 2005
ISBN: 0-7695-2318-8
pp: 3-17
Richard Hedges, Lawrence Livermore National Laboratory
Bill Loewe, Lawrence Livermore National Laboratory
Tyce McLarty, Lawrence Livermore National Laboratory
Chris Morrone, Lawrence Livermore National Laboratory
Over the last several years, Lawrence Livermore National Laboratory has made a major push toward building extremely large-scale computing clusters based on open source software and commodity hardware. On the storage front, our efforts have focused on developing the Lustre file system and bringing it into production in our computer center. Given our customers' requirements, we will inevitably be living on the bleeding edge with this file system software as we press it into production. A further reality is that our partners cannot duplicate systems at the scale this testing requires. For these practical reasons, the onus for file system testing at scale has fallen largely upon us. As an integral part of our testing efforts, we have developed programs for stress and performance testing of parallel file systems. This paper focuses on these unique test programs and on how we apply them to understand the usage and failure modes of such large-scale parallel file systems.
