2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC) (2012)
Salt Lake City, UT
Nov. 10, 2012 to Nov. 16, 2012
HDF5 is a data model, library, and file format for storing and managing data. It is designed for flexible and efficient I/O on high-volume, complex data. Natively, it uses a single-file format in which multiple HDF5 objects are stored in one file. In a parallel HDF5 application, multiple processes access that single file, which creates an I/O performance bottleneck. Additionally, a single-file format does not allow semantic post-processing of individual objects outside the scope of the HDF5 application. We have developed a new plugin for HDF5 using its Virtual Object Layer that serves two purposes: 1) it uses PLFS to convert the single-file layout into a data layout that is optimized for the underlying file system, and 2) it stores data in a unique way that enables semantic post-processing on data. We measure the performance of the plugin and discuss work that leverages the newly enabled semantic post-processing functionality. We further discuss the applicability of this approach to exascale burst buffer storage systems.
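The difference between a single-file layout and a per-object layout can be illustrated with a minimal sketch. It uses neither HDF5 nor PLFS: a plain directory tree stands in for the storage backend, and the mapping (groups become directories, datasets become files), the function names `store_objects_separately` and `read_object`, and the raw little-endian double encoding are all assumptions chosen for illustration, not the plugin's actual on-disk format.

```python
import os
import struct
import tempfile

def store_objects_separately(root, datasets):
    """Write each dataset to its own file under root.

    `datasets` maps an HDF5-style path such as "/run1/pressure" to a list
    of floats. Intermediate path components (groups) become directories.
    This layout is an illustrative assumption, not the plugin's format.
    """
    for path, values in datasets.items():
        parts = path.strip("/").split("/")
        target = os.path.join(root, *parts)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            # Encode the values as little-endian doubles, one per value.
            f.write(struct.pack("<%dd" % len(values), *values))

def read_object(root, path):
    """Read a single dataset back without touching any other object."""
    target = os.path.join(root, *path.strip("/").split("/"))
    with open(target, "rb") as f:
        raw = f.read()
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))

# Hypothetical example data: two datasets in one group.
demo_root = tempfile.mkdtemp()
store_objects_separately(demo_root, {
    "/run1/pressure": [1.0, 2.5, 4.0],
    "/run1/temperature": [300.0, 301.5],
})
pressure = read_object(demo_root, "/run1/pressure")
```

Because every object lives in its own file, an external tool can open `run1/pressure` directly, which is the kind of per-object access that a monolithic container file does not offer without going through the HDF5 library.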
data handling, file organisation, parallel processing
K. Mehta, J. Bent, A. Torres, G. Grider and E. Gabriel, "A Plugin for HDF5 Using PLFS for Improved I/O Performance and Semantic Analysis," 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC), Salt Lake City, UT, 2012, pp. 746-752.