2008 IEEE Fourth International Conference on eScience (2008)
Dec. 7, 2008 to Dec. 12, 2008
Many scientific teams store very large datasets in HDF5 files, which are often located at remote sites. HDF5 is a technology built to handle extremely large and complex data: petabytes of remote sensing data collected by satellites, terabytes of computational results from nuclear testing models, and megabytes of high-resolution MRI brain scans are stored in HDF5 files. Efficient use of this data in a distributed environment depends on client applications being able to read any subset of a dataset without transferring the entire file to the local machine. The HDF5-iRODS module for the iRODS data grid system was developed for this purpose, and the capability can yield substantial savings in both time and space. The usefulness of the HDF5-iRODS module was verified for FLASH, one of the NCSA/SDSC Strategic Application Program (SAP) projects. A joint team from The HDF Group (representing NCSA) and the SDSC SRB group collaborated on this work.
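The idea of object-level access, reading only the requested subset rather than the whole file, can be illustrated with a minimal sketch. This is not the HDF5-iRODS API: the flat binary layout and the names `write_dataset`, `read_subset`, and `demo` are assumptions made purely for illustration, using only the Python standard library. A local file stands in for the remote store, and a seek plus a bounded read stands in for the server returning just the selected elements.

```python
import os
import struct
import tempfile

# Assumption: a flat 1-D dataset of little-endian float64 values,
# standing in for a much larger remote HDF5 dataset.
ITEM = struct.Struct("<d")


def write_dataset(path, values):
    """Write the values to disk as consecutive float64 elements."""
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(v))


def read_subset(path, start, count):
    """Return `count` elements beginning at index `start`.

    Only the bytes covering the requested range are read; the rest of
    the file is never loaded.
    """
    with open(path, "rb") as f:
        f.seek(start * ITEM.size)
        raw = f.read(count * ITEM.size)
    n = len(raw) // ITEM.size
    return [ITEM.unpack_from(raw, i * ITEM.size)[0] for i in range(n)]


def demo():
    """Build a 1000-element dataset and read a 5-element subset of it."""
    path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
    write_dataset(path, [float(i) for i in range(1000)])
    subset = read_subset(path, start=100, count=5)
    bytes_read = len(subset) * ITEM.size
    return subset, bytes_read, os.path.getsize(path)


if __name__ == "__main__":
    subset, bytes_read, total = demo()
    print(subset, bytes_read, total)
```

In this toy case the client moves 40 bytes out of an 8000-byte file. In a client/server setting the same selection would presumably be evaluated on the server, so only the selected elements cross the network.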
HDF5, iRODS, Distributed, Data Grid, Remote access, Client/server model, Large/complex data
Peter Cao, Mike Wan, "The HDF5-iRODS Module: A Data Grid System for Object Level Access", 2008 IEEE Fourth International Conference on eScience, pp. 339-340, 2008, doi:10.1109/eScience.2008.99