2012 IEEE International Conference on Cluster Computing (2012)
Beijing, China
Sept. 24, 2012 to Sept. 28, 2012
The ability to handle massive amounts of data efficiently is necessary for the continuing development towards exascale scientific data-mining applications and database systems. Unfortunately, recent years have shown a growing gap between the size and complexity of the data produced by scientific applications and the limited I/O bandwidth available on modern high-performance computing systems. Using data compression to reduce I/O activity is a promising means of addressing this problem. However, the standard compression algorithms previously explored for this purpose offer limited gains in both end-to-end throughput and storage. In this paper, we introduce an in-situ compression scheme aimed at improving end-to-end I/O throughput as well as reducing dataset size. Our technique, PRIMACY (Preconditioning Id-MApper for Compressing incompressibility), acts as a preconditioner for standard compression libraries: it modifies the representation of the original floating-point scientific data to increase byte-level repeatability, allowing standard lossless compressors to take advantage of their entropy-based byte-level encoding schemes. We additionally present a theoretical model for compression efficiency in high-performance computing environments and evaluate the efficiency of our approach via comparative analysis. In our evaluation on 20 real-world scientific datasets, PRIMACY improved standard end-to-end write and read throughput by up to 38% and 22%, respectively, and delivered a 25% increase in compression ratio together with a 3-to-4-fold improvement in both compression and decompression throughput over general-purpose compressors.
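The idea of preconditioning floating-point data before handing it to a standard lossless compressor can be illustrated with a minimal sketch. This is not PRIMACY's ID-mapping algorithm, which the abstract does not specify. It swaps in plain byte transposition (grouping the first byte of every value, then the second, and so on) as a stand-in preconditioner, with zlib as the standard compressor. The function names and the sample sine-wave data are assumptions made for illustration only.

```python
import math
import struct
import zlib


def precondition(raw: bytes, width: int = 8) -> bytes:
    """Byte transposition: emit byte 0 of every value, then byte 1, etc.

    Stand-in preconditioner (an assumption), not PRIMACY's ID mapping.
    """
    if len(raw) % width:
        raise ValueError("input length must be a multiple of the value width")
    return b"".join(raw[i::width] for i in range(width))


def restore(shuffled: bytes, width: int = 8) -> bytes:
    """Invert precondition(): interleave the byte columns back into values."""
    count = len(shuffled) // width
    out = bytearray(len(shuffled))
    for i in range(width):
        out[i::width] = shuffled[i * count:(i + 1) * count]
    return bytes(out)


def compress(raw: bytes) -> bytes:
    """Precondition, then apply a standard lossless compressor (zlib)."""
    return zlib.compress(precondition(raw))


def decompress(blob: bytes) -> bytes:
    """Undo the compressor, then undo the preconditioning."""
    return restore(zlib.decompress(blob))


# Hypothetical sample data: a smooth signal stored as big-endian doubles.
values = [math.sin(i / 500.0) for i in range(20000)]
raw = struct.pack(f">{len(values)}d", *values)

plain = zlib.compress(raw)      # compressor alone
conditioned = compress(raw)     # preconditioner + compressor

assert decompress(conditioned) == raw  # the pipeline is lossless
print("raw bytes:", len(raw))
print("zlib only:", len(plain))
print("preconditioned + zlib:", len(conditioned))
```

Grouping the sign and exponent bytes of neighbouring values places similar bytes next to each other, which is the kind of byte-level repeatability a byte-oriented compressor can exploit. The sizes printed depend on the data, so the sketch makes no claim about the gains reported in the paper.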
Throughput, Standards, Compressors, Encoding, Pipelines, Bandwidth, Data models, I/O, Lossless Compression, Performance Modeling
N. Shah, E. R. Schendel, S. Lakshminarasimhan, S. V. Pendse, T. Rogers and N. F. Samatova, "Improving I/O Throughput with PRIMACY: Preconditioning ID-Mapper for Compressing Incompressibility," 2012 IEEE International Conference on Cluster Computing (CLUSTER), Beijing, China, 2012, pp. 209-219.