2012 IEEE International Conference on Cluster Computing (CLUSTER)
Beijing, China
Sept. 24, 2012 to Sept. 28, 2012
ISBN: 978-1-4673-2422-9
pp: 209-219
ABSTRACT
The ability to efficiently handle massive amounts of data is necessary for the continuing development towards exascale scientific data-mining applications and database systems. Unfortunately, recent years have shown a growing gap between the size and complexity of data produced by scientific applications and the limited I/O bandwidth available on modern high-performance computing systems. Utilizing data compression to lower the degree of I/O activity offers a promising means of addressing this problem. However, the standard compression algorithms previously explored for such use offer limited gains on both the end-to-end throughput and storage fronts. In this paper, we introduce an in-situ compression scheme aimed at improving end-to-end I/O throughput and reducing dataset size. Our technique, PRIMACY (Preconditioning Id-MApper for Compressing incompressibility), acts as a preconditioner for standard compression libraries by modifying the representation of the original floating-point scientific data to increase byte-level repeatability, allowing standard lossless compressors to take advantage of their entropy-based byte-level encoding schemes. We additionally present a theoretical model for compression efficiency in high-performance computing environments and evaluate the efficiency of our approach via comparative analysis. Based on our evaluations on 20 real-world scientific datasets, PRIMACY achieved up to 38% and 22% improvements in standard end-to-end write and read throughput, respectively, in addition to a 25% increase in compression ratios paired with a 3-to-4-fold improvement in both compression and decompression throughput over general-purpose compressors.
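The abstract does not spell out how the preconditioner remaps the data, so the following is only a rough Python sketch of the general idea of raising byte-level repeatability before handing data to a standard lossless compressor. It assumes, purely for illustration, that the two high-order bytes of each double (sign, exponent, top mantissa bits) are replaced by frequency-ranked 2-byte IDs; the function names `precondition` and `restore` and this particular byte split are hypothetical and are not taken from the paper.

```python
import struct
import zlib
from collections import Counter


def precondition(values):
    """Illustrative ID mapping (an assumption, not the paper's algorithm).

    Splits each double into its 2 high-order bytes and 6 low-order bytes,
    then replaces every distinct high-order pair with a frequency-ranked ID.
    """
    raw = [struct.pack(">d", v) for v in values]  # big-endian: sign/exponent first
    high = [r[:2] for r in raw]
    low = b"".join(r[2:] for r in raw)
    # The most frequent pair gets ID 0, the next ID 1, and so on.
    ranked = [pair for pair, _ in Counter(high).most_common()]
    ids = {pair: i for i, pair in enumerate(ranked)}
    # At most 65536 distinct pairs exist, so every ID fits in 2 bytes.
    id_stream = b"".join(struct.pack(">H", ids[p]) for p in high)
    return ranked, id_stream, low


def restore(ranked, id_stream, low):
    """Invert precondition(): look up each ID and reattach the low bytes."""
    out = []
    for i in range(len(id_stream) // 2):
        (idx,) = struct.unpack_from(">H", id_stream, 2 * i)
        out.append(struct.unpack(">d", ranked[idx] + low[6 * i:6 * i + 6])[0])
    return out


values = [1.0 + i * 1e-3 for i in range(1000)]
ranked, id_stream, low = precondition(values)
packed_ids = zlib.compress(id_stream)  # any standard lossless compressor
recovered = restore(ranked, zlib.decompress(packed_ids), low)
```

Because the mapping only relabels bytes, the round trip is exact; how much the ID stream actually helps a given compressor depends on the dataset.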
INDEX TERMS
Throughput, Standards, Compressors, Encoding, Pipelines, Bandwidth, Data models, I/O, Lossless Compression, Performance Modeling
CITATION
Neil Shah, Eric R. Schendel, Sriram Lakshminarasimhan, Saurabh V. Pendse, Terry Rogers, Nagiza F. Samatova, "Improving I/O Throughput with PRIMACY: Preconditioning ID-Mapper for Compressing Incompressibility", 2012 IEEE International Conference on Cluster Computing (CLUSTER), pp. 209-219, 2012, doi:10.1109/CLUSTER.2012.16