2011 IEEE 11th International Conference on Data Mining
S-preconditioner for Multi-fold Data Reduction with Guaranteed User-Controlled Accuracy
Vancouver, Canada
December 11-December 14
ISBN: 978-0-7695-4408-3
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2011.138
The growing gap between the massive amounts of data generated by petascale scientific simulation codes and the capability of system hardware and software to effectively analyze this data necessitates data reduction. Yet the increasing data complexity challenges most, if not all, existing data compression methods. In fact, lossless compression techniques offer no more than a 10% reduction on the scientific data we have experience with, which are widely regarded as effectively incompressible. To bridge this gap, we advocate a transformative strategy that enables fast, accurate, and multi-fold reduction of double-precision floating-point scientific data. Our method is inspired by the effective use of preconditioners for linear algebra solvers optimized for a particular class of computational "dwarfs" (e.g., dense or sparse matrices). Focusing on a commonly used multi-resolution wavelet compression technique as the underlying "solver" for data reduction, we propose the S-preconditioner, which transforms scientific data into a form with high global regularity to ensure a significant decrease in the number of wavelet coefficients stored for a segment of data. Combined with the subsequent EQ-calibrator, our resultant method, called S-Preconditioned EQ-Calibrated Wavelets (SW), robustly achieved a 4- to 5-fold data reduction while guaranteeing user-defined accuracy of reconstructed data to be within 1% point-by-point relative error, lower than 0.01 Normalized RMSE, and higher than 0.99 Pearson Correlation. We show the results obtained by testing our method on six petascale simulation codes, including fusion, combustion, climate, astrophysics, and subsurface groundwater, in addition to 13 publicly available scientific datasets.
We also demonstrate that application-driven data mining tasks performed on decompressed variables or their derived quantities produce results of comparable quality with the ones for the original data.
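The three accuracy criteria named in the abstract (point-by-point relative error, Normalized RMSE, and Pearson correlation) can be checked on any original/reconstructed pair. The following is a minimal sketch in Python with NumPy, not the authors' implementation. The function names `accuracy_report` and `meets_targets` are hypothetical, the NRMSE is assumed to be the RMSE divided by the value range of the original data (the abstract does not state the normalization), and the "reconstructed" array is synthetic noise-perturbed data standing in for real decompressed output.

```python
import numpy as np


def accuracy_report(original, reconstructed):
    """Compute the three accuracy measures named in the abstract."""
    x = np.asarray(original, dtype=np.float64)
    y = np.asarray(reconstructed, dtype=np.float64)

    # Worst-case point-by-point relative error.
    # Assumption: the original data contain no zeros.
    max_rel_err = np.max(np.abs(y - x) / np.abs(x))

    # RMSE normalized by the range of the original data.
    # Assumption: range normalization; other conventions exist.
    rmse = np.sqrt(np.mean((y - x) ** 2))
    nrmse = rmse / (x.max() - x.min())

    # Pearson correlation between original and reconstructed values.
    pearson = np.corrcoef(x, y)[0, 1]

    return {"max_rel_err": max_rel_err, "nrmse": nrmse, "pearson": pearson}


def meets_targets(report, rel_tol=0.01, nrmse_tol=0.01, min_corr=0.99):
    """Check a report against the thresholds quoted in the abstract."""
    return bool(
        report["max_rel_err"] <= rel_tol
        and report["nrmse"] < nrmse_tol
        and report["pearson"] > min_corr
    )


# Synthetic stand-in data: values in [1, 2) perturbed by at most 0.5%.
rng = np.random.default_rng(0)
original = rng.uniform(1.0, 2.0, size=1000)
reconstructed = original * (1.0 + rng.uniform(-0.005, 0.005, size=1000))

report = accuracy_report(original, reconstructed)
print(report, meets_targets(report))
```

The thresholds are keyword arguments because the abstract describes the accuracy as user-controlled; a stricter run might call `meets_targets(report, rel_tol=0.001)`.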
Index Terms:
preconditioners for data mining, data reduction, data mining over decompressed data, in situ data analytics, extreme-scale data analytics
Citation:
Ye Jin, Sriram Lakshminarasimhan, Neil Shah, Zhenhuan Gong, C.S. Chang, Jackie Chen, Stephane Ethier, Hemanth Kolla, Seung-Hoe Ku, Scott Klasky, Robert Latham, Robert Ross, Karen Schuchardt, Nagiza F. Samatova, "S-preconditioner for Multi-fold Data Reduction with Guaranteed User-Controlled Accuracy," icdm, pp.290-299, 2011 IEEE 11th International Conference on Data Mining, 2011
