Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System
2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS) (2017)
Atlanta, Georgia, USA
June 5, 2017 to June 8, 2017
With the increase of the scale and intensity of the parallel I/O workloads generated by those scientific applications running on high performance computing facilities, understanding the I/O dynamics, especially the root cause of the I/O performance variability and degradation in HPC environment, have become extremely critical to the HPC community. In this paper, we run extensive I/O measuring tests on a production leadership-class storage system to capture the performance variabilities of large-scale parallel I/O. Analyzing these results and its statistic correlation revealed some valuable insights into the characteristics of the storage system and the root cause of I/O performance variability. Further, we leverage these findings and propose an I/O middleware design refactoring which can improve the performance of the parallel I/O by optimizing the data striping and placement. Our preliminary evaluation results demonstrate the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.
Production, Tools, Batteries, Measurement, Middleware, Servers, Degradation
L. Wan, M. Wolf, F. Wang, J. Y. Choi, G. Ostrouchov and S. Klasky, "Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System," 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, Georgia, USA, 2017, pp. 1022-1031.