Scott Simmerman, NICS, Oak Ridge
James Osborne, UTK, Knoxville
Jian Huang, UTK, Knoxville
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2012.92
As multi-processor and multi-core technology becomes prevalent, shared-memory architectures with 1024 or more processing cores are becoming available for general-purpose applications. As an early operator of such a system, the Remote Data Analysis and Visualization (RDAV) center at the University of Tennessee observed many user applications that needed to scale up their computation by running many concurrent instances of generic codes. This is not a typical way of using HPC systems, and naive solutions to this need would cause significant problems that hamper the scalability and stability of the system. We at the RDAV center developed a software package called Eden to manage large numbers of concurrent serial jobs with high throughput for any such application. In this article, we describe the motivation for and technical nature of Eden and report representative use cases collected over the past two years.
Scott Simmerman, James Osborne, Jian Huang, "Eden: Simplified Management of Atypical HPC Jobs", Computing in Science & Engineering, no. 1, pp. 1, PrePrints, doi:10.1109/MCSE.2012.92