Cluster Computing and the Grid, IEEE International Symposium on (2009)
May 18, 2009 to May 21, 2009
From personal software to advanced systems, caching mechanisms have steadfastly been a ubiquitous means for reducing workloads. It is no surprise, then, that under the grid and cluster paradigms, middlewares and other large-scale applications often seek caching solutions. Among these distributed applications, scientific workflow management systems have gained ground towards mitigating the often painstaking process of composing sequences of scientific data sets and services to derive virtual data. In the past, workflow managers have relied on low-level system cache for reuse support. But in distributed query intensive environments, where high volumes of intermediate virtual data can potentially be stored anywhere on the grid, a novel cache structure is needed to efficiently facilitate workflow planning. In this paper, we describe an approach to combat the challenges of maintaining large, fast virtual data caches for workflow composition. A hierarchical structure is proposed for indexing scientific data with spatiotemporal annotations across grid nodes. Our experimental results show that our hierarchical index is scalable and outperforms a centralized indexing scheme by an exponential factor in query intensive environments.
workflow cache, grid workflows, scientific workflows, workflow management
D. Chiu and G. Agrawal, "Hierarchical Caches for Grid Workflows," Cluster Computing and the Grid, IEEE International Symposium on(CCGRID), Shanghai, China, 2009, pp. 228-235.