Issue No. 04 - April (2013 vol. 24)
DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.178
B. Veeravalli, Department of Electrical and Computer Engineering, National University of Singapore, Singapore
Yang Wang, IBM Center for Advanced Studies (CAS Atlantic), University of New Brunswick, Fredericton, NB, Canada
Chen-Khong Tham, Department of Electrical and Computer Engineering, National University of Singapore, Singapore
In this paper, we study strategies for efficiently staging and caching data on a set of vantage sites in a cloud system at minimum cost. Unlike traditional research, we do not attempt to identify access patterns in order to anticipate future requests. Instead, assuming such information is known in advance, our goal is to stage the shared data items to predetermined sites at prescribed time instants, aligned with the access patterns, while minimizing the monetary costs of caching and transmitting the requested data items. To this end, we follow the cost and network models in  and extend the analysis to multiple data items, each with a single copy or multiple copies. Our results show that, under the homogeneous cost model, when the ratio of transmission cost to caching cost is low, a single copy of each data item can efficiently serve all user requests. In the multicopy case, we also consider the tradeoff between transmission cost and caching cost by imposing upper bounds on the number of transmissions and copies. These upper bounds can be given either per item or across all items. We present efficient optimal solutions based on dynamic programming for all these cases, provided that the upper bound is polynomially bounded in the number of service requests and the number of distinct data items. In addition to the homogeneous cost model, we briefly discuss the problem under a heterogeneous cost model with some simple yet practical restrictions, and present a 2-approximation algorithm for the general case. We validate our findings by implementing a data staging solver and conducting extensive simulation studies of the algorithms' behavior.
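To make the single-copy observation concrete, the Python sketch below prices the single-copy strategy for one data item. It is an illustration under simplifying assumptions of our own, not the authors' algorithm or their exact model. Requests are hypothetical `Request(server, time)` records. Caching costs `mu` per copy per unit time, and each transmission costs `lam` and is instantaneous. The copy starts at the first requesting site, and at least one copy must exist between the first and last request. Under these assumptions, any schedule pays at least `mu * (t_last - t_first)` in caching. The single migrating copy pays exactly that plus `lam` per server change, so it exceeds the optimum by at most `lam` times the number of server changes. That gap is small when the transmission-to-caching cost ratio is low.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    """Hypothetical request record: which site asks for the item, and when."""
    server: str
    time: float


def single_copy_cost(requests: List[Request], mu: float, lam: float) -> float:
    """Cost of serving every request with one migrating copy.

    Simplified homogeneous model (an assumption, not the paper's exact model):
    caching costs `mu` per copy per unit time, each transfer costs `lam`,
    transfers are instantaneous, and the copy starts at the first requester.
    """
    if not requests:
        return 0.0
    reqs = sorted(requests, key=lambda r: r.time)
    # The single copy is cached continuously from the first to the last request.
    caching = mu * (reqs[-1].time - reqs[0].time)
    # It must move whenever two consecutive requests come from different sites.
    moves = sum(1 for a, b in zip(reqs, reqs[1:]) if a.server != b.server)
    return caching + lam * moves


def caching_lower_bound(requests: List[Request], mu: float) -> float:
    """Lower bound for any schedule that keeps at least one copy alive throughout."""
    if not requests:
        return 0.0
    times = [r.time for r in requests]
    return mu * (max(times) - min(times))


if __name__ == "__main__":
    trace = [Request("A", 0.0), Request("B", 2.0), Request("B", 5.0), Request("A", 6.0)]
    print(single_copy_cost(trace, mu=1.0, lam=0.5))  # 7.0
    print(caching_lower_bound(trace, mu=1.0))        # 6.0
```

For the sample trace, the single copy moves twice (A to B, then B to A) and costs 7.0, against a caching lower bound of 6.0. In this simplified model, it is therefore within a factor of 7/6 of any feasible schedule.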
dynamic programming, approximation theory, cache storage, cloud computing, data staging solver, data staging algorithms, data caching, cloud system, transmission cost, caching cost, dynamic programming techniques, homogeneous cost model, 2-approximation algorithm, Prediction algorithms, Distributed databases, Data models, Upper bound, Computational modeling, Bandwidth, Cloud computing, data placement and migration, data staging and caching, resource constraints
B. Veeravalli, Yang Wang, Chen-Khong Tham, "On Data Staging Algorithms for Shared Data Accesses in Clouds", IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 4, pp. 825-838, April 2013, doi:10.1109/TPDS.2012.178