Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2
Servicing range queries on multidimensional datasets with partial replicas
Cardiff, Wales, UK
May 9-12, 2005
ISBN: 0-7803-9074-1
L. Weng, Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
U. Catalyurek, COPPE, Univ. Fed. do Rio de Janeiro, Brazil
T. Kurc, COPPE, Univ. Fed. do Rio de Janeiro, Brazil
Gagan Agrawal, COPPE, Univ. Fed. do Rio de Janeiro, Brazil
J. Saltz, Dept. of Comput. Sci., Nat. Univ. of Ireland, Cork, Ireland
Partial replication is one type of optimization for speeding up the execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, reorganized, and redistributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, a query execution plan should be generated that uses the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas that reduces query execution time. We demonstrate the efficiency of the proposed method using datasets and queries from oil reservoir simulation studies on a cluster machine.
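The abstract does not spell out the heuristic, so the following is only an illustrative sketch of the general problem it describes: covering a multidimensional range query with a low-cost combination of partial replicas plus the original dataset. It assumes a hypothetical model in which the dataset is split into chunks addressed by integer grid coordinates, and each replica has a hypothetical per-chunk read cost and a one-time usage cost. It uses a generic greedy weighted set-cover rule (pick the replica with the lowest cost per newly covered chunk) and is not the authors' algorithm.

```python
from dataclasses import dataclass
from itertools import product

Chunk = tuple[int, ...]


@dataclass(frozen=True)
class Replica:
    """A full or partial copy of the dataset, described at chunk granularity (hypothetical model)."""
    name: str
    chunks: frozenset[Chunk]   # chunk coordinates stored in this replica
    per_chunk_cost: float      # assumed cost of reading one chunk from this replica
    fixed_cost: float = 0.0    # assumed one-time cost of using this replica at all


def chunks_in_range(lo: Chunk, hi: Chunk) -> frozenset[Chunk]:
    """All chunk coordinates c with lo[d] <= c[d] <= hi[d] in every dimension d."""
    return frozenset(product(*(range(l, h + 1) for l, h in zip(lo, hi))))


def choose_replicas(lo: Chunk, hi: Chunk, replicas: list[Replica], original: Replica):
    """Greedy weighted set cover: repeatedly pick the replica with the lowest
    cost per newly covered chunk until the whole query range is covered."""
    needed = set(chunks_in_range(lo, hi))
    if not needed <= original.chunks:
        raise ValueError("the original dataset must cover the query range")
    candidates = list(replicas) + [original]
    plan: dict[str, set[Chunk]] = {}
    total_cost = 0.0
    while needed:
        best, best_gain, best_cost = None, set(), 0.0
        for r in candidates:
            gain = needed & r.chunks  # chunks this replica would newly supply
            if not gain:
                continue
            cost = r.fixed_cost + r.per_chunk_cost * len(gain)
            if best is None or cost / len(gain) < best_cost / len(best_gain):
                best, best_gain, best_cost = r, gain, cost
        # best is never None here, because the original dataset covers every needed chunk
        plan[best.name] = best_gain
        total_cost += best_cost
        needed -= best_gain
    return plan, total_cost


# Usage on a 2-D chunk grid with two hypothetical partial replicas.
original = Replica("original", chunks_in_range((0, 0), (7, 7)), per_chunk_cost=4.0)
a = Replica("A", chunks_in_range((0, 0), (1, 3)), per_chunk_cost=1.0, fixed_cost=2.0)
b = Replica("B", chunks_in_range((0, 0), (3, 1)), per_chunk_cost=1.5)
plan, cost = choose_replicas((0, 0), (3, 3), [a, b], original)
```

In this toy instance, replica A is chosen first (cost 1.25 per chunk), B then supplies the remaining chunks it holds, and the original dataset fills the rest, for a total cost of 32.0. A real planner would also have to model I/O parallelism across storage nodes, which this sketch ignores.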
Citation:
L. Weng, U. Catalyurek, T. Kurc, Gagan Agrawal, J. Saltz, "Servicing range queries on multidimensional datasets with partial replicas," ccgrid, vol. 2, pp.726-733, Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2, 2005