Proceedings 20th IEEE International Parallel & Distributed Processing Symposium
Design and analysis of a multi-dimensional data sampling service for large scale data analysis applications
Rhodes Island, Greece
April 25-April 29
ISBN: 1-4244-0054-6
Xi Zhang, Dept. of Biomed. Informatics, Ohio State Univ., Columbus, OH, USA
T. Kurc, Dept. of Biomed. Informatics, Ohio State Univ., Columbus, OH, USA
J. Saltz, Dept. of Biomed. Informatics, Ohio State Univ., Columbus, OH, USA
S. Parthasarathy, Dept. of Biomed. Informatics, Ohio State Univ., Columbus, OH, USA
Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large datasets. In this paper, we present a scalable sampling implementation that supports efficient, multi-dimensional spatio-temporal sample generation on dynamic, large scale datasets stored on a storage cluster. The proposed algorithm leverages Hilbert space-filling curves to provide an approximate linear order of multidimensional data while maintaining spatial locality. This new implementation is then bootstrapped on top of our previous implementation, which efficiently samples large datasets along a single dimension (e.g., time), thereby realizing a service for spatio-temporal sampling. We evaluate the performance of our approach by comparing it to the popular R-tree based technique. The experimental results show that our approach achieves up to an order of magnitude higher efficiency and scalability.
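The idea of linearizing multidimensional data with a Hilbert curve can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hilbert_index` and `sample_along_curve`, the restriction to a two-dimensional grid whose side is a power of two, and the use of every-k-th selection as a stand-in for the one-dimensional sampler are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a 2-D Hilbert curve (standard bitwise conversion)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def sample_along_curve(points: List[Point], n: int, step: int) -> List[Point]:
    """Hypothetical stand-in for a one-dimensional sampler: order the
    points by Hilbert index, then keep every `step`-th point."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    return ordered[::step]


if __name__ == "__main__":
    grid = [(x, y) for x in range(8) for y in range(8)]
    print(sample_along_curve(grid, 8, 16))
```

Because cells that are adjacent along the curve are also adjacent in the grid, a sampler that works on a single sorted dimension can be reused on the Hilbert index while roughly preserving spatial locality, which is the property the abstract relies on.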
Index Terms:
Hilbert space-filling curves, multidimensional data sampling service, large scale data analysis, scalable sampling, multidimensional spatiotemporal sample generation
Citation:
Xi Zhang, T. Kurc, J. Saltz, S. Parthasarathy, "Design and analysis of a multi-dimensional data sampling service for large scale data analysis applications," ipdps, pp.58, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006