Efficient Bulk-Loading of Gridfiles
May-June 1997 (vol. 9 no. 3)
pp. 410-420

Abstract: This paper considers the problem of bulk-loading large data sets for the gridfile multiattribute indexing technique. We propose a rectilinear partitioning algorithm that heuristically seeks to minimize the size of the gridfile needed to ensure that no bucket overflows. Empirical studies on both synthetic data sets and data sets drawn from computational fluid dynamics applications demonstrate that our algorithm is very efficient and is able to handle large data sets. In addition, we present an algorithm for bulk-loading data sets too large to fit in main memory. By first sorting the entire data set, this algorithm creates a gridfile without incurring any overflows.
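The abstract does not spell out the partitioning algorithm, so what follows is only a minimal sketch of the general idea under stated assumptions. The idea is to choose x and y cut positions for a two-dimensional grid so that no cell (bucket) holds more than a given capacity, and to grow the grid until that holds. The sketch assumes square k-by-k grids and comparable point coordinates. It also assumes an alternating refinement in which each one-dimensional step is solved by a binary search on the largest allowed bucket load, combined with a greedy sweep. All names (`smallest_grid`, `rectilinear_partition`, `max_load`, `_best_cuts`, `_probe`) and the `rounds` parameter are hypothetical. This is not the authors' published algorithm; the paper's index terms also list dynamic programming, which this sketch does not use.

```python
from bisect import bisect_right
from collections import defaultdict

# Illustrative heuristic only (hypothetical names); not the paper's algorithm.
# A cut value c starts a new interval: values < c fall left, values >= c right.


def _probe(values, vecs, limit):
    """Greedy sweep: open a new interval only when some cell would exceed `limit`.
    Returns the cut values; assumes no single entry of `vecs` exceeds `limit`."""
    counts = [0] * len(vecs[0])
    cuts = []
    for v, vec in zip(values, vecs):
        if any(c + g > limit for c, g in zip(counts, vec)):
            cuts.append(v)  # start a new interval at this coordinate
            counts = list(vec)
        else:
            counts = [c + g for c, g in zip(counts, vec)]
    return cuts


def _best_cuts(points, axis, other_cuts, k):
    """With the other axis fixed, pick at most k-1 cuts on `axis` that minimize
    the largest cell count (binary search on the limit plus a greedy probe)."""
    other = 1 - axis
    width = len(other_cuts) + 1
    # coordinate value -> point count in each interval of the other axis
    groups = defaultdict(lambda: [0] * width)
    for p in points:
        groups[p[axis]][bisect_right(other_cuts, p[other])] += 1
    values = sorted(groups)
    vecs = [groups[v] for v in values]
    lo, hi = max(max(vec) for vec in vecs), len(points)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(_probe(values, vecs, mid)) <= k - 1:
            hi = mid
        else:
            lo = mid + 1
    return _probe(values, vecs, lo), lo


def max_load(points, x_cuts, y_cuts):
    """Largest number of points in any grid cell."""
    cells = defaultdict(int)
    for x, y in points:
        cells[(bisect_right(x_cuts, x), bisect_right(y_cuts, y))] += 1
    return max(cells.values())


def rectilinear_partition(points, k, rounds=5):
    """Alternately re-optimize the x and y cuts of a k-by-k grid."""
    x_cuts, y_cuts, load = [], [], len(points)
    for _ in range(rounds):
        x_cuts, _ = _best_cuts(points, 0, y_cuts, k)
        y_cuts, new_load = _best_cuts(points, 1, x_cuts, k)
        improved = new_load < load  # each step can only keep or lower the load
        load = new_load
        if not improved:
            break
    return x_cuts, y_cuts, load


def smallest_grid(points, capacity, rounds=5):
    """Grow k until the heuristic finds a k-by-k grid with no overfull bucket."""
    if not points:
        return 1, [], []
    for k in range(1, len(points) + 1):
        x_cuts, y_cuts, load = rectilinear_partition(points, k, rounds)
        if load <= capacity:
            return k, x_cuts, y_cuts
    raise ValueError("more than `capacity` identical points; no grid can hold them")


if __name__ == "__main__":
    lattice = [(i, j) for i in range(10) for j in range(10)]
    k, xs, ys = smallest_grid(lattice, capacity=25)
    print(k, xs, ys, max_load(lattice, xs, ys))
```

Run as a script, the lattice demo prints `2 [5] [5] 25`: a 2-by-2 grid whose four buckets each hold 25 points. Restricting the search to square grids keeps the sketch short. A loader could just as well search over rectangular p-by-q grids.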

Index Terms:
Bulk loading, databases, dynamic programming, gridfile, multidimensional indexing, rectilinear partitioning.
Citation:
Scott T. Leutenegger, David M. Nicol, "Efficient Bulk-Loading of Gridfiles," IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 3, pp. 410-420, May-June 1997, doi:10.1109/69.599930