Scalability Analysis of Declustering Methods for Multidimensional Range Queries
March/April 1998 (vol. 10 no. 2)
pp. 310-327

Abstract: Efficient storage and retrieval of multiattribute data sets have become essential requirements for many data-intensive applications. The Cartesian product file is known to be an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks and so obtain high disk-access performance. Although the scalability of these declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of query. These formulas disclose the limited scalability of the declustering methods, a result corroborated by extensive simulation experiments. From a practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.
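As a rough illustration of the two methods named in the abstract, the sketch below uses their commonly cited definitions: Disk Modulo assigns a grid cell to the disk given by the sum of its coordinates modulo the number of disks, and Fieldwise Xor uses the bitwise XOR of the coordinates instead. The function names, the brute-force bucket counting, and the choice of "largest number of buckets on any one disk" as the response-time measure are assumptions made for this example. It does not reproduce the closed-form formulas derived in the paper.

```python
from collections import Counter
from functools import reduce
from itertools import product
from operator import xor


def disk_modulo(cell, num_disks):
    """Disk Modulo: sum of the grid coordinates, modulo the number of disks."""
    return sum(cell) % num_disks


def fieldwise_xor(cell, num_disks):
    """Fieldwise Xor: bitwise XOR of the grid coordinates, modulo the number of disks."""
    return reduce(xor, cell, 0) % num_disks


def response_time(method, lower, upper, num_disks):
    """Largest number of buckets a single disk must read for the range query
    covering cells lower[k] <= x_k <= upper[k] in every dimension k.
    (Hypothetical helper: counts buckets by enumeration, not by formula.)"""
    ranges = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    load = Counter(method(cell, num_disks) for cell in product(*ranges))
    return max(load.values())


def optimal_time(lower, upper, num_disks):
    """Lower bound on response time: ceil(number of buckets / number of disks)."""
    buckets = 1
    for lo, hi in zip(lower, upper):
        buckets *= hi - lo + 1
    return -(-buckets // num_disks)  # ceiling division on integers
```

For a 4 x 4 range query anchored at the origin, `response_time` returns 4 for both methods whether 4 or 16 disks are used, while `optimal_time` drops from 4 to 1. This small case shows the kind of limited scalability the abstract refers to: adding disks does not shorten the query once the disk count exceeds what the query shape can exploit.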


Index Terms:
Multiattribute access methods, range query, file declustering, scalability, Disk Modulo, Fieldwise Xor, Hilbert curve-allocation method.
Citation:
Bongki Moon, Joel H. Saltz, "Scalability Analysis of Declustering Methods for Multidimensional Range Queries," IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 2, pp. 310-327, March-April 1998, doi:10.1109/69.683759