MAGIC: A Multiattribute Declustering Mechanism for Multiprocessor Database Machines
May 1994 (vol. 5 no. 5)
pp. 509-524

During the past decade, parallel database systems have gained increased popularity due to their high performance, scalability, and availability characteristics. With the predicted future database sizes and complexity of queries, the scalability of these systems to hundreds and thousands of processors is essential for satisfying the projected demand. Several studies have repeatedly demonstrated that both the performance and scalability of a parallel database system are contingent on the physical layout of the data across the processors of the system. If the data are not declustered appropriately, the execution of an operation might waste system resources, reducing the overall processing capability of the system. With earlier, single-attribute partitioning mechanisms such as those found in the Tandem, Teradata, Gamma, and Bubba parallel database systems, range selections on any attribute other than the partitioning attribute must be sent to all processors containing tuples of the relation, while range selections on the partitioning attribute can be directed to only a subset of the processors. Although using all the processors for an operation is reasonable for resource-intensive operations, directing a query with minimal resource requirements to processors that contain no relevant tuples wastes CPU cycles, communication bandwidth, and I/O bandwidth. As a solution, this paper describes a new partitioning strategy, multiattribute grid declustering (MAGIC), which can use two or more attributes of a relation to decluster its tuples across multiple processors and disks. In addition, MAGIC declustering, unlike other multiattribute partitioning mechanisms that have been proposed, is able to support range selections as well as exact match selections on each of the partitioning attributes. This capability enables a greater variety of selection operations to be directed to a restricted subset of the processors in the system.
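The routing difference described above can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the paper's implementation. The split points, the 3 x 3 grid with exactly one processor per cell, and the names slices_for_range, single_attribute_processors and GridDeclustering are all hypothetical. With range partitioning on attribute A alone, a selection on B carries no routing information and must be broadcast. With a two-attribute grid, a range or exact match selection on either attribute touches only the rows or columns it overlaps.

```python
from bisect import bisect_right


def slices_for_range(splits, lo, hi):
    """Slices overlapping the closed range [lo, hi].

    `splits` are ascending boundaries; slice i holds values v with
    splits[i-1] <= v < splits[i] (slice 0 holds values below splits[0]).
    """
    return range(bisect_right(splits, lo), bisect_right(splits, hi) + 1)


def single_attribute_processors(a_splits, attr, lo, hi):
    """Range partitioning on A only: processor i stores slice i of A."""
    if attr == "A":
        return set(slices_for_range(a_splits, lo, hi))
    # A selection on any other attribute gives no routing information,
    # so it is broadcast to every processor holding the relation.
    return set(range(len(a_splits) + 1))


class GridDeclustering:
    """Hypothetical two-attribute grid: rows split attribute A, columns
    split attribute B, and grid[r][c] is the processor storing that cell."""

    def __init__(self, a_splits, b_splits, grid):
        self.a_splits = a_splits
        self.b_splits = b_splits
        self.grid = grid

    def processors_for_a(self, lo, hi):
        # A range (or exact match, lo == hi) on A selects whole rows.
        rows = slices_for_range(self.a_splits, lo, hi)
        return {self.grid[r][c] for r in rows for c in range(len(self.grid[r]))}

    def processors_for_b(self, lo, hi):
        # A range (or exact match) on B selects whole columns.
        cols = slices_for_range(self.b_splits, lo, hi)
        return {self.grid[r][c] for r in range(len(self.grid)) for c in cols}


# Nine processors; attribute A spans roughly 0..900 (hypothetical values).
a_nine = [100, 200, 300, 400, 500, 600, 700, 800]
print(sorted(single_attribute_processors(a_nine, "A", 120, 180)))  # [1]
print(len(single_attribute_processors(a_nine, "B", 0, 5)))         # 9

grid_layout = GridDeclustering(
    a_splits=[300, 600],
    b_splits=[10, 20],
    grid=[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
)
print(sorted(grid_layout.processors_for_a(120, 180)))  # [0, 1, 2]
print(sorted(grid_layout.processors_for_a(150, 150)))  # [0, 1, 2]
print(sorted(grid_layout.processors_for_b(0, 5)))      # [0, 3, 6]
```

In this toy layout, the grid sends the narrow range on A to three processors rather than one. In exchange, the selection on B reaches three processors instead of all nine. How MAGIC shapes the grid and assigns cells to processors to balance this trade-off is not reproduced in this sketch.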
Finally, MAGIC partitions each relation based on the resource requirements of the queries that constitute the workload for the relation and the processing capacity of the system in order to ensure that the proper number of processors are used to execute queries that reference the relation.
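A simple cost model can make "the proper number of processors" concrete. The model used here is an assumption and is not taken from the paper. Suppose a query needs W ms of work on a single processor, and each participating processor adds c ms of startup and coordination overhead. Its response time on M processors is then roughly T(M) = W/M + cM, which is minimized near M = sqrt(W/c). The function name ideal_processors and the workload figures below are hypothetical.

```python
import math


def ideal_processors(work_ms, overhead_ms_per_proc, total_procs):
    """Processor count minimizing the assumed response-time model
    T(M) = work_ms / M + overhead_ms_per_proc * M, limited to 1..total_procs."""
    def response_time(m):
        return work_ms / m + overhead_ms_per_proc * m

    m_real = math.sqrt(work_ms / overhead_ms_per_proc)  # continuous optimum
    # T(M) is convex for M > 0, so the best integer is the floor or ceiling
    # of the continuous optimum, clamped to the processors actually available.
    candidates = {max(1, min(total_procs, k))
                  for k in (math.floor(m_real), math.ceil(m_real))}
    return min(candidates, key=lambda m: (response_time(m), m))


# Hypothetical per-query-type costs: (single-processor work, per-processor overhead).
workload = {
    "exact match on A": (5, 10),
    "small range on A": (40, 10),
    "large range on B": (1000, 10),
}
for query_type, (work, overhead) in workload.items():
    print(query_type, "->", ideal_processors(work, overhead, total_procs=32))
# exact match on A -> 1
# small range on A -> 2
# large range on B -> 10
```

Under this model, small queries such as exact matches and narrow ranges are best served by one or two processors, while large ones justify wider parallelism. A declustering strategy that knows these counts for each query type could shape the partitioning so that each query touches roughly that many processors.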

Index Terms: multiprocessing systems; database management systems; special purpose computers; parallel programming; distributed databases; MAGIC; multiattribute declustering mechanism; multiprocessor database machines; parallel database systems; partitioning attribute; multiattribute grid declustering; shared-nothing architecture; data placement
Citation:
S. Ghandeharizadeh, D.J. DeWitt, "MAGIC: A Multiattribute Declustering Mechanism for Multiprocessor Database Machines," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 5, pp. 509-524, May 1994, doi:10.1109/71.282561