Issue No. 05 - May (1994 vol. 5)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.282561
<p>During the past decade, parallel database systems have gained increased popularity due to their high performance, scalability, and availability characteristics. With the predicted future database sizes and complexity of queries, the scalability of these systems to hundreds and thousands of processors is essential for satisfying the projected demand. Several studies have repeatedly demonstrated that both the performance and scalability of a parallel database system are contingent on the physical layout of the data across the processors of the system. If the data are not declustered appropriately, the execution of an operation might waste system resources, reducing the overall processing capability of the system. With earlier, single-attribute partitioning mechanisms such as those found in the Tandem, Teradata, Gamma, and Bubba parallel database systems, range selections on any attribute other than the partitioning attribute must be sent to all processors containing tuples of the relation, while range selections on the partitioning attribute can be directed to only a subset of the processors. Although using all the processors for an operation is reasonable for resource-intensive operations, directing a query with minimal resource requirements to processors that contain no relevant tuples wastes CPU cycles, communication bandwidth, and I/O bandwidth. As a solution, this paper describes a new partitioning strategy, multiattribute grid declustering (MAGIC), which can use two or more attributes of a relation to decluster its tuples across multiple processors and disks. In addition, MAGIC declustering, unlike other multiattribute partitioning mechanisms that have been proposed, is able to support range selections as well as exact match selections on each of the partitioning attributes. This capability enables a greater variety of selection operations to be directed to a restricted subset of the processors in the system. Finally, MAGIC partitions each relation based on the resource requirements of the queries that constitute the workload for the relation and the processing capacity of the system in order to ensure that the proper number of processors are used to execute queries that reference the relation.</p>
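The routing idea in the abstract can be illustrated with a toy two-attribute grid directory. This is only a sketch under stated assumptions, not the paper's algorithm: the class name `GridDirectory`, the fixed split points, and the row-major assignment of grid cells to processors are all hypothetical. In particular, it does not model how MAGIC chooses the grid shape and the cell assignment from the workload's resource requirements and the system's processing capacity.

```python
from bisect import bisect_right


class GridDirectory:
    """Toy two-attribute grid directory (hypothetical, for illustration only)."""

    def __init__(self, a_bounds, b_bounds, num_processors):
        # a_bounds / b_bounds are sorted split points; k split points
        # divide an attribute's domain into k + 1 slices.
        self.a_bounds = a_bounds
        self.b_bounds = b_bounds
        self.rows = len(a_bounds) + 1
        self.cols = len(b_bounds) + 1
        # Assumed assignment: cells are numbered row-major and dealt out
        # to processors round-robin. The paper's assignment is not shown here.
        self.owner = {
            (r, c): (r * self.cols + c) % num_processors
            for r in range(self.rows)
            for c in range(self.cols)
        }

    def processor_for(self, a, b):
        """Return the processor that stores a tuple with attribute values (a, b)."""
        row = bisect_right(self.a_bounds, a)
        col = bisect_right(self.b_bounds, b)
        return self.owner[(row, col)]

    @staticmethod
    def _span(bounds, value_range, size):
        # No predicate on this attribute: every slice may hold matching tuples.
        if value_range is None:
            return range(size)
        low, high = value_range
        return range(bisect_right(bounds, low), bisect_right(bounds, high) + 1)

    def processors_for_range(self, a_range=None, b_range=None):
        """Return the set of processors that a range selection must visit."""
        rows = self._span(self.a_bounds, a_range, self.rows)
        cols = self._span(self.b_bounds, b_range, self.cols)
        return {self.owner[(r, c)] for r in rows for c in cols}


# A 3 x 3 grid over two attributes, spread across 9 processors.
grid = GridDirectory(a_bounds=[100, 200], b_bounds=[30, 60], num_processors=9)
print(sorted(grid.processors_for_range(a_range=(110, 190))))  # one row of cells
print(sorted(grid.processors_for_range(b_range=(0, 10))))     # one column of cells
print(sorted(grid.processors_for_range()))                    # no predicate: all cells
```

In this sketch a range selection on either attribute reaches only the processors that own the matching row or column of cells, whereas a single-attribute range partitioning on the first attribute would have to send every selection on the second attribute to all nine processors.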
Index Terms: multiprocessing systems; database management systems; special purpose computers; parallel programming; distributed databases; MAGIC; multiattribute declustering mechanism; multiprocessor database machines; parallel database systems; partitioning attribute; multiattribute grid declustering; shared-nothing architecture; data placement
D. DeWitt and S. Ghandeharizadeh, "MAGIC: A Multiattribute Declustering Mechanism for Multiprocessor Database Machines," in IEEE Transactions on Parallel & Distributed Systems, vol. 5, no. 5, pp. 509-524, 1994.