Optimal Bucket Allocation Design of k-ary MKH Files for Partial Match Retrieval
January-February 1997 (vol. 9 no. 1)
pp. 148-160

Abstract: This paper first shows that the bucket allocation problem of an MKH (Multiple Key Hashing) file for partial match retrieval can be reduced to that of a smaller subfile, called the remainder of the file. It also points out that the remainder-type MKH file is the hardest MKH file for which to design an optimal allocation scheme. We then concentrate on the allocation of an important remainder-type MKH file, namely the k-ary MKH file. We present various sufficient conditions on the number of available disks and the number of attributes for a k-ary MKH file to have a perfectly optimal allocation among the disks for partial match queries. Based on these perfectly optimal allocations, we further present a heuristic method, called the CH (Cyclic Hashing) method, to produce near-optimal allocations for general k-ary MKH files. Finally, an experimental comparison between the proposed method and an "ideal" perfectly optimal method shows that the CH method performs satisfactorily for general k-ary MKH files.
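To make the allocation problem concrete, the following is a minimal sketch, not the paper's CH method, which the abstract does not specify. It assumes that a k-ary MKH file over n attributes can be modelled as the set of buckets in {0, ..., k-1}^n, and that "perfectly optimal" corresponds to the usual strict-optimality criterion: for every partial match query, no disk holds more than ceil(q/m) of the q qualifying buckets, where m is the number of disks. The allocation tested here is the well-known Disk Modulo scheme (sum of coordinates mod m), used only as a stand-in; the function names are hypothetical.

```python
from itertools import product
from math import ceil


def disk_modulo(bucket, m):
    """Disk Modulo allocation (a stand-in, not the CH method):
    disk = (sum of the bucket's coordinates) mod m."""
    return sum(bucket) % m


def partial_match_queries(k, n):
    """Every partial match query; None marks an unspecified attribute."""
    return product([None, *range(k)], repeat=n)


def qualified_buckets(query, k):
    """All buckets that agree with the query on its specified attributes."""
    axes = [range(k) if value is None else (value,) for value in query]
    return product(*axes)


def is_perfectly_optimal(k, n, m, allocate=disk_modulo):
    """Assumed criterion: for every query, the busiest disk holds at most
    ceil(q / m) of the q qualifying buckets."""
    for query in partial_match_queries(k, n):
        load = [0] * m
        total = 0
        for bucket in qualified_buckets(query, k):
            load[allocate(bucket, m)] += 1
            total += 1
        if max(load) > ceil(total / m):
            return False
    return True


if __name__ == "__main__":
    # Binary file, 3 attributes: compare 2 disks against 4 disks.
    print(is_perfectly_optimal(2, 3, 2))
    print(is_perfectly_optimal(2, 3, 4))
```

The check enumerates all (k+1)^n queries, so it is only practical for small k and n; it is meant to illustrate the kind of condition on the number of disks and attributes that the abstract refers to, not to reproduce the paper's results.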

Index Terms:
Multidisk file design, bucket allocation problem, multiple key hashing files, partial match queries, optimal performances.
Citation:
C.Y. Chen, H.F. Lin, C.C. Chang, R.C.T. Lee, "Optimal Bucket Allocation Design of k-ary MKH Files for Partial Match Retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 9, no. 1, pp. 148-160, Jan.-Feb. 1997, doi:10.1109/69.567057