Inverted File Partitioning Schemes in Multiple Disk Systems
February 1995 (vol. 6 no. 2)
pp. 142-153

Abstract—Multiple-disk I/O systems (disk arrays) are an attractive way to meet the high-performance I/O demands of data-intensive applications such as information retrieval systems. When files are partitioned and distributed across multiple disks to exploit I/O parallelism, a balanced distribution of the I/O workload becomes important for good performance. The performance of a parallel information retrieval system that uses an inverted file structure therefore depends on how the inverted file is partitioned. In this paper, we propose two different partitioning schemes for an inverted file system on a shared-everything multiprocessor machine with multiple disks. We study the performance of these schemes by simulation under a number of workloads in which we vary the term frequencies in the documents, the term frequencies in the queries, the number of disks, and the multiprogramming level.
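
To make the load-balancing concern concrete, the following minimal Python sketch compares two generic ways of laying out an inverted file on several disks. The abstract does not describe the two schemes the paper proposes, so the layouts below are illustrative assumptions only and should not be read as the authors' method; the function names (`partition_by_term`, `partition_by_posting`, `query_load`) and the toy posting lists are hypothetical.

```python
from collections import defaultdict


def partition_by_term(postings, num_disks):
    """Assumed layout 1: keep each term's whole posting list on one disk,
    assigning terms to disks round-robin in sorted order."""
    disks = [dict() for _ in range(num_disks)]
    for i, term in enumerate(sorted(postings)):
        disks[i % num_disks][term] = list(postings[term])
    return disks


def partition_by_posting(postings, num_disks):
    """Assumed layout 2: stripe every posting list across all disks,
    one posting at a time."""
    disks = [defaultdict(list) for _ in range(num_disks)]
    for term, doc_ids in postings.items():
        for j, doc_id in enumerate(doc_ids):
            disks[j % num_disks][term].append(doc_id)
    return [dict(d) for d in disks]


def query_load(disks, query_terms):
    """Count the postings each disk must read to answer the query."""
    return [sum(len(d.get(t, [])) for t in query_terms) for d in disks]


# Toy inverted file with a skewed term-frequency distribution (made-up data).
postings = {
    "disk": [1, 2, 3, 4, 5, 6, 7, 8],
    "array": [2, 4],
    "query": [1, 5, 7],
    "zipf": [3],
}

by_term = partition_by_term(postings, 2)
by_posting = partition_by_posting(postings, 2)

print(query_load(by_term, ["disk", "query"]))     # [3, 8]
print(query_load(by_posting, ["disk", "query"]))  # [6, 5]
```

In this toy case both layouts read 11 postings in total, but the term-based layout sends 8 of them to one disk, while the striped layout spreads them almost evenly. How such an imbalance affects response time at different multiprogramming levels is the kind of question the simulation study addresses.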

Citation:
Byeong-Soo Jeong, Edward Omiecinski, "Inverted File Partitioning Schemes in Multiple Disk Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 2, pp. 142-153, Feb. 1995, doi:10.1109/71.342125