Performance Analysis of Three Text-Join Algorithms
May/June 1998 (vol. 10 no. 3)
pp. 477-492

Abstract: When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against its global schema may contain a new type of join: a join between attributes of textual type. This paper presents three algorithms for processing such joins and analyzes their I/O costs. Because these joins often involve very large document collections, efficient algorithms for processing them are essential. The three algorithms differ in whether they process the join using the documents themselves or the inverted files built on the documents. Our analysis and simulation results indicate that the relative performance of the algorithms depends on the input document collections, the system characteristics, and the input query. For each algorithm, we identify the type of input document collection on which it is likely to perform well. We also propose an integrated algorithm that automatically selects the best algorithm to use.
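
To make the distinction between joining on the documents themselves and joining on inverted files concrete, the following is a minimal in-memory sketch, not the paper's algorithms. It assumes that a text join returns, for each document in one collection, the k most similar documents in the other, and that similarity is the dot product of raw term frequencies. The function names, the whitespace tokenizer, and the top-k semantics are all illustrative assumptions, and the sketch does not model I/O cost.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real indexing.
    return text.lower().split()


def top_k(scores, k):
    # Keep documents with a positive score, best first; ties break on document id.
    ranked = sorted(((s, d) for d, s in scores.items() if s > 0),
                    key=lambda pair: (-pair[0], pair[1]))
    return [d for _, d in ranked[:k]]


def join_by_documents(c1, c2, k):
    """Compare every document of c2 against every document of c1 directly."""
    vectors = {d1: Counter(tokenize(t1)) for d1, t1 in c1.items()}
    result = {}
    for d2, t2 in c2.items():
        query = Counter(tokenize(t2))
        scores = {d1: sum(tf * vec[term] for term, tf in query.items() if term in vec)
                  for d1, vec in vectors.items()}
        result[d2] = top_k(scores, k)
    return result


def build_inverted_file(c1):
    """Map each term to the list of (document id, term frequency) pairs in c1."""
    inverted = defaultdict(list)
    for d1, t1 in c1.items():
        for term, tf in Counter(tokenize(t1)).items():
            inverted[term].append((d1, tf))
    return inverted


def join_by_inverted_file(c1, c2, k):
    """Look up each term of a c2 document in an inverted file built on c1."""
    inverted = build_inverted_file(c1)
    result = {}
    for d2, t2 in c2.items():
        scores = defaultdict(int)
        for term, tf2 in Counter(tokenize(t2)).items():
            for d1, tf1 in inverted.get(term, ()):
                scores[d1] += tf1 * tf2
        result[d2] = top_k(scores, k)
    return result


# Hypothetical toy collections, for illustration only.
c1 = {"a": "text join algorithm", "b": "inverted file index", "c": "join cost analysis"}
c2 = {"q": "join algorithm cost"}
```

On the toy collections, both functions return the same matches. The direct version touches every document pair, while the inverted-file version touches only the postings of terms that occur in the query document, which is one reason relative cost can depend on the collections involved.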

Index Terms:
Query processing, textual database, information retrieval, join algorithm, multidatabase.
Weiyi Meng, Clement Yu, Wei Wang, Naphtali Rishe, "Performance Analysis of Three Text-Join Algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 3, pp. 477-492, May-June 1998, doi:10.1109/69.687979