Weiyi Meng, Clement Yu, Wei Wang, Naphtali Rishe, "Performance Analysis of Three Text-Join Algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 3, pp. 477-492, May/June 1998.
DOI: 10.1109/69.687979
ISSN: 1041-4347
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Index Terms: Query processing, textual database, information retrieval, join algorithm, multidatabase.
Abstract: When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against its global schema may contain a new type of join: a join between attributes of textual type. This paper presents three algorithms for processing such joins and analyzes their I/O costs. Because these joins often involve very large document collections, efficient algorithms for processing them are essential. The three algorithms differ in whether they use the documents themselves or the inverted files on the documents to process the join. Our analysis and simulation results indicate that the relative performance of the algorithms depends on the input document collections, the system characteristics, and the input query. For each algorithm, we identify the type of input document collection on which it is likely to perform well. We also propose an integrated algorithm that automatically selects the best algorithm to use.
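
To make the distinction between joining on the documents themselves and joining on inverted files concrete, the following is a minimal Python sketch. It is not the paper's algorithms. It assumes, purely for illustration, that a text join pairs each document in one collection with the k most similar documents in the other, and that similarity is the dot product of raw term frequencies. The function names (`join_by_documents`, `join_by_inverted_file`), the whitespace tokenizer, and the sample collections are all hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real indexing.
    return text.lower().split()


def join_by_documents(c1, c2, k):
    """Scan every document of c1 for each document of c2 (no index used)."""
    vectors1 = {doc_id: Counter(tokenize(text)) for doc_id, text in c1.items()}
    result = {}
    for d2, text2 in c2.items():
        v2 = Counter(tokenize(text2))
        scored = []
        for d1, v1 in vectors1.items():
            # Dot product of term frequencies; Counter returns 0 for absent terms.
            score = sum(w * v1[term] for term, w in v2.items())
            if score > 0:
                scored.append((d1, score))
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[d2] = [d1 for d1, _ in scored[:k]]
    return result


def build_inverted_file(collection):
    """Map each term to {doc_id: term frequency}."""
    inverted = defaultdict(dict)
    for doc_id, text in collection.items():
        for term, tf in Counter(tokenize(text)).items():
            inverted[term][doc_id] = tf
    return inverted


def join_by_inverted_file(c1, c2, k):
    """Look up only the postings of terms that occur in each c2 document."""
    inverted1 = build_inverted_file(c1)
    result = {}
    for d2, text2 in c2.items():
        scores = defaultdict(int)
        for term, w in Counter(tokenize(text2)).items():
            for d1, tf in inverted1.get(term, {}).items():
                scores[d1] += w * tf
        ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
        result[d2] = [d1 for d1, _ in ranked[:k]]
    return result


# Hypothetical sample collections.
c1 = {
    "a1": "query processing in textual database",
    "a2": "inverted file join algorithm",
    "a3": "distributed database query",
}
c2 = {"b1": "textual database join", "b2": "inverted file"}

print(join_by_documents(c1, c2, 2))
print(join_by_inverted_file(c1, c2, 2))
```

Both functions return the same pairs. They differ only in what they read: the first touches every document of `c1` for every document of `c2`, while the second touches only the postings of terms that actually occur in the `c2` document. That difference in data accessed is the kind of trade-off an I/O cost analysis would weigh.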

