Binary Search Join between an IR System and an RDBMS
2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06)
Hong Kong, China, December 18-22
ISBN: 0-7695-2747-7
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WI.2006.51
Integrating relational database technologies into Web Information Retrieval enables users to ask complex queries that go beyond traditional keyword searches over web pages. One approach to this integration is to place a software layer on top of an Information Retrieval (IR) system and a Relational Database Management System (RDBMS). A core operation in this top layer is to join the intermediate results from the two underlying systems (called the IR results and the DB results, respectively) to produce the final ranked results for each query. Unfortunately, most conventional join algorithms are inefficient for this operation. In this paper, we propose a simple join algorithm called Binary Search Join (BSJ) for joining the IR results and the DB results. The algorithm takes advantage of the fact that the IR results are already ranked by relevance and the DB results are already sorted by the join attribute. It scans the IR results in rank order and, for each IR result tuple, performs a binary search over the DB results. We study the performance of BSJ analytically and empirically, comparing it with several conventional join algorithms on a repository of Chinese news web pages. The experimental results show that BSJ performs best in most cases.
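The scan-and-search procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tuple layouts, the names `binary_search_join`, `doc_id`, `score`, and `payload`, and the use of an integer document identifier as the join attribute are all assumptions made for illustration.

```python
from bisect import bisect_left


def binary_search_join(ir_results, db_results):
    """Join ranked IR results with DB results sorted by the join attribute.

    ir_results: list of (doc_id, score) tuples, already in rank order (assumed layout).
    db_results: list of (doc_id, payload) tuples, sorted ascending by doc_id (assumed layout).
    Returns (doc_id, score, payload) tuples in the IR rank order.
    """
    # Extract the sorted join keys once so each lookup is a plain binary search.
    db_keys = [doc_id for doc_id, _ in db_results]
    joined = []
    for doc_id, score in ir_results:  # scan the IR results in rank order
        i = bisect_left(db_keys, doc_id)  # binary search over the DB results
        # Emit every DB tuple that carries this key (handles duplicate keys).
        while i < len(db_keys) and db_keys[i] == doc_id:
            joined.append((doc_id, score, db_results[i][1]))
            i += 1
    return joined


if __name__ == "__main__":
    ir = [(7, 0.9), (2, 0.8), (5, 0.4)]
    db = [(2, "a"), (3, "b"), (7, "c")]
    print(binary_search_join(ir, db))
```

Because the outer loop follows the IR ranking, the output stays in relevance order with no extra sort. With n IR tuples and m DB tuples, the sketch performs on the order of n log m key comparisons, plus the one-time cost of extracting the keys.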
Citation:
Ernest Dawei Wang, Qiong Luo, Dongqing Yang, Shiwei Tang, "Binary Search Join between an IR System and an RDBMS," wi, pp. 782-785, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), 2006