Web Intelligence, IEEE / WIC / ACM International Conference on (2006)
Hong Kong, China
Dec. 18, 2006 to Dec. 22, 2006
ISBN: 0-7695-2747-7
pp: 782-785
Ernest Dawei Wang , Peking University, China
Qiong Luo , HKUST, China
Dongqing Yang , Peking University, China
Shiwei Tang , Peking University, China
ABSTRACT
Integrating relational database technologies into Web Information Retrieval enables users to ask complex queries that go beyond traditional keyword searches over web pages. One approach to this integration is to place a software layer on top of an Information Retrieval (IR) system and a Relational Database Management System (RDBMS). A core operation in this top layer is to join the intermediate results from the two underlying systems (called the IR results and the DB results, respectively) in order to produce the final ranked results for each query. Unfortunately, most conventional join algorithms are inefficient for this operation.

In this paper, we propose a simple join algorithm, Binary Search Join (BSJ), for joining the IR results and the DB results. The algorithm takes advantage of the fact that the IR results are already ranked by relevance and the DB results are already sorted by the join attribute. It scans the IR results and, for each IR result tuple, performs a binary search over the DB results. We analytically and empirically study the performance of BSJ in comparison with several conventional join algorithms on a repository of Chinese news web pages. The experimental results show that BSJ performs best in most cases.
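The scan-and-probe idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `binary_search_join`, the tuple layouts, the sample data, and the assumption that the join attribute is a unique integer document id are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


def binary_search_join(ir_results, db_results):
    """Join ranked IR results with DB results sorted by the join key.

    Assumptions (hypothetical, not taken from the paper):
      ir_results: list of (doc_id, score), already in descending score order.
      db_results: list of (doc_id, attrs), sorted ascending by doc_id,
                  with each doc_id appearing at most once.
    """
    # Extract the sorted join keys once so bisect can search them.
    db_keys = [doc_id for doc_id, _ in db_results]

    joined = []
    # Scan the IR results in rank order so the output stays ranked.
    for doc_id, score in ir_results:
        # Binary search for the join key among the DB results.
        i = bisect_left(db_keys, doc_id)
        if i < len(db_keys) and db_keys[i] == doc_id:
            joined.append((doc_id, score, db_results[i][1]))
    return joined


# Made-up sample data for illustration only.
ir_results = [(42, 0.97), (7, 0.85), (19, 0.60), (3, 0.12)]
db_results = [(3, "sports"), (7, "finance"), (42, "finance")]

result = binary_search_join(ir_results, db_results)
```

Because the outer loop follows the IR ranking, the joined output needs no re-sorting by relevance. In this sketch each probe costs a logarithmic number of comparisons in the size of the DB results, so the total work is proportional to the number of IR results times that logarithm. Duplicate join keys on the DB side would require extending the probe to collect every matching tuple.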
CITATION

D. Yang, S. Tang, Q. Luo and E. D. Wang, "Binary Search Join between an IR System and an RDBMS," 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Hong Kong, 2006, pp. 782-785.
doi:10.1109/WI.2006.51