2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06)
Binary Search Join between an IR System and an RDBMS
Hong Kong, China
December 18-December 22
ISBN: 0-7695-2747-7
Ernest Dawei Wang, Peking University, China
Qiong Luo, HKUST, China
Dongqing Yang, Peking University, China
Shiwei Tang, Peking University, China
Integrating relational database technologies into Web Information Retrieval enables users to ask complex queries that go beyond traditional keyword searches over web pages. One approach to this integration is to place a software layer on top of an Information Retrieval (IR) system and a Relational Database Management System (RDBMS). A core operation in this top layer is to join the intermediate results from the two underlying systems (called the IR results and the DB results, respectively) in order to produce the final ranked results for each query. Unfortunately, most conventional join algorithms are inefficient for this operation.

In this paper, we propose a simple join algorithm called Binary Search Join (BSJ) for joining the IR results and the DB results. This algorithm takes advantage of the fact that the IR results are already ranked by relevance and that the DB results are already sorted by the join attribute. It scans the IR results in rank order and, for each IR result tuple, performs a binary search over the DB results. We analytically and empirically study the performance of BSJ in comparison with several conventional join algorithms on a repository of Chinese news web pages. The experimental results show that BSJ performs best in most cases.
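
The scan-and-search idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `binary_search_join`, the tuple layouts, and the assumption that the join attribute is a unique document id in the DB results are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


def binary_search_join(ir_results, db_results):
    """Join IR results with DB results on a shared document id.

    Assumed (hypothetical) layouts:
      ir_results: list of (doc_id, score), ranked by descending relevance
      db_results: list of (doc_id, attrs), sorted ascending by doc_id,
                  with each doc_id appearing at most once
    """
    # Extract the sorted join keys once so each lookup is a pure binary search.
    db_keys = [doc_id for doc_id, _ in db_results]

    joined = []
    # Scan the IR results in rank order; the output keeps that order.
    for doc_id, score in ir_results:
        i = bisect_left(db_keys, doc_id)
        # bisect_left returns an insertion point, so confirm an exact match.
        if i < len(db_keys) and db_keys[i] == doc_id:
            joined.append((doc_id, score, db_results[i][1]))
    return joined


if __name__ == "__main__":
    ir = [(7, 0.9), (3, 0.8), (5, 0.4)]        # ranked by relevance
    db = [(1, "sports"), (3, "finance"), (7, "politics")]  # sorted by doc_id
    print(binary_search_join(ir, db))
    # [(7, 0.9, 'politics'), (3, 0.8, 'finance')]
```

With n IR tuples and m DB tuples, this sketch performs n binary searches, so the probing cost is O(n log m), and the joined output stays in relevance order without a further sort.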

Citation:
Ernest Dawei Wang, Qiong Luo, Dongqing Yang, Shiwei Tang, "Binary Search Join between an IR System and an RDBMS," wi, pp.782-785, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), 2006