The Community for Technology Leaders
Green Image
Issue No. 06 - June (2012 vol. 24)
ISSN: 1041-4347
pp: 1014-1024
A. Moffat , Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC, Australia
S. Pohl , Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC, Australia
J. Zobel , Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC, Australia
ABSTRACT
Extended Boolean retrieval (EBR) models were proposed nearly three decades ago, but have had little practical impact, despite their significant advantages compared to either ranked keyword or pure Boolean retrieval. In particular, EBR models produce meaningful rankings; their query model allows the representation of complex concepts in an and-or format; and they are scrutable, in that the score assigned to a document depends solely on the content of that document, unaffected by any collection statistics or other external factors. These characteristics make EBR models attractive in domains typified by medical and legal searching, where the emphasis is on iterative development of reproducible complex queries of dozens or even hundreds of terms. However, EBR is much more computationally expensive than the alternatives. We consider the implementation of the p-norm approach to EBR, and demonstrate that ideas used in the max-score and wand exact optimization techniques for ranked keyword retrieval can be adapted to allow selective bypass of documents via a low-cost screening process for this and similar retrieval models. We also propose term-independent bounds that are able to further reduce the number of score calculations for short, simple queries under the extended Boolean retrieval model. Together, these methods yield an overall saving from 50 to 80 percent of the evaluation cost on test queries drawn from biomedical search.
INDEX TERMS
statistical analysis, Boolean functions, document handling, information retrieval, keyword retrieval, efficient extended Boolean retrieval, EBR, complex concepts, document content, statistics collection, legal searching, medical searching, optimization techniques, Optimization, Query processing, Computational modeling, Biological system modeling, Law, Systematics, query processing., Document-at-a-time, efficiency, extended Boolean retrieval, p-norm
CITATION
A. Moffat, S. Pohl, J. Zobel, "Efficient Extended Boolean Retrieval", IEEE Transactions on Knowledge & Data Engineering, vol. 24, no. , pp. 1014-1024, June 2012, doi:10.1109/TKDE.2011.63
107 ms
(Ver 3.1 (10032016))