Logic and Computational Complexity for Boolean Information Retrieval
December 2006 (vol. 18 no. 12)
pp. 1659-1666
We study the complexity of query satisfiability and entailment for the Boolean information retrieval models $\mathcal{WP}$ and $\mathcal{AWP}$, using techniques from propositional logic and computational complexity. $\mathcal{WP}$ and $\mathcal{AWP}$ can be used to represent and query textual information under the Boolean model through the concepts of attributes with values of type text, words, and word-proximity constraints. Variations of $\mathcal{WP}$ and $\mathcal{AWP}$ are in use in most deployed digital libraries based on the Boolean model, in text extenders for relational database systems (e.g., Oracle 10g), in search engines, and in P2P systems for information retrieval and filtering.
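The following minimal Python sketch shows what evaluating an AWP-style query against a single document might look like. It assumes a deliberately simplified model. A document maps attribute names to word sequences. A query combines word atoms and a binary proximity atom with Boolean connectives. The class names (`Word`, `Near`, `Attr`, `And`, `Or`, `Not`), the functions `satisfies_text` and `satisfies_doc`, and the distance convention are hypothetical illustrations, not the syntax or semantics defined in the paper. Under that convention, the second word must occur between `l` and `u` positions after the first.

The sketch only evaluates a query on one document. The satisfiability problem the paper studies asks whether any document satisfies a query. The entailment problem asks whether every document satisfying one query also satisfies another. Neither is decided here.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical, simplified query AST for illustration only.

@dataclass(frozen=True)
class Word:
    w: str  # satisfied if the word occurs anywhere in the text value

@dataclass(frozen=True)
class Near:
    # Assumed convention: w1 at position i, w2 at position j, l <= j - i <= u.
    w1: str
    w2: str
    l: int
    u: int

@dataclass(frozen=True)
class Attr:
    name: str            # attribute whose text value is tested
    text_query: "Query"  # word-level query evaluated on that value

@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"

@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"

@dataclass(frozen=True)
class Not:
    sub: "Query"

Query = Union[Word, Near, Attr, And, Or, Not]


def satisfies_text(q: Query, words: List[str]) -> bool:
    """Evaluate a word-level (WP-style) query on one text value."""
    if isinstance(q, Word):
        return q.w in words
    if isinstance(q, Near):
        first = [i for i, w in enumerate(words) if w == q.w1]
        second = [j for j, w in enumerate(words) if w == q.w2]
        return any(q.l <= j - i <= q.u for i in first for j in second)
    if isinstance(q, And):
        return satisfies_text(q.left, words) and satisfies_text(q.right, words)
    if isinstance(q, Or):
        return satisfies_text(q.left, words) or satisfies_text(q.right, words)
    if isinstance(q, Not):
        return not satisfies_text(q.sub, words)
    raise TypeError(f"not a text-level query: {q!r}")


def satisfies_doc(q: Query, doc: Dict[str, List[str]]) -> bool:
    """Evaluate an attribute-level (AWP-style) query on a whole document."""
    if isinstance(q, Attr):
        # An attribute missing from the document satisfies no Attr atom.
        return q.name in doc and satisfies_text(q.text_query, doc[q.name])
    if isinstance(q, And):
        return satisfies_doc(q.left, doc) and satisfies_doc(q.right, doc)
    if isinstance(q, Or):
        return satisfies_doc(q.left, doc) or satisfies_doc(q.right, doc)
    if isinstance(q, Not):
        return not satisfies_doc(q.sub, doc)
    raise TypeError(f"not a document-level query: {q!r}")
```

For example, consider the document `{"author": ["manolis", "koubarakis"], "body": "boolean information retrieval with proximity operators".split()}`. It satisfies `And(Attr("author", Word("koubarakis")), Attr("body", Near("information", "retrieval", 1, 2)))`. It would not satisfy the proximity atom with the two words swapped, because `Near` is order-sensitive under the assumed convention.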

Index Terms:
Boolean information retrieval, computational complexity, data models, query languages, satisfiability, entailment, proximity.
Citation:
Manolis Koubarakis, Spiros Skiadopoulos, Christos Tryfonopoulos, "Logic and Computational Complexity for Boolean Information Retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 12, pp. 1659-1666, Dec. 2006, doi:10.1109/TKDE.2006.193