Domain-Specific Web Search with Keyword Spices
January 2004 (vol. 16 no. 1)
pp. 17-27

Abstract: Domain-specific Web search engines are effective tools for reducing the difficulty of acquiring information from the Web. Existing methods for building domain-specific Web search engines require human expertise or specific facilities. However, we can build a domain-specific search engine simply by adding domain-specific keywords, called “keyword spices,” to the user's input query and forwarding it to a general-purpose Web search engine. Keyword spices can be effectively discovered from Web documents using machine learning techniques. This paper describes domain-specific Web search engines that use keyword spices for locating recipes, restaurants, and used cars.
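
The query-modification step described in the abstract can be illustrated with a minimal Python sketch. It assumes a spice is a Boolean expression written as a disjunction of conjunctions and that the target search engine accepts AND, OR, and NOT operators. The names `Conjunction` and `add_spice`, the query syntax, and the example keywords are all hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conjunction:
    """One AND-clause of a spice: words that must and must not appear."""
    include: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Required words appear as-is; excluded words are prefixed with NOT.
        terms = list(self.include) + [f"NOT {w}" for w in self.exclude]
        return "(" + " AND ".join(terms) + ")"


def add_spice(user_query: str, spice: List[Conjunction]) -> str:
    """Append a keyword spice (an OR of AND-clauses) to the user's query."""
    if not spice:
        # No spice available: forward the query unchanged.
        return user_query
    spice_expr = " OR ".join(clause.render() for clause in spice)
    return f"({user_query}) AND ({spice_expr})"


# Illustrative spice for a recipe domain (keywords are made up).
recipe_spice = [
    Conjunction(include=["ingredients"], exclude=["hotel"]),
    Conjunction(include=["tablespoon", "serves"]),
]

print(add_spice("beef", recipe_spice))
# (beef) AND ((ingredients AND NOT hotel) OR (tablespoon AND serves))
```

The resulting string would then be sent to a general-purpose search engine in place of the original query. How the spice itself is learned from Web documents is outside the scope of this sketch.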

Index Terms:
Domain-specific Web search, query modification, decision tree, information retrieval, machine learning.
Satoshi Oyama, Takashi Kokubo, Toru Ishida, "Domain-Specific Web Search with Keyword Spices," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 1, pp. 17-27, Jan. 2004, doi:10.1109/TKDE.2004.1264819