Semantic Segment Extraction and Matching for Internet FAQ Retrieval
July 2006 (vol. 18 no. 7)
pp. 930-940
This investigation presents a novel approach to semantic segment extraction and matching for retrieving information from Internet FAQs with natural language queries. Two semantic segments, the question category segment (QS) and the keyword segment (KS), are extracted from the input queries and the FAQ questions with a semiautomatically derived question-semantic grammar. A semantic matching method is presented to estimate the similarity between the semantic segments of the query and those of the questions in the FAQ collection. Additionally, the vector space model (VSM) is adopted to measure the similarity between the query and the answers of the QA pairs. Finally, a multistage ranking strategy is adopted to determine the best-performing combination of similarity metrics. The experimental results show that the proposed method achieves an average rank of 4.52 and a top-10 recall rate of 90.89 percent. Compared with the query-expansion method, the proposed method improves the average rank of correct answers by 4.82 places, the top-5 recall rate by 25.34 percent, and the top-10 recall rate by 5.21 percent.
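The abstract relies on two standard building blocks: VSM similarity between the query and the FAQ answers, and evaluation by the average rank of the correct answer and the top-k recall rate. A minimal Python sketch of those generic pieces follows. It assumes pre-segmented tokens (the paper targets Mandarin, so word segmentation is taken as given), a plain TF-IDF weighting (raw term frequency times log(N/df)), and exactly one correct QA pair per query. The paper's exact term weighting, its QS/KS semantic matching, and its multistage ranking are not reproduced, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for tokenized documents.

    Assumption: weight = raw term frequency * log(N / df).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def rank_answers(query, answers):
    """Return answer indices sorted by VSM similarity to the query (best first).

    Query terms unseen in the answer collection get zero weight.
    Ties keep their original order because Python's sort is stable.
    """
    vectors, idf = tfidf_vectors(answers)
    qvec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [cosine(qvec, v) for v in vectors]
    return sorted(range(len(answers)), key=lambda i: scores[i], reverse=True)


def average_rank(rankings, correct):
    """Mean 1-based rank of each query's correct answer."""
    ranks = [ranking.index(c) + 1 for ranking, c in zip(rankings, correct)]
    return sum(ranks) / len(ranks)


def top_k_recall(rankings, correct, k):
    """Fraction of queries whose correct answer appears in the top k."""
    hits = sum(1 for ranking, c in zip(rankings, correct) if c in ranking[:k])
    return hits / len(correct)


if __name__ == "__main__":
    answers = [
        ["reset", "password", "account", "settings"],
        ["install", "printer", "driver"],
        ["change", "email", "account"],
    ]
    print(rank_answers(["how", "reset", "password"], answers))
    print(average_rank([[0, 1, 2], [2, 0, 1]], [0, 1]))
```

In the paper, the VSM answer score is only one of several similarity metrics; the semantic-segment scores and their multistage combination would be layered on top of a ranking like this one.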

Index Terms:
Natural language processing, retrieval models, query formulation, deduction and theorem proving, knowledge processing.
Chung-Hsien Wu, Jui-Feng Yeh, Yu-Sheng Lai, "Semantic Segment Extraction and Matching for Internet FAQ Retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 7, pp. 930-940, July 2006, doi:10.1109/TKDE.2006.115