Issue No.03 - March (2013 vol.25)
pp: 528-540
Ryan Shaw, Google Inc., San Jose
Anindya Datta, National University of Singapore, Singapore
Debra VanderMeer, Florida International University, Miami
Kaushik Dutta, National University of Singapore, Singapore
ABSTRACT
In this paper, we describe the design and implementation of a reverse dictionary. Unlike a traditional forward dictionary, which maps words to their definitions, a reverse dictionary takes a user input phrase describing the desired concept and returns a set of candidate words that satisfy the input phrase. This work has significant application not only for the general public, particularly those who work closely with words, but also in the general field of conceptual search. We present a set of algorithms and the results of a set of experiments showing the retrieval accuracy of our methods and the runtime response-time performance of our implementation. Our experimental results show that our approach can provide significant improvements in performance scale without sacrificing the quality of the result. Our experiments comparing the quality of our approach to that of currently available reverse dictionaries show that our approach can provide significantly higher quality than either of the other currently available implementations.
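The contrast between forward and reverse lookup can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not detail. It assumes a toy three-entry forward dictionary, a hand-picked stopword list, and ranking by simple term overlap. All names here (`FORWARD`, `STOPWORDS`, `tokenize`, `build_reverse_index`, `reverse_lookup`) are hypothetical.

```python
from collections import defaultdict

# Toy forward dictionary (word -> definition); entries are illustrative only.
FORWARD = {
    "dictionary": "a reference book listing words and their meanings",
    "thesaurus": "a reference book listing words with similar meanings",
    "atlas": "a book of maps",
}

# Assumed stopword list, chosen only for this example.
STOPWORDS = {"a", "an", "and", "of", "the", "their", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {t for t in text.lower().split() if t not in STOPWORDS}


def build_reverse_index(forward):
    """Map each definition term to the words whose definition contains it."""
    index = defaultdict(set)
    for word, definition in forward.items():
        for term in tokenize(definition):
            index[term].add(word)
    return index


def reverse_lookup(phrase, index, top_k=3):
    """Rank candidate words by how many query terms their definition shares."""
    scores = defaultdict(int)
    for term in tokenize(phrase):
        for word in index.get(term, ()):
            scores[word] += 1
    # Highest overlap first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


INDEX = build_reverse_index(FORWARD)
print(reverse_lookup("book of maps", INDEX))
```

Exact term overlap is the simplest possible ranking. It returns nothing for a phrase that describes a concept in words absent from every stored definition, so a practical system needs some way to match related terms.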
INDEX TERMS
Dictionaries, Web and internet services, Semantics, Information processing, Information retrieval, Search methods, Web-based services, Thesauruses, Search process
CITATION
Ryan Shaw, Anindya Datta, Debra VanderMeer, Kaushik Dutta, "Building a Scalable Database-Driven Reverse Dictionary," IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 3, pp. 528-540, March 2013, doi:10.1109/TKDE.2011.225