Research Issues in Data Engineering, International Workshop on (2001)
Apr. 1, 2001 to Apr. 2, 2001
Chung-Min Chen, Telcordia Technologies, Inc.
Ned Stoffel, Telcordia Technologies, Inc.
Mike Post, Telcordia Technologies, Inc.
Chumki Basu, Telcordia Technologies, Inc.
Devasis Bassu, Telcordia Technologies, Inc.
Clifford Behrens, Telcordia Technologies, Inc.
Abstract: Latent Semantic Indexing (LSI), a vector space-based approach to information retrieval, has been proven to be an effective tool in correlating and retrieving relevant documents. While much work has been published on LSI, most of it addresses the algorithmic or theoretical basis of the model. Little, if any, presents implementation issues in practice. In this paper, we describe a production-level implementation of LSI. The system integrates components including document collection and preprocessing, singular value decomposition (SVD), multilingual processing, and a tree-based access method for similarity querying. We discuss implementation issues encountered during the development of the system. In particular, we address scalability issues in the query engine and various components of the system, and present lessons learned.
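For readers unfamiliar with the technique the abstract names, the following is a minimal textbook sketch of LSI, not the Telcordia system. It assumes NumPy, a tiny invented term-document matrix, and a brute-force cosine scan in place of the tree-based access method mentioned above. All names (`lsi_fit`, `fold_in`, `cosine_sims`) are hypothetical.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The data is invented purely for illustration.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def lsi_fit(A, k):
    """Rank-k truncated SVD: A is approximated by U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of V_k are the documents' coordinates in the latent space.
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(q, U_k, s_k):
    """Project a term-space query vector into the latent space."""
    return (q @ U_k) / s_k


def cosine_sims(q_vec, doc_vecs):
    """Cosine similarity between the query and every document (full scan)."""
    num = doc_vecs @ q_vec
    den = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    return num / den


U_k, s_k, V_k = lsi_fit(A, k=2)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

sims = cosine_sims(fold_in(q, U_k, s_k), V_k)
ranking = np.argsort(-sims)  # document indices, most similar first
```

In this toy example, document 1 contains "automobile" and "engine" but not "car", yet it scores as highly as the documents that do contain "car", because all three vehicle documents collapse onto the same latent dimension. A production system would replace the full scan in `cosine_sims` with an index structure.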
C. Chen, N. Stoffel, M. Post, C. Basu, D. Bassu and C. Behrens, "Telcordia LSI Engine: Implementation and Scalability Issues," Research Issues in Data Engineering, International Workshop on (RIDE), Heidelberg, Germany, 2001, pp. 0051.