Proceedings 24th Annual International Computer Software and Applications Conference. COMPSAC2000 (2000)
Oct. 25, 2000 to Oct. 28, 2000
Chung-Min Chen, Telcordia Technologies
Duen-Ren Liu, National Chiao-Tung University
Linear algebra-based techniques have long been used to correlate similar documents. They map the documents to a multidimensional vector space, in which each document is represented by a vector. Searching for related documents then translates into searching for nearest neighbors in the vector space. In this paper, we propose an indexing structure, called the cosine R-tree, which indexes a multidimensional vector space and provides efficient nearest neighbor search. Our preliminary results show that it gives better performance than a brute-force linear scan strategy.
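For readers unfamiliar with the setup, the brute-force linear scan that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not the paper's cosine R-tree: it assumes documents are already mapped to dense vectors and that similarity is measured by the cosine of the angle between vectors. The names `cosine_similarity`, `nearest_neighbors`, and the sample `docs` are hypothetical and chosen only for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, documents, k=1):
    """Linear scan: score every document against the query and keep the k most similar.

    `documents` maps a document id to its vector (an assumed layout for this sketch).
    Returns a list of (doc_id, similarity) pairs, most similar first.
    """
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Hypothetical toy collection: three documents in a 3-dimensional term space.
docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.0, 1.0, 1.0],
    "d3": [1.0, 1.0, 0.0],
}
```

Every query in this sketch touches all documents, so the cost grows linearly with the collection size; a tree index such as the one proposed in the paper aims to avoid visiting most of the vectors.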
D. Liu and C. Chen, "Tree Indexing for Efficient Search of Similar Documents," Proceedings 24th Annual International Computer Software and Applications Conference. COMPSAC2000(COMPSAC), Taipei, Taiwan, 2000, pp. 210.