2008 32nd Annual IEEE International Computer Software and Applications Conference
The Similarity Computing of Documents Based on VSM
July 28-August 01
ISBN: 978-0-7695-3262-2
The precision and efficiency of document similarity computation are the foundation of other document-processing tasks. In this paper, the document frequency (DF) and TF-IDF algorithms are improved. First, DF has linear time complexity, which suits the processing of large document collections, but it has the drawback that rare yet useful features may be deleted; we compensate for this by adding the count of words that occur at important positions. Second, we adjust the weight of each feature using the result of the feature selection phase. In this way, we improve the precision of document similarity without adding much time or space complexity.
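The two steps summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the scoring rule (DF plus a bonus for each occurrence in a title), the threshold `min_score`, the weight adjustment factor `1 + score / max_score`, and all function names are assumptions chosen only to show the general idea. Titles stand in for "important positions".

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would also strip punctuation.
    return text.lower().split()


def select_features(docs, titles, min_score=2, title_bonus=1):
    """DF-based selection in one pass over the corpus (linear time).

    Assumed rule: score = document frequency + title_bonus per title occurrence.
    Returns (selected term -> score, document frequency table).
    """
    df = Counter()
    for body in docs:
        df.update(set(tokenize(body)))
    score = Counter(df)
    for title in titles:
        for term in tokenize(title):
            score[term] += title_bonus
    selected = {t: s for t, s in score.items() if s >= min_score}
    return selected, df


def weighted_vector(body, features, df, n_docs):
    """TF-IDF vector over selected features, scaled by the selection score.

    The factor (1 + score / max_score) is an assumed form of weight adjustment.
    """
    max_score = max(features.values()) if features else 1
    tf = Counter(t for t in tokenize(body) if t in features)
    vec = {}
    for term, count in tf.items():
        idf = math.log(1 + n_docs / (1 + df[term]))
        vec[term] = count * idf * (1 + features[term] / max_score)
    return vec


def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (made up for illustration).
docs = [
    "vector space model for text similarity",
    "text similarity with the vector space model",
    "cooking pasta with tomato sauce",
]
titles = ["vector space model", "text similarity", "pasta recipe"]

features, df = select_features(docs, titles)
vectors = [weighted_vector(d, features, df, len(docs)) for d in docs]
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

In this toy corpus, "pasta" occurs in only one document and would be dropped by a plain DF threshold of 2, but its title occurrence lifts its score above the threshold, which mirrors the idea of rescuing rare but useful features.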
Index Terms:
documents similarity, feature selection, TF-IDF, VSM
Citation:
Qinglin Guo, "The Similarity Computing of Documents Based on VSM," compsac, pp. 585-586, 2008 32nd Annual IEEE International Computer Software and Applications Conference, 2008