First International Conference on Innovative Computing, Information and Control - Volume II (ICICIC'06)
A Novel Multilingual Text Categorization System using Latent Semantic Indexing
Beijing, China
August 30-September 1, 2006
ISBN: 0-7695-2616-0
Chung-Hong Lee, National Kaohsiung University of Applied Sciences, Taiwan
Hsin-Chang Yang, Chang Jung University, Taiwan
Sheng-Min Ma, National Kaohsiung University of Applied Sciences, Taiwan
Latent Semantic Indexing (LSI) is a well-known technique in information retrieval, particularly for dealing with polysemy and synonymy. LSI uses singular value decomposition (SVD) to decompose the original term-document matrix into a lower-dimensional triplet of matrices. The product of this triplet approximates the original matrix and captures latent semantic relations between terms. In this paper, we propose a novel method for multilingual text categorization using LSI. The centroid of each class is calculated in the reduced SVD space, and a similarity threshold is predefined for each centroid. A test document whose similarity to a centroid is larger than that centroid's threshold is labeled "Positive" (relevant); otherwise it is labeled "Negative" (non-relevant). Experimental results indicate that LSI categorizes multilingual text well in terms of precision, recall, and F1: with our algorithm, the average F1 measure is 70% and precision reaches 80%.
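The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of one common way to realize it, under assumptions the abstract does not state: cosine similarity as the similarity measure; training and test documents projected with the standard LSI fold-in d^T U_k S_k^{-1}; and documents in every language already mapped to term vectors over one shared vocabulary. The SVD is computed by power iteration on A^T A so that the example needs only the standard library. This is suitable only for toy matrices, and a real system would call a library SVD routine. All function names (`truncated_svd`, `fold_in`, `class_centroids`, `label`, `precision_recall_f1`), the toy corpus, and the 0.5 thresholds are hypothetical.

```python
import math
import random


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def _cosine(x, y):
    nx, ny = math.sqrt(_dot(x, x)), math.sqrt(_dot(y, y))
    return _dot(x, y) / (nx * ny) if nx and ny else 0.0


def truncated_svd(A, k, iters=500, seed=0):
    """Rank-k SVD of a term-document matrix A (rows = terms, cols = docs).

    Hypothetical helper: finds the top-k eigenpairs of the Gram matrix A^T A
    by power iteration with deflation, then sets sigma_i = sqrt(lambda_i) and
    u_i = A v_i / sigma_i. Toy-scale only; assumes k <= rank(A).
    """
    m, n = len(A), len(A[0])
    G = [[sum(A[t][i] * A[t][j] for t in range(m)) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    U, S, V = [], [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]  # positive random start
        for _ in range(iters):
            w = [_dot(row, v) for row in G]
            norm = math.sqrt(_dot(w, w))
            if norm == 0.0:
                raise ValueError("k exceeds the rank of A")
            v = [x / norm for x in w]
        lam = _dot(v, [_dot(row, v) for row in G])  # Rayleigh quotient
        if lam <= 1e-12:
            raise ValueError("k exceeds the rank of A")
        sigma = math.sqrt(lam)
        # Deflate G so the next pass converges to the next eigenvector.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        U.append([_dot(A[t], v) / sigma for t in range(m)])
        S.append(sigma)
        V.append(v)
    return U, S, V


def fold_in(doc, U, S):
    """Project a term vector into the k-dim LSI space: d^T U_k S_k^{-1}."""
    return [_dot(doc, u) / s for u, s in zip(U, S)]


def class_centroids(A, labels, U, S):
    """Mean of the folded-in training documents of each class."""
    sums, counts = {}, {}
    for j, c in enumerate(labels):
        vec = fold_in([row[j] for row in A], U, S)
        acc = sums.get(c, [0.0] * len(vec))
        sums[c] = [a + b for a, b in zip(acc, vec)]
        counts[c] = counts.get(c, 0) + 1
    return {c: [x / counts[c] for x in sums[c]] for c in sums}


def label(doc, centroid, threshold, U, S):
    """'Positive' if cosine(doc, centroid) is larger than the threshold."""
    sim = _cosine(fold_in(doc, U, S), centroid)
    return "Positive" if sim > threshold else "Negative"


def precision_recall_f1(predicted, gold):
    """Binary metrics over parallel lists of booleans (True = relevant)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy corpus: rows are terms, columns are documents d0..d3.
    # terms: ball, goal, team, stock, market, bank
    A = [[2, 1, 0, 0],
         [1, 2, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 2, 1],
         [0, 0, 1, 3],
         [0, 0, 1, 1]]
    U, S, V = truncated_svd(A, k=2)
    centroids = class_centroids(A, ["sports", "sports", "finance", "finance"], U, S)
    thresholds = {"sports": 0.5, "finance": 0.5}  # hypothetical values
    query = [0, 1, 1, 0, 0, 0]  # "goal team"
    for c in sorted(centroids):
        print(c, label(query, centroids[c], thresholds[c], U, S))
```

Run as a script, the toy query "goal team" is labeled Positive for sports and Negative for finance. Training documents are folded in with the same U_k and S_k as test documents, so each training vector equals a column of V_k^T and centroids and queries live in the same space. Thresholds are kept per class to match the abstract's "predefined for each centroid"; how the authors chose their threshold values is not stated.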
Citation:
Chung-Hong Lee, Hsin-Chang Yang, Sheng-Min Ma, "A Novel Multilingual Text Categorization System using Latent Semantic Indexing," icicic, vol. 2, pp.503-506, First International Conference on Innovative Computing, Information and Control - Volume II (ICICIC'06), 2006