2008 19th International Conference on Database and Expert Systems Application
Semantically Rich Spaces for Document Clustering
September 01-September 05
ISBN: 978-0-7695-3299-8
Dimensionality reduction techniques address a significant problem of Vector Space Models: the size of the dictionaries involved. Certain geometrical transformations applied to the original feature space, such as Latent Semantic Analysis (LSA), aim at preserving and discovering semantic relations between documents within low-dimensional spaces. In this paper, a linear transformation method, Locality Preserving Projections (LPP), is evaluated on a document clustering task and its results are compared with those of LSA. LPP is applied here directly to the original space, through an efficient C-based implementation, and different parameterizations are investigated. Experimental results suggest that LPP is an effective technique able to account for available a priori knowledge within an unsupervised learning framework.
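As an illustration of the kind of linear embedding the abstract refers to, the following is a minimal sketch of the standard LPP formulation (a neighborhood graph followed by a generalized eigenproblem). It is not the authors' C-based implementation. The function name `lpp`, the binary k-nearest-neighbor graph, the parameters `k` and `reg`, and the toy document-term matrix are all assumptions made for this example.

```python
import numpy as np

def lpp(X, n_components=2, k=3, reg=1e-6):
    """Sketch of Locality Preserving Projections (hypothetical helper).

    X: (n_docs, n_terms) document-term matrix, one document per row.
    Returns an (n_terms, n_components) projection matrix.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between documents.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a document is not its own neighbor

    # Binary k-nearest-neighbor adjacency, symmetrized (assumed weighting).
    W = np.zeros((n, n))
    nn = np.argsort(d2, axis=1)[:, :k]
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    # Generalized eigenproblem: (X^T L X) a = lambda (X^T D X) a.
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])  # small ridge keeps B positive definite

    # Reduce to a standard symmetric problem via the Cholesky factor of B.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0  # remove round-off asymmetry
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order

    # Directions with the smallest eigenvalues best preserve locality.
    return C_inv.T @ vecs[:, :n_components]

# Toy document-term matrix: two groups of documents with disjoint vocabularies.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [0, 0, 2, 2],
], dtype=float)

P = lpp(X, n_components=2, k=2)
Y = X @ P  # documents embedded in the reduced space
```

The rows of `Y` could then be passed to any clustering algorithm. A priori knowledge, where available, would typically enter through the construction of `W`, although the abstract does not specify how the paper does this.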
Index Terms:
Linear embedding, Locality Preserving Projection, Latent Semantic Analysis, Document clustering
Citation:
Roberto Basili, Paolo Marocco, Daniele Milizia, "Semantically Rich Spaces for Document Clustering," dexa, pp.43-47, 2008 19th International Conference on Database and Expert Systems Application, 2008