Proceedings. 13th International Workshop on Database and Expert Systems Applications (2002)
Aix-en-Provence, France
Sept. 2, 2002 to Sept. 6, 2002
ISSN: 1529-4188
ISBN: 0-7695-1668-8
pp: 210
James Henderson, University of Geneva
Paola Merlo, University of Geneva
Ivan Petroff, University of Geneva
Gerold Schneider, University of Geneva
ABSTRACT
Self-Organizing Maps (SOMs) are a good method for clustering and visualizing large collections of text documents, but they are computationally expensive. In this paper, we investigate ways to improve efficiency by using natural language parsing of the texts to remove unimportant terms from the usual bag-of-words representation. We find that reducing the document representation to just the heads of noun and verb phrases does indeed reduce the heavy computational cost without degrading the quality of the map, while more severe reductions that focus on subject and object noun phrases degrade map quality.
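The reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (tokens paired with a hand-labeled `is_head` flag standing in for parser output) and the function names `bag_of_words` and `head_only_bag` are assumptions made for illustration. The point is only that keeping phrase heads shrinks the vocabulary, and therefore the dimension of the vectors a SOM must compare at every training step.

```python
from collections import Counter

# Hypothetical pre-parsed document: (token, is_head) pairs.
# In a real pipeline the flags would come from a parser that marks
# the heads of noun and verb phrases; here they are hand-labeled.
parsed_doc = [
    ("the", False), ("large", False), ("map", True),
    ("clusters", True),
    ("the", False), ("text", False), ("documents", True),
]

def bag_of_words(tokens):
    """Usual bag-of-words: count every token."""
    return Counter(tok.lower() for tok, _ in tokens)

def head_only_bag(tokens):
    """Reduced representation: count only tokens flagged as phrase heads."""
    return Counter(tok.lower() for tok, is_head in tokens if is_head)

full = bag_of_words(parsed_doc)
reduced = head_only_bag(parsed_doc)

# The number of distinct terms is the dimension of the document vector.
print("full vocabulary size:", len(full))
print("head-only vocabulary size:", len(reduced))
```

In this toy document the head-only representation keeps half of the distinct terms; the savings on a real collection would depend on the parser and the texts.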
CITATION

J. Henderson, G. Schneider, P. Merlo and I. Petroff, "Using NLP to Efficiently Visualize Text Collections with SOMs," Proceedings. 13th International Workshop on Database and Expert Systems Applications (DEXA), Aix-en-Provence, France, 2002, pp. 210.
doi:10.1109/DEXA.2002.1045900