Proceedings. 13th International Workshop on Database and Expert Systems Applications (2002)
Sept. 2, 2002 to Sept. 6, 2002
James Henderson, University of Geneva
Paola Merlo, University of Geneva
Ivan Petroff, University of Geneva
Gerold Schneider, University of Geneva
Self-Organizing Maps (SOMs) are an effective method for clustering and visualizing large collections of text documents, but they are computationally expensive. In this paper, we investigate using natural language parsing of the texts to remove unimportant terms from the usual bag-of-words representation in order to improve efficiency. We find that reducing the document representation to just the heads of noun and verb phrases does indeed reduce the heavy computational cost without degrading the quality of the map, whereas more severe reductions that focus on subject and object noun phrases degrade map quality.
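As a rough illustration of why trimming the representation helps, here is a minimal sketch, not the authors' implementation. It assumes each document has already been parsed and that phrase heads are marked with the hypothetical tags `NP_HEAD` and `VP_HEAD` (no parser is included). The helper names `bag_of_words`, `train_som`, and `best_matching_unit` are illustrative. The SOM is a textbook online SOM with a Gaussian neighborhood and linearly decaying learning rate, not the configuration used in the paper. Each best-matching-unit search compares the input against every map unit across every vocabulary dimension, so a smaller vocabulary directly reduces training cost.

```python
from collections import Counter
import math
import random

# Hypothetical input format: each document is a list of (word, tag) pairs.
# tag is "NP_HEAD" or "VP_HEAD" for phrase heads assumed to come from a
# parser, and None for every other token.
HEAD_TAGS = ("NP_HEAD", "VP_HEAD")


def bag_of_words(docs, heads_only=False):
    """Return (vocabulary, term-count vectors); optionally keep only heads."""
    bags = []
    for doc in docs:
        words = [w.lower() for w, tag in doc
                 if not heads_only or tag in HEAD_TAGS]
        bags.append(Counter(words))
    vocab = sorted(set().union(*bags))
    vectors = [[bag[t] for t in vocab] for bag in bags]
    return vocab, vectors


def best_matching_unit(units, v):
    # Touches every unit and every dimension: O(units * vocabulary size).
    return min(units, key=lambda pos: sum((a - b) ** 2
                                          for a, b in zip(units[pos], v)))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Online SOM training with a Gaussian neighborhood (illustrative only)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                      # decaying learning rate
        radius = max(radius0 * (1 - frac), 0.01)   # shrinking neighborhood
        for v in vectors:
            bmu = best_matching_unit(units, v)
            for pos, w in units.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])  # pull unit toward input
    return units


docs = [
    [("The", None), ("parser", "NP_HEAD"), ("finds", "VP_HEAD"),
     ("the", None), ("heads", "NP_HEAD")],
    [("A", None), ("map", "NP_HEAD"), ("clusters", "VP_HEAD"),
     ("large", None), ("collections", "NP_HEAD")],
]
full_vocab, _ = bag_of_words(docs)
head_vocab, head_vecs = bag_of_words(docs, heads_only=True)
print(len(full_vocab), len(head_vocab))  # prints: 9 6
units = train_som(head_vecs)
```

In this toy example, keeping only heads shrinks the vector dimension from 9 to 6, and every distance computation during training shrinks accordingly. Measuring whether map quality survives the reduction, which is the paper's actual question, is not shown here.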
J. Henderson, G. Schneider, P. Merlo and I. Petroff, "Using NLP to Efficiently Visualize Text Collections with SOMs," Proceedings. 13th International Workshop on Database and Expert Systems Applications (DEXA), Aix-en-Provence, France, 2002, pp. 210.