IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00)-Volume 2
Self-Organizing Maps of Massive Document Collections
Como, Italy
July 24-27, 2000
ISBN: 0-7695-0619-4
The Self-Organizing Map (SOM) algorithm can organize huge document collections according to textual similarity when statistical representations of the textual content are used as the documents' feature vectors. In a practical experiment, we mapped 6,840,568 patent abstracts onto a SOM of 1,002,240 nodes. As feature vectors, we used 500-dimensional random projections of the weighted word histograms.
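The pipeline named in the abstract (weighted word histogram, random projection, SOM mapping) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract specifies neither the word weighting nor the form of the projection matrix, so the IDF-style weights, the dense Gaussian matrix, the plain online SOM update, and all function names below are assumptions chosen for illustration. The sizes are also scaled down drastically (8 projected dimensions and a 3x3 map instead of 500 dimensions and 1,002,240 nodes).

```python
import math
import random
from collections import Counter


def weighted_histograms(docs):
    """Return (vocab, rows): one weighted word-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    # ASSUMPTION: IDF-style weight; the abstract does not say which weighting was used.
    idf = {w: math.log(len(docs) / doc_freq[w]) + 1.0 for w in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[w] * idf[w] for w in vocab])
    return vocab, rows


def random_projection_matrix(n_in, n_out, rng):
    """ASSUMPTION: dense Gaussian matrix (n_out rows, n_in columns)."""
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)] for _ in range(n_out)]


def project(vec, matrix):
    """Multiply the projection matrix by a histogram vector."""
    return [sum(r * x for r, x in zip(row, vec)) for row in matrix]


def best_matching_unit(x, codebook):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])),
    )


def train_som(data, rows, cols, epochs, rng):
    """Basic online SOM on a rows x cols grid with a Gaussian neighbourhood."""
    dim = len(data[0])
    codebook = [list(rng.choice(data)) for _ in range(rows * cols)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total
            alpha = 0.5 * (1.0 - frac)                       # decaying learning rate
            sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking radius
            br, bc = divmod(best_matching_unit(x, codebook), cols)
            for i, m in enumerate(codebook):
                r, c = divmod(i, cols)
                h = alpha * math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    m[k] += h * (x[k] - m[k])
            step += 1
    return codebook


# Toy stand-ins for patent abstracts (hypothetical data).
docs = [
    "neural network training method",
    "neural network training method",
    "engine piston valve assembly",
    "piston valve engine cooling",
    "polymer coating chemical compound",
]
rng = random.Random(0)
vocab, hists = weighted_histograms(docs)
proj = random_projection_matrix(len(vocab), 8, rng)
features = [project(h, proj) for h in hists]
codebook = train_som(features, 3, 3, 20, rng)
nodes = [best_matching_unit(f, codebook) for f in features]
print(nodes)  # map node index assigned to each document
```

The point of the projection step is that the SOM is trained and searched in the reduced space, so the cost of each distance computation depends on the projected dimension rather than on the vocabulary size. A map at the scale reported in the abstract would need far more efficient training and search than this exhaustive loop over every node.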