Evaluating Keyword Selection Methods for WEBSOM Text Archives
March 2004 (vol. 16 no. 3)
pp. 380-383
Abstract: The WEBSOM methodology, proven effective for building very large text archives, includes a method that extracts labels for the document cluster assigned to each node in the map. However, that method must retrieve all the words of all the documents associated with each node. Since maps may have more than 100,000 nodes and the archive may contain up to seven million documents, the WEBSOM methodology needs a faster alternative for keyword selection. Presented here is such an alternative, which quickly deduces meaningful labels for each node in the map. It does so solely by analyzing the relative weight distribution of the SOM weight vectors and by exploiting some characteristics of the random projection method used for dimensionality reduction. The effectiveness of this technique is demonstrated on news document collections.
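The abstract does not give the scoring rule, so the following is only a minimal sketch of how labels might be read directly from a node's weight vector, not the authors' algorithm. It assumes a sparse random projection in which each vocabulary word is mapped to a small fixed set of reduced dimensions, and it scores a word by the smallest relative weight found on those dimensions. The names `project`, `label_node`, `word_dims`, and `top_k` are hypothetical.

```python
import numpy as np

def project(term_counts, word_dims, reduced_dim):
    """Randomly project a term-count vector (assumed sparse scheme).

    word_dims[w] lists the reduced dimensions assigned to word w; each
    occurrence of w adds 1 to every one of those dimensions.
    """
    out = np.zeros(reduced_dim)
    for w, count in enumerate(term_counts):
        if count:
            np.add.at(out, word_dims[w], count)
    return out

def label_node(weight_vector, word_dims, vocab, top_k=3):
    """Pick labels for one map node from its weight vector alone.

    Assumed heuristic: a word is well supported only if all of its
    assigned dimensions carry a high relative weight, so the score is
    the minimum relative weight over those dimensions.
    """
    weights = np.asarray(weight_vector, dtype=float)
    peak = weights.max()
    relative = weights / peak if peak > 0 else weights
    scores = relative[word_dims].min(axis=1)      # one score per word
    order = np.argsort(-scores, kind="stable")    # highest score first
    return [vocab[i] for i in order[:top_k]]

# Toy example: 4 words, 8 reduced dimensions, 2 dimensions per word.
vocab = ["market", "goal", "election", "rain"]
word_dims = np.array([[0, 1], [2, 3], [4, 5], [6, 7]])
node_weights = project([2, 0, 1, 0], word_dims, reduced_dim=8)
labels = label_node(node_weights, word_dims, vocab, top_k=2)
```

In this sketch no document is revisited at labeling time: the cost per node depends only on the vocabulary size and the number of dimensions per word, which is the kind of saving the abstract describes. In a real random projection the dimension sets of different words overlap, so a score of this kind is only an estimate.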
Index Terms:
Keyword extraction, text archives, WEBSOM, random projection.
Citation:
Arnulfo P. Azcarraga, Teddy N. Yap, Jonathan Tan, Tat Seng Chua, "Evaluating Keyword Selection Methods for WEBSOM Text Archives," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 3, pp. 380-383, Mar. 2004, doi:10.1109/TKDE.2003.1262193