7th International Conference on Database Systems for Advanced Applications (DASFAA '01)
SOM-Based Methodology for Building Large Text Archives
Hong Kong, China, April 18-21, 2001
ISBN: 0-7695-0996-7
Arnulfo P. Azcarraga, National University of Singapore
Teddy N. Yap, Jr., National University of Singapore
Abstract: Self-Organizing Maps (SOMs) such as WEBSOM have not only been shown to scale up to very large datasets; they also allow a novel mode of navigating through a large collection of text documents. The entire text collection is presented to the user as a regular map in which each point is associated with a group of documents that are likely to be composed of similar terms and phrases. Moreover, the closer two points are on the map, the more similar their associated documents are. Thus, once an interesting document is found on the map, the user need only click around its vicinity to retrieve other similar documents. A major drawback of SOMs, however, is the long training time they require, especially for document collections in which both the volume and the dimensionality are huge. In this paper, we demonstrate how the size of the initial text collection is progressively and drastically reduced from the raw document collection to the final SOM-based text archive. We illustrate this using a widely studied Reuters collection.
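The map-based navigation described in the abstract can be illustrated with a generic Kohonen SOM. The sketch below is a minimal, stdlib-only illustration under stated assumptions, not the authors' methodology: documents are assumed to be already reduced to short term-weight vectors, the map is a small rectangular grid with a Gaussian neighborhood and a linearly decaying learning rate and radius, and the function names (`train_som`, `best_unit`, `build_archive`, `neighbors`) are hypothetical. Each document is stored at its best-matching node, and "clicking around the vicinity" of a node is modelled as collecting the documents stored at every node within a given grid radius.

```python
import math
import random

# Hypothetical sketch of a generic Kohonen SOM; not the paper's exact method.


def best_unit(weights, x):
    """Return the grid node whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(docs, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM on document vectors (lists of floats)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    radius0 = radius0 if radius0 is not None else max(rows, cols) / 2
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total, step = epochs * len(docs), 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            frac = step / total
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks, floored at 0.5
            br, bc = best_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood on the grid: nodes near the winner move more
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def build_archive(docs, weights):
    """Store each document index at its best-matching node."""
    archive = {node: [] for node in weights}
    for i, x in enumerate(docs):
        archive[best_unit(weights, x)].append(i)
    return archive


def neighbors(archive, node, radius=1):
    """Documents stored at all nodes within `radius` grid steps (Chebyshev distance)."""
    r0, c0 = node
    hits = []
    for (r, c), ids in archive.items():
        if max(abs(r - r0), abs(c - c0)) <= radius:
            hits.extend(ids)
    return sorted(hits)


if __name__ == "__main__":
    # Toy term-weight vectors over four hypothetical terms; not Reuters data.
    docs = [[1.0, 1.0, 0.0, 0.0], [0.9, 1.0, 0.1, 0.0],
            [0.0, 0.0, 1.0, 1.0], [0.0, 0.1, 1.0, 0.9]]
    weights = train_som(docs, rows=3, cols=3, epochs=30)
    archive = build_archive(docs, weights)
    start = best_unit(weights, docs[0])  # the "interesting document" found on the map
    print("documents near", start, "->", neighbors(archive, start, radius=1))
```

In this sketch every training step compares the input against every node across every vector dimension, so the cost grows with map size, collection volume, and dimensionality, which is why reducing the collection before training matters.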
Citation:
Arnulfo P. Azcarraga, Teddy N. Yap, Jr., "SOM-Based Methodology for Building Large Text Archives," in 7th International Conference on Database Systems for Advanced Applications (DASFAA '01), 2001, p. 66.