XVIII International Conference of the Chilean Computer Science Society
Parallel Generation of Inverted Files for Distributed Text Collections
Antofagasta, Chile
November 12-14, 1998
ISBN: 0-8186-8616-2
We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm targets a high-bandwidth network of workstations with a shared-nothing memory organization. The text collection is assumed to be evenly distributed among the disks of the workstations. Compression is used to save space in main memory (where the inverted lists are kept) and to save time when data must be moved across the network. The average running cost of the algorithm is O(t/p), where t is the size of the whole text collection and p is the number of available processors. We implemented the algorithm and obtained experimental results. On a 100 Mbit/s switched Ethernet network of four 200 MHz Pentium Pro workstations, each with 128 megabytes of RAM, we were able to invert 2 gigabytes of TREC documents in 15 minutes. We also propose an analytical model for the execution time of the algorithm.
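The abstract does not spell out the algorithm's steps, so the following is only a minimal single-process sketch of the general idea, not the authors' method. It assumes each processor inverts its own documents, ships gap-compressed lists to a term "owner", and the owner merges what it receives. The function names (`invert_local`, `owner`, `parallel_invert`), the variable-byte code, and the ownership rule are all assumptions made for illustration.

```python
from collections import defaultdict


def vbyte_encode(numbers):
    """Variable-byte encode non-negative ints; the high bit flags the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk[0] |= 0x80  # the lowest 7 bits are emitted last, so flag them
        out.extend(reversed(chunk))
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:
            numbers.append(n)
            n = 0
    return numbers


def invert_local(docs):
    """docs maps doc_id -> text; returns term -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def owner(term, p):
    """Hypothetical ownership rule: a deterministic byte-sum hash modulo p."""
    return sum(term.encode("utf-8")) % p


def parallel_invert(partitions):
    """partitions: one {doc_id: text} dict per simulated processor."""
    p = len(partitions)
    # Phase 1: every processor inverts only its local documents.
    local = [invert_local(docs) for docs in partitions]
    # Phase 2: lists are gap-compressed and "sent" to the owning processor.
    inbox = [defaultdict(list) for _ in range(p)]
    for index in local:
        for term, ids in index.items():
            gaps = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
            inbox[owner(term, p)][term].append(vbyte_encode(gaps))
    # Phase 3: each owner decodes the received lists and merges them.
    result = []
    for box in inbox:
        merged = {}
        for term, blobs in box.items():
            ids = []
            for blob in blobs:
                running = 0
                for gap in vbyte_decode(blob):
                    running += gap
                    ids.append(running)
            merged[term] = sorted(ids)
        result.append(merged)
    return result
```

In this sketch each processor touches only its own share of the text in phase 1, which is the intuition behind a cost proportional to t/p. The real system would run the phases on separate machines and exchange the compressed lists over the network.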
Citation:
Berthier A. Ribeiro-Neto, Joao Paulo Kitajima, Gonzalo Navarro, Cláudio R.G. Sant'Ana, Nivio Ziviani, "Parallel Generation of Inverted Files for Distributed Text Collections," sccc, pp.149, XVIII International Conference of the Chilean Computer Science Society, 1998