An Optimized High-Throughput Strategy for Constructing Inverted Files
Nov. 2012 (vol. 23 no. 11)
pp. 2033-2044
Zheng Wei, University of Maryland, College Park
Joseph JaJa, University of Maryland, College Park
Current high-throughput algorithms for constructing inverted files all follow the MapReduce framework, which presents a high-level programming model that hides the complexities of parallel programming. In this paper, we take an alternative approach and develop a novel strategy that exploits the current and emerging architectures of multicore processors. Our algorithm is based on a high-throughput pipelined strategy that produces parallel parsed streams, which are immediately consumed at the same rate by parallel indexers. We have performed extensive tests of our algorithm on a cluster of 32 nodes and achieved a throughput close to the peak throughput of the I/O system: 280 MB/s on a single node, and between 5.15 GB/s (1 Gb/s Ethernet interconnect) and 6.12 GB/s (10 Gb/s InfiniBand interconnect) on a cluster of 32 nodes, for processing the ClueWeb09 data set. Such performance represents a substantial gain over the best-known MapReduce algorithms, even when comparing the single-node performance of our algorithm to MapReduce algorithms running on large clusters. Our results shed light on the extent of the performance cost that may be incurred by using the simpler, higher-level MapReduce programming model for large-scale applications.
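The pipelined strategy described in the abstract (parallel parsers feeding parallel indexers) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `parser_worker`, `indexer_worker`, and `build_index`, the worker counts, the queue size, and the hash-based routing of terms to indexers are all assumptions made for illustration. Python threads are used only to show the producer/consumer structure; because of the interpreter lock they do not deliver the multicore speedup the paper targets.

```python
import queue
import threading
import zlib
from collections import defaultdict

NUM_PARSERS = 2    # assumed worker counts, not taken from the paper
NUM_INDEXERS = 2
SENTINEL = None    # marks the end of a parsed stream


def parser_worker(docs, out_queues):
    """Tokenize documents and route (term, doc_id) pairs to indexers."""
    for doc_id, text in docs:
        buckets = [[] for _ in out_queues]
        for token in text.split():
            term = token.lower()
            # Hypothetical routing rule: each term is owned by one indexer.
            owner = zlib.crc32(term.encode("utf-8")) % len(out_queues)
            buckets[owner].append((term, doc_id))
        for q, bucket in zip(out_queues, buckets):
            if bucket:
                # put() blocks when the queue is full, so a parser cannot
                # run far ahead of the indexer that consumes its stream.
                q.put(bucket)


def indexer_worker(in_queue, index):
    """Consume parsed batches and append doc ids to postings lists."""
    while True:
        batch = in_queue.get()
        if batch is SENTINEL:
            break
        for term, doc_id in batch:
            postings = index[term]
            # Repeats of a term within one document arrive in one batch,
            # so checking the last entry is enough to skip duplicates.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)


def build_index(docs):
    """Run the parser/indexer pipeline over (doc_id, text) pairs."""
    queues = [queue.Queue(maxsize=8) for _ in range(NUM_INDEXERS)]
    partial = [defaultdict(list) for _ in range(NUM_INDEXERS)]
    indexers = [threading.Thread(target=indexer_worker, args=(q, idx))
                for q, idx in zip(queues, partial)]
    shards = [docs[i::NUM_PARSERS] for i in range(NUM_PARSERS)]
    parsers = [threading.Thread(target=parser_worker, args=(shard, queues))
               for shard in shards]
    for t in indexers + parsers:
        t.start()
    for t in parsers:
        t.join()
    for q in queues:
        q.put(SENTINEL)  # all parsers are done; let the indexers finish
    for t in indexers:
        t.join()
    # Term ownership is disjoint, so merging is a plain union of the parts.
    merged = {}
    for idx in partial:
        for term, postings in idx.items():
            merged[term] = sorted(postings)
    return merged
```

For example, `build_index([(0, "the cat sat"), (1, "the dog sat")])` returns a dictionary mapping each term to a sorted list of document ids. The bounded queues are the part that mirrors the abstract's "consumed at the same rate" idea: a full queue stalls the producing parser until its indexer catches up.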
Index Terms:
Indexing, Clustering algorithms, Program processors, Multicore processing, Dictionaries, Throughput, pipeline, Inverted files, MapReduce, multicore processors, cluster, I/O throughput, parallel algorithms, parallel parsing and indexing
Citation:
Zheng Wei, Joseph JaJa, "An Optimized High-Throughput Strategy for Constructing Inverted Files," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, pp. 2033-2044, Nov. 2012, doi:10.1109/TPDS.2012.43