Load Balancing of Parallelized Information Filters
March/April 2002 (vol. 14 no. 2)
pp. 456-461

We investigate the data-parallel implementation of a set of information filters used to rule out uninteresting data from a database or data stream. We develop an analytic model for the costs and advantages of load rebalancing for the parallel filtering processes, as well as a quick heuristic for its desirability. Our model uses binomial models of the filter processes and fits key parameters to the results of extensive simulations. Experiments confirm our model. Rebalancing should pay off whenever processor communications costs are high. Further experiments showed it can also pay off even with low communications costs for 16-64 processes and 1-10 data items per processor; then, imbalances can increase processing time by up to 52 percent in representative cases, and rebalancing can increase it by 78 percent, so our quick predictive model can be valuable. Results also show that our proposed heuristic rebalancing criterion gives close to optimal balancing. We also extend our model to handle variations in filter processing time per data item.
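The imbalance the abstract describes can be illustrated with a small Monte Carlo sketch. This is not the paper's model. It assumes, purely for illustration, that each processor starts with the same number of items, that a filter passes each item independently with a fixed probability (so the surviving counts are binomial), that the next stage costs one time unit per item, and that rebalancing spreads the survivors evenly at no communication cost. All function and parameter names here are hypothetical.

```python
import random


def surviving_counts(num_procs, items_per_proc, pass_prob, rng):
    """Items left on each processor after one filter.

    Assumption: every item passes independently with probability pass_prob,
    so each count is a Binomial(items_per_proc, pass_prob) draw.
    """
    return [
        sum(rng.random() < pass_prob for _ in range(items_per_proc))
        for _ in range(num_procs)
    ]


def stage_loads(counts):
    """Return (unbalanced, balanced) load for the next filter stage, in items.

    Unbalanced: the stage finishes when the most loaded processor finishes.
    Balanced: survivors are spread evenly (communication cost is ignored).
    """
    unbalanced = max(counts)
    balanced = -(-sum(counts) // len(counts))  # ceiling division
    return unbalanced, balanced


def mean_loads(num_procs, items_per_proc, pass_prob, trials=2000, seed=0):
    """Average both loads over repeated random trials."""
    rng = random.Random(seed)
    total_unbalanced = 0
    total_balanced = 0
    for _ in range(trials):
        counts = surviving_counts(num_procs, items_per_proc, pass_prob, rng)
        unbalanced, balanced = stage_loads(counts)
        total_unbalanced += unbalanced
        total_balanced += balanced
    return total_unbalanced / trials, total_balanced / trials


if __name__ == "__main__":
    unbalanced, balanced = mean_loads(num_procs=16, items_per_proc=10, pass_prob=0.5)
    print(f"mean slowest-processor load without rebalancing: {unbalanced:.2f} items")
    print(f"mean per-processor load after even rebalancing:  {balanced:.2f} items")
```

The gap between the two averages is the most that rebalancing could save in this toy setting. A realistic decision rule would also have to charge for moving the items, which is the trade-off the paper's model addresses.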

Index Terms:
information filtering, data parallelism, load balancing, information retrieval, conjunctions, optimality, and Monte Carlo methods
Citation:
N.C. Rowe, A. Zaky, "Load Balancing of Parallelized Information Filters," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 456-461, March-April 2002, doi:10.1109/69.991730