Issue No. 02 - March/April (2002 vol. 14)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/69.991730
<p>We investigate the data-parallel implementation of a set of information filters used to rule out uninteresting data from a database or data stream. We develop an analytic model for the costs and advantages of load rebalancing for the parallel filtering processes, as well as a quick heuristic for its desirability. Our model uses binomial models of the filter processes and fits key parameters to the results of extensive simulations. Experiments confirm our model. Rebalancing should pay off whenever processor communications costs are high. Further experiments showed it can also pay off even with low communications costs for 16-64 processes and 1-10 data items per processor; then, imbalances can increase processing time by up to 52 percent in representative cases, and rebalancing can increase it by 78 percent, so our quick predictive model can be valuable. Results also show that our proposed heuristic rebalancing criterion gives close to optimal balancing. We also extend our model to handle variations in filter processing time per data item.</p>
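The imbalance effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' model. It assumes, purely for illustration, that each processor starts with the same number of items, that a first filter passes each item independently with a fixed probability (so the survivors per processor are binomially distributed), that a later filter costs one time unit per surviving item, and that rebalancing is perfect and free of communication cost. The function name `simulate_imbalance` and all of its parameters are hypothetical.

```python
import math
import random


def simulate_imbalance(num_procs, items_per_proc, pass_prob, trials=2000, seed=0):
    """Estimate mean time of a later filter stage with and without rebalancing.

    Illustrative assumptions (not taken from the paper): one time unit per
    surviving item, independent pass/fail per item, zero communication cost.
    Returns (mean_unbalanced_time, mean_balanced_time).
    """
    rng = random.Random(seed)
    unbalanced_total = 0
    balanced_total = 0
    for _ in range(trials):
        # Survivors of the first filter on each processor: Binomial(n, p).
        survivors = [
            sum(rng.random() < pass_prob for _ in range(items_per_proc))
            for _ in range(num_procs)
        ]
        # Without rebalancing, the slowest (most loaded) processor sets the time.
        unbalanced_total += max(survivors)
        # With ideal rebalancing, items are spread as evenly as possible.
        balanced_total += math.ceil(sum(survivors) / num_procs)
    return unbalanced_total / trials, balanced_total / trials


if __name__ == "__main__":
    unbalanced, balanced = simulate_imbalance(num_procs=16, items_per_proc=10, pass_prob=0.5)
    print(f"mean time without rebalancing: {unbalanced:.2f}")
    print(f"mean time with ideal rebalancing: {balanced:.2f}")
```

The gap between the two returned values is an upper bound on what rebalancing could save under these assumptions; a real decision would also have to charge for the communication needed to move items, which is the trade-off the paper's model addresses.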
information filtering, data parallelism, load balancing, information retrieval, conjunctions, optimality, and Monte Carlo methods
A. Zaky and N. Rowe, "Load Balancing of Parallelized Information Filters," in IEEE Transactions on Knowledge & Data Engineering, vol. 14, no. 2, pp. 456-461, 2002.