ABSTRACT
In this paper, we propose two new parallel formulations of the Apriori algorithm that is used for computing association rules. These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations CD and DD. Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD, and requires substantially smaller communication overhead than DD. But IDD suffers from the added cost due to communication of transactions among processors. HD is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size.
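The contrast between the two baseline formulations named in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: CD is taken to mean that every processor counts all candidates over its own share of the transactions and the counts are then summed, and DD is taken to mean that the candidates are split round-robin so that every processor must scan all transactions. The function names, the list-of-lists "partitions" standing in for processors, and the plain subset test (in place of a hash tree) are all hypothetical choices made for illustration.

```python
from collections import Counter


def count_local(transactions, candidates):
    """Count how many of the given transactions contain each candidate itemset."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        for c in candidates:
            if c <= items:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def count_distribution(partitions, candidates):
    """CD-style sketch (assumed): each simulated processor counts ALL
    candidates on its LOCAL transactions; the partial counts are then summed,
    standing in for a global reduction."""
    total = Counter()
    for local_transactions in partitions:
        total.update(count_local(local_transactions, candidates))
    return total


def data_distribution(partitions, candidates):
    """DD-style sketch (assumed): candidates are split round-robin among the
    simulated processors, so each one has to scan ALL transactions."""
    p = len(partitions)
    all_transactions = [t for part in partitions for t in part]
    total = Counter()
    for rank in range(p):
        my_candidates = candidates[rank::p]  # this processor's share
        total.update(count_local(all_transactions, my_candidates))
    return total


if __name__ == "__main__":
    # Two simulated processors, each holding two transactions.
    partitions = [
        [("a", "b", "c"), ("a", "c")],
        [("b", "c"), ("a", "b", "c")],
    ]
    candidates = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
    print(count_distribution(partitions, candidates))
    print(data_distribution(partitions, candidates))
```

Both functions produce the same support counts; they differ in what is replicated. In the CD-style sketch the whole candidate set is held by every processor, while in the DD-style sketch every processor touches every transaction, which mirrors the trade-off between candidate-set size and transaction communication that the abstract describes.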
INDEX TERMS
Data mining, parallel processing, association rules, load balance, scalability.
CITATION

V. Kumar, G. Karypis and E. Han, "Scalable Parallel Data Mining for Association Rules," in IEEE Transactions on Knowledge & Data Engineering, vol. 12, pp. 337-352, 2000.
doi:10.1109/69.846289