Issue No. 03 - May/June (2000 vol. 12)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/69.846289
<p><b>Abstract</b>—In this paper, we propose two new parallel formulations of the Apriori algorithm, which is used to compute association rules. These new formulations, <it>IDD</it> (Intelligent Data Distribution) and <it>HD</it> (Hybrid Distribution), address the shortcomings of two previously proposed parallel formulations, <it>CD</it> (Count Distribution) and <it>DD</it> (Data Distribution). Unlike the <it>CD</it> algorithm, the <it>IDD</it> algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The <it>IDD</it> algorithm also eliminates the redundant work inherent in <it>DD</it> and incurs substantially lower communication overhead than <it>DD</it>. However, <it>IDD</it> incurs the added cost of communicating transactions among processors. <it>HD</it> is a hybrid algorithm that combines the advantages of <it>CD</it> and <it>DD</it>. Experimental results on a 128-processor Cray T3E show that <it>HD</it> scales as well as the <it>CD</it> algorithm with respect to the number of transactions, and as well as <it>IDD</it> with respect to increasing candidate set size.</p>
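To make the contrast between CD and IDD concrete, here is a minimal, sequential Python sketch that simulates P processors as a list of transaction partitions. It assumes, as one plausible reading of "partitions the candidate set intelligently," that candidates are grouped by their first item and assigned to processors with a greedy bin-packing heuristic. All function names are hypothetical, a dictionary lookup stands in for the hash tree, and IDD's movement of transactions between processors is only simulated by iterating over every partition. Both schemes should yield identical support counts; they differ in what each processor stores and what must be communicated.

```python
from collections import defaultdict
from itertools import combinations


def count_local(transactions, candidates):
    """Count how many transactions contain each candidate (sorted tuples of equal length)."""
    counts = dict.fromkeys(candidates, 0)
    if not candidates:
        return counts
    k = len(candidates[0])
    for t in transactions:
        # Enumerate the k-subsets of the transaction; Apriori uses a hash tree for this lookup.
        for subset in combinations(sorted(t), k):
            if subset in counts:
                counts[subset] += 1
    return counts


def count_distribution(partitions, candidates):
    """CD-style counting: every simulated processor holds ALL candidates."""
    local = [count_local(part, candidates) for part in partitions]
    # Sum the per-processor count vectors (an all-reduce in a message-passing setting).
    total = dict.fromkeys(candidates, 0)
    for counts in local:
        for c, n in counts.items():
            total[c] += n
    return total


def partition_by_first_item(candidates, num_procs):
    """Hypothetical IDD-style split: group by first item, then greedy bin packing."""
    groups = defaultdict(list)
    for c in candidates:
        groups[c[0]].append(c)
    bins = [[] for _ in range(num_procs)]
    # Place the largest groups first, each onto the currently least-loaded processor.
    for _, group in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        target = min(range(num_procs), key=lambda p: len(bins[p]))
        bins[target].extend(group)
    return bins


def intelligent_data_distribution(partitions, candidates):
    """IDD-style counting: each processor counts only its own candidates over ALL transactions."""
    num_procs = len(partitions)
    bins = partition_by_first_item(candidates, num_procs)
    total = {}
    for p, my_cands in enumerate(bins):
        if not my_cands:
            continue
        first_items = {c[0] for c in my_cands}
        counts = dict.fromkeys(my_cands, 0)
        # Simulate transaction blocks arriving from every processor in turn.
        for step in range(num_procs):
            block = partitions[(p + step) % num_procs]
            # Skip transactions lacking all of this processor's first items.
            relevant = [t for t in block if first_items & set(t)]
            for c, n in count_local(relevant, my_cands).items():
                counts[c] += n
        total.update(counts)  # Disjoint candidate sets, so no count reduction is needed.
    return total


transactions = [{1, 2, 3}, {1, 2}, {2, 3, 4}, {1, 3, 4}, {2, 4}, {1, 2, 4}]
partitions = [transactions[0:2], transactions[2:4], transactions[4:6]]  # 3 simulated processors
candidates = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
cd = count_distribution(partitions, candidates)
idd = intelligent_data_distribution(partitions, candidates)
print(cd == idd, cd)
```

In the CD sketch every processor stores the full candidate set and only small count vectors are combined, whereas in the IDD sketch each processor stores roughly 1/P of the candidates but must examine every transaction, which mirrors the transaction-communication cost the abstract attributes to IDD.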
Data mining, parallel processing, association rules, load balance, scalability.
Vipin Kumar, George Karypis, Eui-Hong (Sam) Han, "Scalable Parallel Data Mining for Association Rules", IEEE Transactions on Knowledge & Data Engineering, vol. 12, no. 3, pp. 337-352, May/June 2000, doi:10.1109/69.846289