14th International Workshop on Database and Expert Systems Applications (DEXA'03)
Parallel Association Rule Mining with Minimum Inter-Processor Communication
Prague, Czech Republic
September 1-5, 2003
ISBN: 0-7695-1993-8
Mohammad El-Hajj, University of Alberta, Edmonton, AB, Canada
Osmar R. Zaïane, University of Alberta, Edmonton, AB, Canada
Existing parallel association rule mining algorithms face several problems when mining massive transactional datasets. One major problem is that most parallel algorithms for shared-nothing environments are Apriori-based. Apriori-based algorithms have been shown to scale poorly, mainly because of (1) repeated I/O scans of the disk-resident database and (2) the heavy computation and communication involved in candidate generation.
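To make the two costs concrete, here is a minimal single-machine sketch of level-wise Apriori that counts its own full passes over the data. It is an illustration of the general technique only, not code from the paper; the function name `apriori` and the toy dataset are assumptions introduced here.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori. Returns (frequent itemsets with counts, counting passes)."""
    transactions = [frozenset(t) for t in transactions]
    scans = 0
    frequent = {}
    # Level-1 candidates: every distinct item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # Cost (1): one full pass over the data for every level.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Cost (2): candidate generation. Join frequent k-itemsets, then
        # prune any candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans

# Hypothetical toy data: five transactions over items a, b, c.
data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent, scans = apriori(data, min_support=3)
```

Even on this tiny input the loop needs a third pass just to discover that the only 3-item candidate is infrequent. In a shared-nothing parallel setting, each such level would typically also require exchanging counts between nodes.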
This paper proposes a new disk-based parallel association rule mining algorithm called Inverted Matrix, which achieves its efficiency through three new ideas. First, the transactional data is converted into a new database layout, also called Inverted Matrix, that avoids multiple scans of the database during the mining phase: globally frequent patterns can be found in less than one full scan using random access. This data structure is replicated among the parallel nodes. Second, for each frequent item assigned to a parallel node, a relatively small independent tree is built that summarizes its co-occurrences. Finally, a simple, non-recursive mining process reduces memory requirements, since only minimal candidate generation and counting is needed, and no communication between nodes is required to generate all globally frequent patterns.
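The communication-free idea can be sketched roughly as follows. This is a simplified stand-in, not the paper's method: it swaps the Inverted Matrix layout and the per-item co-occurrence trees for a plain item-to-transaction-id index with set intersections, and it simulates the parallel nodes with a sequential loop. All names (`build_inverted_index`, `mine_for_item`, `parallel_mine`) and the round-robin item assignment are assumptions made for illustration.

```python
from itertools import combinations

def build_inverted_index(transactions):
    """Map each item to the set of transaction ids that contain it."""
    index = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            index.setdefault(item, set()).add(tid)
    return index

def mine_for_item(item, index, order, min_support):
    """Non-recursively find frequent patterns whose lowest-ranked item is `item`."""
    rank = {it: r for r, it in enumerate(order)}
    # Only higher-ranked items may extend `item`, so no two items
    # (and hence no two nodes) ever produce the same pattern.
    partners = [p for p in order
                if rank[p] > rank[item]
                and len(index[item] & index[p]) >= min_support]
    result = {frozenset([item]): len(index[item])}
    for size in range(1, len(partners) + 1):
        found = False
        for combo in combinations(partners, size):
            tids = set(index[item])
            for p in combo:
                tids &= index[p]
            if len(tids) >= min_support:
                result[frozenset((item,) + combo)] = len(tids)
                found = True
        if not found:
            break  # no frequent pattern of this size, so none larger exists
    return result

def parallel_mine(transactions, min_support, num_nodes):
    """Simulate nodes that each hold a full replica of the index."""
    index = build_inverted_index(transactions)
    # Frequent items ranked by ascending support (ties broken by name).
    order = sorted((it for it in index if len(index[it]) >= min_support),
                   key=lambda it: (len(index[it]), it))
    results = {}
    for node in range(num_nodes):
        # Hypothetical round-robin assignment of frequent items to nodes.
        for item in order[node::num_nodes]:
            results.update(mine_for_item(item, index, order, min_support))
    return results

data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
patterns = parallel_mine(data, min_support=3, num_nodes=2)
```

Because every node works from its own replica and owns a disjoint set of starting items, the union of the per-node results is the complete set of globally frequent patterns, and the nodes never need to exchange counts. The result does not depend on `num_nodes`.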
Citation:
Mohammad El-Hajj, Osmar R. Zaïane, "Parallel Association Rule Mining with Minimum Inter-Processor Communication," dexa, pp. 519, 14th International Workshop on Database and Expert Systems Applications (DEXA'03), 2003