loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Papers
Parallel Mining of Association Rules from Text Databases on a Cluster of Workstations
Santa Fe, New Mexico
April 26-April 30
ISBN: 0-7695-2132-0
John D. Holt, Wright State University
Soon M. Chung, Wright State University
In this paper, we propose a new algorithm named Parallel Multipass with Inverted Hashing and Pruning (PMIHP) for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. The new PMIHP algorithm is a parallel version of our Multipass with Inverted Hashing and Pruning (MIHP) algorithm [13], which was shown to be quite efficient than other existing algorithms in the context of mining text databases. The PMIHP algorithm reduces the overhead of communication between miners running on different processors because they are mining local databases asynchronously and prune the global candidates by using the Inverted Hashing and Pruning technique. Compared with the well-known Count Distribution algorithm [2], PMIHP demonstrates superior performance characteristics for mining association rules in large text databases, and when the minimum support level is low, its speedup is superlinear as the number of processors increases. These experiments were performed on a cluster of Linux workstations using a collection of Wall Street Journal articles.
Index Terms:
Parallel association rule mining, text retrieval, multipass, inverted hashing and pruning, cluster computing, scalability
Citation:
John D. Holt, Soon M. Chung, "Parallel Mining of Association Rules from Text Databases on a Cluster of Workstations," ipdps, vol. 1, pp.86b, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Papers, 2004
Usage of this product signifies your acceptance of the Terms of Use.