17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'05)
Anthill: A Scalable Run-Time Environment for Data Mining Applications
Rio de Janeiro, Brazil
October 24-27, 2005
ISBN: 0-7695-2446-X
Data mining techniques are becoming increasingly popular as a reasonable means of collecting summaries from the rapidly growing datasets found in many areas. However, as the size of the raw data increases, parallel data mining algorithms are becoming a necessity. In this paper we present a run-time support system designed to allow the efficient implementation of data mining algorithms on heterogeneous distributed environments. We believe that the run-time framework is also suitable for a broader class of applications beyond data mining. We also present a parallelization strategy supported by the run-time system. We show scalability results for three different data mining algorithms that were parallelized using our approach and our run-time support. All three applications scale almost linearly up to a large number of nodes.
Citation:
Renato A. Ferreira, Wagner Meira Jr., Dorgival Guedes, Lucia M. A. Drummond, Bruno Coutinho, George Teodoro, Tulio Tavares, Renata Araujo, Guilherme T. Ferreira, "Anthill: A Scalable Run-Time Environment for Data Mining Applications," 17th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'05), pp. 159-167, 2005.