Workshops at the Grid and Pervasive Computing Conference (2009)
May 4, 2009 to May 8, 2009
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/GPC.2009.9
Workflow environments are widely used in data mining systems to manage the data and execution flows associated with complex applications. Weka, one of the most widely used open-source data mining systems, includes the KnowledgeFlow environment, which provides a drag-and-drop interface to compose and execute data mining workflows. The Weka KnowledgeFlow allows users to execute a whole workflow only on a single computer. However, most data mining workflows include several independent branches that could be run in parallel on a set of distributed machines to reduce the overall execution time. We implemented distributed workflow execution in Weka4WS, a framework that extends Weka and its KnowledgeFlow environment to exploit distributed resources available in a Grid using Web Service technologies. In this paper we describe the Weka4WS architecture and the functionalities provided by its service-oriented KnowledgeFlow component, showing its use to compose and execute simple parallel data mining workflows. Furthermore, we present ongoing work aimed at also supporting data-parallel workflows on a Grid.
Data Mining, Grid, Web Services, Weka4WS, Workflows
M. Lackovic, P. Trunfio and D. Talia, "A Service-Oriented Framework for Executing Data Mining Workflows on Grids," Workshops at the Grid and Pervasive Computing Conference (GPC), Geneva, Switzerland, 2009, pp. 72-79.