Mining and Reasoning on Workflows
April 2005 (vol. 17 no. 4)
pp. 519-534
Today's workflow management systems represent a key technological infrastructure for advanced applications and are attracting a growing body of research, mainly focused on developing tools for workflow management that allow users both to specify the "static" aspects, such as preconditions, precedences among activities, and rules for exception handling, and to control execution by scheduling the activities on the available resources. This paper deals with an aspect of workflows that has so far not received much attention, even though it is crucial for the forthcoming scenarios of large-scale applications on the Web: providing facilities that help the human system administrator identify the choices performed more frequently in the past that have led to a desired final configuration. In this context, we formalize the problem of discovering the most frequent patterns of executions, i.e., the workflow substructures that have been scheduled more frequently by the system. We attack the problem by developing two data mining algorithms on the basis of an intuitive and original graph formalization of a workflow schema and its occurrences. The model is used both to prove some intractability results that strongly motivate the use of data mining techniques and to derive interesting structural properties for reducing the search space for frequent patterns. Indeed, the experiments we have carried out show that our algorithms outperform standard data mining algorithms adapted to discover frequent patterns of workflow executions.

Index Terms:
Data mining, workflow management.
Gianluigi Greco, Antonella Guzzo, Giuseppe Manco, Domenico Saccà, "Mining and Reasoning on Workflows," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 519-534, April 2005, doi:10.1109/TKDE.2005.63