Discovering Expressive Process Models by Clustering Log Traces
August 2006 (vol. 18 no. 8)
pp. 1010-1027
Process mining techniques have recently received notable attention in the literature for their ability to assist in the (re)design of complex processes by automatically discovering models that explain the events registered in log traces provided as input. Following this line of research, the paper investigates an extension of these basic approaches in which different variants of the process are explicitly identified by clustering the log traces. Modeling each group of similar executions with a different schema makes it possible to single out "conformant" models, that is, models that minimize the number of modeled enactments that are extraneous to the process semantics. To this end, a novel process mining framework is introduced and the relevant computational issues are studied in depth. Because finding an exact solution to this enhanced process mining problem is proven to require high computational costs in most practical cases, a greedy approach is devised. It is founded on an iterative, hierarchical refinement of the process model in which, at each step, traces sharing similar behavior patterns are clustered together and equipped with a specialized schema. The algorithm guarantees that each refinement leads to an increasingly sound model, thus attaining a monotonic search. Experimental results support the validity of the approach with respect to both effectiveness and scalability.
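
The greedy refinement idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a toy "schema" made only of start activities, end activities, and directly-follows pairs, it counts extraneous enactments by enumerating bounded-length paths, and it splits clusters with a simple hypothetical heuristic. All function names (`mine_schema`, `enactments`, `split`, `refine`, `total_extraneous`) are invented for illustration.

```python
from collections import Counter


def mine_schema(traces):
    """Toy schema (an assumption): start/end activities and directly-follows pairs."""
    starts = {t[0] for t in traces}
    ends = {t[-1] for t in traces}
    edges = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    return starts, ends, edges


def enactments(schema, max_len):
    """All activity sequences the toy schema admits, with at most max_len activities."""
    starts, ends, edges = schema
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    admitted, stack = set(), [(s,) for s in starts]
    while stack:
        path = stack.pop()
        if path[-1] in ends:
            admitted.add(path)
        if len(path) < max_len:
            stack.extend(path + (b,) for b in succ.get(path[-1], []))
    return admitted


def extraneous(traces, max_len):
    """Admitted enactments of one cluster's schema that never occur in that cluster."""
    return len(enactments(mine_schema(traces), max_len) - set(traces))


def total_extraneous(clusters, max_len):
    """Enactments admitted by any cluster schema that occur nowhere in the log."""
    admitted = set().union(*(enactments(mine_schema(c), max_len) for c in clusters))
    logged = {t for c in clusters for t in c}
    return len(admitted - logged)


def split(traces):
    """Hypothetical heuristic: split on the pair that divides the variants most evenly."""
    distinct = set(traces)
    counts = Counter(p for t in distinct for p in set(zip(t, t[1:])))
    if not counts:
        return None
    pair = min(sorted(counts), key=lambda p: abs(2 * counts[p] - len(distinct)))
    with_pair = [t for t in traces if pair in zip(t, t[1:])]
    without = [t for t in traces if pair not in zip(t, t[1:])]
    return (with_pair, without) if with_pair and without else None


def refine(log, max_clusters, max_len=None):
    """Greedily split the cluster whose schema admits the most extraneous enactments."""
    max_len = max_len or max(len(t) for t in log)
    clusters = [list(log)]
    while len(clusters) < max_clusters:
        worst = max(clusters, key=lambda c: extraneous(c, max_len))
        if extraneous(worst, max_len) == 0:
            break  # every schema already admits only logged behavior
        parts = split(worst)
        if parts is None:
            break  # the heuristic cannot separate this cluster any further
        clusters.remove(worst)
        clusters.extend(parts)
    return clusters


log = [("a", "b", "d", "e"), ("a", "c", "d", "f"), ("a", "b", "d", "e")]
clusters = refine(log, max_clusters=2)
print(total_extraneous([log], 4), total_extraneous(clusters, 4))
```

On this small log, a single schema also admits the unseen sequences a-b-d-f and a-c-d-e, while the two specialized schemas admit only logged behavior, so the count drops from 2 to 0. In this toy setting the count cannot grow after a split, because each sub-schema uses a subset of its parent's start activities, end activities, and pairs; this only mirrors, under much simpler assumptions, the monotonicity property the abstract attributes to the actual algorithm.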

Index Terms:
Process mining, data mining, workflow management, clustering, classification, association rules.
Gianluigi Greco, Antonella Guzzo, Luigi Pontieri, Domenico Saccà, "Discovering Expressive Process Models by Clustering Log Traces," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 8, pp. 1010-1027, Aug. 2006, doi:10.1109/TKDE.2006.123