Process Mining Manifesto: Toward Real Business Intelligence
by Wil van der Aalst
Process mining research started in the late ’90s but has only recently appeared on the radar for business intelligence practitioners. Process mining lets users and managers look inside processes, providing valuable insights for process improvement and compliance checking. On 7 October 2011, the IEEE Task Force on Process Mining released the Process Mining Manifesto. Fifty-three organizations support the manifesto, and 77 process mining experts contributed to it. The active involvement of end users, tool vendors, consultants, analysts, and researchers illustrates the growing relevance of process mining as a bridge between data mining and business process modeling.
The term business intelligence refers to a broad collection of tools and methods that use data to support decision making. Forrester defines BI as “a set of methodologies, processes, architectures, and technologies that leverage the output of information management processes for analysis, reporting, performance management, and information delivery”.1 BI is, unfortunately, an oxymoron in many companies, which use primitive tools to monitor and analyze processes. Moreover, most BI vendors offer products that are data-centric and focus on rather simplistic forms of analysis, such as “dashboards” and “scorecards.” Mainstream BI tools are not as “intelligent” as the term suggests, and end users are easily confused by the marketing terms BI vendors use. Nevertheless, the market for BI products is steadily growing, showing BI’s practical relevance. Therefore, we need truly intelligent approaches to BI, such as those that process mining enables. Here, I’ll summarize the recently released Process Mining Manifesto of the IEEE Task Force on Process Mining and explain its practical relevance to business intelligence.
Process mining is a relatively young research discipline that sits between computational intelligence and data mining on the one hand and process modeling and analysis on the other. The idea of process mining is to discover, monitor, and improve real processes (not assumed processes) by extracting knowledge from event logs readily available in today’s information systems.2 Process mining includes
- automated process discovery (extracting process models from an event log),
- conformance checking (monitoring deviations by comparing model and log),
- social network and organizational mining,
- automated construction of simulation models,
- model extension and repair,
- case prediction, and
- history-based recommendations.
Figure 1 illustrates the scope of process mining. The starting point for process mining is an event log. All process-mining techniques assume that it’s possible to sequentially record events such that each event refers to an activity (a well-defined step in some process) and is related to a particular case, or process instance. Event logs can store additional information about events. In fact, whenever possible, process-mining techniques use extra information such as the resource (person or device) executing or initiating the activity, the timestamp of the event, or data elements recorded with the event (for instance, the size of an order).
Figure 1. Process-mining techniques extract knowledge from event logs to discover, monitor, and improve processes.
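To make the assumed structure of an event log concrete, here is a minimal Python sketch. The `Event` class, its field names, and the sample orders are illustrative assumptions, not the format of any particular tool. Each event refers to a case and an activity and can carry an optional resource, timestamp, and data elements; grouping the events by case and ordering them by time yields one trace per process instance.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    case_id: str                      # the case (process instance) the event belongs to
    activity: str                     # the well-defined step that was executed
    timestamp: datetime               # when the event was recorded
    resource: Optional[str] = None    # optional: person or device involved
    data: dict = field(default_factory=dict)  # optional: extra data elements


def to_traces(log):
    """Group events by case and order them by timestamp."""
    by_case = defaultdict(list)
    for event in sorted(log, key=lambda e: e.timestamp):
        by_case[event.case_id].append(event.activity)
    return dict(by_case)


# Hypothetical example log; the events are deliberately out of order.
log = [
    Event("order-2", "register", datetime(2011, 10, 7, 9, 30), "Bob"),
    Event("order-1", "register", datetime(2011, 10, 7, 9, 0), "Ann", {"size": 3}),
    Event("order-1", "ship", datetime(2011, 10, 8, 14, 0), "Carl"),
    Event("order-1", "check", datetime(2011, 10, 7, 11, 0), "Bob"),
    Event("order-2", "reject", datetime(2011, 10, 7, 16, 0), "Ann"),
]

traces = to_traces(log)
```

Here `traces` maps each case to its ordered list of activities, which is the input most of the techniques discussed in this article start from.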
We can use event logs to conduct three types of process mining.2,3 The first type is discovery. A discovery technique produces a model from an event log without using any a-priori information. Process discovery is the most prominent process-mining technique. In many organizations, people are surprised to see that existing techniques are indeed able to discover real processes merely based on example executions in event logs. The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process. For example, there are various algorithms to compute the percentage of events that can be explained by the model. Conformance checking can confirm whether reality, as recorded in the log, conforms to the model and vice versa. The third type of process mining is enhancement. Here, the idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model. For instance, by using timestamps in the event log, you can extend the model to show bottlenecks, service levels, throughput times, and frequencies.
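The first two types can be sketched in a few lines of Python. This is a deliberately simplified illustration, not one of the discovery or conformance algorithms implemented in tools such as ProM: the "model" here is just the set of directly-follows steps observed in a log, and an event counts as explained when the step leading to it is allowed by that model. The function names and the toy traces are assumptions made for the example.

```python
from collections import Counter

START, END = "<start>", "<end>"  # artificial markers for trace boundaries


def discover_dfg(traces):
    """Discovery: count how often one activity is directly followed by another."""
    dfg = Counter()
    for trace in traces:
        path = [START, *trace, END]
        dfg.update(zip(path, path[1:]))
    return dfg


def explained_fraction(traces, allowed_steps):
    """Conformance: fraction of events reached via a step the model allows."""
    total = explained = 0
    previous = START
    for trace in traces:
        previous = START
        for activity in trace:
            total += 1
            if (previous, activity) in allowed_steps:
                explained += 1
            previous = activity
    return explained / total if total else 1.0


# Hypothetical logs: one used to discover the model, one to check against it.
observed = [["a", "b", "c"], ["a", "c"]]
dfg = discover_dfg(observed)
model = set(dfg)

new_log = [["a", "b", "c"], ["a", "d", "c"]]
fraction = explained_fraction(new_log, model)  # 4 of the 6 events are explained
```

In the second trace of `new_log`, activity `d` never occurred in the observed behavior, so neither the step into `d` nor the step out of it is explained by the model. Enhancement would go one step further and annotate the model, for example with the frequencies already stored in `dfg`.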
Figure 1 shows how an end-to-end process model is first discovered. The model is visualized as a BPMN (Business Process Modeling Notation) model, but internally algorithms often use more formal notations such as Petri nets, C-nets, and transition systems.2 By replaying the event log on the model, it’s possible to add information about bottlenecks, decisions, roles, and resources.
The growing interest in log-based process analysis motivated the establishment of the IEEE Task Force on Process Mining. The goal of this task force is to promote the research, development, education, and understanding of process mining. The task force was established in 2009 in the context of the Data Mining Technical Committee of the Computational Intelligence Society of the IEEE. Members of the task force include representatives of more than a dozen commercial software vendors (including Pallas Athena, Software AG, Futura Process Intelligence, HP, IBM, Fujitsu, Infosys, and Fluxicon), 10 consultancy firms (such as Gartner and Deloitte), and more than 20 universities.
Among the concrete objectives of the task force are to
- make end users, developers, consultants, managers, and researchers aware of the state of the art in process mining;
- promote the use of process-mining techniques and tools;
- stimulate new process-mining applications;
- play a role in standardization efforts for logging event data;
- organize tutorials, special sessions, workshops, and panels; and
- publish articles, books, videos, and special issues of journals.
For example, in 2010, the task force standardized XES, an extensible logging format supported by the OpenXES library and by tools such as ProM, XESame, and Nitro. See www.win.tue.nl/ieeetfpm for recent task force activities.
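As a rough illustration of what such a log looks like, the following Python sketch reads a tiny XES-style document using only the standard library. The sample content is made up, and the sketch assumes a file without an XML namespace; real XES files usually also declare extensions, global attributes, and classifiers, which are ignored here. The attribute keys `concept:name`, `org:resource`, and `time:timestamp` are the standard XES keys for names, resources, and timestamps.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XES-style log with one trace and two events.
SAMPLE = """<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="register"/>
      <string key="org:resource" value="Ann"/>
      <date key="time:timestamp" value="2011-10-07T09:00:00"/>
    </event>
    <event>
      <string key="concept:name" value="check"/>
      <string key="org:resource" value="Bob"/>
      <date key="time:timestamp" value="2011-10-07T11:00:00"/>
    </event>
  </trace>
</log>"""


def read_events(xml_text):
    """Return (case, activity, resource, timestamp) tuples from an XES-style log."""
    rows = []
    for trace in ET.fromstring(xml_text).iter("trace"):
        # Trace-level attributes are the children that are not events.
        case = {a.get("key"): a.get("value") for a in trace if a.tag != "event"}
        for event in trace.iter("event"):
            attrs = {a.get("key"): a.get("value") for a in event}
            rows.append((
                case.get("concept:name"),
                attrs.get("concept:name"),
                attrs.get("org:resource"),
                attrs.get("time:timestamp"),
            ))
    return rows


rows = read_events(SAMPLE)
```

For real logs, a dedicated library such as OpenXES is the safer choice, since it handles the full format.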
The Process Mining Manifesto by the IEEE Task Force on Process Mining aims to guide software developers, scientists, consultants, and end users and to increase process mining’s visibility as a new tool to improve the design and redesign, control, and support of operational business processes. As an introduction to the state of the art in process mining, here’s a brief summary of the manifesto’s main findings.3
As with any new technology, it’s possible to make obvious mistakes when applying process mining in real-life settings. The six guiding principles in Table 1 aim to prevent users and analysts from making such mistakes.
Table 1. The Six Guiding Principles Listed in the Process Mining Manifesto.
|GP1: Event data should be treated as first-class citizens.||Events should be trustworthy; that is, it should be safe to assume that the recorded events actually happened and that the attributes of events are correct. Event logs should be complete; given a particular scope, no events may be missing. Any recorded event should have well-defined semantics. Moreover, the event data should be safe in terms of privacy and security.|
|GP2: Log extraction should be driven by questions.||Without concrete questions, extracting meaningful event data is very difficult. Consider, for example, the thousands of tables in the database of an enterprise resource planning system such as SAP. Without questions, you don’t know where to start.|
|GP3: Process-mining techniques should support concurrency, choice, and other basic control-flow constructs.||The basic workflow patterns supported by all mainstream languages (such as BPMN, EPCs, Petri nets, BPEL, and UML activity diagrams) are sequence, parallel routing (AND-splits/joins), choice (XOR-splits/joins), and loops. Obviously, process-mining techniques should support these patterns.|
|GP4: Events should be related to model elements.||Conformance checking and enhancement rely heavily on the relationship between elements in the model and events in the log. Process mining tools can use this relationship to “replay” the event log on the model. Replay can reveal discrepancies between event log and model (for example, some events in the log aren’t possible according to the model). It can also enrich the model with additional information from the event log (for example, it can identify bottlenecks by using timestamps).|
|GP5: Models should be treated as purposeful abstractions of reality.||A model derived from event data provides a view on reality. Such a view should serve as a purposeful abstraction of the behavior captured in the event log. Given an event log, multiple useful views might exist.|
|GP6: Process mining should be a continuous process.||Given the dynamic nature of processes, we shouldn’t view process mining as a one-time activity. The goal should be not to create a fixed model, but to breathe life into process models in a way that encourages users and analysts to look at them on a daily basis.|
As an example, consider guiding principle GP4: “Events Should Be Related to Model Elements.” It’s a misconception that process mining is limited to control-flow discovery; other perspectives such as the organizational perspective, the time perspective, and the data perspective are equally important. However, the control-flow perspective (that is, the ordering of activities) serves as the layer connecting the different perspectives. Therefore, it’s important to relate events in the log to activities in the model. After relating events to model elements, it’s possible to “replay” the event log on the model.2 We can use timestamps in the event log to analyze temporal behavior during replay, and we can use time differences between causally related activities to add average or expected waiting times to the model. These examples illustrate the importance of this guiding principle; the relationship between events in the log and elements in the model serves as a starting point for different types of analysis.
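The timing part of this idea can be sketched as follows. The sketch assumes events given as `(case, activity, timestamp)` tuples and uses "directly follows within a case" as a simplified stand-in for "causally related"; a real replay would use the process model to decide which activities are causally related. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_waiting_times(events):
    """Average time between each pair of directly-following activities."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))

    gaps = defaultdict(list)
    for case_events in by_case.values():
        case_events.sort()  # order each case by timestamp
        for (t1, a), (t2, b) in zip(case_events, case_events[1:]):
            gaps[(a, b)].append(t2 - t1)

    return {
        pair: sum(deltas, timedelta()) / len(deltas)
        for pair, deltas in gaps.items()
    }


# Hypothetical events for two cases.
events = [
    ("case-1", "prepare", datetime(2011, 1, 1)),
    ("case-1", "judge", datetime(2011, 1, 5)),
    ("case-2", "prepare", datetime(2011, 1, 2)),
    ("case-2", "judge", datetime(2011, 1, 10)),
]

waiting = average_waiting_times(events)
```

The resulting averages are the kind of annotation that can then be attached to the corresponding arc or place in the model.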
Process mining is an important tool for modern organizations that must manage nontrivial operational processes. On the one hand, more and more event data are becoming available. On the other hand, processes and information must be aligned perfectly to meet compliance, efficiency, and customer service requirements. Despite the applicability of process mining, we must still address important challenges; these illustrate that process mining is an emerging discipline. Table 2 lists the 11 challenges the manifesto describes.3
Table 2. Some of the Most Important Process Mining Challenges Identified in the Manifesto.
|C1: Finding, merging, and cleaning event data||When extracting event data suitable for process mining, we must address several challenges: data can be distributed over a variety of sources, event data might be incomplete, an event log could contain outliers, logs could contain events at different levels of granularity, and so on.|
|C2: Dealing with complex event logs having diverse characteristics||Event logs can have very different characteristics. Some event logs might be extremely large, making them difficult to handle, whereas others are so small that they don’t provide enough data to draw reliable conclusions.|
|C3: Creating representative benchmarks||We need good benchmarks consisting of example data sets and representative quality criteria to compare and improve the various tools and algorithms.|
|C4: Dealing with concept drift||The process might be changing while under analysis. Understanding such concept drifts is of prime importance for process management.|
|C5: Improving the representational bias used for process discovery||A careful and refined selection of the representational bias is necessary to ensure high-quality process-mining results.|
|C6: Balancing between quality criteria such as fitness, simplicity, precision, and generalization||Four competing quality dimensions exist: fitness, simplicity, precision, and generalization. The challenge is to find models that can balance all four dimensions.|
|C7: Cross-organizational mining||In some use cases, event logs from multiple organizations are available for analysis. Some organizations, such as supply chain partners, work together to handle process instances; other organizations execute essentially the same process while sharing experiences, knowledge, or a common infrastructure. However, traditional process-mining techniques typically consider one event log in one organization.|
|C8: Providing operational support||Process mining isn’t restricted to offline analysis; it can also provide online operational support. Detection, prediction, and recommendation are examples of operational support activities.|
|C9: Combining process mining with other types of analysis||The challenge is to combine automated process-mining techniques with other analysis approaches (optimization techniques, data mining, simulation, visual analytics, and so on) to extract more insights from event data.|
|C10: Improving usability for non-experts||The challenge is to hide the sophisticated process-mining algorithms behind user-friendly interfaces that automatically set parameters and suggest suitable types of analysis.|
|C11: Improving understandability for non-experts||The user might have problems understanding the output or be tempted to infer incorrect conclusions. To avoid such problems, process mining tools should present results using a suitable representation and always clearly indicate how trustworthy the results are.|
As an example, consider Challenge C4: “Dealing with Concept Drift.” The term concept drift refers to a situation in which the process is changing while we’re analyzing it. For instance, in the beginning of the event log, two activities might be concurrent, whereas later in the log, they become sequential. Processes might change because of periodic or seasonal changes (for example, “in December, there is more demand” or “on Friday afternoon, fewer employees are available”) or changing conditions (“the market is getting more competitive”). Such changes impact processes, and detecting and analyzing them is vital. However, most process-mining techniques analyze processes as if they’re in steady state.
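A naive way to see such a change in a log is to split the cases by time and compare the behavior in the two parts. The Python sketch below does this with directly-follows steps; it assumes the traces are already ordered by case start time, and it is only an illustration, since serious drift detection relies on statistical tests over sliding windows rather than a single split. The function names and traces are made up for the example.

```python
def directly_follows(traces):
    """Set of (a, b) pairs where a is directly followed by b in some trace."""
    return {pair for trace in traces for pair in zip(trace, trace[1:])}


def drift_report(traces):
    """Compare behavior in the first and second half of a time-ordered log."""
    middle = len(traces) // 2
    early = directly_follows(traces[:middle])
    late = directly_follows(traces[middle:])
    return {"disappeared": early - late, "appeared": late - early}


# Hypothetical log: b and c occur in either order at first (concurrent),
# but later b always precedes c (sequential).
ordered_traces = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "d"],
    ["a", "b", "c", "d"],
]

report = drift_report(ordered_traces)
```

In this example the steps that only occur when `c` runs before `b` show up under `"disappeared"`, which hints that the two activities were concurrent early in the log and became sequential later.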
We started developing the open-source ProM process-mining tool at Eindhoven University of Technology in 2004, applying lessons we had learned from developing previous process-mining tools such as MiMo, EMiT, and Little Thumb. ProM provides a high-performing pluggable architecture and a common basis for all kinds of process-mining techniques. Hundreds of plugins are available; for instance, ProM supports dozens of process-discovery algorithms as plugins. ProM is available for download from prom.sf.net and www.processmining.org.
We have applied process mining (and specifically ProM) in more than 100 organizations, including
- municipalities such as Alkmaar, Heusden, and Harderwijk;
- government agencies such as Rijkswaterstaat, Centraal Justitieel Incasso Bureau, and the Dutch Justice department;
- insurance-related agencies such as UWV;
- banks such as ING;
- hospitals such as AMC and Catharina hospitals;
- multinational corporations such as DSM and Deloitte;
- high-tech system manufacturers, such as Philips Healthcare, ASML, Ricoh, and Thales, and their customers; and
- media companies such as Winkwaves.
This illustrates the broad spectrum of situations to which we can apply process mining. To further illustrate this spectrum, here are two examples from Process Mining: Discovery, Conformance and Enhancement of Business Processes.2
Hospitals are particularly challenging from a process-mining viewpoint: they record lots of data, but care processes tend to be highly variable. We conducted several process-mining experiments based on event data of the AMC hospital in Amsterdam. Figure 2 shows an example of a process model constructed using ProM for the hospital. The model was discovered based on event data of a group of 627 gynecological oncology patients treated in 2005 and 2006. The Spaghetti-like process model illustrates that care processes are rather unstructured. A more refined analysis of the model in Figure 2 can show the “highways” in the diagnosis and treatment. We often find that the so-called “80/20” rule applies; that is, 80 percent of the cases follow the “highways” in the process and account for only 20 percent of the model’s complexity.
Figure 2. A Spaghetti process discovered based on an event log containing 24,331 events referring to 627 gynecological oncology patients and 376 different activities.
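One simple way to look for such “highways” is to count trace variants and keep the most frequent ones until a target share of the cases is covered. The Python sketch below illustrates the idea on made-up traces; it is not the analysis performed on the AMC data, and the function name and the 80 percent default are assumptions for the example.

```python
from collections import Counter


def highway_variants(traces, coverage=0.8):
    """Most frequent trace variants that together cover the given share of cases."""
    counts = Counter(tuple(trace) for trace in traces)
    needed = coverage * len(traces)
    selected, covered = [], 0
    for variant, frequency in counts.most_common():
        if covered >= needed:
            break
        selected.append(variant)
        covered += frequency
    return selected, len(counts)


# Hypothetical log with 10 cases spread over 4 variants.
example_traces = (
    [["a", "b"]] * 6
    + [["a", "c", "b"]] * 2
    + [["a", "d"]]
    + [["a", "e", "b"]]
)

highways, variant_count = highway_variants(example_traces)
```

Here two of the four variants already cover 80 percent of the cases. Discovering a model from only those variants typically gives a much simpler picture than one built from the full log.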
To date, we have applied process mining in about a dozen municipalities. Moreover, last year we started a new project: CoSeLoG (Configurable Services for Local Governments). Ten municipalities are involved in CoSeLoG, and they are all interested in cross-organizational process mining, that is, analyzing differences between similar processes in different municipalities. Processes within municipalities tend to be structured, but it’s interesting to see that municipalities tend to do things differently (the infamous “Couleur Locale”). This could explain the large differences in operational performance (response times, people involved, and so forth).
Figure 3. Performance analysis based on a discovered process model for a Dutch municipality.
Figure 3 shows a model of the “WOZ process” discovered for a Dutch municipality based on an event log containing information about 745 objections against the so-called WOZ (“Waardering Onroerende Zaken”) valuation. Dutch municipalities use the WOZ value, an estimation of the value of houses and apartments, as a basis for determining property taxes. The higher the WOZ value, the more tax the owner pays. Therefore, Dutch municipalities must handle many objections and appeals from citizens who assert that the WOZ values of their properties are too high.
Figure 3 shows how such objections are handled. The rectangles represent activities and the arcs describe causal dependencies. Using the timestamps in the event log, various performance indicators are mapped onto the model. The discovered process model has a good fitness (0.98876214), indicating that the model explains almost all recorded events. The average flow time is approximately 178 days, with a standard deviation of approximately 53 days. Figure 3 also shows further performance-related diagnostics computed while replaying the timestamped event log.
ProM also visualizes the bottlenecks by coloring the places in the WF-net. Tokens tend to reside longest in the purple places. For example, the place between “OZ16 Uitspraak start” and “OZ16 Uitspraak complete” was visited 436 times. The average time spent in this place is 7.84 days. This indicates that activity “OZ16 Uitspraak” (final judgment) takes about a week. The place before “OZ16 Uitspraak start” is also purple; on average, it takes 138 days to start this activity after enabling. As Figure 3 shows, it’s also possible to simply select two activities and measure the time that passes between them. On average, 202.73 days pass between the completion of activity “OZ02 Voorbereiden” (preparation) and the completion of “OZ16 Uitspraak” (final judgment). Note that this is longer than the average overall flow time because only 416 of the objections (approximately 56 percent) follow this route; the other cases follow the branch “OZ15 Zelfuitspraak” which, on average, takes less time.
Figure 3 illustrates how process mining helps organizations “look inside” their processes. This is in stark contrast with contemporary BI tools, which seem to focus on reporting and on fancy-looking dashboards.
The Process Mining Manifesto is available at www.win.tue.nl/ieeetfpm and is being translated into Chinese, German, French, Spanish, Greek, Italian, Korean, Dutch, Portuguese, Turkish, and Japanese. Other good resources on process mining include my recent book, Process Mining: Discovery, Conformance and Enhancement of Business Processes, and www.processmining.org, which provides sample logs, videos, slides, articles, and ProM software.
1. Forrester, “The Forrester Wave: Enterprise Business Intelligence Platforms,” www.forrester.com, Q4 2010.
2. W. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, Springer-Verlag, 2011.
3. W. van der Aalst et al., “Process Mining Manifesto,” Business Process Management Workshops, LNBIP 99, F. Daniel, K. Barkaoui and S. Dustdar, eds., Springer-Verlag, 2011.
Wil van der Aalst is a professor at the Department of Mathematics & Computer Science of the Technische Universiteit Eindhoven, where he chairs the Architecture of Information Systems group. His research and teaching interests include information systems, workflow management, Petri nets, process mining, specification languages, and simulation. Contact him at email@example.com.