Keynote IEEE Symposium Series on Computational Intelligence (SSCI 2011)/IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011), April 2011, Paris, France
[2024]Digital Global Overview Report 2024 Meltwater.pdf
Keynote on Process Mining at SSCI 2010 / CIDM 2011
1. Process Mining:
Discovering and Improving
Spaghetti and Lasagna Processes
prof.dr.ir. Wil van der Aalst
www.processmining.org
2. Architecture of Information Systems @ TU/e
process BPM/WFM/
discovery SOA systems
Process PAIS
Mining Technology
conformance workflow
checking patterns
simulation
Process
Modeling/
Analysis
verification
4. The World's Technological Capacity to Store, Communicate, and Compute
Information by Martin Hilbert and Priscila López (DOI 10.1126/science.1200970)
PAGE 3
5. Process Mining =
(RM,RD)
c11
modify
conditions
(YE,RD)
check_A c5
(RM,RD) c2 check_A c8
(E,SD) needed? (RM,RD) (E,RD)
Smoker
c6(YE,RD)
No start register c1 initial c3 check_B check_B c9 asses c12 decline
conditions needed? risk
Yes
c7(FE,FD)
Drinker c4 check_C check_C c10
needed?
Short
(91/10)
<81.5
Yes
Weight
≥81.5
No
Long
(30/1)
+ (SM,SD) (E,SD) c13
(E,FD)
(E,SD)
make c14 handle c15 handle c16 send
offer response payment insurance
documents (E,SD)
Long Short
(150/20) (321/25) c17
withdraw
timeout1 timeout2
offer
Data Mining Process Analysis PAGE 4
6. Process Mining
• Process discovery: "What is
really happening?"
• Conformance checking: "Do
we do what was agreed
upon?"
• Performance analysis:
"Where are the bottlenecks?"
• Process prediction: "Will this
case be late?"
• Process improvement: "How
to redesign this process?"
• Etc.
PAGE 5
7. We applied ProM in >100 organizations
• Municipalities (e.g., Alkmaar, Heusden, Harderwijk, etc.)
• Government agencies (e.g., Rijkswaterstaat, Centraal
Justitieel Incasso Bureau, Justice department)
• Insurance related agencies (e.g., UWV)
• Banks (e.g., ING Bank)
• Hospitals (e.g., AMC hospital, Catharina hospital)
• Multinationals (e.g., DSM, Deloitte)
• High-tech system manufacturers and their customers
(e.g., Philips Healthcare, ASML, Ricoh, Thales)
• Media companies (e.g. Winkwaves)
• ...
PAGE 6
8. Process Mining
supports/
“world” business
controls
processes software
people machines system
components
organizations records
events, e.g.,
messages,
specifies transactions,
models
configures etc.
analyzes
implements
analyzes
discovery
(process) event
conformance
model logs
enhancement
10. Simplified event log
a = register request,
b = examine thoroughly,
c = examine casually,
d = check ticket,
e = decide,
f = reinitiate request,
g = pay compensation,
and h = reject request
PAGE 9
11. Process
discovery
b
examine
thoroughly
g
c1 c3 pay
c compensation
a examine
e
start register casually decide c5 end
request
h
c2 d c4 reject
check ticket request
f
reinitiate
request
PAGE 10
12. Conformance
checking
b case 7: e is
executed
examine without
thoroughly case 8: g or
being g h is missing
enabled
c1 c3 pay
c compensation
a examine
e
start register casually decide c5 end
request case 10: e h
d is missing
c2 c4 reject
in second
check ticket round request
f
reinitiate
request PAGE 11
13. Extension: Adding perspectives to
model based on event log
The event log can be used to
discover roles in the organization
(e.g., groups of people with similar
work patterns). These roles can be Performance information (e.g., the
used to relate individuals and average time between two
activities. subsequent activities) can be
extracted from the event log and
visualized on top of the model.
Role A: Role E: Role M:
Assistant Expert Manager
Decision rules (e.g., a decision tree
based on data known at the time a
Pete Sue Sara particular choice was made) can be
learned from the event log and used
Mike Sean to annotated decisions.
Ellen E
b
A
examine
thoroughly
A
g
A M
c1 c3 pay
c compensation
a examine
e
A
start register casually
A decide c5 end
request
h
c2 d c4 M reject
check ticket request
f
reinitiate
request PAGE 12
26. >,→,||,# relations
• Direct succession: x>y iff
for some case x is directly
followed by y. abcd
• Causality: x→y iff x>y and acbd
not y>x. aed
• Parallel: x||y iff x>y and a>b
y>x a>c a→b b#e
a>e
• Choice: x#y iff not x>y and a→c e#b
b>c b||c c#e
not y>x. a→e
b>d c||b
c>b b→d a#d
…
c>d c→d
e>d e→d PAGE 25
27. Basic Idea Used by α Algorithm (1)
a b
(a) sequence pattern: a→b
PAGE 26
28. Basic Idea Used by α Algorithm (2)
a
b c
b
a
(c) XOR-join pattern:
b
a→c, b→c, and a#b
a c
c
(b) XOR-split pattern:
(b) XOR-split pattern:a→c, and b#c
a→b,
a→b, a→c, and b#c PAGE 27
29. Basic Idea Used by α Algorithm (3)
a
b c
b
a
(e) AND-join pattern:
b a→c, b→c, and a||b
a
c
c
(d) AND-split pattern:
(d) AND-split pattern:
a→b, a→c, and b||c
a→b, a→c, and b||c PAGE 28
30. Example Revisited
a>b a→b b||c b#e
a>c a→c c||b e#b
a>e a→e c#e
b>c a#d
b→d
b>d …
c>b c→ d
c>d e→d b
e>d
a p1 e p3 d
start end
p2 c p4
Result produced by α algorithm PAGE 29
34. What is the best model?
A D
C
ACD 99 B E
ACE 0
BCE 85
A D
BCD 0
C
B E
PAGE 33
35. What is the best model?
A D
C
ACD 99 B E
ACE 88
BCE 85
A D
BCD 78
C
B E
PAGE 34
36. What is the best model?
A D
C
ACD 99 B E
ACE 2
BCE 85
A D
BCD 3
C
B E
PAGE 35
37. Example: one log four models
b
examine
thoroughly
g
pay
c compensation
a examine e
start register casually decide end
# trace
request
h 455 acdeh
d reject
check ticket request 191 abdeg
f reinitiate
request 177 adceh
N1 : fitness = +, precision = +, generalization = +, simplicity = +
144 abdeh
111 acdeg
a c d e h
82 adceg
start register examine check decide reject end
request casually ticket request
56 adbeh
N2 : fitness = -, precision = +, generalization = -, simplicity = +
47 acdefdbeh
“able to replay event log” “Occam’s razor”
38 adbeg
examine check
thoroughly b d ticket g 33 acdefbdeh
fitness simplicity pay
compensation
a 14 acdefbdeg
start register examine
c end 11 acdefdbeg
request casually
e f reinitiate h
process decide request reject
request
9 adcefcdeh
discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh
5 adcefbdeg
a d c e g
3 acdefbdefdbeg
generalization precision register
request
check
ticket
examine
casually
decide pay
compensation
2 adcefdbeg
a c d e g 2 adcefbdefbdeg
“not overfitting the log” “not underfitting the log” register examine check decide pay
request casually ticket compensation 1 adcefdbefbdeh
a d c e h 1 adbefbdefdbeg
register check examine decide reject
request ticket casually request 1 adcefdbefcdefdbeg
a c d e h 1391
start end
register examine check decide reject
request casually ticket request
… (all 21 variants seen in the log)
a b d e g
register examine check decide pay
request thoroughly ticket compensation
a d b e h
register check examine decide reject
request ticket thoroughly request
a b d e h
register examine check decide reject
request thoroughly ticket request PAGE 36
N4 : fitness = +, precision = +, generalization = -, simplicity = -
38. # trace
455 acdeh
Model N1 191 abdeg
177 adceh
144 abdeh
111 acdeg
82 adceg
56 adbeh
b 47 acdefdbeh
examine
thoroughly 38 adbeg
g 33 acdefbdeh
pay
c compensation 14 acdefbdeg
a examine e
11 acdefdbeg
start register casually decide end
request 9 adcefcdeh
h
d reject 8 adcefdbeh
check ticket request 5 adcefbdeg
f reinitiate 3 acdefbdefdbeg
request
N1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg
2 adcefbdefbdeg
1 adcefdbefbdeh
1 adbefbdefdbeg
1 adcefdbefcdefdbeg
PAGE 37
1391
40. # trace
455 acdeh
Model N3 191 abdeg
177 adceh
144 abdeh
111 acdeg
82 adceg
56 adbeh
47 acdefdbeh
examine check
thoroughly b d ticket g 38 adbeg
pay 33 acdefbdeh
compensation
a 14 acdefbdeg
start register examine end 11 acdefdbeg
request casually c
e f reinitiate
reject
h 9 adcefcdeh
decide request
request 8 adcefdbeh
N3 : fitness = +, precision = -, generalization = +, simplicity = +
5 adcefbdeg
3 acdefbdefdbeg
2 adcefdbeg
2 adcefbdefbdeg
1 adcefdbefbdeh
1 adbefbdefdbeg
1 adcefdbefcdefdbeg
PAGE 39
1391
41. # trace
455 acdeh
Model N4 191 abdeg
177 adceh
144 abdeh
a d c e g 111 acdeg
register check examine decide pay
request ticket casually compensation 82 adceg
a c d e g 56 adbeh
register examine check decide pay
request casually ticket compensation 47 acdefdbeh
a d c e h 38 adbeg
register check examine decide reject
request ticket casually request 33 acdefbdeh
a c d e h 14 acdefbdeg
start end
register examine check decide reject
request casually ticket request 11 acdefdbeg
… (all 21 variants seen in the log)
9 adcefcdeh
8 adcefdbeh
5 adcefbdeg
a b d e g
register examine check decide pay 3 acdefbdefdbeg
request thoroughly ticket compensation
2 adcefdbeg
a d b e h
register check examine decide reject 2 adcefbdefbdeg
request ticket thoroughly request
1 adcefdbefbdeh
a b d e h
register examine check decide reject 1 adbefbdefdbeg
request thoroughly ticket request
1 adcefdbefcdefdbeg
N4 : fitness = +, precision = +, generalization = -, simplicity = -
PAGE 40
1391
42. Why is process mining such a difficult
problem?
• There are no negative examples (i.e., a log shows
what has happened but does not show what could
not happen).
• Due to concurrency, loops, and choices the search
space has a complex structure and the log typically
contains only a fraction of all possible behaviors.
• There is no clear relation between the size of a model
and its behavior (i.e., a smaller model may generate
more or less behavior although classical analysis
and evaluation methods typically assume some
monotonicity property).
PAGE 41
43. How can process mining help?
• Detect bottlenecks • Provide mirror
• Detect deviations • Highlight important
• Performance problems
measurement • Avoid ICT failures
• Suggest improvements • Avoid management by
• Decision support (e.g., PowerPoint
recommendation and • From “politics” to
prediction) “analytics”
PAGE 42
45. Example of a Lasagna process: WMO
process of a Dutch municipality
Each line corresponds to one of the 528 requests that were
handled in the period from 4-1-2009 until 28-2-2010. In total
there are 5498 events represented as dots. The mean time
needed to handled a case is approximately 25 days. PAGE 44
46. WMO process
(Wet Maatschappelijke Ondersteuning)
• WMO refers to the social support act that came into
force in The Netherlands on January 1st, 2007.
• The aim of this act is to assist people with disabilities
and impairments. Under the act, local authorities are
required to give support to those who need it, e.g.,
household help, providing wheelchairs and
scootmobiles, and adaptations to homes.
• There are different processes for the different kinds of
help. We focus on the process for handling requests
for household help.
• In a period of about one year, 528 requests for
household WMO support were received.
• These 528 requests generated 5498 events.
PAGE 45
52. Conformance check WMO process (3/3)
The fitness of the discovered
process is 0.99521667. Of the 528
cases, 496 cases fit perfectly
whereas for 32 cases there are
missing or remaining tokens.
PAGE 51
55. Bottleneck analysis WMO process (3/3)
flow time of
approx. 25 days
with a standard
deviation of
approx. 28
PAGE 54
56. Two additional Lasagna processes
RWS
(“Rijkswaterstaat”)
process
WOZ (“Waardering
Onroerende Zaken”)
process
PAGE 55
57. RWS Process
• The Dutch national public works department, called
“Rijkswaterstaat” (RWS), has twelve provincial offices.
We analyzed the handling of invoices in one of these
offices.
• The office employs about 1,000 civil servants and is
primarily responsible for the construction and
maintenance of the road and water infrastructure in its
province.
• To perform its functions, the RWS office subcontracts
various parties such as road construction companies,
cleaning companies, and environmental bureaus. Also,
it purchases services and products to support its
construction, maintenance, and administrative
activities. PAGE 56
59. Social network constructed based on
handovers of work
Each of the 271 nodes
corresponds to a civil
servant. Two civil
servants are
connected if one
executed an activity
causally following an
activity executed by the
other civil servant
PAGE 58
60. Social network consisting of civil servants that
executed more than 2000 activities in a 9 month period.
The darker arcs
indicate the strongest
relationships in the
social network.
Nodes having the
same color belong to
the same clique.
PAGE 59
61. WOZ process
• Event log containing information about 745 objections
against the so-called WOZ (“Waardering Onroerende
Zaken”) valuation.
• Dutch municipalities need to estimate the value of
houses and apartments. The WOZ value is used as a
basis for determining the real-estate property tax.
• The higher the WOZ value, the more tax the owner needs
to pay. Therefore, there are many objections (i.e.,
appeals) of citizens that assert that the WOZ value is too
high.
• “WOZ process” discovered for another municipality (i.e.,
different from the one for which we analyzed the WMO
process).
PAGE 60
62. Discovered process model
The log contains events related to 745 objections against the
so-called WOZ valuation. These 745 objections generated 9583
events. There are 13 activities. For 12 of these activities both
start and complete events are recorded. Hence, the WF-net has
PAGE 61
25 transitions.
64. Performance analysis
bottleneck detection: places are
colored based on average durations
time required to
move from one
activity to another
information on
total flow time
PAGE 63
67. Example of a Spaghetti process
Spaghetti process describing the diagnosis and treatment of 2765
patients in a Dutch hospital. The process model was constructed
based on an event log containing 114,592 events. There are 619
different activities (taking event types into account) executed by 266
different individuals (doctors, nurses, etc.).
PAGE 66
69. Another example
(event log of Dutch housing agency)
The event log contains 208
cases that generated 5987
events. There are 74
different activities.
PAGE 68
72. Example of a map
Road map of The
Netherlands. The map
abstracts from smaller
cities and less significant
roads; only the bigger
cities, highways, and
other important roads are
shown. Moreover, cities
aggregate local roads
and local districts. Also
not use of color, size, etc.
PAGE 71
73. Illustrating the problem
x
start
y 1.0 z 1.0
1.0 a f j
p3 p9
p1 p12
p7
0.4 0.3
0.4 0.6 0.6 0.4
0.3
0.6 0.4
b c d g h k l
0.4 0.3 0.3 p4 0.4 0.6 p10
p2 p8
p5 p11
1.0 e i 1.0
p6
end
PAGE 72
74. Classical top level view:
low level connections still exist
p3
p9
p4
x y z
p10
p5
p11
x
start
y 1.0 z 1.0
p6
1.0 a f j
p3 p9
p1 p12
p7
0.4 0.3
0.4 0.6 0.6 0.4
0.3
0.6 0.4
b c d g h k l
0.4 0.3 0.3 p4 0.4 0.6 p10
p2 p8
p5 p11
1.0 e i 1.0
p6
end
PAGE 73
75. Seamless zoom
Threshold: 1.0
x y z
a f j
x y z
e i
Threshold: 0.6
x y z
a f j
h k
x y z
e i
Threshold: 0.4
x y z
a f j
b g h k l x y z
e i
Threshold: 0.3
x y z
a f j
b c d g h k l x y z
e i
PAGE 74
77. Fuzzy miner:
two views on the same process
fuzzy model showing fuzzy model
all activities
showing only
two activities
color and
width of arc
indicates
significance
of connection
PAGE 76
78. Balancing between both extremes
fuzzy model showing
all activities
fuzzy model
showing only
two activities
color and
width of arc
indicates
significance
of connection
aggregated node
containing 10 activities
inner structure of
aggregated node
PAGE 77
83. Navigation
• Whereas a TomTom device is continuously showing
the expected arrival time, users of today’s
information systems are often left clueless about
likely outcomes of the cases they are working on.
• Car navigation systems provide directions and
guidance without controlling the driver. The driver is
still in control, but, given a goal (e.g. to get from A to
B as fast as possible), the navigation system
recommends the next action to be taken.
• Operational support provides TomTom functionality
for business processes.
PAGE 82
84. Recommend: How to get home ASAP? Take a left turn!
Detect: You drive too fast!
Predict: When will I be home? At 11.26!
PAGE 83