Diego Calvanese, Marco Montali, Tahir Emre Kalayci, Ario Santoso
KRDB Research Centre for Knowledge and Data
Faculty of Computer Science
Free University of Bozen-Bolzano
montali@inf.unibz.it
OBDA for Log Extraction in Process Mining
Managing Organisations…
(Image: Mobile by Calder)
models: managers/analysts
data: (knowledge) workers
models ? data
Marrying processes and data is extremely difficult…
… but is a must if we want to really understand how complex dynamic systems operate.
Our Approach
Business Process
Management
Data
Management
Conceptual
Modeling
Formal
Methods
Artificial
Intelligence
Our Research
Theory
Practice
Agenda
1. Intro to process mining
2. The problem of data preparation in process mining
3. The onprom framework: OBDA for data preparation in process mining
4. Process mining demo
Process Mining
Process Management Based on Facts
Extensive credits: Wil van der Aalst (TU/e), Chiara Ghidini (FBK)
Disclaimer
• We will simplify to make the issues more apparent
• Criticism has to be seen as a positive force towards
improvement
The two realities
Reality 1: Managers and Analysts
Reality studied, analyzed, planned through using different types of models.
Decision making to improve the overall organization.
The two realities
Reality 2: Daily workers
Reality experienced directly.
Decision making to determine how to best handle the current situation.
Management of the organisation
Daily work within the organisation
Critical Dichotomy
IT
Our Goal
Management of the organisation
Daily work within the organisation
The Traditional
Model-Driven Approach
Model (Def.)
A simplifying mapping of reality to serve a
specific purpose
(Stachowiak: Allgemeine Modelltheorie, 1973)
• The model corresponds to the modelled object in
the sense that it faithfully reproduces some
fundamental aspects of such an object
Conceptual Modeling
The activity of formally describing some aspects of the
physical and social world around us for the purposes of
understanding and communication.
(John Mylopoulos, 1992)
Conceptual Models in Organisations
A model is an abstraction of reality according to a
certain conceptualization. Once represented as a
concrete artifact, a model can support
communication, learning and analysis about
relevant aspects of the underlying domain. [. . . ] a
represented model (a dusty diagram) created by
an unknown predecessor is a medium to preserve
and communicate a certain view of the world, and
can serve as a vehicle for reasoning and problem
solving, and for acquiring new knowledge (maybe
having striking new ideas!) about this view of the
world.
(Guizzardi, 2005)
Models as Human Mediators
Models as IT Mediators
Operational process
Information
System
Model
Easy…right?
…right?
Conceptual Modeling Languag
Clarity: how easy the language can be
stakeholders).
• Graphical
• The langu
foundatio
• The more
difficult is
• Less expr
combinat
• Abstraction: remove unnecessary
Business Process
A set of logically related tasks performed to achieve a defined
business outcome for a particular customer or market.
(Davenport, 1992)
A collection of activities that take one or more kinds of input
and create an output that is of value to the customer.
(Hammer & Champy, 1993)
A set of activities performed in coordination in an
organizational and technical environment. These activities
jointly realize a business goal.
(Weske, 2011)
Business Process
Management
A collection of concepts, methods, and techniques to support humans in modeling, administration, configuration, execution, analysis, and continuous improvement of business processes
Short History
• Smith (~1750): division of labour
• Taylor (~1911): scientific method
applied to organisations
• Hammer and Champy (~1990):
processes as the basis for
reengineering
• 2000s: business process
lifecycle, process-orientation
Value Chains, Business Functions, Tasks
ss Functions and Refinement into Activities
y of business
s follows the
ion abstraction.
iness functions
activities.
From tasks…
AnalyseOrder
SimpleCheck
AdvancedCheck
… to their coordination
OrderManagement
GetOrder CheckOrder
AnalyseOrder SimpleCheck AdvancedCheck
End-To-End, Reactive Behaviour
Order-to-cash, procure-to-pay, issue-to-resolution, …
Receive
order
Check
availability
Article available?
Ship article
Financial
settlement
yes
Procurement
no
Payment
received
Inform
customer
Late delivery
Undeliverable
Customer
informed
Inform
customer
Article
removed
Remove
article from
catalogue
Input Output
Process Modelling Languages
Customer
Travel Agency
Airline
Flight needed
Check travel
agency web site
Check flight offer
Reject offer
Book and pay
flight
Make flight offer
Prepare ticket
offer received
request received
Ticket received
Flight paid
Offer rejected
Booking and
payment received
Offer rejection
received
Flight organised
Offer cancelled
Flight offer
Flight offer
[rejected]
Booking and payment
Ticket
Pool
Start event
Exclusive
gateway
Message
event
End event
Task
Event-based
gateway
Data object
BPMN
Flight offer
requested
Make flight offer
Travel Agency
Flight offer sent
to client
Check flight offer
Customer
XOR
Reject Offer
Book and pay flight
Customer
Customer
Offer rejected Cancel Offer
Travel Agency
Flight ticket
needed
Check travel
agency website
Customer
Offer canceled
Offer accepted
Website Flight offer
Flight offer
[rejected]
Flight offer [paid]
Flight offer
[cancelled]
Prepare ticket
Travel Agency Ticket
Airline issues ticket
Ticket prepared
Send ticket to
customer
Flight organised
Travel Agency
Event
Function
Organization unit
Owner
Supporting
system
Process path
Logical operation
Data
Ensure
comfortable flight
Goal
EPC/ARIS
Process Modelling Languages
UML
Activity Diagrams
Process Modelling Languages
Customer
Travel Agency
Check travel
agency website
Make flight
offer
Flight Offer
Check flight
offer
Reject Offer
Book and
pay flight
Flight Offer
[paid]
Prepare
ticket
Ticket
Flight Offer
[rejected]
Cancel
Offer
Activity partition
Action node
Initial node
Activity
final node
Decision node Merge node
Object node
Unsatisfied
Satisfied
Guard condition
A Process Example
create
paper
author
submit
paper
author
assign
reviewer
chair
review
paper
reviewer
submit
review
reviewer
take
decision
chair
accept?
accept
paper
chair
reject
paper
chair
upload
camera
ready
author
Y
N
Fig. 2: The process for managing papers in a simplified conference submission system;
gray tasks are external to the conference information system and cannot be logged.
System
Processes
Data
Resources
But There is More!
data logic
transactional
lifecycle
resources
decision
logic
case
notion
Case Object
The main subject of the process
• May be a concrete or abstract object
• An order, a claim, a paper, a request, …
• Contemporary process notations: capture well only
processes with a single notion of case
• The case object is 1-1 with the start event (paper submission -> paper, order request -> order)
• But in reality, multiple case objects typically coexist!
• Flow of papers vs flow of reviews, flow of customer
orders vs flow of packages containing order parts, …
Task Instances
• A process model represents abstract tasks
• The concrete execution of a task on a case object
results in a task instance
• The evolution of a task instance goes through multiple
events and transitions (durative tasks)
• This is regulated by a task transactional lifecycle
Resources
Humans/devices responsible for the execution of
task instances
• Usually structured in an organisational model
defining roles, duties, capabilities, security levels,
…
ARIS
Organisational
structure
Data Logic
Management of the master data of the company,
including case data and data produced/consumed by
processes
• Master data are persisted inside information systems
• Processes perform CRUD operations over such data
• Processes acquire data from the external
environment
Structural Models
• Represent the structure of the domain of interest
• Capture the relevant concepts, attributes, and relationships
• Lead to the logical schema of information systems
Conceptual Data Models
UML
Class Diagram
ORM Schema
Decision Models
Encapsulate the decision logic that leads to infer certain
conclusions given input data
• This in turn determines how to route a case object in the
process
• May be implicitly embedded in the process, or represented
explicitly
DMN
Decision table
What are Models Used For?
• Understanding and communication
• Documentation and audits
• Verification and simulation
• Basis for unambiguous contracts between a company
and its customers
• Basis of IT systems supporting the daily work within the
organisation
How to best combine models and support all
these tasks is a very active area of research!
50%
data models
50%
configure/
deploy
enact/
monitor
(re)
design
IT support
reality
(knowledge)
workers
managers/
analysts
Traditional Process Enactment:
From Handmade Models to Execution
Limits of the traditional approach
Problem #1: Lack of Interaction
data models
configure/
deploy
enact/
monitor
IT support
reality
50%
(re)
design
50%
(knowledge)
workers
managers/
analysts
?
How to involve all
actors in the creation of
shared models? How to
share strategic goals?
Problem #1: Lack of Interaction
models
configure/
deploy
enact/
monitor
IT support
reality
50%
(re)
design
50%
(knowledge)
workers
managers/
analysts
?
How to improve such
models using data?
data
Impasse!
• (Knowledge) workers: experience the real
organisation, but locally and subjectively
• Management: have a global view of the expected
organisation, not aligned with reality
• Key, open questions:
• How to reconcile these two worlds?
• How to connect models with reality? How to take
strategic decisions based on such connection?
• How to ensure that the organisation as a whole is
going in the right direction?
Problem #2: Flexibility
BPM!
Problem #2: Flexibility
BPM?
The Issue of Flexibility
A Clinical Guideline
A Real Clinical Process
Fig. 4: Spaghetti model obtained from the application server logs using the heuristics miner [4]. Each node is an exception type (for example EstabelecimentoNotFoundException, GREJBPersistencyException, NullPointerException, SQLException), labelled "(complete)" together with its frequency; the arcs between nodes carry numeric weights and counts.
The Effect
• Processes are only partially encoded into IT
systems
• IT systems need to support a backdoor to
circumvent the encoded processes
• Otherwise, people will act “outside”, using the
system just to “record”
• No hope to improve: knowledge-intensive
processes cannot be automated in the classical
sense!
A Hazardous Attempt:
Go Without Models
Process Model vs Instance
Business Process
• Tasks
• Data schema
Example: car assembly process
• Task: mount doors
• Data schema: Frame#, buyer ID, car color
Process Instance
• Task instances
• Data values
Example: assembly of car 123
• Task instance #54: mount doors on car 123
• Data values: Frame#: 123, buyer: Diego, car color: white
Processes Leave Digital
Footprints
Within organisations: event data related to process
executions are continuously stored for
• Internal management
• Calculation of process metrics/KPIs
• Legal reasons (compliance, external audits)
In addition: internally and externally, more data are stored
somewhere
• We live in a digital society!
• Social networks, sensors, cyberphysical systems, mobile
devices are data loggers
Situation 1: Explicit Event Logs
Organisation equipped with process-aware information
systems
• Supporting humans in the execution of processes (task
assignments, todo lists)
• Explicitly logging events, with info about:
  - timestamp
  - event type (start, end, reassign, …)
  - reference task
  - reference case
  - task instance id
  - responsible resource
  - additional attributes
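The attributes of an explicitly logged event can be captured in a small record type. The following is a minimal sketch in Python; the class name, the field names, and the sample task instance identifier are illustrative assumptions, not something prescribed by the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict


@dataclass
class LoggedEvent:
    """One explicitly logged event (all field names are hypothetical)."""
    timestamp: datetime
    event_type: str        # e.g. "start", "end", "reassign"
    task: str              # reference task
    case_id: str           # reference case
    task_instance_id: str
    resource: str          # responsible resource
    attributes: Dict[str, Any] = field(default_factory=dict)


# Sample values; the task instance id "ti-1" is made up for illustration.
event = LoggedEvent(
    timestamp=datetime(2010, 12, 30, 11, 2),
    event_type="end",
    task="create paper",
    case_id="1",
    task_instance_id="ti-1",
    resource="Pete",
)
print(event)
```

Keeping the open-ended "additional attributes" in a dictionary lets the fixed fields stay typed while leaving room for system-specific data.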
Explicit Event Logs
Example 1. As a running example, we consider a simplified conference submission
system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate
authors, reviewers, and conference chairs in the submission of papers to conferences, the
consequent review process, and the final decision about paper acceptance or rejection.
Figure 2 shows the process control flow considering papers as case objects. Under this
perspective, the management of a single paper evolves through the following execution
steps. First, the paper is created by one of its authors, and submitted to a conference
available in the system. Once the paper is submitted, the review phase for that paper
starts. This phase of the process consists of a so-called multi-instance section, i.e., a
section of the process where the same set of activities is instantiated multiple times on
the same paper, and then executed in parallel. In the case of CONFSYS, this section is
instantiated for each reviewer selected by the conference chair for the paper, and
consists of the following three activities: (i) a reviewer is assigned to the paper; (ii) the
reviewer produces the review; (iii) the reviewer submits the review to CONFSYS. The
multi-instance section is considered completed only when all its parallel instantiations
Event Data
Case ID  ID        Timestamp         Activity       User    . . .
1        35654423  30-12-2010:11.02  create paper   Pete    . . .
1        35654424  31-12-2010:10.06  submit paper   Pete    . . .
1        35654425  05-01-2011:15.12  assign review  Mike    . . .
1        35654426  06-01-2011:11.18  submit review  Sara    . . .
1        35654428  07-01-2011:14.24  accept paper   Mike    . . .
1        35654429  06-01-2011:11.18  upload CR      Pete    . . .
2        35654483  30-12-2010:11.32  create paper   George  . . .
2        35654485  30-12-2010:12.12  submit paper   John    . . .
2        35654487  30-12-2010:14.16  assign review  Mike    . . .
2        35654489  16-01-2011:10.30  submit review  Ellen   . . .
2        35654490  18-01-2011:12.05  reject paper   Mike    . . .
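Turning such rows into traces means grouping events by case and ordering them in time. The sketch below does this in Python for a few rows copied from the table (deliberately shuffled); the function and variable names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# (case id, event id, timestamp, activity, user), rows taken from the table above
ROWS = [
    ("2", 35654487, "30-12-2010:14.16", "assign review", "Mike"),
    ("1", 35654424, "31-12-2010:10.06", "submit paper", "Pete"),
    ("2", 35654483, "30-12-2010:11.32", "create paper", "George"),
    ("2", 35654490, "18-01-2011:12.05", "reject paper", "Mike"),
    ("1", 35654423, "30-12-2010:11.02", "create paper", "Pete"),
    ("2", 35654485, "30-12-2010:12.12", "submit paper", "John"),
    ("2", 35654489, "16-01-2011:10.30", "submit review", "Ellen"),
]


def parse_ts(text):
    """Parse the day-month-year:hour.minute format used in the table."""
    return datetime.strptime(text, "%d-%m-%Y:%H.%M")


def build_traces(rows):
    """Group events by case id and return each case's activities in time order."""
    by_case = defaultdict(list)
    for case_id, _event_id, ts, activity, _user in rows:
        by_case[case_id].append((parse_ts(ts), activity))
    return {case: [activity for _, activity in sorted(events)]
            for case, events in by_case.items()}


traces = build_traces(ROWS)
for case, activities in sorted(traces.items()):
    print(case, activities)
```

Sorting on the parsed timestamp rather than on the raw string matters here, because the day-first format does not sort chronologically as text.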
Situation 2: Implicit Event Logs
Organisation equipped with generic enterprise information
systems
• CRM, ERP systems to handle customers and tasks
• Legacy information systems
• Domain-specific systems
Data are stored with different formats and according to
different domain-specific schemas
• No explicit events
• Data scattered in several data sources
Implicit Event Logs
ACCEPTANCE(ID, uploadtime, user, paper)
CONFERENCE(ID, name, organizer, time)
DECISION(ID, decisiontime, chair, outcome)
LOGIN(ID, user, CT)
SUBMISSION(ID, uploadtime, user, paper)
PAPER(ID, title, CT, user, conf, type, status)
REVIEW(ID, RRid, submissiontime)
REVIEWREQUEST(ID, invitationtime, reviewer, paper)
Fig. 11: DB schema for the information system of the conference submission system.
Primary keys are underlined and foreign keys are shown in italic
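To make "implicit" concrete: with such a schema, events have to be reconstructed from timestamp columns scattered over several tables. The following is a hand-written SQL baseline, sketched in Python with an in-memory SQLite database; it is not the OBDA-based approach of this talk. The sample rows, the activity labels attached to each table, and the assumption that REVIEW.RRid references REVIEWREQUEST.ID are all illustrative guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT, "user" TEXT, paper INTEGER);
CREATE TABLE REVIEWREQUEST (ID INTEGER PRIMARY KEY, invitationtime TEXT, reviewer TEXT, paper INTEGER);
CREATE TABLE REVIEW (ID INTEGER PRIMARY KEY, RRid INTEGER, submissiontime TEXT);
-- Hypothetical sample rows
INSERT INTO SUBMISSION VALUES (1, '2010-12-31T10:06', 'Pete', 1);
INSERT INTO REVIEWREQUEST VALUES (7, '2011-01-05T15:12', 'Sara', 1);
INSERT INTO REVIEW VALUES (3, 7, '2011-01-06T11:18');
""")

# One SELECT per kind of event; the paper plays the role of the case object.
# The schema does not record who issued a review request, so that resource is NULL.
EXTRACT_EVENTS = """
SELECT paper AS case_id, 'submit paper' AS activity, uploadtime AS ts, "user" AS resource
FROM SUBMISSION
UNION ALL
SELECT paper, 'assign reviewer', invitationtime, NULL
FROM REVIEWREQUEST
UNION ALL
SELECT rr.paper, 'submit review', r.submissiontime, rr.reviewer
FROM REVIEW r JOIN REVIEWREQUEST rr ON r.RRid = rr.ID
ORDER BY 1, 3
"""

events = conn.execute(EXTRACT_EVENTS).fetchall()
for row in events:
    print(row)
```

Every new perspective (for example reviews instead of papers as the case object) needs a different query of this kind, which illustrates why manual extraction is laborious.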
Data never sleeps
50%
data
50%
enact/
monitor
IT support++
reality
(knowledge)
workers
managers/
analysts
The New Trend: No Models!
adjust
Some Famous Examples
Do we Need Models at All?
Calm down, and think…
Alcohol and Fat
It’s a relief to know the truth after all those conflicting medical
studies.

The Japanese eat very little fat and suffer fewer heart attacks than
the British or Americans.

The French eat a lot of fat and also suffer fewer heart attacks than
the British or Americans.

The Japanese drink very little red wine and suffer fewer heart
attacks than the British or Americans.

The Italians drink excessive amount of red wine and also suffer
fewer heart attacks than the British or Americans.

The Germans drink a lot of beer and eat lots of sausages and
fats and suffer fewer heart attacks than the British or Americans.
Conclusion: Eat and drink what you like. Speaking English is
apparently what kills you
Spurious Correlations
Result
Crompton (2008): domain experts lose (too
much) time finding data to operate and take
strategic decisions
• Engineers in the oil/gas industry: 30-70% of
working time spent on data crawling and data
quality
Models Enable Decision Making
Humans understand reality through models
• Data alone are meaningless
• Machine learning/deep learning techniques are
unable to expose their models: no human in the
decision making loop!
• Models not only for decision making, but also to
explain and guide
Process Management Based on Facts
50%
data models
50%
configure/
deploy
diagnose/
get reqs.
enact/
monitor
(re)
design
adjust
IT support
reality
(knowledge)
workers
managers/
analysts
Process Mining: Data Science in Action
[See process mining manifesto]
Fig. 1.4: Positioning of the three main types of process mining: discovery, conformance, and enhancement
The Three Pillars of Process Mining
Play-In: event log -> process model
Play-Out: process model -> event log
Replay: event log + process model ->
• extended model showing times, frequencies, etc.
• diagnostics
• predictions
• recommendations
Play in
Play out
Replay
Play In
register travel
request (a)
get detailed
motivation
letter (c)
get support
from local
manager (b)
check budget
by finance (d)
decide (e)
accept
request (g)
reject
request (h)
reinitiate
request (f)
start end
Case  Activity                            Timestamp       Resource
432   register travel request (a)         18-3-2014:9.15  John
432   get support from local manager (b)  18-3-2014:9.25  Mary
432   check budget by finance (d)         19-3-2014:8.55  John
432   decide (e)                          19-3-2014:9.36  Sue
432   accept request (g)                  19-3-2014:9.48  Mary
Play-in: Discovery
Event logs implicitly contain the real
process!
Making it explicit gives:
• knowledge and understanding
• ground for discussion
• possibility to act by:
  - correcting issues
  - comparing with the designed models (“should be” vs “as is”)
  - evolving the models
  - re-engineering the organisation
credits to W.M.P. van der Aalst
Discovery: Crash Course
• The main idea: look at the data from a “process-oriented” perspective
Case Id = the process instance
Event
Start time
End time
From Data Mining…
Credits: Anne Rozinat
…to Process Mining
Credits: Anne Rozinat
Discovery: Idea
Case  Activity
1     A
2     A
1     B
1
1     C
3     A
2     C
3     B
2     B
1     D
2     D
2     E
3     C
3     D
1     E
3     D
3     E
Traces per case (in log order):
Case 1: A, B, C, D, E
Case 2: A, C, B, D, E
Case 3: A, B, C, D, D, E
Discovered model: A, then B and C, then D, then E
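One common first step in discovery is to count which activity directly follows which within each case (a directly-follows graph). The slides do not name a specific algorithm, so the following Python sketch is only an illustration of the idea; the three traces are read off the Case/Activity table above, ignoring the row that has no activity.

```python
from collections import Counter

# Traces per case, in log order (read off the table; an assumption of this sketch)
TRACES = {
    "1": ["A", "B", "C", "D", "E"],
    "2": ["A", "C", "B", "D", "E"],
    "3": ["A", "B", "C", "D", "D", "E"],
}


def directly_follows(traces):
    """Count how often activity x is immediately followed by activity y."""
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


dfg = directly_follows(TRACES)
for (source, target), n in sorted(dfg.items()):
    print(f"{source} -> {target}: {n}")
```

Seeing both B -> C and C -> B in the counts is the kind of evidence a discovery technique can use to conclude that B and C are not strictly ordered.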
Discovery in a Tool: DISCO
Demo Later
Event Log Discovered Process
Play Out In a Nutshell
Play-out: Simulation
Replay in a Nutshell
credits to W.M.P. van der Aalst
Replay: Animation
Replay: Enhancement
Replay: Conformance Checking
Conformance Checking
Goal: understand and
quantify the degree
of alignment between
models and reality
credits to W.M.P. van der Aalst
Conformance Checking: Idea
Model: A, then B and C, then D, then E
Conformance checking: how it works
Case 1: A, B, C, D, E
• The trace is replayed on the model one event at a time: A, then B, then C, then D, then E
• Every event is allowed by the model: the case conforms ✓
Case 2: A, D, C, D, E
• Replay starts with A, but the next event D is not allowed by the model at that point (?)
• The case deviates from the model ✗
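The replay idea can be sketched as walking a trace through a small state machine and reporting the first event the model does not allow. This is a deliberately naive illustration in Python, not the token-based or alignment-based replay used by real tools; the state names and the assumption that B and C may occur in either order are mine.

```python
# Hypothetical encoding of the model: state -> {allowed activity: next state}
MODEL = {
    "start": {"A": "a"},
    "a": {"B": "b", "C": "c"},   # assumption: B and C in either order
    "b": {"C": "bc"},
    "c": {"B": "bc"},
    "bc": {"D": "d"},
    "d": {"E": "end"},
    "end": {},
}


def replay(trace, model, start="start", final="end"):
    """Return (conforms, position of the first deviating event or None)."""
    state = start
    for position, activity in enumerate(trace):
        if activity not in model[state]:
            return False, position
        state = model[state][activity]
    if state != final:
        # The trace stopped before the model reached its final state.
        return False, len(trace)
    return True, None


case1 = replay(["A", "B", "C", "D", "E"], MODEL)
case2 = replay(["A", "D", "C", "D", "E"], MODEL)
print("Case 1:", case1)
print("Case 2:", case2)
```

Stopping at the first deviation keeps the sketch short; quantifying the degree of alignment, as the goal of conformance checking states, requires continuing past deviations and scoring them.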
Process Repair: Beyond
Conformance Checking
Deviations are
incorporated into the
process model
Repair: Idea
Model: A, then B and C, then D, then E
Cases 1, 2, 3, 4, …, k all follow the trace A, D, C, D, E, and each deviates from the model at the same point: D right after A (? ✗)
A common deviation: maybe the model is wrong/outdated!
The deviation is incorporated into the model, after which these cases replay successfully ✓
Practice
Camunda, ERP, Signavio, Document-driven, EPCs, GSM, BPMN, CMMN, Case Management, Legacy Systems, CRM, E-R, Bizagi, Aris, UML, Artifact-Centric, SAP, Object-Centric, Proprietary Systems, Bonita
IEEE XES Standard
[www.xes-standard.org]
IEEE Standard for the representation of event logs
• Based on XML
• Minimal mandatory structure:
  - a log consists of traces, each representing the history of a case
  - a trace consists of a list of atomic events
• Extensions to “decorate” log, trace, event with informative
attributes: timestamps, task names, transactional lifecycle,
resources, additional event data
• Supports “meta-level” declarations useful for log processors
<log xes.version="1.0"
     xes.features="nested-attributes"
     openxes.version="1.0RC7">
  <extension name="Time"
             prefix="time"
             uri="http://www.xes-standard.org/time.xesext"/>
  <classifier name="Event Name" keys="concept:name"/>
  <string key="concept:name" value="XES Event Log"/>
  ...
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="create paper"/>
      <int key="Event ID" value="35654424"/>
      ...
    </event>
    <event>
      ...
      <string key="concept:name" value="submit paper"/>
      ...
    </event>
    ...
  </trace>
  <trace> ... </trace>
  …
</log>
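A log with this shape can be produced programmatically once events are grouped by case. The sketch below uses Python's standard xml.etree.ElementTree; the function name and input layout are assumptions, and it emits only the log/trace/event skeleton with the "concept:name" and "User" keys seen above, without extensions, classifiers, or global declarations, so it is not a complete XES writer.

```python
import xml.etree.ElementTree as ET


def to_xes(traces):
    """Build a minimal XES-like tree from {case id: [(activity, user), ...]}."""
    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", key="concept:name", value=str(case_id))
        for activity, user in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", key="User", value=user)
            ET.SubElement(event, "string", key="concept:name", value=activity)
    return log


xes_log = to_xes({"1": [("create paper", "Pete"), ("submit paper", "Pete")]})
xes_text = ET.tostring(xes_log, encoding="unicode")
print(xes_text)
```

Building the tree with an XML library rather than string concatenation takes care of escaping attribute values such as activity names containing quotes or ampersands.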
Full XES Schema
Attribute are used. In addition, the role names e-has-a, t-has-a, and t-contains-e are
used to capture the binary relations among such concepts. To restrict the usage of those
Fig.: Full XES schema (UML class diagram)
Classes: Log (logFeatures: String, logVersion: String), Trace, Event, Attribute (attKey: String, attType: String), ElementaryAttribute (attValue: String), CompositeAttribute, Extension (extName: String, extPrefix: String, extUri: String), GlobalAttribute, GlobalEventAttribute, GlobalTraceAttribute, Classifier (name: String), EventClassifier, TraceClassifier
Subclass groups marked {disjoint}: ElementaryAttribute / CompositeAttribute, GlobalEventAttribute / GlobalTraceAttribute, EventClassifier / TraceClassifier
Association labels as printed: ca-contains-a, a-contains-a, e-usedBy-a, e-usedBy-l, l-contains-t, t-contains-e, l-contains-e, l-has-a, t-has-a, e-has-a, l-has-gea, l-has-gta, l-contains-ec, l-contains-tc, ec-definedBy-gea, tc-definedBy-gea
Core XES Schema
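[Fig. 13: Core event schema (UML class diagram)]
Classes: Trace, Event, Attribute (attKey: String, attType: String, attValue: String)
Associations: t-contains-e (Trace, Event), t-has-a (Trace, Attribute), e-has-a (Event, Attribute)
We show now how such a simple schema can be suitably encoded in DL-LiteA. To encode the core event schema of Figure 13, the three concept names Trace, Event, and Attribute are used. In addition, the role names e-has-a, t-has-a, and t-contains-e are used to capture the binary relations among such concepts. …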
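The core event schema is small enough to mirror directly as a data structure. The sketch below is one possible rendering in Python dataclasses; the class and field names follow the schema, while the helper `value_of` and the sample trace are assumptions added only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    att_key: str
    att_type: str
    att_value: str


@dataclass
class Event:
    # e-has-a: the attributes attached to this event
    attributes: List[Attribute] = field(default_factory=list)

    def value_of(self, key: str) -> Optional[str]:
        """Return the value of the first attribute with the given key, if any."""
        for attribute in self.attributes:
            if attribute.att_key == key:
                return attribute.att_value
        return None


@dataclass
class Trace:
    # t-has-a: the attributes attached to this trace
    attributes: List[Attribute] = field(default_factory=list)
    # t-contains-e: the events belonging to this trace
    events: List[Event] = field(default_factory=list)


# Hypothetical trace with two events, named as in the XES example.
trace = Trace(
    attributes=[Attribute("concept:name", "string", "1")],
    events=[
        Event([Attribute("concept:name", "string", "create paper")]),
        Event([Attribute("concept:name", "string", "submit paper")]),
    ],
)
activities = [event.value_of("concept:name") for event in trace.events]
print(activities)
```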
Quality of Logs
Level ★★★★★
Characterization: Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies. Events and their attributes point to this ontology.
Examples: Semantically annotated logs of BPM systems.

Level ★★★★
Characterization: Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner.
Examples: Event logs of traditional BPM/workflow systems.

Level ★★★
Characterization: Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa).
Examples: Tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.

Level ★★
Characterization: Events are recorded automatically, i.e., as a by-product of some …
Examples: Event logs of document and …
From Event Logs to XES
• Level 4-5: straightforward syntactic manipulation
• Level 3: much more difficult, due to
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and
events
Traditional Extraction from
Legacy Data
[Figure: traditional methodology. Create data model → Choose perspective → Extract relevant tables → Design views with relevant attributes → Design composite views → Design log view → Export to XES/CSV → Do process mining → Other perspective? (Y/N)]
… EBITmax converted the log view into a CSV file, and analysed it using … process mining toolkit.
• Manual construction of views and ETL procedures to fetch the data
• Done by IT experts, not by knowledge workers (domain experts)
• Crucial issues:
• Correctness: who knows? Process mining is dangerous if applied to the wrong data
• Maintenance, evolution, and change of perspective are hard… but process mining should be highly interactive
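A hand-written log view of the kind the traditional methodology calls for might look as follows. This is a minimal sketch using SQLite from Python: the two tables mimic SUBMISSION and REVIEWREQUEST from the conference database used later in these slides, while the sample rows, the activity labels, and the view name LOG_VIEW are assumptions made for illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT, user TEXT, paper INTEGER);
CREATE TABLE REVIEWREQUEST (ID INTEGER PRIMARY KEY, invitationtime TEXT, reviewer TEXT, paper INTEGER);
INSERT INTO SUBMISSION VALUES (1, '2017-01-10T09:00', 'pete', 7);
INSERT INTO SUBMISSION VALUES (2, '2017-01-12T17:30', 'pete', 7);
INSERT INTO REVIEWREQUEST VALUES (1, '2017-01-15T08:00', 'ann', 7);

-- Hand-written log view: one row per event, with the paper as the case.
CREATE VIEW LOG_VIEW AS
  SELECT paper AS case_id, 'upload' AS activity, uploadtime AS ts, user AS resource
  FROM SUBMISSION
  UNION ALL
  SELECT paper, 'invite reviewer', invitationtime, reviewer
  FROM REVIEWREQUEST;
""")

rows = conn.execute(
    "SELECT case_id, activity, ts, resource FROM LOG_VIEW ORDER BY case_id, ts"
).fetchall()

# Export the view to CSV, the hand-over format to a process mining tool.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["case_id", "activity", "ts", "resource"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Changing the perspective (for example, taking the reviewer as the case) means rewriting the view by hand, which is the maintenance problem the bullets above point at.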
The onprom Approach
[http://onprom.inf.unibz.it]
[Fig. 12: The onprom methodology and its four phases. Decision nodes: high-level IS? (Y/N), Other perspective? (Y/N). Steps: Create conceptual data schema; Create mappings; Bootstrap model + mappings; Enrich model + mappings; Choose perspective; Create event-data annotations; Get XES/CSV; Do process mining.]
… the same time generating (identity) mappings to link the two specifications. The result of bootstrapping can then be manually refined.
Once the first phase is completed, process analysts and the other involved stakeholders do not need anymore to consider the structure of the legacy information system, …
Intelligent data management and conceptual modelling to:
1. Understand the data
2. Access the data using the domain vocabulary
3. Express the perspective for process mining using the domain vocabulary
4. Automatise the extraction of XES event logs
Step 1
Understand the data
An Enterprise System
Information Structure
ACCEPTANCE(ID, uploadtime, user, paper)
CONFERENCE(ID, name, organizer, time)
DECISION(ID, decisiontime, chair, outcome)
LOGIN(ID, user, CT)
SUBMISSION(ID, uploadtime, user, paper)
PAPER(ID, title, CT, user, conf, type, status)
REVIEW(ID, RRid, submissiontime)
REVIEWREQUEST(ID, invitationtime, reviewer, paper)
Fig. 11: DB schema for the information system of the conference submission system. Primary keys are underlined and foreign keys are shown in italic.
Intuitively, mapping assertions involving such atoms are used to map source relations (and the tuples they store), to concepts, roles, and features of the ontology (and the objects and the values that constitute their instances), respectively. Note that for a feature atom, the type of values retrieved from the source database is not specified, and needs to be determined based on the data type of the variable $v_2$ in the source query $(\vec{x})$.
Actual Data
Actual Data: Meaning?
Ontology-Based
Data Access
Conference Example:
Conceptual Data Schema
[Fig. 9: Data model of our CONFSYS running example (UML class diagram)]
Classes: Paper (title: String, type: String), DecidedPaper (decTime: ts, accepted: boolean), Person (pName: String, regTime: ts), Submission (uploadTime: ts), CRUpload, Creation, Assignment (invTime: ts), Review (subTime: ts), Conference (cName: String, crTime: ts)
Associations: notifiedBy, leadsTo, submittedTo, chairs
N.B.: in onprom we use DL-LiteA (supports a controlled form of functionality)
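Attributes:
$\delta(title) \equiv Paper$, $\rho(title) \sqsubseteq string$, $(\mathsf{funct}\ title)$
$\delta(type) \equiv Paper$, $\rho(type) \sqsubseteq string$, $(\mathsf{funct}\ type)$
$\delta(decTime) \equiv DecidedPaper$, $\rho(decTime) \sqsubseteq ts$, $(\mathsf{funct}\ decTime)$
$\delta(accepted) \equiv DecidedPaper$, $\rho(accepted) \sqsubseteq boolean$, $(\mathsf{funct}\ accepted)$
$\delta(pName) \equiv Person$, $\rho(pName) \sqsubseteq string$, $(\mathsf{funct}\ pName)$
$\delta(regTime) \equiv Person$, $\rho(regTime) \sqsubseteq ts$, $(\mathsf{funct}\ regTime)$
$\delta(cName) \equiv Conference$, $\rho(cName) \sqsubseteq string$, $(\mathsf{funct}\ cName)$
$\delta(crTime) \equiv Conference$, $\rho(crTime) \sqsubseteq ts$, $(\mathsf{funct}\ crTime)$
$\delta(uploadTime) \equiv Submission$, $\rho(uploadTime) \sqsubseteq ts$, $(\mathsf{funct}\ uploadTime)$
$\delta(invTime) \equiv Assignment$, $\rho(invTime) \sqsubseteq ts$, $(\mathsf{funct}\ invTime)$
$\delta(subTime) \equiv Review$, $\rho(subTime) \sqsubseteq ts$, $(\mathsf{funct}\ subTime)$
Class hierarchy:
$DecidedPaper \sqsubseteq Paper$
$Creation \sqsubseteq Submission$
$CRUpload \sqsubseteq Submission$
Associations:
$\exists Submission1 \equiv Submission$, $\exists Submission1^{-} \equiv Paper$, $(\mathsf{funct}\ Submission1)$
$\exists Submission2 \equiv Submission$, $\exists Submission2^{-} \sqsubseteq Person$, $(\mathsf{funct}\ Submission2)$
$\exists Assignment1 \equiv Assignment$, $\exists Assignment1^{-} \sqsubseteq Paper$, $(\mathsf{funct}\ Assignment1)$
$\exists Assignment2 \equiv Assignment$, $\exists Assignment2^{-} \sqsubseteq Person$, $(\mathsf{funct}\ Assignment2)$
$\exists leadsTo \sqsubseteq Assignment$, $\exists leadsTo^{-} \equiv Review$, $(\mathsf{funct}\ leadsTo)$, $(\mathsf{funct}\ leadsTo^{-})$
$\exists submittedTo \equiv Paper$, $\exists submittedTo^{-} \sqsubseteq Conference$, $(\mathsf{funct}\ submittedTo)$
$\exists notifiedBy \equiv DecidedPaper$, $\exists notifiedBy^{-} \sqsubseteq Person$, $(\mathsf{funct}\ notifiedBy)$
$\exists chairs \sqsubseteq Person$, $\exists chairs^{-} \equiv Conference$, $(\mathsf{funct}\ chairs^{-})$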
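Every attribute on this slide is encoded with the same three assertions: its domain, its range, and its functionality. The sketch below generates that pattern from a class, attribute, and datatype name. It assumes, as on the slide, that each attribute is mandatory and single-valued; the function names are hypothetical and are not part of onprom.

```python
def encode_attribute(owner: str, attribute: str, datatype: str) -> list:
    """Return the three DL-LiteA assertions for a mandatory, single-valued attribute."""
    return [
        f"δ({attribute}) ≡ {owner}",      # domain: exactly the instances of the owning class
        f"ρ({attribute}) ⊑ {datatype}",   # range: values of the declared datatype
        f"(funct {attribute})",           # at most one value per object
    ]


def encode_subclass(child: str, parent: str) -> str:
    """Return the concept inclusion for a UML generalisation."""
    return f"{child} ⊑ {parent}"


axioms = encode_attribute("Paper", "title", "string")
axioms += encode_attribute("DecidedPaper", "decTime", "ts")
axioms.append(encode_subclass("DecidedPaper", "Paper"))

for axiom in axioms:
    print(axiom)
```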
Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Obviously, since, due to reification, the ontology alphabet may contain additional symbols with respect to those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram corresponds to satisfiability of the corresponding concept or role [29,7].
Example 9. We illustrate the encoding of UML class diagrams in DL-LiteA on the …
Mapping Example
Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. We give some examples of mapping assertions:
– The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid […] through the source query. This mapping expresses that each value in SUBMISSION identified by oid and such that its upload time equals the corresponding […] creation time, is mapped to an object :submission/oid, which becomes an instance of concept Creation in T.
  SELECT DISTINCT SUBMISSION.ID AS oid
  FROM SUBMISSION, PAPER
  WHERE SUBMISSION.PAPER = PAPER.ID
    AND SUBMISSION.UPLOADTIME = PAPER.CT
  :submission/{oid} rdf:type :Creation .
– The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and instantiates also their features title and type with values of type String.
  SELECT ID, title, type
  …
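The effect of the first mapping assertion can be imitated in a few lines: run the source query, then fill the URI template with each answer. The following is a minimal sketch using SQLite from Python, not the way an OBDA system such as ontop evaluates mappings (which keeps the data virtual). The table layouts follow the DB schema of the running example; the sample rows and the function name `apply_mapping` are assumptions.

```python
import sqlite3

# Source part of the mapping assertion, as given in the example.
SOURCE_QUERY = """
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
  AND SUBMISSION.UPLOADTIME = PAPER.CT
"""
# Target part: a triple template with one placeholder, {oid}.
TARGET_TEMPLATE = ":submission/{oid} rdf:type :Creation ."


def apply_mapping(conn, source_query, target_template):
    """Instantiate the target template once per answer of the source query."""
    cursor = conn.execute(source_query)
    columns = [description[0] for description in cursor.description]
    return [target_template.format(**dict(zip(columns, row))) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PAPER (ID INTEGER PRIMARY KEY, title TEXT, CT TEXT, user TEXT,
                    conf INTEGER, type TEXT, status TEXT);
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT, user TEXT, paper INTEGER);
-- Hypothetical data: submission 1 coincides with the paper's creation, submission 2 is a later upload.
INSERT INTO PAPER VALUES (7, 'A sample paper', '2017-01-10T09:00', 'pete', 1, 'full', 'submitted');
INSERT INTO SUBMISSION VALUES (1, '2017-01-10T09:00', 'pete', 7);
INSERT INTO SUBMISSION VALUES (2, '2017-01-12T17:30', 'pete', 7);
""")

triples = apply_mapping(conn, SOURCE_QUERY, TARGET_TEMPLATE)
print(triples)
```

Only the submission whose upload time equals the paper's creation time is turned into an instance of Creation.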
Step 2
Find the event data
Annotating the
Conceptual Data Schema
Fix perspective: declare the case
• Find the class whose instances are considered as case objects
• Express additional filters
Find the events (looking for timestamps)
• Find the classes whose instances refer to events
• Declare how they are connected to corresponding case objects
—> navigation in the UML class diagram
• Declare how they are (in)directly related to event attributes

(timestamp, task name, optionally event type and resource)

—> navigation in the UML class diagram
[Figure: the CONFSYS data model of Fig. 9, annotated with the case and event annotations below]
Case (on class Paper)
Event Submission
Timestamp: uploadTime
Case: Submission1
Event Review
Timestamp: subTime
Case: leadsTo → Assignment1
Event Creation
Timestamp: uploadTime
Case: Submission → Submission1
Event Decision
Timestamp: decTime
Case: Paper
Switching Perspective
Simply amounts to redefining the annotations
• Flow of accepted papers
• Flow of full papers
• Flow of reviews
• Flow of authors
• Flow of reviewers
• ….
Step 3
Get your log, automatically!
Formalizing Annotations
Annotations are nothing else than SPARQL queries over
the conceptual data schema!
• Case annotation: query retrieving case objects
• Event annotation: query retrieving event objects
• Case-attribute annotation: query retrieving pairs
<attribute, case>
• Event-attribute annotation: query retrieving pairs
<attribute, event>
[Figure: annotated data model of our CONFSYS running example (excerpt showing Event Creation, Timestamp: uploadTime, Case: Submission → Submission1)]
…
  ?case rdf:type :Paper .
}
which retrieves all instances of the Paper class.
Event annotations are also tackled using SPARQL SELECT queries with a single answer variable, this time matching with actual event identifiers, i.e., objects denoting occurrences of events.
Example 14. Consider the event annotation for creation, as shown in Figure 16. The actual events for this annotation are retrieved using the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent
WHERE {
  ?creationEvent rdf:type :Creation .
}
which in fact returns all instances of the Creation class.
Attribute annotations are formalised using SPARQL SELECT queries with two answer variables, establishing a relation between events and their corresponding attribute values. In this light, for timestamp and activity attribute annotations, the second answer variable will be substituted by corresponding values for timestamps/activity names. For case attribute annotations, instead, the second answer variable will be substituted by case objects, thus establishing a relationship between events and the case(s) they belong to.
Example 15. Consider again the annotation for creation events, as shown in Figure 16. The relationship between creation events and their corresponding timestamps is established by the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent ?creationTime
WHERE {
  ?creationEvent rdf:type :Creation .
  ?creationEvent :Submission1 ?Paper .
  ?creationEvent :uploadTime ?creationTime .
}
which indeed retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
Annotations and XES
Elements
Annotations can be easily “mapped” onto XES elements:

case annotation query —> traces

event annotation query —> events

attribute annotation query —> trace/event attributes with given key
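The correspondence between annotation queries and XES elements can be sketched as a small assembly step. The code below takes the answers of the annotation queries as plain Python lists and builds the XES tree with the standard library. The function name `build_log`, the sample answers, and the decision to name each trace after its case object are assumptions for illustration; `time:timestamp` (type `date`) and `concept:name` are the keys defined by the standard XES Time and Concept extensions.

```python
import xml.etree.ElementTree as ET


def build_log(cases, events, event_cases, event_attributes):
    """Assemble an XES tree from the answers of the annotation queries.

    cases:            answers of the case annotation query
    events:           answers of the event annotation queries
    event_cases:      (event, case) pairs from the case-attribute annotations
    event_attributes: (event, key, xes_type, value) tuples from the other
                      attribute annotations
    """
    log = ET.Element("log", {"xes.version": "1.0"})
    traces = {}
    for case in cases:  # case annotation query -> traces
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", key="concept:name", value=case)
        traces[case] = trace
    nodes = {event: ET.Element("event") for event in events}  # event query -> events
    for event, key, xes_type, value in event_attributes:  # attribute queries -> attributes
        ET.SubElement(nodes[event], xes_type, key=key, value=value)
    for event, case in event_cases:  # attach each event to the trace of its case
        traces[case].append(nodes[event])
    return log


# Hypothetical query answers for one paper with one creation event.
log = build_log(
    cases=[":paper/7"],
    events=[":submission/1"],
    event_cases=[(":submission/1", ":paper/7")],
    event_attributes=[
        (":submission/1", "concept:name", "string", "Creation"),
        (":submission/1", "time:timestamp", "date", "2017-01-10T09:00:00"),
    ],
)
xes_text = ET.tostring(log, encoding="unicode")
print(xes_text)
```

In onprom this materialisation is not hand-coded but obtained through automatically generated OBDA mappings; the sketch only illustrates which answer ends up in which XES element.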

The event annotation query of Example 14 (SELECT DISTINCT ?creationEvent …) yields XES events:
- id: ?creationEvent
The timestamp annotation query of Example 15 (SELECT DISTINCT ?creationEvent ?creationTime …) yields an XES attribute:
- key: timestamp extension
- type: milliseconds
- value: ?creationTime
- parent event: ?creationEvent
Rewriting Annotations
Annotations are nothing else than SPARQL queries
over the conceptual data schema
They can be automatically reformulated as SQL
queries over the legacy data
We automatically get a standard OBDA mapping
from the legacy data to the XES concepts
In the first step, the SPARQL queries formalising the annotations in $\mathcal{L}$ are reformulated into corresponding SQL queries posed directly over $\mathcal{I}$. This is done by relying on standard query rewriting and unfolding, where each SPARQL query $q \in \mathcal{L}_q$ is rewritten considering the contribution of the conceptual data schema $\mathcal{T}$, and then unfolded using the mappings in $\mathcal{M}$. The resulting query $q_{sql}$ can then be posed directly over $\mathcal{I}$ so as to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all so-obtained SQL queries as $\mathcal{L}_{sql}$.
Example 16. Consider the SPARQL query in Example 13, formalising the event annotation that accounts for the creation of papers. A possible reformulation of the rewriting and unfolding of such a query respectively using the conceptual data schema in Figure 9, and the mappings from Example 10, is the following SQL query:
SELECT DISTINCT
  CONCAT('http://www.example.com/submission/', Submission."ID")
  AS "creationEvent"
FROM Submission, Paper
WHERE Submission."Paper" = Paper."ID" AND
  Submission."UploadTime" = Paper."CT" AND
  Submission."ID" IS NOT NULL
This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also possibly compact and fast to process by a standard DBMS. One such optimisation is the application of …
ng
CRUpload Creation
chairs
Event Creation
Timestamp: uploadTime
Case: Submission!Submission1
Event Creation
Timestamp: uploadTime
Case: Submission!Submission1
1
NFSYS running example
nship between the event and its cor-
ted out before, the timestamp anno-
o applies to the activity annotation,
functional navigation, the activity
that independently fixes the name
additional optional attribute anno-
standard extensions provided XES,
y transactional lifecycle, as well as
urce name and/or role.
occurrences of events.
Example 14. Consider the event annotation for creation, as shown in Figure 16. The
actual events for this annotation are retrieved using the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent
WHERE {
?creationEvent rdf:type :Creation .
}
which in fact returns all instances of the Creation class.
Attribute annotations are formalised using SPARQL SELECT queries with two answer
variables, establishing a relation between events and their corresponding attribute values.
In this light, for timestamp and activity attribute annotations, the second answer
variable will be substituted by corresponding values for timestamps/activity names. For
case attribute annotations, instead, the second answer variable will be substituted by
case objects, thus establishing a relationship between events and the case(s) they belong to.
Example 15. Consider again the annotation for creation events, as shown in Figure 16.
The relationship between creation events and their corresponding timestamps is established
by the following query:
PREFIX : <http://www.example.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?creationEvent ?creationTime
WHERE {
  ?creationEvent rdf:type :Creation .
  ?creationEvent :Submission1 ?Paper .
  ?creationEvent :uploadTime ?creationTime .
}
which indeed retrieves all instances of Creation, together with the corresponding values
taken by the uploadTime attribute.
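The two-variable shape of such an attribute annotation can be illustrated with a minimal sketch in plain Python. It does not use a SPARQL engine: the triple set, the individuals and the timestamps are made up for illustration, and the function timestamp_annotation is a hypothetical stand-in that hand-evaluates the three triple patterns of Example 15.

```python
# Toy RDF graph as a set of (subject, predicate, object) triples.
# All individuals and values below are invented for illustration.
EX = "http://www.example.com/"
RDF_TYPE = "rdf:type"

triples = {
    (EX + "submission/1", RDF_TYPE, EX + "Creation"),
    (EX + "submission/1", EX + "Submission1", EX + "paper/10"),
    (EX + "submission/1", EX + "uploadTime", "2017-03-01T10:00:00"),
    (EX + "submission/2", RDF_TYPE, EX + "Creation"),
    (EX + "submission/2", EX + "Submission1", EX + "paper/11"),
    (EX + "submission/2", EX + "uploadTime", "2017-03-02T09:30:00"),
    # Not typed as Creation, so it must not appear in the answers.
    (EX + "review/7", EX + "uploadTime", "2017-03-05T12:00:00"),
}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def timestamp_annotation():
    """Hand-evaluate the three triple patterns of Example 15.

    Returns sorted (event, timestamp) pairs, mirroring the two answer
    variables ?creationEvent and ?creationTime.
    """
    answers = set()  # a set gives the DISTINCT behaviour
    creations = [s for (s, p, o) in triples
                 if p == RDF_TYPE and o == EX + "Creation"]
    for event in creations:
        # ?creationEvent :Submission1 ?Paper must have at least one match.
        if not objects(event, EX + "Submission1"):
            continue
        for time in objects(event, EX + "uploadTime"):
            answers.add((event, time))
    return sorted(answers)


for event, time in timestamp_annotation():
    print(event, time)
```

Each answer pairs an event with one value of the attribute, which is the relation the annotation is meant to establish.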
XES events:

- id: ?creationEvent
For each SQL query q(c) ∈ L_sql obtained from a case annotation, we insert into M^E_P the following OBDA mapping:
q(c)
:trace/{c} rdf:type :Trace .
Intuitively, such a mapping populates the concept Trace in E with the case objects that are created from the answers returned by query q(c).
For each SQL query q(e) ∈ L_sql that is obtained from an event annotation, we insert into M^E_P the following OBDA mapping:
q(e)
:event/{e} rdf:type :Event .
Intuitively, such a mapping populates the concept Event in E with the event objects that are created from the answers returned by query q(e).
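How such a mapping populates a concept can be sketched as follows. This is only an illustration under stated assumptions: the helper populate and the sample answers are hypothetical, and the sketch materialises triples explicitly, whereas an OBDA system would normally keep them virtual.

```python
def populate(template, class_iri, answers):
    """Instantiate an IRI template once per answer row and type the result.

    template  -- IRI template with a placeholder, e.g. ":trace/{c}"
    class_iri -- the concept to populate, e.g. ":Trace"
    answers   -- rows returned by the source SQL query, as dicts
    """
    return {(template.format(**row), "rdf:type", class_iri) for row in answers}


# Hypothetical answers of q(c) and q(e); a repeated row yields a single object.
case_answers = [{"c": "paper10"}, {"c": "paper11"}, {"c": "paper10"}]
event_answers = [{"e": "submission1"}, {"e": "submission2"}]

trace_triples = populate(":trace/{c}", ":Trace", case_answers)
event_triples = populate(":event/{e}", ":Event", event_answers)

for triple in sorted(trace_triples | event_triples):
    print(*triple, ".")
```

Because the object identifier is built from the answer value, two rows with the same value denote the same trace or event.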
as a XES event log, and also to actually materialise such an event log.
Technically, onprom takes as input an onprom model P = ⟨I, T, M, L⟩ and an event schema E, and produces a new OBDA system ⟨I, M^E_P, E⟩, where the annotations in L are automatically reformulated as OBDA mappings M^E_P that directly link I to E. Such mappings are synthesised using the three-step approach described next.
In the first step, the SPARQL queries formalising the annotations in L are reformulated into corresponding SQL queries posed directly over I. This is done by applying standard query rewriting and unfolding, where each SPARQL query q ∈ L is rewritten considering the contribution of the conceptual data schema T, and then unfolded using the mappings in M. The resulting query q_sql can then be posed directly over I to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all so-obtained SQL queries as L_sql.
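The unfolding half of this first step can be sketched with Python's built-in sqlite3 module. The sketch makes several assumptions: it models only unfolding (a lookup from a class to its SQL definition) and not rewriting with respect to T; the MAPPINGS dictionary, the unfold helper, the table contents and the reading of the CT column as a creation time are all hypothetical; and a two-argument CONCAT is registered explicitly so that the SQL text from Example 16 runs on SQLite.

```python
import sqlite3

# Hypothetical mapping specification: class name -> SQL query returning its instances.
# The SQL text is the query shown in Example 16.
MAPPINGS = {
    "Creation": """
        SELECT DISTINCT
          CONCAT('http://www.example.com/submission/', Submission."ID")
          AS "creationEvent"
        FROM Submission, Paper
        WHERE Submission."Paper" = Paper."ID" AND
              Submission."UploadTime" = Paper."CT" AND
              Submission."ID" IS NOT NULL
    """,
}


def unfold(class_name):
    """Replace a class atom by the SQL query its mapping associates with it."""
    return MAPPINGS[class_name]


conn = sqlite3.connect(":memory:")
# Register a two-argument CONCAT so the query does not depend on the SQLite version.
conn.create_function("CONCAT", 2, lambda a, b: f"{a}{b}")
conn.executescript("""
    CREATE TABLE Paper ("ID" INTEGER, "CT" TEXT);
    CREATE TABLE Submission ("ID" INTEGER, "Paper" INTEGER, "UploadTime" TEXT);
    INSERT INTO Paper VALUES (10, '2017-03-01'), (11, '2017-03-02');
    INSERT INTO Submission VALUES
        (1, 10, '2017-03-01'),  -- upload time equals CT: counted as a creation
        (2, 10, '2017-03-04'),  -- later upload of the same paper: not a creation
        (3, 11, '2017-03-02');  -- upload time equals CT: counted as a creation
""")

creation_events = sorted(row[0] for row in conn.execute(unfold("Creation")))
print(creation_events)
```

The resulting SQL runs directly over the relational data, so no ontology reasoning is needed at query time.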
Recap
[Figure: components of the onprom framework]
• D (database) conforms to R (db schema)
• M (mapping specification)
• T (conceptual data schema)
• L (event-data annotations); edge labels: annotates, points to
• E (conceptual event schema)
• M^E_P (log mapping specification)
• Groupings: I (information system), B (OBDA model), P (onprom model)
Querying the “Virtual Log”
SPARQL queries over the event schema are answered using legacy
data
• Example: get empty and nonempty traces; for nonempty traces, also fetch all
their events
Answers can be serialised into a fully compliant XES log!
The following query is instead meant to retrieve (elementary) attributes, considering
in particular their key, type, and value.
PREFIX : <http://www.example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?att ?attType ?attKey ?attValue
WHERE {
  ?att rdf:type :Attribute;
    :attType ?attType;
    :attKey ?attKey;
    :attVal ?attValue.
}
The following query handles the retrieval of empty and nonempty traces, simultaneously
obtaining, for nonempty traces, their constitutive events:
PREFIX : <http://www.example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?trace ?event
WHERE {
  ?trace a :Trace .
  OPTIONAL {
    ?trace :t-contain-e ?event .
    ?event :e-contain-a ?timestamp .
    ?timestamp :attKey "time:timestamp"^^xsd:string .
    ?event :e-contain-a ?name .
    ?name :attKey "concept:name"^^xsd:string .
  }
}
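To give an idea of how answers to the trace/event query could be turned into XES, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not the onprom implementation (which relies on OpenXES). It assumes answer rows that also carry the activity name and timestamp values, which the query above does not select; the to_xes helper and the sample rows are hypothetical; and the output contains only the bare log/trace/event skeleton, without the header, extension and classifier declarations that a complete XES log would include.

```python
import xml.etree.ElementTree as ET


def to_xes(rows):
    """Build a minimal XES-like tree from (trace, event, name, timestamp) rows.

    A row whose event is None stands for an empty trace, as produced by the
    OPTIONAL block when a trace has no events. Input order is preserved.
    """
    log = ET.Element("log")
    traces = {}
    for trace_id, event_id, name, timestamp in rows:
        trace = traces.get(trace_id)
        if trace is None:
            trace = ET.SubElement(log, "trace")
            ET.SubElement(trace, "string", key="concept:name", value=trace_id)
            traces[trace_id] = trace
        if event_id is None:
            continue  # empty trace: keep the trace element, add no event
        event = ET.SubElement(trace, "event")
        ET.SubElement(event, "string", key="concept:name", value=name)
        ET.SubElement(event, "date", key="time:timestamp", value=timestamp)
    return log


# Hypothetical query answers, invented for illustration.
rows = [
    ("trace/paper10", "event/1", "Creation", "2017-03-01T10:00:00"),
    ("trace/paper10", "event/2", "Review", "2017-03-05T12:00:00"),
    ("trace/paper11", None, None, None),
]
log = to_xes(rows)
print(ET.tostring(log, encoding="unicode"))
```

Grouping by trace identifier is what lets an empty trace survive serialisation instead of being dropped.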
4.6 The onprom Toolchain
onprom comes with a toolchain that supports the various phases of the methodology
The onprom Toolchain
Implementation of all the described steps using
• Java (GUIs, algorithms)
• OWL 2 QL plus functionality (conceptual schemas)
• ontop (OBDA system)
• OpenXES (XES serialisation and manipulation)
• ProM process mining framework (environment)
onprom UML Editor
Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our
onprom Annotation Editor
Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case
onprom Log Extractor
Fig. 20: Screenshot of Log Extractor Plug-in in ProM 6.6.
Experiments
• Very encouraging initial experiments
• Carried out using synthetic data
• We are looking for real case studies!
Data Generation with CPN Tools
Results
[Figure (Postgres): running time (in milliseconds, 0 to 800,000) plotted against the number of extracted components in the XES log (0 to 10,000,000) and against the number of tuples in the whole database (0 to 3,500,000)]
~11 mins to extract ~9M XES components from ~3.5M tuples
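As a rough back-of-the-envelope reading of these figures, the sketch below derives average throughput from the rounded numbers on the slide. The inputs are approximate, so the outputs are only indicative; the variable names are chosen here for illustration.

```python
# Rounded figures taken from the slide; all results are approximate.
minutes = 11
components = 9_000_000   # extracted XES components
tuples = 3_500_000       # tuples in the whole database

seconds = minutes * 60
components_per_second = components / seconds
tuples_per_second = tuples / seconds
components_per_tuple = components / tuples

print(f"{components_per_second:.0f} components/s")
print(f"{tuples_per_second:.0f} tuples/s")
print(f"{components_per_tuple:.2f} components per tuple")
```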
So What?
Demo with Disco
[fluxicon.com]
Other tools: ProM

[http://www.promtools.org]
• The most famous
academic initiative in
process mining
• Cutting-edge process
mining algorithms are
there
• Pluggable architecture
• Dozens of plug-ins
Other Tools: Celonis
[http://www.celonis.com]
Native Process Mining on top of SAP
Conclusions
• Process Mining as a way to reconcile model-driven
management and the real behaviours
• Data preparation is an issue in presence of legacy
data
• Ontology-Based Data Access: solid theoretical
basis with optimised implementations
• onprom as an effective tool chain for extracting
event logs from legacy databases
Future Work
• Conceptual Modeling
• How to improve the discovery of events?
• How to semi-automatically propose events to the user?
• How to integrate methodologies and results from formal
ontology?
• Engineering
• How to handle different types of data?
• How to deal with different event schemas that go beyond
XES?
• How to generalise the approach to handle rich ontology-to-ontology mappings?
OBDA for Log Extraction in Process Mining

  • 1. Diego Calvanese, Marco Montali, Tahir Emre Kalayci, Ario Santoso KRDB Research Centre for Knowledge and Data Faculty of Computer Science Free University of Bozen-Bolzano montali@inf.unibz.it Process Mining OBDA for Log Extraction in
  • 5. < Mobile by Calder Managing Organisations… models data ?
  • 6. Marrying processes and data
 is extremely difficult…. … but is a must 
 if we want to really understand 
 how complex dynamic systems operate. 6
  • 10. <Agenda 1. Intro to process mining 2. The problem of data preparation in process mining 3. The onprom framework: OBDA for data preparation in process mining 4. Process mining demo
  • 11. Process Mining Process Management Based on Facts Extensive credits: Wil van der Aalst (TU/e), Chiara Ghidini (FBK)
  • 12. Disclaimer • We will simplify to make the issues more apparent • Criticism has to be seen as a positive force towards improvement
  • 13. The two realities Reality 1: Managers and Analysts Reality studied, analyzed, planned through using different types of models. Decision making to improve the overall organization.
  • 14. The two realities Reality 2: Daily workers Reality experienced directly. Decision making to determine how to best handle the current situation.
  • 15. Management 
 of the organisation Daily work within the organisation Critical Dichotomy
  • 16. IT Our Goal Management 
 of the organisation Daily work within the organisation
  • 18. Model (Def.) A simplifying mapping of reality to serve a specific purpose (Stachowiak: Allgemeine Modelltheorie, 1973) • The model corresponds to the modelled object in the sense that it faithfully reproduces some fundamental aspects of such an object
  • 19. Conceptual Modeling The activity of formally describing some aspects of the physical and social world around us for the purposes of understanding and communication. (John Mylopoulos, 1992)
  • 20. Conceptual Models in Organisations A model is an abstraction of reality according to a certain conceptualization. Once represented as a concrete artifact, a model can support communication, learning and analysis about relevant aspects of the underlying domain. [. . . ] a represented model (a dusty diagram) created by an unknown predecessor is a medium to preserve and communicate a certain view of the world, and can serve as a vehicle for reasoning and problem solving, and for acquiring new knowledge (maybe having striking new ideas!) about this view of the world. (Guizzardi, 2005)
  • 21. Models as Human Mediators
  • 22. Models as IT Mediators Operational process Information System Model
  • 24. …right?Conceptual Modeling Languag Clarity: how easy the language can be stakeholders). • Graphical • The langu foundatio • The more di cult is • Less expr combinat • Abstraction: remove unnecessary
  • 25. Business Process A set of logically related tasks performed to achieve a defined business outcome for a particular customer or market. (Davenport, 1992) A collection of activities that take one or more kinds of input and create an output that is of value to the customer. (Hammer & Champy, 1993) A set of activities performed in coordination in an organizational and technical environment. These activities jointly realize a business goal. (Weske, 2011) 25
  • 26. Business Process Management A collection of 
 concepts, methods, and techniques 
 to support humans in
 modeling, administration, 
 configuration, execution, 
 analysis, and continuous improvement 
 of business processes 26
  • 27. Short History • Smith (~1750): division of labour • Taylor (~1911): scientific method applied to organisations • Hammer and Champy (~1990): processes as the basis for reengineering • 2000s: business process lifecycle, process-orientation 27
  • 28. Value Chains, Business Functions, Tasks ss Functions and Refinement into Activities y of business s follows the ion abstraction. iness functions activities.
  • 29. From tasks… AnalyseOrder SimpleCheck AdvancedCheck … to their coordination OrderManagement GetOrder CheckOrder AnalyseOrder SimpleCheck AdvancedCheck
  • 30. End-To-End, Reactive Behaviour Order-to-cash, procure-to-pay, issue-to-resolution, … 30 Receive order Check availability Article available? Ship article Financial settlement yes Procurement no Payment received Inform customer Late deliveryUndeliverable Customer informed Inform customer Article removed Remove article from catalogue Input Output
  • 31. Process Modelling Languages: BPMN [Figure: flight booking process with pools Customer, Travel Agency, Airline] Flight needed Check travel agency web site Check flight offer Reject offer Book and pay flight Make flight offer Prepare ticket offer received request received Ticket received Flight paid Offer rejected Booking and payment received Offer rejection received Flight organised Offer cancelled Flight offer Flight offer [rejected] Booking and payment Ticket Legend: Pool, Start event, Exclusive gateway, Message event, End event, Task, Event-based gateway, Data object
  • 32. Process Modelling Languages: EPC/ARIS [Figure: flight booking process] Flight offer requested Make flight offer Travel Agency Flight offer sent to client Check flight offer Customer XOR Reject Offer Book and pay flight Customer Customer Offer rejected Cancel Offer Travel Agency Flight ticket needed Check travel agency website Customer Offer canceled Offer accepted Website Flight offer Flight offer [rejected] Flight offer [paid] Flight offer [cancelled] Prepare ticket Travel Agency Ticket Airline issues ticket Ticket prepared Send ticket to customer Flight organised Travel Agency Goal: Ensure comfortable flight Legend: Event, Function, Organization unit, Owner, Supporting system, Process path, Logical operation, Data, Goal
  • 33. Process Modelling Languages: UML Activity Diagrams [Figure: flight booking process with activity partitions Customer, Travel Agency] Check travel agency website Make flight offer Flight Offer Check flight offer Reject Offer Book and pay flight Flight Offer [paid] Prepare ticket Ticket Flight Offer [rejected] Cancel Offer Unsatisfied Satisfied Legend: Activity partition, Action node, Initial node, Activity final node, Decision node, Merge node, Object node, Guard condition
  • 34. A Process Example create paper author submit paper author assign reviewer chair review paper reviewer submit review reviewer take decision chair accept? accept paper chair reject paper chair upload camera ready author Y N Fig. 2: The process for managing papers in a simplified conference submission system; gray tasks are external to the conference information system and cannot be logged.
  • 36. But There is More! create paper author submit paper author assign reviewer chair review paper reviewer submit review reviewer take decision chair accept? accept paper chair reject paper chair upload camera ready author Y N Fig. 2: The process for managing papers in a simplified conference submission system; gray tasks are external to the conference information system and cannot be logged. data logic transactional lifecycle resources decision logic case notion
  • 37. Case Object The main subject of the process • May be a concrete or abstract object • An order, a claim, a paper, a request, … • Contemporary process notations: capture well only processes with a single notion of case • The case object is 1-1 with the start event 
 (paper submission -> paper, order request -> order) • But in reality, multiple case objects typically coexist! • Flow of papers vs flow of reviews, flow of customer orders vs flow of packages containing order parts, …
  • 38. Task Instances • A process model represents abstract tasks • The concrete execution of a task on a case object results in a task instance • The evolution of a task instance goes through multiple events and transitions (durative tasks) • This is regulated by a task transactional lifecycle
  • 39. Resources Humans/devices responsible for the execution of task instances • Usually structured in an organisational model defining roles, duties, capabilities, security levels, … ARIS Organisational structure
  • 40. Data Logic Management of the master data of the company, including case data and data produced/consumed by processes • Master data are persisted inside information systems • Processes perform CRUD operations over such data • Processes acquire data from the external environment
  • 41. Structural Models • Represent the structure of the domain of interest • Capture the relevant concepts, attributes, and relationships • Lead to the logical schema of information systems 41 Conceptual Data Models UML Class Diagram ORM Schema
  • 42. Decision Models Encapsulate the decision logic that leads to infer certain conclusions given input data • This in turn determines how to route a case object in the process • May be implicitly embedded in the process, or represented explicitly 42 DMN Decision table
  • 43. What are Models Used For? • Understanding and communication • Documentation and audits • Verification and simulation • Basis for unambiguous contracts between a company and its customers • Basis of IT systems supporting the daily work within the organisation How to best combine models and support all these tasks is a very active area of research!
  • 45. Limits of the traditional approach
  • 46. Problem #1: Lack of Interaction data models configure/ deploy enact/ monitor IT support reality 50% (re) design 50% (knowledge) workers managers/ analysts ? How to involve all actors in the creation of shared models? How to share strategic goals?
  • 47. Problem #1: Lack of Interaction models configure/ deploy enact/ monitor IT support reality 50% (re) design 50% (knowledge) workers managers/ analysts ? How to improve such models using data? data
  • 48. Impasse! • (Knowledge) workers: experience the real organisation, but locally and subjectively • Management: have a global view of the expected organisation, not aligned with reality • Key, open questions: • How to reconcile these two worlds? • How to connect models with reality? How to take strategic decisions based on such connection? • How to ensure that the organisation as a whole is going in the right direction?
  • 52. The Issue of Flexibility
  • 54. A Real Clinical Process [Figure 4: spaghetti model obtained from the application server logs using the heuristics miner [4], as the result of a first attempt to analyze those logs. Nodes are exception types (for example EstabelecimentoNotFoundException, GREJBPersistencyException, NullPointerException), each annotated with its frequency; arcs carry dependency measures and counts.]
  • 56. The Effect • Processes are only partially encoded into IT systems • IT systems need to support a backdoor to circumvent the encoded processes • Otherwise, people will act “outside”, using the system just to “record” • No hope to improve: knowledge-intensive processes cannot be automated in the classical sense!
  • 57. A Hazardous Attempt: Go Without Models
  • 58. Process Model vs Instance Business Process • Tasks • Data schema Process Instance • Task instances • Data values • Car assembly process • Task: mount doors • Frame#, 
 buyer ID, 
 car color • Assembly of car 123 • Task instance #54: mount doors on car 123 • Frame#: 123, 
 buyer: Diego, 
 car color: white
  • 59. Processes Leave Digital Footprints Within organisations: event data related to process executions are continuously stored for • Internal management • Calculation of process metrics/KPIs • Legal reasons (compliance, external audits) In addition: internally and externally, more data are stored somewhere • We live in a digital society! • Social networks, sensors, cyberphysical systems, mobile devices are data loggers
  • 60. Situation 1: Explicit Event Logs Organisation equipped with process-aware information systems • Supporting humans in the execution of processes (task assignments, todo lists) • Explicitly logging events, with info about: 
 - timestamp
 - event type (start, end, reassign, …)
 - reference task
 - reference case
 - task instance id
 - responsible resource
 - additional attributes
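 A minimal sketch of how one such explicitly logged event could be represented, using the attributes listed above. The class name `Event`, the field names, and the sample values are illustrative assumptions, not the format of any particular process-aware information system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Optional


@dataclass(frozen=True)
class Event:
    """One logged event; field names are hypothetical."""
    timestamp: datetime
    event_type: str          # e.g. "start", "end", "reassign"
    task: str                # reference task
    case_id: str             # reference case
    task_instance_id: str
    resource: str            # responsible resource
    attributes: dict[str, Any] = field(default_factory=dict)


def task_instance_duration(events: list[Event], task_instance_id: str) -> Optional[timedelta]:
    """Time between the "start" and "end" events of one task instance."""
    start = end = None
    for e in events:
        if e.task_instance_id != task_instance_id:
            continue
        if e.event_type == "start":
            start = e.timestamp
        elif e.event_type == "end":
            end = e.timestamp
    if start is None or end is None:
        return None
    return end - start


# Hypothetical sample: a durative "review paper" task instance on paper 1.
events = [
    Event(datetime(2011, 1, 5, 15, 12), "start", "review paper", "1", "ti-54", "Sara"),
    Event(datetime(2011, 1, 6, 11, 18), "end", "review paper", "1", "ti-54", "Sara"),
]
duration = task_instance_duration(events, "ti-54")
```

 Keeping the event type separate from the task name is what lets the transactional lifecycle of a durative task be reconstructed from atomic events.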
  • 61. Explicit Event Logs create paper author submit paper author assign reviewer chair review paper reviewer submit review reviewer take decision chair accept? accept paper chair reject paper chair upload camera ready author Y N Fig. 2: The process for managing papers in a simplified conference submission system; gray tasks are external to the conference information system and cannot be logged.
 Example 1. As a running example, we consider a simplified conference submission system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors, reviewers, and conference chairs in the submission of papers to conferences, the consequent review process, and the final decision about paper acceptance or rejection. Figure 2 shows the process control flow considering papers as case objects. Under this perspective, the management of a single paper evolves through the following execution steps. First, the paper is created by one of its authors, and submitted to a conference available in the system. Once the paper is submitted, the review phase for that paper starts. This phase of the process consists of a so-called multi-instance section, i.e., a section of the process where the same set of activities is instantiated multiple times on the same paper, and then executed in parallel. In the case of CONFSYS, this section is instantiated for each reviewer selected by the conference chair for the paper, and consists of the following three activities: (i) a reviewer is assigned to the paper; (ii) the reviewer produces the review; (iii) the reviewer submits the review to CONFSYS. The multi-instance section is considered completed only when all its parallel instantiations …
 Event Data
 Case ID | ID | Timestamp | Activity | User | …
 1 | 35654423 | 30-12-2010:11.02 | create paper | Pete | …
 1 | 35654424 | 31-12-2010:10.06 | submit paper | Pete | …
 1 | 35654425 | 05-01-2011:15.12 | assign review | Mike | …
 1 | 35654426 | 06-01-2011:11.18 | submit review | Sara | …
 1 | 35654428 | 07-01-2011:14.24 | accept paper | Mike | …
 1 | 35654429 | 06-01-2011:11.18 | upload CR | Pete | …
 2 | 35654483 | 30-12-2010:11.32 | create paper | George | …
 2 | 35654485 | 30-12-2010:12.12 | submit paper | John | …
 2 | 35654487 | 30-12-2010:14.16 | assign review | Mike | …
 2 | 35654489 | 16-01-2011:10.30 | submit review | Ellen | …
 2 | 35654490 | 18-01-2011:12.05 | reject paper | Mike | …
  • 62. Situation 2: Implicit Event Logs Organisation equipped with generic enterprise information systems • CRM, ERP systems to handle customers and tasks • Legacy information systems • Domain-specific systems Data are stored with different formats and according to different domain-specific schemas • No explicit events • Data scattered in several data sources
  • 64. Implicit Event Logs Fig. 11: DB schema for the information system of the conference submission system. Primary keys are underlined and foreign keys are shown in italic. Relations: ACCEPTANCE(ID, uploadtime, user, paper); CONFERENCE(ID, name, organizer, time); DECISION(ID, decisiontime, chair, outcome); LOGIN(ID, user, CT); SUBMISSION(ID, uploadtime, user, paper); PAPER(ID, title, CT, user, conf, type, status); REVIEW(ID, RRid, submissiontime); REVIEWREQUEST(ID, invitationtime, reviewer, paper)
  • 69. Do we Need Models at All?
  • 70. Calm down, and think…
  • 71. Alcohol and Fat It’s a relief to know the truth after all those conflicting medical studies.
 The Japanese eat very little fat and suffer fewer heart attacks than the British or Americans.
 The French eat a lot of fat and also suffer fewer heart attacks than the British or Americans.
 The Japanese drink very little red wine and suffer fewer heart attacks than the British or Americans.
 The Italians drink excessive amounts of red wine and also suffer fewer heart attacks than the British or Americans.
 The Germans drink a lot of beer and eat lots of sausages and fats and suffer fewer heart attacks than the British or Americans. Conclusion: Eat and drink what you like. Speaking English is apparently what kills you.
  • 74. Result Crompton (2008): domain experts lose (too much) time in finding data to operate and take strategic decisions • Engineers in the oil/gas industry: 30-70% of working time spent on data crawling and data quality
  • 75. Models Enable Decision Making Humans understand reality through models • Data alone are meaningless • Machine learning/deep learning techniques are unable to expose their models: no human in the decision making loop! • Models not only for decision making, but also to explain and guide
  • 79. Process Mining: Data Science in Action [See process mining manifesto] Fig. 1.4: Positioning of the three main types of process mining: discovery, conformance, and enhancement
  • 80. The Three Pillars of Process Mining • Play-In: event log → process model • Play-Out: process model → event log • Replay: event log + process model → extended model showing times, frequencies, etc.; diagnostics; predictions; recommendations
  • 81. Play In register travel request (a) get detailed motivation letter (c) get support from local manager (b) check budget by finance (d) decide (e) accept request (g) reject request (h) reinitiate request (f) start end Case Activity Timestamp Resource 432 register travel request (a) 18-3-2014:9.15 John 432 get support from local manager (b) 18-3-2014:9.25 Mary 432 check budget by finance (d) 19-3-2014:8.55 John 432 decide (e) 19-3-2014:9.36 Sue 432 accept request (g) 19-3-2014:9.48 Mary
  • 82. Play-in: Discovery Event logs implicitly contain the real process! Making it explicit gives: • knowledge and understanding • ground for discussion • possibility to act by: • correcting issues • comparing with the designed models (“should be” vs “as is”) • evolving the models • re-engineering the organisation credits to W.M.P. van der Aalst
  • 83. Discovery: Crash Course • The main idea: look at the data from a “process-oriented” perspective Case ID = the process instance Event Start time End time
  • 86. Discovery: Idea Case Activity 1 A 2 A 1 B 1
1 C 3 A 2 C 3 B 2 B 1 D 2 D 2 E 3 C 3 D 1 E 3 D 3 E
  • 87. Discovery: Idea Case Activity 1 A 2 A 1 B 1
1 C 3 A 2 C 3 B 2 B 1 D 2 D 2 E 3 C 3 D 1 E 3 D 3 E A B C D E Case 1
  • 88. Discovery: Idea Case Activity 1 A 2 A 1 B 1
1 C 3 A 2 C 3 B 2 B 1 D 2 D 2 E 3 C 3 D 1 E 3 D 3 E A B C D E Case 1 A C B D E Case 2
  • 89. Discovery: Idea Case Activity 1 A 2 A 1 B 1
1 C 3 A 2 C 3 B 2 B 1 D 2 D 2 E 3 C 3 D 1 E 3 D 3 E A B C D E Case 1 A C B D E Case 2 A C D D E Case 3 B
  • 90. Discovery: Idea Case Activity 1 A 2 A 1 B 1
1 C 3 A 2 C 3 B 2 B 1 D 2 D 2 E 3 C 3 D 1 E 3 D 3 E A B C D E Case 1 A C B D E Case 2 A C D D E Case 3 B A B C D E
  • 91. Discovery: Idea Case Activity 1 A 2 A 1 B 1
1 C 3 A 2 C 3 B 2 B 1 D 2 D 2 E 3 C 3 D 1 E 3 D 3 E A B C D E Case 1 A C B D E Case 2 A C D D E Case 3 B A B C D E
  • 92. Discovery: Idea Case Activity 1 A 2 A 1 B 1
1 C 3 A 2 C 3 B 2 B 1 D 2 D 2 E 3 C 3 D 1 E 3 D 3 E A B C D E Case 1 A C B D E Case 2 A C D D E Case 3 B A B C D E
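 The discovery idea on these slides can be sketched as follows: group the events by case, keep them in log order to obtain one trace per case, then count which activity directly follows which. The event rows below are one illustrative reading of the Case/Activity table (an assumption, since the table is flattened in this transcript), and the directly-follows count is a common first step of discovery rather than the specific algorithm used by any tool.

```python
from collections import Counter, defaultdict

# (case, activity) pairs in the order in which they were logged.
# Illustrative reading of the slide's table; treat the rows as an assumption.
log = [
    (1, "A"), (2, "A"), (1, "B"), (1, "C"), (3, "A"), (2, "C"),
    (3, "B"), (2, "B"), (1, "D"), (2, "D"), (2, "E"), (3, "C"),
    (3, "D"), (1, "E"), (3, "D"), (3, "E"),
]

# Step 1: one trace per case, preserving the logged order.
traces = defaultdict(list)
for case, activity in log:
    traces[case].append(activity)

# Step 2: count the directly-follows pairs over all traces.
directly_follows = Counter()
for trace in traces.values():
    for a, b in zip(trace, trace[1:]):
        directly_follows[(a, b)] += 1

for (a, b), n in sorted(directly_follows.items()):
    print(f"{a} -> {b}: {n}")
```

 Overlaying the traces in this way is what yields a single process picture: frequent pairs become the main flow, rare pairs become alternative paths.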
  • 93. Discovery in a Tool: DISCO Demo Later Event Log Discovered Process
  • 94. Play Out In a Nutshell register travel request (a) get detailed motivation letter (c) get support from local manager (b) check budget by finance (d) decide (e) accept request (g) reject request (h) reinitiate request (f) start end Case Activity Timestamp Resource 432 register travel request (a) 18-3-2014:9.15 John 432 get support from local manager (b) 18-3-2014:9.25 Mary 432 check budget by finance (d) 19-3-2014:8.55 John 432 decide (e) 19-3-2014:9.36 Sue 432 accept request (g) 19-3-2014:9.48 Mary
  • 96. Replay in a Nutshell credits to W.M.P. van der Aalst register travel request (a) get detailed motivation letter (c) get support from local manager (b) check budget by finance (d) decide (e) accept request (g) reject request (h) reinitiate request (f) start end Case Activity Timestamp Resource 432 register travel request (a) 18-3-2014:9.15 John 432 get support from local manager (b) 18-3-2014:9.25 Mary 432 check budget by finance (d) 19-3-2014:8.55 John 432 decide (e) 19-3-2014:9.36 Sue 432 accept request (g) 19-3-2014:9.48 Mary
  • 102. Conformance Checking Goal: understand and quantify the degree of alignment between models and reality credits to W.M.P. van der Aalst
  • 104. Conformance Checking: How It Works A B C D E Case 1 A B C D E A
  • 105. Conformance Checking: How It Works A B C D E Case 1 A B C D E A B
  • 106. Conformance Checking: How It Works A B C D E Case 1 A B C D E A B C
  • 107. Conformance Checking: How It Works A B C D E Case 1 A B C D E A B C D
  • 108. Conformance Checking: How It Works A B C D E Case 1 A B C D E A B C D E 3
  • 109. Conformance Checking: How It Works A D C D E Case 2 A B C D E
  • 110. Conformance Checking: How It Works A D C D E Case 2 A B C D E A D ? 7
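 The replay shown on these slides can be approximated with a very small sketch. It assumes the model is the plain sequence A B C D E (a simplification: real conformance checking replays traces on models with choices and parallelism), and the function name `first_deviation` is hypothetical.

```python
from typing import Optional


def first_deviation(model: list[str], trace: list[str]) -> Optional[int]:
    """Replay a trace on a purely sequential model.

    Returns the index of the first event that the model does not allow,
    or None if the whole trace can be replayed.
    """
    for i, (expected, observed) in enumerate(zip(model, trace)):
        if expected != observed:
            return i
    if len(model) != len(trace):
        # One of the two ended early: deviation right after the common prefix.
        return min(len(model), len(trace))
    return None


model = ["A", "B", "C", "D", "E"]
case_1 = ["A", "B", "C", "D", "E"]   # replays completely
case_2 = ["A", "D", "C", "D", "E"]   # D observed where the model expects B

print(first_deviation(model, case_1))
print(first_deviation(model, case_2))
```

 Case 1 replays step by step without problems, while Case 2 gets stuck at its second event, which corresponds to the "?" on the last slide of the sequence.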
  • 111. Process Repair: Beyond Conformance Checking Deviations are incorporated into the process model
  • 112. Repair: Idea A D C D E Case 3 A B C D E AA D C D E Case 1 A D C D E Case 2 A D C D E Case 4 A D C D E Case k ……. D ? 7 C D E
  • 113. A D C D E Case 3 A B C D E AA D C D E Case 1 A D C D E Case 2 A D C D E Case 4 A D C D E Case k ……. D ? 7 C D E A common deviation: maybe the model is wrong/ outdated! Repair: Idea
  • 114. A D C D E Case 3 A B C D E AA D C D E Case 1 A D C D E Case 2 A D C D E Case 4 A D C D E Case k ……. C D E D D 3 Repair: Idea
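 A rough sketch of the reasoning behind repair: if the same deviation shows up in most cases, the model rather than the behaviour is the likely culprit. The sequential model, the four sample traces, and the 0.8 threshold are illustrative assumptions.

```python
from collections import Counter

model = ["A", "B", "C", "D", "E"]

# Hypothetical log in which every case performs D where the model expects B.
traces = {k: ["A", "D", "C", "D", "E"] for k in (1, 2, 3, 4)}


def deviations(model, trace):
    """Positions where the observed activity differs from the expected one."""
    return [
        (i, expected, observed)
        for i, (expected, observed) in enumerate(zip(model, trace))
        if expected != observed
    ]


counts = Counter(d for trace in traces.values() for d in deviations(model, trace))
share = {d: n / len(traces) for d, n in counts.items()}

THRESHOLD = 0.8  # assumed cut-off for "common" deviations
repair_candidates = [d for d, s in share.items() if s >= THRESHOLD]
print(repair_candidates)
```

 A candidate such as (1, "B", "D") suggests updating the model at that position, which is the step the last repair slide depicts.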
  • 117. IEEE XES Standard [www.xes-standard.org] IEEE Standard for the representation of event logs • Based on XML • Minimal mandatory structure: 
 log consists of traces, each representing the history of a case
 trace consists of a list of atomic events • Extensions to “decorate” log, trace, event with informative attributes: timestamps, task names, transactional lifecycle, resources, additional event data • Supports “meta-level” declarations useful for log processors 117
  • 118. <log xes.version="1.0" xes.features="nested-attributes" openxes.version="1.0RC7">
   <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
   <classifier name="Event Name" keys="concept:name"/>
   <string key="concept:name" value="XES Event Log"/>
   ...
   <trace>
    <string key="concept:name" value="1"/>
    <event>
     <string key="User" value="Pete"/>
     <string key="concept:name" value="create paper"/>
     <int key="Event ID" value="35654424"/>
     ...
    </event>
    <event>
     ...
     <string key="concept:name" value="submit paper"/>
     ...
    </event>
    ...
   </trace>
   <trace> ... </trace>
   …
  </log>
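 As a small sketch of how a log processor might read such a file, the snippet below parses a complete but tiny XES-like document with Python's standard `xml.etree.ElementTree` and returns the activity names per trace. The sample document, the helper names `attr` and `read_traces`, and the absence of an XML namespace are assumptions made for illustration; real XES files may declare a namespace and carry many more attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, self-contained sample modelled on the slide's fragment.
XES_SAMPLE = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="create paper"/>
    </event>
    <event>
      <string key="concept:name" value="submit paper"/>
    </event>
  </trace>
</log>"""


def attr(element, key):
    """Value of the direct child attribute with the given key, or None."""
    for child in element:
        if child.get("key") == key:
            return child.get("value")
    return None


def read_traces(xml_text):
    """Map each trace name to the ordered list of its event names."""
    root = ET.fromstring(xml_text)
    result = {}
    for trace in root.iter("trace"):
        case = attr(trace, "concept:name")
        result[case] = [attr(e, "concept:name") for e in trace.findall("event")]
    return result


parsed = read_traces(XES_SAMPLE)
print(parsed)
```

 Both the trace identifier and the activity name are read through the same `concept:name` key, which is why the level (trace or event) at which an attribute is attached matters.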
  • 119. Full XES Schema 119 Attribute are used. In addition, the role names e-has-a, t-has-a, and t-contains-e are used to capture the binary relations among such concepts. To restrict the usage of those attKey: String attType: String Attribute extName: String extPrefix: String extUri: String Extension attValue: String ElementaryAttribute CompositeAttribute {disjoint} ca-contains-a * * logFeatures: String logVersion: String Log Trace Event GlobalAttribute GlobalEventAttribute GlobalTraceAttribute EventClassifierTraceClassifier name: String Classifier a-contains-a * * ** e-usedBy-a e-usedBy-l * * l-contains-t t-contains-e* **1..* l-contains-e * * * * * *** l-has-a t-has-a e-has-a l-has-gea * 1..* l-contains-ec 1..* * 1..* l-contains-tc * ec-definedBy-gea 1..* * 1..* 1..* * tc-definedBy-gea l-has-gta * {disjoint} {disjoint}
  • 120. Core XES Schema Fig. 13: Core event schema. Classes: Trace, Event, Attribute (attKey: String, attType: String, attValue: String). Associations: t-contains-e (Trace, Event), t-has-a (Trace, Attribute), e-has-a (Event, Attribute). We show now how such a simple schema can be suitably encoded in DL-Lite_A. …code the core event schema of Figure 13, the three concept names Trace, Event, a…
  • 121. Quality of Logs (Level: Characterization. Examples.)
 ★★★★★ Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies. Events and their attributes point to this ontology. Examples: semantically annotated logs of BPM systems.
 ★★★★ Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner. Examples: event logs of traditional BPM/workflow systems.
 ★★★ Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa). Examples: tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.
 ★★ Events are recorded automatically, i.e., as a by-product of some … Examples: event logs of document and …
  • 122. From Event Logs to XES • Level 4-5: straightforward syntactic manipulation • Level 3: much more difficult, due to • Multiple data sources • Interpretation of data • Lack of explicit information about cases and events 122
  • 123. Traditional Extraction from Legacy Data Traditional methodology: Create data model → Choose perspective → Extract relevant tables → Design views with relevant attributes → Design composite views → Design log view → Export to XES/CSV → Do process mining → Other perspective? (Y/N) Excerpt: “…, EBITmax converted the log view into a CSV file, and analysed it usin… process mining toolkit.” • Manual construction of views and ETL procedures to fetch the data • Done by IT experts, not by knowledge workers (domain experts) • Crucial issues: • Correctness: who knows?
 Process mining is dangerous if applied to wrong data • Maintenance, evolution, and change of perspective are hard…
 But process mining should be highly interactive
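 To make the manual "design log view" step concrete, here is a hedged sketch using Python's built-in `sqlite3`. It recreates two relations named after the CONFSYS schema shown earlier in the deck (SUBMISSION and REVIEWREQUEST) and hand-writes a view that turns their rows into events for the paper perspective. Column types, sample rows, activity labels, and the view name `LOG_VIEW` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT, user TEXT, paper INTEGER);
CREATE TABLE REVIEWREQUEST (ID INTEGER PRIMARY KEY, invitationtime TEXT, reviewer TEXT, paper INTEGER);

-- Hypothetical sample rows
INSERT INTO SUBMISSION VALUES (1, '2010-12-31T10:06', 'Pete', 1);
INSERT INTO REVIEWREQUEST VALUES (1, '2011-01-05T15:12', 'Sara', 1);

-- Hand-written log view: the paper is the case, each source row becomes one event.
CREATE VIEW LOG_VIEW AS
  SELECT paper AS case_id, uploadtime AS ts, 'submit paper' AS activity, user AS resource
  FROM SUBMISSION
  UNION ALL
  SELECT paper, invitationtime, 'assign reviewer', reviewer
  FROM REVIEWREQUEST;
""")

rows = conn.execute(
    "SELECT case_id, ts, activity, resource FROM LOG_VIEW ORDER BY case_id, ts"
).fetchall()
for row in rows:
    print(row)
```

 Switching perspective, for instance from papers to reviews as the case object, means rewriting such views by hand, which is the maintenance problem the slide points to.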
  • 124. The onprom Approach [http://onprom.inf.unibz.it] Fig. 12: The onprom methodology and its four phases. Steps: high-level IS? (Y/N), Create conceptual data schema, Create mappings, Bootstrap model + mappings, Enrich model + mappings, Choose perspective, Create event-data annotations, Get XES/CSV, Do process mining, Other perspective? (Y/N). Excerpt: “… the same time generating (identity) mappings to link the two specifications. The result of bootstrapping can then be manually refined. Once the first phase is completed, process analysts and the other involved stakeholders do not need anymore to consider the structure of the legacy information system, …” Intelligent data management and conceptual modelling to: 1. Understand the data 2. Access the data using the domain vocabulary 3. Express the perspective for process mining using the domain vocabulary 4. Automate the extraction of XES event logs
  • 127. Information Structure Fig. 11: DB schema for the information system of the conference submission system. Primary keys are underlined and foreign keys are shown in italic. Relations: ACCEPTANCE(ID, uploadtime, user, paper); CONFERENCE(ID, name, organizer, time); DECISION(ID, decisiontime, chair, outcome); LOGIN(ID, user, CT); SUBMISSION(ID, uploadtime, user, paper); PAPER(ID, title, CT, user, conf, type, status); REVIEW(ID, RRid, submissiontime); REVIEWREQUEST(ID, invitationtime, reviewer, paper). Intuitively, mapping assertions involving such atoms are used to map source relations (and the tuples they store), to concepts, roles, and features of the ontology (and the objects and the values that constitute their instances), respectively. Note that for a feature atom, the type of values retrieved from the source database is not specified, and needs to be determined based on the data type of the variable v2 in the source query (with answer variables x⃗).
  • 128. Actual Data [Fig. 11: DB schema for the information system of the conference submission system]
  • 129. Actual Data: Meaning? [Fig. 11: DB schema for the information system of the conference submission system]
  • 131. Conference Example: Conceptual Data Schema
    Fig. 9: Data model of our CONFSYS running example (UML class diagram).
    Classes and attributes:
      Paper (title: String, type: String)
      DecidedPaper (decTime: ts, accepted: boolean), subclass of Paper
      Person (pName: String, regTime: ts)
      Conference (cName: String, crTime: ts)
      Submission (uploadTime: ts)
      Creation, subclass of Submission
      CRUpload, subclass of Submission
      Assignment (invTime: ts)
      Review (subTime: ts)
    Associations: notifiedBy, leadsTo, submittedTo, chairs (multiplicities are shown in the figure).
    N.B.: in onprom we use DL-LiteA (supports a controlled form of functionality).
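  • 132. DL-LiteA axioms for the data model of Fig. 9:
    Attributes:
      $\delta(\mathit{title}) \equiv \mathit{Paper}$, $\rho(\mathit{title}) \sqsubseteq \mathit{string}$, $(\mathsf{funct}\ \mathit{title})$
      $\delta(\mathit{type}) \equiv \mathit{Paper}$, $\rho(\mathit{type}) \sqsubseteq \mathit{string}$, $(\mathsf{funct}\ \mathit{type})$
      $\delta(\mathit{decTime}) \equiv \mathit{DecidedPaper}$, $\rho(\mathit{decTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{decTime})$
      $\delta(\mathit{accepted}) \equiv \mathit{DecidedPaper}$, $\rho(\mathit{accepted}) \sqsubseteq \mathit{boolean}$, $(\mathsf{funct}\ \mathit{accepted})$
      $\delta(\mathit{pName}) \equiv \mathit{Person}$, $\rho(\mathit{pName}) \sqsubseteq \mathit{string}$, $(\mathsf{funct}\ \mathit{pName})$
      $\delta(\mathit{regTime}) \equiv \mathit{Person}$, $\rho(\mathit{regTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{regTime})$
      $\delta(\mathit{cName}) \equiv \mathit{Conference}$, $\rho(\mathit{cName}) \sqsubseteq \mathit{string}$, $(\mathsf{funct}\ \mathit{cName})$
      $\delta(\mathit{crTime}) \equiv \mathit{Conference}$, $\rho(\mathit{crTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{crTime})$
      $\delta(\mathit{uploadTime}) \equiv \mathit{Submission}$, $\rho(\mathit{uploadTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{uploadTime})$
      $\delta(\mathit{invTime}) \equiv \mathit{Assignment}$, $\rho(\mathit{invTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{invTime})$
      $\delta(\mathit{subTime}) \equiv \mathit{Review}$, $\rho(\mathit{subTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{subTime})$
    Class hierarchy:
      $\mathit{DecidedPaper} \sqsubseteq \mathit{Paper}$, $\mathit{Creation} \sqsubseteq \mathit{Submission}$, $\mathit{CRUpload} \sqsubseteq \mathit{Submission}$
    Associations:
      $\exists \mathit{Submission1} \equiv \mathit{Submission}$, $\exists \mathit{Submission1}^- \equiv \mathit{Paper}$, $(\mathsf{funct}\ \mathit{Submission1})$
      $\exists \mathit{Submission2} \equiv \mathit{Submission}$, $\exists \mathit{Submission2}^- \sqsubseteq \mathit{Person}$, $(\mathsf{funct}\ \mathit{Submission2})$
      $\exists \mathit{Assignment1} \equiv \mathit{Assignment}$, $\exists \mathit{Assignment1}^- \sqsubseteq \mathit{Paper}$, $(\mathsf{funct}\ \mathit{Assignment1})$
      $\exists \mathit{Assignment2} \equiv \mathit{Assignment}$, $\exists \mathit{Assignment2}^- \sqsubseteq \mathit{Person}$, $(\mathsf{funct}\ \mathit{Assignment2})$
      $\exists \mathit{leadsTo} \sqsubseteq \mathit{Assignment}$, $\exists \mathit{leadsTo}^- \equiv \mathit{Review}$, $(\mathsf{funct}\ \mathit{leadsTo})$, $(\mathsf{funct}\ \mathit{leadsTo}^-)$
      $\exists \mathit{submittedTo} \equiv \mathit{Paper}$, $\exists \mathit{submittedTo}^- \sqsubseteq \mathit{Conference}$, $(\mathsf{funct}\ \mathit{submittedTo})$
      $\exists \mathit{notifiedBy} \equiv \mathit{DecidedPaper}$, $\exists \mathit{notifiedBy}^- \sqsubseteq \mathit{Person}$, $(\mathsf{funct}\ \mathit{notifiedBy})$
      $\exists \mathit{chairs} \sqsubseteq \mathit{Person}$, $\exists \mathit{chairs}^- \equiv \mathit{Conference}$, $(\mathsf{funct}\ \mathit{chairs}^-)$
    Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Obviously, since, due to reification, the ontology alphabet may contain additional symbols with respect to those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram corresponds to satisfiability of the corresponding concept or role [29,7].
    Example 9. We illustrate the encoding of UML class diagrams in DL-LiteA on the …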
  • 133. Mapping Example
    Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure 11. We give some examples of mapping assertions:
    – The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid retrieved through the source query. This mapping expresses that each value in SUBMISSION identified by oid and such that its upload time equals the corresponding creation time, is mapped to an object :submission/oid, which becomes an instance of concept Creation in T.
      SELECT DISTINCT SUBMISSION.ID AS oid
      FROM SUBMISSION, PAPER
      WHERE SUBMISSION.PAPER = PAPER.ID
        AND SUBMISSION.UPLOADTIME = PAPER.CT
      ⇝ :submission/{oid} rdf:type :Creation .
    – The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and instantiates also their features title and type with values of type String.
      SELECT ID, title, type …
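To make the first mapping assertion concrete, here is a minimal Python sketch that runs its source query over a toy in-memory SQLite database and instantiates the target template for each answer. The rows, the `apply_mapping` helper and the use of SQLite are assumptions made for illustration only; an OBDA system such as ontop keeps these triples virtual rather than materialising them like this.

```python
import sqlite3

# Toy instance of two of the eight tables, restricted to the columns the
# source query touches. Table and column names follow the slide; the rows
# are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PAPER (ID INTEGER PRIMARY KEY, TITLE TEXT, CT TEXT);
    CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, UPLOADTIME TEXT, PAPER INTEGER);
    INSERT INTO PAPER VALUES (1, 'Paper A', '2017-01-10T09:00');
    INSERT INTO SUBMISSION VALUES (10, '2017-01-10T09:00', 1);  -- upload at creation time
    INSERT INTO SUBMISSION VALUES (11, '2017-02-01T12:00', 1);  -- later upload
""")

# Source part of the mapping assertion, as shown on the slide.
SOURCE_QUERY = """
    SELECT DISTINCT SUBMISSION.ID AS oid
    FROM SUBMISSION, PAPER
    WHERE SUBMISSION.PAPER = PAPER.ID
      AND SUBMISSION.UPLOADTIME = PAPER.CT
"""
# Target part: a triple pattern with one URI-template placeholder.
TARGET_TEMPLATE = ":submission/{oid} rdf:type :Creation ."


def apply_mapping(connection, source_query, target_template):
    """Run the source query and fill the target template once per answer row."""
    cursor = connection.execute(source_query)
    columns = [description[0] for description in cursor.description]
    return [target_template.format(**dict(zip(columns, row))) for row in cursor]


triples = apply_mapping(conn, SOURCE_QUERY, TARGET_TEMPLATE)
print(triples)
```

Only submission 10 qualifies, because its upload time coincides with the creation time of its paper; submission 11 is a later upload and produces no Creation instance.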
  • 134. Step 2 Find the event data
  • 135. Annotating the Conceptual Data Schema
    Fix perspective: declare the case
    • Find the class whose instances are considered as case objects
    • Express additional filters
    Find the events (looking for timestamps)
    • Find the classes whose instances refer to events
    • Declare how they are connected to corresponding case objects → navigation in the UML class diagram
    • Declare how they are (in)directly related to event attributes (timestamp, task name, optionally event type and resource) → navigation in the UML class diagram
  • 136. Annotated data model of the CONFSYS running example (annotations placed on the data model of Fig. 9):
    Case: Paper
    Event Submission. Timestamp: uploadTime. Case: Submission1
    Event Review. Timestamp: subTime. Case: leadsTo⁻ → Assignment1
    Event Creation. Timestamp: uploadTime. Case: Submission → Submission1
    Event Decision. Timestamp: decTime. Case: Paper
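The annotations on this slide can be read as plain data: one case class plus, for each event, a class, a timestamp attribute and a navigation path to the case. The sketch below encodes them with Python dataclasses. The class names `CaseAnnotation` and `EventAnnotation`, the field names, the `leadsTo^-` notation for an inverse role, and the choice of `DecidedPaper` as the class carrying the Decision event are all assumptions of this sketch, not onprom's internal representation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CaseAnnotation:
    case_class: str  # class whose instances are the case objects


@dataclass(frozen=True)
class EventAnnotation:
    activity: str               # task name recorded in the log
    event_class: str            # class whose instances are event occurrences
    timestamp: str              # attribute holding the event timestamp
    case_path: Tuple[str, ...]  # roles to navigate from the event to its case


CONFSYS_CASE = CaseAnnotation(case_class="Paper")
CONFSYS_EVENTS = [
    EventAnnotation("Submission", "Submission", "uploadTime", ("Submission1",)),
    # "leadsTo^-" stands for the inverse of leadsTo (assumed reading of the slide).
    EventAnnotation("Review", "Review", "subTime", ("leadsTo^-", "Assignment1")),
    # Creation is a subclass of Submission, so it reuses the Submission1 role.
    EventAnnotation("Creation", "Creation", "uploadTime", ("Submission1",)),
    # Assumption: the event lives on DecidedPaper, which is itself a Paper.
    EventAnnotation("Decision", "DecidedPaper", "decTime", ()),
]


def describe(case: CaseAnnotation, events: List[EventAnnotation]) -> List[str]:
    """Render the annotations as one readable line each."""
    lines = [f"case: {case.case_class}"]
    for event in events:
        path = " -> ".join(event.case_path) or "(event object is the case object)"
        lines.append(
            f"event {event.activity}: {event.event_class}.{event.timestamp}, case via {path}"
        )
    return lines


for line in describe(CONFSYS_CASE, CONFSYS_EVENTS):
    print(line)
```

Switching perspective then amounts to building a different `CaseAnnotation` and adjusting each `case_path`, leaving the underlying data untouched.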
  • 141. Switching Perspective
    Simply amounts to redefining the annotations:
    • Flow of accepted papers
    • Flow of full papers
    • Flow of reviews
    • Flow of authors
    • Flow of reviewers
    • …
  • 142. Step 3 Get your log, automatically!
  • 143. Formalizing Annotations
    Annotations are nothing but SPARQL queries over the conceptual data schema!
    • Case annotation: query retrieving case objects
    • Event annotation: query retrieving event objects
    • Case-attribute annotation: query retrieving pairs <attribute, case>
    • Event-attribute annotation: query retrieving pairs <attribute, event>
  • 144. Annotated data model of our CONFSYS running example (excerpt): Event Creation. Timestamp: uploadTime. Case: Submission → Submission1
    Case annotation query (its beginning is cut off on the slide):
      … ?case rdf:type :Paper . }
    which retrieves all instances of the Paper class.
    Event annotations are also tackled using SPARQL SELECT queries with a single answer variable, this time matching with actual event identifiers, i.e., objects denoting occurrences of events.
    Example 14. Consider the event annotation for creation, as shown in Figure 16. The actual events for this annotation are retrieved using the following query:
      PREFIX : <http://www.example.com/>
      SELECT DISTINCT ?creationEvent
      WHERE { ?creationEvent rdf:type :Creation . }
    which in fact returns all instances of the Creation class.
    Attribute annotations are formalised using SPARQL SELECT queries with two answer variables, establishing a relation between events and their corresponding attribute values. In this light, for timestamp and activity attribute annotations, the second answer variable will be substituted by corresponding values for timestamps/activity names. For case attribute annotations, instead, the second answer variable will be substituted by case objects, thus establishing a relationship between events and the case(s) they belong to.
    Example 15. Consider again the annotation for creation events, as shown in Figure 16. The relationship between creation events and their corresponding timestamps is established by the following query:
      PREFIX : <http://www.example.com/>
      SELECT DISTINCT ?creationEvent ?creationTime
      WHERE {
        ?creationEvent rdf:type :Creation .
        ?creationEvent :Submission1 ?Paper .
        ?creationEvent :uploadTime ?creationTime .
      }
    which indeed retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
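The two query shapes in Examples 14 and 15 are regular enough to be produced from the annotation parameters. The following Python sketch builds both as strings; the function names `event_query` and `attribute_query` and their parameters are hypothetical, and the sketch covers only the single-role navigation used in Example 15, not arbitrary navigation paths.

```python
PREFIX = "PREFIX : <http://www.example.com/>"


def event_query(event_class: str, event_var: str) -> str:
    """Single-answer-variable query returning all instances of the event class."""
    return (
        f"{PREFIX} SELECT DISTINCT ?{event_var} "
        f"WHERE {{ ?{event_var} rdf:type :{event_class} . }}"
    )


def attribute_query(event_class: str, case_role: str, attribute: str,
                    event_var: str, value_var: str, case_var: str = "Paper") -> str:
    """Two-answer-variable query pairing each event with one attribute value."""
    patterns = [
        f"?{event_var} rdf:type :{event_class} .",
        # Keeps only events that are connected to a case object.
        f"?{event_var} :{case_role} ?{case_var} .",
        f"?{event_var} :{attribute} ?{value_var} .",
    ]
    body = " ".join(patterns)
    return f"{PREFIX} SELECT DISTINCT ?{event_var} ?{value_var} WHERE {{ {body} }}"


creation_events = event_query("Creation", "creationEvent")
creation_times = attribute_query(
    "Creation", "Submission1", "uploadTime", "creationEvent", "creationTime"
)
print(creation_events)
print(creation_times)
```

Generating the queries from parameters keeps the annotation editor's output and the SPARQL text in sync, which matters once perspectives are switched and annotations redefined.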
  • 145. Annotations and XES Elements Annotations can be easily “mapped” onto XES elements:
 case annotation query —> traces
 event annotation query —> events
 attribute annotation query —> trace/event attributes with given key
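    Conceptual event schema (XES), as a UML class diagram:
      Classes: Trace, Event, Attribute (attKey: String, attType: String, attValue: String)
      Associations: t-contains-e (Trace, Event), e-has-a (Event, Attribute), t-has-a (Trace, Attribute)
      (multiplicities are shown in the figure)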
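As a rough illustration of where traces, events and attributes end up, the sketch below serialises a small dictionary of cases into XES-style XML using Python's standard library. The input layout, the `to_xes` helper and the sample identifiers are assumptions of this sketch; it omits extensions, globals and classifiers, and it is not how onprom does it (the toolchain relies on OpenXES).

```python
import xml.etree.ElementTree as ET


def to_xes(traces):
    """Serialise {case_id: [(activity, iso_timestamp), ...]} as a minimal XES document."""
    log = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")
        # Trace attribute carrying the case identifier.
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        # ISO 8601 strings in one format and time zone sort chronologically.
        for activity, timestamp in sorted(events, key=lambda pair: pair[1]):
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": timestamp})
    return ET.tostring(log, encoding="unicode")


example = {
    ":paper/1": [
        ("Review", "2017-02-20T10:00:00"),
        ("Creation", "2017-01-10T09:00:00"),
    ],
    ":paper/2": [],  # a case without events still yields an (empty) trace
}
xes_text = to_xes(example)
print(xes_text)
```

Each case becomes a `trace` element, each event an `event` element, and each attribute a typed child element identified by its key, mirroring the three-way correspondence listed on the slide.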
  • 146. Event annotation query for creation (Example 14):
      PREFIX : <http://www.example.com/>
      SELECT DISTINCT ?creationEvent
      WHERE { ?creationEvent rdf:type :Creation . }
    XES events:
      - id: ?creationEvent
    Timestamp annotation query for creation (Example 15):
      PREFIX : <http://www.example.com/>
      SELECT DISTINCT ?creationEvent ?creationTime
      WHERE {
        ?creationEvent rdf:type :Creation .
        ?creationEvent :Submission1 ?Paper .
        ?creationEvent :uploadTime ?creationTime .
      }
    XES attribute:
      - key: timestamp extension
      - type: milliseconds
      - value: ?creationTime
      - parent event: ?creationEvent
  • 147. Rewriting Annotations
    Annotations are nothing but SPARQL queries over the conceptual data schema.
    They can be automatically reformulated as SQL queries over the legacy data.
    We automatically get a standard OBDA mapping from the legacy data to the XES concepts.
  • 148. Technically, onprom takes as input an onprom model P = ⟨I, T, M, L⟩ and an event schema E, and produces a new OBDA system ⟨I, $M^E_P$, E⟩, where the annotations in L are automatically reformulated as OBDA mappings $M^E_P$ that directly link I to E. Such mappings are synthesised using the three-step approach described next.
    In the first step, the SPARQL queries formalising the annotations in L are reformulated into corresponding SQL queries posed directly over I. This is done by relying on standard query rewriting and unfolding, where each SPARQL query $q \in L_q$ is rewritten considering the contribution of the conceptual data schema T, and then unfolded using the mappings in M. The resulting query $q_{sql}$ can then be posed directly over I so as to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all so-obtained SQL queries as $L_{sql}$.
    Example 16. Consider the SPARQL query in Example 13, formalising the event annotation that accounts for the creation of papers. A possible reformulation obtained by rewriting and unfolding this query, using respectively the conceptual data schema in Figure 9 and the mappings from Example 10, is the following SQL query:
      SELECT DISTINCT CONCAT('http://www.example.com/submission/', Submission."ID") AS "creationEvent"
      FROM Submission, Paper
      WHERE Submission."Paper" = Paper."ID"
        AND Submission."UploadTime" = Paper."CT"
        AND Submission."ID" IS NOT NULL
    This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also possibly compact and fast to process by a standard DBMS. One such optimisation is the application of …
    For each SQL query $q(c) \in L_{sql}$ obtained from a case annotation, we insert into $M^E_P$ the following OBDA mapping:
      q(c) ⇝ :trace/{c} rdf:type :Trace .
    Intuitively, such a mapping populates the concept Trace in E with the case objects that are created from the answers returned by query q(c).
    For each SQL query $q(e) \in L_{sql}$ that is obtained from an event annotation, we insert into $M^E_P$ the following OBDA mapping:
      q(e) ⇝ :event/{e} rdf:type :Event .
    Intuitively, such a mapping populates the concept Event in E with the event objects that are created from the answers returned by query q(e).
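The two ingredients on this slide, unfolding a class atom into SQL and pairing the resulting query with a target over the event schema, can be imitated with a very small Python sketch. Everything here is a simplification under stated assumptions: `CLASS_MAPPINGS` is a hand-written stand-in for the mapping M with a single entry, `unfold` handles only one `rdf:type` atom and performs no ontology rewriting, SQLite's `||` operator replaces `CONCAT`, and the toy rows are invented. It does not reproduce ontop's rewriting, unfolding or optimisation.

```python
import sqlite3

# Hypothetical stand-in for the mapping M: class name -> (source SQL, URI prefix).
CLASS_MAPPINGS = {
    "Creation": (
        "SELECT DISTINCT SUBMISSION.ID AS oid FROM SUBMISSION, PAPER "
        "WHERE SUBMISSION.PAPER = PAPER.ID AND SUBMISSION.UPLOADTIME = PAPER.CT",
        "http://www.example.com/submission/",
    ),
}


def unfold(class_name, answer_var):
    """Turn the atom `?answer_var rdf:type :class_name` into SQL (toy unfolding)."""
    source_sql, uri_prefix = CLASS_MAPPINGS[class_name]
    return (
        f"SELECT DISTINCT '{uri_prefix}' || src.oid AS \"{answer_var}\" "
        f"FROM ({source_sql}) AS src WHERE src.oid IS NOT NULL"
    )


def event_log_mapping(sql, answer_var):
    """Pair an unfolded event query q(e) with its target over the event schema."""
    return sql, ":event/{" + answer_var + "} rdf:type :Event ."


# Invented toy data, only to show that the unfolded query runs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PAPER (ID INTEGER PRIMARY KEY, CT TEXT);
    CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, UPLOADTIME TEXT, PAPER INTEGER);
    INSERT INTO PAPER VALUES (1, '2017-01-10T09:00');
    INSERT INTO SUBMISSION VALUES (10, '2017-01-10T09:00', 1);
    INSERT INTO SUBMISSION VALUES (11, '2017-02-01T12:00', 1);
""")

sql, target = event_log_mapping(unfold("Creation", "creationEvent"), "creationEvent")
rows = conn.execute(sql).fetchall()
print(target)
print(rows)
```

The point of the sketch is the shape of the result: the annotation ends up as an ordinary mapping whose source is plain SQL over the legacy tables, so a standard OBDA engine can answer queries over the event schema without any intermediate materialisation.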
  • 149. Recap
    Components shown in the overview figure:
      D (database), which conforms to R (db schema)
      I (information system)
      M (mapping specification)
      T (conceptual data schema)
      B (OBDA model)
      L (event-data annotations)
      P (onprom model)
      E (conceptual event schema)
      $M^E_P$ (log mapping specification)
    Other edge labels in the figure: annotates, points to.
  • 150. Querying the “Virtual Log”
    SPARQL queries over the event schema are answered using legacy data.
    • Example: get empty and nonempty traces; for nonempty traces, also fetch all their events
    Answers can be serialised into a fully compliant XES log!
    The following query retrieves (elementary) attributes, considering in particular their key, type, and value:
      PREFIX : <http://www.example.org/>
      SELECT DISTINCT ?att ?attType ?attKey ?attValue
      WHERE {
        ?att rdf:type :Attribute;
             :attType ?attType;
             :attKey ?attKey;
             :attVal ?attValue.
      }
    The following query handles the retrieval of empty and nonempty traces, simultaneously obtaining, for nonempty traces, their constitutive events:
      PREFIX : <http://www.example.org/>
      SELECT DISTINCT ?trace ?event
      WHERE {
        ?trace a :Trace .
        OPTIONAL {
          ?trace :t-contain-e ?event .
          ?event :e-contain-a ?timestamp .
          ?timestamp :attKey "time:timestamp"^^xsd:string .
          ?event :e-contain-a ?name .
          ?name :attKey "concept:name"^^xsd:string .
        }
      }
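Because of the OPTIONAL block, the trace query returns one row per (trace, event) pair and a row with an unbound event for each empty trace. A short Python sketch of how such answers could be grouped before serialisation is given below; the row format (tuples with `None` for an unbound variable), the `group_answers` helper and the sample identifiers are assumptions for illustration, not the output format of any particular SPARQL client.

```python
def group_answers(rows):
    """Group (trace, event) answer rows by trace, keeping empty traces."""
    traces = {}
    for trace, event in rows:
        events = traces.setdefault(trace, [])
        # An unbound ?event (None) means the OPTIONAL part did not match.
        if event is not None and event not in events:
            events.append(event)
    return traces


answers = [
    (":trace/1", ":event/10"),
    (":trace/1", ":event/11"),
    (":trace/2", None),
]
grouped = group_answers(answers)
print(grouped)
```

Keeping the empty trace in the result is the reason the query uses OPTIONAL rather than a plain join: a case with no recorded events is still a case.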
  • 151. The onprom Toolchain
    Implementation of all the described steps using
    • Java (GUIs, algorithms)
    • OWL 2 QL plus functionality (conceptual schemas)
    • ontop (OBDA system)
    • OpenXES (XES serialisation and manipulation)
    • ProM process mining framework (environment)
  • 152. onprom UML Editor
    Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our …
  • 153. onprom Annotation Editor
    Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case
  • 154. onprom Log Extractor
    Fig. 20: Screenshot of Log Extractor Plug-in in ProM 6.6.
  • 155. Experiments
    • Very encouraging initial experiments
    • Carried out using synthetic data
    • We are looking for real case studies!
  • 156. Data Generation with CPN Tools
  • 157. Results (Postgres)
    Two plots of running time (in milliseconds): one against the number of extracted components in the XES log, one against the number of tuples in the whole database.
    ~11 mins to extract ~9M XES components from ~3.5M tuples
  • 161. Other tools: ProM [http://www.promtools.org]
    • The most famous academic initiative in process mining
    • Cutting-edge process mining algorithms are there
    • Pluggable architecture
    • Dozens of plug-ins
  • 163. Conclusions
    • Process Mining as a way to reconcile model-driven management and the real behaviours
    • Data preparation is an issue in the presence of legacy data
    • Ontology-Based Data Access: solid theoretical basis with optimised implementations
    • onprom as an effective toolchain for extracting event logs from legacy databases
  • 164. Future Work
    • Conceptual Modeling
      • How to improve the discovery of events?
      • How to semi-automatically propose events to the user?
      • How to integrate methodologies and results from formal ontology?
    • Engineering
      • How to handle different types of data?
      • How to deal with different event schemas that go beyond XES?
      • How to generalise the approach to handle rich ontology-to-ontology mappings?