OBDA for Log Extraction in Process Mining

Diego Calvanese, Marco Montali, Tahir Emre Kalayci, Ario Santoso
KRDB Research Centre for Knowledge and Data
Faculty of Computer Science
Free University of Bozen-Bolzano
montali@inf.unibz.it

Managing Organisations…
[Figure: "Mobile" by Calder, balancing models (managers/analysts) against data ((knowledge) workers), with an open question on how to relate models and data.]

Marrying processes and data
is extremely difficult…
… but is a must
if we want to really understand
how complex dynamic systems operate.

Our Approach
Business Process
Management
Data
Management
Conceptual
Modeling
Formal
Methods
Artificial
Intelligence

Our Research
Theory
Practice


Agenda
1. Intro to process mining
2. The problem of data
preparation in process
mining
3. The onprom framework:
OBDA for data preparation
in process mining
4. Process mining demo

Process Mining
Process Management Based on Facts
Extensive credits: Wil van der Aalst (TU/e), Chiara Ghidini (FBK)

Disclaimer
• We will simplify to make the issues more apparent
• Criticism has to be seen as a positive force towards
improvement

The two realities
Reality 1: Managers and Analysts
Reality studied, analyzed, and planned through different types of models.
Decision making to improve the overall organization.

The two realities
Reality 2: Daily workers
Reality experienced directly.
Decision making to determine how to best handle the current situation.

Management  
of the organisation
Daily work
within the organisation
Critical Dichotomy

IT
Our Goal
Management  
of the organisation
Daily work
within the organisation

The Traditional
Model-Driven Approach

Model (Def.)
A simplifying mapping of reality to serve a
specific purpose
(Stachowiak: Allgemeine Modelltheorie, 1973)
• The model corresponds to the modelled object in
the sense that it faithfully reproduces some
fundamental aspects of such an object

Conceptual Modeling
The activity of formally describing some aspects of the
physical and social world around us for the purposes of
understanding and communication.
(John Mylopoulos, 1992)

Conceptual Models in Organisations
A model is an abstraction of reality according to a
certain conceptualization. Once represented as a
concrete artifact, a model can support
communication, learning and analysis about
relevant aspects of the underlying domain. [. . . ] a
represented model (a dusty diagram) created by
an unknown predecessor is a medium to preserve
and communicate a certain view of the world, and
can serve as a vehicle for reasoning and problem
solving, and for acquiring new knowledge (maybe
having striking new ideas!) about this view of the
world.
(Guizzardi, 2005)

Models as IT Mediators
Operational process
Information
System
Model

…right?

Conceptual Modeling Languag
Clarity: how easy the language can be
stakeholders).
• Graphical
• The langu
foundatio
• The more
difficult is
• Less expr
combinat
• Abstraction: remove unnecessary

Business Process
A set of logically related tasks performed to achieve a defined
business outcome for a particular customer or market.
(Davenport, 1992)
A collection of activities that take one or more kinds of input
and create an output that is of value to the customer.
(Hammer & Champy, 1993)
A set of activities performed in coordination in an
organizational and technical environment. These activities
jointly realize a business goal.
(Weske, 2011)

Business Process
Management
A collection of  
concepts, methods, and techniques  
to support humans in 
modeling, administration,  
configuration, execution,
analysis, and continuous improvement
of business processes

Short History
• Smith (~1750): division of labour
• Taylor (~1911): scientific method
applied to organisations
• Hammer and Champy (~1990):
processes as the basis for
reengineering
• 2000s: business process
lifecycle, process-orientation

Value Chains, Business Functions, Tasks
ss Functions and Refinement into Activities
y of business
s follows the
ion abstraction.
iness functions
activities.

From tasks…
AnalyseOrder
SimpleCheck
AdvancedCheck
… to their coordination
OrderManagement
GetOrder CheckOrder
AnalyseOrder SimpleCheck AdvancedCheck

End-To-End, Reactive Behaviour
Order-to-cash, procure-to-pay, issue-to-resolution, …
[Figure: end-to-end order process from input to output. Tasks: Receive order, Check availability, Ship article, Financial settlement, Procurement, Inform customer, Remove article from catalogue. Decision: "Article available?" (yes/no). Events: Payment received, Late delivery, Undeliverable, Customer informed, Article removed.]

Process Modelling Languages
BPMN
[Figure: BPMN diagram of a flight booking with pools Customer, Travel Agency, and Airline. Tasks: Check travel agency web site, Check flight offer, Reject offer, Book and pay flight, Make flight offer, Prepare ticket. Events: Flight needed, offer received, request received, Ticket received, Flight paid, Offer rejected, Booking and payment received, Offer rejection received, Flight organised, Offer cancelled. Data objects: Flight offer, Flight offer [rejected], Booking and payment, Ticket. Legend: pool, start event, exclusive gateway, message event, end event, task, event-based gateway, data object.]

Process Modelling Languages
EPC/ARIS
[Figure: EPC of the flight booking process. Functions: Check travel agency website, Make flight offer, Check flight offer, Reject Offer, Book and pay flight, Cancel Offer, Prepare ticket, Send ticket to customer. Organization units: Customer, Travel Agency. Events: Flight ticket needed, Flight offer requested, Flight offer sent to client, Offer rejected, Offer canceled, Offer accepted, Airline issues ticket, Ticket prepared, Flight organised. Data: Website, Flight offer, Flight offer [rejected], Flight offer [paid], Flight offer [cancelled], Ticket. Goal: Ensure comfortable flight. Legend: event, function, organization unit, owner, supporting system, process path, logical operation (XOR), data, goal.]

Process Modelling Languages
UML Activity Diagrams
[Figure: UML activity diagram of the flight booking process with activity partitions Customer and Travel Agency. Actions: Check travel agency website, Make flight offer, Check flight offer, Reject Offer, Book and pay flight, Prepare ticket, Cancel Offer. Object nodes: Flight Offer, Flight Offer [paid], Flight Offer [rejected], Ticket. Guard conditions: Satisfied, Unsatisfied. Legend: activity partition, action node, initial node, activity final node, decision node, merge node, object node, guard condition.]

A Process Example
[Figure: conference paper process. Tasks (with responsible role): create paper (author), submit paper (author), assign reviewer (chair), review paper (reviewer), submit review (reviewer), take decision (chair), accept paper (chair), reject paper (chair), upload camera ready (author). Decision: "accept?" (Y/N).]
Fig. 2: The process for managing papers in a simplified conference submission system;
gray tasks are external to the conference information system and cannot be logged.

But There is More!
[Figure: a system comprising processes, data, and resources.]

But There is More!
[Figure: the conference paper process of Fig. 2, annotated with the additional dimensions data logic, transactional lifecycle, resources, decision logic, and case notion.]
Case Object
The main subject of the process
• May be a concrete or abstract object
• An order, a claim, a paper, a request, …
• Contemporary process notations: capture well only
processes with a single notion of case
• The case object is 1-1 with the start event  
(paper submission -> paper, order request -> order)
• But in reality, multiple case objects typically coexist!
• Flow of papers vs flow of reviews, flow of customer
orders vs flow of packages containing order parts, …

Task Instances
• A process model represents abstract tasks
• The concrete execution of a task on a case object
results in a task instance
• The evolution of a task instance goes through multiple
events and transitions (durative tasks)
• This is regulated by a task transactional lifecycle

Resources
Humans/devices responsible for the execution of
task instances
• Usually structured in an organisational model
defining roles, duties, capabilities, security levels,
…
ARIS
Organisational
structure

Data Logic
Management of the master data of the company,
including case data and data produced/consumed by
processes
• Master data are persisted inside information systems
• Processes perform CRUD operations over such data
• Processes acquire data from the external
environment

Structural Models
• Represent the structure of the domain of interest
• Capture the relevant concepts, attributes, and relationships
• Lead to the logical schema of information systems
Conceptual Data Models
UML
Class Diagram
ORM Schema

Decision Models
Encapsulate the decision logic that leads to infer certain
conclusions given input data
• This in turn determines how to route a case object in the
process
• May be implicitly embedded in the process, or represented
explicitly
DMN
Decision table

What are Models Used For?
• Understanding and communication
• Documentation and audits
• Verification and simulation
• Basis for unambiguous contracts between a company
and its customers
• Basis of IT systems supporting the daily work within the
organisation
How to best combine models and support all
these tasks is a very active area of research!

Traditional Process Enactment:
From Handmade Models to Execution
[Figure: lifecycle with steps (re)design, configure/deploy, and enact/monitor, connecting data and models (50% each), IT support, and reality; managers/analysts and (knowledge) workers are the actors.]

Limits of the traditional approach

Problem #1: Lack of Interaction
[Figure: the same lifecycle (data, models, IT support, reality; (re)design, configure/deploy, enact/monitor), with an open question mark concerning (knowledge) workers and managers/analysts.]
How to involve all
actors in the creation of
shared models? How to
share strategic goals?

Problem #1: Lack of Interaction
[Figure: the same lifecycle, with an open question mark concerning the link between data and models.]
How to improve such
models using data?

Impasse!
• (Knowledge) workers: experience the real
organisation, but locally and subjectively
• Management: have a global view of the expected
organisation, not aligned with reality
• Key, open questions:
• How to reconcile these two worlds?
• How to connect models with reality? How to take
strategic decisions based on such connection?
• How to ensure that the organisation as a whole is
going in the right direction?

A Real Clinical Process
[…] exceptions. Figure 4 depicts the result of a first attempt to analyze the application server logs using the heuristics miner [4].
[Figure: densely connected dependency graph whose nodes are exception types (for example Exception, EstabelecimentoNotFoundException, GREJBPersistencyException, NullPointerException, SQLException, BusinessException), each labelled "(complete)" with its frequency; arcs carry dependency values between 0,5 and 0,991 together with counts.]
Fig. 4. Spaghetti model obtained from the application server logs using the heuristics miner.

The Effect
• Processes are only partially encoded into IT
systems
• IT systems need to support a backdoor to
circumvent the encoded processes
• Otherwise, people will act “outside”, using the
system just to “record”
• No hope to improve: knowledge-intensive
processes cannot be automated in the classical
sense!

A Hazardous Attempt:
Go Without Models

Process Model vs Instance
Business Process
• Tasks
• Data schema
Example:
• Car assembly process
• Task: mount doors
• Frame#, buyer ID, car color

Process Instance
• Task instances
• Data values
Example:
• Assembly of car 123
• Task instance #54: mount doors on car 123
• Frame#: 123, buyer: Diego, car color: white
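
The distinction between a process model and its instances can be sketched in a few lines of Python. This is only an illustrative sketch of the car assembly example; the class and method names (`ProcessModel`, `ProcessInstance`, `start_task`) are hypothetical and not taken from any tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessModel:
    """Schema level: abstract tasks plus the names of the case data."""
    name: str
    tasks: tuple
    data_schema: tuple


@dataclass
class ProcessInstance:
    """Instance level: one case with concrete data values and task instances."""
    model: ProcessModel
    case_id: str
    data_values: dict
    task_instances: dict = field(default_factory=dict)

    def start_task(self, instance_id: int, task: str) -> None:
        # A task instance must refer to a task declared in the model.
        if task not in self.model.tasks:
            raise ValueError(f"unknown task: {task}")
        self.task_instances[instance_id] = task


car_assembly = ProcessModel(
    name="car assembly",
    tasks=("mount doors",),
    data_schema=("frame#", "buyer ID", "car color"),
)
car_123 = ProcessInstance(
    model=car_assembly,
    case_id="123",
    data_values={"frame#": 123, "buyer ID": "Diego", "car color": "white"},
)
car_123.start_task(54, "mount doors")
```

The model is immutable and shared by all cases, while each instance carries its own values and its own task instances.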

Processes Leave Digital
Footprints
Within organisations: event data related to process
executions are continuously stored for
• Internal management
• Calculation of process metrics/KPIs
• Legal reasons (compliance, external audits)
In addition: internally and externally, more data are stored
somewhere
• We live in a digital society!
• Social networks, sensors, cyberphysical systems, mobile
devices are data loggers

Situation 1: Explicit Event Logs
Organisation equipped with process-aware information
systems
• Supporting humans in the execution of processes (task
assignments, todo lists)
• Explicitly logging events, with info about:  
- timestamp 
- event type (start, end, reassign, …) 
- reference task 
- reference case 
- task instance id 
- responsible resource 
- additional attributes
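
The event attributes listed above map naturally onto a record type. The following is a minimal sketch, assuming one record per logged event; the field names and the sample values (task instance id `ti-7`, the two timestamps) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any


@dataclass
class LoggedEvent:
    timestamp: datetime
    event_type: str          # e.g. "start", "end", "reassign"
    task: str                # reference task
    case_id: str             # reference case
    task_instance_id: str
    resource: str            # responsible resource
    attributes: dict = field(default_factory=dict)  # additional attributes


# Two events of the same (hypothetical) task instance of a durative task.
started = LoggedEvent(datetime(2011, 1, 5, 15, 12), "start",
                      "review paper", "1", "ti-7", "Sara")
ended = LoggedEvent(datetime(2011, 1, 6, 11, 18), "end",
                    "review paper", "1", "ti-7", "Sara")

# With explicit start and end events, the task duration is directly available.
duration: timedelta = ended.timestamp - started.timestamp
```

Because the events share a task instance id, lifecycle transitions of the same task instance can be paired without guessing.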

Explicit Event Logs
[Figure: the conference paper process of Fig. 2.]
Example 1. As a running example, we consider a simplified conference submission
system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors,
reviewers, and conference chairs in the submission of papers to conferences, the
consequent review process, and the final decision about paper acceptance or rejection.
Figure 2 shows the process control flow considering papers as case objects. Under this
perspective, the management of a single paper evolves through the following execution
steps. First, the paper is created by one of its authors, and submitted to a conference
available in the system. Once the paper is submitted, the review phase for that paper
starts. This phase of the process consists of a so-called multi-instance section, i.e., a
section of the process where the same set of activities is instantiated multiple times on
the same paper, and then executed in parallel. In the case of CONFSYS, this section is
instantiated for each reviewer selected by the conference chair for the paper, and consists
of the following three activities: (i) a reviewer is assigned to the paper; (ii) the
reviewer produces the review; (iii) the reviewer submits the review to CONFSYS. The
multi-instance section is considered completed only when all its parallel instantiations
Event Data
Case ID  ID        Timestamp         Activity       User    . . .
1        35654423  30-12-2010:11.02  create paper   Pete    . . .
1        35654424  31-12-2010:10.06  submit paper   Pete    . . .
1        35654425  05-01-2011:15.12  assign review  Mike    . . .
1        35654426  06-01-2011:11.18  submit review  Sara    . . .
1        35654428  07-01-2011:14.24  accept paper   Mike    . . .
1        35654429  06-01-2011:11.18  upload CR      Pete    . . .
2        35654483  30-12-2010:11.32  create paper   George  . . .
2        35654485  30-12-2010:12.12  submit paper   John    . . .
2        35654487  30-12-2010:14.16  assign review  Mike    . . .
2        35654489  16-01-2011:10.30  submit review  Ellen   . . .
2        35654490  18-01-2011:12.05  reject paper   Mike    . . .
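
An explicit event table like the one above can be turned into traces by grouping the rows by case. The sketch below is an illustration only: the function names are hypothetical, and it assumes that rows are already listed in the order in which they were recorded, so no re-sorting is applied.

```python
from collections import defaultdict
from datetime import datetime

# (case id, event id, timestamp, activity, user), transcribed from the table.
ROWS = [
    ("1", "35654423", "30-12-2010:11.02", "create paper", "Pete"),
    ("1", "35654424", "31-12-2010:10.06", "submit paper", "Pete"),
    ("1", "35654425", "05-01-2011:15.12", "assign review", "Mike"),
    ("1", "35654426", "06-01-2011:11.18", "submit review", "Sara"),
    ("1", "35654428", "07-01-2011:14.24", "accept paper", "Mike"),
    ("1", "35654429", "06-01-2011:11.18", "upload CR", "Pete"),
    ("2", "35654483", "30-12-2010:11.32", "create paper", "George"),
    ("2", "35654485", "30-12-2010:12.12", "submit paper", "John"),
    ("2", "35654487", "30-12-2010:14.16", "assign review", "Mike"),
    ("2", "35654489", "16-01-2011:10.30", "submit review", "Ellen"),
    ("2", "35654490", "18-01-2011:12.05", "reject paper", "Mike"),
]


def parse_timestamp(text: str) -> datetime:
    """Parse the 'dd-mm-yyyy:hh.mm' format used in the table."""
    return datetime.strptime(text, "%d-%m-%Y:%H.%M")


def build_traces(rows):
    """Group events by case, keeping the order in which rows are listed."""
    traces = defaultdict(list)
    for case_id, _event_id, _timestamp, activity, _user in rows:
        traces[case_id].append(activity)
    return dict(traces)


traces = build_traces(ROWS)
first_timestamp = parse_timestamp(ROWS[0][2])
```

Each value of `traces` is the sequence of activities of one paper, which is the input expected by discovery and conformance checking techniques.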

Situation 2: Implicit Event Logs
Organisation equipped with generic enterprise information
systems
• CRM, ERP systems to handle customers and tasks
• Legacy information systems
• Domain-specific systems
Data are stored with different formats and according to
different domain-specific schemas
• No explicit events
• Data scattered in several data sources

Implicit Event Logs
ACCEPTANCE(ID, uploadtime, user, paper)
CONFERENCE(ID, name, organizer, time)
DECISION(ID, decisiontime, chair, outcome)
LOGIN(ID, user, CT)
SUBMISSION
PAPER(ID, title, CT, user, conf, type, status)
REVIEW(ID, RRid, submissiontime)
REVIEWREQUEST(ID, invitationtime, reviewer, paper)
Fig. 11: DB schema for the information system of the conference submission system.
Primary keys are underlined and foreign keys are shown in italic.

The New Trend: No Models!
[Figure: the lifecycle without models: only data (50% / 50%), with steps enact/monitor and adjust, IT support++, and reality; actors: (knowledge) workers and managers/analysts.]

Alcohol and Fat
It’s a relief to know the truth after all those conflicting medical
studies.
The Japanese eat very little fat and suffer fewer heart attacks than
the British or Americans.
The French eat a lot of fat and also suffer fewer heart attacks than
the British or Americans.
The Japanese drink very little red wine and suffer fewer heart
attacks than the British or Americans.
The Italians drink excessive amounts of red wine and also suffer
fewer heart attacks than the British or Americans.
The Germans drink a lot of beer and eat lots of sausages and
fats and suffer fewer heart attacks than the British or Americans.
Conclusion: Eat and drink what you like. Speaking English is
apparently what kills you.

Result
Crompton (2008): domain experts lose (too
much) time finding the data they need to operate and take
strategic decisions
• Engineers in the oil/gas industry: 30-70% of
working time spent on data crawling and data
quality

Models Enable Decision Making
Humans understand reality through models
• Data alone are meaningless
• Machine learning/deep learning techniques are
unable to expose their models: no human in the
decision making loop!
• Models not only for decision making, but also to
explain and guide

Process Management Based on Facts

[Figure: the lifecycle with both data and models (50% each): steps (re)design, configure/deploy, enact/monitor, adjust, and diagnose/get reqs., connecting data, models, IT support, and reality; actors: (knowledge) workers and managers/analysts.]

Process Mining: Data Science in Action
[See process mining manifesto]
Fig. 1.4 Positioning of the three main types of process mining: discovery, conformance, and enhancement

The Three Pillars of Process Mining
Play-In: event log → process model
Play-Out: process model → event log
Replay: event log + process model →
• extended model showing times, frequencies, etc.
• diagnostics
• predictions
• recommendations

Play In
[Figure: travel request process from start to end, with activities register travel request (a), get support from local manager (b), get detailed motivation letter (c), check budget by finance (d), decide (e), reinitiate request (f), accept request (g), reject request (h).]
Case  Activity                            Timestamp       Resource
432   register travel request (a)         18-3-2014:9.15  John
432   get support from local manager (b)  18-3-2014:9.25  Mary
432   check budget by finance (d)         19-3-2014:8.55  John
432   decide (e)                          19-3-2014:9.36  Sue
432   accept request (g)                  19-3-2014:9.48  Mary

Play-in: Discovery
Event logs implicitly contain the real
process!
Making it explicit gives:
•knowledge and understanding
•ground for discussion
•possibility to act by:
•correcting issues
•compare with the designed
models (“should be” vs “as is”)
•evolve the models
•re-engineer the organisation
credits to W.M.P. van der Aalst

Discovery: Crash Course
• The main idea: look at the data from a
“process-oriented” perspective
Case ID = the process instance
Event
Start time
End time

From Data Mining…
Credits: Anne Rozinat

…to Process Mining
Credits: Anne Rozinat

Discovery: Idea
Case  Activity
1     A
2     A
1     B
1     C
3     A
2     C
3     B
2     B
1     D
2     D
2     E
3     C
3     D
1     E
3     D
3     E
Traces:
Case 1: A, B, C, D, E
Case 2: A, C, B, D, E
Case 3: A, B, C, D, D, E
[Figure: discovered model: A first, then B and C in either order, then D, then E.]
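
One common first step towards discovery is to count which activity directly follows which within each case (a directly-follows relation). The sketch below is a simplified illustration and not the algorithm of any specific tool; the rows are transcribed from the table above, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# (case, activity) pairs in the order in which they appear in the log.
LOG = [
    (1, "A"), (2, "A"), (1, "B"), (1, "C"), (3, "A"), (2, "C"),
    (3, "B"), (2, "B"), (1, "D"), (2, "D"), (2, "E"), (3, "C"),
    (3, "D"), (1, "E"), (3, "D"), (3, "E"),
]


def directly_follows(log):
    """Return the traces per case and the directly-follows counts."""
    traces = defaultdict(list)
    for case, activity in log:
        traces[case].append(activity)
    counts = Counter()
    for trace in traces.values():
        # Count each pair of consecutive activities within the same case.
        for current, following in zip(trace, trace[1:]):
            counts[(current, following)] += 1
    return dict(traces), counts


traces, dfg = directly_follows(LOG)
```

Seeing both (B, C) and (C, B) among the counted pairs is the kind of evidence a discovery technique can use to conclude that B and C are not strictly ordered.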

Discovery in a Tool: DISCO
Demo Later
Event Log Discovered Process

Play Out In a Nutshell
[Figure: the travel request process model (activities a to h, from start to end) together with the event "432 decide (e) 19-3-2014:9.36 Sue".]

Replay in a Nutshell
[Figure: the travel request process model (activities a to h, from start to end) together with the event "432 decide (e) 19-3-2014:9.36 Sue".]

Conformance Checking
Goal: understand and
quantify the degree
of alignment between
models and reality

Conformance Checking: Idea
[Figure: Case 1 (A, B, C, D, E) shown next to the model: A first, then B and C, then D, then E.]

Conformance Checking: How It Works
[Figure sequence: Case 1 (A, B, C, D, E) is replayed on the model one event at a time (A, then B, then C, then D, then E); the replay succeeds (✓).]
[Figure sequence: Case 2 (A, D, C, D, E) is replayed on the same model; after A, the event D cannot be replayed (✗).]
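
The replay idea can be sketched as follows. This is a deliberately simplified stand-in for real conformance checking techniques: the model is encoded, as an assumption, by listing for each activity the activities that must already have occurred, and each activity may occur once. The names `MODEL` and `first_deviation` are hypothetical.

```python
# Assumed encoding of the model: A first, then B and C in any order, then D, then E.
MODEL = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
    "E": {"D"},
}


def first_deviation(trace, model=MODEL):
    """Replay a trace; return the index of the first deviation, or None."""
    done = set()
    for index, activity in enumerate(trace):
        unknown = activity not in model
        repeated = activity in done
        # An activity is enabled only when all its prerequisites have occurred.
        if unknown or repeated or not model[activity] <= done:
            return index
        done.add(activity)
    if done != set(model):
        return len(trace)  # the trace stops before the model is completed
    return None


case_1 = ["A", "B", "C", "D", "E"]
case_2 = ["A", "D", "C", "D", "E"]
```

Here `first_deviation(case_1)` reports no deviation, while `first_deviation(case_2)` points at position 1, the D that occurs right after A.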

Process Repair: Beyond
Conformance Checking
Deviations are
incorporated into the
process model

Repair: Idea
[Figure sequence: many cases (Case 1, Case 2, Case 3, Case 4, …, Case k) share the trace A, D, C, D, E, which deviates from the model (A first, then B and C, then D, then E) at the first D (✗).]
A common deviation:
maybe the model is wrong/
outdated!
[Figure: once the deviation is incorporated into the model, the traces can be replayed (✓).]

Practice
[Word cloud: Camunda, ERP, Signavio, Document-driven, EPCs, GSM, BPMN, CMMN, Case Management, Legacy Systems, CRM, E-R, Bizagi, Aris, UML, Artifact-Centric, SAP, Object-Centric, Proprietary Systems, Bonita]
IEEE XES Standard
[www.xes-standard.org]
IEEE Standard for the representation of event logs
• Based on XML
• Minimal mandatory structure:  
log consists of traces, each representing the history of a case 
trace consists of a list of atomic events
• Extensions to “decorate” log, trace, event with informative
attributes: timestamps, task names, transactional lifecycle,
resources, additional event data
• Supports “meta-level” declarations useful for log processors

<log xes.version="1.0"
     xes.features="nested-attributes"
     openxes.version="1.0RC7">
  <extension name="Time"
             prefix="time"
             uri="http://www.xes-standard.org/time.xesext"/>
  <classifier name="Event Name" keys="concept:name"/>
  <string key="concept:name" value="XES Event Log"/>
  ...
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="create paper"/>
      <int key="Event ID" value="35654424"/>
      ...
    </event>
    <event>
      ...
      <string key="concept:name" value="submit paper"/>
      ...
    </event>
    ...
  </trace>
  <trace> ... </trace>
  ...
</log>
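
Since XES is plain XML, a log with this structure can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a small, complete sample modelled on the fragment above; the sample document and the helper names are assumptions made for illustration, and real XES files may declare an XML namespace, in which case the tag names must be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample in the style of the fragment above.
XES_SAMPLE = """<log xes.version="1.0" xes.features="nested-attributes">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="create paper"/>
    </event>
    <event>
      <string key="User" value="Pete"/>
      <string key="concept:name" value="submit paper"/>
    </event>
  </trace>
</log>"""


def attribute_value(element, key):
    """Return the value of the child attribute element with the given key."""
    for child in element:
        if child.get("key") == key:
            return child.get("value")
    return None


def read_traces(xml_text):
    """Map each trace name to the list of its event names."""
    root = ET.fromstring(xml_text)
    traces = {}
    for trace in root.findall("trace"):
        name = attribute_value(trace, "concept:name")
        traces[name] = [attribute_value(event, "concept:name")
                        for event in trace.findall("event")]
    return traces


parsed = read_traces(XES_SAMPLE)
```

The key `concept:name` plays two roles here: on a trace it identifies the case, on an event it names the activity.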

Full XES Schema
[Figure: UML class diagram of the full XES schema. Classes: Log (logFeatures: String, logVersion: String), Trace, Event, Attribute (attKey: String, attType: String) with disjoint subclasses ElementaryAttribute (attValue: String) and CompositeAttribute, Extension (extName: String, extPrefix: String, extUri: String), GlobalAttribute with disjoint subclasses GlobalEventAttribute and GlobalTraceAttribute, and Classifier (name: String) with disjoint subclasses EventClassifier and TraceClassifier. Associations: l-contains-t, l-contains-e, t-contains-e, l-has-a, t-has-a, e-has-a, ca-contains-a, a-contains-a, e-usedBy-a, e-usedBy-l, l-has-gea, l-has-gta, l-contains-ec, l-contains-tc, ec-definedBy-gea, tc-definedBy-gea.]

Core XES Schema
[Figure: UML class diagram with classes Trace, Event, and Attribute (attKey: String, attType: String, attValue: String), and associations t-contains-e (Trace to Event), t-has-a (Trace to Attribute), and e-has-a (Event to Attribute).]
Fig. 13: Core event schema
We show now how such a simple schema can be suitably encoded in DL-Lite_A.
code the core event schema of Figure 13, the three concept names Trace, Event, a
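
For readers who prefer code to class diagrams, the core event schema can be mirrored by three small record types. This is only a sketch: the multiplicities of the diagram are not enforced, and the helper `value_of` is a hypothetical convenience function.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    att_key: str
    att_type: str
    att_value: str


@dataclass
class Event:
    attributes: List[Attribute] = field(default_factory=list)  # e-has-a


@dataclass
class Trace:
    attributes: List[Attribute] = field(default_factory=list)  # t-has-a
    events: List[Event] = field(default_factory=list)          # t-contains-e


def value_of(holder, key: str) -> Optional[str]:
    """Look up an attribute value of a trace or an event by its key."""
    for attribute in holder.attributes:
        if attribute.att_key == key:
            return attribute.att_value
    return None


trace = Trace(
    attributes=[Attribute("concept:name", "string", "1")],
    events=[
        Event([Attribute("concept:name", "string", "create paper")]),
        Event([Attribute("concept:name", "string", "submit paper")]),
    ],
)
activities = [value_of(event, "concept:name") for event in trace.events]
```

Everything that describes a trace or an event is an attribute, which is why the schema needs only three classes.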

Quality of Logs
Level ★★★★★
Characterization: Highest level: the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies. Events and their attributes point to this ontology.
Examples: Semantically annotated logs of BPM systems.

Level ★★★★
Characterization: Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner.
Examples: Event logs of traditional BPM/workflow systems.

Level ★★★
Characterization: Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa).
Examples: Tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.

Level ★★
Characterization: Events are recorded automatically, i.e., as a by-product of some […]
Examples: Event logs of document and […]

From Event Logs to XES
• Level 4-5: straightforward syntactic manipulation
• Level 3: much more difficult, due to
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and
events

Traditional Extraction from
Legacy Data
Traditional methodology:
1. Create data model
2. Choose perspective
3. Extract relevant tables
4. Design views with relevant attributes
5. Design composite views
6. Design log view
7. Export to XES/CSV
8. Do process mining
9. Other perspective? (Y/N)
[…] EBITmax converted the log view into a CSV file, and analysed it using […] process mining toolkit.
• Manual construction of views and ETL procedures to fetch the data
• Done by IT experts, not by knowledge workers (domain experts)
• Crucial issues:
• Correctness: who knows? 
Process mining is dangerous if applied on wrong data
• Maintenance, evolution, change of perspective are hard…
But process mining should be highly interactive
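
To give a feeling for what the manual construction of a log view involves, the following sketch builds one with Python's built-in `sqlite3` module over two tables of the conference database. Several points are assumptions made for illustration: the column types, the sample rows, the reading of `CT` as a creation time, and the choice of the reviewer as the resource of the "assign reviewer" event. The view name `LOG_VIEW` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE PAPER (
    ID INTEGER PRIMARY KEY, title TEXT, CT TEXT, "user" TEXT,
    conf INTEGER, type TEXT, status TEXT
);
CREATE TABLE REVIEWREQUEST (
    ID INTEGER PRIMARY KEY, invitationtime TEXT, reviewer TEXT, paper INTEGER
);
INSERT INTO PAPER VALUES
    (1, 'A paper', '2010-12-30T11:02', 'Pete', 7, 'full', 'created');
INSERT INTO REVIEWREQUEST VALUES (10, '2011-01-05T15:12', 'Sara', 1);

-- Hand-written log view: one SELECT per kind of event, with papers as cases.
CREATE VIEW LOG_VIEW AS
    SELECT ID AS case_id, 'create paper' AS activity,
           CT AS ts, "user" AS resource
    FROM PAPER
    UNION ALL
    SELECT paper, 'assign reviewer', invitationtime, reviewer
    FROM REVIEWREQUEST;
''')

rows = conn.execute(
    "SELECT case_id, activity, ts, resource FROM LOG_VIEW ORDER BY case_id, ts"
).fetchall()
```

Every additional kind of event requires another hand-written SELECT, and choosing a different case notion (for example reviews instead of papers) means rewriting the whole view, which illustrates why maintenance and change of perspective are hard.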

The onprom Approach
[http://onprom.inf.unibz.it]
[Figure: the onprom methodology. The decision "high-level IS?" (Y/N) selects between two routes: Create conceptual data schema, then Create mappings; or Bootstrap model + mappings, then Enrich model + mappings. Next steps: Choose perspective, Create event-data annotations, Get XES/CSV, Do process mining. Final decision: "Other perspective?" (Y/N).]
Fig. 12: The onprom methodology and its four phases
[…] the same time generating (identity) mappings to link the two specifications. The result
of bootstrapping can then be manually refined.
Once the first phase is completed, process analysts and the other involved stakeholders
do not need anymore to consider the structure of the legacy information system, […]
Intelligent data management and conceptual modelling to:
1. Understand the data
2. Access the data using the domain vocabulary
3. Express the perspective for process mining using the domain vocabulary
4. Automatise the extraction of XES event logs

Information Structure
ACCEPTANCE
CONFERENCE
DECISION
LOGIN
ID user CT
SUBMISSION
PAPER
REVIEW
REVIEWREQUEST
Intuitively, mapping assertions involving such atoms are used to map source relations
(and the tuples they store), to concepts, roles, and features of the ontology (and the objects
and the values that constitute their instances), respectively. Note that for a feature
atom, the type of values retrieved from the source database is not specified, and needs
to be determined based on the data type of the variable $v_2$ in the source query $\varphi(\vec{x})$.
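
The effect of such mapping assertions can be illustrated with a toy example that turns rows of the PAPER table into ontology facts. This is a sketch only and does not reproduce the mapping syntax of any OBDA system: the object identifiers (`paper/1`, `conference/7`), the sample rows, and the function name are hypothetical, while the concept `Paper`, the role `submittedTo`, and the feature `title` come from the conceptual data schema of the running example.

```python
# Rows as they could be returned by a source query over the PAPER table.
PAPER_ROWS = [
    {"ID": 1, "title": "A paper", "conf": 7},
    {"ID": 2, "title": "Another paper", "conf": 7},
]


def apply_mapping(rows):
    """Produce concept, role, and feature facts from source tuples."""
    facts = set()
    for row in rows:
        # Objects are built from source values through fixed templates.
        paper = f"paper/{row['ID']}"
        conference = f"conference/{row['conf']}"
        facts.add(("Paper", paper))                    # concept atom
        facts.add(("submittedTo", paper, conference))  # role atom
        facts.add(("title", paper, row["title"]))      # feature atom
    return facts


facts = apply_mapping(PAPER_ROWS)
```

In an OBDA setting these facts are typically not materialised; queries over the ontology are instead rewritten into queries over the sources, but the correspondence between tuples and facts is the same.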

Actual Data
ACCEPTANCE
CONFERENCE
DECISION
LOGIN
ID user CT
SUBMISSION
PAPER
REVIEW
REVIEWREQUEST

Actual Data: Meaning?
ACCEPTANCE
CONFERENCE
DECISION
LOGIN
ID user CT
SUBMISSION
PAPER
REVIEW
REVIEWREQUEST

Ontology-Based
Data Access

Conference Example:
Conceptual Data Schema
[Figure: UML class diagram. Classes: Paper (title: String, type: String), with subclass DecidedPaper (decTime: ts, accepted: boolean); Person (pName: String, regTime: ts); Conference (cName: String, crTime: ts); Review (subTime: ts). Association classes between Paper and Person: Assignment (invTime: ts) and Submission (uploadTime: ts), the latter with subclasses CRUpload and Creation. Associations: submittedTo (Paper to Conference), notifiedBy (DecidedPaper to Person), chairs (Person to Conference), leadsTo (Assignment to Review).]
Fig. 9: Data model of our CONFSYS running example
N.B.: in onprom we use DL-Lite_A
(supports a controlled form of functionality)

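Features:
$\delta(\mathit{title}) \equiv \mathit{Paper}$, $\rho(\mathit{title}) \sqsubseteq \mathit{string}$, $(\mathsf{funct}\ \mathit{title})$
$\delta(\mathit{type}) \equiv \mathit{Paper}$, $\rho(\mathit{type}) \sqsubseteq \mathit{string}$, $(\mathsf{funct}\ \mathit{type})$
$\delta(\mathit{decTime}) \equiv \mathit{DecidedPaper}$, $\rho(\mathit{decTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{decTime})$
$\delta(\mathit{accepted}) \equiv \mathit{DecidedPaper}$, $\rho(\mathit{accepted}) \sqsubseteq \mathit{boolean}$, $(\mathsf{funct}\ \mathit{accepted})$
$\delta(\mathit{pName}) \equiv \mathit{Person}$, $\rho(\mathit{pName}) \sqsubseteq \mathit{string}$, $(\mathsf{funct}\ \mathit{pName})$
$\delta(\mathit{regTime}) \equiv \mathit{Person}$, $\rho(\mathit{regTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{regTime})$
$\delta(\mathit{cName}) \equiv \mathit{Conference}$, $\rho(\mathit{cName}) \sqsubseteq \mathit{string}$, $(\mathsf{funct}\ \mathit{cName})$
$\delta(\mathit{crTime}) \equiv \mathit{Conference}$, $\rho(\mathit{crTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{crTime})$
$\delta(\mathit{uploadTime}) \equiv \mathit{Submission}$, $\rho(\mathit{uploadTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{uploadTime})$
$\delta(\mathit{invTime}) \equiv \mathit{Assignment}$, $\rho(\mathit{invTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{invTime})$
$\delta(\mathit{subTime}) \equiv \mathit{Review}$, $\rho(\mathit{subTime}) \sqsubseteq \mathit{ts}$, $(\mathsf{funct}\ \mathit{subTime})$
Generalizations:
$\mathit{DecidedPaper} \sqsubseteq \mathit{Paper}$
$\mathit{Creation} \sqsubseteq \mathit{Submission}$
$\mathit{CRUpload} \sqsubseteq \mathit{Submission}$
Reified associations:
$\exists \mathit{Submission}_1 \equiv \mathit{Submission}$, $\exists \mathit{Submission}_1^- \equiv \mathit{Paper}$, $(\mathsf{funct}\ \mathit{Submission}_1)$
$\exists \mathit{Submission}_2 \equiv \mathit{Submission}$, $\exists \mathit{Submission}_2^- \sqsubseteq \mathit{Person}$, $(\mathsf{funct}\ \mathit{Submission}_2)$
$\exists \mathit{Assignment}_1 \equiv \mathit{Assignment}$, $\exists \mathit{Assignment}_1^- \sqsubseteq \mathit{Paper}$, $(\mathsf{funct}\ \mathit{Assignment}_1)$
$\exists \mathit{Assignment}_2 \equiv \mathit{Assignment}$, $\exists \mathit{Assignment}_2^- \sqsubseteq \mathit{Person}$, $(\mathsf{funct}\ \mathit{Assignment}_2)$
Binary associations:
$\exists \mathit{leadsTo} \sqsubseteq \mathit{Assignment}$, $\exists \mathit{leadsTo}^- \equiv \mathit{Review}$, $(\mathsf{funct}\ \mathit{leadsTo})$, $(\mathsf{funct}\ \mathit{leadsTo}^-)$
$\exists \mathit{submittedTo} \equiv \mathit{Paper}$, $\exists \mathit{submittedTo}^- \sqsubseteq \mathit{Conference}$, $(\mathsf{funct}\ \mathit{submittedTo})$
$\exists \mathit{notifiedBy} \equiv \mathit{DecidedPaper}$, $\exists \mathit{notifiedBy}^- \sqsubseteq \mathit{Person}$, $(\mathsf{funct}\ \mathit{notifiedBy})$
$\exists \mathit{chairs} \sqsubseteq \mathit{Person}$, $\exists \mathit{chairs}^- \equiv \mathit{Conference}$, $(\mathsf{funct}\ \mathit{chairs}^-)$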
Correctness of the Encoding. The encoding we have provided is faithful, in the sense that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram. Because reification may introduce symbols into the ontology alphabet beyond those used in the UML class diagram, the two specifications cannot have the same logical models. However, it is possible to show that the logical models of a UML class diagram and those of the DL-LiteA ontology derived from it correspond to each other, and hence that satisfiability of a class or association in the UML diagram corresponds to satisfiability of the corresponding concept or role [29,7].
Example 9. We illustrate the encoding of UML class diagrams in DL-LiteA on the

Mapping Example
133
Example 10. Consider the CONFSYS running example, and an information system whose db schema R consists of the eight relational tables shown in Figure […]. We give some examples of mapping assertions:
– The following mapping assertion explicitly populates the concept Creation. The term :submission/{oid} in the target part represents a URI template with one placeholder, {oid}, which gets replaced with the values for oid retrieved through the source query. This mapping expresses that each value in SUBMISSION identified by oid, and such that its upload time equals the corresponding paper's creation time, is mapped to an object :submission/oid, which becomes an instance of concept Creation in T.
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
AND SUBMISSION.UPLOADTIME = PAPER.CT
:submission/{oid} rdf:type :Creation .
– The following mapping assertion retrieves from the PAPER table instances of the concept Paper, and instantiates also their features title and type with values of type String.
SELECT ID, title, type
ACCEPTANCE
CONFERENCE
DECISION
LOGIN
ID user CT
SUBMISSION
PAPER
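The URI-template mechanism used in the first mapping assertion can be illustrated with a minimal Python sketch. It is not how an OBDA system such as ontop evaluates mappings internally; it only shows the intended effect of the target part. The helper names `instantiate` and `apply_mapping` and the sample oid values are hypothetical.

```python
import re

def instantiate(template: str, row: dict) -> str:
    """Replace each {placeholder} in a URI template with the row's value."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)

def apply_mapping(rows, subject_template, concept):
    """Yield one (subject, predicate, object) triple per source-query answer."""
    for row in rows:
        yield (instantiate(subject_template, row), "rdf:type", concept)

# Hypothetical answers of the source query (the oid values are made up).
rows = [{"oid": 17}, {"oid": 42}]
triples = list(apply_mapping(rows, ":submission/{oid}", ":Creation"))
for triple in triples:
    print(*triple)
```

Each answer of the source query yields exactly one object, so the set of Creation instances is fully determined by the SQL query and the template.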

Annotating the
Conceptual Data Schema
Fix perspective: declare the case
• Find the class whose instances are considered as case objects
• Express additional ﬁlters
Find the events (looking for timestamps)
• Find the classes whose instances refer to events
• Declare how they are connected to corresponding case objects
—> navigation in the UML class diagram
• Declare how they are (in)directly related to event attributes 
(timestamp, task name, optionally event type and resource) 
—> navigation in the UML class diagram
135

[Figure: the conceptual data schema of Fig. 9, annotated with case and event information.]
Case: Paper
Event Submission
• Timestamp: uploadTime
• Case: Submission1
Event Review
• Timestamp: subTime
• Case: leadsTo → Assignment1
Event Creation
• Case: Submission → Submission1
Event Decision
• Timestamp: decTime
• Case: Paper

Switching Perspective
Simply amounts to redefining the annotations
• Flow of accepted papers
• Flow of full papers
• Flow of reviews
• Flow of authors
• Flow of reviewers
• ….
141

Step 3
Get your log, automatically!

Formalizing Annotations
Annotations are nothing else than SPARQL queries over
the conceptual data schema!
• Case annotation: query retrieving case objects
• Event annotation: query retrieving event objects
• Case-attribute annotation: query retrieving pairs
<attribute, case>
• Event-attribute annotation: query retrieving pairs
<attribute, event>
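As a rough illustration of this formalisation, the sketch below represents annotations as plain data: a kind, the answer variables, and a SPARQL graph pattern. The `Annotation` class, the `ARITY` table and the well-formedness check are hypothetical and not part of onprom; the graph patterns follow the queries of the CONFSYS example, and the SELECT clause of the case query is an assumption.

```python
from dataclasses import dataclass

PREFIXES = (
    "PREFIX : <http://www.example.com/>\n"
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
)

# Number of answer variables each kind of annotation query returns.
ARITY = {"case": 1, "event": 1, "case-attribute": 2, "event-attribute": 2}

@dataclass(frozen=True)
class Annotation:
    kind: str            # one of the keys of ARITY
    answer_vars: tuple   # variable names without the leading '?'
    pattern: str         # SPARQL graph pattern over the conceptual data schema

    def to_sparql(self) -> str:
        head = " ".join("?" + v for v in self.answer_vars)
        return f"{PREFIXES}SELECT DISTINCT {head}\nWHERE {{\n{self.pattern}\n}}"

    def is_well_formed(self) -> bool:
        # Right number of answer variables, each one used in the pattern.
        return (len(self.answer_vars) == ARITY[self.kind]
                and all("?" + v in self.pattern for v in self.answer_vars))

case = Annotation("case", ("case",), "  ?case rdf:type :Paper .")
creation = Annotation("event", ("creationEvent",),
                      "  ?creationEvent rdf:type :Creation .")
creation_time = Annotation(
    "event-attribute", ("creationEvent", "creationTime"),
    "  ?creationEvent :Submission1 ?Paper .\n"
    "  ?creationEvent :uploadTime ?creationTime .")
annotations = [case, creation, creation_time]
print(creation_time.to_sparql())
```

Under this view, switching perspective only means replacing the entries of `annotations`; the conceptual data schema and the mappings stay untouched.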
143

[Figure: annotated UML class diagram (cropped).]
Annotated data model of our CONFSYS running example
[…] used to capture the relationship between the event and its cor[…] timestamp, and activity. As pointed out before, the timestamp anno[…] a functional navigation. This also applies to the activity annotation, […]
?case rdf:type :Paper .
}
which retrieves all instances of the Paper class.
Event annotations are also tackled using SPARQL SELECT queries with a single answer variable, this time matching with actual event identifiers, i.e., objects denoting occurrences of events.
Example 14. Consider the event annotation for creation, as shown in Figure 16. The actual events for this annotation are retrieved using the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent
WHERE {
?creationEvent rdf:type :Creation .
}
which in fact returns all instances of the Creation class.
Attribute annotations are formalised using SPARQL SELECT queries with two answer variables, establishing a relation between events and their corresponding attribute values. In this light, for timestamp and activity attribute annotations, the second answer variable will be substituted by corresponding values for timestamps/activity names. For case attribute annotations, instead, the second answer variable will be substituted by case objects, thus establishing a relationship between events and the case(s) they belong to.
Example 15. Consider again the annotation for creation events, as shown in Figure 16. The relationship between creation events and their corresponding timestamps is established by the following query:
SELECT DISTINCT ?creationEvent ?creationTime
WHERE {
?creationEvent :Submission1 ?Paper .
?creationEvent :uploadTime ?creationTime .
}
which indeed retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.

Annotations and XES
Elements
Annotations can be easily “mapped” onto XES elements: 
case annotation query —> traces 
event annotation query —> events 
attribute annotation query —> trace/event attributes with given key 
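A minimal sketch of this correspondence, using only Python's standard library, is given below. It assumes the answers of the annotation queries are already available as plain Python values; the function `build_xes`, the sample URIs and the timestamp are hypothetical. Using the case URI as the trace's `concept:name` is also an assumption, and the timestamp is written with the `date` element that XES provides for time values.

```python
import xml.etree.ElementTree as ET

def build_xes(cases, events, event_case, event_attrs):
    """Assemble an XES-like XML tree from annotation query answers."""
    log = ET.Element("log", {"xes.version": "1.0"})
    traces = {}
    for case in cases:                       # case annotation -> traces
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", key="concept:name", value=case)
        traces[case] = trace
    nodes = {}
    for event in events:                     # event annotation -> events
        case = event_case.get(event)
        if case in traces:                   # events without a known case are skipped
            nodes[event] = ET.SubElement(traces[case], "event")
    for event, xes_type, key, value in event_attrs:   # attribute annotations
        if event in nodes:
            ET.SubElement(nodes[event], xes_type, key=key, value=value)
    return log

# Hypothetical query answers.
cases = [":paper/1", ":paper/2"]
events = [":submission/10"]
event_case = {":submission/10": ":paper/1"}
event_attrs = [
    (":submission/10", "date", "time:timestamp", "2017-01-10T09:00:00"),
    (":submission/10", "string", "concept:name", "Creation"),
]
log = build_xes(cases, events, event_case, event_attrs)
xml_text = ET.tostring(log, encoding="unicode")
print(xml_text)
```

Note that `:paper/2` still produces a trace, although an empty one: traces come from the case annotation alone, independently of whether any event refers to them.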
[Figure: conceptual event schema. Classes: Trace, Event, Attribute (attKey: String, attType: String, attValue: String). Associations: t-contains-e (Trace to Event), t-has-a (Trace to Attribute), e-has-a (Event to Attribute).]

[Figure: the annotated data model and the queries of Examples 14 and 15, overlaid with the XES elements they produce.]
The event annotation query of Example 14 yields
XES events:
- id: ?creationEvent
The attribute annotation query of Example 15 yields
XES attribute:
- key: timestamp extension
- type: milliseconds
- value: ?creationTime
- parent event: ?creationEvent

Rewriting Annotations
Annotations are nothing else than SPARQL queries
over the conceptual data schema
They can be automatically reformulated as SQL
queries over the legacy data
We automatically get a standard OBDA mapping
from the legacy data to the XES concepts
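To make the reformulation step tangible, the sketch below runs the kind of SQL query that unfolding produces for the creation-event annotation against a toy database. It is an illustration only: it uses SQLite (so `||` replaces `CONCAT`), the table and column names follow the CONFSYS example, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Paper (ID INTEGER PRIMARY KEY, CT TEXT);
CREATE TABLE Submission (ID INTEGER PRIMARY KEY, Paper INTEGER, UploadTime TEXT);
INSERT INTO Paper VALUES (1, '2017-01-10 09:00'), (2, '2017-01-11 10:00');
INSERT INTO Submission VALUES
  (10, 1, '2017-01-10 09:00'),   -- upload time equals creation time of paper 1
  (11, 1, '2017-02-01 12:00'),   -- later upload: not a creation
  (12, 2, '2017-01-11 10:00');   -- upload time equals creation time of paper 2
""")

# SQL counterpart of the SPARQL event annotation for Creation (SQLite dialect).
UNFOLDED = """
SELECT DISTINCT
  'http://www.example.com/submission/' || Submission."ID" AS "creationEvent"
FROM Submission, Paper
WHERE Submission."Paper" = Paper."ID"
  AND Submission."UploadTime" = Paper."CT"
  AND Submission."ID" IS NOT NULL
ORDER BY 1
"""
creation_events = [row[0] for row in conn.execute(UNFOLDED)]
print(creation_events)
```

The SPARQL annotation never mentions tables; the join between Submission and Paper comes entirely from the mapping that defines Creation.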

[…] as an XES event log, and also to actually materialise such an event log.
Technically, onprom takes as input an onprom model $\mathcal{P}$ and an event schema $\mathcal{E}$, and produces a new OBDA system $\langle \mathcal{I}, \mathcal{M}^{\mathcal{E}}_{\mathcal{P}}, \mathcal{E} \rangle$, where the annotations in $\mathcal{L}$ are automatically reformulated as OBDA mappings $\mathcal{M}^{\mathcal{E}}_{\mathcal{P}}$ that directly l[…]. Such mappings are synthesised using the three-step approach described next.
In the first step, the SPARQL queries formalising the annotations in $\mathcal{L}$ are reformulated into corresponding SQL queries posed directly over $\mathcal{I}$. This is done by relying on standard query rewriting and unfolding, where each SPARQL query $q \in \mathcal{L}_q$ is rewritten considering the contribution of the conceptual data schema $\mathcal{T}$, and then unfolded using the mappings in $\mathcal{M}$. The resulting query $q_{sql}$ can then be posed directly over $\mathcal{I}$ so as to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all so-obtained SQL queries as $\mathcal{L}_{sql}$.
Example 16. Consider the SPARQL query in Example 14, formalising the event annotation that accounts for the creation of papers. A possible reformulation of the rewriting and unfolding of such a query, respectively using the conceptual data schema in Figure 9 and the mappings from Example 10, is the following SQL query:
SELECT DISTINCT
CONCAT('http://www.example.com/submission/',Submission."ID")
AS "creationEvent"
FROM Submission, Paper
WHERE Submission."Paper" = Paper."ID" AND
Submission."UploadTime" = Paper."CT" AND
Submission."ID" IS NOT NULL
This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also possibly compact and fast to process by a standard DBMS. One such optimisation is the application of […]
For each SQL query $q(c) \in \mathcal{L}_{sql}$ obtained from a case annotation, we insert into $\mathcal{M}^{\mathcal{E}}_{\mathcal{P}}$ the following OBDA mapping:
q(c)
:trace/{c} rdf:type :Trace .
Intuitively, such a mapping populates the concept Trace in $\mathcal{E}$ with the case objects that are created from the answers returned by query q(c).
For each SQL query $q(e) \in \mathcal{L}_{sql}$ that is obtained from an event annotation, we insert into $\mathcal{M}^{\mathcal{E}}_{\mathcal{P}}$ the following OBDA mapping:
q(e)
:event/{e} rdf:type :Event .
Intuitively, such a mapping populates the concept Event in $\mathcal{E}$ with the event objects that are created from the answers returned by query q(e).
Recap
149
D
(database)
R
(db schema)
conforms to
M
(mapping speciﬁcation)
T
(conceptual data schema)
L
(event-data annotations)
P (onprom model)
E
(conceptual event schema)
annotates
points to
$\mathcal{M}^{\mathcal{E}}_{\mathcal{P}}$
(log mapping specification)
I (information system)
B (OBDA model)

Querying the “Virtual Log”
SPARQL queries over the event schema are answered using legacy
data
• Example: get empty and nonempty traces; for nonempty traces, also fetch all
their events
Answers can be serialised into a fully compliant XES log!
The following query is instead meant to retrieve (elementary) attributes, considering in particular their key, type, and value.
PREFIX : <http://www.example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?att ?attType ?attKey ?attValue
WHERE {
  ?att rdf:type :Attribute ;
       :attType ?attType ;
       :attKey ?attKey ;
       :attVal ?attValue .
}
The following query handles the retrieval of empty and nonempty traces, simultaneously obtaining, for nonempty traces, their constitutive events:
PREFIX : <http://www.example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?trace ?event
WHERE {
  ?trace a :Trace .
  OPTIONAL {
    ?trace :t-contain-e ?event .
    ?event :e-contain-a ?timestamp .
    ?timestamp :attKey "time:timestamp"^^xsd:string .
    ?event :e-contain-a ?name .
    ?name :attKey "concept:name"^^xsd:string .
  }
}
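Because of the OPTIONAL block, a trace without events still appears in the answers, with ?event left unbound. The following sketch shows one way a client could regroup such answers into traces before serialising them; it is an assumption about post-processing, not a description of the onprom Log Extractor. Unbound variables are modelled as `None`, and the sample rows are invented.

```python
def group_traces(rows):
    """Group (trace, event) answers; an unbound event marks an empty trace."""
    traces = {}
    for trace, event in rows:
        events = traces.setdefault(trace, [])   # every trace gets an entry
        if event is not None and event not in events:
            events.append(event)
    return traces

# Hypothetical answers of the trace/event query (None = unbound ?event).
rows = [
    (":trace/1", ":event/10"),
    (":trace/1", ":event/11"),
    (":trace/2", None),
]
grouped = group_traces(rows)
for trace, events in grouped.items():
    print(trace, events)
```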
4.6 The onprom Toolchain
onprom comes with a toolchain that supports the various phases of the methodology

The onprom Toolchain
Implementation of all the described steps using
• Java (GUIs, algorithms)
• OWL 2 QL plus functionality (conceptual schemas)
• ontop (OBDA system)
• OpenXES (XES serialisation and manipulation)
• ProM process mining framework (environment)
151

onprom UML Editor
152
Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our

onprom Annotation Editor
153
Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case

onprom Log Extractor
154
Fig. 20: Screenshot of the Log Extractor plug-in in ProM 6.6.

Experiments
• Very encouraging initial experiments
• Carried out using synthetic data
• We are looking for real case studies!
155

Data Generation with CPN Tools
156

Results
[Plots (Postgres): running time (in milliseconds) against the number of extracted components in the XES log, and against the number of tuples in the whole database.]
~11 mins to extract ~9M XES components from ~3.5M tuples

Demo with Disco
[fluxicon.com]

Other tools: ProM 
[http://www.promtools.org]
• The most famous
academic initiative in
process mining
• Cutting-edge process
mining algorithms are
there
• Pluggable architecture
• Dozens of plug-ins

Other Tools: Celonis
[http://www.celonis.com]
Native Process Mining on top of SAP

Conclusions
• Process Mining as a way to reconcile model-driven
management and the real behaviours
• Data preparation is an issue in the presence of legacy
data
• Ontology-Based Data Access: solid theoretical
basis with optimised implementations
• onprom as an effective tool chain for extracting
event logs from legacy databases

Future Work
• Conceptual Modeling
• How to improve the discovery of events?
• How to semi-automatically propose events to the user?
• How to integrate methodologies and results from formal
ontology?
• Engineering
• How to handle different types of data?
• How to deal with different event schemas that go beyond
XES?
• How to generalise the approach to handle rich
ontology-to-ontology mappings?
