1. Diego Calvanese, Marco Montali, Tahir Emre Kalayci, Ario Santoso
KRDB Research Centre for Knowledge and Data
Faculty of Computer Science
Free University of Bozen-Bolzano
montali@inf.unibz.it
OBDA for Log Extraction in Process Mining
10. Agenda
1. Intro to process mining
2. The problem of data preparation in process mining
3. The onprom framework: OBDA for data preparation in process mining
4. Process mining demo
12. Disclaimer
• We will simplify to make the issues more apparent
• Criticism has to be seen as a positive force towards
improvement
13. The two realities
Reality 1: Managers and Analysts
Reality is studied, analyzed, and planned using different types of models.
Decision making to improve the overall organization.
14. The two realities
Reality 2: Daily workers
Reality is experienced directly.
Decision making to determine how to best handle the current situation.
15. Critical Dichotomy
Management of the organisation vs. daily work within the organisation
18. Model (Def.)
A simplifying mapping of reality to serve a
specific purpose
(Stachowiak: Allgemeine Modelltheorie, 1973)
• The model corresponds to the modelled object in
the sense that it faithfully reproduces some
fundamental aspects of such an object
19. Conceptual Modeling
The activity of formally describing some aspects of the
physical and social world around us for the purposes of
understanding and communication.
(John Mylopoulos, 1992)
20. Conceptual Models in Organisations
A model is an abstraction of reality according to a
certain conceptualization. Once represented as a
concrete artifact, a model can support
communication, learning and analysis about
relevant aspects of the underlying domain. [. . . ] a
represented model (a dusty diagram) created by
an unknown predecessor is a medium to preserve
and communicate a certain view of the world, and
can serve as a vehicle for reasoning and problem
solving, and for acquiring new knowledge (maybe
having striking new ideas!) about this view of the
world.
(Guizzardi, 2005)
24. …right? Conceptual Modeling Languages
Clarity: how easy the language can be
stakeholders).
• Graphical
• The langu
foundatio
• The more
difficult is
• Less expr
combinat
• Abstraction: remove unnecessary
25. Business Process
A set of logically related tasks performed to achieve a defined
business outcome for a particular customer or market.
(Davenport, 1992)
A collection of activities that take one or more kinds of input
and create an output that is of value to the customer.
(Hammer & Champy, 1993)
A set of activities performed in coordination in an
organizational and technical environment. These activities
jointly realize a business goal.
(Weske, 2011)
26. Business Process Management
A collection of
concepts, methods, and techniques
to support humans in
modeling, administration,
configuration, execution,
analysis, and continuous improvement
of business processes
27. Short History
• Smith (~1750): division of labour
• Taylor (~1911): scientific method
applied to organisations
• Hammer and Champy (~1990):
processes as the basis for
reengineering
• 2000s: business process
lifecycle, process-orientation
28. Value Chains, Business Functions, Tasks
Business Functions and Refinement into Activities
y of business
s follows the
ion abstraction.
iness functions
activities.
30. End-To-End, Reactive Behaviour
Order-to-cash, procure-to-pay, issue-to-resolution, …
[Figure: order-to-cash process from Input to Output. Receive order, Check availability, Article available? On yes: Ship article, Financial settlement, Payment received. On no: Procurement; on Late delivery: Inform customer, Customer informed; on Undeliverable: Inform customer, Remove article from catalogue, Article removed.]
31. Process Modelling Languages
[Figure: BPMN collaboration diagram of a flight booking with pools Customer, Travel Agency, and Airline. Tasks: Check travel agency web site, Check flight offer, Reject offer, Book and pay flight, Make flight offer, Prepare ticket. Events: Flight needed, request received, offer received, Offer rejected, Offer rejection received, Booking and payment received, Flight paid, Ticket received, Offer cancelled, Flight organised. Messages/data: Flight offer, Flight offer [rejected], Booking and payment, Ticket. Legend: Pool, Start event, Exclusive gateway, Message event, End event, Task, Event-based gateway, Data object.]
32. Process Modelling Languages
[Figure: the same flight booking process as an EPC/ARIS model. Events: Flight ticket needed, Flight offer requested, Flight offer sent to client, Offer accepted, Offer rejected, Offer canceled, Ticket prepared, Flight organised. Functions: Check travel agency website, Make flight offer, Check flight offer, Reject Offer, Book and pay flight, Cancel Offer, Prepare ticket, Send ticket to customer. Organization units: Customer, Travel Agency; further label: Airline issues ticket. Data: Website, Flight offer ([rejected], [paid], [cancelled]), Ticket. Goal: Ensure comfortable flight. Legend: Event, Function, Organization unit, Owner, Supporting system, Process path, Logical operation (XOR), Data, Goal.]
36. But There is More!
[Figure: the conference paper process with tasks create paper (author), submit paper (author), assign reviewer (chair), review paper (reviewer), submit review (reviewer), take decision (chair), a choice accept? leading to accept paper (chair) on Y or reject paper (chair) on N, and upload camera ready (author). Labels highlight what goes beyond control flow: case notion, transactional lifecycle, resources, data logic, decision logic.]
Fig. 2: The process for managing papers in a simplified conference submission system;
gray tasks are external to the conference information system and cannot be logged.
37. Case Object
The main subject of the process
• May be a concrete or abstract object
• An order, a claim, a paper, a request, …
• Contemporary process notations capture well only
processes with a single notion of case
• The case object is 1-1 with the start event
(paper submission → paper, order request → order)
• But in reality, multiple case objects typically coexist!
• Flow of papers vs flow of reviews, flow of customer
orders vs flow of packages containing order parts, …
38. Task Instances
• A process model represents abstract tasks
• The concrete execution of a task on a case object
results in a task instance
• The evolution of a task instance goes through multiple
events and transitions (durative tasks)
• This is regulated by a task transactional lifecycle
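A transactional lifecycle can be read as a small state machine over the states of one task instance. A minimal Python sketch, assuming a simplified lifecycle with hypothetical states init, scheduled, running, suspended, completed; the transition names echo those of the XES lifecycle extension, but this particular table is illustrative, not a standard model.

```python
# Hypothetical, simplified task transactional lifecycle:
# (current state, observed transition) -> next state.
LIFECYCLE = {
    ("init", "schedule"): "scheduled",
    ("scheduled", "start"): "running",
    ("running", "suspend"): "suspended",
    ("suspended", "resume"): "running",
    ("running", "complete"): "completed",
}


def replay_lifecycle(transitions):
    """Follow the observed transitions of one task instance.

    Returns the final state, or raises ValueError on the first
    transition that the lifecycle does not allow.
    """
    state = "init"
    for t in transitions:
        if (state, t) not in LIFECYCLE:
            raise ValueError(f"transition {t!r} not allowed in state {state!r}")
        state = LIFECYCLE[(state, t)]
    return state


if __name__ == "__main__":
    # One task instance observed through five events.
    print(replay_lifecycle(["schedule", "start", "suspend", "resume", "complete"]))
```

Each event in a log then corresponds to one transition of such a machine, which is why a single task instance may produce several events.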
39. Resources
Humans/devices responsible for the execution of
task instances
• Usually structured in an organisational model
defining roles, duties, capabilities, security levels,
…
[Figure: ARIS organisational structure]
40. Data Logic
Management of the master data of the company,
including case data and data produced/consumed by
processes
• Master data are persisted inside information systems
• Processes perform CRUD operations over such data
• Processes acquire data from the external
environment
41. Structural Models
• Represent the structure of the domain of interest
• Capture the relevant concepts, attributes, and relationships
• Lead to the logical schema of information systems
[Figures: conceptual data models, e.g., a UML class diagram and an ORM schema]
42. Decision Models
Encapsulate the decision logic that infers certain
conclusions from given input data
• This in turn determines how to route a case object in the
process
• May be implicitly embedded in the process, or represented
explicitly
[Figure: DMN decision table]
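To make the idea of a decision table concrete, here is a small sketch with a hypothetical table for routing a reviewed paper; the inputs (average score, number of reviews), the thresholds, and the outputs are invented for illustration. Rules are checked top to bottom and the first match wins, which corresponds to the "first" hit policy, one of several hit policies DMN defines.

```python
from typing import Callable, List, Tuple

# Hypothetical decision table: each rule is (condition on the inputs, output).
Rule = Tuple[Callable[[float, int], bool], str]

RULES: List[Rule] = [
    (lambda avg_score, n_reviews: n_reviews < 2, "assign more reviewers"),
    (lambda avg_score, n_reviews: avg_score >= 7.0, "accept"),
    (lambda avg_score, n_reviews: avg_score < 4.0, "reject"),
    (lambda avg_score, n_reviews: True, "discuss"),  # default rule
]


def decide(avg_score: float, n_reviews: int) -> str:
    """Evaluate the table with a 'first' hit policy: first matching rule wins."""
    for condition, output in RULES:
        if condition(avg_score, n_reviews):
            return output
    raise LookupError("no rule matched")


if __name__ == "__main__":
    print(decide(8.0, 3))  # accept
    print(decide(9.0, 1))  # assign more reviewers
```

The output would then select an outgoing branch of a gateway, which is how explicit decision logic determines the routing of the case object.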
43. What are Models Used For?
• Understanding and communication
• Documentation and audits
• Verification and simulation
• Basis for unambiguous contracts between a company
and its customers
• Basis of IT systems supporting the daily work within the
organisation
How to best combine models and support all
these tasks is a very active area of research!
46. Problem #1: Lack of Interaction
[Figure: lifecycle linking managers/analysts, (re)design, data models, configure/deploy, IT support, enact/monitor, reality, and (knowledge) workers; the figure carries two 50% labels and a question mark]
How to involve all
actors in the creation of
shared models? How to
share strategic goals?
47. Problem #1: Lack of Interaction
[Figure: the same lifecycle (managers/analysts, (re)design, models, configure/deploy, IT support, enact/monitor, reality, (knowledge) workers), with two 50% labels, a question mark, and data]
How to improve such
models using data?
48. Impasse!
• (Knowledge) workers: experience the real
organisation, but locally and subjectively
• Management: have a global view of the expected
organisation, not aligned with reality
• Key, open questions:
• How to reconcile these two worlds?
• How to connect models with reality? How to take
strategic decisions based on such connection?
• How to ensure that the organisation as a whole is
going in the right direction?
56. The Effect
• Processes are only partially encoded into IT
systems
• IT systems need to support a backdoor to
circumvent the encoded processes
• Otherwise, people will act “outside”, using the
system just to “record”
• No hope to improve: knowledge-intensive
processes cannot be automated in the classical
sense!
58. Process Model vs Instance
Business Process
• Tasks
• Data schema
Example: car assembly process
• Task: mount doors
• Data schema: Frame#, buyer ID, car color
Process Instance
• Task instances
• Data values
Example: assembly of car 123
• Task instance #54: mount doors on car 123
• Data values: Frame#: 123, buyer: Diego, car color: white
59. Processes Leave Digital Footprints
Within organisations: event data related to process
executions are continuously stored for
• Internal management
• Calculation of process metrics/KPIs
• Legal reasons (compliance, external audits)
In addition: internally and externally, more data are stored
somewhere
• We live in a digital society!
• Social networks, sensors, cyberphysical systems, mobile
devices are data loggers
60. Situation 1: Explicit Event Logs
Organisation equipped with process-aware information
systems
• Supporting humans in the execution of processes (task
assignments, to-do lists)
• Explicitly logging events, with info about:
- timestamp
- event type (start, end, reassign, …)
- reference task
- reference case
- task instance id
- responsible resource
- additional attributes
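A logged event with the attributes listed above can be represented as a plain record. The Python sketch below uses hypothetical field names; the helper task_durations shows why the event type and the task instance id matter: pairing the start and end events of the same task instance yields its duration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable


@dataclass
class LoggedEvent:
    """One logged event; field names are illustrative assumptions."""
    timestamp: datetime
    event_type: str          # e.g., "start", "end", "reassign"
    task: str                # reference task
    case_id: str             # reference case
    task_instance_id: str    # links events of the same task instance
    resource: str            # responsible resource
    attributes: Dict[str, Any] = field(default_factory=dict)


def task_durations(events: Iterable[LoggedEvent]) -> Dict[str, timedelta]:
    """Pair 'start' and 'end' events of each task instance (in time order)
    and return the elapsed time per task instance id."""
    open_starts: Dict[str, datetime] = {}
    durations: Dict[str, timedelta] = {}
    for e in sorted(events, key=lambda ev: ev.timestamp):
        if e.event_type == "start":
            open_starts[e.task_instance_id] = e.timestamp
        elif e.event_type == "end" and e.task_instance_id in open_starts:
            start = open_starts.pop(e.task_instance_id)
            durations[e.task_instance_id] = e.timestamp - start
    return durations
```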
61. Explicit Event Logs
Example 1. As a running example, we consider a simplified conference submission system, which we call CONFSYS. The main purpose of CONFSYS is to coordinate authors, reviewers, and conference chairs in the submission of papers to conferences, the consequent review process, and the final decision about paper acceptance or rejection. Figure 2 shows the process control flow considering papers as case objects. Under this perspective, the management of a single paper evolves through the following execution steps. First, the paper is created by one of its authors, and submitted to a conference available in the system. Once the paper is submitted, the review phase for that paper starts. This phase of the process consists of a so-called multi-instance section, i.e., a section of the process where the same set of activities is instantiated multiple times on the same paper, and then executed in parallel. In the case of CONFSYS, this section is instantiated for each reviewer selected by the conference chair for the paper, and consists of the following three activities: (i) a reviewer is assigned to the paper; (ii) the reviewer produces the review; (iii) the reviewer submits the review to CONFSYS. The multi-instance section is considered completed only when all its parallel instantiations …
Event Data
Case ID | Event ID | Timestamp | Activity | User | …
1 | 35654423 | 30-12-2010:11.02 | create paper | Pete | …
1 | 35654424 | 31-12-2010:10.06 | submit paper | Pete | …
1 | 35654425 | 05-01-2011:15.12 | assign review | Mike | …
1 | 35654426 | 06-01-2011:11.18 | submit review | Sara | …
1 | 35654428 | 07-01-2011:14.24 | accept paper | Mike | …
1 | 35654429 | 06-01-2011:11.18 | upload CR | Pete | …
2 | 35654483 | 30-12-2010:11.32 | create paper | George | …
2 | 35654485 | 30-12-2010:12.12 | submit paper | John | …
2 | 35654487 | 30-12-2010:14.16 | assign review | Mike | …
2 | 35654489 | 16-01-2011:10.30 | submit review | Ellen | …
2 | 35654490 | 18-01-2011:12.05 | reject paper | Mike | …
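Going from such a table to traces means grouping the rows by case and ordering each group by timestamp. A sketch over the rows above; reading 30-12-2010:11.02 as day-month-year:hour.minute is our interpretation of the notation.

```python
from collections import defaultdict
from datetime import datetime

# (case id, event id, timestamp, activity, user), as in the event data table
ROWS = [
    ("1", "35654423", "30-12-2010:11.02", "create paper", "Pete"),
    ("1", "35654424", "31-12-2010:10.06", "submit paper", "Pete"),
    ("1", "35654425", "05-01-2011:15.12", "assign review", "Mike"),
    ("1", "35654426", "06-01-2011:11.18", "submit review", "Sara"),
    ("1", "35654428", "07-01-2011:14.24", "accept paper", "Mike"),
    ("1", "35654429", "06-01-2011:11.18", "upload CR", "Pete"),
    ("2", "35654483", "30-12-2010:11.32", "create paper", "George"),
    ("2", "35654485", "30-12-2010:12.12", "submit paper", "John"),
    ("2", "35654487", "30-12-2010:14.16", "assign review", "Mike"),
    ("2", "35654489", "16-01-2011:10.30", "submit review", "Ellen"),
    ("2", "35654490", "18-01-2011:12.05", "reject paper", "Mike"),
]


def parse_ts(text):
    # "30-12-2010:11.02" -> day-month-year:hour.minute
    return datetime.strptime(text, "%d-%m-%Y:%H.%M")


def to_traces(rows):
    """Group events by case id and order each trace by timestamp.
    sorted() is stable, so events with equal timestamps keep table order."""
    grouped = defaultdict(list)
    for case_id, _event_id, ts, activity, _user in rows:
        grouped[case_id].append((parse_ts(ts), activity))
    return {case: [act for _, act in sorted(evs, key=lambda e: e[0])]
            for case, evs in grouped.items()}


if __name__ == "__main__":
    for case, trace in to_traces(ROWS).items():
        print(case, trace)
```

In case 1, upload CR carries the same timestamp as submit review, so ordering by time alone places it before accept paper. The table alone does not tell whether this is a recording error, which is exactly the kind of question log preparation has to face.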
62. Situation 2: Implicit Event Logs
Organisation equipped with generic enterprise information
systems
• CRM, ERP systems to handle customers and tasks
• Legacy information systems
• Domain-specific systems
Data are stored with different formats and according to
different domain-specific schemas
• No explicit events
• Data scattered in several data sources
64. Implicit Event Logs
ACCEPTANCE(ID, uploadtime, user, paper)
CONFERENCE(ID, name, organizer, time)
DECISION(ID, decisiontime, chair, outcome)
LOGIN(ID, user, CT)
SUBMISSION(ID, uploadtime, user, paper)
PAPER(ID, title, CT, user, conf, type, status)
REVIEW(ID, RRid, submissiontime)
REVIEWREQUEST(ID, invitationtime, reviewer, paper)
Fig. 11: DB schema for the information system of the conference submission system.
Primary keys are underlined and foreign keys are shown in italic
71. Alcohol and Fat
It’s a relief to know the truth after all those conflicting medical
studies.
The Japanese eat very little fat and suffer fewer heart attacks than
the British or Americans.
The French eat a lot of fat and also suffer fewer heart attacks than
the British or Americans.
The Japanese drink very little red wine and suffer fewer heart
attacks than the British or Americans.
The Italians drink excessive amounts of red wine and also suffer
fewer heart attacks than the British or Americans.
The Germans drink a lot of beer and eat lots of sausages and
fats and suffer fewer heart attacks than the British or Americans.
Conclusion: Eat and drink what you like. Speaking English is
apparently what kills you.
74. Result
Crompton (2008): domain experts lose (too
much) time finding data to operate and take
strategic decisions
• Engineers in the oil/gas industry: 30-70% of
working time spent on data crawling and data
quality
75. Models Enable Decision Making
Humans understand reality through models
• Data alone are meaningless
• Machine learning/deep learning techniques are
unable to expose their models: no human in the
decision making loop!
• Models not only for decision making, but also to
explain and guide
79. Process Mining: Data Science in Action
[See process mining manifesto]
[Fig. 1.4: Positioning of the three main types of process mining: discovery, conformance, and enhancement]
80. The Three Pillars of Process Mining
[Figure: event log and process model linked by Play-In (from event log to process model), Play-Out (from process model to event log), and Replay (event log and process model together), which yields an extended model showing times, frequencies, etc., diagnostics, predictions, and recommendations]
Play in
Play out
Replay
81. Play In
[Figure: process model for handling travel requests, from start to end, with activities register travel request (a), get support from local manager (b), get detailed motivation letter (c), check budget by finance (d), decide (e), reinitiate request (f), accept request (g), and reject request (h)]
Case | Activity | Timestamp | Resource
432 | register travel request (a) | 18-3-2014:9.15 | John
432 | get support from local manager (b) | 18-3-2014:9.25 | Mary
432 | check budget by finance (d) | 19-3-2014:8.55 | John
432 | decide (e) | 19-3-2014:9.36 | Sue
432 | accept request (g) | 19-3-2014:9.48 | Mary
82. Play-in: Discovery
Event logs implicitly contain the real
process!
Making it explicit gives:
•knowledge and understanding
•ground for discussion
•possibility to act by:
•correcting issues
•comparing with the designed
models (“should be” vs “as is”)
•evolving the models
•re-engineering the organisation
credits to W.M.P. van der Aalst
83. Discovery: Crash Course
• The main idea: look at the data from a
“process-oriented” perspective
Case ID = the process instance
Event
Start time
End time
88. Discovery: Idea
Case | Activity
1 | A
2 | A
1 | B
1 | C
3 | A
2 | C
3 | B
2 | B
1 | D
2 | D
2 | E
3 | C
3 | D
1 | E
3 | D
3 | E
[Figure: events grouped per case. Case 1: A, B, C, D, E. Case 2: A, C, B, D, E.]
89. Discovery: Idea
[Build of slide 88: same event table and traces, adding Case 3: A, B, C, D, D, E]
90. Discovery: Idea
[Build: same table and the three traces, now merged into a single model over the activities A, B, C, D, E]
91. Discovery: Idea
[Build: same table and traces; in the discovered model, A is followed by B and C (drawn side by side), then D, then E]
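A common building block for this kind of discovery is the directly-follows relation: how often activity x is immediately followed by activity y within the same case. The sketch below computes it for the case/activity table of these slides. It only counts pairs; discovery algorithms such as the alpha algorithm go further and, for instance, treat B and C as concurrent because each directly follows the other in some case.

```python
from collections import Counter

# (case, activity) rows in the order of the table in the slides
ROWS = [
    (1, "A"), (2, "A"), (1, "B"), (1, "C"), (3, "A"), (2, "C"),
    (3, "B"), (2, "B"), (1, "D"), (2, "D"), (2, "E"), (3, "C"),
    (3, "D"), (1, "E"), (3, "D"), (3, "E"),
]


def traces_from_rows(rows):
    """Rows are assumed to be in chronological order within each case."""
    traces = {}
    for case, activity in rows:
        traces.setdefault(case, []).append(activity)
    return traces


def directly_follows(traces):
    """Count how often x is immediately followed by y in some trace."""
    dfg = Counter()
    for trace in traces.values():
        for x, y in zip(trace, trace[1:]):
            dfg[(x, y)] += 1
    return dfg


if __name__ == "__main__":
    for pair, n in sorted(directly_follows(traces_from_rows(ROWS)).items()):
        print(pair, n)
```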
93. Discovery in a Tool: DISCO
Demo later
[Figure: an event log and the process discovered from it in DISCO]
94. Play Out In a Nutshell
[Figure: the travel-request process model and the event log excerpt for case 432, as in slide 81]
96. Replay in a Nutshell
credits to W.M.P. van der Aalst
[Figure: the travel-request process model and the event log excerpt for case 432, as in slide 81]
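Replay can be illustrated by stepping through a case and checking each move against the model. A deliberately simplified sketch: the model is encoded as a hypothetical map ALLOWED from each activity to the activities that may follow it, which only approximates the travel-request model in the figure (actual replay techniques work on Petri nets or similar models, e.g., token-based replay or alignments). Case 432 replays without deviations.

```python
# Hypothetical successor map approximating the travel-request model:
# a = register, b = local manager support, c = motivation letter,
# d = budget check, e = decide, f = reinitiate, g = accept, h = reject.
ALLOWED = {
    "start": {"a"},
    "a": {"b", "c", "d"},
    "b": {"d", "e"},
    "c": {"d", "e"},
    "d": {"b", "c", "e"},
    "e": {"f", "g", "h"},
    "f": {"b", "c", "d"},
    "g": {"end"},
    "h": {"end"},
}


def replay(trace, allowed):
    """Return the (previous, next) steps the model does not allow,
    including the final step into 'end'; an empty list means a fit."""
    deviations = []
    previous = "start"
    for activity in list(trace) + ["end"]:
        if activity not in allowed.get(previous, set()):
            deviations.append((previous, activity))
        previous = activity
    return deviations


if __name__ == "__main__":
    print(replay(["a", "b", "d", "e", "g"], ALLOWED))  # case 432: []
    print(replay(["a", "e", "g"], ALLOWED))            # [('a', 'e')]
```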
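113. Repair: Idea
[Figure: a discovered model (A; B and C side by side; D; E) compared with the traces of many cases (Case 1, Case 2, Case 3, Case 4, …, Case k), which repeatedly show the same deviating pattern (labels A, D, C, D, E)]
A common deviation:
maybe the model is wrong/
outdated!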
117. IEEE XES Standard
[www.xes-standard.org]
IEEE Standard for the representation of event logs
• Based on XML
• Minimal mandatory structure:
- a log consists of traces, each representing the history of a case
- a trace consists of a list of atomic events
• Extensions to “decorate” log, trace, event with informative
attributes: timestamps, task names, transactional lifecycle,
resources, additional event data
• Supports “meta-level” declarations useful for log processors
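A minimal sketch of writing traces as XES-style XML with Python's standard xml.etree.ElementTree. It uses the concept:name and time:timestamp keys of the XES concept and time extensions; version and namespace attributes, global attributes, and classifiers are omitted, so treat it as an illustration of the log/trace/event nesting rather than a validated XES writer.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def build_xes(traces):
    """traces: dict case_id -> list of (activity, datetime) pairs."""
    log = ET.Element("log")
    # Declare the extensions whose keys are used below.
    ET.SubElement(log, "extension", {
        "name": "Concept", "prefix": "concept",
        "uri": "http://www.xes-standard.org/concept.xesext"})
    ET.SubElement(log, "extension", {
        "name": "Time", "prefix": "time",
        "uri": "http://www.xes-standard.org/time.xesext"})
    for case_id, events in traces.items():
        trace = ET.SubElement(log, "trace")  # one trace per case
        ET.SubElement(trace, "string", {"key": "concept:name", "value": str(case_id)})
        for activity, ts in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": ts.isoformat()})
    return ET.tostring(log, encoding="unicode")


if __name__ == "__main__":
    print(build_xes({"432": [("register travel request (a)", datetime(2014, 3, 18, 9, 15))]}))
```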
119. Full XES Schema
[Figure: full XES meta-model as a UML class diagram. Classes: Log (logFeatures: String, logVersion: String), Trace, Event; Attribute (attKey: String, attType: String), specialised into ElementaryAttribute (attValue: String) and CompositeAttribute; Extension (extName: String, extPrefix: String, extUri: String); GlobalAttribute, specialised into GlobalTraceAttribute and GlobalEventAttribute; Classifier (name: String), specialised into EventClassifier and TraceClassifier. Several generalisations are marked {disjoint}. Associations: l-contains-t, t-contains-e, l-contains-e, l-has-a, t-has-a, e-has-a, a-contains-a, ca-contains-a, e-usedBy-a, e-usedBy-l, l-has-gea, l-has-gta, l-contains-ec, l-contains-tc, ec-definedBy-gea, tc-definedBy-gea.]
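120. Core XES Schema
OBDA for Log Extraction in Process Mining
[Fig. 13: Core event schema. Classes Trace, Event, and Attribute (attKey: String, attType: String, attValue: String), with associations t-contains-e (Trace to Event), t-has-a (Trace to Attribute), and e-has-a (Event to Attribute); multiplicities 0..* and 1..*.]
We now show how such a simple schema can be suitably encoded in DL-LiteA. To encode the core event schema of Figure 13, the three concept names Trace, Event, and Attribute are used. In addition, the role names e-has-a, t-has-a, and t-contains-e are used to capture the binary relations among such concepts.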
121. Quality of Logs
Level ★★★★★ (highest level): the event log is of excellent quality (i.e., trustworthy and complete) and events are well-defined. Events are recorded in an automatic, systematic, reliable, and safe manner. Privacy and security considerations are addressed adequately. Moreover, the events recorded (and all of their attributes) have clear semantics. This implies the existence of one or more ontologies. Events and their attributes point to this ontology.
Examples: semantically annotated logs of BPM systems.
Level ★★★★: Events are recorded automatically and in a systematic and reliable manner, i.e., logs are trustworthy and complete. Unlike the systems operating at level ★★★, notions such as process instance (case) and activity are supported in an explicit manner.
Examples: event logs of traditional BPM/workflow systems.
Level ★★★: Events are recorded automatically, but no systematic approach is followed to record events. However, unlike logs at level ★★, there is some level of guarantee that the events recorded match reality (i.e., the event log is trustworthy but not necessarily complete). Consider, for example, the events recorded by an ERP system. Although events need to be extracted from a variety of tables, the information can be assumed to be correct (e.g., it is safe to assume that a payment recorded by the ERP actually exists and vice versa).
Examples: tables in ERP systems, event logs of CRM systems, transaction logs of messaging systems, event logs of high-tech systems, etc.
Level ★★: Events are recorded automatically, i.e., as a by-product of some …
Examples: event logs of document and …
122. From Event Logs to XES
• Level 4-5: straightforward syntactic manipulation
• Level 3: much more difficult, due to
• Multiple data sources
• Interpretation of data
• Lack of explicit information about cases and
events
123. Traditional Extraction from Legacy Data
[Figure: Traditional methodology. Create data model → Choose perspective → Extract relevant tables → Design views with relevant attributes → Design composite views → Design log view → Export to XES/CSV → Do process mining → Other perspective? (Y/N)]
…, EBITmax converted the log view into a CSV file, and analysed it using … process mining toolkit.
• Manual construction of views and ETL procedures to fetch the data
• Done by IT experts, not by knowledge workers (domain experts)
• Crucial issues:
• Correctness: who knows?
Process mining is dangerous if applied to wrong data
• Maintenance, evolution, and change of perspective are hard…
But process mining should be highly interactive
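What a manually written log view can look like: one SELECT per kind of event, glued together with UNION ALL, with the paper as case. The sketch uses an in-memory SQLite database with three of the CONFSYS tables; column types, sample rows, ISO-formatted timestamps (so that text order equals time order), and activity names are assumptions.

```python
import csv
import io
import sqlite3

DDL = """
CREATE TABLE SUBMISSION (ID INTEGER PRIMARY KEY, uploadtime TEXT, "user" TEXT, paper INTEGER);
CREATE TABLE REVIEWREQUEST (ID INTEGER PRIMARY KEY, invitationtime TEXT, reviewer TEXT, paper INTEGER);
CREATE TABLE REVIEW (ID INTEGER PRIMARY KEY, RRid INTEGER, submissiontime TEXT);
"""

# Hand-written log view: one SELECT per kind of event, case = paper.
LOG_VIEW = """
SELECT case_id, activity, ts FROM (
    SELECT paper AS case_id, 'submit paper' AS activity, uploadtime AS ts
    FROM SUBMISSION
    UNION ALL
    SELECT paper, 'assign reviewer', invitationtime FROM REVIEWREQUEST
    UNION ALL
    -- reviews reach their paper only through the review request
    SELECT RR.paper, 'submit review', R.submissiontime
    FROM REVIEW AS R JOIN REVIEWREQUEST AS RR ON R.RRid = RR.ID
)
ORDER BY case_id, ts
"""


def export_log_csv(conn):
    """Run the log view and return its rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["case_id", "activity", "timestamp"])
    writer.writerows(conn.execute(LOG_VIEW))
    return out.getvalue()


def demo_connection():
    """Hypothetical sample data for one paper (ID 7)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO SUBMISSION VALUES (1, '2010-12-31T10:06', 'Pete', 7)")
    conn.execute("INSERT INTO REVIEWREQUEST VALUES (10, '2011-01-05T15:12', 'Sara', 7)")
    conn.execute("INSERT INTO REVIEW VALUES (100, 10, '2011-01-06T11:18')")
    return conn


if __name__ == "__main__":
    print(export_log_csv(demo_connection()))
```

Even this tiny view encodes interpretation choices, e.g., labelling every SUBMISSION row as submit paper although some uploads may be camera-ready versions; such choices are hard to validate and to change later.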
124. The onprom Approach
[http://onprom.inf.unibz.it]
34 D. Calvanese et al.
[Fig. 12: The onprom methodology and its four phases. Steps: high-level IS? (Y/N), leading either to Create conceptual data schema and Create mappings, or to Bootstrap model + mappings; Enrich model + mappings; Choose perspective; Create event-data annotations; Get XES/CSV; Do process mining; Other perspective? (Y/N)]
… the same time generating (identity) mappings to link the two specifications. The result
of bootstrapping can then be manually refined.
Once the first phase is completed, process analysts and the other involved stakeholders
no longer need to consider the structure of the legacy information system, …
Intelligent data management and conceptual modelling to:
1. Understand the data
2. Access the data using the domain vocabulary
3. Express the perspective for process mining using the domain vocabulary
4. Automatise the extraction of XES event logs
127. Information Structure
ACCEPTANCE(ID, uploadtime, user, paper)
CONFERENCE(ID, name, organizer, time)
DECISION(ID, decisiontime, chair, outcome)
LOGIN(ID, user, CT)
SUBMISSION(ID, uploadtime, user, paper)
PAPER(ID, title, CT, user, conf, type, status)
REVIEW(ID, RRid, submissiontime)
REVIEWREQUEST(ID, invitationtime, reviewer, paper)
Fig. 11: DB schema for the information system of the conference submission system.
Primary keys are underlined and foreign keys are shown in italic
Intuitively, mapping assertions involving such atoms are used to map source relations
(and the tuples they store), to concepts, roles, and features of the ontology (and the
objects and the values that constitute their instances), respectively. Note that for a feature
atom, the type of values retrieved from the source database is not specified, and needs
to be determined based on the data type of the variable v2 in the source query (x⃗).
128. Actual Data
129. Actual Data: Meaning?
131. Conference Example: Conceptual Data Schema
[Fig. 9: Data model of our CONFSYS running example (UML class diagram). Classes: Paper (title: String, type: String) with subclass DecidedPaper (decTime: ts, accepted: boolean); Person (pName: String, regTime: ts); Conference (cName: String, crTime: ts); Submission (uploadTime: ts) with subclasses Creation and CRUpload; Assignment (invTime: ts); Review (subTime: ts). Submission and Assignment relate Papers and Persons; further associations: leadsTo (Assignment to Review), submittedTo (Paper to Conference), notifiedBy (DecidedPaper to Person), chairs (Person to Conference).]
N.B.: in onprom we use DL-LiteA
(supports a controlled form of functionality)
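132.
$$
\begin{array}{lll}
\delta(\mathit{title}) \equiv \mathit{Paper} & \rho(\mathit{title}) \sqsubseteq \mathsf{string} & (\mathsf{funct}\ \mathit{title})\\
\delta(\mathit{type}) \equiv \mathit{Paper} & \rho(\mathit{type}) \sqsubseteq \mathsf{string} & (\mathsf{funct}\ \mathit{type})\\
\delta(\mathit{decTime}) \equiv \mathit{DecidedPaper} & \rho(\mathit{decTime}) \sqsubseteq \mathsf{ts} & (\mathsf{funct}\ \mathit{decTime})\\
\delta(\mathit{accepted}) \equiv \mathit{DecidedPaper} & \rho(\mathit{accepted}) \sqsubseteq \mathsf{boolean} & (\mathsf{funct}\ \mathit{accepted})\\
\delta(\mathit{pName}) \equiv \mathit{Person} & \rho(\mathit{pName}) \sqsubseteq \mathsf{string} & (\mathsf{funct}\ \mathit{pName})\\
\delta(\mathit{regTime}) \equiv \mathit{Person} & \rho(\mathit{regTime}) \sqsubseteq \mathsf{ts} & (\mathsf{funct}\ \mathit{regTime})\\
\delta(\mathit{cName}) \equiv \mathit{Conference} & \rho(\mathit{cName}) \sqsubseteq \mathsf{string} & (\mathsf{funct}\ \mathit{cName})\\
\delta(\mathit{crTime}) \equiv \mathit{Conference} & \rho(\mathit{crTime}) \sqsubseteq \mathsf{ts} & (\mathsf{funct}\ \mathit{crTime})\\
\delta(\mathit{uploadTime}) \equiv \mathit{Submission} & \rho(\mathit{uploadTime}) \sqsubseteq \mathsf{ts} & (\mathsf{funct}\ \mathit{uploadTime})\\
\delta(\mathit{invTime}) \equiv \mathit{Assignment} & \rho(\mathit{invTime}) \sqsubseteq \mathsf{ts} & (\mathsf{funct}\ \mathit{invTime})\\
\delta(\mathit{subTime}) \equiv \mathit{Review} & \rho(\mathit{subTime}) \sqsubseteq \mathsf{ts} & (\mathsf{funct}\ \mathit{subTime})\\[4pt]
\mathit{DecidedPaper} \sqsubseteq \mathit{Paper} & \mathit{Creation} \sqsubseteq \mathit{Submission} & \mathit{CRUpload} \sqsubseteq \mathit{Submission}\\[4pt]
\exists \mathit{Submission1} \equiv \mathit{Submission} & \exists \mathit{Submission1}^- \equiv \mathit{Paper} & (\mathsf{funct}\ \mathit{Submission1})\\
\exists \mathit{Submission2} \equiv \mathit{Submission} & \exists \mathit{Submission2}^- \sqsubseteq \mathit{Person} & (\mathsf{funct}\ \mathit{Submission2})\\
\exists \mathit{Assignment1} \equiv \mathit{Assignment} & \exists \mathit{Assignment1}^- \sqsubseteq \mathit{Paper} & (\mathsf{funct}\ \mathit{Assignment1})\\
\exists \mathit{Assignment2} \equiv \mathit{Assignment} & \exists \mathit{Assignment2}^- \sqsubseteq \mathit{Person} & (\mathsf{funct}\ \mathit{Assignment2})\\
\exists \mathit{leadsTo} \sqsubseteq \mathit{Assignment} & \exists \mathit{leadsTo}^- \equiv \mathit{Review} & (\mathsf{funct}\ \mathit{leadsTo}),\ (\mathsf{funct}\ \mathit{leadsTo}^-)\\
\exists \mathit{submittedTo} \equiv \mathit{Paper} & \exists \mathit{submittedTo}^- \sqsubseteq \mathit{Conference} & (\mathsf{funct}\ \mathit{submittedTo})\\
\exists \mathit{notifiedBy} \equiv \mathit{DecidedPaper} & \exists \mathit{notifiedBy}^- \sqsubseteq \mathit{Person} & (\mathsf{funct}\ \mathit{notifiedBy})\\
\exists \mathit{chairs} \sqsubseteq \mathit{Person} & \exists \mathit{chairs}^- \equiv \mathit{Conference} & (\mathsf{funct}\ \mathit{chairs}^-)
\end{array}
$$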
Correctness of the Encoding. The encoding we have provided is faithful, in the sense
that it fully preserves in the DL-LiteA ontology the semantics of the UML class diagram.
Obviously, since, due to reification, the ontology alphabet may contain additional symbols
with respect to those used in the UML class diagram, the two specifications cannot
have the same logical models. However, it is possible to show that the logical models
of a UML class diagram and those of the DL-LiteA ontology derived from it correspond
to each other, and hence that satisfiability of a class or association in the UML diagram
corresponds to satisfiability of the corresponding concept or role [29,7].
Example 9. We illustrate the encoding of UML class diagrams in DL-LiteA on the
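Two kinds of assertions in the encoding behave quite differently on data. A functionality assertion such as (funct title) is violated when an object has two distinct titles (under the unique name assumption adopted in DL-LiteA), while a domain inclusion for title flags nothing: under open-world semantics it licenses inferring that the subject is a Paper. A rough Python sketch over explicit, hypothetical facts (an OBDA system does not materialise data like this; it reasons at the level of queries):

```python
from collections import defaultdict


def check_functional(facts):
    """facts: (subject, value) pairs for one attribute or role.

    Returns the subjects with two or more distinct values, i.e., the
    violations of (funct P) under the unique name assumption.
    """
    values = defaultdict(set)
    for subject, value in facts:
        values[subject].add(value)
    return sorted(s for s, vs in values.items() if len(vs) > 1)


def infer_domain(facts, asserted_instances):
    """Domain inclusion delta(P) <= C: every subject of P is a C.

    Returns the asserted instances of C plus those inferred from P.
    A C without any P-value is NOT reported: under open-world
    semantics its value is simply unknown.
    """
    return set(asserted_instances) | {subject for subject, _ in facts}


if __name__ == "__main__":
    # Hypothetical title facts: p2 has two titles, x9 is not asserted to be a Paper.
    titles = [("p1", "OBDA"), ("p2", "Logs"), ("p2", "Logs v2"), ("x9", "Untitled")]
    print(check_functional(titles))                    # ['p2']
    print(sorted(infer_domain(titles, {"p1", "p3"})))  # ['p1', 'p2', 'p3', 'x9']
```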
133. Mapping Example
Example 10. Consider the CONFSYS running example, and an information system
whose db schema R consists of the eight relational tables shown in Figure 11. We
give some examples of mapping assertions:
– The following mapping assertion explicitly populates the concept Creation. The
term :submission/{oid} in the target part represents a URI template with
one placeholder, {oid}, which gets replaced with the values for oid retrieved
through the source query. This mapping expresses that each value in SUBMISSION
identified by oid and such that its upload time equals the corresponding paper
creation time, is mapped to an object :submission/oid, which becomes an
instance of concept Creation in T.
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
AND SUBMISSION.UPLOADTIME = PAPER.CT
:submission/{oid} rdf:type :Creation .
– The following mapping assertion retrieves from the PAPER table instances of the
concept Paper, and also instantiates their features title and type with values of type
String.
SELECT ID, title, type
FROM PAPER
…
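The effect of the first mapping assertion can be reproduced by running its source query and instantiating the URI template with each answer. A sketch on an in-memory SQLite database with hypothetical rows; materialising triples is only for illustration, since an OBDA system typically keeps mappings virtual and rewrites ontology queries into SQL.

```python
import sqlite3

SOURCE_QUERY = """
SELECT DISTINCT SUBMISSION.ID AS oid
FROM SUBMISSION, PAPER
WHERE SUBMISSION.PAPER = PAPER.ID
  AND SUBMISSION.UPLOADTIME = PAPER.CT
"""
TARGET_TEMPLATE = ":submission/{oid}"  # URI template with one placeholder


def apply_mapping(conn):
    """Return the (subject, 'rdf:type', ':Creation') triples the mapping yields."""
    triples = []
    for (oid,) in conn.execute(SOURCE_QUERY):
        subject = TARGET_TEMPLATE.format(oid=oid)
        triples.append((subject, "rdf:type", ":Creation"))
    return sorted(triples)


def demo_connection():
    """Hypothetical data: one paper with two uploads."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE PAPER (ID INTEGER, title TEXT, CT TEXT, "user" TEXT,
                        conf INTEGER, type TEXT, status TEXT);
    CREATE TABLE SUBMISSION (ID INTEGER, UPLOADTIME TEXT, "user" TEXT, PAPER INTEGER);
    INSERT INTO PAPER VALUES (7, 'OBDA for logs', '2010-12-30T11:02', 'Pete', 1, 'full', 'submitted');
    -- upload at paper creation time: becomes a Creation
    INSERT INTO SUBMISSION VALUES (1, '2010-12-30T11:02', 'Pete', 7);
    -- later upload (e.g., camera ready): not a Creation
    INSERT INTO SUBMISSION VALUES (2, '2011-01-10T09:00', 'Pete', 7);
    """)
    return conn


if __name__ == "__main__":
    print(apply_mapping(demo_connection()))
```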
135. Annotating the Conceptual Data Schema
Fix perspective: declare the case
• Find the class whose instances are considered as case objects
• Express additional filters
Find the events (looking for timestamps)
• Find the classes whose instances refer to events
• Declare how they are connected to corresponding case objects
→ navigation in the UML class diagram
• Declare how they are (in)directly related to event attributes
(timestamp, task name, optionally event type and resource)
→ navigation in the UML class diagram
136.
[Figure: the CONFSYS data model of Fig. 9, and the same model annotated for process mining. Case annotation: Paper. Event annotations: Submission (timestamp: uploadTime; case: Submission1); Creation (timestamp: uploadTime; case: Submission → Submission1); Review (timestamp: subTime; case: leadsTo⁻ → Assignment1); Decision (timestamp: decTime; case: Paper).]
141. Switching Perspective
Simply amounts to redefining the annotations:
• Flow of accepted papers
• Flow of full papers
• Flow of reviews
• Flow of authors
• Flow of reviewers
• ….
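The effect of switching perspective can be illustrated on already-extracted events: the same events yield different traces depending on which case notion groups them. The Python sketch below is a deliberate simplification, assuming flat event records that already carry both a paper and a person key; in onprom the switch is made by redefining the annotation queries, not by regrouping records. The events and the helper name traces_by are hypothetical.

```python
from collections import defaultdict

def traces_by(events, case_key):
    """Group events into traces using case_key as the case notion,
    ordering each trace by timestamp."""
    traces = defaultdict(list)
    for ev in sorted(events, key=lambda ev: ev["ts"]):
        traces[ev[case_key]].append(ev["activity"])
    return dict(traces)

# Hypothetical events of the CONFSYS example.
EVENTS = [
    {"activity": "Submission", "paper": "p1", "person": "alice", "ts": 1},
    {"activity": "Submission", "paper": "p2", "person": "bob", "ts": 2},
    {"activity": "Review", "paper": "p1", "person": "carol", "ts": 3},
    {"activity": "Review", "paper": "p2", "person": "carol", "ts": 4},
    {"activity": "Decision", "paper": "p1", "person": "dave", "ts": 5},
]

by_paper = traces_by(EVENTS, "paper")    # flow of papers
by_person = traces_by(EVENTS, "person")  # flow of authors/reviewers
```

Grouping by paper gives one trace per paper, while grouping by person gives, for instance, a trace with two reviews for carol.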
143. Formalizing Annotations
Annotations are nothing more than SPARQL queries over
the conceptual data schema!
• Case annotation: query retrieving case objects
• Event annotation: query retrieving event objects
• Case-attribute annotation: query retrieving pairs
<attribute, case>
• Event-attribute annotation: query retrieving pairs
<attribute, event>
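To see how such annotation queries retrieve cases, events, and attribute pairs, the Python sketch below evaluates SPARQL-like basic graph patterns (conjunctions of triple patterns, variables prefixed with ?) over a small set of triples. It is a naive stand-in, not a SPARQL engine: it does no reasoning over the conceptual data schema, so :sub11, typed only as :Submission, is never returned as a Creation. The triples and helper names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")

def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def select_distinct(variables, patterns, triples):
    """Evaluate a conjunction of triple patterns; project on variables."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = unify(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return {tuple(b[v] for v in variables) for b in bindings}

# Hypothetical instance data over the conceptual data schema.
TRIPLES = {
    (":paper1", "rdf:type", ":Paper"),
    (":sub10", "rdf:type", ":Creation"),
    (":sub10", ":Submission1", ":paper1"),
    (":sub10", ":uploadTime", "2017-01-10"),
    (":sub11", "rdf:type", ":Submission"),
    (":sub11", ":uploadTime", "2017-02-01"),
}

cases = select_distinct(["?case"], [("?case", "rdf:type", ":Paper")], TRIPLES)
events = select_distinct(["?e"], [("?e", "rdf:type", ":Creation")], TRIPLES)
timestamps = select_distinct(
    ["?e", "?t"],
    [("?e", "rdf:type", ":Creation"), ("?e", ":uploadTime", "?t")],
    TRIPLES)
event_case = select_distinct(
    ["?e", "?c"],
    [("?e", "rdf:type", ":Creation"), ("?e", ":Submission1", "?c")],
    TRIPLES)
```

The four calls mirror the four kinds of annotation: single-variable queries for cases and events, two-variable queries for attribute and event-to-case pairs.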
144.
[Figure: Annotated data model of our CONFSYS running example, highlighting the event annotation for Creation (timestamp: uploadTime; case: via Submission and Submission1).]
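Attribute annotations are used to capture the relationship between the event and its corresponding case, timestamp, and activity. Additional optional attribute annotations can target the standard extensions provided by XES, such as the transactional lifecycle and the resource name and/or role.
The case annotation query of Example 13 matches the pattern ?case rdf:type :Paper, and hence retrieves all instances of the Paper class.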
Event annotations are also tackled using SPARQL SELECT queries with a single answer variable, this time matching actual event identifiers, i.e., objects denoting occurrences of events.
Example 14. Consider the event annotation for creation, as shown in Figure 16. The actual events for this annotation are retrieved using the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent
WHERE {
?creationEvent rdf:type :Creation .
}
which returns all instances of the Creation class.
Attribute annotations are formalised using SPARQL SELECT queries with two answer variables, establishing a relation between events and their corresponding attribute values. For timestamp and activity attribute annotations, the second answer variable is bound to the corresponding timestamps/activity names. For case attribute annotations, instead, the second answer variable is bound to case objects, thus establishing a relationship between events and the case(s) they belong to.
Example 15. Consider again the annotation for creation events, as shown in Figure 16. The relationship between creation events and their corresponding timestamps is established by the following query:
PREFIX : <http://www.example.com/>
SELECT DISTINCT ?creationEvent ?creationTime
WHERE {
?creationEvent rdf:type :Creation .
?creationEvent :Submission1 ?Paper .
?creationEvent :uploadTime ?creationTime .
}
which retrieves all instances of Creation, together with the corresponding values taken by the uploadTime attribute.
145. Annotations and XES Elements
Annotations can be easily “mapped” onto XES elements:
case annotation query —> traces
event annotation query —> events
attribute annotation query —> trace/event attributes with given key
[Figure: conceptual event schema for XES: classes Trace, Event, and Attribute (attKey: String, attType: String, attValue: String), with associations t-contains-e (Trace to Event), e-has-a (Event to Attribute), and t-has-a (Trace to Attribute).]
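The correspondence above can be sketched by turning annotation answers into an XES document: one trace per case answer, one event per event answer placed under its case, and one typed attribute element per attribute answer. The Python sketch below uses the standard xml.etree.ElementTree module; the input dictionaries, the helper name build_xes, and the choice of the XES date element for the timestamp are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def build_xes(case_ids, event_case, event_attrs):
    """case_ids: answers of the case annotation query (one trace each);
    event_case: event -> case pairs (event-to-case annotation);
    event_attrs: event -> list of (key, XES type, value) triples."""
    log = ET.Element("log", {"xes.version": "1.0"})
    traces = {}
    for case in sorted(case_ids):
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case})
        traces[case] = trace
    for event, case in sorted(event_case.items()):
        if case not in traces:
            continue  # event not attached to any retrieved case
        ev = ET.SubElement(traces[case], "event")
        for key, xes_type, value in event_attrs.get(event, []):
            ET.SubElement(ev, xes_type, {"key": key, "value": value})
    return log

# Hypothetical annotation answers.
log = build_xes(
    {"paper1", "paper2"},
    {"sub10": "paper1"},
    {"sub10": [("concept:name", "string", "Creation"),
               ("time:timestamp", "date", "2017-01-10T00:00:00")]},
)
print(ET.tostring(log, encoding="unicode"))
```

Case paper2 has no events, so it becomes an empty trace, which XES allows.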
146.
[Figure: the annotated CONFSYS data model and the annotation queries of Examples 14 and 15, each linked to the XES element it populates.]
XES events:
- id: ?creationEvent
XES attribute:
- key: timestamp extension
- type: milliseconds
- value: ?creationTime
- parent event: ?creationEvent
147. Rewriting Annotations
Annotations are nothing more than SPARQL queries
over the conceptual data schema
They can be automatically reformulated as SQL
queries over the legacy data
We automatically get a standard OBDA mapping
from the legacy data to the XES concepts
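Rewriting and unfolding can be illustrated on a class query such as ?x rdf:type :Submission. Rewriting adds the subclasses whose instances must also be returned; unfolding replaces each class atom with the SQL source of its mapping, building IRIs as in the ontop-generated query of Example 16 (SQLite uses || where that query uses CONCAT). The Python sketch below handles only single-class queries. The axiom that Creation is a subclass of Submission, the Submission mapping, the sample rows, and the helper names rewrite and unfold are hypothetical; ontop's actual algorithm is far more general and optimised.

```python
import sqlite3

IRI = "http://www.example.com/submission/"
# Hypothetical mapping assertions: class -> SQL source projecting oid.
MAPPINGS = {
    ":Submission": "SELECT ID AS oid FROM SUBMISSION",
    ":Creation": ("SELECT SUBMISSION.ID AS oid FROM SUBMISSION, PAPER "
                  "WHERE SUBMISSION.PAPER = PAPER.ID "
                  "AND SUBMISSION.UPLOADTIME = PAPER.CT"),
}
# Hypothetical schema axioms: subclass -> superclass.
SUBCLASS_OF = {":Creation": ":Submission"}

def rewrite(cls):
    """Rewriting: a query for cls must also return all subclass instances."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return sorted(result)

def unfold(cls):
    """Unfolding: replace each class atom by the SQL source of its mapping."""
    parts = [f"SELECT '{IRI}' || src.oid AS x FROM ({MAPPINGS[c]}) AS src"
             for c in rewrite(cls) if c in MAPPINGS]
    if not parts:
        raise ValueError(f"no mapping for {cls} or its subclasses")
    return "SELECT DISTINCT x FROM (" + " UNION ".join(parts) + ")"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PAPER (ID INTEGER, CT TEXT);
    CREATE TABLE SUBMISSION (ID INTEGER, UPLOADTIME TEXT, PAPER INTEGER);
    INSERT INTO PAPER VALUES (1, '2017-01-10');
    INSERT INTO SUBMISSION VALUES (10, '2017-01-10', 1);
    INSERT INTO SUBMISSION VALUES (11, '2017-02-01', 1);
""")
creation_iris = {row[0] for row in conn.execute(unfold(":Creation"))}
submission_iris = {row[0] for row in conn.execute(unfold(":Submission"))}
```

The resulting SQL runs directly over the legacy tables, which is the point of the reformulation: no triples are materialised.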
148.
Technically, onprom takes as input an onprom model $\mathcal{P} = \langle \mathcal{I}, \mathcal{T}, \mathcal{M}, \mathcal{L} \rangle$ and the conceptual event schema $\mathcal{E}$, and produces a new OBDA system $\langle \mathcal{I}, \mathcal{M}^{\mathcal{E}}_{\mathcal{P}}, \mathcal{E} \rangle$, where the annotations in $\mathcal{L}$ are automatically reformulated as OBDA mappings $\mathcal{M}^{\mathcal{E}}_{\mathcal{P}}$ that directly link $\mathcal{I}$ to $\mathcal{E}$. Such mappings are synthesised using a three-step approach.
In the first step, the SPARQL queries formalising the annotations in $\mathcal{L}$ are reformulated into corresponding SQL queries posed directly over $\mathcal{I}$. This is done by relying on standard query rewriting and unfolding: each SPARQL query $q \in \mathcal{L}_q$ is rewritten considering the contribution of the conceptual data schema $\mathcal{T}$, and then unfolded using the mappings in $\mathcal{M}$. The resulting query $q_{sql}$ can then be posed directly over $\mathcal{I}$ so as to retrieve the data associated to the corresponding annotation. In the following, we denote the set of all SQL queries obtained in this way as $\mathcal{L}_{sql}$.
Example 16. Consider the SPARQL query in Example 14, formalising the event annotation that accounts for the creation of papers. A possible result of rewriting this query using the conceptual data schema in Figure 9, and then unfolding it using the mappings from Example 10, is the following SQL query:
SELECT DISTINCT
CONCAT('http://www.example.com/submission/',Submission."ID")
AS "creationEvent"
FROM Submission, Paper
WHERE Submission."Paper" = Paper."ID" AND
Submission."UploadTime" = Paper."CT" AND
Submission."ID" IS NOT NULL
This query is generated by the ontop OBDA system, which applies various optimisations so as to obtain a final SQL query that is not only correct, but also possibly compact and fast to process by a standard DBMS. One such optimisation is the application of […]
For each SQL query $q(c) \in \mathcal{L}_{sql}$ obtained from a case annotation, we insert into $\mathcal{M}^{\mathcal{E}}_{\mathcal{P}}$ the following OBDA mapping:
q(c)
:trace/{c} rdf:type :Trace .
Intuitively, such a mapping populates the concept Trace in $\mathcal{E}$ with the case objects that are created from the answers returned by query q(c).
For each SQL query $q(e) \in \mathcal{L}_{sql}$ that is obtained from an event annotation, we insert into $\mathcal{M}^{\mathcal{E}}_{\mathcal{P}}$ the following OBDA mapping:
q(e)
:event/{e} rdf:type :Event .
Intuitively, such a mapping populates the concept Event in $\mathcal{E}$ with the event objects that are created from the answers returned by query q(e).
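The synthesis of the log mapping can be sketched in a few lines: every SQL query obtained from a case annotation is paired with the Trace template, every query obtained from an event annotation with the Event template, and evaluating the pairs yields the corresponding instances. The Python sketch below assumes the case and event queries expose their answers under the column aliases c and e; the sample data and the helper names synthesise_log_mapping and materialise are hypothetical, and the event query returns raw IDs instead of building IRIs in SQL as Example 16 does.

```python
import sqlite3

def synthesise_log_mapping(case_queries, event_queries):
    """Pair each SQL query with the target template for its annotation kind."""
    mapping = [(q, ":trace/{c} rdf:type :Trace .") for q in case_queries]
    mapping += [(q, ":event/{e} rdf:type :Event .") for q in event_queries]
    return mapping

def materialise(conn, mapping):
    """Instantiate each target template once per answer of its SQL query."""
    triples = set()
    for sql, template in mapping:
        cur = conn.execute(sql)
        names = [d[0] for d in cur.description]
        for row in cur:
            triples.add(template.format(**dict(zip(names, row))))
    return triples

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PAPER (ID INTEGER, CT TEXT);
    CREATE TABLE SUBMISSION (ID INTEGER, UPLOADTIME TEXT, PAPER INTEGER);
    INSERT INTO PAPER VALUES (1, '2017-01-10');
    INSERT INTO SUBMISSION VALUES (10, '2017-01-10', 1);
    INSERT INTO SUBMISSION VALUES (11, '2017-02-01', 1);
""")
case_sql = "SELECT ID AS c FROM PAPER"
event_sql = ("SELECT DISTINCT SUBMISSION.ID AS e FROM SUBMISSION, PAPER "
             "WHERE SUBMISSION.PAPER = PAPER.ID "
             "AND SUBMISSION.UPLOADTIME = PAPER.CT "
             "AND SUBMISSION.ID IS NOT NULL")
log_mapping = synthesise_log_mapping([case_sql], [event_sql])
log_triples = materialise(conn, log_mapping)
```

In onprom the synthesised mappings are handed to ontop instead, so that the log can be queried virtually before (optionally) being materialised.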
149. Recap
[Figure (OBDA for Log Extraction in Process Mining, p. 37): the components of the onprom framework: the information system I (a database D conforming to the db schema R), the mapping specification M, the conceptual data schema T, the event-data annotations L, the conceptual event schema E, and the log mapping specification M^E_P, grouped into the OBDA model B and the onprom model P.]
150. Querying the “Virtual Log”
SPARQL queries over the event schema are answered using legacy
data
• Example: get empty and nonempty traces; for nonempty traces, also fetch all
their events
Answers can be serialised into a fully compliant XES log!
The following query is instead meant to retrieve (elementary) attributes, considering
in particular their key, type, and value.
PREFIX : <http://www.example.org/>
SELECT DISTINCT ?att ?attType ?attKey ?attValue
WHERE {
?att rdf:type :Attribute;
:attType ?attType;
:attKey ?attKey;
:attVal ?attValue.
}
The following query handles the retrieval of empty and nonempty traces, simultaneously
obtaining, for nonempty traces, their constitutive events:
PREFIX : <http://www.example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?trace ?event
WHERE {
?trace a :Trace .
OPTIONAL {
?trace :t-contain-e ?event .
?event :e-contain-a ?timestamp .
?timestamp :attKey "time:timestamp"^^xsd:string .
?event :e-contain-a ?name .
?name :attKey "concept:name"^^xsd:string .
}
}
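The role of OPTIONAL in the query above is what keeps empty traces: a trace whose events do not match the optional block still yields one answer, with ?event unbound. The Python sketch below mimics these answers on hypothetical data (trace and event IDs, the contains pairs, and the attribute keys per event are invented) and then regroups them into traces; it illustrates the left-outer-join semantics, not how ontop evaluates the query.

```python
REQUIRED_KEYS = {"time:timestamp", "concept:name"}

def virtual_log_answers(traces, contains, attr_keys):
    """One answer per (trace, event) pair whose event carries both required
    attribute keys; a single (trace, None) answer when no event qualifies."""
    rows = []
    for t in traces:
        events = [e for (owner, e) in contains
                  if owner == t and REQUIRED_KEYS <= attr_keys.get(e, set())]
        if events:
            rows.extend((t, e) for e in events)
        else:
            rows.append((t, None))  # ?event left unbound
    return rows

def group_into_traces(rows):
    """Rebuild traces from answers, keeping empty traces."""
    log = {}
    for t, e in rows:
        log.setdefault(t, [])
        if e is not None:
            log[t].append(e)
    return log

# Hypothetical virtual-log content.
TRACES = ["t1", "t2", "t3"]
CONTAINS = [("t1", "e1"), ("t1", "e2"), ("t3", "e3")]
ATTR_KEYS = {
    "e1": {"time:timestamp", "concept:name"},
    "e2": {"time:timestamp", "concept:name"},
    "e3": {"concept:name"},  # no timestamp: does not match the OPTIONAL block
}
rows = virtual_log_answers(TRACES, CONTAINS, ATTR_KEYS)
log = group_into_traces(rows)
```

Here t3 ends up empty, because its only event lacks a timestamp attribute.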
151. The onprom Toolchain
Implementation of all the described steps using
• Java (GUIs, algorithms)
• OWL 2 QL plus functionality (conceptual schemas)
• ontop (OBDA system)
• OpenXES (XES serialisation and manipulation)
• ProM process mining framework (environment)
152. onprom UML Editor
Fig. 17: The onprom UML Editor, showing the conceptual data schema used in our
153. onprom Annotation Editor
Fig. 18: The Annotation Editor showing annotations for the CONFSYS use case
154. onprom Log Extractor
Fig. 20: Screenshot of the Log Extractor plug-in in ProM 6.6.
155. Experiments
• Very encouraging initial experiments
• Carried out using synthetic data
• We are looking for real case studies!
161. Other tools: ProM
[http://www.promtools.org]
• The most famous
academic initiative in
process mining
• Cutting-edge process
mining algorithms are
there
• Pluggable architecture
• Dozens of plug-ins
163. Conclusions
• Process Mining as a way to reconcile model-driven
management and the real behaviours
• Data preparation is an issue in presence of legacy
data
• Ontology-Based Data Access: solid theoretical
basis with optimised implementations
• onprom as an effective tool chain for extracting
event logs from legacy databases
164. Future Work
• Conceptual Modeling
• How to improve the discovery of events?
• How to semi-automatically propose events to the user?
• How to integrate methodologies and results from formal
ontology?
• Engineering
• How to handle different types of data?
• How to deal with different event schemas that go beyond
XES?
• How to generalise the approach to handle rich
ontology-to-ontology mappings?