Seminar given by Marco Montali on 31/05/2016 at the Department of Computer Science, University of Verona. Title: Data-aware business - balancing between expressiveness and verifiability.
3. Our Starting Point
Marrying processes and data is a must if we
want to really understand how complex dynamic
systems operate
Dynamic systems of interest:
• business processes
• multiagent systems
• distributed systems
5. Formal Verification
Automated analysis
of a formal model of the system
against a property of interest,
considering all possible system behaviors
picture by Wil van der Aalst
6. Our Thesis
Knowledge representation and
computational logics
can become a Swiss Army knife to
understand data-aware dynamic systems,
and
provide automated reasoning and verification
capabilities along their entire lifecycle
11. Our Approach
1. Develop formal models for data-aware dynamic systems
2. Outline a map of (un)decidability and complexity
3. Find robust conditions for decidability/tractability
4. Bring them back into practice
5. Implement proof-of-concept prototypes
13. The Three Pillars of Complex Systems
System
Processes | Data | Resources
In AI and CS, we know a lot about each pillar!
14. Information Assets
• Data: the main information source about the history
of the domain of interest and the relevant aspects
of the current state of affairs
• Processes: how work is carried out in the domain
of interest, leading to the manipulation of data
• Resources: humans and devices responsible for
the execution of work units within a process
We focus on the first two aspects!
16. State of the Art
• Traditional isolation between processes and data
• Why? To attack the complexity (divide et impera)
• AI has greatly contributed to these two aspects
• Data: knowledge bases, conceptual models,
ontologies, ontology-based data access and
integration, inconsistency-tolerant semantics, …
• Processes: reasoning about actions, temporal/
dynamic logics, situation/event calculus, temporal
reasoning, planning, verification, synthesis, …
18. Data/Process Fragmentation
• A business process consists of a set of activities that
are performed in coordination in an organizational and
technical environment [Weske, 2007]
• Activities change the real world
• The corresponding updates are reflected into the
organizational information system(s)
• Data trigger decision-making, which in turn determines
the next steps to be taken in the process
• Survey by Forrester [Karel et al, 2009]: lack of
interaction between data and process experts
19. Experts Dichotomy
• BPM professionals: think that data are subsidiary to
processes, and neglect the importance of data quality
• Master data managers: claim that data are the main
driver for the company’s existence, and they only focus
on data quality
• Forrester: in 83/100 companies, no interaction at all
between these two groups
• This isolation propagates to languages and tools,
which never properly account for the process-data
connection
20. Conventional Data Modeling
Focus: relevant entities, relations, static constraints
[Figure: data model spanning Sales, Manufacturing and Procurement/Supplier, with the entities Customer PO, Line Item, Work Order, Material PO, Material and Supplier; association labels: *, *, spawns, 0..1]
But… how do data evolve?
Where can we find the “state” of a purchase order?
21. Conventional Process Modeling
Focus: control-flow of activities in response to events
But… how do activities update data?
What is the impact of canceling an order?
22. Do you like Spaghetti?
[Figure: five application silos (Manage Cancelation, Ship, Assemble, Manage Material POs, Decompose Customer PO), each with its own Activities / Process / Data layers, on top of the data stores Customers, Suppliers & Catalogues, Customer POs, Work Orders and Material POs]
IT integration: difficult to manage, understand, evolve
23. Too late to reconstruct the missing pieces…
• Where are the data?
• Part in the DBs
• Part inside the process engine
• Where shall we model relevant business rules?
• At the DB level? Which DB? How to get all the necessary data?
• (Also) in the business model? How to import data from the DBs?
[Figure: the data model (Data) next to a process fragment with the tasks "Determine cancelation penalty" and "Notify penalty" (Process); part of the state is kept in the process engine (Process State)]
Business rules:
For each work order W
    For each material PO M in W
        if M has been shipped
            add returnCost(M) to penalty
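The business rule above can be sketched in Python as follows. This is only an illustration: the classes `WorkOrder` and `MaterialPO` and the field `return_cost` (standing in for returnCost(M)) are hypothetical names, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MaterialPO:
    shipped: bool
    return_cost: float  # hypothetical stand-in for returnCost(M)


@dataclass
class WorkOrder:
    material_pos: list = field(default_factory=list)


def cancelation_penalty(work_orders):
    """Sum the return cost of every shipped material PO over all work orders."""
    return sum(
        m.return_cost
        for w in work_orders
        for m in w.material_pos
        if m.shipped
    )


orders = [
    WorkOrder([MaterialPO(True, 40.0), MaterialPO(False, 15.0)]),
    WorkOrder([MaterialPO(True, 10.0)]),
]
print(cancelation_penalty(orders))
```

Note that the rule needs both the data (which material POs belong to which work order) and the process state (whether a PO has been shipped), which is exactly the information that ends up scattered.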
24. How is Research Reacting?
A recent review…
“Verification typically takes place at the design stage
of a business process type. However, at this stage,
required knowledge about data (database
schema, integrity constraints) is typically not yet
available. Conditions should be explicitly stated
under which the proposed method is applicable.”
25. …But There is Hope!
• [Meyer et al, 2011]: data-process integration
crucial to assess the value of processes and
evaluate KPIs
• [Dumas, 2011]: data-process integration crucial to
aggregate all relevant information, and to suitably
inject business rules into the system
• [Reichert, 2012]: “Process and data are just two
sides of the same coin”
26. Business Entities/Artifacts
Data-centric paradigm for process modeling
• First: elicitation of the relevant business entities that
evolve within given organizational boundaries
• Then: definition of the lifecycle of such entities, and
how tasks trigger the progression within the
lifecycle
• Active research area, with concrete languages
(e.g., IBM GSM, OMG CMMN)
• Cf. EU project ACSI (completed)
37. Why FO Temporal Logics
• To inspect data: FO queries
• To capture system dynamics: temporal
modalities
• To track the evolution of objects: FO
quantification across states
• Example: It is always the case that every
order is eventually either cancelled, or
paid and then delivered
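One possible formalization of the example, as a sketch in linear-time notation. The predicate names Order, Cancelled, Paid and Delivered are assumptions chosen for illustration; the slides do not fix a schema.

```latex
\[
G\bigl(\forall o.\ \mathit{Order}(o) \rightarrow
  F\bigl(\mathit{Cancelled}(o) \lor
    (\mathit{Paid}(o) \land F\,\mathit{Delivered}(o))\bigr)\bigr)
\]
```

The FO query inspects the data (Order(o)), the temporal operators G and F capture the dynamics, and the quantified variable o is tracked across states.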
38. Problem Dimensions
• Data component: relational DB | description logic KB | OBDA system | inconsistency-tolerant KB | …
• Process component: condition-action rules | ECA-like rules | Golog program | …
• Task modeling: conditional effects | add/delete assertions | logic programs | …
• External inputs: none | external services | input DB | fixed input | …
• Network topology: single orchestrator | full mesh | connected, fixed graph | …
• Interaction mechanism: none | synchronous | asynchronous and ordered | …
39. Data-Centric Dynamic Systems
• … model underlying variants of artifact-centric systems
• …lly equivalent to the most expressive models for bu… (e.g., GSM)
• Data | Process | Data+Process
• …r: relational databases / ontologies
• … schema, specifying constraints on the allowed states
40. DCDS [PODS 2013, …]
• A “pristine” framework for data-aware process modeling
• Data layer: relational DB with constraints
• Schema: constraints on the data
• Instance: state of the DCDS
• Fixed initial instance
• Infinite data domain with a finite subset for the “initial data”
• Process layer
• Atomic actions
• Condition-action rules for action executability
41. More on Actions
• Actions have parameters
• Bound to corresponding values for execution
• Actions have conditional “CRUD” effects (à la ADL)
• IF-THEN rules
• IF part: query over the current DB
• THEN part: ADD/DELETE tuples, using
• Action parameters
• Results to the IF query (bulk interpretation)
• Service calls to account for new data
43. User Cart Actions
• Any customer may decide to insert a new item of a
given product into her cart:
$\exists \vec{y}, \vec{z}.\ \mathit{Customer}(c, \vec{y}) \land \mathit{Product}(p, \vec{z}) \mapsto \mathit{AddToCart}(c, p)$
• Any customer may empty her own cart:
$\exists \vec{y}.\ \mathit{Customer}(c, \vec{y}) \mapsto \mathit{EmptyCart}(c)$
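44. User Cart Actions
• Adding to a cart…
$\mathit{AddToCart}(c, p) :\ \mathit{true} \rightsquigarrow \mathbf{add}\{\mathit{InCart}(\mathit{getBarCode}(p), c, p)\}$
• Emptying a cart…
$\mathit{EmptyCart}(c) :\ \mathit{InCart}(b, c, p) \rightsquigarrow \mathbf{del}\{\mathit{InCart}(b, c, p)\}$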
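A minimal Python sketch of the two cart actions, under simplifying assumptions: a database state is a frozenset of tuples whose first component is the relation name, Customer and Product are taken to be unary, and the service call getBarCode is simulated by a hypothetical counter-based function `fresh_bar_code`.

```python
import itertools


def executable_actions(db):
    """Condition-action rules: list the ground actions executable in db."""
    customers = sorted(t[1] for t in db if t[0] == "Customer")
    products = sorted(t[1] for t in db if t[0] == "Product")
    actions = [("AddToCart", c, p) for c in customers for p in products]
    actions += [("EmptyCart", c) for c in customers]
    return actions


def apply_action(db, action, get_bar_code):
    """Apply the conditional effects of a ground action and return the new state."""
    name, *args = action
    if name == "AddToCart":
        c, p = args
        # Condition 'true': always add one tuple; the bar code is a service call result.
        return db | {("InCart", get_bar_code(p), c, p)}
    if name == "EmptyCart":
        (c,) = args
        # Bulk interpretation: delete every InCart tuple of customer c.
        return frozenset(t for t in db if not (t[0] == "InCart" and t[2] == c))
    raise ValueError(f"unknown action {name}")


counter = itertools.count(1)
fresh_bar_code = lambda p: f"bc{next(counter)}"  # hypothetical service

db0 = frozenset({("Customer", "ann"), ("Product", "pen")})
db1 = apply_action(db0, ("AddToCart", "ann", "pen"), fresh_bar_code)
db2 = apply_action(db1, ("AddToCart", "ann", "pen"), fresh_bar_code)
db3 = apply_action(db2, ("EmptyCart", "ann"), fresh_bar_code)
```

Each AddToCart introduces a value that was not in the database before, which is how new data enter the system.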
45. Execution Semantics
Relational transition systems with database-labeled states
Successors constructed considering all possible ground
executable actions and all possible service call results
(s.t. the resulting state satisfies the schema constraints)
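The successor construction can be sketched as below. Two assumptions are made for the sake of a runnable example: the infinite data domain is replaced by a finite pool `service_values`, and actions are given as hypothetical (name, guard, effect) triples rather than in DCDS syntax.

```python
def successors(db, actions, service_values, satisfies_constraints):
    """Return the set of (action name, next state) pairs reachable in one step.

    actions: list of (name, guard, effect), with guard(db) -> bool and
             effect(db, value) -> new frozenset of tuples.
    service_values: finite stand-in for all possible service call results.
    """
    result = set()
    for name, guard, effect in actions:
        if not guard(db):
            continue  # action not executable in this state
        for value in service_values:
            nxt = effect(db, value)
            if satisfies_constraints(nxt):  # discard states violating the schema
                result.add((name, nxt))
    return result


actions = [
    ("add", lambda db: True, lambda db, v: db | {("R", v)}),
    ("clear", lambda db: len(db) > 0, lambda db, v: frozenset()),
]
at_most_two = lambda db: len(db) <= 2  # toy schema constraint

start = frozenset({("R", "a")})
succ = successors(start, actions, ["a", "b", "c"], at_most_two)
full = frozenset({("R", "a"), ("R", "b")})
succ_full = successors(full, actions, ["a", "b", "c"], at_most_two)
```

With a genuinely infinite pool of service results the same loop yields infinitely many successors, which is the source of infinite branching.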
53. Which logics?
• First-order temporal logics with active domain
quantification
• Branching-time: μLa
• Linear-time: LTL-FOa
• Only initial constants can explicitly appear in the formulae
• Corresponding notions of bisimulation/trace equivalence
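$G(\forall s.\ \mathit{Student}(s) \rightarrow F(\mathit{Retired}(s) \lor \mathit{Graduated}(s)))$
$AG(\forall o.\ \mathit{Order}(o) \land \mathit{Status}(o, \mathit{open}) \rightarrow EF\,\mathit{Status}(o, \mathit{shipped}))$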
54. The Good…
DCDS are:
• Markovian: Next state only depends on the
current state + input.
Two states with identical DBs are bisimilar.
• Generic: FO/SQL (as all query languages) does
not distinguish structures which are identical
modulo uniform renaming of data objects.
—> Two isomorphic states are bisimilar
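Genericity can be illustrated with a brute-force isomorphism test between two small database states. This is a sketch only: states are sets of tuples (relation name first), every value is treated as renamable (the special role of initial constants is ignored), and the permutation search is exponential.

```python
from itertools import permutations


def active_domain(db):
    """All data values occurring in the state, in a fixed order."""
    return sorted({v for _, *vals in db for v in vals})


def isomorphic(db1, db2):
    """True if db2 is db1 up to a bijective renaming of data values."""
    dom1, dom2 = active_domain(db1), active_domain(db2)
    if len(dom1) != len(dom2) or len(db1) != len(db2):
        return False
    for perm in permutations(dom2):
        rename = dict(zip(dom1, perm))
        renamed = {(rel, *(rename[v] for v in vals)) for rel, *vals in db1}
        if renamed == set(db2):
            return True
    return False


a = {("Order", "o1"), ("Status", "o1", "open")}
b = {("Order", "x"), ("Status", "x", "open")}
c = {("Order", "x"), ("Status", "y", "open")}
```

Here `a` and `b` differ only in the name of the order, so no query can tell them apart, while `c` relates the status to a different object.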
57. …the Bad…
What is a DCDS?
A finite-state control process
operating over an unbounded
memory.
So what is it?
A Turing machine
I.e., you are doomed to
undecidability, even for
propositional reachability!
58. …and the Ugly
Simulation of a 2-counter Minsky machine
• Counter —> “size” of a unary relation
• Test counter1 for zero: check whether counter relation is empty
• What matters is the # of tuples, not the actual values
• Can be reconstructed also without negation in the queries
[Figure labels: New, Increment, Decrement]
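The encoding idea can be sketched in Python: each counter is a unary relation, incrementing inserts a fresh value, decrementing deletes some tuple, and the zero test checks emptiness. The instruction format ("inc" and "jzdec") and the class name `CounterRelation` are assumptions made for this illustration.

```python
import itertools


class CounterRelation:
    """A counter encoded as the number of tuples in a unary relation."""

    _fresh = itertools.count()

    def __init__(self):
        self.tuples = set()

    def increment(self):
        self.tuples.add(next(CounterRelation._fresh))  # insert a new value

    def decrement(self):
        self.tuples.pop()  # which value is removed does not matter

    def is_zero(self):
        return not self.tuples


def run(program, max_steps=1000):
    """Run a 2-counter program; halt when the program counter leaves the program."""
    counters = [CounterRelation(), CounterRelation()]
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]
        if instr[0] == "inc":  # ("inc", counter, next)
            _, i, nxt = instr
            counters[i].increment()
            pc = nxt
        else:  # ("jzdec", counter, target_if_zero, target_otherwise)
            _, i, if_zero, otherwise = instr
            if counters[i].is_zero():
                pc = if_zero
            else:
                counters[i].decrement()
                pc = otherwise
    return len(counters[0].tuples), len(counters[1].tuples)


# Set counter 0 to 3, then move its content to counter 1.
move = [
    ("inc", 0, 1),
    ("inc", 0, 2),
    ("inc", 0, 3),
    ("jzdec", 0, 5, 4),
    ("inc", 1, 3),
]
print(run(move))
```

Only the number of tuples is ever inspected, never the stored values themselves.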
59. State-Boundedness
Put a pre-defined bound on the DB size
• Resulting transition system: still infinite-state
(even in the 1-bounded case)
• But: infinitely-many encountered values along a
run cannot be “accumulated” in a single state
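The two bullets can be made concrete with a small sketch. It assumes that the size of a state is measured as the number of values in its active domain; the helper names are hypothetical.

```python
def active_domain(db):
    """Set of data values occurring in a state (tuples carry the relation name first)."""
    return {v for _, *vals in db for v in vals}


def max_state_size(run):
    """Largest active domain encountered along a finite run prefix."""
    return max(len(active_domain(db)) for db in run)


def replace_run(steps):
    """States {R(0)}, {R(1)}, ...: the single stored value is swapped for a fresh one."""
    return [frozenset({("R", i)}) for i in range(steps)]


def accumulate_run(steps):
    """States {R(0)}, {R(0), R(1)}, ...: fresh values are accumulated."""
    return [frozenset(("R", j) for j in range(i + 1)) for i in range(steps)]
```

The first run is 1-bounded yet visits a new state at every step, so the transition system stays infinite-state; the second run accumulates values and exceeds any fixed bound.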
62. Effect of state-boundedness
           General       State-bounded
μLa        undecidable   decidable (abstraction must depend on the formula!)
LTL-FOa    undecidable   undecidable
Reason?
• FO quantification across states
• Logic unable to isolate single runs/computations
64. Logics with Persistent Quantification
• Intuition: control the ability of the logic to quantify
across states
• Only objects that persist in the active domain of
some node can be tracked
• When an object is lost, the formula trivializes to true
or false
• E.g.: “guarded” until:
$G(\forall s.\ \mathit{Student}(s) \rightarrow \mathit{Student}(s)\ U\ (\mathit{Retired}(s) \lor \mathit{Graduated}(s)))$
• Persistence-preserving µ-calculus (µLP): in some cases, objects maintain their identity only if they persist in the
active domain (cf. business artifacts and their IDs)
[Figure: two states containing StudId: 123, linked through dismiss(123) and newStud() with ID() = 123; logic labels µLA and µLFO]
67. Effect of Persistent Quantification
           General       State-bounded
μLa        undecidable   decidable (abstraction must depend on the formula!)
μLp        undecidable   decidable (formula-independent abstraction!)
LTL-FOa    undecidable   undecidable
LTL-FOp    undecidable   decidable (formula-independent abstraction!)
68. Pruning Infinite-Branching
• Consider a system snapshot and its node DBs
• Input is bounded —> only boundedly many
isomorphic types relating the input objects and
those in the DDS active domain
• Input configurations in the same isomorphic
type produce isomorphic snapshots
• Keep only one representative successor
state per isomorphic type
• The “pruned” transition system is finite-
branching and bisimilar to the original one
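A sketch of how one representative per isomorphism type could be enumerated: each input either equals a value of the active domain, equals a fresh value introduced earlier in the same input tuple, or is a new fresh value. The placeholder names `fresh0`, `fresh1`, … are assumptions and must not clash with actual data values.

```python
def representative_inputs(active_domain, n_inputs):
    """Return one input tuple per equality type w.r.t. the active domain."""
    adom = sorted(active_domain)
    results = []

    def extend(prefix, fresh_used):
        if len(prefix) == n_inputs:
            results.append(tuple(prefix))
            return
        # Case 1: the input coincides with an object already in the state.
        for v in adom:
            extend(prefix + [v], fresh_used)
        # Case 2: it coincides with a fresh object chosen for an earlier input.
        for k in range(fresh_used):
            extend(prefix + [f"fresh{k}"], fresh_used)
        # Case 3: it is a new fresh object; canonical naming avoids
        # enumerating renamings of the same type twice.
        extend(prefix + [f"fresh{fresh_used}"], fresh_used + 1)

    extend([], 0)
    return results
```

For a bounded active domain and a bounded number of inputs this list is finite, even though the underlying data domain is infinite.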
71. Compacting Infinite Runs
• Key observation: due to persistent quantification, the
logic is unable to distinguish local freshness from
global freshness
• So we modify the transition system construction:
whenever we need to consider a fresh representative
object…
• … if there is an old object that can be recycled
—> use that one
• … if not —> pick a globally fresh object
• This recycling technique preserves bisimulation!
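The recycling idea can be sketched on a toy 1-bounded system that replaces its single stored value with a "fresh" one at every step. The system, the relation name R and the value names are assumptions made for illustration only.

```python
import itertools


def explore_with_recycling(steps):
    """Run the toy system, recycling old objects whenever a fresh one is needed."""
    counter = itertools.count()
    new_value = lambda: f"v{next(counter)}"  # source of globally fresh objects
    seen = set()          # every object used so far
    state = frozenset()   # current database state
    visited = []
    for _ in range(steps):
        adom = {v for _, v in state}
        # Old objects that no longer occur in the current state can be recycled.
        recyclable = sorted(seen - adom)
        value = recyclable[0] if recyclable else new_value()
        seen.add(value)
        state = frozenset({("R", value)})
        visited.append(state)
    return visited, seen


visited, seen = explore_with_recycling(100)
print(len(set(visited)), sorted(seen))
```

Without recycling the same system would visit 100 distinct states in 100 steps; with recycling it keeps alternating between two.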
72. Compacting Infinite Runs
• [Calvanese et al, 2013]: if the system is size-
bounded, the recycling technique reaches a
point where no new objects are needed
—> finite-state transition system
• N.B.: the technique does not need to know
the value of the bound
74. Is My System State-Bounded?
• For a fixed k: decidable.
• For “some” bound: undecidable.
• Studied sufficient, syntactic conditions in the
style of “weak acyclicity” in data exchange
• See PODS 2013, KR 2014
75. How to Make my System State-Bounded?
• Different methodologies under investigation
• See in particular the work on BAUML (artifact-
centric UML processes)
• Bounds obtained by suitably “constraining” the
UML diagrams representing the data
76. What if I have “Cases”?
• Each case may reach boundedly many values in
each state
• If cases “interact”: also need to bound how many
cases are simultaneously present
• If they do not interact: unboundedly many cases
can coexist
• See [STTT 2016]
77. Conclusion
Marriage between processes and data is challenging,
though necessary
• State-boundedness is a robust condition towards the effective
verifiability of such systems
• The same results hold by enriching the data component with
knowledge (ontologies, inconsistency-tolerance, …)
[RR 2012, ECAI 2012, IJCAI 2013, IJCAI 2015, JAIR 2013]
• The same results hold by changing the query language (Datalog,
…)
• Complexity: exponential in the “data that can be changed”
• Same formal model for execution and verification
78. Current lines
• Emphasis on other reasoning services: monitoring, planning,
adversarial synthesis [RR 2013, IJCAI 2015, IJCAI 2016]
• Connections with several modeling languages [ICSOC 2013,
CIKM 2014, FAOC 2016]
• Distributed and multi agent systems [AAMAS 2014, AAAI
2015]
• Relaxations of size-boundedness, with the help of
• Parameterized verification
• Verification via underapproximation [PODS 2016]
79. Current and Future Work
• Implementations, leveraging the long-standing
literature in data management and formal verification
• Controlling state-boundedness is important
• Domain-specific case studies… how do they relate to
our approach?
80. Acknowledgments
All coauthors of this research,
in particular
Babak Bagheri-Hariri (did PhD @ UNIBZ)
Diego Calvanese (UNIBZ)
Giuseppe De Giacomo (UNIROMA)
Alin Deutsch (UCSD)
Fabio Patrizi (UNIBZ)
Ario Santoso (did PhD @ UNIBZ)