Talk given at "Problems and techniques for Incremental Re-computation: provenance and beyond".
A workshop co-organized with Provenance Week 2018
King's College London, 12th and 13th July, 2018
Organizers: Paolo Missier (Newcastle University), Tanu Malik (DePaul University), Jacek Cala (Newcastle University)
Abstract: Incremental recomputation has applications, e.g., in databases and workflow systems. Methods and algorithms for recomputation depend on the underlying model of computation (MoC) and model of provenance (MoP). This relation is explored with some examples from databases and workflow systems.
9. Step 2: Graph Transformation (“G-trick”)
• Reify provenance atoms & firings in a labeled graph g/3
• Example for N = 2 subgoals and 1 head atom …
fire2(X,Z,Y) :- e(X,Z), tc(Z,Y). % two in-edges
tc(X,Y) :- fire2(X,Z,Y). % one out-edge
… generates N+1 “reification rules” (Skolems are safe):
g( e(X,Z), in, f_fire2(X,Z,Y) ) :- fire2(X,Z,Y).
g( tc(Z,Y), in, f_fire2(X,Z,Y) ) :- fire2(X,Z,Y).
g( f_fire2(X,Z,Y), out, tc(X,Y) ) :- fire2(X,Z,Y).
[Figure: example instance generated by these rules: e(a,b) --in--> fire2(a,b,d), tc(b,d) --in--> fire2(a,b,d), fire2(a,b,d) --out--> tc(a,d)]
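The reification step can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the function name `reify` and the tuple encoding of atoms and firing nodes are assumptions made here, and the firing node is represented simply as a tagged tuple in place of the Skolem term.

```python
def reify(firings):
    """Turn fire2(X,Z,Y) tuples into labeled edges of a provenance graph g/3.

    Each firing yields N+1 = 3 edges: two 'in' edges from the body atoms
    e(X,Z) and tc(Z,Y), and one 'out' edge to the head atom tc(X,Y).
    """
    g = set()
    for (x, z, y) in firings:
        node = ("fire2", x, z, y)                 # stands in for the Skolem term
        g.add((("e", x, z), "in", node))          # first subgoal
        g.add((("tc", z, y), "in", node))         # second subgoal
        g.add((node, "out", ("tc", x, y)))        # head atom
    return g


graph = reify({("a", "b", "d")})
for edge in sorted(graph, key=repr):
    print(edge)
```

For the single firing fire2(a,b,d) this produces exactly the three edges of the example instance.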
10. Step 3: Using Statelog (“S-Trick”)
• Use Statelog to keep record of firing rounds:
– Add state (=stage) argument to provenance rules and graph relations
– EDB facts are derived in state 0.
– Subsequently: extract earliest round for firings and IDB facts
• Example:
r_in : f_r(S1, X) :- B_1(S, X_1), … , B_n(S, X_n), next(S, S1).
r_out : H(S, Y) :- f_r(S, X).
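[Figure: provenance graph over e(a,b), e(b,c), e(c,b) with firing nodes r1 and r2 and derived facts tc(a,b), tc(b,b), tc(c,b); states appear in brackets: tc(a,b) [1], tc(b,b) [2], tc(c,b) [1], r1 firings [1], r2 firings [2] and [3]]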
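To make the state bookkeeping concrete, here is a small Python sketch that evaluates transitive closure round by round and records the earliest state of every firing and fact. It rests on several assumptions not stated on the slide: the two rules r1 and r2 are taken to be the usual tc rules, facts are assumed to persist across states, and the name `earliest_states` and the tuple encodings are hypothetical.

```python
def earliest_states(edges):
    """Evaluate tc over `edges` in rounds, recording earliest states.

    Assumed rules:  r1: tc(X,Y) :- e(X,Y).
                    r2: tc(X,Y) :- e(X,Z), tc(Z,Y).
    """
    fact_state = {("e", x, y): 0 for (x, y) in edges}    # EDB facts: state 0
    firing_state = {}
    state = 0
    while True:
        state += 1
        # Rule bodies may only use facts derived in earlier states.
        known_tc = [f for f in fact_state if f[0] == "tc"]
        firings = {}
        for (x, y) in edges:                              # r1
            firings[("r1", x, y)] = ("tc", x, y)
        for (x, z) in edges:                              # r2
            for (_, z2, y) in known_tc:
                if z2 == z:
                    firings[("r2", x, z, y)] = ("tc", x, y)
        new = {f: h for f, h in firings.items() if f not in firing_state}
        if not new:                                       # fixpoint reached
            return fact_state, firing_state
        for firing, head in new.items():
            firing_state[firing] = state                  # earliest round of the firing
            fact_state.setdefault(head, state)            # earliest round of the head


facts, firings = earliest_states({("a", "b"), ("b", "c"), ("c", "b")})
print(facts[("tc", "a", "b")], facts[("tc", "b", "b")], firings[("r2", "a", "b", "b")])
```

Keeping only the first state in which a firing or fact appears is what the "extract earliest round" step amounts to in this sketch.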
13. Application Example: Protein 3D Structure
(1) Resonance Assignments: (a) Sequential, (b) Side-Chain
(2) Identification of Secondary Structural Elements: (a) Based on Chemical Shift, (b) Based on NOE Patterns
(3) Determine Distance Constraints: (a) From 2D/3D NOESY Spectra, (b) Calibrate Distance from Vol
(4) Determine Torsion Angle Constraints (φ, ψ, χ): (a) Based on Chemical Shift, (b) Based on J-couplings
(5) Structure Determination: (a) Torsion Angle Dynamics, (b) Simulated Annealing
→ High-Resolution Structure (iterative)
Michael Gryk: We cannot assign all of the resonances in part (1), or all of the NOESY
peaks in part (3) before doing step (5). So we run (5) with incomplete information and get
a preliminary answer. This helps rectify ambiguities in steps 1-4 and we fix that data and
run again. And again. And again. It literally can take dozens of attempts before we get
a high-resolution structure.
→ Question of both efficiency and (months or years later) reproducibility
14. A simpler example …
• Some inputs and/or params of the workflow change
→ “smart re-run”
• Similar to executing Make
• … on a DAG
– … e.g., via Datalog to compute the subworkflow to be re-executed (“rescue-DAG”)
• So much winning! But ...
Ludäscher: Incremental Recomp 14
https://openprovenance.org/provenance-challenge/WebHome.html
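As a rough illustration of the "smart re-run" idea, the sketch below computes which workflow steps are affected by a change, i.e. everything reachable from the changed inputs or parameters. The function name `rescue_dag`, the edge-list encoding, and the step names in the usage example are all hypothetical; the Datalog in the comment is one plausible formulation, not taken from the talk.

```python
from collections import deque


def rescue_dag(edges, changed):
    """Return the nodes that must be re-executed after `changed` nodes change.

    Roughly the Datalog program (an assumption made for this sketch):
        stale(X) :- changed(X).
        stale(Y) :- stale(X), edge(X,Y).
    """
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)

    stale = set(changed)
    frontier = deque(changed)
    while frontier:                        # breadth-first search downstream
        u = frontier.popleft()
        for v in successors.get(u, ()):
            if v not in stale:
                stale.add(v)
                frontier.append(v)
    return stale


# Hypothetical workflow: two inputs feed `align`, which feeds `slice` and `convert`.
edges = [("in1", "align"), ("in2", "align"), ("align", "slice"),
         ("params", "slice"), ("slice", "convert")]
print(sorted(rescue_dag(edges, {"params"})))
```

Everything outside the returned set can be reused from the previous run, which is the same reasoning Make applies to out-of-date targets.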
20. Example: Checkpoint in SDF
From “Fault Tolerance through Provenance-based Recovery”, 7/20/2011
• Workflow with a mix of stateful and stateless actors.
[Figure: corresponding schedule of the workflow with a fault during invocation B:2]
21. Prototype Implementation in Kepler
• Upon recovery request:
– SDF director calls the recovery engine
• Recovery:
– Restore the internal state of actors
– Replay successful invocations using input tokens from provenance
– Restore content of all queues
– Repeat faulty invocations
– Return to SDF director with information about where to resume
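The recovery steps can be illustrated with a toy Python sketch for a single stateful actor. Nothing here reflects the actual Kepler prototype or its API: the `Counter` actor, the record layout of the provenance log, and the `recover` function are assumptions made for illustration, and queue restoration is left out.

```python
class Counter:
    """Hypothetical stateful actor: emits the running sum of its inputs."""

    def __init__(self):
        self.total = 0

    def fire(self, token):
        self.total += token
        return self.total


def recover(actor, checkpoint, provenance):
    """Restore `actor` from `checkpoint`, then replay later successful invocations.

    Returns the replayed outputs and the invocation number at which normal
    execution should resume (the first faulty invocation), or None.
    """
    actor.total = checkpoint["state"]                      # restore internal state
    replayed = []
    for record in provenance:
        if record["invocation"] <= checkpoint["invocation"]:
            continue                                       # already covered by checkpoint
        if record["status"] == "ok":
            replayed.append(actor.fire(record["input"]))   # replay from recorded input
        else:
            return replayed, record["invocation"]          # faulty invocation: repeat here
    return replayed, None


log = [
    {"invocation": 1, "input": 5, "status": "ok"},
    {"invocation": 2, "input": 7, "status": "ok"},
    {"invocation": 3, "input": 2, "status": "ok"},
    {"invocation": 4, "input": 4, "status": "failed"},
]
actor = Counter()
outputs, resume_at = recover(actor, {"invocation": 2, "state": 12}, log)
print(outputs, resume_at, actor.total)
```

The point of the sketch is that recorded input tokens let a stateful actor be brought back to its pre-failure state without re-running anything upstream.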
22. Execution with Failure
UC Davis: S. Koehler, T. McPhillips, S. Riddle, D. Zinn, B. Ludaescher
• Execution of the previous workflow
• Checkpoints for actors B and D but not for C
• At invocation B:2: crash
• Tokens t4 and t7: in queue
• Token t9: to be restored
• Token t10: to be deleted
24. Provenance Recording Overhead
[Figure: recording overhead without provenance, with standard provenance, and with extended provenance; worst-case scenario]
If you already capture provenance …
You might as well do it right :-)
27. Hamming Numbers in a Dataflow Network
Compute Hamming numbers H in order, where
H = 2^i · 3^j · 5^k with i, j, k ≥ 0,
a.k.a. regular numbers or 5-smooth numbers (numbers whose prime factors are ≤ 5).
[Figure: dataflow network with nodes X2, X3, X5, S2, S3, S5, M1, M2 and Q1–Q8]
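For readers who want to see the computation itself, here is a minimal Python sketch of the classic three-queue formulation: every emitted number is multiplied by 2, 3, and 5 and fed back, and the smallest pending value is emitted next. It is not a transcription of the network on the slide; the function name `hamming` and the queue names are chosen here for illustration.

```python
from collections import deque


def hamming(n):
    """Return the first n Hamming numbers in increasing order."""
    out = [1]
    q2, q3, q5 = deque([2]), deque([3]), deque([5])   # pending multiples of 2, 3, 5
    while len(out) < n:
        h = min(q2[0], q3[0], q5[0])                  # merge: smallest pending value
        for q in (q2, q3, q5):
            if q[0] == h:                             # drop duplicates (e.g. 6 = 2*3 = 3*2)
                q.popleft()
        out.append(h)
        q2.append(2 * h)                              # feed the result back into the loop
        q3.append(3 * h)
        q5.append(5 * h)
    return out


print(hamming(10))  # → [1, 2, 3, 4, 5, 6, 8, 9, 10, 12]
```

Each queue stays sorted because it receives multiples of an increasing output stream, so looking only at the queue heads suffices for the merge.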
31. Computational / Workflow Thinking:
The limits of my language are the limits of my world …
• Vanilla Process Network
• Functional Programming Dataflow Network
• XML Transformation Network
• Collection-oriented Modeling & Design framework (COMAD)
– “Look Ma: No Shims!”
34. From MoC to MoP via Observables
• Model of Computation MoC
– specification/algorithm to compute Outputs = MoC(Wf,Params,Inputs)
– a director or scheduler implements MoC
– gives rise to formal notions of
• computation (aka run) R
– Formalisms to define M?
• Model of Provenance MoP
– associate with a MoC a “default” MoP (= MoC ± Δ)
– the MoP is a “trimmed” MoC
• T = R – I + M
– Trace = Run – Ignored-observables + Modeled-observables
• Observables (of a MoC / MoP)
– functional observables (may influence output o)
• token rate, notions of firing, …
– non-functional observables (not part of M, do not influence o)
• token timestamp, size, … (unless the MoC cares about those)
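One way to read T = R – I + M operationally is as a projection over run events: drop the ignored observables and attach the modeled ones. The sketch below is only an interpretation under assumptions: the event fields, the `trace` function, and the derived "firing" label (written in the B:2 style used for invocations elsewhere in the deck) are all hypothetical.

```python
def trace(run, ignored, modeled):
    """Compute a provenance trace T from a run R.

    run     -- list of event dicts (the run R)
    ignored -- names of observables to drop (I)
    modeled -- mapping from new observable name to a function deriving it (M)
    """
    result = []
    for event in run:
        kept = {k: v for k, v in event.items() if k not in ignored}   # R - I
        for name, derive in modeled.items():                          # + M
            kept[name] = derive(event)
        result.append(kept)
    return result


run = [{"actor": "B", "invocation": 2, "token": "t4", "timestamp": 17.3, "size": 128}]
t = trace(
    run,
    ignored={"timestamp", "size"},    # non-functional observables in this example
    modeled={"firing": lambda e: f"{e['actor']}:{e['invocation']}"},
)
print(t)
```

Which fields count as ignored depends on the MoC: a timing-sensitive model would keep the timestamp.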
35. All-in-One (Summary)
• Provenance & Incremental Recomputation
– What You See (Think/Model) Is What You Get!
– WYTIWYG (“witty-wig”)
• These assembly language instructions
• … implementing these VM Instructions
• … in this programming language
• ... implementing an algorithm
• ... that schedules a workflow
• ... that applies this bioinformatics method
• … to test this scientific hypothesis ....
→ Need to capture provenance at the “right level”
– … for efficiency
– ... for transparency & understanding
• Bottom Line: MoP = MoC +/- ∆
– T = R – I + M
– Provenance Trace (MoP thing) = Run (MoC thing) – “nah..” + “yeah!”
37. Argumentation Frameworks & Game Provenance
[Figure: game graph over positions a, b, c, d, e, f, g, h, k, l, m, n, annotated with the values 1, 2, 3, and ∞]
• Query evaluation and logic-based argumentation can be understood as a game!
• One logic rule to rule them all …
win(X) :- move(X,Y), not win(Y).
• node color => edge color
– good vs bad moves
• good moves = natural, new notion of provenance!
• Implement, e.g., using Answer Set Programming
Aside: Games ~ Argumentation Frameworks
win(X) :- move(X,Y), not win(Y).
def(X) :- attacks(Y,X), not def(Y).
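The win-move rule can be evaluated directly by backward induction, which gives the three-valued outcome (won, lost, drawn) for each position. The sketch below uses plain Python instead of Answer Set Programming; the function name `solve_game` and the dictionary encoding of `move` are assumptions made for illustration.

```python
def solve_game(moves):
    """Label positions of a win-move game as won, lost, or drawn.

    A position is won if some move leads to a lost position, and lost if
    every move leads to a won position (including having no moves at all).
    Positions that never get labeled are drawn.
    """
    nodes = set(moves) | {y for ys in moves.values() for y in ys}
    won, lost = set(), set()
    changed = True
    while changed:
        changed = False
        for x in nodes:
            if x in won or x in lost:
                continue
            successors = moves.get(x, ())
            if any(y in lost for y in successors):
                won.add(x)
                changed = True
            elif all(y in won for y in successors):
                lost.add(x)
                changed = True
    return won, lost, nodes - won - lost


# a -> b -> c is a chain ending in a dead end; d and e move to each other forever.
won, lost, drawn = solve_game({"a": ["b"], "b": ["c"], "d": ["e"], "e": ["d"]})
print(sorted(won), sorted(lost), sorted(drawn))
```

Reading `move` as `attacks` with the direction reversed gives the corresponding labeling for the argumentation rule in the aside.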