Panda Provenance
1. Panda: A System for
Provenance and Data
Presented By Vladimir Bukhin
2. Contents
• Use of Provenance.
• Panda Goals.
• Example Workflow.
• Provenance Operations.
• Panda Implementation.
3. Use of Provenance
• Explanation: Examine the sources and evolution
of data elements.
• Verification: Audit how data was
produced.
• Re-computation: Having found an error,
propagate the correction downstream.
4. Panda Goals
• Merge data-based and process-based
provenance.
• Define provenance operators to query and
analyze data mixed with provenance.
• Create an open-source, configurable system
that can be used for a wide variety of
applications and can be coupled
with outside data and systems.
5. Example Workflow
• De-duplicate two data sets.
• Partition into Europe and USA, then take the union.
• Predict process: Determine the items most likely to be purchased,
using the combined data sets.
• Aggregation: Produce the output prediction table.
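A minimal Python sketch of the workflow steps above, for illustration only. The records, the function names (dedup, partition, predict, aggregate) and the stand-in prediction rule are all assumptions, not taken from Panda itself.

```python
from collections import Counter

# Hypothetical customer records: (name, region, last purchased item).
list_a = [("ann", "USA", "cowboy hat"), ("ann", "USA", "cowboy hat"),
          ("bo", "Europe", "scarf")]
list_b = [("cy", "USA", "cowboy hat"), ("di", "Europe", "beret")]

def dedup(records):
    """Drop exact duplicate records while preserving order."""
    seen, out = set(), []
    for r in records:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def partition(records):
    """Split records into (europe, usa) by the region field."""
    europe = [r for r in records if r[1] == "Europe"]
    usa = [r for r in records if r[1] == "USA"]
    return europe, usa

def predict(record):
    """Stand-in predictor (assumption): a customer rebuys their last item."""
    return record[2]

def aggregate(items):
    """Count how often each item is predicted."""
    return Counter(items)

clean = dedup(list_a) + dedup(list_b)   # de-duplicate both data sets
europe, usa = partition(clean)          # partition by region
combined = europe + usa                 # take the union
predictions = aggregate(predict(r) for r in combined)
```

Here predictions is the aggregated prediction table, keyed by item.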
6. Provenance Operations
• Backward Tracing: If cowboy hats are the best-selling
item, where are people buying them from?
• Forward Tracing: If we correct an error in
the data, how would the outcome change?
• Forward Propagation: Rerun the processes
affected by erroneous data and recalculate the end
result.
• Refresh: Recalculate when new data arrives.
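The operations above can be sketched in Python by recording, for each output, which inputs contributed to it. This is a toy illustration under stated assumptions: the data, the lineage dictionary and the function names are hypothetical, and forward propagation here simply reruns the whole aggregation, whereas a real system would limit recomputation to the affected elements.

```python
from collections import defaultdict

# Hypothetical input elements: id -> (customer region, predicted item).
inputs = {
    "c1": ("USA", "cowboy hat"),
    "c2": ("Europe", "scarf"),
    "c3": ("USA", "cowboy hat"),
}

def run(inputs):
    """Aggregate predicted items and record which inputs fed each output."""
    counts = defaultdict(int)
    lineage = defaultdict(set)  # output item -> contributing input ids
    for cid, (_region, item) in inputs.items():
        counts[item] += 1
        lineage[item].add(cid)
    return dict(counts), dict(lineage)

def backward_trace(lineage, item):
    """Which input elements contributed to this output item?"""
    return sorted(lineage.get(item, set()))

def forward_trace(lineage, cid):
    """Which output items depend on this input element?"""
    return sorted(item for item, ids in lineage.items() if cid in ids)

counts, lineage = run(inputs)

# Backward tracing: where do cowboy hat buyers come from?
buyers = backward_trace(lineage, "cowboy hat")
regions = sorted({inputs[c][0] for c in buyers})

# Forward tracing: which outputs would a fix to c2 touch?
affected = forward_trace(lineage, "c2")

# Forward propagation (toy version): correct c2, then rerun everything.
inputs["c2"] = ("Europe", "beret")
new_counts, new_lineage = run(inputs)
```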
7. Panda Implementation
• Query language to answer
questions like: Which
customer list contributes the
most to the top 100 predicted
items?
• Use ‘predicates’ as references
back to data origins
(to trace back to the information source).
• Python mapping/transformation
nodes run once per data point.
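A rough Python sketch of the ideas above: a transformation applied once per data point, outputs that carry a predicate selecting their source rows, and a query over the traced rows. None of this is Panda's actual API or query language; the rows, field layout and function names are hypothetical.

```python
from collections import Counter

# Hypothetical raw rows: (source customer list, customer, predicted item).
raw_rows = [
    ("list_a", "ann", "Cowboy Hat"),
    ("list_a", "bo", "scarf"),
    ("list_b", "cy", "cowboy hat"),
    ("list_b", "di", "COWBOY HAT"),
]

def normalize(row):
    """Per-data-point transformation: lower-case the item name."""
    src, customer, item = row
    return (src, customer, item.lower())

def item_agg(rows):
    """Count rows per item; each output carries a predicate over the input."""
    counts = Counter(item for _src, _cust, item in rows)
    outputs = []
    for item, n in counts.items():
        # The default argument binds the current item to this predicate.
        predicate = lambda row, item=item: row[2] == item
        outputs.append({"item": item, "count": n, "predicate": predicate})
    return outputs

def trace_back(output, rows):
    """Apply the output's predicate to the input to recover its sources."""
    return [r for r in rows if output["predicate"](r)]

rows = [normalize(r) for r in raw_rows]  # runs once per data point
outputs = item_agg(rows)

# Query: which customer list contributes most to the top predicted item?
top = max(outputs, key=lambda o: o["count"])
sources = Counter(src for src, _cust, _item in trace_back(top, rows))
best_list, best_n = sources.most_common(1)[0]
```

Storing a predicate instead of a list of row ids keeps the reference compact: the source rows are recovered on demand by filtering the input.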
8. References
• R. Ikeda and J. Widom, Panda: A System for
Provenance and Data, IEEE Data
Engineering Bulletin, Vol. 33, No. 3,
September 2010.