Provenance Central:
More Mileage from Provenance Metadata
Bertram Ludäscher
UC Davis, USA
ludaesch@ucdavis.edu
Paolo Missier
Newcastle University, UK
paolo.missier@ncl.ac.uk
Members of the DataONE Provenance Working Group
CAMP-4-DATA workshop @iPRES 2013
Sept. 6, 2013
Lisbon, Portugal
Friday, 6 September 13
Outline
• A foundation for Provenance management: the PROV data model
– From the W3C; a Recommendation as of spring 2013
– generic, extensible model
• The role of provenance in the DataONE project
– Provenance enables search and discovery, reuse, reproducibility
– PBase: Provenance warehousing
– Integration with the DataONE architecture
– Provenance mining: the social life of research data
PROV: scope and structure
source: http://www.w3.org/TR/prov-overview/
[Figure: the PROV family of documents. Labels: "Recommendation track"; "plus: Prov-dictionary"]
PROV Core Elements (graph depiction)
Entity: an entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.
Activity: an activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, ..., using, or generating entities.
Agent: an agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.
[Figure: example provenance graph, laid out from remote past to recent past. Node labels: paper1, paper2, draft v1, draft comments, drafting, commenting, Alice, Bob. Edge labels: used, wasGeneratedBy, wasAssociatedWith, actedOnBehalfOf. Attributes shown: distribution=internal, status=draft, version=0.1, type=person, ex:role=main_editor, ex:role=sr_editor, prov:role=editor, time=...]
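The three element types and the relations between them can be sketched as a small in-memory graph. This is only an illustration: the `Node` and `Relation` classes and the `responsible_agents` helper are hypothetical names, and the edges are an assumed reading of the drafting example (which agent is linked to which activity is a guess, not taken from the figure).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A PROV element; kind is 'entity', 'activity', or 'agent'."""
    id: str
    kind: str
    attrs: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    """A directed PROV relation, read as: subject --name--> object."""
    name: str
    subject: str
    object: str


# Assumed fragment of the drafting example.
nodes = {n.id: n for n in [
    Node("paper1", "entity"),
    Node("draft_v1", "entity", {"status": "draft", "version": "0.1"}),
    Node("drafting", "activity"),
    Node("Alice", "agent", {"type": "person"}),
    Node("Bob", "agent", {"type": "person"}),
]}

relations = [
    Relation("used", "drafting", "paper1"),
    Relation("wasGeneratedBy", "draft_v1", "drafting"),
    Relation("wasAssociatedWith", "drafting", "Alice"),
    Relation("actedOnBehalfOf", "Alice", "Bob"),
]


def responsible_agents(entity_id):
    """Agents associated with the activities that generated the entity,
    plus every agent reached by following actedOnBehalfOf from them."""
    activities = {r.object for r in relations
                  if r.name == "wasGeneratedBy" and r.subject == entity_id}
    agents = {r.object for r in relations
              if r.name == "wasAssociatedWith" and r.subject in activities}
    frontier = set(agents)
    while frontier:
        delegators = {r.object for r in relations
                      if r.name == "actedOnBehalfOf" and r.subject in frontier}
        frontier = delegators - agents
        agents |= delegators
    return agents


print(sorted(responsible_agents("draft_v1")))  # ['Alice', 'Bob']
```

Keeping relations as plain (name, subject, object) records keeps the sketch close to the graph depiction; a real deployment would more likely use a PROV library or a graph store.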
Summary of the PROV Core model
– PROV-DC mapping available
– Recent Tutorial @EDBT’13 (June, 2013) [1]
• Model, Constraints, Applications
[1] Missier, Paolo, Khalid Belhajjame, and James Cheney. “The W3C PROV Family of Specifications for
Modelling Provenance Metadata.” In Procs. EDBT’13 (Tutorial). Genova, Italy: ACM, 2013.
PROV-DM relations at a glance
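As a rough companion to this overview, the seven core PROV-DM relations can be written down with the kinds of element they connect. The table and the `check` helper are an illustrative sketch (hypothetical names) and cover only the core relations, not the extended ones.

```python
# (subject kind, object kind) for the core PROV-DM relations.
SIGNATURES = {
    "used": ("activity", "entity"),
    "wasGeneratedBy": ("entity", "activity"),
    "wasInformedBy": ("activity", "activity"),
    "wasDerivedFrom": ("entity", "entity"),
    "wasAttributedTo": ("entity", "agent"),
    "wasAssociatedWith": ("activity", "agent"),
    "actedOnBehalfOf": ("agent", "agent"),
}


def check(relation, subject_kind, object_kind):
    """Return True if the relation may link elements of the given kinds."""
    if relation not in SIGNATURES:
        raise KeyError(f"not a core PROV-DM relation: {relation}")
    return SIGNATURES[relation] == (subject_kind, object_kind)


print(check("wasGeneratedBy", "entity", "activity"))  # True
print(check("used", "entity", "activity"))            # False
```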
Context: ProvWG@DataONE
• DataONE: Data Observation Network for Earth
– 5-year NSF DataNet data preservation project (current phase)
– Provides a large scale, federated data infrastructure to the Earth Sciences
community
• Provenance Working Group
– Active until July 2014 (current phase; an extension is being considered)
– One/two interns per year since 2010
– One dedicated researcher (postdoc) since 2012
– 12 core members, additional guest members on a rotation
• specific focus on the provenance of workflow-based e-science data
DataONE collaboration scenario - 2012
Alice’s Workflow: generates benchmark climate data for model comparison
Input is retrieved from DataONE to generate an output file
The workflow, provenance, and other metadata are uploaded to DataONE
A data package is created and indexed
Searching
Bob: Search based on keywords in the metadata
➡ including provenance terms
Bob discovers Alice’s workflow. He may be able to execute it again
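Keyword search that also covers provenance terms could look roughly like the following. The package identifiers, field names, and the `build_index` / `search` helpers are all made up for illustration; this is a plain inverted index, not the DataONE search API.

```python
from collections import defaultdict

# Hypothetical data packages; provenance terms are indexed next to ordinary metadata.
packages = {
    "pkg-1": {
        "title": "benchmark climate data for model comparison",
        "provenance": "wasGeneratedBy workflow run by Alice",
    },
    "pkg-2": {
        "title": "soil moisture observations",
        "provenance": "",
    },
}


def build_index(packages):
    """Map each lowercased token to the set of packages that mention it."""
    index = defaultdict(set)
    for pkg_id, fields in packages.items():
        for text in fields.values():
            for token in text.lower().split():
                index[token].add(pkg_id)
    return index


def search(index, query):
    """Packages whose metadata contains every keyword in the query."""
    hits = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*hits) if hits else set()


index = build_index(packages)
print(search(index, "climate wasGeneratedBy"))  # {'pkg-1'}
```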
PBase and DataONE
[Figure: PBase within the DataONE architecture. Labels: System Metadata; Science Data; Science Metadata; Extract-Align-Augment Metadata; Search API; Provenance Curation; Index (Identifiers / Text fields; Graph Structure); ProvExplorer; Internal Metadata Index; Repository; PBase / D-PROV; Querying]
– Provenance traces in PBase linked to DataONE packages
– Provenance traces indexed for searching
DataONE Provenance components I: D-PROV
D-PROV extends PROV: it connects trace metadata to workflow structure
[Figure: D-PROV example graph. Nodes: wf, T1, T2, op1, ip1, wfInv, T1Inv, T2Inv, d. Edge labels: isTaskOf, hasInputPort, hasOutputPort, dataLink, wasAssociatedWith, wasStartedBy, onOutPort, onInPort]
Missier, Paolo, Saumen Dey, Khalid Belhajjame, Victor Cuevas, and Bertram Ludaescher. “D-PROV: Extending the PROV Provenance Model with Workflow Structure.” In Procs. TAPP’13. Lombard, IL, 2013.
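One way to picture the link between trace and workflow structure is as two sets of triples over the node and edge names used in this slide. The exact edge layout below is an assumption, and `producer_and_consumers` is a hypothetical helper, not part of D-PROV.

```python
# Workflow structure (assumed edge layout).
structure = [
    ("T1", "isTaskOf", "wf"),
    ("T2", "isTaskOf", "wf"),
    ("T1", "hasOutputPort", "op1"),
    ("T2", "hasInputPort", "ip1"),
    ("op1", "dataLink", "ip1"),
]

# Execution trace (assumed edge layout).
trace = [
    ("wfInv", "wasAssociatedWith", "wf"),
    ("T1Inv", "wasAssociatedWith", "T1"),
    ("T2Inv", "wasAssociatedWith", "T2"),
    ("T1Inv", "wasStartedBy", "wfInv"),
    ("T2Inv", "wasStartedBy", "wfInv"),
    ("d", "onOutPort", "op1"),
    ("d", "onInPort", "ip1"),
]

triples = structure + trace


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is a triple."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def producer_and_consumers(data_id):
    """Resolve a data item to the tasks that wrote and read it, via its ports."""
    producers = {task for port in objects(data_id, "onOutPort")
                 for task in subjects("hasOutputPort", port)}
    consumers = {task for port in objects(data_id, "onInPort")
                 for task in subjects("hasInputPort", port)}
    return producers, consumers


print(producer_and_consumers("d"))  # ({'T1'}, {'T2'})
```

The point of the sketch is that a trace-level item such as d can be mapped back to workflow-level tasks only because the port edges tie the two layers together.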
DataONE Provenance components II: PBase
[Figure: PBase architecture]
– Inputs: R ➞ DProv, T ➞ DProv, V ➞ DProv, eSc ➞ DProv, Tr ➞ DProv, K ➞ DProv
– Components: Neo4J loader, graph storage, query layer, indexing, analytical services
– Graph storage: Neo4J graph DBMS [AllegroGraph] [Graph-*]
– Query layer: Cypher (Neo, declarative) [Gremlin (procedural)]; can we do better? Scaling graph queries
– Indexing: can we do better than the built-in Neo indexing?
– Figure annotations: "In-house components"; "to be developed"
Baseline provenance queries in PBase
Ancestors and descendants (lineage): [2,3]
– Which datasets were involved in the production of data at node “e33”?
– Reachability: was task “e11_miny” involved in producing data at node “e38”?
Execution analysis: [3]
– Which tasks did not execute to completion for execution X of a given workflow?
– Find all inputs [outputs] of a given workflow across all its executions
– Given a data item, find all workflows / tasks that have used it as input
– Suppose we discover that service S has a bug: which data products were impacted by it?
– How many times was task T activated across a pool of workflow executions?
Provenance differencing: [4]
– Why do the results from two executions of the same workflow differ?
Attribution: [5]
– Who was responsible for this {data {usage, production}, service invocation}?
[2] Dey, Saumen, Víctor Cuevas-Vicenttín, Sven Köhler, Eric Gribkoff, Michael Wang, and Bertram Ludäscher. "On
implementing provenance-aware regular path queries with relational query engines." In Proceedings of the Joint
EDBT/ICDT 2013 Workshops, pp. 214-223. ACM, 2013.
[3] Dey, Saumen, Sven Köhler, Shawn Bowers, and Bertram Ludäscher. "Datalog as a lingua franca for
provenance querying and reasoning." In Workshop on the theory and practice of provenance (TaPP). 2012.
[4] Missier, Paolo, Simon Woodman, Hugo Hiden, and Paul Watson. “Provenance and Data Differencing for
Workflow Reproducibility Analysis.” Concurrency and Computation: Practice and Experience, 2013
[5] Missier, Paolo, Bertram Ludäscher, Saumen Dey, Michael Wang, Tim McPhillips, Shawn Bowers, Michael Agun,
and Ilkay Altintas. "Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance
Repository." International Journal of Digital Curation 7, no. 1 (2012): 139-150.
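Several of the baseline queries above reduce to graph reachability. The sketch below uses a plain in-memory traversal as a stand-in for a graph query language such as Cypher; the node names "e33", "e38", and "e11_miny" reuse the examples above, while the other nodes, all the edges, and the helper names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical trace: each edge points from a node to something it depended on.
edges = [
    ("e38", "wasGeneratedBy", "e11_miny"),
    ("e11_miny", "used", "e33"),
    ("e33", "wasGeneratedBy", "e10_load"),
    ("e10_load", "used", "e01"),
]
kind = {"e01": "data", "e33": "data", "e38": "data",
        "e10_load": "task", "e11_miny": "task"}

depends_on = defaultdict(set)
for src, _relation, dst in edges:
    depends_on[src].add(dst)


def ancestors(node):
    """All nodes reachable from node by following dependency edges."""
    seen, stack = set(), [node]
    while stack:
        for parent in depends_on.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def datasets_behind(node):
    """Lineage: datasets involved in producing the data at node."""
    return {n for n in ancestors(node) if kind[n] == "data"}


def was_involved(task, node):
    """Reachability: did the task contribute to the data at node?"""
    return task in ancestors(node)


def impacted_by(task):
    """Impact analysis: data products downstream of a (buggy) task."""
    return {n for n in kind if kind[n] == "data" and task in ancestors(n)}


print(datasets_behind("e33"))                 # {'e01'}
print(was_involved("e11_miny", "e38"))        # True
print(sorted(impacted_by("e10_load")))        # ['e33', 'e38']
```

Recomputing ancestors for every node is fine at this scale; a store holding many traces would rely on indexing or precomputed closures instead.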
Application - The social life of research data
• We know all about searching in the publications space
– who else is working on problems similar to mine?
– which results are available?
• In the data and process space:
1. Search and discovery
• Who else has used the {datasets, services, workflows,...} I am using?
– how do others rate them?
• Who used my {datasets, services, workflows,...}? How were they used?
2. Reuse, reproduction, validation
• Can I reproduce these results?
– using the same exact method
– using a variation of the method
• How do I apply this method to my data?
• ...
Social provenance for community building
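A question such as "who else has used the datasets, services, or workflows I am using?" can be answered from simple usage facts extracted from traces. The facts, resource names, and `who_else_uses` helper below are hypothetical.

```python
from collections import defaultdict

# Hypothetical usage facts derived from provenance traces: (agent, resource).
usage = [
    ("alice", "dataset:climate-benchmark"),
    ("alice", "workflow:model-comparison"),
    ("bob", "dataset:climate-benchmark"),
    ("carol", "workflow:model-comparison"),
    ("carol", "dataset:soil-moisture"),
]

used_by = defaultdict(set)
for agent, resource in usage:
    used_by[resource].add(agent)


def who_else_uses(me):
    """For each resource I use, the other agents who have used it too."""
    mine = {resource for agent, resource in usage if agent == me}
    return {resource: used_by[resource] - {me} for resource in sorted(mine)}


print(who_else_uses("alice"))
# {'dataset:climate-benchmark': {'bob'}, 'workflow:model-comparison': {'carol'}}
```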
From Pull (client queries) to Push (notifications)
• Uncovering latent connections amongst services / data / people:
– Ranking, clustering, association rules
– Requires new similarity metrics
• A recommender system for scientists
– Analytics layer activated when new traces are added
• Challenges:
– How large a corpus of provenance graphs is needed?
– How global should the community be?
• Little new to discover in a small community
– Requires graphs with rich attribution / association relations
[Figure: PBase layers: graph storage, query layer, indexing, analytical services]
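The move from pull to push can be sketched as an analytics step that runs whenever a new trace arrives. Jaccard similarity over sets of used resources stands in here for the new similarity metrics the slide calls for; it is a placeholder, and the `Recommender` class, its threshold, and the message text are assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class Recommender:
    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.resources = {}  # agent -> set of resources seen in their traces

    def add_trace(self, agent, used):
        """Register a new trace; return (recipient, message) notifications."""
        self.resources.setdefault(agent, set()).update(used)
        mine = self.resources[agent]
        notes = []
        for other, theirs in self.resources.items():
            if other != agent and jaccard(mine, theirs) >= self.threshold:
                notes.append((other, f"{agent} uses resources similar to yours"))
        return notes


rec = Recommender(threshold=0.5)
print(rec.add_trace("alice", {"d1", "d2"}))       # []
print(rec.add_trace("bob", {"d1", "d2", "d3"}))   # [('alice', 'bob uses resources similar to yours')]
```

With only two or three agents there is little to recommend, which mirrors the challenge noted above about small communities.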
Credits
Current members of the DataONE Provenance Working Group:
In the USA:
Bertram Ludaescher, UC Davis (co-lead)
Victor Cuevas Vicenttin, UC Davis (DataONE postdoc researcher)
Saumen Dey, UC Davis (researcher)
Parisa Kianmajd, UC Davis (intern)
Juliana Freire, NYU-Poly
David Koop, NYU-Poly
Fernando Chirigati, NYU-Poly
Shawn Bowers, Gonzaga University
Ilkay Altintas, SDSC/UCSD
Karthik Ram, UC Berkeley
Yolanda Gil, USC - ISI
Yaxing Wei, ORNL
Dave Vieglais, DataONE Technical Lead
In the UK:
Paolo Missier, Newcastle University
James Cheney, University of Edinburgh
Khalid Belhajjame, University of Manchester
 

Camp 4-data workshop presentation

  • 1. Provenance Central: More Mileage from Provenance Metadata. Bertram Ludäscher, UC Davis, USA (ludaesch@ucdavis.edu); Paolo Missier, Newcastle University, UK (paolo.missier@ncl.ac.uk); members of the DataONE Provenance Working Group. CAMP-4-DATA workshop @IPres 2013, Lisbon, Portugal, Friday, 6 September 2013.
  • 2. Outline
      • A foundation for provenance management: the PROV data model
        – From the W3C; a Recommendation as of spring 2013
        – Generic, extensible model
      • The role of provenance in the DataONE project
        – Provenance enables search and discovery, reuse, reproducibility
        – PBase: provenance warehousing
        – Integration with the DataONE architecture
        – Provenance mining: the social life of research data
  • 3. PROV: scope and structure. [Figure: the PROV family of documents. Labels: Recommendation track; plus: PROV-Dictionary. Source: http://www.w3.org/TR/prov-overview/]
  • 5. PROV Core Elements (graph depiction)
      – Entity: a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.
      – Activity: something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, ..., using, or generating entities.
      – Agent: something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.
      [Figure: example provenance graph, running from remote past to recent past. Activities: drafting, commenting. Entities: paper1, paper2, draft v1, draft comments. Agents: Alice, Bob. Relations: used, wasGeneratedBy, wasAssociatedWith, actedOnBehalfOf. Attributes shown: distribution=internal, status=draft, version=0.1, ex:role=main_editor, type=person, ex:role=sr_editor, prov:role=editor, time=...]
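The three core element kinds and the four relations named on this slide can be sketched as plain Python data. This is only an illustrative reading: the figure did not survive extraction, so which activity used or generated which entity, and who acted on behalf of whom, are assumptions. The identifier spellings (`draft_v1`, `draft_comments`) and the helper names `ill_typed` and `statements` are hypothetical. The domain and range of each relation follow the PROV core model (for example, `used` links an activity to an entity).

```python
# Kinds of the nodes named on the slide.
KINDS = {
    "paper1": "entity",
    "paper2": "entity",
    "draft_v1": "entity",
    "draft_comments": "entity",
    "drafting": "activity",
    "commenting": "activity",
    "Alice": "agent",
    "Bob": "agent",
}

# (subject kind, object kind) expected by each PROV core relation.
SIGNATURES = {
    "used": ("activity", "entity"),
    "wasGeneratedBy": ("entity", "activity"),
    "wasAssociatedWith": ("activity", "agent"),
    "actedOnBehalfOf": ("agent", "agent"),
}

# ASSUMPTION: one plausible set of edges for the example graph.
RELATIONS = [
    ("used", "drafting", "paper1"),
    ("used", "drafting", "paper2"),
    ("wasGeneratedBy", "draft_v1", "drafting"),
    ("used", "commenting", "draft_v1"),
    ("wasGeneratedBy", "draft_comments", "commenting"),
    ("wasAssociatedWith", "drafting", "Alice"),
    ("wasAssociatedWith", "commenting", "Bob"),
    ("actedOnBehalfOf", "Bob", "Alice"),
]


def ill_typed(relations):
    """Return the relations whose endpoints do not match the expected kinds."""
    bad = []
    for name, subject, obj in relations:
        if (KINDS[subject], KINDS[obj]) != SIGNATURES[name]:
            bad.append((name, subject, obj))
    return bad


def statements(relations):
    """Render each relation as a PROV-N-style statement string."""
    return [f"{name}(ex:{subject}, ex:{obj})" for name, subject, obj in relations]


if __name__ == "__main__":
    for line in statements(RELATIONS):
        print(line)
    print("ill-typed relations:", ill_typed(RELATIONS))
```

The type check is the useful part: a statement such as `used(ex:Alice, ex:paper1)` is rejected because an agent cannot be the subject of `used`.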
  • 6. Summary of the PROV Core model
      – PROV-DC mapping available
      – Recent tutorial @EDBT’13 (June, 2013) [1]
        • Model, Constraints, Applications
      [1] Missier, Paolo, Khalid Belhajjame, and James Cheney. “The W3C PROV Family of Specifications for Modelling Provenance Metadata.” In Procs. EDBT’13 (Tutorial). Genova, Italy: ACM, 2013.
  • 7. PROV-DM relations at a glance. [Figure: PROV-DM relations]
  • 8. Context: ProvWG@DataONE
      • DataONE: Data Observation Network for Earth
        – 5-year NSF DataNet data preservation project (current phase)
        – Provides a large-scale, federated data infrastructure to the Earth sciences community
      • Provenance Working Group
        – Active until July 2014 (current phase; an extension is being considered)
        – One or two interns per year since 2010
        – One dedicated researcher (postdoc) since 2012
        – 12 core members, plus guest members on a rotation
      • Specific focus on the provenance of workflow-based e-science data
  • 9. DataONE collaboration scenario - 2012. Alice’s workflow generates benchmark climate data for model comparison. Input is retrieved from DataONE to generate an output file.
  • 10. DataONE collaboration scenario - 2012 (continued). The workflow, provenance, and other metadata are uploaded to DataONE. A data package is created and indexed.
  • 11. Searching. Bob searches on keywords in the metadata, including provenance terms. Bob discovers Alice’s workflow; he may be able to execute it again.
  • 12. PBase and DataONE
      [Figure: architecture diagram. Labels: System Metadata; Extract-Align-Augment; Metadata; Science Data; Search API; Science Metadata; Provenance; Curation; Index; Identifiers / Text fields; Graph Structure; ProvExplorer; Internal Metadata Index; Repository; PBase / D-PROV; Querying]
      – Provenance traces in PBase are linked to DataONE packages
      – Provenance traces are indexed for searching
  • 14. DataONE Provenance components I: D-PROV
      – D-PROV extends PROV
      – Connects trace metadata to workflow structure
      [Figure: example D-PROV graph. Node labels: wf, T1, T2, op1, ip1, wfInv, T1Inv, T2Inv, d. Edge labels: isTaskOf, hasInputPort, hasOutputPort, dataLink, wasAssociatedWith, wasStartedBy, onOutPort, onInPort]
      Missier, Paolo, Saumen Dey, Khalid Belhajjame, Victor Cuevas, and Bertram Ludaescher. “D-PROV: Extending the PROV Provenance Model with Workflow Structure.” In Procs. TAPP’13. Lombard, IL, 2013.
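To make "connects trace metadata to workflow structure" concrete, here is a minimal Python sketch that stores the labels from the D-PROV figure as triples and answers a question needing both layers: which task produced, and which consumed, a data item. The edge directions are an assumed reading of the figure, not taken from the D-PROV specification, and the helper names (`objects`, `subjects`, `producing_tasks`, `consuming_tasks`) are hypothetical.

```python
# ASSUMPTION: edge directions below are an illustrative reading of the figure.
TRIPLES = [
    # Workflow structure: tasks, ports, and the link between ports.
    ("T1", "isTaskOf", "wf"),
    ("T2", "isTaskOf", "wf"),
    ("T1", "hasOutputPort", "op1"),
    ("T2", "hasInputPort", "ip1"),
    ("op1", "dataLink", "ip1"),
    # Execution trace: invocations and the data item that flowed between them.
    ("wfInv", "wasAssociatedWith", "wf"),
    ("T1Inv", "wasAssociatedWith", "T1"),
    ("T2Inv", "wasAssociatedWith", "T2"),
    ("T1Inv", "wasStartedBy", "wfInv"),
    ("T2Inv", "wasStartedBy", "wfInv"),
    ("d", "onOutPort", "op1"),
    ("d", "onInPort", "ip1"),
]


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is a stored triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is a stored triple."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def producing_tasks(data_item):
    """Tasks that own an output port on which the data item appeared."""
    return {
        task
        for port in objects(data_item, "onOutPort")
        for task in subjects("hasOutputPort", port)
    }


def consuming_tasks(data_item):
    """Tasks that own an input port on which the data item appeared."""
    return {
        task
        for port in objects(data_item, "onInPort")
        for task in subjects("hasInputPort", port)
    }


if __name__ == "__main__":
    print("produced by:", producing_tasks("d"))
    print("consumed by:", consuming_tasks("d"))
```

The trace alone says that `d` appeared on ports `op1` and `ip1`. Only the workflow-structure triples tie those ports back to tasks `T1` and `T2`.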
  • 17. DataONE Provenance components II: PBase
      [Figure: PBase architecture]
      – Sources mapped to D-PROV: R ➞ DProv, T ➞ DProv, V ➞ DProv, eSc ➞ DProv, Tr ➞ DProv, K ➞ DProv
      – Layers: Neo4J loader; graph storage; query layer; indexing; analytical services
      – Annotations: In-house components; Neo4J graph DBMS [AllegroGraph] [Graph-*]; Cypher (Neo, declarative) [Gremlin (procedural)]; can we do better? scaling graph queries; can we do better than the built-in Neo indexing?; to be developed
  • 18. Baseline provenance queries in PBase
      • Ancestors and descendants (lineage): [2,3]
        – Which datasets were involved in the production of data at node “e33”?
        – Reachability: was task “e11_miny” involved in producing data at node “e38”?
      • Execution analysis: [3]
        – Which tasks did not execute to completion for execution X of a given workflow?
        – Find all inputs [outputs] of a given workflow across all its executions
        – Given a data item, find all workflows / tasks that have used it as input
        – Suppose we discover that service S has a bug: which data products were impacted by it?
        – How many times was task T activated across a pool of workflow executions?
      • Provenance differencing: [4]
        – Why do the results from two executions of the same workflow differ?
      • Attribution: [5]
        – Who was responsible for this {data {usage, production}, service invocation}?
      [2] Dey, Saumen, Víctor Cuevas-Vicenttín, Sven Köhler, Eric Gribkoff, Michael Wang, and Bertram Ludäscher. "On implementing provenance-aware regular path queries with relational query engines." In Proceedings of the Joint EDBT/ICDT 2013 Workshops, pp. 214-223. ACM, 2013.
      [3] Dey, Saumen, Sven Köhler, Shawn Bowers, and Bertram Ludäscher. "Datalog as a lingua franca for provenance querying and reasoning." In Workshop on the Theory and Practice of Provenance (TaPP). 2012.
      [4] Missier, Paolo, Simon Woodman, Hugo Hiden, and Paul Watson. “Provenance and Data Differencing for Workflow Reproducibility Analysis.” Concurrency and Computation: Practice and Experience, 2013.
      [5] Missier, Paolo, Bertram Ludäscher, Saumen Dey, Michael Wang, Tim McPhillips, Shawn Bowers, Michael Agun, and Ilkay Altintas. "Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository." International Journal of Digital Curation 7, no. 1 (2012): 139-150.
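The lineage and reachability questions above reduce to graph traversal. The sketch below uses a plain breadth-first search in Python over a toy dependency graph. It is not PBase's implementation, which per the slides sits on a graph DBMS with a declarative query layer. The node names "e33", "e38" and "e11_miny" come from the slide; every edge, and the nodes "e20" and "e21", are hypothetical.

```python
from collections import deque

# HYPOTHETICAL edges: each key depends on (was produced from or by) its values.
DEPENDS_ON = {
    "e38": ["e33", "e11_miny"],
    "e33": ["e20", "e21"],
    "e11_miny": ["e20"],
    "e20": [],
    "e21": [],
}


def ancestors(node, depends_on):
    """All nodes reachable from `node` by following dependency edges (BFS)."""
    seen = set()
    queue = deque(depends_on.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(depends_on.get(current, []))
    return seen


def was_involved(candidate, target, depends_on):
    """Reachability: did `candidate` contribute to producing `target`?"""
    return candidate in ancestors(target, depends_on)


if __name__ == "__main__":
    print(sorted(ancestors("e33", DEPENDS_ON)))
    print(was_involved("e11_miny", "e38", DEPENDS_ON))
```

Descendant queries, such as finding the data products impacted by a buggy service, use the same traversal over the reversed edges.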
  • 19. Application - The social life of research data
      • We know all about searching in the publications space
        – Who else is working on problems similar to mine?
        – Which results are available?
      • In the data and process space:
        1. Search and discovery
           • Who else has used the {datasets, services, workflows, ...} I am using?
             – How do others rate them?
           • Who used my {datasets, services, workflows, ...}? How were they used?
        2. Reuse, reproduction, validation
           • Can I reproduce these results?
             – Using the same exact method
             – Using a variation of the method
           • How do I apply this method to my data?
           • ...
      Social provenance for community building
  • 20. From Pull (client queries) to Push (notifications)
      • Uncovering latent connections amongst services / data / people:
        – Ranking, clustering, association rules
        – Requires new similarity metrics
      • A recommender system for scientists
        – Analytics layer activated when new traces are added
      • Challenges:
        – How large a corpus of provenance graphs is needed?
        – How global should the community be?
          • Little new to discover in a small community
        – Requires graphs with rich attribution / association relations
      [Figure: PBase layers: graph storage, query layer, indexing, analytical services]
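As a rough illustration of "uncovering latent connections", the Python sketch below ranks scientists by the overlap between the artifacts their executions used. The slide calls for new similarity metrics; Jaccard similarity is used here only as a familiar stand-in, not as the metric the authors propose. The usage records, the scientist "carol", and the function names are hypothetical.

```python
# HYPOTHETICAL usage records, as they might be derived from attribution and
# association relations in a corpus of provenance traces.
USED_BY = {
    "alice": {"dataset_A", "service_S", "workflow_W"},
    "bob": {"dataset_A", "workflow_W"},
    "carol": {"dataset_B"},
}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(person, used_by):
    """Other scientists as (name, score) pairs, most similar first."""
    scored = [
        (name, jaccard(used_by[person], items))
        for name, items in used_by.items()
        if name != person
    ]
    # Sort by descending score, breaking ties by name for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for name, score in most_similar("alice", USED_BY):
        print(f"{name}: {score:.2f}")
```

In a push-style system, a ranking like this would be recomputed when new traces arrive, and a scientist notified when a close neighbour uses an artifact they have not seen.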
  • 21. Credits. Current members of the DataONE Provenance Working Group:
      In the USA:
        – Bertram Ludäscher, UC Davis (co-lead)
        – Victor Cuevas Vicenttin, UC Davis (DataONE postdoc researcher)
        – Saumen Dey, UC Davis (researcher)
        – Parisa Kianmajd, UC Davis (intern)
        – Juliana Freire, NYU-Poly
        – David Koop, NYU-Poly
        – Fernando Chirigati, NYU-Poly
        – Shawn Bowers, Gonzaga University
        – Ilkay Altintas, SDSC/UCSD
        – Karthik Ram, UC Berkeley
        – Yolanda Gil, USC - ISI
        – Yaxing Wei, ORNL
        – Dave Vieglais, DataONE Technical Lead
      In the UK:
        – Paolo Missier, Newcastle University
        – James Cheney, University of Edinburgh
        – Khalid Belhajjame, University of Manchester