Open Annotation Rollout, Manchester, 2013-06-25
See also PPTX version with Notes: http://www.slideshare.net/soilandreyes/2013-0624annotatingr-osopenannotationmeeting
2013 06-24 Wf4Ever: Annotating research objects (PDF)
1. Wf4Ever: Annotating
research objects
Stian Soiland-Reyes, Sean Bechhofer
myGrid, University of Manchester
Open Annotation Rollout, Manchester, 2013-06-24
This work is licensed under a
Creative Commons Attribution 3.0 Unported License
2. Motivation: Scientific workflows
Coordinated execution of
services and linked resources
Dataflow between services
Web services (SOAP, REST)
Command line tools
Scripts
User interactions
Components (nested workflows)
Method becomes:
Documented visually
Shareable as single definition
Reusable with new inputs
Repurposable with other services
Reproducible?
http://www.myexperiment.org/workflows/3355
http://www.taverna.org.uk/
http://www.biovel.eu/
3. But workflows are complex machines
Outputs
Inputs
Configuration
Components
http://www.myexperiment.org/workflows/3355
• Will it still work after a year? 10 years?
• Expanding components, we see a workflow involves a
series of specific tools and services which
• Depend on datasets, software libraries, other tools
• Are often poorly described or understood
• Over time evolve, change, break or are replaced
• User interactions are not reproducible
• But can be tracked and replayed
4. Electronic Paper Not Enough
Hypothesis Experiment
Result
Analysis
Conclusions
Investigation
Data Data
Electronic
paper
Publish
http://www.force11.org/beyondthepdf2
http://figshare.com/
Open Research movement: Openly share the data of your experiments
http://datadryad.org/
6. What is in a research object?
A Research Object bundles and relates digital
resources of a scientific experiment or
investigation:
Data used and results produced in
experimental study
Methods employed to produce and analyse
that data
Provenance and settings for the experiments
People involved in the investigation
Annotations about these resources, that are
essential to the understanding and
interpretation of the scientific outcomes
captured by a research object
http://www.researchobject.org/
7. Gathering everything
Research Objects (RO) aggregate related resources, their
provenance and annotations
Conveys "everything you need to know" about a
study/experiment/analysis/dataset/workflow
Shareable, evolvable, contributable, citable
ROs have their own provenance and lifecycles
8. Why Research Objects?
i. To share your research materials
(RO as a social object)
ii. To facilitate reproducibility and reuse of methods
iii. To be recognized and cited
(even for constituent resources)
iv. To preserve results and prevent decay
(curation of workflow definition;
using provenance for partial rerun)
13. Annotations in research objects
Types: "This document contains a hypothesis"
Relations: "These datasets are consumed by that tool"
Provenance: "These results came from this workflow run"
Descriptions: "Purpose of this step is to filter out invalid data"
Comments: "This method looks useful, but how do I install it?"
Examples: "This is how you could use it"
15. What is provenance?
By Dr Stephen Dann
licensed under Creative Commons Attribution-ShareAlike 2.0 Generic
http://www.flickr.com/photos/stephendann/3375055368/
Attribution
who did it?
Derivation
how did it change?
Activity
what happens to it?
Licensing
can I use it?
Attributes
what is it?
Origin
where is it from?
Annotations
what do others say about it?
Aggregation
what is it part of?
Date and tool
when was it made?
using what?
16. Attribution
Who collected this sample? Who helped?
Which lab performed the sequencing?
Who did the data analysis?
Who curated the results?
Who produced the raw data this analysis is based on?
Who wrote the analysis workflow?
Why do I need this?
i. To be recognized for my work
ii. Who should I give credits to?
iii. Who should I complain to?
iv. Can I trust them?
v. Who should I make friends with?
prov:wasAttributedTo
prov:actedOnBehalfOf
dct:creator
dct:publisher
pav:authoredBy
pav:contributedBy
pav:curatedBy
pav:createdBy
pav:importedBy
pav:providedBy
...
Roles
Person
Organization
SoftwareAgent
Agent types
Alice
The
lab
Data
wasAttributedTo
actedOnBehalfOf
http://practicalprovenance.wordpress.com/
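The diagram on this slide (Data wasAttributedTo Alice, Alice actedOnBehalfOf the lab) can be written down as PROV-O triples. The following is a minimal sketch using only the Python standard library. The `http://example.org/` URIs for the data, Alice and the lab are made-up placeholders, and the helper `ntriple` is hypothetical, not part of any Wf4Ever tool.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # placeholder namespace (assumption)


def ntriple(subject, predicate, obj):
    """Format one triple of URIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triples = [
    # Who is responsible for the data?
    (EX + "data", PROV + "wasAttributedTo", EX + "alice"),
    # On whose behalf did she act?
    (EX + "alice", PROV + "actedOnBehalfOf", EX + "lab"),
    # Agent types, as listed on the slide
    (EX + "alice", RDF_TYPE, PROV + "Person"),
    (EX + "lab", RDF_TYPE, PROV + "Organization"),
]

lines = [ntriple(*t) for t in triples]
print("\n".join(lines))
```

The more specific roles (pav:authoredBy, pav:curatedBy and so on) would be stated the same way, with the PAV property in place of prov:wasAttributedTo.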
17. Derivation
Which sample was this metagenome sequenced from?
Which metagenomes was this sequence extracted from?
Which sequence was the basis for the results?
What is the previous revision of the new results?
Why do I need this?
i. To verify consistency (did I use
the correct sequence?)
ii. To find the latest revision
iii. To backtrack where a diversion
appeared after a change
iv. To credit work I depend on
v. Auditing and defence for peer review
wasDerivedFrom
wasQuotedFrom
Sequence
New
results
wasDerivedFrom
Sample
Metagenome
Old
results
wasRevisionOf
wasInfluencedBy
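Backtracking through derivations, as in "which sample was this based on?", amounts to following wasDerivedFrom links until none remain. A minimal sketch, assuming the chain suggested by the questions on this slide (sample, metagenome, sequence, new results). The dictionary layout and the function name `lineage` are illustrative only.

```python
# Each key was derived from its value (one wasDerivedFrom link per entity).
was_derived_from = {
    "new results": "sequence",
    "sequence": "metagenome",
    "metagenome": "sample",
}


def lineage(entity, derived_from):
    """Follow wasDerivedFrom links from an entity back to its original source."""
    chain = [entity]
    seen = {entity}
    while chain[-1] in derived_from:
        parent = derived_from[chain[-1]]
        if parent in seen:  # guard against accidental cycles in the data
            break
        chain.append(parent)
        seen.add(parent)
    return chain


chain = lineage("new results", was_derived_from)
print(" <- ".join(reversed(chain)))
# prints: sample <- metagenome <- sequence <- new results
```

The same traversal over wasRevisionOf links would list earlier revisions of a result.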
18. Activities
What happened? When? Who?
What was used and generated?
Why was this workflow started?
Which workflow ran? Where?
Why do I need this?
i. To see which analysis was performed
ii. To find out who did what
iii. What was the metagenome
used for?
iv. To understand the whole process
"make me a Methods section"
v. To track down inconsistencies
used
wasGeneratedBy
wasStartedAt
"2012-06-21"
Metagenome
Sample
wasAssociatedWith
Workflow
server
wasInformedBy
wasStartedBy
Workflow
run
wasGeneratedBy
Results
Sequencing
wasAssociatedWith
Alice
hadPlan
Workflow
definition
hadRole
Lab
technician
Results
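The "make me a Methods section" idea can be illustrated by turning activity records into sentences. This is a minimal sketch. The assignment of agents, roles and plans to the two activities is one plausible reading of the flattened diagram, not something the slide states outright, and the record layout and `describe` function are hypothetical.

```python
# Assumed reading of the diagram: two activities, each with what it used,
# what it generated, and the agent it was associated with.
activities = [
    {"name": "Sequencing", "used": "Sample", "generated": "Metagenome",
     "agent": "Alice", "role": "lab technician"},
    {"name": "Workflow run", "used": "Metagenome", "generated": "Results",
     "agent": "Workflow server", "plan": "Workflow definition"},
]


def describe(activity):
    """Render one activity record as a sentence for a methods summary."""
    sentence = (f"{activity['name']} used {activity['used']} "
                f"and generated {activity['generated']}, "
                f"associated with {activity['agent']}")
    if "role" in activity:
        sentence += f" (role: {activity['role']})"
    if "plan" in activity:
        sentence += f", following the plan {activity['plan']}"
    return sentence + "."


methods = [describe(a) for a in activities]
print("\n".join(methods))
```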
20. Provenance of what?
Who made the (content of) research object? Who maintains it?
Who wrote this document? Who uploaded it?
Which CSV was this Excel file imported from?
Who wrote this description? When? How did we get it?
What is the state of this RO? (Live or Published?)
What did the research object look like before? (Revisions) Are
there newer versions?
Which research objects are derived from this RO?
21. Research object model at a glance
Research
Object
Resource
Resource
Resource
Annotation
Annotation
Annotation
oa:hasTarget
Resource
Resource
Annotation graph
oa:hasBody
ore:aggregates
«ore:Aggregation»
«ro:ResearchObject»
«oa:Annotation»
«ro:AggregatedAnnotation»
«trig:Graph»
«ore:AggregatedResource»
«ro:Resource»
Manifest
«ore:ResourceMap»
«ro:Manifest»
22. Wf4Ever architecture
Blob store
Graph
store
Resource
Uploaded to
Manifest
Annotation
graph
Research
object
Annotation
ORE Proxy
External
reference
Redirects to
If RDF, import as named graph
SPARQL
REST resources
http://www.wf4ever-project.org/wiki/display/docs/RO+API+6
23. Where do RO annotations come
from?
Imported from uploaded resources, e.g. embedded in
workflow-specific format (creator: unknown!)
Created by users filling in Title, Description etc. on website
By automatically invoked software agents, e.g.:
A workflow transformation service extracts the workflow
structure as RDF from the native workflow format
Provenance trace from a workflow run, which describes the
origin of aggregated output files in the research object
24. How we are using the OA model
Multiple oa:Annotation contained within the manifest RDF and
aggregated by the RO.
Provenance (PAV, PROV) on oa:Annotation (who made the link)
and body resource (who stated it)
Typically a single oa:hasTarget, either the RO or an aggregated
resource.
oa:hasBody to a trig:Graph resource (read: RDF file) with the
"actual" annotation as RDF:
<workflow1> dct:title "The wonderful workflow" .
Multiple oa:hasTarget for relationships, e.g. graph body:
<workflow1> roterms:inputSelected <input2.txt> .
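The pattern described on this slide, one oa:Annotation with an oa:hasBody pointing at a graph and one or more oa:hasTarget links, can be sketched as plain triples. This is a rough illustration using only the Python standard library. The base URI, the annotation and body paths, and the helper `annotation_triples` are assumptions made for the example, not the URIs Wf4Ever actually generates.

```python
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/ro/"  # placeholder research object URI (assumption)


def annotation_triples(annotation, body, targets):
    """Triples linking an annotation to its body graph and its targets."""
    triples = [
        (annotation, RDF_TYPE, OA + "Annotation"),
        (annotation, OA + "hasBody", body),
    ]
    triples += [(annotation, OA + "hasTarget", t) for t in targets]
    return triples


# Single target: the body graph holds the dct:title statement.
title_ann = annotation_triples(
    BASE + "annotations/1", BASE + "annotations/title.ttl",
    [BASE + "workflow1"])

# Multiple targets: the body graph relates workflow1 to input2.txt.
relation_ann = annotation_triples(
    BASE + "annotations/2", BASE + "annotations/inputs.ttl",
    [BASE + "workflow1", BASE + "input2.txt"])

for s, p, o in title_ann + relation_ann:
    print(f"<{s}> <{p}> <{o}> .")
```

Keeping the statement itself in the body graph means the annotation only records which resources are being linked, so provenance on the link and on the statement can be stated separately.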
25. What should we also be using?
Motivations
myExperiment: commenting, describing, moderating, questioning,
replying, tagging (we made our own vocabulary, as OA did not exist)
Selectors on compound resources
E.g. description on processors within a workflow in a workflow
definition. How do you find this if you only know the workflow
definition file?
Currently: Annotations on separate URIs for each component,
described in workflow structure graph, which is body of annotation
targeting the workflow definition file
Importing/referring to annotations from other OA systems
(how to discover those?)
26. What is the benefit of OA for us?
Existing vocabulary: no need for our project to try to
specify and agree on our own way of tracking
annotations.
Potential interoperability with third-party annotation
tools
E.g. We want to annotate a figure in a paper and
relate it to a dataset in a research object; we don't
want to write another tool for that!
Existing annotations (pre research object) in Taverna
and myExperiment map easily to OA model
27. History lesson (AO/OAC/OA)
When forming the Wf4Ever Research Object model, we found:
Open Annotation Collaboration (OAC)
Annotation Ontology (AO)
What was the difference?
Technically, for Wf4Ever's purposes, they are equivalent
Political choice: AO, supported by Utopia (Manchester)
We encouraged the formation of W3C Open Annotation
Community Group and a joint model
Next: Research Object model v0.2 and RO Bundle will use the
OA model; since we only used 2 properties, mapping is 1:1
http://www.wf4ever-project.org/wiki/display/docs/2011-09-26+Annotation+model+considerations
28. Saving a research object:
RO bundle
Single, transferable research object
Self-contained snapshot
Which files in ZIP, which are URIs? (Up to user/application)
Regular ZIP file, explored and unpacked with standard tools
JSON manifest is programmatically accessible without RDF
understanding
Works offline and in desktop applications: no REST API
access required
Basis for RO-enabled file formats, e.g. Taverna run bundle
Exchanged with myExperiment and RO tools
30. RO Bundle
What is aggregated? File in ZIP or external URI
Who made the RO? When?
Who?
External URIs placed in folders
Embedded annotation
External annotation, e.g. blogpost
JSON-LD context → RDF
RO provenance
.ro/manifest.json
Format
Note: JSON "quotes" not shown above for brevity
http://json-ld.org/
http://orcid.org/
https://w3id.org/bundle
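Because an RO bundle is a regular ZIP file with a JSON manifest at `.ro/manifest.json`, it can be read without any RDF tooling. The sketch below builds a tiny bundle in memory and reads it back using only the Python standard library. The manifest keys (`createdOn`, `createdBy`, `aggregates`, `uri`), the context URL, and the file names are illustrative assumptions, since the slide's actual manifest is not reproduced in this transcript.

```python
import io
import json
import zipfile

# Illustrative manifest; key names are assumptions for this sketch.
manifest = {
    "@context": ["https://w3id.org/bundle/context"],
    "createdOn": "2013-06-24T12:00:00Z",
    "createdBy": {"name": "Alice"},
    "aggregates": [
        {"uri": "/workflow1.t2flow"},           # file inside the ZIP
        {"uri": "http://example.org/blog/42"},  # external URI
    ],
}

# Write the bundle: a plain ZIP holding the manifest and one resource.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr(".ro/manifest.json", json.dumps(manifest))
    bundle.writestr("workflow1.t2flow", "<workflow/>")

# Read it back with standard tools only.
with zipfile.ZipFile(buffer) as bundle:
    loaded = json.loads(bundle.read(".ro/manifest.json"))

uris = [entry["uri"] for entry in loaded["aggregates"]]
external = [u for u in uris if u.startswith(("http://", "https://"))]
in_zip = [u for u in uris if u not in external]

print("Made by:", loaded["createdBy"]["name"], "on", loaded["createdOn"])
print("In ZIP:", in_zip)
print("External:", external)
```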
31. http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/
<h3 property="dc:title">Common Motifs in Scientific Workflows:
<br>An Empirical Analysis</h3>
<body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/"
typeOf="ore:Aggregation ro:ResearchObject">
Research Object as RDFa
http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/
<li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls"
typeOf="ro:Resource">Analytics for Taverna workflows</a></li>
<li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx"
typeOf="ro:Resource">Analytics for Wings workflows</a></li>
<span property="dc:creator prov:wasAttributedTo"
resource="http://delicias.dia.fi.upm.es/members/DGarijo/#me"></span>