On Specifying and Sharing Scientific Workflow
Optimization Results Using Research Objects

Member of the Helmholtz Association

8th Workshop On Workflows in Support of Large-Scale Science

17 November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*,
Renato De Giovanni#, Matthias Obst~, Carole Goble$
*Jülich Supercomputing Centre (JSC), Forschungszentrum Juelich, Germany
+Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
$School of Computer Science, University of Manchester, UK
#Reference Center on Environmental Information, Campinas SP, Brazil
~Department of Biological and Environmental Sciences, University of Gothenburg, Sweden
Scientific Workflows
• Popular choice to design, manage, and execute in silico experiments
• Sharing and reuse via workflow repositories

Sunday Nov. 17, 2013

Ecological Niche Modeling
• Model species adaptation to environmental changes (BioVeL project)
Ecological Niche Modeling Workflow
[Figure: workflow diagram. Inputs: Parameter, Occurrence Data, Environmental Layer, Geographic Mask. Steps: createModel, testModel, calcAUC. Output: AUC]
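The slides do not say how calcAUC computes the area under the ROC curve. As a minimal sketch, assuming the model yields one score per presence point and one per (pseudo-)absence point, the rank-based (Mann-Whitney) estimate below is one common way to get that number; the function name and the sample scores are hypothetical.

```python
from itertools import product

def auc(presence_scores, absence_scores):
    """Rank-based ROC AUC: the fraction of (presence, absence) pairs in which
    the presence point gets the higher model score; ties count one half."""
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence score")
    wins = 0.0
    for p, a in product(presence_scores, absence_scores):
        if p > a:
            wins += 1.0
        elif p == a:
            wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))

# Hypothetical model scores at presence and (pseudo-)absence locations.
print(auc([0.9, 0.7, 0.6], [0.2, 0.65, 0.1]))  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why it works as a single fitness value to maximize.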
[Figure: lifecycle of an in silico experiment, entered by Designing workflow (from scratch) or Reusing workflow, followed by Planning, Execution, and Sharing & Analysis, with a REFINE loop]
Ecological Niche Modeling Workflow
[Figure: the same workflow, now showing the parameters Gamma, Cost and NumberOfPseudoAbsences and the alternative modeling algorithms SVM, Maxent and GARP for createModel]
Ecological Niche Modeling Workflow
[Figure: the workflow surrounded by many candidate parameter values and tool options, under the labels Select Algorithms (SVM, Maxent, GARP) and Select Parameters (Gamma, Cost, NumberOfPseudoAbsences)]
Common strategies to handle this challenge
• Default parameters & applications
• Trial and error
• Parameter sweeps

But:
• Increasing complexity of scientific workflows
• Rising number of parameters
• Time-consuming and compute-intensive

[Figure: the in silico experiment lifecycle (Designing workflow (from scratch) or Reusing workflow, Planning, Execution, Sharing & Analysis, REFINE), extended with an Optimization stage]

Intelligent automated optimization techniques
Goal:
• Automated way to find the workflow settings that optimize the output
• Define workflow output(s) as fitness value
• Use the fitness value for evaluation (e.g. AUC or correlation coefficient)
• Use a heuristic search algorithm to find the best settings
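A minimal sketch of the idea, assuming the workflow can be called as a black box that returns named outputs: one output (here "AUC") is treated as the fitness value, and a search loop keeps the best setting found. `run_workflow` is a hypothetical toy stand-in, not a real ENM run, and plain random search is used only as the simplest heuristic; the framework itself plugs in methods such as a genetic algorithm.

```python
import random

def run_workflow(params):
    """Hypothetical stand-in for one workflow execution; returns its outputs."""
    gamma, cost = params["Gamma"], params["Cost"]
    # Toy response surface peaking at Gamma=2, Cost=8 (not a real ENM model).
    toy_auc = 1.0 - 0.01 * (gamma - 2.0) ** 2 - 0.001 * (cost - 8.0) ** 2
    return {"AUC": toy_auc}

def fitness(params, output_name="AUC"):
    """Treat the selected workflow output as the fitness value."""
    return run_workflow(params)[output_name]

def random_search(bounds, evaluations, seed=0):
    """Simplest heuristic search: sample settings uniformly, keep the best."""
    rng = random.Random(seed)
    best_params, best_fit = None, float("-inf")
    for _ in range(evaluations):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        fit = fitness(params)
        if fit > best_fit:
            best_params, best_fit = params, fit
    return best_params, best_fit

best, score = random_search({"Gamma": (0.0, 10.0), "Cost": (0.0, 20.0)}, evaluations=200)
print(best, round(score, 3))
```

Because only `fitness` touches the workflow, the search strategy can be swapped without changing how outputs are evaluated.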

How does it work?
• Development of an optimization framework that extends the Taverna workflow management system
• Abstracts the optimization process (e.g. parallel execution, security)
• Developer API allows rapid adaptation of new optimization methods
• Optimization plugins can be added independently
[Figure: layered architecture: Taverna (WMS), Optimization Framework Layer, API, and Plugins (Parameter Optimization, Component Optimization)]

Taverna Optimization Framework & Plugin
(1) Define sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization method parameters (population size, termination criteria)
[Figure: the Genetic Algorithm Parameter Optimization Plugin iterates over generations 1, 2, …, x with best fitness 0.34, 0.42, 0.48, …, 0.49, then displays the optimization result]
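The four setup steps could be captured in a small configuration object, sketched below under stated assumptions: the field names, default values and the two termination criteria (a generation budget and a target fitness) are illustrative choices, not the plugin's real settings.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class OptimizationSetup:
    """Hypothetical container for the four setup steps on the slide."""
    sub_workflow: str                       # (1) part of the workflow to optimize
    inputs: Dict[str, Tuple[float, float]]  # (2) input parameters with (min, max) constraints
    fitness_output: str                     # (3) workflow output used as fitness, e.g. "AUC"
    population_size: int = 20               # (4) optimization method parameters (assumed defaults)
    max_generations: int = 50               #     termination criterion: generation budget
    target_fitness: float = 1.0             #     termination criterion: good-enough fitness

    def __post_init__(self):
        for name, (lo, hi) in self.inputs.items():
            if lo > hi:
                raise ValueError(f"empty range for {name}: {lo} > {hi}")
        if self.population_size < 2:
            raise ValueError("a genetic algorithm needs at least two individuals")

    def should_stop(self, generation: int, best_fitness: float) -> bool:
        """Stop when the generation budget is used up or the target is reached."""
        return generation >= self.max_generations or best_fitness >= self.target_fitness

setup = OptimizationSetup("createModel", {"Gamma": (0.0, 10.0)}, "AUC",
                          population_size=10, max_generations=3, target_fitness=0.95)
```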

Status quo
• Workflow optimization starts from scratch each time
• Optimization meta-data are lost

Idea: Capture optimization meta-data alongside traditional provenance data
⇒ learn from/extend prior optimization runs
⇒ improve and accelerate the optimization process

Research Objects
• Aligned with W3C standards
• Aggregate various resources
• Describe scientific processes in a machine-readable format
• Specified by several ontologies
[Figure: a Research Object linked to its resources via ore:aggregates]
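To show what "aggregates various resources" looks like in machine-readable form, here is a minimal sketch that emits a Turtle manifest: the Research Object is typed as ro:ResearchObject and each resource is attached with ore:aggregates. The namespace IRIs are those of the Wf4Ever RO model and OAI-ORE as far as we know (worth checking against the specifications); the example.org IRIs and file names are hypothetical.

```python
RO_NS = "http://purl.org/wf4ever/ro#"
ORE_NS = "http://www.openarchives.org/ore/terms/"

def ro_manifest(ro_iri, resources):
    """Minimal Turtle manifest: type the Research Object and list every
    aggregated resource with ore:aggregates."""
    if not resources:
        raise ValueError("a Research Object should aggregate at least one resource")
    objects = ",\n        ".join(f"<{r}>" for r in resources)
    return "\n".join([
        f"@prefix ro: <{RO_NS}> .",
        f"@prefix ore: <{ORE_NS}> .",
        "",
        f"<{ro_iri}> a ro:ResearchObject ;",
        f"    ore:aggregates {objects} .",
    ])

print(ro_manifest("http://example.org/ro/svm-opt/",
                  ["http://example.org/ro/svm-opt/workflow.t2flow",
                   "http://example.org/ro/svm-opt/optimization-results.ttl"]))
```

A real manifest would normally be produced with an RDF library and carry more annotations; string building is used here only to keep the sketch dependency-free.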

Taverna Optimization Framework & Plugin
[Figure: the framework overview repeated: setup steps (1) to (4), the Genetic Algorithm Parameter Optimization Plugin iterating over generations 1, 2, …, x (best fitness 0.34, 0.42, 0.48, …, 0.49), then displaying the optimization result]

Optimization Research Object Ontology
[Figure: opt:OptimizationResearchObject is an rdfs:subClassOf ro:ResearchObject and aggregates (ore:aggregates) resources described by the following classes]
• opt:Algorithm: describes the optimization algorithm and its parameters
• opt:Fitness: describes the fitness functions
• opt:Generation: defines the population size and generation number for an optimization run
• opt:OptimizationRun: represents one result set (sub-workflow, parameters and obtained fitness values)
• opt:SearchSpace: describes the dependencies and parameter constraints
• opt:TerminationCondition: describes the termination condition defined by the user
• opt:Workflow: the workflow that was optimized
Legend: rdfs:subClassOf, rdf:Property
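The class hierarchy above can be written down as an ontology skeleton. The sketch below serializes it to Turtle, taking the class names and comments from the slide; the opt: namespace IRI is a placeholder assumption (the published ontology is linked at the end of the talk), and modelling the classes as owl:Class is our choice for illustration.

```python
# Class names and comments follow the slide; the opt: namespace is a placeholder.
OPT_CLASSES = {
    "Algorithm": "Describes the optimization algorithm and its parameters.",
    "Fitness": "Describes the fitness functions.",
    "Generation": "Defines the population size and generation number for an optimization run.",
    "OptimizationRun": "Represents one result set: sub-workflow, parameters and obtained fitness values.",
    "SearchSpace": "Describes the dependencies and parameter constraints.",
    "TerminationCondition": "Describes the termination condition defined by the user.",
    "Workflow": "The workflow that was optimized.",
}

def opt_ontology_turtle(opt_ns="http://example.org/opt#"):
    """Emit the class skeleton: the Optimization RO specializes ro:ResearchObject."""
    out = [
        f"@prefix opt: <{opt_ns}> .",
        "@prefix ro: <http://purl.org/wf4ever/ro#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
        "opt:OptimizationResearchObject a owl:Class ;",
        "    rdfs:subClassOf ro:ResearchObject .",
    ]
    for name, comment in OPT_CLASSES.items():
        out.append(f'opt:{name} a owl:Class ; rdfs:comment "{comment}" .')
    return "\n".join(out)

print(opt_ontology_turtle())
```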
Algorithm
• Genetic Algorithm
• Mutation rate: 0.1
• Crossover rate: 0.7
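Only the mutation rate (0.1) and crossover rate (0.7) come from the slide. The sketch below is a generic real-valued genetic algorithm using them; the operators (binary tournament selection, blend crossover, per-gene Gaussian mutation clipped to the bounds, one elite per generation) and the population and generation sizes are our assumptions, not necessarily what the plugin implements.

```python
import random

def genetic_algorithm(fitness, bounds, pop_size=20, generations=30,
                      crossover_rate=0.7, mutation_rate=0.1, seed=0):
    """Minimal real-valued GA that maximizes fitness; returns
    (best individual, its fitness, best fitness per generation)."""
    rng = random.Random(seed)
    names = list(bounds)

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals becomes a parent.
        (fa, a), (fb, b) = rng.sample(scored, 2)
        return a if fa >= fb else b

    population = [{n: rng.uniform(*bounds[n]) for n in names} for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        history.append(scored[0][0])        # best fitness in this generation
        next_pop = [dict(scored[0][1])]     # elitism: carry the best over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:   # blend crossover
                w = rng.random()
                child = {n: w * p1[n] + (1 - w) * p2[n] for n in names}
            else:
                child = dict(p1)
            for n in names:                     # per-gene Gaussian mutation, clipped
                if rng.random() < mutation_rate:
                    lo, hi = bounds[n]
                    child[n] = min(max(child[n] + rng.gauss(0, 0.1 * (hi - lo)), lo), hi)
            next_pop.append(child)
        population = next_pop
    best_fit, best = max(((fitness(ind), ind) for ind in population), key=lambda pair: pair[0])
    return best, best_fit, history

# Toy objective (not a workflow): fitness peaks at Gamma = 2.36.
best, best_fit, history = genetic_algorithm(lambda p: -(p["Gamma"] - 2.36) ** 2,
                                            {"Gamma": (0.0, 10.0)}, seed=1)
```

Elitism makes the per-generation best fitness non-decreasing, which matches the steadily rising "Best Fitness" values shown in the plugin display.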

Search Space
Gamma:
• Type: double
• Range: 0 to 10
• Constraint: Cost/2 < Gamma (fictional)
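A search space with a parameter dependency can be handled by rejection sampling, as in the sketch below: draw each parameter uniformly within its bounds and retry until the constraint holds. The Gamma bounds and the (fictional) Cost/2 < Gamma rule come from the slide; the Cost range of 0 to 20 is an assumption, since the slide does not give it.

```python
import random

SEARCH_SPACE = {
    "Gamma": (0.0, 10.0),  # from the slide: double, 0 to 10
    "Cost": (0.0, 20.0),   # assumption: range not given on the slide
}

def satisfies_constraints(p):
    # Fictional dependency from the slide: Cost/2 < Gamma
    return p["Cost"] / 2 < p["Gamma"]

def sample(space, rng, max_tries=1000):
    """Rejection sampling: draw uniformly within the bounds and retry
    until the parameter dependency holds."""
    for _ in range(max_tries):
        p = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        if satisfies_constraints(p):
            return p
    raise RuntimeError("constraint region too small for rejection sampling")

rng = random.Random(42)
candidates = [sample(SEARCH_SPACE, rng) for _ in range(5)]
```

With these ranges about half of all draws are accepted; for tighter constraints a repair step or a constraint-aware sampler would waste fewer evaluations.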

Optimization Run
• Origin of result
• Parameter setting
• Fitness value
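The three pieces of information recorded per run map naturally onto a small immutable record. In this sketch the field names, the free-text form of `origin` and the sample values are illustrative assumptions; fitness is assumed to be maximized, as it is for AUC.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class OptimizationRun:
    """One result set as listed on the slide."""
    origin: str                   # where the result came from, e.g. sub-workflow and generation
    parameters: Dict[str, float]  # the parameter setting that was evaluated
    fitness: float                # the fitness value it obtained

def best_run(runs: List[OptimizationRun]) -> OptimizationRun:
    """Highest-fitness run, assuming fitness is maximized."""
    return max(runs, key=lambda r: r.fitness)

# Illustrative records, not actual results from the talk.
runs = [
    OptimizationRun("createModel/gen1/iter1", {"Gamma": 1.0, "Cost": 4.0}, 0.31),
    OptimizationRun("createModel/gen2/iter4", {"Gamma": 2.36, "Cost": 8.0}, 0.46),
]
print(best_run(runs).origin)
```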

Taverna Optimization Framework & Plugin
[Figure: the framework overview (setup steps (1) to (4), generations 1, 2, …, x, best fitness 0.49), now with a log entry: Generation 1 Iteration 1, Fitness: 0.05]

Taverna Optimization Framework & Plugin
[Figure: the same overview with the log extended to Generation 1, Iterations 1 to 6, with per-iteration fitness values between 0.05 and 0.34]

Taverna Optimization Framework & Plugin
[Figure: the same overview with the log extended to Generations 2 and 3 (Iterations 1 to 6 each), with per-iteration fitness values between 0.05 and 0.46]
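A log of (generation, iteration, fitness) entries like the one above is enough to derive the per-generation "Best Fitness" the plugin displays. The sketch below computes both the best fitness within each generation and the best seen so far; the log values in the example are illustrative, not a transcription of the slide.

```python
from collections import defaultdict

def best_per_generation(log):
    """log: iterable of (generation, iteration, fitness) tuples.
    Returns {generation: (best fitness in that generation, best so far)}."""
    per_gen = defaultdict(lambda: float("-inf"))
    for gen, _iteration, fit in log:
        per_gen[gen] = max(per_gen[gen], fit)
    result, so_far = {}, float("-inf")
    for gen in sorted(per_gen):
        so_far = max(so_far, per_gen[gen])
        result[gen] = (per_gen[gen], so_far)
    return result

# Illustrative log entries.
log = [(1, 1, 0.05), (1, 2, 0.22), (2, 1, 0.19), (2, 2, 0.31), (3, 1, 0.27)]
print(best_per_generation(log))
```

Keeping the best-so-far separately matters when a generation has no elitism: its own best can drop below an earlier generation's.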

Example Result
Name                     Value
Gamma                    2.36
Cost                     8
NumberOfPseudoAbsences   363
Fitness                  0.9207

Benefits of sharing and exploiting Optimization Research Objects
• What is the optimal setting? - Reuse optimized settings
• What ranges have been explored? - Adopt previously explored parameter ranges
• What algorithm settings were used? - Reuse algorithm settings
• Are there similar optimizations? - Reuse existing results
• Resume the optimization
• Embed optimization provenance into workflow infrastructures to be reused by other scientists
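Two of these benefits, adopting explored ranges and resuming an optimization, can be sketched over stored run records. Assumptions: each stored run is a plain dict with "parameters" and "fitness" keys, and resuming means seeding the next initial population with the best stored settings; the first run reuses Gamma, Cost and Fitness from the example result, the second is hypothetical.

```python
def explored_ranges(runs):
    """'What ranges have been explored?': min/max of every parameter over stored runs."""
    ranges = {}
    for run in runs:
        for name, value in run["parameters"].items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges

def warm_start_population(runs, pop_size, new_individual):
    """'Resume the optimization': seed the population with the best stored
    settings, then fill up with fresh individuals from new_individual()."""
    ranked = sorted(runs, key=lambda r: r["fitness"], reverse=True)
    seeded = [dict(r["parameters"]) for r in ranked[:pop_size]]
    while len(seeded) < pop_size:
        seeded.append(new_individual())
    return seeded

stored = [
    {"parameters": {"Gamma": 2.36, "Cost": 8.0}, "fitness": 0.9207},  # example result
    {"parameters": {"Gamma": 5.0, "Cost": 2.0}, "fitness": 0.71},     # hypothetical
]
print(explored_ranges(stored))
print(warm_start_population(stored, 4, lambda: {"Gamma": 0.0, "Cost": 0.0}))
```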

Conclusion
• Scientific workflows are hard to configure
• Optimization can help, but the meta-data get lost
• Extend Research Objects
• Build a new Optimization Research Object Ontology
• Reuse optimization meta-data to speed up optimization
• Share it with the community via workflow infrastructures
• Outlook: How to learn from similar workflows?

Links
http://purl.org/net/ro-optimization
http://purl.org/net/svm-opt-research-object


Questions?
Thank you!

DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 

Último (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 

On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects

  • 1. On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects. Member of the Helmholtz Association. 8th Workshop On Workflows in Support of Large-Scale Science, 17 November 2013 | Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$ | *Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Germany; +Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain; $School of Computer Science, University of Manchester, UK; #Reference Center on Environmental Information, Campinas SP, Brazil; ~Department of Biological and Environmental Sciences, University of Gothenburg, Sweden
  • 2. Scientific Workflows • Popular choice to design, manage, and execute in silico experiments • Sharing and reuse via workflow repositories
  • 3. Ecological Niche Modeling: model how species adapt to environmental changes (BioVeL Project) [Figure: five numbered steps]
  • 4. Ecological Niche Modeling Workflow [Diagram: the inputs Parameter, Occurrence Data, Environmental Layer, and Geographic Mask feed createModel; the model then passes through testModel and calcAUC, which outputs the AUC]
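The slide does not say how calcAUC computes the AUC. A minimal sketch, assuming testModel yields one predicted suitability score per presence point and per absence point: the AUC equals the probability that a randomly chosen presence point scores higher than a randomly chosen absence point, with ties counting one half. The function name `calc_auc` and its inputs are hypothetical, not the workflow's actual service interface.

```python
from typing import Sequence


def calc_auc(presence_scores: Sequence[float], absence_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Hypothetical stand-in for the workflow's calcAUC step: every presence score
    is compared with every absence score; a win counts 1, a tie counts 0.5.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence score")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))
```

This pairwise form costs O(n·m) comparisons; rank-based formulations of the same statistic are faster for large test sets.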
  • 5. [Diagram: the in silico experiment life cycle, with the stages Designing workflow (from scratch), Reusing workflow, Planning, Execution, and Sharing & Analysis, plus a REFINE loop]
  • 6. Ecological Niche Modeling Workflow [Diagram: as on slide 4, but with the parameters Gamma, Cost, and NumberOfPseudoAbsences as inputs alongside Occurrence Data, Environmental Layer, and Geographic Mask; createModel can use one of the algorithms SVM, Maxent, or GARP, followed by testModel and calcAUC, which outputs the AUC]
  • 7. Ecological Niche Modeling Workflow [Figure: the workflow from slide 6 surrounded by scattered candidate values and options, illustrating the two decisions users face: Select Algorithms and Select Parameters]
  • 8. Common strategies to handle this challenge • Default parameters and applications • Trial and error • Parameter sweeps. But: • Scientific workflows are increasingly complex • The number of parameters keeps rising • These strategies are time- and compute-intensive
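Why parameter sweeps become compute-intensive can be shown with a small exhaustive sweep. The candidate values below are hypothetical (the slides give no grid), and `toy_fitness` merely stands in for one full workflow execution returning an AUC.

```python
import itertools
import math
from typing import Callable, Dict, List, Optional, Tuple

Setting = Dict[str, float]

# Hypothetical candidate values; the slides give no sweep grid.
GRID: Dict[str, List[float]] = {
    "Gamma": [0.5, 1.0, 2.0, 4.0, 8.0],
    "Cost": [1, 2, 4, 8, 16],
    "NumberOfPseudoAbsences": [100, 200, 300, 400],
}


def sweep_size(grid: Dict[str, List[float]]) -> int:
    """Workflow runs needed by an exhaustive sweep: the product of the grid sizes."""
    return math.prod(len(values) for values in grid.values())


def parameter_sweep(grid: Dict[str, List[float]],
                    fitness: Callable[[Setting], float]) -> Tuple[Optional[Setting], float]:
    """Evaluate every combination and return the best setting and its fitness."""
    names = list(grid)
    best_setting: Optional[Setting] = None
    best_fitness = -math.inf
    for combo in itertools.product(*(grid[name] for name in names)):
        setting = dict(zip(names, combo))
        value = fitness(setting)  # in practice: one full (expensive) workflow execution
        if value > best_fitness:
            best_setting, best_fitness = setting, value
    return best_setting, best_fitness


def toy_fitness(s: Setting) -> float:
    """Hypothetical stand-in for the workflow's AUC output, peaking at (2.0, 8, 300)."""
    return -(abs(s["Gamma"] - 2.0) + abs(s["Cost"] - 8) / 10
             + abs(s["NumberOfPseudoAbsences"] - 300) / 1000)
```

Three parameters with four or five values each already need 100 runs; ten parameters with ten values each would need 10^10.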
  • 9. [Diagram: the in silico experiment life cycle from slide 5 (Designing workflow from scratch, Reusing workflow, Planning, Execution, Sharing & Analysis, REFINE), now with an additional Optimization stage]
  • 10. Intelligent automated optimization techniques. Goal: an automated way to find the workflow settings that optimize the output • Define the workflow output(s) as a fitness value • Use the fitness value for evaluation (e.g. AUC or correlation coefficient) • Use a heuristic search algorithm to find the best settings
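The goal on slide 10 treats the workflow as a black box that maps a parameter setting to a fitness value. Random search is shown here only as the simplest heuristic that fits this contract; it is not the method used in the slides, whose plugin is a genetic algorithm. All names and signatures are hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

Setting = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def random_search(fitness: Callable[[Setting], float], bounds: Bounds,
                  budget: int, seed: int = 0) -> Tuple[Setting, float]:
    """Sample `budget` settings uniformly within `bounds` and keep the fittest."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    best: Setting = {}
    best_fitness = float("-inf")
    for _ in range(budget):
        setting = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        value = fitness(setting)  # the workflow output used as fitness, e.g. an AUC
        if value > best_fitness:
            best, best_fitness = setting, value
    return best, best_fitness
```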
  • 11. How does it work? • Development of an optimization framework that extends the Taverna workflow management system • The framework abstracts the optimization process (e.g. parallel execution, security) • A developer API allows rapid adaptation of new optimization methods • Optimization plugins can be added independently [Diagram: Taverna WMS → Optimization Framework layer → API → Plugins: Parameter Optimization, Component Optimization]
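Slide 11 describes a layered design in which the framework handles execution concerns and plugins supply the search method through a developer API. The actual Taverna plugin API is not shown; the following is a hypothetical Python analogue (all class and function names invented) of how such a separation could look.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple, Type

Setting = Dict[str, float]
History = List[Tuple[Setting, float]]

PLUGINS: Dict[str, Type["OptimizationPlugin"]] = {}


def register(name: str):
    """Class decorator: plugins register themselves by name, independently of each other."""
    def wrap(cls):
        PLUGINS[name] = cls
        return cls
    return wrap


class OptimizationPlugin(ABC):
    """Hypothetical plugin contract: a plugin only decides which settings to try next."""

    @abstractmethod
    def propose(self, history: History) -> List[Setting]:
        """Return the next batch of settings, given all (setting, fitness) pairs so far."""


def run_optimization(plugin: OptimizationPlugin, fitness: Callable[[Setting], float],
                     rounds: int) -> Tuple[Setting, float]:
    """Framework side: evaluates each proposed batch (sequentially here; a real framework
    could dispatch the batch in parallel) and returns the best (setting, fitness) pair."""
    history: History = []
    for _ in range(rounds):
        for setting in plugin.propose(history):
            history.append((setting, fitness(setting)))
    return max(history, key=lambda item: item[1])


@register("toy-grid")
class ToyGridPlugin(OptimizationPlugin):
    """Toy plugin: proposes x = 0, 1, 2, ... in batches of three."""

    def propose(self, history: History) -> List[Setting]:
        start = len(history)
        return [{"x": float(start + i)} for i in range(3)]
```

Keeping the plugin free of execution details is what lets a new search method be dropped in without touching the framework loop.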
  • 12. Taverna Optimization Framework & Plugin: (1) Define sub-workflow (2) Specify input parameters (constraints) (3) Select fitness output parameters (e.g. AUC) (4) Define optimization method parameters (population size, termination criteria) → Display the optimization result [Figure: the Genetic Algorithm Parameter Optimization Plugin reporting the best fitness per generation: 0.34 (generation 1), 0.42 (generation 2), 0.48, …, 0.49 (final generation x)]
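The four set-up steps on slide 12 map naturally onto the arguments of a genetic algorithm. A minimal sketch under that reading, not the plugin's implementation: the operators (binary tournament selection, uniform crossover, Gaussian mutation, elitism) are assumptions, and the default rates 0.7 and 0.1 are borrowed from the algorithm settings on slide 17.

```python
import random
from typing import Callable, Dict, List, Tuple

Setting = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def genetic_algorithm(fitness: Callable[[Setting], float], bounds: Bounds,
                      population_size: int = 20, generations: int = 10,
                      crossover_rate: float = 0.7, mutation_rate: float = 0.1,
                      seed: int = 0) -> List[Tuple[Setting, float]]:
    """Minimal real-valued GA returning the best (setting, fitness) of each generation.

    (1) `fitness` wraps the sub-workflow, (2) `bounds` are the input constraints,
    (3) the value `fitness` returns is the selected fitness output, and
    (4) `population_size` and `generations` (the termination criterion) are the
    method parameters.
    """
    if population_size < 2:
        raise ValueError("population_size must be at least 2")
    rng = random.Random(seed)
    names = list(bounds)

    def tournament(scored: List[Tuple[Setting, float]]) -> Setting:
        a, b = rng.sample(scored, 2)  # binary tournament selection
        return a[0] if a[1] >= b[1] else b[0]

    population = [{n: rng.uniform(*bounds[n]) for n in names} for _ in range(population_size)]
    history: List[Tuple[Setting, float]] = []
    for _ in range(generations):
        scored = [(ind, fitness(ind)) for ind in population]  # one workflow run per individual
        best = max(scored, key=lambda item: item[1])
        history.append(best)
        children = [dict(best[0])]  # elitism: keep the best setting unchanged
        while len(children) < population_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:  # uniform crossover
                child = {n: (p1 if rng.random() < 0.5 else p2)[n] for n in names}
            else:
                child = dict(p1)
            for n in names:  # per-gene Gaussian mutation, clipped to the bounds
                if rng.random() < mutation_rate:
                    lo, hi = bounds[n]
                    child[n] = min(hi, max(lo, child[n] + rng.gauss(0.0, 0.1 * (hi - lo))))
            children.append(child)
        population = children
    return history
```

Printing the returned list generation by generation gives the kind of display shown on the slide; because of elitism, the best fitness per generation never decreases.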
  • 13. Status quo • Workflow optimization starts from scratch each time • Optimization metadata are lost. Idea: capture optimization metadata alongside traditional provenance data ⇒ learn from or extend prior optimization runs ⇒ improve and accelerate the optimization process
  • 14. Research Objects • Aligned with W3C standards • Aggregate various resources • Describe scientific processes in a machine-readable format • Specified by several ontologies [Diagram: a Research Object linked to its resources via ore:aggregates]
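Research Objects are RDF descriptions, and aggregation is expressed with the OAI-ORE property ore:aggregates named on slide 14. A minimal sketch that emits N-Triples by hand, assuming the Wf4Ever RO vocabulary (`http://purl.org/wf4ever/ro#`) for the ResearchObject class; the example.org URIs used with it are placeholders.

```python
from typing import Iterable, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORE = "http://www.openarchives.org/ore/terms/"
RO = "http://purl.org/wf4ever/ro#"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    if any(ch in value for ch in "<> \"{}|\\^`"):
        raise ValueError(f"not a valid IRI: {value!r}")
    return f"<{value}>"


def aggregation_triples(research_object: str, resources: Iterable[str]) -> List[str]:
    """N-Triples lines typing `research_object` as an RO that aggregates `resources`."""
    lines = [f"{iri(research_object)} {iri(RDF_TYPE)} {iri(RO + 'ResearchObject')} ."]
    for resource in resources:
        lines.append(f"{iri(research_object)} {iri(ORE + 'aggregates')} {iri(resource)} .")
    return lines
```

For anything beyond a sketch, an RDF library with proper serializers is preferable to string formatting.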
  • 15. Taverna Optimization Framework & Plugin: (1) Define sub-workflow (2) Specify input parameters (constraints) (3) Select fitness output parameters (e.g. AUC) (4) Define optimization parameters (population size, termination criteria) → Display the optimization result [Figure: the Genetic Algorithm Parameter Optimization Plugin reporting the best fitness per generation: 0.34 (generation 1), 0.42 (generation 2), 0.48, …, 0.49 (final generation x)]
  • 16. Optimization Research Object Ontology: opt:Optimization Research Object, a subclass (rdfs:subClassOf) of ro:Research Object, aggregates (ore:aggregates) the following classes: • opt:Algorithm: describes the optimization algorithm and its parameters • opt:Fitness: describes the fitness functions • opt:Generation: defines the population size and generation number for an optimization run • opt:Optimization Run: represents one result set (sub-workflow, parameters, and obtained fitness values) • opt:Search Space: describes the dependencies and parameter constraints • opt:Termination Condition: describes the termination condition defined by the user • opt:Workflow: the workflow that was optimized
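To make the class list on slide 16 concrete, here is a hypothetical in-memory mirror of an Optimization Research Object as Python dataclasses. Field names are assumptions chosen to match the slide's descriptions (the ontology's actual properties are not shown), the fitness output is assumed to be the workflow's AUC, and the example values come from slides 17, 18, and 23; the workflow URI and the termination setting are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Algorithm:             # opt:Algorithm: the optimization algorithm and its parameters
    name: str
    parameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class Fitness:               # opt:Fitness: the selected fitness output
    output_name: str


@dataclass
class Generation:            # opt:Generation: population size and generation number
    number: int
    population_size: int


@dataclass
class OptimizationRun:       # opt:Optimization Run: one result set
    sub_workflow: str
    parameters: Dict[str, float]
    fitness_values: Dict[str, float]


@dataclass
class SearchSpace:           # opt:Search Space: parameter constraints and dependencies
    bounds: Dict[str, Tuple[float, float]]
    dependencies: List[str] = field(default_factory=list)


@dataclass
class TerminationCondition:  # opt:Termination Condition: user-defined stopping rule
    max_generations: int


@dataclass
class OptimizationResearchObject:  # aggregates all of the above
    workflow: str                  # opt:Workflow: the optimized workflow
    algorithm: Algorithm
    fitness: Fitness
    search_space: SearchSpace
    termination: TerminationCondition
    generations: List[Generation] = field(default_factory=list)
    runs: List[OptimizationRun] = field(default_factory=list)

    def best_run(self) -> Optional[OptimizationRun]:
        """Run with the highest value of the selected fitness output, if any."""
        key = self.fitness.output_name
        return max(self.runs, key=lambda r: r.fitness_values[key], default=None)


oro = OptimizationResearchObject(
    workflow="http://example.org/workflows/enm",          # placeholder URI
    algorithm=Algorithm("Genetic Algorithm", {"mutation_rate": 0.1, "crossover_rate": 0.7}),
    fitness=Fitness("AUC"),
    search_space=SearchSpace({"Gamma": (0.0, 10.0)}, ["Cost/2 < Gamma"]),
    termination=TerminationCondition(max_generations=10),  # hypothetical setting
)
oro.runs.append(OptimizationRun("createModel-testModel-calcAUC",
                                {"Gamma": 2.36, "Cost": 8, "NumberOfPseudoAbsences": 363},
                                {"AUC": 0.9207}))
```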
  • 17. Algorithm • Genetic Algorithm • Mutation rate: 0.1 • Crossover rate: 0.7
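The slide lists the rates but not how the genetic algorithm applies them. Under a common convention, assumed here, the crossover rate $p_c = 0.7$ is the probability that a selected pair of parents is recombined and the mutation rate $p_m = 0.1$ is the per-gene mutation probability. For a hypothetical population of $N = 20$ individuals whose $L = 3$ genes are Gamma, Cost, and NumberOfPseudoAbsences, the expected effect per generation is:

```latex
\begin{aligned}
\mathbb{E}[\text{pairs recombined}] &= p_c \cdot \tfrac{N}{2} = 0.7 \cdot 10 = 7 \\
\mathbb{E}[\text{genes mutated}] &= p_m \cdot N \cdot L = 0.1 \cdot 20 \cdot 3 = 6 \\
P(\text{an individual is not mutated}) &= (1 - p_m)^{L} = 0.9^{3} = 0.729
\end{aligned}
```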
  • 18. Search Space. Gamma: • Type: Double • Range: 0 to 10 • Constraint: Cost/2 < Gamma (fictional)
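An opt:Search Space entry has to capture both per-parameter ranges and dependencies between parameters, such as the fictional Cost/2 < Gamma. A minimal sketch of one way to represent and enforce that; the class names and the Cost range are hypothetical, and rejection sampling is just one simple way to draw only valid settings.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Value = Union[int, float]
Setting = Dict[str, Value]


@dataclass(frozen=True)
class Parameter:
    name: str
    kind: type    # float for a "Double" parameter, int for an integer one
    low: Value
    high: Value

    def sample(self, rng: random.Random) -> Value:
        if self.kind is int:
            return rng.randint(int(self.low), int(self.high))  # inclusive bounds
        return rng.uniform(self.low, self.high)


@dataclass
class SearchSpace:
    parameters: List[Parameter]
    constraints: List[Callable[[Setting], bool]] = field(default_factory=list)

    def is_valid(self, setting: Setting) -> bool:
        """True if every parameter is within its range and every constraint holds."""
        in_range = all(p.low <= setting[p.name] <= p.high for p in self.parameters)
        return in_range and all(check(setting) for check in self.constraints)

    def sample(self, rng: random.Random, max_tries: int = 10_000) -> Setting:
        """Rejection sampling: draw settings until one satisfies all constraints."""
        for _ in range(max_tries):
            setting = {p.name: p.sample(rng) for p in self.parameters}
            if self.is_valid(setting):
                return setting
        raise RuntimeError("no valid setting found; the constraints may be unsatisfiable")


space = SearchSpace(
    parameters=[
        Parameter("Gamma", float, 0.0, 10.0),  # from the slide: Double, 0 to 10
        Parameter("Cost", int, 1, 16),          # hypothetical range, not on the slide
    ],
    constraints=[lambda s: s["Cost"] / 2 < s["Gamma"]],  # the slide's fictional dependency
)
```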
  • 19. Optimization Run • Origin of the result • Parameter setting • Fitness value
  • 20. Taverna Optimization Framework & Plugin (steps (1) to (4) and the result display as on slide 15) [Figure: zooming into generation 1 of the Genetic Algorithm Parameter Optimization Plugin: Generation 1, Iteration 1: fitness 0.05]
  • 21. Taverna Optimization Framework & Plugin (steps (1) to (4) and the result display as on slide 15) [Figure: the six iterations of generation 1 with fitness values 0.05, 0.22, 0.27, 0.19, 0.31, and 0.34; the highest, 0.34, is the best fitness of generation 1]
  • 22. Taverna Optimization Framework & Plugin (steps (1) to (4) and the result display as on slide 15) [Figure: the iteration view extended to generations 1 to 3, six iterations each with its fitness value (e.g. Generation 3, Iteration 6: fitness 0.46), next to the per-generation best fitness values 0.34, 0.42, 0.48, …, 0.49]
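Slides 19 to 22 suggest that every evaluated setting is kept as an optimization run with its origin, parameters, and fitness, which is what makes per-generation summaries recoverable later. A hypothetical record type and two summaries over it; the generation-1 fitness values are the ones listed on slide 21, while the origin strings are placeholders and the parameter values, which the slide does not show, are left empty.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class RunRecord:
    """One evaluated setting: origin of the result, parameter setting, fitness value."""
    generation: int
    iteration: int
    fitness: float
    origin: str = ""                                  # e.g. an ID of the workflow run
    parameters: Dict[str, float] = field(default_factory=dict)


def best_per_generation(records: Iterable[RunRecord]) -> Dict[int, float]:
    """Highest fitness observed in each generation, ordered by generation."""
    best: Dict[int, float] = {}
    for r in records:
        best[r.generation] = max(r.fitness, best.get(r.generation, float("-inf")))
    return dict(sorted(best.items()))


def best_overall(records: List[RunRecord]) -> RunRecord:
    """The record with the highest fitness across the whole optimization."""
    return max(records, key=lambda r: r.fitness)


# Generation 1 as listed on slide 21.
GEN1 = [0.05, 0.22, 0.27, 0.19, 0.31, 0.34]
records = [RunRecord(1, i, f, origin=f"gen1-iter{i}") for i, f in enumerate(GEN1, start=1)]
```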
  • 23. Example Result: Gamma = 2.36, Cost = 8, NumberOfPseudoAbsences = 363; fitness = 0.9207
  • 24. Benefits of sharing and exploiting Optimization Research Objects • What is the optimal setting? Reuse optimized settings • What ranges have been explored? Adopt the parameter ranges already used • What algorithm settings were used? Reuse algorithm settings • Are there similar optimizations? Reuse existing results • Resume the optimization • Embed optimization provenance into workflow infrastructures so that other scientists can reuse it
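Two of the reuse scenarios on slide 24, adopting explored parameter ranges and resuming an optimization, can be sketched once earlier results are loaded as (setting, fitness) pairs. Both functions are hypothetical; seeding a genetic algorithm's initial population with earlier elites is one common way to resume a search, not necessarily how the authors' framework does it.

```python
import random
from typing import Dict, List, Tuple

Setting = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def explored_ranges(prior: List[Tuple[Setting, float]]) -> Bounds:
    """Min/max of every parameter across earlier runs ("what ranges have been explored?")."""
    ranges: Bounds = {}
    for setting, _ in prior:
        for name, value in setting.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def warm_start_population(prior: List[Tuple[Setting, float]], bounds: Bounds,
                          size: int, rng: random.Random) -> List[Setting]:
    """Seed a new population with the best earlier settings that lie inside `bounds`,
    filling any remaining slots with uniformly random settings."""
    def inside(s: Setting) -> bool:
        return all(name in s and lo <= s[name] <= hi for name, (lo, hi) in bounds.items())

    ranked = sorted(prior, key=lambda item: item[1], reverse=True)
    population = [dict(s) for s, _ in ranked if inside(s)][:size]
    while len(population) < size:
        population.append({n: rng.uniform(lo, hi) for n, (lo, hi) in bounds.items()})
    return population
```

Filtering by the current bounds matters: an earlier optimum found under a wider search space may be invalid for the new run.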
  • 25. Conclusion • Scientific workflows are hard to configure • Optimization can help, but its metadata get lost • Extend Research Objects • Build a new Optimization Research Object Ontology • Reuse optimization metadata to speed up optimization • Share it with the community via workflow infrastructures. Outlook: how can we learn from similar workflows?