SlideShare una empresa de Scribd logo
1 de 38
Descargar para leer sin conexión
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Scientific Workflow Systems for accessible,
reproducible research
Peter van Heusden and Alan Christoffels
Email: pvh@sanbi.ac.za
South African National Bioinformatics Institute
University of the Western Cape
Bellville, South Africa
eResearch Africa 2013 / Cape Town
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Outline
1 Introducing. . .
2 Bioinformatics workflows
3 How we could be doing things (and why we don’t)
4 Scientific workflows for reproduceable research
5 In conclusion
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Hi
Hi!
I’m Peter
and I’m a bioinformaticist.
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Hi
Hi!
I’m Peter
and I’m a bioinformaticist.
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Hi
Hi!
I’m Peter
and I’m a bioinformaticist.
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
SANBI
from SANBI
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
What do we do?
[Abizar Lakdawalla, 2007]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
But wait. . .
[Wetterstrand, 2013]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Data is cheap and plentiful
[Karsch-Mizrachi et al., 2012]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Plunging cost of sequencing changes bioinformatics
“[Y]ou see people collecting information and then having to
put a lot more energy into the analysis of the information
than they have done in getting the information in the first
place. The software is typically very idiosyncratic since there
are very few generic tools that the bench scientist has for
collecting and analyzing and processing the data.
This is something that we computer scientists could help fix
by building generic tools for the scientists.” Jim Gray [Anthony J. G.
Hey et al., 2009]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Plunging cost of sequencing changes bioinformatics
“[Y]ou see people collecting information and then having to
put a lot more energy into the analysis of the information
than they have done in getting the information in the first
place. The software is typically very idiosyncratic since there
are very few generic tools that the bench scientist has for
collecting and analyzing and processing the data.
This is something that we computer scientists could help fix
by building generic tools for the scientists.” Jim Gray [Anthony J. G.
Hey et al., 2009]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Bioinformatics analysis
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Bioinformatics analysis
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Dr Cloete’s hunt for a TB targetting compound
PhD student at SANBI, Ruben Cloete, mined existing knowledge about
M. tuberculosis to find potential drug targets for curing the disease
Analysis proceeds through a series of queries, transforms and filters
Collection of tasks in analysis comprises a bioinformatics workflow
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
How workflows are implemented
print "The d i c t i o n a r y of sample and genomic f i l e s " , Variables . dict_of_sample_fastqfiles
for each_sample in Variables . dict_of_sample_fastqfiles . keys ( ) :
#count number of genomic f i l e s N f o r each sample . N − number of amplicons per sample
N=len ( Variables . dict_of_sample_fastqfiles [ each_sample ] )
i f N == 0: continue
#create bash f i l e f o r each amplicon / genomic f i l e to be submitted in array job
counter=0
list_of_toolname_fastm_files = [ ]
#get time stamp f o r appending in job name
timestamp = s t r ( datetime . now ( ) ) . replace ( " . " , " " ) . replace ( "−" , " " ) . replace ( " " , " " ) . replace ( " : " , " "
print " Sample name" , each_sample
for r e f _ a l i g n in Variables . dict_of_sample_fastqfiles [ each_sample ] :
print " sample gene f i l e " , r e f _ a l i g n
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
The problem with our workflows
Our workflows are specified at a low level, in terms of filenames
and commands to execute
Workflow scripts run analyses in the same way as when they are
executed by hand by a researcher
Last decade has seen change, but not enough:
Language Perl
Execution On server
Data storage Files
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
The problem with our workflows
Our workflows are specified at a low level, in terms of filenames
and commands to execute
Workflow scripts run analyses in the same way as when they are
executed by hand by a researcher
Last decade has seen change, but not enough:
Language Python
Execution On cluster / grid
Data storage Files / SQL db
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
A better toolset
[Bernstein, 1997]
New toolsets drive software engineering productivity
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Scientific workflow management systems
“Scientific workflows have emerged to tackle the problem of excessive
complexity in scientific experiments and applications. They provide a
high-level declarative way of specifying what a particular in silico
experiment modelled by a workflow is set to achieve, not how it will be
executed.” [Taverna project, 2009]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Scientific workflow management systems (2)
Scientific workflow management systems (SciWMSs) are
data-flow oriented
Expressing workflows in terms of data dependencies allows for
flexible mapping to computational resources
SciWMSs are frameworks, not APIs: they implement a workflow
pattern, user fills in the blanks of what needs doing
“A framework is a set of cooperating classes that make up a
reusable design for a specific class of software” [Gamma et al., 1993]
“If applications are hard to design, and toolkits are harder, then
frameworks are hardest of all. A framework designer gambles that
one architecture will work for all applications in the domain.” [Gamma
et al., 1993]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
A graphical workflow specification (Galaxy)
[Goecks et al., 2010]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
A textual workflow specification (bpipe)
REFERENCE=" reference . fa "
PICARD_HOME="/ usr / l o c a l / picard−tools / "
. . .
index = {
exec " samtools index $input "
}
c a l l _ v a r i a n t s = {
exec " samtools mpileup −uf $REFERENCE $input | bc fto ol s view −bvcg − > $output "
}
Bpipe . run {
align + sort + dedupe + index + c a l l _ v a r i a n t s
}
[Sadedin et al., 2012]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Why don’t we use scientific workflow management systems
already?
“Contemporary workflow platforms fall short of adequately supporting
rapid deployment into the user applications that consume them, and
legacy application codes need to be integrated and managed.” —
Goble and de Roure in [Anthony J. G. Hey et al., 2009]
SciWMSs are often not available on the computing platforms that
scientists use.
SciWMS support for the workflow patterns that scientists use is
sometimes poor
Bioinformaticists use a large number of tools, and a SciWMS for
bioinformatics must have rich tool support
The software adoption process in science is not a straightforward
one
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Barriers to SciWMS availability
Some are commercial products (e.g. Pipeline Pilot)
Can be hard to install (e.g. Kepler [Altintas et al., 2004] and Taverna [Hull et al.,
2006])
Frameworks require long-running daemons that require sysadmin
buy in to run
and sysadmins and scientists live in different worlds
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
SciWMS support for scientific workflow patterns
Current tool of choice for scientific workflow composition is a
combination of manual effort and general purpose scripting
languages
As software frameworks SciWMSs implement particular workflow
patterns
These might not match workflow requirements: e.g. Galaxy lacks
sub-workflow support or iteration across a collection
Lack of adequate support for workflow patterns results in “hacks”
that re-introduce needless complexity into the workflow
specification
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Tool support in SciWMSs
Scientific workflows make use of a large and evolving software
collection (e.g. see [Eagle Genomics, 2013])
SciWMSs must support:
1 Rich set of tools “out of the box”
2 Straightforward procedures to add new tools
Some SciWMSs (such as bpipe) closely resemble scripting
languages and replicate the “command line” model of tool
invocation
On the other extreme, Taverna is service oriented and expects to
tools to be made available as web services
If adding a tool is complex, scientists typically won’t do it
A partial solution is supporting a community collection of tool
definitions (e.g. Galaxy toolshed)
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Software adoption in scientific communities
“‘End-user developers’ commonly create scientific software, but they
are often unaware of or ignore traditional software engineering
standards, leaving trust in their coding expertise potentially misplaced.”
— [Joppa et al., 2013]
Scientific software adoption follows trends within the scientific
community, not necessarily software engineering best practice.
While training is necessary, it is not sufficient to ensure adoption
of new tools: building communication channels between different
communities is key.
Evolving scientific workflow practice is a social problem not just a
technical one.
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Bioinformatics analysis (revisited)
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Workflow re-use and reproducible results
“all software is wrong, although some of it is useful, and all results are
approximate” — C. Titus Brown [C. Titus Brown, 2013]
Current scientific work process is focused on end products
(papers, theses) that cannot be easily reproduced.
In 2010 survey of 20 most highly cited journals found that only 3
required source code to be submitted along with papers.
Even then: “Workflows are often difficult to author, using
languages that are at an inappropriate level of abstraction and
expecting too much knowledge of the underlying infrastructure.
The reusability of a workflow is often confined to the project it was
conceived in – or even to its author – and it is inherently only as
strong as its components.” — Goble and de Roure [Anthony J. G. Hey et al.,
2009]
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
SciWMS for reproducable research
Results of scientific computing are not final “truth” but best-guess
approximations (with associated uncertainty)
SciWMSs that provide features that enhance the reproducibility of
research aid in the collaborative critique and elaboration of results
Critical features for reproducible research include:
Provenance recording: recording data on the provenance of
workflow products (but how and what should we record?)
Abstract workflow components: liberate workflows from the lab
and allow scientists to “run the components of the workflow
wherever the data is.” [C. Titus Brown, 2011]
Bring data closer to publications: publication should combine data,
text and workflow e.g. Galaxy Pages, Vistrails’ LaTeX integration
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Workflow re-use
If we can repeat a workflow, we can re-use it
Workflows will evolve: workflows seldom are “one size fits all”
If workflows evolve we need change management and versioning
Textual workflow specifications can be managed by traditional
source code control tools (e.g. git)
In SciWMSs that use graphical specifications Vistrails provides an
example of how a “version tree” can be represented and navigated
But:
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Workflow re-use
If we can repeat a workflow, we can re-use it
Workflows will evolve: workflows seldom are “one size fits all”
If workflows evolve we need change management and versioning
Textual workflow specifications can be managed by traditional
source code control tools (e.g. git)
In SciWMSs that use graphical specifications Vistrails provides an
example of how a “version tree” can be represented and navigated
But: most SciWMSs currently have poor change management
support
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
In conclusion
In bioinformatics data is becoming plentiful and cheap
this is changing the cost structure of research, since data analysis
is still expensive
Data analysis is large done using workflows involving human
effort and scripting languages
these workflows are fragile, not reproduceable and hard to
understand and re-use
SciWMSs provide frameworks for higher level specification of
scientific workflow
these frameworks are not widely used due to technical and
community limitations
despite these limitations SciWMSs offer an essential path towards
reproduceable and re-useable scientific workflows
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
In conclusion
In bioinformatics data is becoming plentiful and cheap
this is changing the cost structure of research, since data analysis
is still expensive
Data analysis is large done using workflows involving human
effort and scripting languages
these workflows are fragile, not reproduceable and hard to
understand and re-use
SciWMSs provide frameworks for higher level specification of
scientific workflow
these frameworks are not widely used due to technical and
community limitations
despite these limitations SciWMSs offer an essential path towards
reproduceable and re-useable scientific workflows
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
In conclusion
In bioinformatics data is becoming plentiful and cheap
this is changing the cost structure of research, since data analysis
is still expensive
Data analysis is large done using workflows involving human
effort and scripting languages
these workflows are fragile, not reproduceable and hard to
understand and re-use
SciWMSs provide frameworks for higher level specification of
scientific workflow
these frameworks are not widely used due to technical and
community limitations
despite these limitations SciWMSs offer an essential path towards
reproduceable and re-useable scientific workflows
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Thanks
Workflows for biological se-
quence analysis are discussed
by the “Pipelines collaboration”
Research on SciWMS supported
by the MRC and Prof Christoffels
Professor Alan Christoffels
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Bibliography I
Abizar Lakdawalla. File:Sequencing workflow.jpg - wikipedia, the free encyclopedia, Apr. 2007. URL
http://en.wikipedia.org/wiki/File:Sequencing_workflow.jpg.
I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock. Kepler: an extensible system for design and execution of
scientific workflows. In 16th International Conference on Scientific and Statistical Database Management, 2004.
Proceedings, pages 423– 424. IEEE, June 2004. ISBN 0-7695-2146-0. doi: 10.1109/SSDM.2004.1311241.
Anthony J. G. Hey, Stewart Tansley, and Kristen Michelle Tolle. The fourth paradigm: data-intensive scientific discovery.
Microsoft Research, Oct. 2009. ISBN 9780982544204. URL
http://research.microsoft.com/en-us/collaboration/fourthparadigm/.
L. Bernstein. Software investment strategy. Bell Labs Technical Journal, 2(3):233, 1997. ISSN 10897089.
C. Titus Brown. Data intensive science, and workflows, Dec. 2011. URL
http://ivory.idyll.org/blog/data-intensive-science-and-workflows.html.
C. Titus Brown. Thoughts on the assemblathon 2 paper, 2013. URL
http://ivory.idyll.org/blog/thoughts-on-assemblathon-2.html.
Eagle Genomics. The elements of bioinformatics, 2013. URL http://elements.eaglegenomics.com/.
E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design patterns: Abstraction and reuse of object-oriented design. In O. M.
Nierstrasz, editor, ECOOP’ 93 — Object-Oriented Programming, number 707 in Lecture Notes in Computer Science, pages
406–431. Springer Berlin Heidelberg, Jan. 1993. ISBN 978-3-540-57120-9, 978-3-540-47910-9.
J. Goecks, A. Nekrutenko, J. Taylor, and T. G. Team. Galaxy: a comprehensive approach for supporting accessible, reproducible,
and transparent computational research in the life sciences. Genome Biol, 11(8), 2010.
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
Introducing. . .
Bioinformatics workflows
How we could be doing things (and why we don’t)
Scientific workflows for reproduceable research
In conclusion
References
Bibliography II
D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. Pocock, P. Li, and T. Oinn. Taverna: a tool for building and running workflows of
services. Nucleic acids research, 34(suppl 2):W729, 2006. ISSN 0305-1048. URL
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538887/.
L. N. Joppa, G. McInerny, R. Harper, L. Salido, K. Takeda, K. O’Hara, D. Gavaghan, and S. Emmott. Troubling trends in scientific
software use. Science, 340(6134):814–815, May 2013. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1231535. URL
http://www.sciencemag.org/content/340/6134/814. PMID: 23687031.
I. Karsch-Mizrachi, Y. Nakamura, and G. Cochrane. The international nucleotide sequence database collaboration. Nucleic Acids
Research, 40(D1):D33–D37, Jan. 2012. ISSN 0305-1048. doi: 10.1093/nar/gkr1006. URL
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3244996/. PMID: 22080546 PMCID: PMC3244996.
S. P. Sadedin, B. Pope, and A. Oshlack. Bpipe: A tool for running and managing bioinformatics pipelines. Bioinformatics, Apr.
2012. ISSN 1367-4803, 1460-2059. doi: 10.1093/bioinformatics/bts167. URL http://bioinformatics.
oxfordjournals.org/content/early/2012/04/11/bioinformatics.bts167.
Taverna project. Why use workflows?, 2009. URL
http://www.taverna.org.uk/introduction/why-use-workflows/.
K. Wetterstrand. DNA sequencing costs: Data from the NHGRI genome sequencing program (GSP), 2013. URL
http://www.genome.gov/sequencingcosts/.
Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research

Más contenido relacionado

La actualidad más candente

Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research ObjectsDavid De Roure
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Bertram Ludäscher
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesIan Foster
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilitiesIan Foster
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Philippe Rocca-Serra
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesAnita de Waard
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurghJun Zhao
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
 

La actualidad más candente (20)

NETTAB 2013
NETTAB 2013NETTAB 2013
NETTAB 2013
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research Objects
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
 
Beyond the PDF 2, 2013
Beyond the PDF 2, 2013Beyond the PDF 2, 2013
Beyond the PDF 2, 2013
 
ISMB Workshop 2014
ISMB Workshop 2014ISMB Workshop 2014
ISMB Workshop 2014
 
ROHub
ROHubROHub
ROHub
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
 
FAIRy Stories
FAIRy StoriesFAIRy Stories
FAIRy Stories
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh
 
UKON 2014
UKON 2014UKON 2014
UKON 2014
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 

Destacado

الانترنت
الانترنتالانترنت
الانترنتJano Jano
 
Jt presentation
Jt presentation Jt presentation
Jt presentation brionyrolfe
 
إنشاء جدول إكسيل في ورقة عمل أو حذفه
إنشاء جدول إكسيل في ورقة عمل أو حذفهإنشاء جدول إكسيل في ورقة عمل أو حذفه
إنشاء جدول إكسيل في ورقة عمل أو حذفهJano Jano
 
Sustainable Supply Chains: An ‘Updated’ Case Study of Dabbawallahs of Mumbai
Sustainable Supply Chains: An ‘Updated’ Case Study of Dabbawallahs of MumbaiSustainable Supply Chains: An ‘Updated’ Case Study of Dabbawallahs of Mumbai
Sustainable Supply Chains: An ‘Updated’ Case Study of Dabbawallahs of MumbaiBabu George
 
Media Question 1
Media Question 1Media Question 1
Media Question 1brionyrolfe
 
Moral Bankruptcy: Children for Sale
Moral Bankruptcy: Children for Sale Moral Bankruptcy: Children for Sale
Moral Bankruptcy: Children for Sale Babu George
 
الانترنت
الانترنتالانترنت
الانترنتJano Jano
 
Hicheeliin tuluvluguu2013.
Hicheeliin tuluvluguu2013.Hicheeliin tuluvluguu2013.
Hicheeliin tuluvluguu2013.Boloroo9
 
Tourism and Biodiversity
Tourism   and BiodiversityTourism   and Biodiversity
Tourism and BiodiversityBabu George
 
presentation ADM2013 conference
presentation ADM2013 conferencepresentation ADM2013 conference
presentation ADM2013 conferenceJorrit Leijting
 
Analysis of an Opening Sequence
Analysis of an Opening SequenceAnalysis of an Opening Sequence
Analysis of an Opening Sequencebrionyrolfe
 
Ap comparative brazil privacy
Ap comparative brazil privacyAp comparative brazil privacy
Ap comparative brazil privacyMariaElenaGB
 
The Culture of Technology: Differences in the Use of E-Learning Technologies ...
The Culture of Technology: Differences in the Use of E-Learning Technologies ...The Culture of Technology: Differences in the Use of E-Learning Technologies ...
The Culture of Technology: Differences in the Use of E-Learning Technologies ...Babu George
 
SCRIPTURE_Edition002
SCRIPTURE_Edition002SCRIPTURE_Edition002
SCRIPTURE_Edition002Puneet Sharma
 
Обоснование актуальности исследования на тему: Олимпиада как средство контрол...
Обоснование актуальности исследования на тему: Олимпиада как средство контрол...Обоснование актуальности исследования на тему: Олимпиада как средство контрол...
Обоснование актуальности исследования на тему: Олимпиада как средство контрол...anvovchik
 

Destacado (20)

الانترنت
الانترنتالانترنت
الانترنت
 
Jt presentation
Jt presentation Jt presentation
Jt presentation
 
إنشاء جدول إكسيل في ورقة عمل أو حذفه
إنشاء جدول إكسيل في ورقة عمل أو حذفهإنشاء جدول إكسيل في ورقة عمل أو حذفه
إنشاء جدول إكسيل في ورقة عمل أو حذفه
 
Sustainable Supply Chains: An ‘Updated’ Case Study of Dabbawallahs of Mumbai
Sustainable Supply Chains: An ‘Updated’ Case Study of Dabbawallahs of MumbaiSustainable Supply Chains: An ‘Updated’ Case Study of Dabbawallahs of Mumbai
Sustainable Supply Chains: An ‘Updated’ Case Study of Dabbawallahs of Mumbai
 
Media Question 1
Media Question 1Media Question 1
Media Question 1
 
Moral Bankruptcy: Children for Sale
Moral Bankruptcy: Children for Sale Moral Bankruptcy: Children for Sale
Moral Bankruptcy: Children for Sale
 
الانترنت
الانترنتالانترنت
الانترنت
 
Hicheeliin tuluvluguu2013.
Hicheeliin tuluvluguu2013.Hicheeliin tuluvluguu2013.
Hicheeliin tuluvluguu2013.
 
Tourism and Biodiversity
Tourism   and BiodiversityTourism   and Biodiversity
Tourism and Biodiversity
 
presentation ADM2013 conference
presentation ADM2013 conferencepresentation ADM2013 conference
presentation ADM2013 conference
 
Obras viales
Obras vialesObras viales
Obras viales
 
Storeboard
StoreboardStoreboard
Storeboard
 
Jt powerpoint
Jt powerpointJt powerpoint
Jt powerpoint
 
pitch
pitchpitch
pitch
 
Analysis of an Opening Sequence
Analysis of an Opening SequenceAnalysis of an Opening Sequence
Analysis of an Opening Sequence
 
Ap comparative brazil privacy
Ap comparative brazil privacyAp comparative brazil privacy
Ap comparative brazil privacy
 
The Culture of Technology: Differences in the Use of E-Learning Technologies ...
The Culture of Technology: Differences in the Use of E-Learning Technologies ...The Culture of Technology: Differences in the Use of E-Learning Technologies ...
The Culture of Technology: Differences in the Use of E-Learning Technologies ...
 
S4 tarea4 mamome
S4 tarea4 mamomeS4 tarea4 mamome
S4 tarea4 mamome
 
SCRIPTURE_Edition002
SCRIPTURE_Edition002SCRIPTURE_Edition002
SCRIPTURE_Edition002
 
Обоснование актуальности исследования на тему: Олимпиада как средство контрол...
Обоснование актуальности исследования на тему: Олимпиада как средство контрол...Обоснование актуальности исследования на тему: Олимпиада как средство контрол...
Обоснование актуальности исследования на тему: Олимпиада как средство контрол...
 

Similar a Scientific Workflow Systems for accessible, reproducible research

An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataRinke Hoekstra
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)Duncan Hull
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Richard Zijdeman
 
Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Richard Zijdeman
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchDatapetermurrayrust
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960mare34
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsDuncan Hull
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and modelsmyGrid team
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science Carole Goble
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData TheContentMine
 
The Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustThe Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustLEARN Project
 
Web Science, SADI, and the Singularity
Web Science, SADI, and the SingularityWeb Science, SADI, and the Singularity
Web Science, SADI, and the SingularityMark Wilkinson
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyRichard Zijdeman
 
chapter-3.pptx
chapter-3.pptxchapter-3.pptx
chapter-3.pptxAsmaRauf5
 
Dissecting Reproducibility: A case study with ecological niche models in th...
Dissecting Reproducibility:  A case study with ecological niche models  in th...Dissecting Reproducibility:  A case study with ecological niche models  in th...
Dissecting Reproducibility: A case study with ecological niche models in th...Bertram Ludäscher
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literaturepetermurrayrust
 

Similar a Scientific Workflow Systems for accessible, reproducible research (20)

An Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities DataAn Ecosystem for Linked Humanities Data
An Ecosystem for Linked Humanities Data
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
 
Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven Linked Data: Een extra ontstluitingslaag op archieven
Linked Data: Een extra ontstluitingslaag op archieven
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchData
 
Reproducibility 1
Reproducibility 1Reproducibility 1
Reproducibility 1
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
The Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-RustThe Culture of Research Data, by Peter Murray-Rust
The Culture of Research Data, by Peter Murray-Rust
 
How To Create an Integrated Lab?
How To Create an Integrated Lab? How To Create an Integrated Lab?
How To Create an Integrated Lab?
 
Aussois bda-mdd-2018
Aussois bda-mdd-2018Aussois bda-mdd-2018
Aussois bda-mdd-2018
 
Web Science, SADI, and the Singularity
Web Science, SADI, and the SingularityWeb Science, SADI, and the Singularity
Web Science, SADI, and the Singularity
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
 
chapter-3.pptx
chapter-3.pptxchapter-3.pptx
chapter-3.pptx
 
Dissecting Reproducibility: A case study with ecological niche models in th...
Dissecting Reproducibility:  A case study with ecological niche models  in th...Dissecting Reproducibility:  A case study with ecological niche models  in th...
Dissecting Reproducibility: A case study with ecological niche models in th...
 
A biologist in e-Science
A biologist in e-ScienceA biologist in e-Science
A biologist in e-Science
 
ContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific LiteratureContentMine: Mining the Scientific Literature
ContentMine: Mining the Scientific Literature
 

Último

ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsManeerUddin
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 

Último (20)

ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 

Scientific Workflow Systems for accessible, reproducible research

  • 1. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Scientific Workflow Systems for accessible, reproducible research Peter van Heusden and Alan Christoffels Email: pvh@sanbi.ac.za South African National Bioinformatics Institute University of the Western Cape Bellville, South Africa eResearch Africa 2013 / Cape Town Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 2. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Outline 1 Introducing. . . 2 Bioinformatics workflows 3 How we could be doing things (and why we don’t) 4 Scientific workflows for reproduceable research 5 In conclusion Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 3. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Hi Hi! I’m Peter and I’m a bioinformaticist. Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 4. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Hi Hi! I’m Peter and I’m a bioinformaticist. Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 5. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Hi Hi! I’m Peter and I’m a bioinformaticist. Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 6. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References SANBI from SANBI Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 7. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References What do we do? [Abizar Lakdawalla, 2007] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 8. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References But wait. . . [Wetterstrand, 2013] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 9. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Data is cheap and plentiful [Karsch-Mizrachi et al., 2012] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 10. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Plunging cost of sequencing changes bioinformatics “[Y]ou see people collecting information and then having to put a lot more energy into the analysis of the information than they have done in getting the information in the first place. The software is typically very idiosyncratic since there are very few generic tools that the bench scientist has for collecting and analyzing and processing the data. This is something that we computer scientists could help fix by building generic tools for the scientists.” Jim Gray [Anthony J. G. Hey et al., 2009] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 11. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Plunging cost of sequencing changes bioinformatics “[Y]ou see people collecting information and then having to put a lot more energy into the analysis of the information than they have done in getting the information in the first place. The software is typically very idiosyncratic since there are very few generic tools that the bench scientist has for collecting and analyzing and processing the data. This is something that we computer scientists could help fix by building generic tools for the scientists.” Jim Gray [Anthony J. G. Hey et al., 2009] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 12. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Bioinformatics analysis Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 13. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Bioinformatics analysis Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 14. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Dr Cloete’s hunt for a TB targetting compound PhD student at SANBI, Ruben Cloete, mined existing knowledge about M. tuberculosis to find potential drug targets for curing the disease Analysis proceeds through a series of queries, transforms and filters Collection of tasks in analysis comprises a bioinformatics workflow Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 15. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References How workflows are implemented print "The d i c t i o n a r y of sample and genomic f i l e s " , Variables . dict_of_sample_fastqfiles for each_sample in Variables . dict_of_sample_fastqfiles . keys ( ) : #count number of genomic f i l e s N f o r each sample . N − number of amplicons per sample N=len ( Variables . dict_of_sample_fastqfiles [ each_sample ] ) i f N == 0: continue #create bash f i l e f o r each amplicon / genomic f i l e to be submitted in array job counter=0 list_of_toolname_fastm_files = [ ] #get time stamp f o r appending in job name timestamp = s t r ( datetime . now ( ) ) . replace ( " . " , " " ) . replace ( "−" , " " ) . replace ( " " , " " ) . replace ( " : " , " " print " Sample name" , each_sample for r e f _ a l i g n in Variables . dict_of_sample_fastqfiles [ each_sample ] : print " sample gene f i l e " , r e f _ a l i g n Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 16. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References The problem with our workflows Our workflows are specified at a low level, in terms of filenames and commands to execute Workflow scripts run analyses in the same way as when they are executed by hand by a researcher Last decade has seen change, but not enough: Language Perl Execution On server Data storage Files Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 17. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References The problem with our workflows Our workflows are specified at a low level, in terms of filenames and commands to execute Workflow scripts run analyses in the same way as when they are executed by hand by a researcher Last decade has seen change, but not enough: Language Python Execution On cluster / grid Data storage Files / SQL db Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 18. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References A better toolset [Bernstein, 1997] New toolsets drive software engineering productivity Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 19. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Scientific workflow management systems “Scientific workflows have emerged to tackle the problem of excessive complexity in scientific experiments and applications. They provide a high-level declarative way of specifying what a particular in silico experiment modelled by a workflow is set to achieve, not how it will be executed.” [Taverna project, 2009] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 20. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Scientific workflow management systems (2) Scientific workflow management systems (SciWMSs) are data-flow oriented Expressing workflows in terms of data dependencies allows for flexible mapping to computational resources SciWMSs are frameworks, not APIs: they implement a workflow pattern, user fills in the blanks of what needs doing “A framework is a set of cooperating classes that make up a reusable design for a specific class of software” [Gamma et al., 1993] “If applications are hard to design, and toolkits are harder, then frameworks are hardest of all. A framework designer gambles that one architecture will work for all applications in the domain.” [Gamma et al., 1993] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 21. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References A graphical workflow specification (Galaxy) [Goecks et al., 2010] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 22. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References A textual workflow specification (bpipe) REFERENCE=" reference . fa " PICARD_HOME="/ usr / l o c a l / picard−tools / " . . . index = { exec " samtools index $input " } c a l l _ v a r i a n t s = { exec " samtools mpileup −uf $REFERENCE $input | bc fto ol s view −bvcg − > $output " } Bpipe . run { align + sort + dedupe + index + c a l l _ v a r i a n t s } [Sadedin et al., 2012] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 23. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Why don’t we use scientific workflow management systems already? “Contemporary workflow platforms fall short of adequately supporting rapid deployment into the user applications that consume them, and legacy application codes need to be integrated and managed.” — Goble and de Roure in [Anthony J. G. Hey et al., 2009] SciWMSs are often not available on the computing platforms that scientists use. SciWMS support for the workflow patterns that scientists use is sometimes poor Bioinformaticists use a large number of tools, and a SciWMS for bioinformatics must have rich tool support The software adoption process in science is not a straightforward one Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 24. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Barriers to SciWMS availability Some are commercial products (e.g. Pipeline Pilot) Can be hard to install (e.g. Kepler [Altintas et al., 2004] and Taverna [Hull et al., 2006]) Frameworks require long-running daemons that require sysadmin buy in to run and sysadmins and scientists live in different worlds Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 25. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References SciWMS support for scientific workflow patterns Current tool of choice for scientific workflow composition is a combination of manual effort and general purpose scripting languages As software frameworks SciWMSs implement particular workflow patterns These might not match workflow requirements: e.g. Galaxy lacks sub-workflow support or iteration across a collection Lack of adequate support for workflow patterns results in “hacks” that re-introduce needless complexity into the workflow specification Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 26. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Tool support in SciWMSs Scientific workflows make use of a large and evolving software collection (e.g. see [Eagle Genomics, 2013]) SciWMSs must support: 1 Rich set of tools “out of the box” 2 Straightforward procedures to add new tools Some SciWMSs (such as bpipe) closely resemble scripting languages and replicate the “command line” model of tool invocation On the other extreme, Taverna is service oriented and expects to tools to be made available as web services If adding a tool is complex, scientists typically won’t do it A partial solution is supporting a community collection of tool definitions (e.g. Galaxy toolshed) Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 27. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Software adoption in scientific communities “‘End-user developers’ commonly create scientific software, but they are often unaware of or ignore traditional software engineering standards, leaving trust in their coding expertise potentially misplaced.” — [Joppa et al., 2013] Scientific software adoption follows trends within the scientific community, not necessarily software engineering best practice. While training is necessary, it is not sufficient to ensure adoption of new tools: building communication channels between different communities is key. Evolving scientific workflow practice is a social problem not just a technical one. Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 28. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Bioinformatics analysis (revisited) Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 29. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Workflow re-use and reproducible results “all software is wrong, although some of it is useful, and all results are approximate” — C. Titus Brown [C. Titus Brown, 2013] Current scientific work process is focused on end products (papers, theses) that cannot be easily reproduced. In 2010 survey of 20 most highly cited journals found that only 3 required source code to be submitted along with papers. Even then: “Workflows are often difficult to author, using languages that are at an inappropriate level of abstraction and expecting too much knowledge of the underlying infrastructure. The reusability of a workflow is often confined to the project it was conceived in – or even to its author – and it is inherently only as strong as its components.” — Goble and de Roure [Anthony J. G. Hey et al., 2009] Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 30. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References SciWMS for reproducable research Results of scientific computing are not final “truth” but best-guess approximations (with associated uncertainty) SciWMSs that provide features that enhance the reproducibility of research aid in the collaborative critique and elaboration of results Critical features for reproducible research include: Provenance recording: recording data on the provenance of workflow products (but how and what should we record?) Abstract workflow components: liberate workflows from the lab and allow scientists to “run the components of the workflow wherever the data is.” [C. Titus Brown, 2011] Bring data closer to publications: publication should combine data, text and workflow e.g. Galaxy Pages, Vistrails’ LaTeX integration Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 31. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Workflow re-use If we can repeat a workflow, we can re-use it Workflows will evolve: workflows seldom are “one size fits all” If workflows evolve we need change management and versioning Textual workflow specifications can be managed by traditional source code control tools (e.g. git) In SciWMSs that use graphical specifications Vistrails provides an example of how a “version tree” can be represented and navigated But: Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 32. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Workflow re-use If we can repeat a workflow, we can re-use it Workflows will evolve: workflows seldom are “one size fits all” If workflows evolve we need change management and versioning Textual workflow specifications can be managed by traditional source code control tools (e.g. git) In SciWMSs that use graphical specifications Vistrails provides an example of how a “version tree” can be represented and navigated But: most SciWMSs currently have poor change management support Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 33. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References In conclusion In bioinformatics data is becoming plentiful and cheap this is changing the cost structure of research, since data analysis is still expensive Data analysis is large done using workflows involving human effort and scripting languages these workflows are fragile, not reproduceable and hard to understand and re-use SciWMSs provide frameworks for higher level specification of scientific workflow these frameworks are not widely used due to technical and community limitations despite these limitations SciWMSs offer an essential path towards reproduceable and re-useable scientific workflows Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 34. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References In conclusion In bioinformatics data is becoming plentiful and cheap this is changing the cost structure of research, since data analysis is still expensive Data analysis is large done using workflows involving human effort and scripting languages these workflows are fragile, not reproduceable and hard to understand and re-use SciWMSs provide frameworks for higher level specification of scientific workflow these frameworks are not widely used due to technical and community limitations despite these limitations SciWMSs offer an essential path towards reproduceable and re-useable scientific workflows Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 35. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References In conclusion In bioinformatics data is becoming plentiful and cheap this is changing the cost structure of research, since data analysis is still expensive Data analysis is large done using workflows involving human effort and scripting languages these workflows are fragile, not reproduceable and hard to understand and re-use SciWMSs provide frameworks for higher level specification of scientific workflow these frameworks are not widely used due to technical and community limitations despite these limitations SciWMSs offer an essential path towards reproduceable and re-useable scientific workflows Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 36. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Thanks Workflows for biological se- quence analysis are discussed by the “Pipelines collaboration” Research on SciWMS supported by the MRC and Prof Christoffels Professor Alan Christoffels Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 37. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Bibliography I Abizar Lakdawalla. File:Sequencing workflow.jpg - wikipedia, the free encyclopedia, Apr. 2007. URL http://en.wikipedia.org/wiki/File:Sequencing_workflow.jpg. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock. Kepler: an extensible system for design and execution of scientific workflows. In 16th International Conference on Scientific and Statistical Database Management, 2004. Proceedings, pages 423– 424. IEEE, June 2004. ISBN 0-7695-2146-0. doi: 10.1109/SSDM.2004.1311241. Anthony J. G. Hey, Stewart Tansley, and Kristen Michelle Tolle. The fourth paradigm: data-intensive scientific discovery. Microsoft Research, Oct. 2009. ISBN 9780982544204. URL http://research.microsoft.com/en-us/collaboration/fourthparadigm/. L. Bernstein. Software investment strategy. Bell Labs Technical Journal, 2(3):233, 1997. ISSN 10897089. C. Titus Brown. Data intensive science, and workflows, Dec. 2011. URL http://ivory.idyll.org/blog/data-intensive-science-and-workflows.html. C. Titus Brown. Thoughts on the assemblathon 2 paper, 2013. URL http://ivory.idyll.org/blog/thoughts-on-assemblathon-2.html. Eagle Genomics. The elements of bioinformatics, 2013. URL http://elements.eaglegenomics.com/. E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design patterns: Abstraction and reuse of object-oriented design. In O. M. Nierstrasz, editor, ECOOP’ 93 — Object-Oriented Programming, number 707 in Lecture Notes in Computer Science, pages 406–431. Springer Berlin Heidelberg, Jan. 1993. ISBN 978-3-540-57120-9, 978-3-540-47910-9. J. Goecks, A. Nekrutenko, J. Taylor, and T. G. Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol, 11(8), 2010. Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research
  • 38. Introducing. . . Bioinformatics workflows How we could be doing things (and why we don’t) Scientific workflows for reproduceable research In conclusion References Bibliography II D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. Pocock, P. Li, and T. Oinn. Taverna: a tool for building and running workflows of services. Nucleic acids research, 34(suppl 2):W729, 2006. ISSN 0305-1048. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538887/. L. N. Joppa, G. McInerny, R. Harper, L. Salido, K. Takeda, K. O’Hara, D. Gavaghan, and S. Emmott. Troubling trends in scientific software use. Science, 340(6134):814–815, May 2013. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1231535. URL http://www.sciencemag.org/content/340/6134/814. PMID: 23687031. I. Karsch-Mizrachi, Y. Nakamura, and G. Cochrane. The international nucleotide sequence database collaboration. Nucleic Acids Research, 40(D1):D33–D37, Jan. 2012. ISSN 0305-1048. doi: 10.1093/nar/gkr1006. URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3244996/. PMID: 22080546 PMCID: PMC3244996. S. P. Sadedin, B. Pope, and A. Oshlack. Bpipe: A tool for running and managing bioinformatics pipelines. Bioinformatics, Apr. 2012. ISSN 1367-4803, 1460-2059. doi: 10.1093/bioinformatics/bts167. URL http://bioinformatics. oxfordjournals.org/content/early/2012/04/11/bioinformatics.bts167. Taverna project. Why use workflows?, 2009. URL http://www.taverna.org.uk/introduction/why-use-workflows/. K. Wetterstrand. DNA sequencing costs: Data from the NHGRI genome sequencing program (GSP), 2013. URL http://www.genome.gov/sequencingcosts/. Peter van Heusden and Alan Christoffels Scientific Workflow Systems for accessible, reproducible research