Research results in peer-reviewed publications are reproducible, right? If only it were so clear-cut. With high-profile paper retractions and pushes for better data sharing by funders, publishers and the community, the spotlight is now on how research is conducted around the world.
This talk from the Software Sustainability Institute's Collaborations Workshop 2014 describes how cloud computing, with Microsoft Azure, is helping researchers realize the goals of scientific reproducibility.
Find out more at www.azure4research.com
4. The Research Lifecycle
Data acquisition & modelling
Collaboration and visualisation
Analysis & data mining
Dissemination & sharing
Archiving and preserving
fourthparadigm.org
5. Believe it or not: how much can we rely on published data on potential drug targets?
“at least 50% of published studies, even those in top-tier academic journals, can’t be repeated with the same conclusions by an industrial lab”
Osherovich, L. Hedging against academic risk. SciBX 14 Apr 2011 (doi:10.1038/scibx.2011.416).
13. • Computational experiments should be recomputable for all time
• Recomputation of recomputable experiments should be very easy
• It should be easier to make experiments recomputable than not to
• Tools and repositories can help recomputation become standard
• The only way to ensure recomputability is to provide virtual machines
• Runtime performance is a secondary issue
Ian Gent, Alexander Konovalov and Lars Kotthoff
Steven Crouch, Devasena Inupakutika
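The slide above argues that only a virtual machine fully guarantees recomputability. As a much weaker, complementary illustration, the sketch below records the interpreter, the operating system and a checksum of each input file alongside a result, so that a later rerun can at least detect that something has changed. The function names `sha256_of` and `build_manifest` are hypothetical, chosen for this example; they are not from the talk or from any tool it mentions.

```python
import hashlib
import platform
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_paths):
    """Collect interpreter, OS and input-file checksums in one dict.

    This is an assumption-laden sketch: it does not capture installed
    libraries, system packages or hardware, which a VM image would.
    """
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "inputs": {path: sha256_of(path) for path in input_paths},
    }
```

Storing the returned dictionary (for example as JSON) next to the published results gives readers a fingerprint to compare against when they try to recompute the experiment.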
17. khmer-protocols:
• Effort to provide standard “cheap” assembly protocols for cloud machines.
• Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis. Est. ~$150 per data set
• Open, versioned, forkable, citable.
Open Science
C. Titus Brown, @ctitusbrown
http://ged.cse.msu.edu/
http://ivory.idyll.org/
18. Explicitly a “protocol” – explicit steps, copy-paste, customizable, versioned; not black box.
No requirement for computational expertise or significant computational hardware.
~1-5 days to teach a bench biologist to use.
$100-150 of rental compute (“cloud computing”)…
…for $1000 data set.
Now adding in quality control and internal validation steps.
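To put the figures above in proportion, the following minimal sketch computes rental compute as a fraction of the cost of generating the data set. It uses only the numbers quoted on the slide ($100-150 of compute, $1000 data set); the function name `compute_overhead` is hypothetical.

```python
def compute_overhead(compute_cost, data_cost):
    """Return rental-compute cost as a fraction of the data-generation cost."""
    if data_cost <= 0:
        raise ValueError("data_cost must be positive")
    return compute_cost / data_cost


# Figures quoted on the slide: $100-150 of compute for a $1000 data set.
low = compute_overhead(100, 1000)
high = compute_overhead(150, 1000)
print(f"Compute adds {low:.0%} to {high:.0%} on top of the data cost")
```

On these figures, analysis in the cloud costs roughly a tenth to a seventh of what it cost to produce the data.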
Some thoughts…
Reproducible computing environment (Azure)
Publicly available data (MMETSP)
Open and versioned protocol
Provenance tracking and registration (Synapse?)
19. Distribution Modeller <compute + data>
Middle ground between:
Exploratory science
Procedural science
Black box that can be cracked open and modified
21. • Reproducing my own results
• Replicating other people’s results
• Reproducing other people’s results
Repeatability, Replicability, Reproducibility, Reuse
“reviewers have no time and no resources to reproduce data and to dig deeply into the presented work.”
Life Sci VC: Academic bias & biotech failures: http://lifescivc.com/2011/03/academic-bias-biotech-failures/
Photo: leechantmcarthur, CC-BY
22. Windows Azure for Research
• Azure Research Awards
• Windows Azure for Research Training Courses
– Manchester, 3-4 April ’14
• Webinars
• Technical resources & curriculum
• Research community engagements
www.azure4research.com