Reproducible
Research and
the Cloud
Dr Kenji Takeda (Kenji.Takeda@Microsoft.com)
Microsoft Research
@azure4research
Scientific Discovery
Credit: ROYAL INSTITUTION OF GREAT BRITAIN / SCIENCE PHOTO LIBRARY
$$\rho \frac{D\mathbf{v}}{Dt} = -\nabla p + \nabla \cdot \boldsymbol{T} + \mathbf{f}$$
The Research Lifecycle
Data
Acquisition &
modelling
Collaboration
and
visualisation
Analysis &
data mining
Dissemination
& sharing
Archiving and
preserving
fourthparadigm.org
Believe it or not: how much can we rely on
published data on potential drug targets?
“at least 50% of published studies, even those in top-tier academic journals,
can’t be repeated with the same conclusions by an industrial lab”
Osherovich, L. Hedging against academic risk. SciBX 14 Apr 2011 (doi:10.1038/scibx.2011.416).
CLOUD COMPUTING
Global
presence
Datacenter
Edge point
The Microsoft Cloud
Cloud Computing
Choose from multiple runtimes and languages for your
applications: Python, Java, PHP, .NET, Node.js
Run Linux on Windows Azure Virtual Machines (VHD)
Support multiple frameworks and popular open source
applications with Windows Azure Web Sites
HDInsight Hadoop for Big Data analysis
Windows Azure
http://github.com/windowsazure
REPRODUCIBLE RESEARCH
http://www.phdcomics.com/comics.php?f=1689
• Computational experiments should be
recomputable for all time
• Recomputation of recomputable experiments
should be very easy
• It should be easier to make experiments
recomputable than not to
• Tools and repositories can help recomputation
become standard
• The only way to ensure recomputability is to
provide virtual machines
• Runtime performance is a secondary issue
Ian Gent, Alexander Konovalov and Lars Kotthoff
Steven Crouch, Devasena Inupakutika
Recomputation.org
Zanadu.IO
Patrick Henaff and Claude Martini
Zanadu.IO
khmer-protocols:
• Effort to provide standard
“cheap” assembly
protocols for cloud
machines.
• Entirely copy/paste; ~2-6
days from raw reads to
assembly, annotations,
and differential
expression analysis. Estimated cost:
~$150 per data set.
• Open, versioned,
forkable, citable.
Open Science
C. Titus Brown, @ctitusbrown
http://ged.cse.msu.edu/
http://ivory.idyll.org/
Explicitly a “protocol” – explicit
steps, copy-paste, customizable,
versioned; not black box.
No requirement for computational
expertise or significant
computational hardware.
~1-5 days to teach a bench
biologist to use.
$100-150 of rental compute
(“cloud computing”)…
…for $1000 data set.
Now adding in quality control and
internal validation steps.
Some thoughts…
Reproducible
computing
environment
(Azure)
Publicly
available
data
(MMETSP)
Open and
versioned
protocol
Provenance
tracking and
registration
(Synapse?)
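
The provenance item above could start from something as small as the sketch below. It is only an illustration under stated assumptions: the function names, the record fields, and the idea of passing the protocol version as a plain string (for example a git tag) are all hypothetical, and it does not use Synapse or any Azure API. It records a checksum of the input data, the protocol version, and basic facts about the computing environment, so that a later run can be compared against it.

```python
import hashlib
import platform
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path, protocol_version):
    """Build a small dictionary describing one analysis run.

    All field names are illustrative assumptions, not a standard schema.
    """
    return {
        "data_file": Path(data_path).name,
        "data_sha256": sha256_of_file(data_path),
        "protocol_version": protocol_version,  # hypothetical label, e.g. a git tag
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }
```

Serialising the record with `json.dumps(record, indent=2, sort_keys=True)` and storing it next to the results gives a file that can be versioned alongside the protocol. A checksum only shows that the data is unchanged; it says nothing about whether the data is still publicly available.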
Distribution Modeller
<compute + data>
Middle ground
between:
Exploratory science
Procedural science
Black box that can be
cracked open and
modified
Interactive with auto-provenance
• Reproducing my
own results
• Replicating other
people’s results
• Reproducing other
people’s results
Repeatability, Replicability,
Reproducibility, Reuse
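
As a rough illustration of the narrowest of these terms, the sketch below checks repeatability only: the same code, on the same machine, with the same seed, should give a bit-identical result. The "experiment" is a toy stand-in and every name in it is hypothetical. Replicating or reproducing other people's results involves different people, environments, and often different code, which this sketch does not attempt to cover.

```python
import hashlib
import random


def run_experiment(seed, n=1000):
    """Toy stand-in for a computational experiment: mean of n pseudo-random draws."""
    rng = random.Random(seed)  # private generator, so no global state is involved
    return sum(rng.random() for _ in range(n)) / n


def result_digest(value):
    """Digest of the exact floating-point result, via its repr."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def is_repeatable(seed, runs=3):
    """True if every run with the same seed yields a bit-identical result."""
    digests = {result_digest(run_experiment(seed)) for _ in range(runs)}
    return len(digests) == 1
```

Publishing the digest together with the seed gives others a concrete value to compare against when they rerun the computation in their own environment.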
“reviewers have no time and no resources to reproduce
data and to dig deeply into the presented work.”
Life Sci VC: Academic bias & biotech failures:
http://lifescivc.com/2011/03/academic-bias-biotech-failures/
Photo: leechantmcarthur, CC-BY
Windows Azure for Research
• Azure Research Awards
• Windows Azure for Research
Training Courses
– Manchester, 3-4 April’14
• Webinars
• Technical resources &
curriculum
• Research community
engagements
www.azure4research.com
THANK YOU
Kenji.Takeda@Microsoft.com
www.azure4research.com
Windows Azure for Research Group
@azure4research