Beyond the Science Gateway
Connecting Cyberinfrastructure Back To The Laptop
Science Gateways, Ann Arbor MI, October 2017
About Me
• Today: Computational Scientist at Anaconda
• Platform for Python-centric data science
• Yesterday: Postdoc and Lecturer at Harvard
• Built a science gateway for protein structure analysis
• Leveraged OSG and XSEDE
• Last Week: PhD on CERN LHCb experiment
• 2007: “A REST Model for High Throughput Scheduling in Computational Grids”
• Last Millennium: Electrical Engineer, University of Waterloo
http://about.me/ijstokes @ijstokes
Abstract
Science Gateways today are generally built to provide a web-accessible interface for a
particular scientific community to access a combination of software, hardware, and data
deployed in an expertly managed computing center. But what happens when the
scientist wants to repatriate their data? Or perform some analysis that is not supported
by the gateway? Both for the purposes of encouraging innovative workflows and
serving an audience with a wide range of computational experience it is important to
consider how a gateway can fit into the broader computational ecosystem of a
particular researcher or research group. One simple starting point for this is to ask the
question "how can the gateway connect back to the laptop?". This talk will consider
how this is being done today in science gateways and present some ideas for how this
could be expanded in the future.
Slides at http://bit.ly/gateways17-beyond
Tetralogy
• First Book: The Story of Science Gateways (a play in three acts)
• Second Book: Going Beyond Gateways Today
• Third Book: Opportunities For Future Success
• Epilogue: Anaconda For Reproducible Science
The Story of Science Gateways
(a play in three acts)
The Cast
Beth, a biochemist
✓Experiment design
✓Microarray equipment
✓Wet lab skills
✗Database expert
✗Computer programming
✗Linux administration
Sakina, a software engineer
✓Python
✓Web development
✓Data wrangling
✗Biology
✗Liquid chromatography
Dipesh, a devops engineer
✓Clusters & containers
✓Security
✓Storage systems
✗Application development
✗Genetics & proteomics
Gateways: Act I
Internet
Files
Database
Cluster
Software
Workflow
Microarray Analysis Gateway
• Beth focused on her science
• Worked in the wet lab
• Collected data
• Submitted it to the Microarray Analysis Gateway
• Got back results
• Iterated
• Published paper
Success!
Data: Act II
Microarray Analysis Gateway
• Beth is now making heavy use of the Gateway
• She realizes there are some opportunities for cross-experiment data analysis
• The Gateway doesn’t support this
• She’s got funding for a postdoc who wants to investigate further
Paul, a postdoc
Paul’s laptop
Data Movement
• Paul needed access to Beth’s data
• Input and output data
• Some way to access raw output data, not just web-based graphics
• This is an exceptional request for the Gateway team
• Paul and Dipesh coordinate to repatriate the data
Software: Act III
• Beth has an applied mathematician colleague Claire who would like to try out a new GPU-based numerical analysis algorithm she has developed
• Claire needs access to parts of the Gateway’s workflow software
• Claire and Sakina coordinate to share the software details and the workflow
Reproducible Software Stack
• Workflow is coupled to Gateway framework
• Claire works from a Mac, whereas the Gateway runs on a Linux cluster
• Installation of software and dependencies is laborious
• Claire is experienced in writing high-performance numerical algorithms, but not with RESTful APIs, web servers, or workflow managers
• Beth and Claire want to move the collaborative research forward but are feeling daunted
Going Beyond Gateways Today
Opportunities For Future Success
Cloud Data Services
Anaconda For Reproducible Science
Giving superpowers to the people who change the world
teams
Conda Sandboxing Technology
• Language independent
• Platform independent
• No special privileges required
• No VMs or containers
• Enables:
• Reproducibility
• Collaboration
• Scaling
“conda – package everything”
conda environments:
• base: Python v2.7, NumPy v1.10, Pandas v0.16
• r: R-base v3.4, ggplot, caret
• dev: Python v3.6, Pandas v0.20, Jupyter, NumPy v1.12
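The environment layout on this slide can be written down as data. What follows is a minimal sketch, not the tooling shown in the talk: it assumes the three environments and version pins from the diagram and renders each one as the text of a conda `environment.yml` file. The helper name `render_environment_yml`, the `defaults` channel, and the mapping of slide labels to conda package names (`r-base`, `r-ggplot2`, `r-caret`) are illustrative assumptions.

```python
# Environments as drawn on the slide. Keys are assumed conda package names;
# a value of None means the slide gave no version for that package.
ENVIRONMENTS = {
    "base": {"python": "2.7", "numpy": "1.10", "pandas": "0.16"},
    "r": {"r-base": "3.4", "r-ggplot2": None, "r-caret": None},
    "dev": {"python": "3.6", "pandas": "0.20", "jupyter": None, "numpy": "1.12"},
}


def render_environment_yml(name, packages, channels=("defaults",)):
    """Return the text of a conda environment.yml for one environment.

    Hypothetical helper for illustration; the channel list is an assumption.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    for package, version in packages.items():
        # Pin the version only when the slide states one.
        spec = package if version is None else f"{package}={version}"
        lines.append(f"  - {spec}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml("dev", ENVIRONMENTS["dev"]))
```

A file rendered this way could be passed to `conda env create -f environment.yml` on a Mac, Windows, or Linux machine, which is the kind of laptop-to-cluster portability the slide points at. Note that `base` is also the name of conda's default environment, so in practice only `r` and `dev` would typically be created from such files.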
Internal
Anaconda
Repository
Publish
Fetch
DevOps Engineer
(Linux)
Production
analytics cluster
Business
Analyst (Win)
Data Lab: shared
analytics cluster
Package
Control
Authentication
Anaconda
Enterprise
Server
Computation & Data Access
Web Interface
Active Directory/ LDAP
Optional
Data
Warehouse
Anaconda Enterprise
Data Scientist
(Mac)
© 2016 Continuum Analytics - Confidential & Proprietary
Anaconda
Project 1 Project 2 Project 3
Project 1 Project 2 Project 3
Data Science Development Data Science Reproducibility, Development
and Deployment
Anaconda Enterprise
Container 1
Container 2
Container 3 Container 4
Anaconda Enterprise and Containers
Thank You! Questions?
• Ian Stokes-Rees, ijstokes@anaconda.com
• @ijstokes
• http://anaconda.com
• Slides at http://bit.ly/gateways17-beyond
