Open Forum: Open Science
Debbie Bard
Using containers and supercomputers to solve the mysteries of the Universe
Shifter:
containers for
HPC
What’s a
supercomputer?
Containers for
supercomputing
Agenda
Awesome
science
The nature of the
Universe
Developing new
technologies
Containerizing
open science
Reproducible
science
Shifter
Containerizing
Supercomputers
Supercomputing for Open Science
• Most widely used computing center in DoE Office of
Science
• 6000+ users, 750+ codes, 2000+ papers/year
• Biology, Energy, Environment
• Computing
• Materials, Chemistry, Geophysics
• Particle Physics, Cosmology
• Nuclear Physics
• Fusion Energy, Plasma Physics
NERSC Cori cabinet
NERSC Mendel Cluster cabinet
It’s all about the connections
What’s a supercomputer?
• Edison Cray XC30
• 2.5PF
• 357TB RAM
• ~5000 nodes, ~130k cores
• Cori Cray XC40
• Data-intensive (32-core
Haswells, 128GB) partition
• Compute-intensive (68-core
KNLs, 90GB) partition
• ~10k nodes, ~700k cores
>10PB project file system (GPFS)
>38PB scratch file system (Lustre)
>1.5PB Burst Buffer (flash)
Supercomputing file systems
• Scale out FS – 100s of OSSs.
• Access FS over high-speed
interconnect
• High aggregate BW, but works best for
large IO/transfer sizes
• Global, coherent namespace
• Easy for scientists to use
• Hard to scale up metadata
Not your grandmother’s FS
Compute Nodes IO Nodes Storage Servers
How do you distribute PBs of files and data to hundreds of thousands of compute
cores, with no latency?
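The point that a parallel file system "works best for large IO/transfer sizes" can be illustrated with a toy model in which every transfer pays a fixed per-operation cost before it streams at peak rate. This is only a sketch: the latency and bandwidth figures below are assumed for illustration and are not measurements of any NERSC file system.

```python
def effective_bandwidth(transfer_bytes: float,
                        per_op_latency_s: float,
                        peak_bandwidth_bytes_per_s: float) -> float:
    """Bytes per second achieved by one transfer in a latency + streaming model."""
    transfer_time_s = per_op_latency_s + transfer_bytes / peak_bandwidth_bytes_per_s
    return transfer_bytes / transfer_time_s


# Assumed, illustrative numbers (not measured values).
LATENCY_S = 1e-3   # fixed cost per I/O operation
PEAK_BW = 1e9      # peak streaming rate seen by one client, in bytes/s

small = effective_bandwidth(4 * 1024, LATENCY_S, PEAK_BW)    # one 4 KiB transfer
large = effective_bandwidth(1024 ** 3, LATENCY_S, PEAK_BW)   # one 1 GiB transfer

print(f"4 KiB transfer: {small / PEAK_BW:.2%} of peak")
print(f"1 GiB transfer: {large / PEAK_BW:.2%} of peak")
```

Under these assumptions the small transfer is dominated by the fixed cost, while the large one runs close to peak rate.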
• Cori: >1000 jobs running
simultaneously on (1600*32) cores
• Everything from 1000+ node jobs
to single-core jobs
• Time-insensitive simulations
• Real-time experimental data
analysis
• Complex scheduling problem!
Who uses a supercomputer?
Job size on Cori (# cores)
The traditional idea of supercomputer usage is a gigantic, whole-machine simulation
that runs for days/weeks and produces a huge dataset, or a single number– for
example, a 20,000-year climate simulation or a calculation of the structure of an atom.
The reality is much more diverse/unruly.
Supercomputing issues
Screamingly fast interconnect, no local disk,
and custom compute environment designed
to accelerate parallel apps – but not
everything can adapt easily to this
environment.
• Portability
• Custom Cray SUSE Linux-based
environment – hard to use standard
Linux-based code/libs
• Scientists often run at multiple sites –
wherever they can get the cycles
LHC Grid Computing
Our users want to run complex software stacks on multiple platforms
Supercomputing issues
Screamingly fast interconnect, no local disk,
and custom compute environment designed
to accelerate parallel apps – but not
everything can adapt easily to this
environment.
• Portability
• Scalability
• Slow start-up time for shared libs (e.g.
Python code)
• Distributed FS doesn’t deal well with
lots of small files
Our users want to run complex software stacks on multiple platforms
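To get a feel for why Python start-up stresses a distributed file system, one can count how many modules a single import pulls in. The sketch below is an illustration, not a NERSC tool: the helper name `count_new_modules` is made up here, and it counts modules rather than file-system operations (each module usually costs several path lookups on top).

```python
import subprocess
import sys

# Code run in a fresh interpreter so that earlier imports do not hide anything.
_CHILD = (
    "import importlib, sys\n"
    "before = set(sys.modules)\n"
    "importlib.import_module(sys.argv[1])\n"
    "print(len(set(sys.modules) - before))\n"
)


def count_new_modules(module_name: str) -> int:
    """Return how many modules get loaded when `module_name` is imported."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD, module_name],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


print("modules loaded by 'import json':", count_new_modules("json"))
```

Multiply that count by the number of processes in a job and the load on the shared metadata servers becomes clear.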
Containers for
HPC!
Why not simply use Docker?
• Underlying custom OS
• Highly-optimized interconnect
• Security issues: if you can start a Docker container,
you can start it as root – map in other volumes with
root access!
Shifter enables the collaborative nature of Docker for
science and large-scale systems
Enables Docker functionality and direct compatibility,
customized for the needs of HPC systems
Shifter directly imports Docker images
Containers on supercomputers
Why not simply use Docker?
• Underlying custom OS
• Highly-optimized interconnect
• Security issues: if you can start a Docker container,
you can start it as root – map in other volumes with
root access!
Shifter uses loop mount of image file – moves metadata
operations (like file lookup) to the compute node, rather than
relying on central metadata servers of parallel file system.
Gives much faster shared library performance…
High performance at huge scale
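The effect of the loop mount can be sketched with a toy count of requests that reach the central metadata servers. This is a deliberately simplified model with assumed numbers: it treats every file lookup as one central request when the software lives directly on the parallel file system, and a single request per node (opening the image file) when the image is loop-mounted.

```python
def central_metadata_requests(nodes: int, files_per_node: int, use_image: bool) -> int:
    """Requests reaching the central metadata servers in a toy model."""
    if use_image:
        # Only the image file itself is looked up centrally;
        # lookups inside the image are served on the compute node.
        return nodes
    # Every file lookup on every node goes to the central servers.
    return nodes * files_per_node


# Assumed job shape, for illustration only.
NODES = 1000
FILES = 5000  # e.g. shared libraries and Python modules touched at start-up

print("direct on parallel FS:", central_metadata_requests(NODES, FILES, use_image=False))
print("loop-mounted image:   ", central_metadata_requests(NODES, FILES, use_image=True))
```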
Awesome
Science
Containerizing the Universe
Dark Energy Survey
What is the Universe
made of?
How and why is it
expanding?
Astronomy Data Analysis
Astronomy Data Processing
Light from some of these galaxies was emitted 13 billion years ago
Dark Energy Survey
Astronomy Data Analysis
Measuring the expansion history of the
universe to understand the nature of Dark
Energy.
Data analysis code: identify objects
(stars, galaxies, quasars, asteroids, etc.) in
images, calibrate, measure their
properties.
• Why Containers?
• Complicated software stack – runs
on laptops to supercomputers
• Python-based code; lots of imports
LHC ATLAS computing stack
What is the Universe made of?
Why does anything have mass?
A billion proton-proton collisions per second
and multi-GB of data per second.
CVMFS: >3.5TB, >50M inodes
Spectacularly complex software
stack required to analyse data
from particle collisions
• Why Containers?
• Un-tar stack on compute
node is not efficient,
doesn’t scale (~30min/job)
• Dedupe files, squashfs
image: 315GB
• Scales up to thousands of
nodes
LHC ATLAS computing stack
# Cores    Average start-up time
24         32s
240        11s
2400       15s
24000      24s
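The start-up figures can be put side by side with a few lines of arithmetic. The sketch assumes the table reports start-up times for the squashfs image and that the "~30min/job" figure quoted for the un-tar approach is the baseline; the slides do not state either explicitly.

```python
# Cores -> average start-up time in seconds, as listed on the slide.
startup_s = {24: 32, 240: 11, 2400: 15, 24000: 24}

# Assumed baseline: the ~30 min per job quoted for un-tarring the stack.
untar_s = 30 * 60

core_growth = max(startup_s) / min(startup_s)
startup_spread = max(startup_s.values()) / min(startup_s.values())
untar_vs_slowest = untar_s / max(startup_s.values())

print(f"core count grows by a factor of {core_growth:.0f}")
print(f"start-up time varies by a factor of {startup_spread:.1f}")
print(f"un-tar baseline is {untar_vs_slowest:.0f}x the slowest start-up")
```

Across a thousandfold increase in cores, the start-up time stays within the same few tens of seconds.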
How does photosynthesis
happen?
How do drugs dock with proteins
in our cells?
Why do jet engines fail?
LCLS
Linac Coherent Light Source
Super-intense femtosecond x-ray pulses
The Superfacility Concept
Scientists using the LCLS at SLAC need
real-time feedback on their running
experiments – take advantage of NERSC
supercomputers
• Why Containers?
• Complex python-based analysis
environment LCLS-driven
• Workflow: Data and analysis code
coming in from outside NERSC –
security concern
LCLS
Containerizing
Open Science
Post-experiment data analysis
Everyone agrees this is essential (federally mandated!), but
no one knows how to do it properly/coherently
• Algorithms: need to run scripts that produced the
results
• Environment: need to replicate the OS, software
libraries, compiler version
• Data: large volumes, databases, calibration data,
metadata…
Scientific Reproducibility
https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
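The "Environment" item (OS, software libraries, compiler version) can be recorded alongside results with a small manifest. This is a minimal sketch using only the Python standard library; the function name and the choice of fields are assumptions made for illustration, not part of any reproducibility standard.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages):
    """Collect OS, interpreter, compiler and package versions into a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "compiler": platform.python_compiler(),
        "packages": versions,
    }


# The package names are examples; list whatever the analysis imports.
manifest = environment_manifest(["numpy", "scipy"])
print(json.dumps(manifest, indent=2))
```

A container image captures the same information implicitly; a manifest like this makes it readable without starting the container.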
Containers forever
Ince, Hatton & Graham-Cumming, Nature 482, 485 (2012)
Scientific communication relies on evidence that cannot be entirely included in
publications, but the rise of computational science has added a new layer of
inaccessibility. Although it is now accepted that data should be made available
on request, the current regulations regarding the availability of software are
inconsistent. We argue that, with some exceptions, anything less than the
release of source programs is intolerable for results that depend on
computation. The vagaries of hardware, software and natural language will
always ensure that exact reproducibility remains uncertain, but withholding
code increases the chances that efforts to reproduce results will fail.
Containers offer the possibility of
encapsulating analysis code and
compute environment to ensure
reproducibility of algorithms and
environment.
• Enable reproduction of results on
any compute system
Containers forever?
In case you can’t think of anything to talk about
• Make this publishable: DOIs for Docker Hub images, as for GitHub repos.
• Link GitHub/Docker repos?
• How to link data to containers?
• How to maintain containers over the long term?
• Long-term data access efforts in many areas of science – thinking 20
years ahead. Are containers viable in this timeframe?
Discussion Points
Backup Slides
Shifter!=Docker
• User runs as the user in the container – not root
• Image modified at container construction time:
• Modifies /etc, /var, /opt
• replaces /etc/passwd, /etc/group and other files for site/security
needs
• adds /var/hostsfile to identify other nodes in the calculation
(like $PBS_NODEFILE)
• Injects some support software in /opt/udiImage
• Adds mount points for parallel filesystems
• Your homedir can stay the same inside and outside of the
container
• Site configurable
• Image readonly on the Computational Platform
• to modify your image, push an update using Docker
• Shifter only uses mount namespaces, not network or process namespaces
• Allows your application to leverage the HSN and more easily integrate
with the system
• Shifter does not use cgroups directly
• Allows the site workload manager (e.g., SLURM, Torque) to manage
resources
• Shifter uses individual compressed filesystem files to
store images, not the Docker graph
• Uses more diskspace, but delivers high
performance at scale
• Shifter integrates with your Workload Manager
• Can instantiate container on thousands of
nodes
• Run parallel MPI jobs
• Specialized sshd runs within the container on exclusive
nodes, for non-native-MPI parallel jobs
• PBS_NODEFILE equivalent provided within
container (/var/hostsfile)
• Similar to Cray CCM functionality
• Acts in place of CCM if shifter “image” is
pointed to /dsl VFS tree
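A job script running inside the container could read /var/hostsfile to find the other nodes in the calculation. The sketch below assumes the file follows the $PBS_NODEFILE convention of one hostname per line, repeated once per slot; the slides only say it is "like" that file, so the real layout should be checked on the system. The host names are made up.

```python
from collections import Counter


def parse_hostsfile(text: str) -> dict:
    """Map each hostname to its number of slots, keeping first-seen order."""
    hosts = [line.strip() for line in text.splitlines() if line.strip()]
    return dict(Counter(hosts))


# Hypothetical contents; inside a container one would read /var/hostsfile instead.
example = "nid00010\nnid00010\nnid00011\n"
slots = parse_hostsfile(example)
print(slots)
```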
Shifter~=Docker
• Sets up user-defined image under user control
• Allows volume remapping
• mount /a/b/c on /b/a/c in container
• Containers can be “run”
• Environment variables, working directory, entrypoint scripts can be defined and run
• Can instantiate multiple containers on same node
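The volume remapping above can be expressed as a command line. The sketch below only builds the argument list and does not run anything. The `--image` and `--volume` option spellings are as I recall them from Shifter's documentation and should be checked against the installed version; the image name and script are hypothetical.

```python
from typing import List, Tuple


def build_shifter_argv(image: str,
                       volumes: List[Tuple[str, str]],
                       command: List[str]) -> List[str]:
    """Assemble a shifter invocation with optional volume remappings."""
    argv = ["shifter", f"--image={image}"]
    for host_path, container_path in volumes:
        # Mount host_path at container_path inside the container.
        argv.append(f"--volume={host_path}:{container_path}")
    return argv + command


argv = build_shifter_argv(
    "docker:example/analysis:latest",   # hypothetical image name
    [("/a/b/c", "/b/a/c")],             # the remapping from the slide
    ["python", "analyze.py"],           # hypothetical command
)
print(" ".join(argv))
```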
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard

Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard

  • 1. Open Forum: Open Science. Debbie Bard. Using containers and supercomputers to solve the mysteries of the Universe
  • 2. Agenda • Shifter: containers for HPC • What’s a supercomputer? • Containers for supercomputing • Awesome science • The nature of the Universe • Developing new technologies • Containerizing open science • Reproducible science
  • 4. Supercomputing for Open Science • Most widely used computing center in DoE Office of Science • 6000+ users, 750+ codes, 2000+ papers/year • Biology, Energy, Environment • Computing • Materials, Chemistry, Geophysics • Particle Physics, Cosmology • Nuclear Physics • Fusion Energy, Plasma Physics
  • 5. What’s a supercomputer? It’s all about the connections. [Images: NERSC Cori cabinet; NERSC Mendel Cluster cabinet]
  • 6. • Edison Cray XC30 • 2.5PF • 357TB RAM • ~5000 nodes, ~130k cores • Cori Cray XC40 • Data-intensive (32-core Haswells, 128GB) partition • Compute-intensive (68-core KNLs, 90GB) partition • ~10k nodes, ~700k cores
  • 7. • Edison Cray XC30 • 2.5PF • 357TB RAM • ~5000 nodes, ~130k cores • Cori Cray XC40 • Data-intensive (32-core Haswells, 128GB) partition • Compute-intensive (68-core KNLs, 90GB) partition • ~10k nodes, ~700k cores >10PB project file system (GPFS) >38PB scratch file system (Lustre) >1.5PB Burst Buffer (flash)
  • 8. Supercomputing file systems: not your grandmother’s FS • Scale out FS – 100s of OSSs • Access FS over high-speed interconnect • High aggregate BW, but works best for large IO/transfer sizes • Global, coherent namespace • Easy for scientists to use • Hard to scale up metadata [Diagram: Compute Nodes, IO Nodes, Storage Servers] How do you distribute PBs of files and data to hundreds of thousands of compute cores, with no latency?
  • 9. Who uses a supercomputer? • Cori: >1000 jobs running simultaneously on (1600*32) cores • Everything from 1000+ node jobs to single-core jobs • Time-insensitive simulations • Real-time experimental data analysis • Complex scheduling problem! [Figure: Job size on Cori (# cores)] The traditional idea of supercomputer usage is a gigantic, whole-machine simulation that runs for days or weeks and produces a huge dataset or a single number – for example, a 20,000-year climate simulation or a calculation of the structure of an atom. The reality is much more diverse and unruly.
  • 10. Supercomputing issues Screamingly fast interconnect, no local disk, and custom compute environment designed to accelerate parallel apps – but not everything can adapt easily to this environment. • Portability • Custom Cray SUSE Linux-based environment – hard to use standard Linux-based code/libs • Scientists often run at multiple sites – wherever they can get the cycles LHC Grid Computing Our users want to run complex software stacks on multiple platforms
  • 11. Supercomputing issues Screamingly fast interconnect, no local disk, and custom compute environment designed to accelerate parallel apps – but not everything can adapt easily to this environment. • Portability • Scalability • Slow start-up time for shared libs (e.g. python code) • Distributed FS doesn’t deal well with lots of small files Our users want to run complex software stacks on multiple platforms
  • 12. Supercomputing issues Screamingly fast interconnect and custom compute environment designed to accelerate parallel apps – but not everything can adapt easily to this environment. • Portability • Scalability • Slow start-up time for shared libs (e.g. python code) • Distributed FS doesn’t deal well with lots of small files Our users want to run complex software stacks on multiple platforms Containers for HPC!
  • 13. Why not simply use Docker? • Underlying custom OS • Highly-optimized interconnect • Security issues: if you can start a Docker container, you can start it as root – map in other volumes with root access! Shifter enables the collaborative nature of Docker for science and large-scale systems Enable Docker functionality and direct compatibility, but customizing for the needs of HPC systems Shifter directly imports Docker images Containers on supercomputers
  • 14. Why not simply use Docker? • Underlying custom OS • Highly-optimized interconnect • Security issues: if you can start a Docker container, you can start it as root – map in other volumes with root access! Shifter uses loop mount of image file – moves metadata operations (like file lookup) to the compute node, rather than relying on central metadata servers of parallel file system. Gives much faster shared library performance… High performance at huge scale Containers on supercomputers
  • 15. Why not simply use Docker? • Underlying custom OS • Highly-optimized interconnect • Security issues: if you can start a Docker container, you can start it as root – map in other volumes with root access! Shifter uses loop mount of image file – moves metadata operations (like file lookup) to the compute node, rather than relying on central metadata servers of parallel file system. Gives much faster shared library performance… High performance at huge scale
  • 17. Dark Energy Survey What is the Universe made of? How and why is it expanding? Astronomy Data Analysis
  • 18. Dark Energy Survey What is the Universe made of? How and why is it expanding? Astronomy Data Processing. Light from some of these galaxies was emitted 13 billion years ago
  • 19. Dark Energy Survey Astronomy Data Analysis Measuring the expansion history of the universe to understand the nature of Dark Energy. Data analysis code: identify objects (stars, galaxies, quasars, asteroids etc) in images, calibrate, measure their properties. • Why Containers? • Complicated software stack – runs on laptops to supercomputers • Python-based code; lots of imports
  • 20. LHC ATLAS computing stack What is the Universe made of? Why does anything have mass?
  • 21. A billion proton-proton collisions per second and multi-GB of data per second.
  • 22. LHC ATLAS computing stack CVMFS: >3.5TB, >50M inodes Spectacularly complex software stack required to analyse data from particle collisions • Why Containers? • Un-tar stack on compute node is not efficient, doesn’t scale (~30min/job) • Dedupe files, squashfs image: 315GB • Scales up to thousands of nodes • Average start-up time by number of cores: 24 cores: 32s; 240 cores: 11s; 2400 cores: 15s; 24000 cores: 24s
  • 23. How does photosynthesis happen? How do drugs dock with proteins in our cells? Why do jet engines fail? LCLS Linac Coherent Light Source
  • 25. The Superfacility Concept Scientists using the LCLS at SLAC need real-time feedback on their running experiments – take advantage of NERSC supercomputers • Why Containers? • Complex python-based analysis environment • LCLS-driven workflow: data and analysis code coming in from outside NERSC – security concern
  • 27. Scientific Reproducibility Post-experiment data analysis Everyone agrees this is essential (federally mandated!), but no one knows how to do it properly/coherently • Algorithms: need to run scripts that produced the results • Environment: need to replicate the OS, software libraries, compiler version • Data: large volumes, databases, calibration data, metadata… https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
  • 28. Containers forever Ince, Hatton & Graham-Cumming, Nature 482, 485 (2012) Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.
  • 29. Containers offer the possibility of encapsulating analysis code and compute environment to ensure reproducibility of algorithms and environment. • Enable reproduction of results on any compute system Containers forever?
  • 30. In case you can’t think of anything to talk about • Make this publishable: DOI for DockerHub images, as for github repos. • Link github/Docker repos? • How to link data to containers? • How to maintain containers over the long term? • Long-term data access efforts in many areas of science – thinking 20 years ahead. Are containers viable in this timeframe? Discussion Points
  • 32. Shifter != Docker • User runs as the user in the container – not root • Image modified at container construction time: • Modifies /etc, /var, /opt • Replaces /etc/passwd, /etc/group and other files for site/security needs • Adds /var/hostsfile to identify other nodes in the calculation (like $PBS_NODEFILE) • Injects some support software in /opt/udiImage • Adds mount points for parallel filesystems • Your homedir can stay the same inside and outside of the container • Site configurable • Image read-only on the Computational Platform • To modify your image, push an update using Docker • Shifter only uses mount namespaces, not network or process namespaces • Allows your application to leverage the HSN and more easily integrate with the system • Shifter does not use cgroups directly • Allows the site workload manager (e.g., SLURM, Torque) to manage resources • Shifter uses individual compressed filesystem files to store images, not the Docker graph • Uses more disk space, but delivers high performance at scale • Shifter integrates with your Workload Manager • Can instantiate container on thousands of nodes • Run parallel MPI jobs • Specialized sshd run within container for exclusive-node for non-native-MPI parallel jobs • PBS_NODEFILE equivalent provided within container (/var/hostsfile) • Similar to Cray CCM functionality • Acts in place of CCM if shifter “image” is pointed to /dsl VFS tree
  • 33. Shifter~=Docker • Sets up user-defined image under user control • Allows volume remapping • mount /a/b/c on /b/a/c in container • Containers can be “run” • Environment variables, working directory, entrypoint scripts can be defined and run • Can instantiate multiple containers on same node
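
Slide 32 says Shifter adds /var/hostsfile inside the container to identify the other nodes in a calculation, in the manner of $PBS_NODEFILE. As a minimal sketch, the Python below shows how an application might turn such a file into a per-node slot count. It assumes the PBS nodefile convention of one hostname per line, repeated once per slot; the slides do not specify the actual layout. The function names and the sample hostnames are hypothetical.

```python
from collections import Counter
from pathlib import Path


def slots_per_host(text: str) -> dict[str, int]:
    """Count how many times each hostname appears in nodefile-style text."""
    # Assumption: one hostname per line, repeated once per available slot.
    hosts = [line.strip() for line in text.splitlines() if line.strip()]
    return dict(Counter(hosts))


def read_hostsfile(path: str = "/var/hostsfile") -> dict[str, int]:
    """Read the hosts file if it exists; return an empty mapping otherwise."""
    p = Path(path)
    return slots_per_host(p.read_text()) if p.is_file() else {}


# Hypothetical sample: two nodes with two slots each.
sample = "nid00010\nnid00010\nnid00011\nnid00011\n"
layout = slots_per_host(sample)
print(layout)
```

Inside a container, `read_hostsfile()` would be the entry point; outside one, where the file is absent, it returns an empty mapping.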

Editor's notes

  1. 30min startup time goes to 20sec
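
The "30min to 20sec" figure can be checked against the numbers on slide 22. The short Python sketch below averages the four quoted start-up times and compares the result with the ~30 min per job quoted for un-tarring the stack. The variable names are illustrative, and the unweighted mean over the four core counts is an assumption about how the 20-second figure was reached.

```python
# Average start-up times from slide 22, keyed by core count (seconds).
startup_seconds = {24: 32, 240: 11, 2400: 15, 24000: 24}

# Slide 22 quotes ~30 min per job for un-tarring the stack on the compute node.
untar_seconds = 30 * 60

# Unweighted mean across the four core counts (an assumption, see lead-in).
mean_startup = sum(startup_seconds.values()) / len(startup_seconds)
speedup = untar_seconds / mean_startup

print(f"mean container start-up: {mean_startup:.1f} s")
print(f"approximate speed-up over un-tar: {speedup:.0f}x")
```

The mean lands close to the 20 seconds in the note, which suggests the note is a rounded summary of the slide's table.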