This document discusses using containers and supercomputers to enable open science. It describes how supercomputers are used for diverse scientific research in many fields. Containers can help address issues with portability and scalability on supercomputers by replicating software environments. Shifter enables the use of Docker containers on supercomputers while addressing security and performance concerns. Examples are given of scientific projects using containers, such as astronomy, particle physics, and biology projects. Ensuring reproducibility of results through containerization is also discussed.
8. Supercomputing file systems
• Scale-out FS – 100s of OSSs (object storage servers)
• Access the FS over a high-speed interconnect
• High aggregate bandwidth, but works best for large IO/transfer sizes
• Global, coherent namespace
  • Easy for scientists to use
  • Hard to scale up metadata
Not your grandmother's FS
[Diagram: Compute Nodes → IO Nodes → Storage Servers]
How do you distribute PBs of files and data to hundreds of thousands of compute cores, with no latency?
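The small-file problem above can be felt even on a laptop. Below is a minimal, hypothetical Python sketch comparing many small reads (one metadata operation per file) with one large read (one metadata operation total). On a parallel file system the gap is far wider, because each open/lookup is a round trip to a central metadata server; local timings are only illustrative:

```python
# Sketch: why parallel file systems prefer large IO over many small files.
# Every open() is a metadata operation; N small files cost ~N lookups,
# one large file costs a single lookup. (Local FS timings, illustrative only.)
import os
import tempfile
import time

def write_files(root: str, n: int, size: int) -> None:
    for i in range(n):
        with open(os.path.join(root, f"f{i}"), "wb") as f:
            f.write(b"x" * size)

with tempfile.TemporaryDirectory() as d:
    many = os.path.join(d, "many")
    one = os.path.join(d, "one")
    os.makedirs(many)
    os.makedirs(one)
    write_files(many, 1000, 100)       # 1000 files x 100 B
    write_files(one, 1, 100 * 1000)    # 1 file of the same total size

    t0 = time.perf_counter()
    for name in os.listdir(many):      # one metadata op per file
        with open(os.path.join(many, name), "rb") as f:
            f.read()
    t_many = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(os.path.join(one, "f0"), "rb") as f:  # one metadata op total
        f.read()
    t_one = time.perf_counter() - t0

    print(f"1000 small files: {t_many:.4f}s, 1 large file: {t_one:.6f}s")
```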
9. Who uses a supercomputer?
• Cori: >1000 jobs running simultaneously on (1600*32) cores
• Everything from 1000+ node jobs to single-core jobs
• Time-insensitive simulations
• Real-time experimental data analysis
• Complex scheduling problem!
[Chart: Job size on Cori (# cores)]
The traditional idea of supercomputer usage is a gigantic, whole-machine simulation that runs for days/weeks and produces a huge dataset, or a single number – for example, a 20,000-year climate simulation or a calculation of the structure of an atom. The reality is much more diverse/unruly.
10. Supercomputing issues
Screamingly fast interconnect, no local disk, and a custom compute environment designed to accelerate parallel apps – but not everything can adapt easily to this environment.
• Portability
  • Custom Cray SUSE Linux-based environment – hard to use standard Linux-based code/libs
  • Scientists often run at multiple sites – wherever they can get the cycles (e.g., LHC Grid Computing)
Our users want to run complex software stacks on multiple platforms
11. Supercomputing issues
• Portability
• Scalability
  • Slow start-up time for shared libraries (e.g., Python code)
  • Distributed FS doesn't deal well with lots of small files
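The Python start-up cost mentioned above comes from the sheer number of modules an interpreter touches. A minimal sketch (module counts vary by Python version and environment) showing how even two imports drag in many files, each of which means path searches and metadata lookups on a parallel file system, multiplied by thousands of nodes starting at once:

```python
# Sketch: each "import" triggers path searches and file opens.
# sys.modules shows how many modules even small imports pull in;
# on Lustre/GPFS every lookup is a metadata round trip.
import sys

before = len(sys.modules)
import json          # a modest stdlib import...
import email.parser  # ...and a package import that drags in submodules
after = len(sys.modules)

print(f"modules before: {before}, after two imports: {after}")
print(f"extra modules loaded: {after - before}")
```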
12. Supercomputing issues
Containers for HPC!
13. Why not simply use Docker?
• Underlying custom OS
• Highly-optimized interconnect
• Security issues: if you can start a Docker container, you can start it as root – and map in other volumes with root access!
Shifter enables the collaborative nature of Docker for science and large-scale systems: it provides Docker functionality and direct compatibility, customized for the needs of HPC systems, and directly imports Docker images.
Containers on supercomputers
14. Why not simply use Docker?
Shifter uses a loop mount of the image file, moving metadata operations (like file lookup) to the compute node rather than relying on the central metadata servers of the parallel file system. This gives much faster shared-library performance – high performance at huge scale.
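In practice, using Shifter looks roughly like the following sketch. The command names follow NERSC's public Shifter documentation (`shifterimg`, `shifter`, the Slurm `--image` option), but the image name, core count, and script are placeholders, and details vary by site:

```shell
# Pull a Docker Hub image through Shifter's image gateway, which flattens
# it into a single compressed filesystem file (enabling the loop mount).
shifterimg pull docker:myuser/myanalysis:latest

# In a Slurm batch script, instantiate the container on every allocated node:
#SBATCH --image=docker:myuser/myanalysis:latest
srun -n 1024 shifter python analyze.py
```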
17. Dark Energy Survey
Astronomy Data Analysis
What is the Universe made of? How and why is it expanding?
18. Dark Energy Survey
Astronomy Data Processing
Light from some of these galaxies was emitted 13 billion years ago.
19. Dark Energy Survey
Astronomy Data Analysis
Measuring the expansion history of the universe to understand the nature of Dark Energy.
Data analysis code: identify objects (stars, galaxies, quasars, asteroids, etc.) in images, calibrate them, and measure their properties.
• Why Containers?
  • Complicated software stack – runs on everything from laptops to supercomputers
  • Python-based code; lots of imports
20. LHC ATLAS computing stack
What is the Universe made of?
Why does anything have mass?
25. The Superfacility Concept
Scientists using the LCLS at SLAC need real-time feedback on their running experiments – take advantage of NERSC supercomputers.
• Why Containers?
  • Complex Python-based analysis environment, driven by LCLS
  • Workflow: data and analysis code coming in from outside NERSC – a security concern
27. Scientific Reproducibility
Post-experiment data analysis: everyone agrees this is essential (federally mandated!), but no one knows how to do it properly/coherently.
• Algorithms: need to run the scripts that produced the results
• Environment: need to replicate the OS, software libraries, compiler version
• Data: large volumes, databases, calibration data, metadata…
https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
28. Containers forever
Ince, Hatton & Graham-Cumming, Nature 482, 485 (2012):
"Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail."
29. Containers forever?
Containers offer the possibility of encapsulating analysis code and compute environment, ensuring reproducibility of algorithms and environment.
• Enable reproduction of results on any compute system
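A minimal sketch of what "encapsulating the environment" means in practice: a Dockerfile that records the OS and libraries next to the analysis code. The base image, package list, and script name below are hypothetical; for real reproducibility every version would be pinned exactly:

```dockerfile
# Record the exact OS the results were produced on.
FROM ubuntu:16.04
# Install the libraries the analysis depends on (ideally version-pinned).
RUN apt-get update && apt-get install -y python3 python3-numpy
# Bundle the scripts that produced the published results.
COPY analyze.py /opt/analysis/
ENTRYPOINT ["python3", "/opt/analysis/analyze.py"]
```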
30. Discussion Points
In case you can't think of anything to talk about:
• Make this publishable: DOIs for Docker Hub images, as for GitHub repos
• Link GitHub/Docker repos?
• How to link data to containers?
• How to maintain containers over the long term?
• Long-term data access efforts in many areas of science are thinking 20 years ahead – are containers viable in this timeframe?
32. Shifter != Docker
• User runs as the user in the container – not root
• Image modified at container construction time:
  • Modifies /etc, /var, /opt
  • Replaces /etc/passwd, /etc/group, and other files for site/security needs
  • Adds /var/hostsfile to identify other nodes in the calculation (like $PBS_NODEFILE)
  • Injects some support software in /opt/udiImage
  • Adds mount points for parallel filesystems
    • Your homedir can stay the same inside and outside the container
    • Site configurable
• Image is read-only on the computational platform
  • To modify your image, push an update using Docker
• Shifter only uses mount namespaces, not network or process namespaces
  • Allows your application to leverage the HSN (high-speed network) and more easily integrate with the system
• Shifter does not use cgroups directly
  • Allows the site workload manager (e.g., SLURM, Torque) to manage resources
• Shifter uses individual compressed filesystem files to store images, not the Docker graph
  • Uses more disk space, but delivers high performance at scale
• Shifter integrates with your workload manager
  • Can instantiate containers on thousands of nodes
  • Run parallel MPI jobs
  • Specialized sshd run within the container for exclusive-node, non-native-MPI parallel jobs
  • $PBS_NODEFILE equivalent provided within the container (/var/hostsfile)
  • Similar to Cray CCM functionality
    • Acts in place of CCM if the Shifter "image" is pointed to the /dsl VFS tree
33. Shifter ~= Docker
• Sets up a user-defined image under user control
• Allows volume remapping
  • e.g., mount /a/b/c on /b/a/c in the container
• Containers can be "run"
  • Environment variables, working directory, and entrypoint scripts can be defined and run
• Can instantiate multiple containers on the same node
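The volume remapping above might look like this on the command line. The `--image` and `--volume` flags follow Shifter's documentation; the image name and paths are placeholders:

```shell
# Run a containerized command with a host directory remapped
# to /data inside the container.
shifter --image=docker:myuser/myanalysis:latest \
        --volume=/global/project/mydata:/data \
        python analyze.py /data
```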