In this deck from the 2019 Stanford HPC Conference, Todd Gamblin from Lawrence Livermore National Laboratory presents: Spack - A Package Manager for HPC.
"Spack is a package manager for cluster users, developers, and administrators. Rapidly gaining popularity in the HPC community, Spack, like other HPC package managers, is designed to build packages from source. Spack supports relocatable binaries for specific OS releases, target architectures, MPI implementations, and other very fine-grained build options.
This talk will introduce some of the open infrastructure for distributing packages, challenges to providing binaries for a large package ecosystem and what we're doing to address problems. We'll also talk about challenges for implementing relocatable binaries with a multi-compiler system like Spack. Finally, we'll talk about how Spack integrates with the US Exascale project's open source software release plan and how this will help glue together the HPC OSS ecosystem.
Todd is a computer scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. His research focuses on scalable tools for measuring, analyzing, and visualizing performance data from massively parallel applications. Todd is also involved with many production projects at LLNL. He works with Livermore Computing’s Development Environment Group to build tools that allow users to deploy, run, debug, and optimize their software for machines with million-way concurrency.
Todd received his Ph.D. in computer science from the University of North Carolina at Chapel Hill in 2009. His dissertation investigated parallel methods for compressing and sampling performance measurements from hundreds of thousands of concurrent processors. He received his B.A. in Computer Science and Japanese from Williams College in 2002. He has also worked as a software developer in Tokyo and held research internships at the University of Tokyo and IBM Research.
Watch the video: https://youtu.be/DhUVbroMLJY
Learn more: https://computation.llnl.gov/projects/spack-hpc-package-manager
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Spack - A Package Manager for HPC
LLNL-PRES-747560
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Spack: A Package Manager for HPC
2019 HPC-AI Advisory Council Stanford Conference
Todd Gamblin
February 14, 2019
Stanford University
Computer Scientist
@spackpm | github.com/spack
Half of this DAG is external (blue); more than half of it is open source
Nearly all of it needs to be built specially for HPC to get the best performance
Even proprietary codes are based on many open source libraries
The Exascale Computing Project is building an entire ecosystem
Every application has its own stack of dependencies.
Developers, users, and facilities dedicate (many) FTEs to building & porting.
Often trade reuse and usability for performance.
15+ applications
× 80+ software packages
× 5+ target architectures/platforms (Xeon, Power, KNL, NVIDIA, ARM, laptops?)
× up to 7 compilers (Intel, GCC, Clang, XL, PGI, Cray, NAG)
× 10+ programming models (OpenMPI, MPICH, MVAPICH, OpenMP, CUDA, OpenACC, Dharma, Legion, RAJA, Kokkos)
× 2–3 versions of each package + external dependencies
= up to 1,260,000 combinations!
We must make it easier to rely on others’ software!
How to install software on a supercomputer
1. Download all 16 tarballs you need
2. Start building!
   configure; make
   Fight with compiler...
   make
   Tweak configure args...
   make install
   cmake; make; make install
3. Run code
4. Segfault!?
5. Start over…
Most supercomputers deploy some form of environment modules
— TCL modules (dates back to 1995) and Lmod (from TACC) are the most popular
Modules don’t handle installation!
— They only modify your environment (things like PATH, LD_LIBRARY_PATH, etc.)
Someone (likely a team of people) has already installed gcc for you!
— Also, you can only `module load` the things they’ve installed
What about modules?
$ gcc
-bash: gcc: command not found
$ module load gcc/7.0.1
$ gcc -dumpversion
7.0.1
Containers provide a great way to reproduce and distribute an
already-built software stack
Someone needs to build the container!
— This isn’t trivial
— Containerized applications still have hundreds of dependencies
Using the OS package manager inside a container is insufficient
— Most binaries are built without optimization
— Binaries are generic, not tuned for specific architectures
Developing with an OS software stack can be painful
— Little freedom to choose versions
— Little freedom to choose compiler options, build options, etc. for packages
What about containers?
We need something more flexible to build the containers
Spack is a flexible package manager for HPC

How to install Spack (works out of the box):

$ git clone https://github.com/spack/spack
$ . spack/share/spack/setup-env.sh

How to install a package:

$ spack install hdf5

HDF5 and its dependencies are installed within the Spack directory.

Unlike typical package managers, Spack can also install many variants of the same build:
— Different compilers
— Different MPI implementations
— Different build options

@spackpm
github.com/spack/spack
Visit spack.io
Each expression is a spec for a particular configuration
— Each clause adds a constraint to the spec
— Constraints are optional – specify only what you need.
— Customize install on the command line!
Spec syntax is recursive
— Full control over the combinatorial build space
Spack provides the spec syntax to describe custom configurations
$ spack install mpileaks                               unconstrained
$ spack install mpileaks@3.3                           @ custom version
$ spack install mpileaks@3.3 %gcc@4.7.3                % custom compiler
$ spack install mpileaks@3.3 %gcc@4.7.3 +threads       +/- build option
$ spack install mpileaks@3.3 cxxflags="-O3 -g3"        setting compiler flags
$ spack install mpileaks@3.3 os=cnl10 target=haswell   setting target for cross-compile
$ spack install mpileaks@3.3 ^mpich@3.2 %gcc@4.9.3     ^ dependency information
Spack packages are templates
They use a simple Python DSL to define how to build a spec
from spack import *

class Dyninst(Package):
    """API for dynamic binary instrumentation."""

    homepage = "https://paradyn.org"
    url = "http://www.paradyn.org/release8.1.2/DyninstAPI-8.1.2.tgz"

    version('8.2.1', 'abf60b7faabe7a2e')
    version('8.1.2', 'bf03b33375afa66f')
    version('8.1.1', 'd1a04e995b7aa709')

    depends_on("cmake", type="build")
    depends_on("libelf", type="link")
    depends_on("libdwarf", type="link")
    depends_on("boost@1.42: +multithreaded")

    def install(self, spec, prefix):
        with working_dir('spack-build', create=True):
            cmake('-DBoost_INCLUDE_DIR=' + spec['boost'].prefix.include,
                  '-DBoost_LIBRARY_DIR=' + spec['boost'].prefix.lib,
                  '-DBoost_NO_SYSTEM_PATHS=TRUE',
                  '..')
            make()
            make("install")
Metadata at the class level
Versions
Install logic in instance methods
Dependencies (note: they use the same spec syntax)
Patches, variants, resources, conflicts, etc.
(not shown)
Each unique dependency graph is a unique
configuration.
Each configuration installed in a unique directory.
— Configurations of the same package can coexist.
Hash of entire directed acyclic graph (DAG) is
appended to each prefix.
Installed packages automatically find dependencies
— Spack embeds RPATHs in binaries.
— No need to use modules or set LD_LIBRARY_PATH
— Things work the way you built them
Spack handles combinatorial software complexity.
Installation layout:

spack/opt/
    linux-x86_64/
        gcc-4.7.2/
            mpileaks-1.1-0f54bf34cadk/
        intel-14.1/
            hdf5-1.8.15-lkf14aq3nqiz/
    bgq/
        xl-12.1/
            hdf5-1.8.16-fqb3a15abrwx/
    ...

(Each prefix name ends in the hash of that configuration's dependency DAG.)
No limit on the number of versions you can have installed.
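The DAG-hash idea can be sketched in a few lines of Python. This is a toy illustration only — Spack's real hash also covers versions, variants, compilers, and targets — and the helper name and example specs here are hypothetical:

```python
import hashlib

def dag_hash(spec, deps):
    """Hash a spec together with the hashes of everything it depends on,
    so a change anywhere in the dependency DAG changes the hash."""
    h = hashlib.sha1(spec.encode())
    for dep in sorted(deps.get(spec, [])):
        h.update(dag_hash(dep, deps).encode())
    return h.hexdigest()

# Hypothetical DAG: mpileaks depends on callpath and mpich; callpath on mpich.
deps = {
    "mpileaks-1.1": ["callpath-1.0", "mpich-3.2"],
    "callpath-1.0": ["mpich-3.2"],
}

short = dag_hash("mpileaks-1.1", deps)[:12]
print("mpileaks-1.1-" + short)  # the kind of suffix appended to an install prefix
```

Because each dependency's hash feeds into its parent's hash, rebuilding mpileaks against a different MPI yields a different prefix — which is why many configurations of the same package can coexist.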
mpi is a virtual dependency
Install the same package built with two
different MPI implementations:
Virtual deps are replaced with a valid
implementation at resolution time.
— If the user didn’t pick something and there are
multiple options, Spack picks.
Depend on interfaces (not implementations)
with virtual dependencies
$ spack install mpileaks ^mvapich
$ spack install mpileaks ^openmpi@1.4:
Virtual dependencies can be versioned:

class Mpileaks(Package):            # dependent
    depends_on("mpi@2:")

class Mvapich(Package):             # provider
    provides("mpi@1", when="@:1.8")
    provides("mpi@2", when="@1.9:")

class Openmpi(Package):             # provider
    provides("mpi@:2.2", when="@1.6.5:")
Concretization fills in missing parts of requested specs:

    mpileaks ^callpath@1.0+debug ^libelf@0.8.11

The concretized spec is fully constrained and can be passed to install.
Workflow:
1. Users input only an abstract spec with some constraints
2. Spack makes choices according to policies (site/user/etc.)
3. Spack installs concrete configurations of package + dependencies
Dependency resolution is an NP-complete problem!
— Different versions/configurations of packages require different
versions/configurations of dependencies
— Concretizer searches for a configuration that satisfies all the
requirements
— This is basically a SAT/SMT solve
Dependency Resolution is an NP-hard problem!
Different versions of packages require different
versions of dependencies
— Concretizer searches for a configuration that satisfies all
the requirements
— One can show this is equivalent to a SAT/SMT solve
Resolution is NP-complete for *just* package and
version metadata
— Concretization also includes compilers, variants,
architecture, optional dependencies, virtual
dependencies
— We have some leeway because multiple stacks can
coexist within Spack (unlike system PMs)
— Even within one DAG there can be issues!
(See https://research.swtch.com/version-sat for an example of an unsatisfiable dependency graph.)
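Why this is a search problem can be seen with a toy brute-force resolver (a hypothetical sketch, not Spack's concretizer): each package offers several versions, each version constrains the versions of other packages, and we must find one mutually consistent assignment.

```python
from itertools import product

# Hypothetical index: package -> {version: {dependency: allowed versions}}.
index = {
    "A": {1: {"B": {1}}, 2: {"B": {2}}},
    "B": {1: {"C": {1}}, 2: {"C": {2}}},
    "C": {1: {}, 2: {}},
}

def resolve(index):
    """Try every combination of versions until one satisfies all
    constraints -- worst case exponential in the number of packages."""
    names = list(index)
    for combo in product(*(index[n] for n in names)):
        chosen = dict(zip(names, combo))
        if all(chosen[dep] in allowed
               for name, ver in chosen.items()
               for dep, allowed in index[name][ver].items()):
            return chosen
    return None

print(resolve(index))  # a consistent assignment, e.g. {'A': 1, 'B': 1, 'C': 1}
```

Real solvers replace the exhaustive loop with SAT/SMT techniques, but the underlying problem is the same.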
Spack is used worldwide!
Over 350 contributors
from labs, academia, industry
Over 3,000 software packages
Over 150,000 downloads in the past year
Over 1,100 monthly active users (on docs site)
Plot shows sessions on
spack.readthedocs.io for one month
Started Spack development in 2013
— Paper at SC15
— Tutorials at SC16, SC17, SC18
— GitHub community has grown steadily!
232 pull requests merged in lead-up to
SC18!
— By 74 contributors
— We’ve been gradually increasing core
contributors
Spack has a very active open source community
We try to make it easy to modify a package
— spack edit <package>
— Pull request
Contributors are HPC software developers
as well as user support teams and admins
We get contributions in the core as well as
in packages
LLNL still has a majority of the core contributions, with significant help from others.
Spack has benefitted tremendously from external contributions
Spack is being used on many of the top HPC systems
At HPC sites for software stacks + modules
— Reduced Summit deploy time from 2 weeks to 12 hrs.
— EPFL deploys its software stack with Jenkins + Spack
— NERSC, LLNL, ANL, other US DOE sites
— SJTU in China
Within ECP as part of their software release process
— ECP-wide software distribution
— SDK workflows
Within High Energy Physics (HEP) community
— HEP (Fermi, CERN) have contributed many features to
support their workflow
Many others
Summit (ORNL)
Sierra (LLNL)
Cori (NERSC)
SuperMUC-NG (LRZ)
New stuff:
1. Spack environments (covered today)
2. spack.yaml and spack.lock files for tracking dependencies (covered today)
3. Custom configurations via command line (covered today)
4. Better support for linking Python packages into view directories (pip in views)
5. Support for uploading build logs to CDash
6. Packages have more control over compiler flags via flag handlers
7. Better support for module file generation
8. Better support for Intel compilers, Intel MPI, etc.
9. Many performance improvements, improved startup time
Spack is now permissively licensed under Apache-2.0 or MIT
— previously LGPL
Over 2,900 packages (800 added since last year)
— This is from November; over 3,000 in latest develop branch
Spack v0.12.1 was just released
Allows developers to bundle Spack configuration with their repository
Can also be used to maintain configuration together with Spack packages.
— E.g., versioning your own local software stack with consistent compilers/MPI
implementations
Manifest / Lockfile model pioneered by Bundler is becoming standard
— spack.yaml describes project requirements
— spack.lock describes exactly what versions/configurations were installed, allows
them to be reproduced.
Spack has added environments and spack.yaml / spack.lock
(Figure: a simple spack.yaml file names the project's required dependencies; `spack install` builds the dependency packages and writes a lockfile recording the exact versions installed.)
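A minimal manifest might look like the following sketch (the package names and constraints are hypothetical; consult the Spack documentation for the full schema):

```yaml
# spack.yaml -- the manifest, checked into the project repository
spack:
  specs:
  - hdf5@1.10.4 +mpi
  - mpich@3.2
```

Running `spack install` inside this environment concretizes and builds the specs, then writes spack.lock with the exact configurations, so a collaborator can reproduce the identical stack.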
We recently started providing base images on DockerHub with Spack preinstalled.
Very easy to build a container with some Spack packages in it:
Spack environments also help with building containers
spack-docker-demo/
    Dockerfile    ← base image with Spack in PATH; copies in spack.yaml, then runs spack install
    spack.yaml    ← list of packages to install, with constraints

Build with `docker build .`; run with Singularity (or another tool).
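The Dockerfile for such a demo might look like this sketch (the base image tag is an assumption — check Spack's DockerHub page for current images):

```dockerfile
# Base image with Spack preinstalled and already in PATH (assumed tag)
FROM spack/ubuntu-bionic:latest

# Copy in the environment manifest listing packages + constraints
COPY spack.yaml /env/spack.yaml
WORKDIR /env

# Concretize and install everything the manifest requests
RUN spack install
```

The resulting image can then be run directly with Docker, or converted for Singularity at sites that disallow Docker.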
Supporting the U.S. Exascale project with binary builds
— Spack will be used to manage ECP software releases
— In conjunction with ECP CI, start to generate prebuilt binaries for HPC facilities
— Use the same relocatable binary packages for container deployment
Spack stacks: Build on environments to enable more automated deployment at HPC centers.
— Single YAML-file configuration for entire site stack
— Install massive combinatorial package installations, modules, etc. with one command.
Spack chains:
— Allow user Spack instances to leverage facility and team installations
— Hierarchical development flow
Architecture-specific binaries
— Better provenance for builds
— Better support for matching optimized binary packages to machines
Better dependency resolution
— Handle newer C++ libraries better
— More aggressive concretizer support
— Support for depending on language levels/compiler features (e.g., C++14, lambdas, OpenMP@version)
What’s on the road map?
U.S. Exascale Computing Project (ECP)
will release software through Spack
Software in ECP stack needs to run on ECP platforms,
testbeds, clusters, laptops
— Each new environment requires effort.
ECP asks us to build a robust, reliable,
and easy-to-use software stack
We will provide the infrastructure necessary to make this tractable:
1. A dependency model that can handle HPC software
2. A hub for coordinated software releases (like xSDK)
3. Build and test automation for large packages across facilities
4. Hosted binary and source software distributions for all ECP HPC platforms
Spack is the delivery platform for the ECP software stack
CI at HPC centers is notoriously difficult
— Security concerns prevent most CI tools from being run by staff or by users
— HPC centers really need to deploy trusted CI services for this to work
We are developing a secure CI system for HPC centers:
— Setuid runners (run CI jobs as users); Batch integration (similar, but parallel jobs); multi-center runner support
Onyx Point will upstream this support into GitLab CI
— Initial rollout in FY19 at ECP labs: ANL, ORNL, NERSC, LLNL, LANL, SNL
— Upstream GitLab features can be used by anyone!
Through ECP, we are working with Onyx Point to deliver
continuous integration for HPC centers
(Figure: the user checks out and commits code via two-factor authentication; fast mirroring feeds trusted setuid and batch runners at the HPC facility.)
Spack stacks: entire facility deployments in a single YAML file
Allow users to easily express a huge cross-product of specs
— All the packages needed for a facility
— Generate modules tailored to the site
— Generate a directory layout to browse the packages
Build on the environments workflow
— Manifest + lockfile
— Lockfile enables reproducibility
Relocatable binaries allow the same binary to be
used in a stack, regular install, or container build.
— Difference is how the user interacts with the stack
— Single-PATH stack vs. modules.
As an HPC package manager, we want to provide optimized builds
— Code-level choices (-O2, -O3)
— Architecture-specific choices (-mcpu=cortex-a7, -march=haswell)
Architectures vary as to how much they expose features to users
— x86 exposes feature sets in /proc/cpuinfo
— Arm hides many features behind revision number
Methods for accessing architecture optimizations
— Vary by both compiler and architecture
• GCC -mcpu vs. -march, for example
• Relies on architectures providing a programmatic way to get information
We want to expose the names users understand
— ThunderX2, Cortex-A7 for Arm
— Power8, Power9 for IBM
— Haswell, Skylake for Intel
Specific target information in specs – In progress
Spack simplifies HPC software for:
— Users
— Developers
— Cluster installations
— The largest HPC facilities
Spack is central to ECP’s software strategy
— Enable software reuse for developers and users
— Allow the facilities to consume the entire ECP stack
The roadmap is packed with new features:
— Building the ECP software distribution
— Better workflows for building containers
— Stacks for facilities
— Chains for rapid dev workflow
— Optimized binaries
— Better dependency resolution
The Spack community is growing rapidly
@spackpm
github.com/spack/spack
Visit spack.io
33. Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United
States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or
implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific
commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or
imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security,
LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government
or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.