LLNL-PRES-746388
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore
National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
The Sierra Supercomputer:
Science and Technology on a
Mission
Adam Bertsch
Sierra Integration Project Lead
Lawrence Livermore National Laboratory
February 20th, 2018
Overview
§ The Sierra Supercomputer
§ Collaboration
— CORAL and OpenPower
§ Science and Technology on a Mission
Where did Sierra start for me?
§ On a spring bicycle ride in Livermore, five years ago.
Sierra will be the next ASC ATS platform
[Roadmap chart, FY '13-'23: Advanced Technology Systems (ATS) and Commodity Technology Systems (CTS) each move through procure & deploy, use, and retire phases. ATS line: Sequoia (LLNL), ATS 1 – Trinity (LANL/SNL), ATS 2 – Sierra (LLNL), ATS 3 – Crossroads (LANL/SNL), ATS 4 – El Capitan (LLNL), ATS 5 – (LANL/SNL). CTS line: Tri-lab Linux Capacity Cluster II (TLCC II), CTS 1, CTS 2.]
Sequoia and Sierra are the current and next-generation Advanced Technology Systems at LLNL.
The Sierra system that will replace Sequoia features a GPU-accelerated architecture

Components
• Mellanox interconnect: single-plane EDR InfiniBand, 2-to-1 tapered fat tree
• IBM POWER9: Gen2 NVLink
• NVIDIA Volta: 7 TFLOP/s, HBM2, Gen2 NVLink

Compute node: 2 IBM POWER9 CPUs, 4 NVIDIA Volta GPUs, NVMe-compatible PCIe 1.6 TB SSD, 256 GiB DDR4, 16 GiB globally addressable HBM2 associated with each GPU, coherent shared memory.

Compute rack: standard 19", warm-water cooling.

Compute system: 4,320 nodes, 1.29 PB memory, 240 compute racks, 125 PFLOPS, ~12 MW.

Spectrum Scale file system: 154 PB usable storage, 1.54 TB/s read/write bandwidth.
Sierra Represents a Major Shift For Our Mission-Critical Applications

2004 - present (BG/L, BG/P, BG/Q):
• IBM BlueGene, along with copious amounts of Linux-based cluster technology, dominates the HPC landscape at LLNL.
• Apps scale to an unprecedented 1.6M MPI ranks, but are still largely MPI-everywhere on "lean" cores.

2014:
• Heterogeneous GPGPU-based computing begins to look attractive with NVLINK and Unified Memory (P9/Volta).
• Apps must undertake a 180° turn from BlueGene toward heterogeneity and large-memory nodes.

2015 – 20??:
• LLNL believes heterogeneous computing is here to stay, and Sierra gives us a solid leg up in preparing for exascale and beyond.
Cost-neutral architectural tradeoffs
§ DRAM cost came in at double the expectation
§ Evaluated two options to close the cost gap: buy half as much memory, or taper the network
§ Selected both options!
§ Full bandwidth from dual-ported Mellanox EDR HCAs to TOR switches
§ Half bandwidth from TOR switches to director switches
§ An economic trade-off that provides approximately 5% more nodes
This decision, counter to prevailing wisdom for system design, benefits Sierra's planned UQ workload: 5% more UQ simulations at a performance loss of < 1%.
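Read as UQ throughput, the trade nets out positive; a back-of-the-envelope check using only the figures above: $1.05 \times (1 - 0.01) \approx 1.04$, i.e., roughly 4% more UQ work per unit time despite the tapered network and halved DRAM.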
Sierra system architecture details

                                    Sierra        uSierra       rzSierra
Nodes                               4,320         684           54
POWER9 processors per node          2             2             2
GV100 (Volta) GPUs per node         4             4             4
Node peak (TFLOP/s)                 29.1          29.1          29.1
System peak (PFLOP/s)               125           19.9          1.57
Node memory (GiB)                   320 (256+64)  320 (256+64)  320 (256+64)
System memory (PiB)                 1.29          0.209         0.017
Interconnect                        2x IB EDR     2x IB EDR     2x IB EDR
Off-node aggregate b/w (GB/s)       45.5          45.5          45.5
Compute racks                       240           38            3
Network and infrastructure racks    13            4             0.5
Storage racks                       24            4             0.5
Total racks                         277           46            4
Peak power (MW)                     ~12           ~1.8          ~0.14
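As a sanity check on the peak figures (an illustrative decomposition; the per-CPU share is inferred from the totals, not quoted in the deck): $4 \times 7~\mathrm{TF/s}$ from the GPUs plus roughly $2 \times 0.55~\mathrm{TF/s}$ from the POWER9 CPUs gives the $29.1~\mathrm{TF/s}$ node peak, and $4320 \times 29.1~\mathrm{TF/s} \approx 125.7~\mathrm{PF/s}$, consistent with the quoted 125 PFLOPS system peak.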
CORAL Targets compared to Sierra
§ Target speedup over current systems of 4x on Scalable benchmarks …
§ … and 6x on Throughput benchmarks
§ Aggregate memory of 4 PB and ≥ 1 GB per MPI task (2 GB preferred)
§ Max power of system and peripherals ≤ 20 MW
§ Mean Time Between Application Failure that requires human
intervention ≥ 6 days
§ Architectural Diversity
§ Delivery in 2017 with acceptance in 2018
§ Peak Performance ≥ 100 PF
Collaboration
Sierra is part of CORAL, the Collaboration of Oak Ridge, Argonne and Livermore

CORAL is the next major phase in the U.S. Department of Energy's scientific computing roadmap and path to exascale computing.
• Modeled on the successful LLNL/ANL/IBM Blue Gene partnership (Sequoia/Mira), building on LLNL's IBM Blue Gene systems: BG/L, BG/P, Dawn, and BG/Q Sequoia
• Long-term contractual partnership with 2 vendors
• 2 awardees for 3 platform acquisition contracts (ANL Aurora, ORNL Summit, LLNL Sierra) and 2 nonrecurring engineering (NRE) contracts, all flowing from a single RFP
OpenPOWER collaboration enabled technology behind Sierra
CORAL NRE will provide significant benefit to the final systems
§ Motherboard design
§ Water-cooled compute nodes
§ GPU resilience
§ Switch-based collectives, FlashDirect
§ Open-source compiler infrastructure
§ System diagnostics
§ System scheduling
§ Burst buffer
§ GPFS performance and scalability
§ Cluster management
§ Open-source tools
§ Center of Excellence
Eight joint lab/contractor working groups are actively working on all the above and more.
CORAL NRE drove Spectrum Scale advances that will vastly improve the Sierra file system

Original plan of record:
§ 120 PB delivered capacity
§ Conservative drive capacity estimates
§ 1 TB/s write, 1.2 TB/s read
§ Concern about metadata targets

Modified (cost neutral), enabled by NRE + hardware advancements:
§ 154 PB delivered capacity
§ Substantially increased drive capacities
§ 1.54 TB/s write/read
§ Enhanced Spectrum Scale metadata performance (many optimizations already released)

Close partnerships lead to ecosystem improvements.
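To put the delivered bandwidth in context (an illustrative calculation, not a quoted requirement): draining all 1.29 PB of Sierra's node memory to the file system at 1.54 TB/s takes $1290~\mathrm{TB} / 1.54~\mathrm{TB/s} \approx 840~\mathrm{s}$, i.e., a full-system checkpoint on the order of 14 minutes.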
Outstanding benchmark analysis by IBM and NVIDIA demonstrates the system's usability. Projections included code changes and showed that a tractable annotation-based approach (i.e., OpenMP) will be competitive.
[Figure: CORAL application performance projections. Two bar charts of relative performance, CPU-only vs. CPU + GPU: Scalable Science Benchmarks (QBOX, LSMS, HACC, NEKbone) and Throughput Benchmarks (CAM-SE, UMT2013, AMG2013, MCB). Caption: CORAL benchmark projections show the GPU-accelerated system is expected to deliver substantial performance gains at the system level compared to a CPU-only configuration.]
3-4 Years to Prepare Applications for Sierra
2014 – CORAL contracts at LLNL and ORNL go to IBM/NVIDIA: 3-4 years of "runway".
Today – System delivery underway. Initial open science use about to begin. Transition to mission use in ~10 months.
Goal: take off flying when the machine "hits the floor".
Application enablement through the Center of Excellence – the original (and enduring) concept

Inputs: vendor application/platform expertise; staff (code developers); applications and key libraries.
Activities: hands-on collaborations, on-site vendor expertise, customized training, early delivery platform access, application bottleneck analysis, and both portable and vendor-specific solutions. Technical areas: algorithms, compilers, programming models, resilience, tools, ...
Outputs: codes tuned for next-gen platforms; rapid feedback to vendors on early issues; publications and early science; improved software tools and environment; performance portability.
A steering committee guides the effort. The LLNL/Sierra CoE is jointly funded by the ASC Program and the LLNL institution.
RAJA is our performance-portability solution that uses standard C++ idioms to target multiple back-end programming models

§ Decouple loop traversal and iterate (body)
§ An iterate is a "task" (aggregate, (re)order, ...)
§ IndexSet and execution policy abstractions simplify exploration of implementation/tuning options without disrupting source code

RAJA concepts:
• Pattern (forall, reduction, scan, etc.)
• Execution policy (how the loop runs: programming-model back-end, etc.)
• Index (index sets, segments to partition, order, ... iterations)

C-style for-loop:

    double* x ; double* y ;
    double a, tsum = 0, tmin = MYMAX;
    for ( int i = begin; i < end; ++i ) {
      y[i] += a * x[i] ;
      tsum += y[i] ;
      if ( y[i] < tmin ) tmin = y[i];
    }

RAJA transformation of the same loop:

    double* x ; double* y ;
    double a ;
    RAJA::SumReduction<reduce_policy, double> tsum(0);
    RAJA::MinReduction<reduce_policy, double> tmin(MYMAX);
    RAJA::forall< exec_policy > ( index_set , [=] (int i) {
      y[i] += a * x[i] ;
      tsum += y[i];
      tmin.min( y[i] );
    } );

RAJA allows us to write once and target multiple back-ends.
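To make the traversal/body decoupling concrete, here is a minimal, self-contained C++ sketch of the pattern. The policy tags and hand-rolled forall below are illustrative stand-ins, not RAJA's actual API:

    #include <cstdio>
    #include <vector>

    // Policy tags: "how the loop runs", kept separate from the loop body.
    struct seq_exec {};   // run iterations in order
    struct omp_exec {};   // run iterations in parallel (if built with -fopenmp)

    // Sequential back-end.
    template <typename Body>
    void forall(seq_exec, int begin, int end, Body body) {
      for (int i = begin; i < end; ++i) body(i);
    }

    // OpenMP back-end: identical body, different traversal.
    template <typename Body>
    void forall(omp_exec, int begin, int end, Body body) {
      #pragma omp parallel for
      for (int i = begin; i < end; ++i) body(i);
    }

    int main() {
      const int N = 1000;
      std::vector<double> x(N, 1.0), y(N, 2.0);
      const double a = 0.5;

      // The daxpy body is written once; only the policy tag changes.
      forall(seq_exec{}, 0, N, [&](int i) { y[i] += a * x[i]; });
      // forall(omp_exec{}, 0, N, [&](int i) { y[i] += a * x[i]; });

      std::printf("y[0] = %.2f\n", y[0]);  // prints 2.50
      return 0;
    }

Swapping the policy tag retargets the same loop body without touching it; in RAJA proper, reductions additionally go through reducer objects (as in the slide above) so they stay correct when the body executes on a GPU.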
Early Access systems provide critical pre-Sierra generation resources

Early Access to CORAL software technology, based on IBM's current technical programming offerings with enhancements for new function and scalability. Includes:
• Initial messaging stack implementation
• Initial compute node kernel
• Compilers: open-source LLVM, PGI compiler, XL compiler

Early Access compute system: 18 compute nodes, EDR IB switch fabric, Ethernet management, air-cooled.
Early Access compute node: "Minsky" HPC server, 2x POWER8+ processors, 4x NVIDIA GP100 with NVLINK, 256 GB SMP (32x 8 GB DDR4).
Initial Early Access GPFS: GPFS management servers, 2 POWER8 storage controllers, 1 GL-2 (2x58 6 TB SAS drives) or 1 GL-4 (4x58 6 TB SAS drives), EDR InfiniBand switch fabric; since enhanced to GL-6 (6x58 6 TB SAS drives).
Three LLNL Early Access systems support ASC and institutional preparations for Sierra
• Ray: 896.4 TF (compute, storage, and infrastructure racks; compute nodes on Ray have 1.6 TB NVMe drives)
• Shark: 597.6 TF (compute, storage, infrastructure, ESS)
• Manta: 597.6 TF (compute, storage, infrastructure, ESS)
Sierra COE focus over time, and how it has necessarily shifted as technology and code readiness advance

Relative levels of effort shift over ~5 years:

Early:
• Training; topical "deep dives"
• Development of portability abstractions (RAJA)
• Early explorations on key kernels; exploration with proxy apps
• Identify algorithmic and data-structure changes
• Hack-a-thons; use of available non-IBM GPU resources

Middle:
• Transition to early delivery hardware
• Focus on production apps
• Performance prediction and modeling
• Test portability

At delivery of Sierra:
• Transition to Sierra
• Performance optimization
• Deliver "early science" results
The COE has broadened to encompass most Sierra preparation activities

§ The Sierra COE began with vendor proposals, as part of the Sierra procurement, for proposed NRE (non-recurring engineering) activities
— Inspired by ORNL Titan activities
— IBM Research and NVIDIA DevTech are staffed with a cadre of top-notch application experts
§ Has become an organizing principle beyond vendor contributions
— Ties to Livermore Computing (LC) CORAL working groups (compilers and tools)
— ASQ division staff continue to provide key CS support to ASC code teams
— CASC leading several projects (RAJA, ICOE Tools and Libraries)
§ The LLNL Advanced Architecture and Portability Specialists (AAPS) team has become an integral and enduring part of our COE strategy
— Providing targeted expertise to both ASC and ICOE teams

The Sierra COE combines the Sierra NRE subcontract with IBM, the Institutional COE, and LLNL staff (code teams + AAPS).
Preliminary ASC application results on the EA system (single-node results)

• Hydro: most of code ported to RAJA. Observed GPU speedup*: 7-10x (package dependent).
• Equation of State: ported to RAJA; still porting setup/initialization and some less-used interpolation routines. Speedup: 11x.
• CG solver: internal (i.e., non-hypre) solver. Speedup: 10x.
• Hypre: setup not ported to GPU, still a potential bottleneck. Speedup: 9x.
• Unstructured Deterministic Transport: basing work on UMT benchmark results; performance limited by NVLINK bandwidth (streaming algorithm). Speedup with UMT: 2-2.5x streaming, 4-8x in memory.
• Structured Deterministic Transport: excellent results with the Kripke mini-app via RAJA; porting the main application and expecting similar results. Kripke (mini-app) speedup: 14x.
• Monte Carlo Transport: using a "big kernel" approach, and exploring via the Quicksilver mini-app. Speedup: ~1-2x.

* 4 GPUs vs. 2 P8s. Little or no tuning has been done for the Power CPU yet, and compilers are improving. Most codes currently see a 1.1-1.4x speedup over comparable Xeon CPUs (e.g., Haswell).

(On the original slide, each status was color-coded on a scale from "optimistic" through "optimistic, with caveats" and "unknown, but promising" to "may not be GPU-friendly" and "GPUs bad".)
Science and Technology on a Mission
Intelligence, Biosecurity, Nonproliferation, Defense, Science, Energy, Weapons, Counterterrorism
Stockpile Stewardship and the ASC Program
§ Annual certification of the U.S. nuclear stockpile without nuclear testing
Advanced Simulation and Computing (ASC) Program is LLNL's main HPC funding source

§ ASC delivers the leading-edge, high-end simulation capabilities needed to meet U.S. nuclear weapons assessment and certification requirements, including: weapon codes, weapon science, computing platforms, and supporting infrastructure.

§ ASC modeling and simulation provides:
— a computational surrogate for nuclear testing, which is central to U.S. national security.
— the ability to model the extraordinary complexity of nuclear weapons systems, which is essential to establish confidence in the performance of our aging stockpile.
— the necessary tools for nuclear weapons scientists and engineers to gain a comprehensive understanding of the entire weapons lifecycle, from design to safe processes for dismantlement.

§ Through close coordination with other government agencies, ASC tools also play an important role in supporting nonproliferation efforts, emergency response, and nuclear forensics.
Extreme-scale computing is needed across the spectrum of NNSA's national security missions
[Diagram: model complexity vs. simulation scale. Materials span ab initio scales from (100 nm)^3 to (10 µ)^3 across transition metals, noble metals, and actinide metals; simulations span 2D weapons' science to 3D systems. Applications: the nuclear stockpile (safety, surety, reliability, robustness) and nonproliferation and nuclear counterterrorism.]
Advanced technology insertion will establish new directions for high-performance computing
[Roadmap graphic, 2015-2025: CORAL technology leading to exascale technology; memory-intensive systems; data-analytics systems (e.g., Catalyst); machine learning and neuromorphic technologies; quantum technologies; cognitive simulation systems; and analytics/simulation convergence through workflow steering, active learning for design optimization and UQ, and hybrid accelerated simulation. Background image: clipping from "Quantum Computers," Photonics Spectra, June 2014.]
Integrated machine learning and simulation could enable dynamic validation of complex models
[Diagram: a dynamic validation loop powered by CORAL computing architectures. High-fidelity simulation over high-dimensional model parameters feeds ensembles of simulation sets; hypothesis generation uses the ML model to predict parameters for experimental data, producing distributions of predicted properties and requests for new experiments and data.]
Sierra and its EA systems are beginning an accelerator-based computing era at LLNL

§ The advantages that led us to select Sierra generalize
— Power efficiency
— Network advantages of "fat nodes"
— Clear path for performance portability
— Balancing capabilities/costs implies complex memory hierarchies
§ Broad collaboration is essential to successful deployment of increasingly complex HPC systems
— Computing centers collaborate on requirements
— Vendors collaborate on solutions

Heterogeneous computing on Sierra represents a viable path to exascale.
LLNL-PRES-746388
31
Sierra Supercomputer Powers Scientific Discovery

Más contenido relacionado

Similar a Sierra Supercomputer Powers Scientific Discovery

TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and ComputationTal Lavian Ph.D.
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Larry Smarr
 
Update on Trinity System Procurement and Plans
Update on Trinity System Procurement and PlansUpdate on Trinity System Procurement and Plans
Update on Trinity System Procurement and Plansinside-BigData.com
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 
Irati goals and achievements - 3rd RINA Workshop
Irati goals and achievements - 3rd RINA WorkshopIrati goals and achievements - 3rd RINA Workshop
Irati goals and achievements - 3rd RINA WorkshopEleni Trouva
 
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Fwdays
 
Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017Helix Nebula The Science Cloud
 
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...Srivatsan Ramanujam
 
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and moreAdvanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and moreinside-BigData.com
 
Telefonica: Automatización de la gestión de redes mediante grafos
Telefonica: Automatización de la gestión de redes mediante grafosTelefonica: Automatización de la gestión de redes mediante grafos
Telefonica: Automatización de la gestión de redes mediante grafosNeo4j
 
Task allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemTask allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemDeepak Shankar
 
P4_tutorial.pdf
P4_tutorial.pdfP4_tutorial.pdf
P4_tutorial.pdfPramodhN3
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGATO project
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationInside Analysis
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...LEGATO project
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performanceinside-BigData.com
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 

Similar a Sierra Supercomputer Powers Scientific Discovery (20)

TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
Update on Trinity System Procurement and Plans
Update on Trinity System Procurement and PlansUpdate on Trinity System Procurement and Plans
Update on Trinity System Procurement and Plans
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
TransPAC3/ACE Measurement & PerfSONAR Update
TransPAC3/ACE Measurement & PerfSONAR UpdateTransPAC3/ACE Measurement & PerfSONAR Update
TransPAC3/ACE Measurement & PerfSONAR Update
 
Irati goals and achievements - 3rd RINA Workshop
Irati goals and achievements - 3rd RINA WorkshopIrati goals and achievements - 3rd RINA Workshop
Irati goals and achievements - 3rd RINA Workshop
 
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
 
Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017
 
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
 
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and moreAdvanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more
Advanced Networking: The Critical Path for HPC, Cloud, Machine Learning and more
 
Interconnect your future
Interconnect your futureInterconnect your future
Interconnect your future
 
Telefonica: Automatización de la gestión de redes mediante grafos
Telefonica: Automatización de la gestión de redes mediante grafosTelefonica: Automatización de la gestión de redes mediante grafos
Telefonica: Automatización de la gestión de redes mediante grafos
 
Task allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemTask allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed system
 
P4_tutorial.pdf
P4_tutorial.pdfP4_tutorial.pdf
P4_tutorial.pdf
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performance
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 

Más de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Más de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Último

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Último (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

Sierra Supercomputer Powers Scientific Discovery

  • 1. LLNL-PRES-746388 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC The Sierra Supercomputer: Science and Technology on a Mission Adam Bertsch Sierra Integration Project Lead Lawrence Livermore National Laboratory February 20th, 2018
  • 2. LLNL-PRES-746388 2 § The Sierra Supercomputer § Collaboration — CORAL and OpenPower § Science and Technology on a Mission Overview
  • 3. LLNL-PRES-746388 3 § On a spring bicycle ride in Livermore five years ago. Where did Sierra start for me?
  • 4. LLNL-PRES-746388 4 AdvancedTechnology Systems(ATS) Fiscal Year ‘13 ‘14 ‘15 ‘16 ‘17 ‘18 Use Retire ‘20‘19 ‘21 Commodity Technology Systems(CTS) Procure& Deploy Sequoia (LLNL) ATS 1 – Trinity (LANL/SNL) Tri-lab Linux Capacity Cluster II (TLCC II) CTS 1 CTS 2 ‘22 System Delivery ATS 3 – Crossroads (LANL/SNL) ATS 4 – El Capitan (LLNL) ‘23 ATS 5 – (LANL/SNL) ATS 2 – Sierra (LLNL) Sequoia and Sierra are the current and next-generation Advanced Technology Systems at LLNL Sierra will be the next ASC ATS platform
  • 5. LLNL-PRES-746388 5 The Sierra system that will replace Sequoia features a GPU-accelerated architecture Mellanox Interconnect Single Plane EDR InfiniBand 2 to 1 Tapered Fat Tree IBM POWER9 • Gen2 NVLink NVIDIA Volta • 7 TFlop/s • HBM2 • Gen2 NVLink Components Compute Node 2 IBM POWER9 CPUs 4 NVIDIA Volta GPUs NVMe-compatible PCIe 1.6 TB SSD 256 GiB DDR4 16 GiB Globally addressable HBM2 associated with each GPU Coherent Shared Memory Compute Rack Standard 19” Warm water cooling Compute System 4320 nodes 1.29 PB Memory 240 Compute Racks 125 PFLOPS ~12 MW Spectrum Scale File System 154 PB usable storage 1.54 TB/s R/W bandwidth
  • 6. LLNL-PRES-746388 6 Sierra Represents a Major Shift For Our Mission- Critical Applications 2004 - present • IBM BlueGene, along with copious amounts of linux-based cluster technology dominate the HPC landscape at LLNL. • Apps scale to unprecedented 1.6M MPI ranks, but are still largely MPI-everywhere on ”lean” cores 2014 BG/L BG/P BG/Q • Heterogeneous GPGPU-based computing begins to look attractive with NVLINK and Unified Memory (P9/Volta) • Apps must undertake a 180 turn from BlueGene toward heterogeneity and large-memory nodes 2015 – 20?? • LLNL believes heterogeneous computing is here to stay, and Sierra gives us a solid leg-up in preparing for exascale and beyond
  • 7. LLNL-PRES-746388 7 Cost-neutral architectural tradeoffs This decision, counter to prevailing wisdom for system design, benefits Sierra’s planned UQ workload: 5% more UQ simulations at performance loss of < 1% § Full bandwidth from dual ported Mellanox EDR HCAs to TOR switches § Half bandwidth from TOR switches to director switches § An economic trade-off that provides approximately 5% more nodes § DRAM cost double expectation § Evaluated two options to close the cost gap § Buy half as much memory § Taper the network § Selected both options!
  • 8. LLNL-PRES-746388 8 Sierra system architecture details Sierra uSierra rzSierra Nodes 4,320 684 54 POWER9 processors per node 2 2 2 GV100 (Volta) GPUs per node 4 4 4 Node Peak (TFLOP/s) 29.1 29.1 29.1 System Peak (PFLOP/s) 125 19.9 1.57 Node Memory (GiB) 320(256+64) 320(256+64) 320(256+64) System Memory (PiB) 1.29 0.209 0.017 Interconnect 2x IB EDR 2x IB EDR 2x IB EDR Off-Node Aggregate b/w (GB/s) 45.5 45.5 45.5 Compute racks 240 38 3 Network and Infrastructure racks 13 4 0.5 Storage Racks 24 4 0.5 Total racks 277 46 4 Peak Power (MW) ~12 ~1.8 ~0.14
  • 9. LLNL-PRES-746388 9 CORAL Targets compared to Sierra § Target speedup over current systems of 4x on Scalable benchmarks … § … and 6x on Throughput benchmarks § Aggregate memory of 4 PB and ≥ 1 GB per MPI task (2 GB preferred) § Max power of system and peripherals ≤ 20MW § Mean Time Between Application Failure that requires human intervention ≥ 6 days § Architectural Diversity § Delivery in 2017 with acceptance in 2018 § Peak Performance ≥ 100 PF
  • 11. LLNL-PRES-746388 11 Sierra is part of CORAL, the Collaboration of Oak Ridge, Argonne and Livermore CORAL is the next major phase in the U.S. Department of Energy’s scientific computing roadmap and path to exascale computing Modeled on successful LLNL/ANL/IBM Blue Gene partnership (Sequoia/Mira) Long-term contractual partnership with 2 vendors 2 awardees for 3 platform acquisition contracts 2 nonrecurring eng. contracts NRE contract ANL Aurora contract NRE contract ORNL Summit contract LLNL Sierra contract RFP BG/Q Sequoia BG/L BG/P Dawn LLNL’s IBM Blue Gene Systems
  • 13. LLNL-PRES-XXXXXX 13 LLNL-PRES-683853 21& !  Motherboard'design' !  Water'cooled'compute'nodes'' !  GPU'resilience'' !  Switch'based'collectives,' FlashDirect' !  Open'source'compiler' infrastructure' CORAL&NRE&will&provide&significant&benefit& to&the&final&systems& !  System'diagnostics' !  System'scheduling'' !  Burst'buffer'' !  GPFS'performance'and' scalability' !  Cluster'management' !  Open'source'tools' !  Center'of'Excellence' Eight''joint'lab/contractor'working'groups'are'actively' working'on'all'the'above'and'more'''
  • 14. LLNL-PRES-746388 14 CORAL NRE drove Spectrum Scale advances that will vastly improve Sierra file system Original Plan of Record Modified (Cost Neutral) § 120PB delivered capacity § Conservative drive capacity estimates § 1TB/s write, 1.2TB/s read § Concern about metadata targets NRE + Hardware Advancements Close partnerships lead to ecosystem improvements § 154PB delivered capacity § Substantially increased drive capacities § 1.54 TB/s write/read § Enhanced Spectrum Scale metadata perf. (many optimizations already released)
  • 15. LLNL-PRES-746388 15 Outstanding benchmark analysis by IBM and NVIDIA demonstrates the system’s usability Projections included code changes that showed tractable annotation-based approach (i.e., OpenMP) will be competitive Figure 5: CORAL benchmark projections show GPU-accelerated system is expected to deliver substan performance at the system level compared to CPU-only configuration. The demonstration of compelling, scalable performance at the system level across a 13 CORAL APPLICATION PERFORMANCE PROJECTIONS 0x 2x 4x 6x 8x 10x 12x 14x QBOX LSMS HACC NEKbone RelativePerformance Scalable Science Benchmarks CPU-only CPU + GPU 0x 2x 4x 6x 8x 10x 12x 14x CAM-SE UMT2013 AMG2013 MCB RelativePerformance Throughput Benchmarks CPU-only CPU + GPU
  • 16. LLNL-PRES-746388 16 3-4 Years to Prepare Applications for Sierra 2014 – CORAL contracts at LLNL and ORNL go to IBM/NVIDIA. 3-4 years of “runway” Today – System Delivery Underway. Initial open science use about to begin. Transition to mission use in ~10 months. Goal: Take off flying when machine “hits the floor”
  • 17. LLNL-PRES-746388 17 Application enablement through the Center of Excellence – the original (and enduring) concept Vendor Application / Platform Expertise Staff (code developers) Applications and Key Libraries Hands-on collaborations On-site vendor expertise Customized training Early delivery platform access Application bottleneck analysis Both portable and vendor-specific solutions Technical areas: algorithms, compilers, programming models, resilience, tools, … Codes tuned for next-gen platforms Rapid feedback to vendors on early issues Publications, early science Steering committee The LLNL/Sierra CoE is jointly funded by the ASC Program and the LLNL Institution Improved Software Tools and Environment Performance Portability
  • 18. LLNL-PRES-746388 18 § Decouple loop traversal and iterate (body) § An iterate is a “task” (aggregate, (re)order, …) § IndexSet and execution policy abstractions simplify exploration of implementation/tuning options without disrupting source code RAJA is our performance-portability solution that uses standard C++ idioms to target multiple back-end programming models double* x ; double* y ; double a, tsum = 0, tmin = MYMAX; for ( int i = begin; i < end; ++i ) { y[i] += a * x[i] ; tsum += y[i] ; if ( y[i] < tmin ) tmin = y[i]; } C-style for-loop Execution Policy (how loop runs: PM backend, etc.) Index (index sets, segments to partition, order, …. iterations) Pattern (forall, reduction, scan, etc.) RAJA Concepts RAJA-style loop double* x ; double* y ; double a ; RAJA::SumReduction<reduce_policy, double> tsum(0); RAJA::MinReduction<reduce_policy, double> tmin(MYMAX); RAJA::forall< exec_policy > ( IndexSet , [=] (int i) { y[i] += a * x[i] ; tsum += y[i]; tmin.min( y[i] ); } ); double* x ; double* y ; double a ; RAJA::SumReduction<reduce_policy, double> tsum(0); RAJA::MinReduction<reduce_policy, double> tmin(MYMAX); RAJA::forall< exec_policy > ( index_set , [=] (int i) { y[i] += a * x[i] ; tsum += y[i]; tmin.min( y[i] ); } ); RAJA Transformation RAJA allows us to write-once, target multiple back-ends
  • 19. LLNL-PRES-746388 19 Early Access to CORAL Software Technology Based on IBM’s current technical programming offerings with enhancements for new function and scalability. Includes: • Initial messaging stack implementation • Initial Compute node kernel • Compilers: o Open Source LLVM o PGI compiler o XL compiler Early Access systems provide critical pre-Sierra generation resources Early Access Compute System • 18 Compute Nodes • EDR IB switch fabric • Ethernet mgmt • Air-cooled Early Access Compute Node • “Minsky” HPC Server • 2xPOWER8+ processors • 4xNVIDIA GP100; NVLINK • 256GB SMP (32x8GB DDR4) Initial Early Access GPFS • GPFS Management servers • 2 POWER8 Storage controllers • 1 GL-2; 2x58 6TB SAS drives or • 1 GL-4; 4x58 6TB SAS drives • EDR InfiniBand switch fabric • Enhanced to GL-6, 6x58 6TB SAS drives
  • 20. LLNL-PRES-746388 20 Three LLNL Early Access systems support ASC and institutional preparations for Sierra Shark 597.6TF Compute Compute Storage Infrastructure ESS Ray: 896.4 TF Compute Compute StorageCompute Infrastructure Compute nodes on Ray have 1.6TB NVMe drives Compute Compute Storage Infrastructure ESS Manta 597.6 TF
  • 21. LLNL-PRES-746388 21 Sierra COE focus over time and how it necessarily has shifted as technology and code readiness advances Time (~ 5 years) Relativelevelsofeffort • Training • Topical “deep dives” • Develop of portability abstractions (RAJA) • Early explorations on key kernels • Exploration with proxy apps • Identify algorithmic and data structures changes • Hack-a-thons • Use of available non-IBM GPU resources DeliveryofSierra • Transition to Sierra • Performance optimization • Deliver “early science” results • Transition to early delivery hardware • Focus on production apps • Performance prediction and modeling • Test portability
  • 22. LLNL-PRES-746388 22 The COE has broadened to encompass most Sierra preparation activities § The Sierra COE began with vendor proposals as part of the Sierra procurement for proposed NRE (non-recurring engineering) activities — Inspired by ORNL Titan activities — IBM Research and NVIDIA Dev Tech’s are staffed with a cadre of top-notch application experts § Has become an organizing principle beyond vendor contributions — Ties to Livermore Computing (LC) CORAL working groups (compilers and tools) — ASQ division staff continue to provide key CS support to ASC code teams — CASC leading several projects (RAJA, ICOE Tools and Libraries) § The LLNL Advance Architecture and Portability Specialists (AAPS) team has become an integral and enduring part of our COE strategy — Providing targeted expertise to both ASC and ICOE teams Sierra COE Sierra NRE subcontract w/ IBM Institutional COE LLNL Staff (Code Teams + AAPS)
  • 23. LLNL-PRES-746388 23 Preliminary ASC application results on the EA system (single-node results)
Application/package/library | Status and notes | Observed GPU speedup*
Hydro | Most of the code ported to RAJA | 7–10x (package dependent)
Equation of state | Ported to RAJA; still porting setup/initialization and some less-used interpolation routines | 11x
CG solver | Internal (i.e., non-hypre) solver | 10x
hypre | Setup not ported to GPU; still a potential bottleneck | 9x
Unstructured deterministic transport | Basing work on UMT benchmark results; performance limited by NVLINK bandwidth (streaming algorithm) | With UMT: 2–2.5x streaming, 4–8x in memory
Structured deterministic transport | Excellent results with the Kripke mini-app via RAJA; porting the main application and expecting similar results | Kripke (mini-app): 14x
Monte Carlo transport | Using a “big kernel” approach; exploring via the Quicksilver mini-app | ~1–2x
* 4 GPUs vs. 2 POWER8 CPUs
(The slide color-codes each row’s outlook on a scale: Optimistic; Optimistic, with caveats; Unknown, but promising; May not be GPU-friendly; GPUs bad.)
Little or no tuning has been done for the POWER CPU yet, and compilers are improving. Most codes currently see a 1.1–1.4x speedup over comparable Xeon CPUs (e.g., Haswell).
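For context on how speedup figures like these are typically produced, the sketch below shows a generic best-of-N wall-clock comparison between two variants of the same kernel. It is purely illustrative; the table’s numbers came from the full ASC packages and mini-apps on EA hardware, not from this harness, and the two lambdas are hypothetical stand-ins.

  #include <chrono>
  #include <cstdio>
  #include <functional>

  // Best-of-N wall-clock timing to damp run-to-run noise.
  double best_seconds(const std::function<void()>& work, int trials = 5) {
    double best = 1.0e30;
    for (int t = 0; t < trials; ++t) {
      auto start = std::chrono::steady_clock::now();
      work();
      std::chrono::duration<double> dt = std::chrono::steady_clock::now() - start;
      if (dt.count() < best) best = dt.count();
    }
    return best;
  }

  int main() {
    // Hypothetical stand-ins: in practice these would run the same physics
    // kernel under a CPU execution policy and a GPU execution policy.
    auto cpu_variant = [] { /* ... */ };
    auto gpu_variant = [] { /* ... */ };

    const double t_cpu = best_seconds(cpu_variant);
    const double t_gpu = best_seconds(gpu_variant);
    std::printf("speedup = %.2fx\n", t_cpu / t_gpu);
    return 0;
  }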
  • 24. LLNL-PRES-746388 24 Science and Technology on a Mission
Mission areas: Intelligence; Biosecurity; Nonproliferation; Defense; Science; Energy; Weapons; Counterterrorism
  • 25. LLNL-PRES-746388 25 Stockpile Stewardship and the ASC Program
§ Annual certification of the U.S. nuclear stockpile without nuclear testing
  • 26. LLNL-PRES-683853 26 The Advanced Simulation and Computing (ASC) Program is LLNL’s main HPC funding source
§ ASC delivers the leading-edge, high-end simulation capabilities needed to meet U.S. nuclear weapons assessment and certification requirements, including: weapon codes, weapon science, computing platforms, and supporting infrastructure.
§ ASC modeling and simulation provides:
— a computational surrogate for nuclear testing, which is central to U.S. national security.
— the ability to model the extraordinary complexity of nuclear weapons systems, which is essential to establish confidence in the performance of our aging stockpile.
— the necessary tools for nuclear weapons scientists and engineers to gain a comprehensive understanding of the entire weapons lifecycle, from design to safe processes for dismantlement.
§ Through close coordination with other government agencies, ASC tools also play an important role in supporting nonproliferation efforts, emergency response, and nuclear forensics.
  • 28. LLNL-PRES-746388 28 Advanced technology insertion will establish new directions for high-performance computing
Technology timeline (2015, 2018, 2021, 2025): CORAL technology; exascale technology; memory-intensive systems; machine learning and neuromorphic technologies; quantum technologies; data-analytics systems (e.g., Catalyst); cognitive simulation systems
Emerging capabilities: workflow steering; active learning for design optimization and UQ; hybrid accelerated simulation; analytics and simulation convergence
[Image: excerpt from a Photonics Spectra (June 2014) article on quantum computing platforms — NV− centers, photonic buffers, and superconducting Xmon qubits]
  • 29. LLNL-PRES-746388 29 Integrated machine learning and simulation could enable dynamic validation of complex models
The validation loop: high-fidelity simulation → ensembles of simulation sets over high-dimensional model parameters → distribution of predicted properties → hypothesis generation (use the ML model to predict parameters for experimental data) → requests for new experiments and data → back to simulation
CORAL computing architectures power the dynamic validation loop. A hedged code sketch of the loop’s structure appears below.
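The following is a structural sketch of that loop under stated assumptions: the slide names the stages but not an API, so every type and function here is a hypothetical stand-in with a trivial placeholder body.

  #include <cstdio>
  #include <vector>

  struct Params { std::vector<double> theta; };      // high-dimensional model parameters
  struct Result { std::vector<double> predicted; };  // distribution of predicted properties

  // Hypothetical stand-ins for the loop's stages (not a real API).
  std::vector<Params> propose_ensemble(int n) { return std::vector<Params>(n); }
  Result run_simulation(const Params&) { return Result{}; }   // high-fidelity simulation
  void train_surrogate(const std::vector<Params>&,
                       const std::vector<Result>&) {}         // update the ML model
  Params infer_from_experiment() { return Params{}; }         // hypothesis generation
  void request_new_experiments(const Params&) {}              // close the loop
  bool validated(int iter) { return iter >= 3; }              // placeholder stop test

  int main() {
    for (int iter = 0; !validated(iter); ++iter) {
      // 1. Run an ensemble of forward simulations across parameter space.
      std::vector<Params> ensemble = propose_ensemble(64);
      std::vector<Result> results;
      for (const Params& p : ensemble) results.push_back(run_simulation(p));

      // 2. Train or update the ML surrogate on the simulation inputs/outputs.
      train_surrogate(ensemble, results);

      // 3. Use the surrogate to predict parameters that explain experimental
      //    data, then request the experiments that would test the hypothesis.
      request_new_experiments(infer_from_experiment());
    }
    std::printf("dynamic validation loop converged\n");
    return 0;
  }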
  • 30. LLNL-PRES-746388 30 Sierra and its EA systems begin an accelerator-based computing era at LLNL
§ The advantages that led us to select Sierra generalize
— Power efficiency
— Network advantages of “fat” nodes
— A clear path to performance portability
— Balancing capability against cost implies complex memory hierarchies
§ Broad collaboration is essential to the successful deployment of increasingly complex HPC systems
— Computing centers collaborate on requirements
— Vendors collaborate on solutions
Heterogeneous computing on Sierra represents a viable path to exascale