Chasing the Rainbow – National Computational Infrastructure’s Pursuit of High-Performance OpenStack Cloud: Andrew Howard, NCI

nci.org.au
@NCInews
National Computational Infrastructure’s
Pursuit of High-Performance in OpenStack Clouds
Andrew Howard & Matthew Sanderson
HPC and Cloud Systems
National Computational Infrastructure,
The Australian National University

Thanks to my colleagues
o NCI Contributors
o Dr. Muhammad Atif
o Mr. Simon Fowler
o Mr. Jakub Chrzeszczyk
o Dr. Ching-Yeh (Leaf) Lin
o Dr. Benjamin Menadue

Agenda
o NCI Overview
o Why are we interested in HPC Clouds?
o NCI Cloud past and present
o What have we done to implement an HPC Cloud?
o Containers
o MPI Performance under Docker
o Conclusion
o Questions

NCI: an overview
Mission: World-class, high-end computing services for Australian research and innovation
What is NCI:
• Australia’s most highly integrated e-infrastructure environment
• Petascale supercomputer + highest performance research cloud + highest performance storage in the southern hemisphere
• Comprehensive and integrated expert service
• National/internationally renowned support team
NCI is national and strategic:
• Driven by national research priorities and excellence
• Engaged with research institutions/collaborations and industry
• A capability beyond the capacity of any single institution
• Sustained by a collaboration of agencies/universities
NCI is important to Australia because it:
• Enables research that otherwise would be impossible
• Enables delivery of world-class science
• Enables interrogation of big data, otherwise impossible
• Enables high-impact research that matters; informs public policy
• Attracts and retains world-class researchers for Australia
• Catalyses development of young researchers’ skills
[Diagram: NCI service stack. Labels: Research Outcomes; Communities and Institutions / Access and Services; Expertise, Support and Development; HPC Services; Virtual Laboratories / Data-intensive Services; Integration; Compute (HPC/Cloud), Storage/Network Infrastructure]

NCI today: comprehensive, integrated, quality service, innovative and valued
Facts and Figures
• Supercomputer (Raijin): 1.2 petaflops (1,200,000,000,000,000 operations/sec)
– 57,492 cores, 160 Tbytes memory, 10 petabytes storage, 9 Tbit/sec backplane
– Australia’s highest sustained performance research supercomputer
• HPC Cloud: 3,200 cores, supercomputer spec. for orchestrating data services
• Global integrated storage (highest performance filesystems in Australia)
– 20 PB disk (up to 120 Gbytes/sec b/w); 40 petabytes of tape for archive purposes
• Power consumption: 1.6-2.0 megawatts
• Service researchers at 30 universities, 5 national science agencies and 2 MRIs
• ~2,500 research users; 1,400 journal articles supported by NCI services
• Support for more than $50M of national competitive research grants annually
• One-third of Fellows elected to Australian Academy of Science (2014-15) are NCI users
Scale
• HPC and data infrastructure: $47M replacement value (NCRIS, Aust. Gov’t)
• Purpose built data centre: $24M replacement value (2012)
• Recurrent operations: $17-18M p.a. (partners: $11+M; NCRIS: $5+M)
– Co-investment: Science agencies ($6M p.a.), Universities and ARC ($5+M p.a.)
Expert, agile and secure
• 60 expert staff: operations, user support, high-performance computing and data, collections management/curation, visualisation, virtual lab development, etc.
• Driven by the goals of researchers and research institutions
• Annual IT security audits
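The Raijin figures above can be sanity-checked with a back-of-the-envelope peak calculation. The sketch below assumes 8 double-precision floating-point operations per core per cycle (the usual figure for Sandy Bridge with AVX) and the 2.60 GHz clock quoted for Raijin in the platform table later in this deck. The function name is illustrative only.

```python
def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak = cores x clock x floating-point operations per cycle."""
    return cores * clock_hz * flops_per_cycle


# Core count is from the slide; clock is from the Raijin row of the platform table.
# ASSUMPTION: 8 double-precision flops per cycle per core (Sandy Bridge with AVX).
raijin_peak = peak_flops(cores=57_492, clock_hz=2.6e9, flops_per_cycle=8)

print(f"{raijin_peak / 1e15:.2f} petaflops")  # prints "1.20 petaflops"
```

Under these assumptions the result agrees with the 1.2 petaflops quoted on the slide.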

Inside the 900 sq. m. machine room

Supports the full gamut of research
Spectrum: pure, strategic, applied, industry
• Fundamental sciences
• Mathematics, physics, chemistry, astronomy
• ARC Centres of Excellence (ARCCSS, CAASTRO, CUDOS)
• Research with an intended strategic outcome
• Environmental, medical, geoscientific
• e.g., energy (UNSW), food security (ANU), geosciences (Sydney)
• Supporting industry and innovation
• e.g., ANU/UNSW startup, Lithicon, sold for $76M to US company FEI in 2014; multinational miner
• Informing public policy; real economic impact
• Climate variation, next-gen weather forecasting, disaster management (CoE, BoM, CSIRO, GA)

Services
• Services and Technologies (~30 staff)
– Operations— robust/expert/secure (20 staff incl. 4 vendor contracted)
– HPC
• Expert user support (9)
• Largest research software library in Australia (300+ applications in all fields)
– Cloud
• High-performance: VMs, Clusters
• Secure, high-performance filesystem, integrated into NCI workflow environment
– Storage
• Active (high-performance Lustre parallel) and archival (dual copy HSM tape);
• Partner shares; Collections; Partner dedicated
• Research Engagement and Innovation (~20 staff)
– HPC and Data-Intensive Innovation
• Upscaling priority applications (e.g., Fujitsu-NCI collaboration on ACCESS),
• Bioinformatics pipelines (APN, melanoma, human genome)
– Virtual Environments
• Climate/Weather, All-sky Astrophysics, Geophysics, etc. (NeCTAR)
– Data Collections
• Management, publication, citation— strong environmental focus + other
– Visualisation
• Drishti, Voluminous, Interactive presentations

Virtual Environments and Laboratories

Moving to friction-free environments, e.g., virtual desktops

Courtesy: Geoscience Australia
Shared Science Platforms for Shared Science Services

End-to-end Data Life Cycle
NCI provides users with Data as a Service
[Diagram: data life cycle. Labels: User generates/transfers data; NCI provides fast data storage; Data Management Portal; HPC; Supercomputer users; Data Manager completes DMP and creates a catalogue; Data Curation, Publish, Citation; Paper and Data published; Data visualisation (NCI Vislab); Data sharing and re-use; Web-based real-time analytics software, Virtual Desktop Interface, Virtual Laboratory, and other services]

Earth systems & environmental science data in cloud computing
• The Climate & Weather Science Laboratory (CWSLab) is an innovation in climate data analysis enabled by NCI via NeCTAR funding
• Ideal for performing interactive analysis, code development, visualising data and publication writing
• Analogous to a local computer but with access to many petabytes of climate & weather data
• Virtual Desktop Infrastructure established with access to climate data
• Users log in to a desktop interface

Cloud Infrastructure
o NCI has been doing cloud computing since 2009
o Red Hat OpenStack Cloud (2013)
o 384 core private cloud.
o Enterprise grade.
o Typically for Virtual Laboratories.
o Uptime of 100% for past two years
o Icehouse (2014)
o Migrate nova-network to Neutron
o 56G Ethernet
o Ceph volume services added
o Scale up from 32 nodes to 100
o Kilo (2015)
o Power efficiency improvements reduce idle load from 120W to 65W
o Increased overcommit ratio

o NeCTAR Research Cloud (2013 – Public Cloud).
o IaaS and PaaS
o Foundation node of NeCTAR (Australia’s National E-research cloud)
o Intel Sandy Bridge (3200 cores with Hyper-Threading).
o Full Fat Tree 56G Ethernet (Mellanox)
o Higher initial cost but provides consistent network performance and flexibility
o 800 GB of SSDs per compute node
o 2x400 GB SSDs in RAID-0
o Access to 0.5 PB of Ceph storage on the same fabric.
o Delivering on-demand research computing

o Tenjin Partner Cloud (2013)
o Flagship Cloud for data intensive compute.
o Same hardware platform as NeCTAR Cloud
o Two zones:
o Density (Overcommit of CPUs)
o Performance (No CPU or memory overcommit)
o RDO with Neutron and CentOS 7.x.
o Architected to support both the high computational and I/O performance required for “big data” research.
o 2x400 GB SSDs per compute node in RAID-0 (800 GB per node)
o Access to ~1 PB of Ceph storage
o Access to 30 PB of Lustre storage
o SR-IOV, FFT and 56G Ethernet.
o On-demand access to GPU nodes.
o Federated with NCI HPC environment.
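The difference between the two Tenjin zones can be illustrated with a small capacity calculation. In OpenStack Nova, CPU and memory overcommit are governed by allocation ratios (`cpu_allocation_ratio`, `ram_allocation_ratio`). The ratio values and node size below are hypothetical and are not NCI's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    cpu_allocation_ratio: float  # vCPUs offered per physical core
    ram_allocation_ratio: float  # virtual RAM offered per unit of physical RAM


def schedulable(zone, physical_cores, physical_ram_gb):
    """Return (vCPUs, RAM in GB) that a scheduler could hand out on one node."""
    return (
        physical_cores * zone.cpu_allocation_ratio,
        physical_ram_gb * zone.ram_allocation_ratio,
    )


# HYPOTHETICAL ratios: "density" overcommits CPUs, "performance" overcommits nothing.
density = Zone("density", cpu_allocation_ratio=4.0, ram_allocation_ratio=1.0)
performance = Zone("performance", cpu_allocation_ratio=1.0, ram_allocation_ratio=1.0)

# HYPOTHETICAL node: 16 physical cores, 128 GB RAM.
print(schedulable(density, 16, 128))      # (64.0, 128.0)
print(schedulable(performance, 16, 128))  # (16.0, 128.0)
```

With no overcommit, each vCPU in the performance zone maps to a physical core, which is what makes its timings comparable to bare metal.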

o InfiniCloud (Experimental)
o FDR (56 Gb/s) InfiniBand Cloud
o Icehouse then Kilo – heavily modified at NCI. Based on Mellanox recipe.
o Virtual Functions
o Mellanox InfiniBand HCA is presented into Virtual Machines via SR-IOV
o InfiniBand PKey to VLAN mapping
o Near line-rate IB performance
o Once stable, Tenjin may move to native IB.
o Containers
o Docker
o Rocket?

Why a High Performance Cloud?
Job statistics on Raijin: users are really into parallel jobs
NCI’s Awesome dashboard

Why a High Performance Cloud?
o Complement NCI supercomputer offerings.
o Accelerate processing of single-node jobs
o Virtual Laboratories.
o Remote Job Submission.
o Visualisation.
o Serving Research data to the Web
o Requiring access to Global file-system at NCI.
o On-Demand GPU access.
o Workloads not best suited for Lustre.
o Local scratch is SSD on NCI Cloud compared to SATA HDD on Raijin.
o Pipelines and workloads that are not suited for supercomputer
o Packages that cannot/will not be supported.
o Proof of concepts before making a big run.
o Cloud burst
o Offloading single-node jobs to the Cloud when the supercomputer is heavily used.
o Student Courses.
o RDMA (using NeCTAR)
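The cloud-burst idea above (sending single-node jobs to the Cloud when the supercomputer is busy) can be sketched as a simple routing rule. This is only an illustration: the `Job` type, the `choose_platform` function and the 90% utilisation threshold are hypothetical and do not describe NCI's actual scheduler configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int  # number of compute nodes the job requests


def choose_platform(job, hpc_utilisation, burst_threshold=0.9):
    """Route single-node jobs to the cloud when the HPC system is heavily used.

    hpc_utilisation is a fraction between 0 and 1.
    burst_threshold is a HYPOTHETICAL cut-off, not an NCI setting.
    """
    if job.nodes == 1 and hpc_utilisation >= burst_threshold:
        return "cloud"
    return "hpc"


print(choose_platform(Job("assembly", nodes=1), hpc_utilisation=0.95))  # cloud
print(choose_platform(Job("climate", nodes=64), hpc_utilisation=0.95))  # hpc
print(choose_platform(Job("assembly", nodes=1), hpc_utilisation=0.40))  # hpc
```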

Combining computation and data
o Many research workloads utilise very large data sets
o Secure access to data in place
o Seamlessly combine resources across NCI HPC and Cloud without copying data into and out of the Cloud
o Migrate workloads transparently between domains (HPC, Cloud)
o On-demand provisioning
o Legacy and/or emerging elastic workflows
o Provide a wider range of services to NCI users
o GPU clusters
o Utilise the most appropriate and energy efficient hardware to achieve research outcomes

NCI Systems Connectivity
[Diagram: NCI systems connectivity]
o Networks: 10 GigE; /g/data 56Gb FDR IB Fabric; Raijin 56Gb FDR IB Fabric; Internet; link to Huxley DC
o Systems: Raijin Compute; Raijin Login + Data movers; NCI data movers; VMware; OpenStack Tenjin; OpenStack NeCTAR
o Massdata: Cache 1.0PB, Tape 12.3PB
o /g/data: /g/data1 ~7.4PB, /g/data2 ~6.5PB, /g/data3 ~7.3PB
o Raijin FS: /short 7.6PB; /home, /system, /images, /apps
o Ceph: NeCTAR 0.5PB, Tenjin 0.5PB

Comparing Cloud System performance
o Elements which differentiate NCI HPC and Cloud systems
o Workflows
o Communications architecture
o InfiniBand and Ethernet
o InfiniBand
o FDR 56 Gb/s and EDR 100 Gb/s
o Lossless - full fat tree
o Deterministic network latency and throughput
o Hardware offload for communication through RDMA
o Kernel and TCP/IP stack bypass
o Ethernet
o 10 Gb/s, 40 Gb/s, 56 Gb/s and 100 Gb/s
o 10G is typical for Cloud presentation
o Can be lossless or a traditional switched network
o RDMA
o Remote Direct Memory Access
o Offloads communication from operating system network stack
o Heavily used in HPC applications through various MPI libraries

Why are packet loss and latency important?
Image: ESnet
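One common way to quantify why loss and latency matter for TCP is the Mathis et al. approximation: throughput ≤ (MSS / RTT) × (C / √p), where p is the packet loss probability and C ≈ 1.22. The slide itself only shows an image, so this formula is offered as background rather than as the source of that figure. The function name and sample numbers are illustrative.

```python
import math


def mathis_throughput_mbps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate upper bound on steady-state TCP throughput, in Mbit/s."""
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_s must be > 0 and loss_rate must be in (0, 1]")
    bits_per_rtt = mss_bytes * 8 / rtt_s
    return bits_per_rtt * (c / math.sqrt(loss_rate)) / 1e6


# ILLUSTRATIVE numbers: same 0.01% loss, 1 ms (local) vs 100 ms (long-haul) RTT.
local = mathis_throughput_mbps(mss_bytes=1460, rtt_s=0.001, loss_rate=1e-4)
long_haul = mathis_throughput_mbps(mss_bytes=1460, rtt_s=0.100, loss_rate=1e-4)

print(f"local: {local:.1f} Mbit/s, long-haul: {long_haul:.1f} Mbit/s")
```

For a fixed loss rate the bound falls in direct proportion to round-trip time, which is why a small amount of loss that goes unnoticed on a local network becomes a severe limit over long paths.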

Examining container performance
o What are we measuring?
o Can traditional HPC level MPI applications run effectively within a container environment?
o How do latency and throughput compare to our baseline HPC performance?
o Comparison of MPI RDMA performance in various environments
o Native InfiniBand (Full Fat Tree)
o Ethernet and RoCE (Full Fat Tree and Switched)
o RDMA in a container
o How does it compare to Bare Metal performance?

Preliminary Results (Platform)
Cluster | Architecture | Interconnect | Loc
Raijin | Xeon(R) CPU E5-2670 @ 2.60GHz (Sandy Bridge) | Mellanox FDR InfiniBand - FFT | NCI
Tenjin | Intel Xeon E312xx @ 2.60 GHz (Sandy Bridge) | Mellanox FDR InfiniBand, flashed to 56G Ethernet - FFT | NCI
Tenjin (Container) | Intel Xeon E312xx @ 2.60 GHz (Sandy Bridge) | Mellanox FDR InfiniBand, flashed to 56G Ethernet - FFT | NCI
InfiniCloud | Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz | Mellanox FDR InfiniBand | NCI
10G-Cloud | AMD Opteron 63xx | 10G Ethernet |
o OpenMPI 1.10
o All applications were compiled with GCC using -O3. The Intel compilers were not used, to achieve a fair comparison.
o All clouds were based on OpenStack (Icehouse, Juno, Kilo).
o Preliminary results: 10 runs; the maximum and minimum results were discarded and the remainder averaged.
o Comprehensive results will be presented in a white paper.
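The averaging procedure described on this slide (10 runs, drop the maximum and minimum, average the rest) can be sketched as follows. The function name and the sample timings are hypothetical; they are not measurements from the deck.

```python
def trimmed_average(samples):
    """Average the samples after discarding one maximum and one minimum value."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples to discard max and min")
    ordered = sorted(samples)
    kept = ordered[1:-1]  # drop the smallest and the largest value
    return sum(kept) / len(kept)


# HYPOTHETICAL timings in seconds for 10 runs of one benchmark.
runs = [5.1, 5.0, 5.2, 4.9, 5.0, 5.1, 9.7, 5.0, 4.2, 5.1]
print(f"{trimmed_average(runs):.2f} s")  # the 9.7 and 4.2 outliers are ignored
```

Dropping the extremes keeps a single slow or unusually fast run from skewing the reported figure.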

Point to Point Latency

Point to Point Bandwidth
[Chart: OSU Point to Point Bandwidth (MB/sec), higher is better. X axis: message size in bytes (1 to 4M). Y axis: bandwidth (MB/sec), 0 to 7000. Series: BW-AWS-WEB, 10GbE-Cloud, Tenjin-TCP, Tenjin-Yalla, Tenjin-RoCE, Tenjin-Container, InfiniCloud-VM, InfiniCloud-HY, Raijin]
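As a rough aid to reading the bandwidth chart, the sketch below computes the payload ceiling of a link from its signalling rate and line encoding. The FDR InfiniBand (4 lanes at 14.0625 Gb/s, 64b/66b encoding) and 10 Gigabit Ethernet (10.3125 Gb/s, 64b/66b encoding) figures come from the respective standards, not from the slides, and protocol overheads above the physical layer are ignored.

```python
def payload_ceiling_mb_s(signalling_gbps, payload_bits=64, encoded_bits=66):
    """Upper bound on payload bandwidth in MB/s (1 MB = 10**6 bytes)."""
    data_gbps = signalling_gbps * payload_bits / encoded_bits
    return data_gbps * 1000 / 8  # Gbit/s -> MB/s


fdr_4x = payload_ceiling_mb_s(4 * 14.0625)  # FDR InfiniBand, 4 lanes
ten_gbe = payload_ceiling_mb_s(10.3125)     # 10 Gigabit Ethernet

print(f"FDR 4x ceiling: {fdr_4x:.0f} MB/s")  # about 6818 MB/s
print(f"10GbE ceiling:  {ten_gbe:.0f} MB/s")  # 1250 MB/s
```

The FDR ceiling sits just under the 7000 MB/sec top of the chart's vertical axis, so a curve approaching that region is close to line rate.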

Bioinformatics workload – Single compute node
Courtesy: Dr. Ching-Yeh (Leaf) Lin at NCI
Trinity is a bioinformatics de novo sequence-assembly package consisting of three programs: Inchworm (OpenMP, gcc), Chrysalis (OpenMP, gcc) and Butterfly (Java). The calculation was carried out using the procedure published by BJ Haas et al, Nature Protocols 8, 1494–1512 (2013)
[Chart: Bioinformatics Workload. Speedups compared to 10G-XXX-Cloud, 16 CPUs, one compute node (higher is better). X axis: Inchworm, Chrysalis, Butterfly. Y axis: speedup, 0 to 2. Series: Raijin, Tenjin, 10G-XXX-Cloud]

NAS Parallel Benchmarks
[Chart: Speed-up of NPB Class 'C' with 32 and 64 processes, normalized w.r.t. 32 processes on 10G Ethernet Cloud (higher is better). X axis: CG, EP, FT, IS, LU, MG. Y axis: speed-up, 0 to 10. Series: 10GbE-Cloud-32P, Tenjin-32P, Tenjin Container-32P, Raijin-32P, 10GbE-Cloud-64P, Tenjin-64P, Tenjin Container-64P, Raijin-64P]
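The normalisation used in this chart (every configuration expressed relative to the 32-process run on the 10G Ethernet cloud) amounts to dividing the baseline run time by each configuration's run time. A minimal sketch follows; the timings are hypothetical placeholders, not the measured NPB results.

```python
def normalised_speedups(times_s, baseline):
    """Speed-up of each configuration relative to the baseline run time."""
    base = times_s[baseline]
    return {name: base / t for name, t in times_s.items()}


# HYPOTHETICAL wall-clock times in seconds for one benchmark kernel.
times = {
    "10GbE-Cloud-32P": 120.0,
    "Tenjin-32P": 60.0,
    "Raijin-32P": 30.0,
}

for name, speedup in normalised_speedups(times, "10GbE-Cloud-32P").items():
    print(f"{name}: {speedup:.1f}x")
```

By construction the baseline configuration always reads 1.0, so bars above 1 are faster than the 10G Ethernet cloud at 32 processes.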

Molecular Dynamics Code - NAMD
- ApoA1, measured in seconds per time-step
- 16 CPUs per node
- Lack of NUMA
- TCP btl on cloud worked better than MXM
[Chart: NAMD Speed-up. X axis: number of CPUs (1 to 128). Y axis: speedup, 0 to 50. Series: Tenjin, Tenjin-Containers, Raijin]
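A speed-up curve like this one is derived from per-step timings, and dividing the speed-up by the CPU count gives the parallel efficiency, which makes scaling losses easier to see. The sketch below assumes timings keyed by CPU count; the numbers are hypothetical and are not the measured NAMD results.

```python
def scaling_table(seconds_per_step):
    """Map CPU count -> (speed-up, parallel efficiency) relative to 1 CPU."""
    t1 = seconds_per_step[1]
    table = {}
    for n, t in sorted(seconds_per_step.items()):
        speedup = t1 / t
        table[n] = (speedup, speedup / n)
    return table


# HYPOTHETICAL seconds per time-step at 1, 16 and 128 CPUs.
timings = {1: 8.0, 16: 0.625, 128: 0.16}

for n, (speedup, efficiency) in scaling_table(timings).items():
    print(f"{n:>3} CPUs: speed-up {speedup:5.1f}, efficiency {efficiency:.0%}")
```

An efficiency that drops sharply once a job spans more than one 16-CPU node would point at the interconnect or NUMA placement rather than the application itself.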

Scaling still an issue – NUMA
Courtesy: Dr. Benjamin Menadue
Computational Physics: Custom-written, hybrid Monte Carlo code for generating gauge fields for Lattice QCD. For each iteration, calculating the Hamiltonian involves inverting a large, complex matrix using CGNE. Written in Fortran, using pure MPI (no threading).
[Chart: Compute time (s), lower is better. X axis: number of CPUs (1 to 128). Y axis: compute time in seconds, ticks at 1.00, 10.00, 100.00, 1000.00, 10000.00. Series: RDO TCP, RDO TCP MXM, RDO OIB, RDO OIB MXM, RJ TCP, RJ TCP MXM, RJ OIB, RJ OIB MXM]
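The slide names CGNE but shows no code. As a generic illustration, the pure-Python sketch below applies conjugate gradients to the normal equations A^H A x = A^H b for a small complex matrix. Naming of the normal-equation variants differs between texts, and the form used by the code described on the slide is not stated, so the choice here is an assumption. It is written for clarity, not performance, and all function names are illustrative.

```python
def matvec(a, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def conj_transpose(a):
    """Return the conjugate transpose A^H of a dense matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def vdot(x, y):
    """Inner product <x, y> = sum(conj(x_i) * y_i)."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def cg_normal_equations(a, b, tol=1e-12, max_iter=1000):
    """Solve A x = b by running CG on A^H A x = A^H b (ASSUMED variant)."""
    ah = conj_transpose(a)
    x = [0j] * len(a[0])
    r = matvec(ah, b)  # normal-equation residual for the initial guess x = 0
    p = list(r)
    rs = vdot(r, r).real
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs / vdot(ap, ap).real  # p^H (A^H A) p equals |A p|^2
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        step = matvec(ah, ap)
        r = [ri - alpha * si for ri, si in zip(r, step)]
        rs_new = vdot(r, r).real
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Small complex test system with a known solution.
a = [[2 + 1j, 1 + 0j], [0j, 3 - 1j]]
x_true = [1 + 1j, 2 - 1j]
b = matvec(a, x_true)
x = cg_normal_equations(a, b)
print(max(abs(xi - ti) for xi, ti in zip(x, x_true)))  # small residual error
```

Each iteration needs one product with A and one with A^H; in a distributed code such as the one described on the slide, those products and the inner products are the points where MPI communication would be required.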

NCI’s commitment to HPC in the Cloud
o NCI is engaged with many partners providing Cloud based HPC and HTC solutions to researchers. These are usually released as Open Source.
o Slurm-Cluster
o Enables a researcher to quickly and easily build a cluster in the cloud backed by the Slurm scheduler. It is targeted to Tenjin and NeCTAR clouds, but should work on any OpenStack deployment.
https://github.com/NCI-Cloud/slurm-cluster
o Intel Grant for Cluster in the Cloud
o Worked with Amazon via LinkDigital
o Raijin in a Box in preproduction and to be made available to the AWS Marketplace.
o How to build a supercomputer on AWS with spot instances.
https://www.youtube.com/watch?v=KG3SKaf7yEw

NCI’s commitment to HPC in the Cloud
o Applying NCI’s depth of expertise in HPC application tuning to deliver high performance, secure computing environments in the Cloud for Australian Researchers.
o Bringing “Cloud to HPC”
o Containers
o Docker
o “Bring your own workflow” model

Conclusion
o We can support seamless high performance research workloads with large data access requirements across multiple platforms
o Parallel jobs can run on the Cloud, but is it HPC?
o Not at the moment.
o Cloud is suited to high throughput computing (HTC), ease of provisioning and specific workloads
o Traditional HPC provides the best performance for larger parallel applications with MPI requirements.
o A common underlying hardware architecture shared between our HPC and Cloud platforms provides application portability and flexibility in provisioning a system in either role.
o QPI and NUMA can have a large impact on performance
o Single Node performance is on par with bare metal (if the application is not memory bound)
o Locality Aware Scheduling (NUMA and Network awareness)
o Our benchmarks were limited by the QPI performance of Sandy Bridge.
o NCI plans to deploy bare-metal provisioning using Ironic
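To make the NUMA point concrete, the sketch below shows one way a process on Linux could be restricted to the CPUs of a single NUMA node, using the kernel's sysfs `cpulist` files and `os.sched_setaffinity`. It is an illustration only, not NCI's scheduling setup: it controls CPU placement but not memory policy (which would need a tool such as `numactl`), and the helper names are hypothetical.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(node):
    """Restrict the calling process to the CPUs of one NUMA node (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as handle:
        cpus = parse_cpulist(handle.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus


print(sorted(parse_cpulist("0-3,8,10-11")))  # [0, 1, 2, 3, 8, 10, 11]
```

Keeping a rank's threads on one socket avoids crossing the QPI link for every memory access, which is the effect the conclusion attributes the benchmark limits to.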

nci.org.au
@NCInews
Thank You
andrew.howard@anu.edu.au
matthew.sanderson@anu.edu.au
