Audience: Intermediate
About: With a mission to foster ambitious and aspirational research objectives, the National Computational Infrastructure (NCI) in Australia operates world-class computing services for a collaboration of Australian national research organisations and research-intensive universities.
As an increasing number of applications, such as bioinformatics, leverage Big Data, and more of our virtual laboratories call for elastic, self-service provisioning and research data sharing, NCI has created platforms that combine the flexibility of OpenStack Cloud provisioning with both high-speed Ethernet and high-performance InfiniBand fabrics to deliver a union of compute and I/O that challenges traditional HPC performance.
In this presentation, we will discuss the approaches we explored on the way to our current implementation, share some of our performance results, and examine the role that high-speed networks and fabrics play in enhancing NCI cloud performance and efficiency.
Speaker Bio: Andrew Howard – HPC and Cloud, National Computational Infrastructure
Andrew has many decades of hands-on technical, diplomatic and logistics experience covering a wide range of standard and bespoke technologies, languages and applications within Industry, Government and Academia nationally and internationally.
A fascination with computers and networking as a student led to Andrew pioneering implementations of networks, starting with installing the first Ethernet in Australia at Digital Equipment Corp in the early 80’s, the first fibre optic extended Ethernet in the mid 80’s, national converged DECnet, TCP/IP, SNA and X.25 networks in the late 80’s, ISDN, Frame Relay and secure networks in the early 90’s, development and operation of one of the first Internet Service Providers in Australia in the 90’s, managing the development and delivery of the next-generation Australian National Research and Education Networks GrangeNet and AARNet3, managing the development and operation of the Australian Government ICON fibre network, and setting world speed records in international networking in the early part of this century.
Since joining the Australian National University in 2006, he has managed the evaluation, development and implementation of high-speed communications systems, fibre networks and collaboration facilities. He represents the University at international research network groups including APAN, Internet2 and TNC, and has held the position of Co-Chair of a number of APAN Working Groups. As Co-Chair of the APAN E-Culture Working Group for many years, he led the production of the Dancing Q and Dancing Across Oceans performance events.
OpenStack Australia Day - Sydney 2016
https://events.aptira.com/openstack-australia-day-sydney-2016/
2. nci.org.au2
o NCI Contributors
o Dr. Muhammad Atif
o Mr. Simon Fowler
o Mr. Jakub Chrzeszczyk
o Dr. Ching-Yeh (Leaf) Lin
o Dr. Benjamin Menadue
Thanks to my colleagues
3. nci.org.au3
o NCI Overview
o Why are we interested in HPC Clouds?
o NCI Cloud past and present
o What have we done to implement an HPC Cloud?
o Containers
o MPI Performance under Docker
o Conclusion
o Questions
Agenda
4. nci.org.au4
NCI: an overview
Mission: World-class, high-end computing services for Australian research and innovation
What is NCI:
• Australia’s most highly integrated e-infrastructure environment
• Petascale supercomputer + highest performance research cloud + highest performance storage in the southern hemisphere
• Comprehensive and integrated expert service
• Nationally/internationally renowned support team
NCI is national and strategic:
• Driven by national research priorities and excellence
• Engaged with research institutions/collaborations and industry
• A capability beyond the capacity of any single institution
• Sustained by a collaboration of agencies/universities
NCI is important to Australia because it:
• Enables research that otherwise would be impossible
• Enables delivery of world-class science
• Enables interrogation of big data, otherwise impossible
• Enables high-impact research that matters; informs public policy
• Attracts and retains world-class researchers for Australia
• Catalyses development of young researchers’ skills
Diagram: NCI service layers (Research Outcomes; Communities and Institutions/Access and Services; Expertise, Support and Development; HPC Services; Virtual Laboratories/Data-intensive Services; Integration; Compute (HPC/Cloud); Storage/Network Infrastructure)
5. nci.org.au5
NCI today: comprehensive, integrated,
quality service, innovative and valued
Facts and Figures
• Supercomputer (Raijin): 1.2 petaflops (1,200,000,000,000,000 operations/sec)
– 57,492 cores, 160 Tbytes memory, 10 petabytes storage, 9 Tbit/sec backplane
– Australia’s highest sustained performance research supercomputer
• HPC Cloud: 3,200 cores, supercomputer spec. for orchestrating data services
• Global integrated storage (highest performance filesystems in Australia)
– 20 PB disk (up to 120 Gbytes/sec b/w); 40 petabytes of tape for archive purposes
• Power consumption: 1.6-2.0 megawatts
• Serves researchers at 30 universities, 5 national science agencies and 2 MRIs
• ~2,500 research users; 1,400 journal articles supported by NCI services
• Support for more than $50M of national competitive research grants annually
• One-third of Fellows elected to Australian Academy of Science (2014-15) are NCI users
Scale
• HPC and data infrastructure: $47M replacement value (NCRIS, Aust. Gov’t)
• Purpose built data centre: $24M replacement value (2012)
• Recurrent operations: $17-18M p.a. (partners: $11+M; NCRIS: $5+M)
– Co-investment: Science agencies ($6M p.a.), Universities and ARC ($5+M p.a.)
Expert, agile and secure
• 60 expert staff: operations, user support, high-performance computing and data, collections management/curation, visualisation, virtual lab development, etc.
• Driven by the goals of researchers and research institutions
• Annual IT security audits
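As a rough cross-check on the 1.2 petaflops headline, theoretical peak is cores × clock × floating-point operations per cycle. The sketch below assumes every one of the 57,492 cores runs at 2.60 GHz (the Sandy Bridge E5-2670 clock listed for Raijin later in this deck) and retires 8 double-precision FLOPs per cycle with AVX; both values are assumptions, not figures from this slide.

```python
# Back-of-envelope theoretical peak for Raijin.
# Assumptions (not stated on the slide): every core runs at 2.60 GHz and
# retires 8 double-precision FLOPs per cycle (4-wide AVX add + 4-wide AVX multiply).

CORES = 57_492
CLOCK_HZ = 2.60e9
DP_FLOPS_PER_CYCLE = 8


def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock rate x FLOPs per cycle per core."""
    return cores * clock_hz * flops_per_cycle


if __name__ == "__main__":
    peak = peak_flops(CORES, CLOCK_HZ, DP_FLOPS_PER_CYCLE)
    print(f"Theoretical peak: {peak / 1e15:.2f} petaflops")
```

Under these assumptions the result is about 1.20 petaflops, which suggests the quoted figure is a theoretical peak rather than a sustained benchmark result.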
7. nci.org.au7
Supports the full gamut of research
pure, strategic, applied, industry
• Fundamental sciences
– Mathematics, physics, chemistry, astronomy
– ARC Centres of Excellence (ARCCSS, CAASTRO, CUDOS)
• Research with an intended strategic outcome
– Environmental, medical, geoscientific
– e.g., energy (UNSW), food security (ANU), geosciences (Sydney)
• Supporting industry and innovation
– e.g., ANU/UNSW startup Lithicon, sold for $76M to US company FEI in 2014; multinational miner
• Informing public policy; real economic impact
– Climate variation, next-gen weather forecasting, disaster management (CoE, BoM, CSIRO, GA)
8. nci.org.au8
Services
• Services and Technologies (~30 staff)
– Operations: robust/expert/secure (20 staff incl. 4 vendor contracted)
– HPC
• Expert user support (9)
• Largest research software library in Australia (300+ applications in all fields)
– Cloud
• High-performance: VMs, Clusters
• Secure, high-performance filesystem, integrated into NCI workflow environment
– Storage
• Active (high-performance Lustre parallel) and archival (dual copy HSM tape)
• Partner shares; Collections; Partner dedicated
• Research Engagement and Innovation (~20 staff)
– HPC and Data-Intensive Innovation
• Upscaling priority applications (e.g., Fujitsu-NCI collaboration on ACCESS)
• Bioinformatics pipelines (APN, melanoma, human genome)
– Virtual Environments
• Climate/Weather, All-sky Astrophysics, Geophysics, etc. (NeCTAR)
– Data Collections
• Management, publication, citation: strong environmental focus + other
– Visualisation
• Drishti, Voluminous, Interactive presentations
12. nci.org.au12
NCI provides users with Data as a Service
Diagram: User generates/transfers data; NCI provides fast data storage; Data Management Portal; HPC; Data Curation, Publish, Citation; Web-based real-time analytics software, Virtual Desktop Interface, Virtual Laboratory, and other services; Data Manager completes DMP (Data Management Plan) and creates a catalogue; Supercomputer users; Paper and Data published; Data visualisation (NCI Vislab); Data sharing and re-use
End-to-end Data Life Cycle
13. nci.org.au13
• The Climate & Weather Science Laboratory (CWSLab) is an innovation in climate data analysis enabled by NCI via NeCTAR funding
• Ideal for performing interactive analysis, code development, visualising data and publication writing
• Analogous to a local computer, but with access to many petabytes of climate & weather data
• Virtual Desktop Infrastructure established with access to climate data
• Users log in to a desktop interface
Earth systems & environmental science data in cloud computing
14. nci.org.au
Cloud Infrastructure
o NCI has been providing cloud computing since 2009
o Red Hat OpenStack Cloud (2013)
o 384-core private cloud
o Enterprise grade
o Typically for Virtual Laboratories
o Uptime of 100% for the past two years
o Icehouse (2014)
o Migrated from nova-network to Neutron
o 56G Ethernet
o Ceph volume services added
o Scaled up from 32 to 100 nodes
o Kilo (2015)
o Power efficiency improvements reduced idle load from 120W to 65W
o Increased overcommit ratio
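In OpenStack Nova of this era, CPU overcommit is set by the `cpu_allocation_ratio` option in nova.conf, which the scheduler's CoreFilter applies when placing instances. The function below is a simplified model of that check, not Nova's own code; the 32-thread host and the ratios in the example are illustrative, since the slide does not state NCI's values.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical view of one hypervisor as the scheduler sees it."""
    physical_cpus: int  # CPUs (hardware threads) reported by the hypervisor
    vcpus_used: int     # vCPUs already allocated to instances on this host


def host_accepts(host: Host, requested_vcpus: int, cpu_allocation_ratio: float) -> bool:
    """Simplified CoreFilter-style check: may this instance be placed here?"""
    # A single instance may not have more vCPUs than the host has CPUs.
    if requested_vcpus > host.physical_cpus:
        return False
    # Total allocated vCPUs may not exceed physical CPUs x overcommit ratio.
    limit = host.physical_cpus * cpu_allocation_ratio
    return host.vcpus_used + requested_vcpus <= limit


if __name__ == "__main__":
    node = Host(physical_cpus=32, vcpus_used=30)
    print(host_accepts(node, 4, cpu_allocation_ratio=1.0))  # False: 34 > 32 x 1.0
    print(host_accepts(node, 4, cpu_allocation_ratio=2.0))  # True: 34 <= 32 x 2.0
```

A ratio of 1.0 models a no-overcommit zone such as Tenjin's Performance zone; raising the ratio packs more instances per node at the cost of CPU contention.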
15. nci.org.au
o NeCTAR Research Cloud (2013 – Public Cloud)
o IaaS and PaaS
o Foundation node of NeCTAR (Australia’s national eResearch cloud)
o Intel Sandy Bridge (3,200 cores with Hyper-Threading)
o Full Fat Tree 56G Ethernet (Mellanox)
o Higher initial cost, but provides consistent network performance and flexibility
o 800 GB of SSD per compute node
o 2x400 GB SSDs in RAID-0
o Access to 0.5 PB of Ceph storage on the same fabric
o Delivering on-demand research computing
Cloud Infrastructure
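RAID-0 across the two 400 GB SSDs gives 800 GB of local scratch and spreads I/O over both devices, at the cost of no redundancy: losing either SSD loses the array. The sketch below shows the standard striping arithmetic that maps a logical block to a disk and an offset; the 128-block chunk size is an assumed example, since the slides do not give NCI's RAID settings.

```python
from typing import Tuple


def raid0_locate(logical_block: int, n_disks: int, chunk_blocks: int) -> Tuple[int, int]:
    """Map a logical block of a RAID-0 array to (disk index, block offset on that disk).

    Consecutive chunks of `chunk_blocks` blocks are dealt round-robin across the disks.
    """
    chunk, within_chunk = divmod(logical_block, chunk_blocks)
    stripe_row, disk = divmod(chunk, n_disks)
    return disk, stripe_row * chunk_blocks + within_chunk


def raid0_capacity_gb(n_disks: int, disk_gb: int) -> int:
    """RAID-0 keeps no parity or mirror, so usable capacity is the sum of the disks."""
    return n_disks * disk_gb


if __name__ == "__main__":
    # Two SSDs with an assumed 128-block chunk size.
    for block in (0, 127, 128, 256, 300):
        print(block, raid0_locate(block, n_disks=2, chunk_blocks=128))
    print(raid0_capacity_gb(2, 400), "GB")
```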
16. nci.org.au16
o Tenjin Partner Cloud (2013)
o Flagship Cloud for data-intensive compute
o Same hardware platform as NeCTAR Cloud
o Two zones:
o Density (overcommit of CPUs)
o Performance (no CPU or memory overcommit)
o RDO with Neutron and CentOS 7.x
o Architected to support both the high computational and I/O performance required for “big data” research
o 2x400 GB SSDs per compute node in RAID-0 (800 GB per node)
o Access to ~1 PB of Ceph storage
o Access to 30 PB of Lustre storage
o SR-IOV, full fat tree (FFT) and 56G Ethernet
o On-demand access to GPU nodes
o Federated with NCI HPC environment
Cloud Infrastructure
17. nci.org.au17
o InfiniCloud (Experimental)
o FDR (56 Gb/s) InfiniBand Cloud
o Icehouse, then Kilo: heavily modified at NCI, based on a Mellanox recipe
o Virtual Functions
o Mellanox InfiniBand HCA is presented to Virtual Machines via SR-IOV
o InfiniBand PKey to VLAN mapping
o Near line-rate IB performance
o Once stable, Tenjin may move to native IB
o Containers
o Docker
o Rocket?
Cloud Infrastructure
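An InfiniBand partition key (PKey) is a 16-bit value: the low 15 bits identify the partition and the top bit marks full (1) or limited (0) membership. Giving each tenant VLAN its own PKey gives VMs on the IB fabric isolation analogous to VLANs on Ethernet. The direct VLAN-ID-to-partition mapping below is a hypothetical illustration; the Mellanox recipe NCI adapted may allocate PKeys differently.

```python
FULL_MEMBER_BIT = 0x8000  # bit 15: 1 = full member, 0 = limited member
PARTITION_MASK = 0x7FFF   # bits 0-14: partition number


def vlan_to_pkey(vlan_id: int, full_member: bool = True) -> int:
    """Hypothetical 1:1 mapping from a tenant VLAN ID to an InfiniBand PKey."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("usable VLAN IDs are 1-4094")
    partition = vlan_id & PARTITION_MASK
    # Set the membership bit only for full members.
    return (partition | FULL_MEMBER_BIT) if full_member else partition


if __name__ == "__main__":
    print(hex(vlan_to_pkey(100)))                     # 0x8064
    print(hex(vlan_to_pkey(100, full_member=False)))  # 0x64
```

Because VLAN IDs stop at 4094 (0xFFE), this scheme never collides with the default partition 0x7FFF.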
18. nci.org.au18
Job statistics on Raijin: users are really into parallel jobs
NCI’s Awesome dashboard
Why a High Performance Cloud?
19. nci.org.au19
o Complement NCI supercomputer offerings.
o Accelerate processing of single-node jobs
o Virtual Laboratories.
o Remote Job Submission.
o Visualisation.
o Serving Research data to the Web
o Requiring access to Global file-system at NCI.
o On-Demand GPU access.
o Workloads not best suited for Lustre.
o Local scratch is SSD on NCI Cloud compared to SATA HDD on Raijin.
o Pipelines and workloads that are not suited for supercomputer
o Packages that cannot/will not be supported.
o Proof of concepts before making a big run.
o Cloud burst
o Offloading single-node jobs to the Cloud when the supercomputer is heavily used
o Student Courses.
o RDMA (using NeCTAR)
Why a High Performance Cloud?
20. nci.org.au20
o Many research workloads utilise very large data sets
o Secure access to data in place
o Seamlessly combine resources across NCI HPC and Cloud without copying data into and out of the Cloud
o Migrate workloads transparently between domains (HPC, Cloud)
o On-demand provisioning
o Legacy and/or emerging elastic workflows
o Provide a wider range of services to NCI users
o GPU clusters
o Utilise the most appropriate and energy-efficient hardware to achieve research outcomes
Combining computation and data
22. nci.org.au22
o Elements which differentiate NCI HPC and Cloud systems
o Workflows
o Communications architecture
o InfiniBand and Ethernet
o InfiniBand
o FDR 56 Gb/s and EDR 100 Gb/s
o Lossless - full fat tree
o Deterministic network latency and throughput
o Hardware offload for communication through RDMA
o Kernel and TCP/IP stack bypass
o Ethernet
o 10Gbps, 40Gbps, 56Gbps and 100Gbps
o 10G is typical for Cloud presentation
o Can be lossless or a traditional switched network
o RDMA
o Remote Direct Memory Access
o Offloads communication from the operating system network stack
o Heavily used in HPC applications through various MPI libraries
Comparing Cloud System performance
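A full (non-blocking) fat tree gives every leaf switch as much uplink bandwidth as host-facing bandwidth, which is what keeps latency and throughput predictable wherever two nodes sit. The sketch below computes how many hosts such a tree can serve when built from identical switches; the 36-port radix is an assumption typical of FDR-generation switch silicon, not a figure from the slides.

```python
def fat_tree_max_hosts(radix: int, levels: int) -> int:
    """Maximum hosts in a non-blocking fat tree of identical radix-port switches.

    Every switch below the top level splits its ports evenly between down-links
    and up-links; top-level switches use all ports as down-links. This gives
    2 * (radix / 2) ** levels hosts (radix for 1 level, radix**2 / 2 for 2, ...).
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    if levels < 1:
        raise ValueError("need at least one level of switches")
    return 2 * (radix // 2) ** levels


if __name__ == "__main__":
    for levels in (1, 2, 3):
        print(levels, fat_tree_max_hosts(36, levels))  # 36, 648, 11664 hosts
```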
24. nci.org.au24
o What are we measuring?
o Can traditional HPC-level MPI applications run effectively within a container environment?
o How do latency and throughput compare to our baseline HPC performance?
o Comparison of MPI RDMA performance in various environments
o Native InfiniBand (Full Fat Tree)
o Ethernet and RoCE (Full Fat Tree and Switched)
o RDMA in a container
o How does it compare to bare-metal performance?
Examining container performance
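Latency and throughput from MPI ping-pong tests are often summarised with the Hockney model, t(n) = α + n/β, where α is the zero-byte latency and β the asymptotic bandwidth. The slides do not say which model or benchmark NCI used; the least-squares fit below is a generic sketch, and the timings in the example are synthetic, not measured NCI results.

```python
from typing import Sequence, Tuple


def fit_hockney(sizes_bytes: Sequence[float], times_s: Sequence[float]) -> Tuple[float, float]:
    """Fit t(n) = alpha + n / beta by ordinary least squares.

    Returns (alpha in seconds, beta in bytes per second).
    """
    if len(sizes_bytes) != len(times_s) or len(sizes_bytes) < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    n = len(sizes_bytes)
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(times_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_bytes, times_s))
    slope = sxy / sxx            # seconds per byte, i.e. 1 / beta
    alpha = mean_y - slope * mean_x
    return alpha, 1.0 / slope


if __name__ == "__main__":
    # Synthetic one-way times for a link with 1.5 us latency and 6 GB/s bandwidth.
    sizes = [8, 1024, 65536, 1048576, 4194304]
    times = [1.5e-6 + s / 6e9 for s in sizes]
    alpha, beta = fit_hockney(sizes, times)
    print(f"latency ~ {alpha * 1e6:.2f} us, bandwidth ~ {beta / 1e9:.2f} GB/s")
```

Real MPI curves often have more than one linear regime (for example, where a library switches from eager to rendezvous transfers), so small and large messages are usually fitted separately.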
25. nci.org.au25
Cluster | Architecture | Interconnect | Location
Raijin | Intel Xeon E5-2670 @ 2.60 GHz (Sandy Bridge) | Mellanox FDR InfiniBand, FFT | NCI
Tenjin | Intel Xeon E312xx @ 2.60 GHz (Sandy Bridge) | Mellanox FDR InfiniBand, flashed to 56G Ethernet, FFT | NCI
Tenjin (Container) | Intel Xeon E312xx @ 2.60 GHz (Sandy Bridge) | Mellanox FDR InfiniBand, flashed to 56G Ethernet, FFT | NCI
InfiniCloud | Intel Xeon E5-2650 @ 2.00 GHz | Mellanox FDR InfiniBand | NCI
10G-Cloud | AMD Opteron 63xx | 10G Ethernet | -
o Open MPI 1.10
o All applications compiled with GCC using -O3. The Intel compilers were not used, to achieve a fair comparison.
o All clouds were based on OpenStack (Icehouse, Juno, Kilo)
o Preliminary results: 10 runs; the maximum and minimum results were discarded and the remaining runs averaged
o Comprehensive results will be presented in a white paper.
Preliminary Results (Platform)
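The averaging rule above (10 runs, discard the maximum and minimum, average the rest) is a simple trimmed mean. A minimal sketch; the run times in the example are made up.

```python
from statistics import mean
from typing import Sequence


def trimmed_average(samples: Sequence[float]) -> float:
    """Discard one maximum and one minimum sample, then average the remainder."""
    if len(samples) < 3:
        raise ValueError("need at least three runs to discard both extremes")
    ordered = sorted(samples)
    return mean(ordered[1:-1])


if __name__ == "__main__":
    # Ten hypothetical wall-clock times in seconds; 14.9 is an outlier run.
    runs = [10.2, 10.4, 10.1, 10.3, 14.9, 10.2, 10.0, 10.3, 10.2, 9.8]
    print(f"{trimmed_average(runs):.3f} s")
```

Dropping the extremes keeps a single slow run (for example, one disturbed by a noisy neighbour on a shared cloud host) from skewing the average.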
28. nci.org.au
Courtesy: Dr. Ching-Yeh (Leaf) Lin at NCI
Trinity is a bioinformatics de novo sequence-assembly package that consists of three programs: Inchworm (OpenMP, GCC), Chrysalis (OpenMP, GCC) and Butterfly (Java). The calculation was carried out using the procedure published by B. J. Haas et al., Nature Protocols 8, 1494–1512 (2013).
Bioinformatics Workload
Chart: speed-ups of Inchworm, Chrysalis and Butterfly on Raijin, Tenjin and 10G-XXX-Cloud, relative to 10G-XXX-Cloud; 16 CPUs on one compute node (higher is better)
Bioinformatics workload – Single compute node
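The chart normalises each platform against the 10G cloud: speed-up = baseline time / platform time, so values above 1 mean faster than the baseline. A minimal sketch of that normalisation; the timings are invented for illustration and are not the Trinity results.

```python
from typing import Dict


def speedups(times_s: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Speed-up of each platform relative to `baseline` (t_baseline / t_platform)."""
    if baseline not in times_s:
        raise KeyError(f"baseline {baseline!r} has no timing")
    ref = times_s[baseline]
    return {name: ref / t for name, t in times_s.items()}


if __name__ == "__main__":
    # Hypothetical wall-clock times (seconds) for one pipeline stage.
    stage_times = {"10G-Cloud": 120.0, "Tenjin": 80.0, "Raijin": 60.0}
    for platform, s in speedups(stage_times, "10G-Cloud").items():
        print(f"{platform}: {s:.2f}x")
```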
29. nci.org.au29
Chart: speed-up of NPB Class 'C' kernels CG, EP, FT, IS, LU and MG with 32 and 64 processes on 10GbE-Cloud, Tenjin, Tenjin Container and Raijin, normalised to 32 processes on the 10G Ethernet cloud (higher is better)
NAS Parallel Benchmarks
30. nci.org.au30
- ApoA1, measured in s/time-step
- 16 CPUs per node
- Lack of NUMA
- TCP BTL on cloud worked better than MXM
Chart: NAMD speed-up versus number of CPUs (1, 2, 4, 8, 16, 32, 64, 128) for Tenjin, Tenjin-Containers and Raijin
Molecular Dynamics Code - NAMD
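Speed-up curves like the NAMD one are easier to compare across platforms as parallel efficiency, E(p) = S(p)/p with S(p) = T(1)/T(p); E = 1 means perfect linear scaling. The numbers in the example are hypothetical, not read from the chart.

```python
def speedup(t1: float, tp: float) -> float:
    """S(p) = T(1) / T(p): how much faster p CPUs are than one."""
    return t1 / tp


def parallel_efficiency(t1: float, tp: float, p: int) -> float:
    """E(p) = S(p) / p; 1.0 is ideal linear scaling."""
    if p < 1:
        raise ValueError("p must be a positive CPU count")
    return speedup(t1, tp) / p


if __name__ == "__main__":
    # Hypothetical seconds per time step at 1 and 128 CPUs.
    t_serial, t_128 = 2.56, 0.0512
    print(f"speed-up {speedup(t_serial, t_128):.1f}, "
          f"efficiency {parallel_efficiency(t_serial, t_128, 128):.2f}")
```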
31. nci.org.au
Chart: compute time in seconds (log scale, lower is better) versus number of CPUs (1 to 128) for RDO TCP, RDO TCP MXM, RDO OIB, RDO OIB MXM, RJ TCP, RJ TCP MXM, RJ OIB and RJ OIB MXM
Courtesy: Dr. Benjamin Menadue
Computational Physics: custom-written hybrid Monte Carlo code to generate gauge fields for Lattice QCD. For each iteration, calculating the Hamiltonian involves inverting a large, complex matrix using CGNE. Written in Fortran, using pure MPI (no threading).
Scaling still an issue – NUMA
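The slide names CGNE but not the operator, so the following is a generic textbook sketch of CGNE (Craig's method: conjugate gradient on A A^H y = b with x = A^H y) for a small dense complex matrix, not Dr. Menadue's Fortran code. A Lattice QCD code typically applies a sparse Dirac operator matrix-free and distributes the vectors over MPI ranks, so every inner product below becomes a global reduction such as MPI_Allreduce, one reason these solvers are sensitive to interconnect latency.

```python
from typing import List, Sequence

Vector = List[complex]
Matrix = Sequence[Sequence[complex]]


def dot(u: Sequence[complex], v: Sequence[complex]) -> complex:
    """Hermitian inner product (u, v) = sum(conj(u_i) * v_i)."""
    return sum((a.conjugate() * b for a, b in zip(u, v)), 0j)


def matvec(a: Matrix, x: Sequence[complex]) -> Vector:
    """y = A x."""
    return [sum((aij * xj for aij, xj in zip(row, x)), 0j) for row in a]


def matvec_h(a: Matrix, x: Sequence[complex]) -> Vector:
    """y = A^H x (conjugate transpose) without forming A^H."""
    return [sum((a[i][j].conjugate() * x[i] for i in range(len(a))), 0j)
            for j in range(len(a[0]))]


def cgne(a: Matrix, b: Sequence[complex], tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """CGNE (Craig's method): CG on A A^H y = b, accumulating x = A^H y directly."""
    x: Vector = [0j] * len(a[0])
    r: Vector = list(b)            # r0 = b - A x0, with x0 = 0
    p = matvec_h(a, r)             # p0 = A^H r0
    rr = dot(r, r).real
    stop = tol * dot(b, b).real ** 0.5
    for _ in range(max_iter):
        if rr ** 0.5 <= stop:      # relative residual small enough
            break
        alpha = rr / dot(p, p).real
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        ap = matvec(a, p)
        r = [rk - alpha * apk for rk, apk in zip(r, ap)]
        rr_new = dot(r, r).real
        beta = rr_new / rr
        p = [hk + beta * pk for hk, pk in zip(matvec_h(a, r), p)]
        rr = rr_new
    return x


if __name__ == "__main__":
    A = [[4 + 1j, 1, 0], [1, 3 - 2j, 1j], [0, 2, 5]]
    x_true = [1, 2 - 1j, 0.5j]
    x = cgne(A, matvec(A, x_true))
    print(max(abs(xi - ti) for xi, ti in zip(x, x_true)))
```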
32. nci.org.au32
NCI’s commitment to HPC in the Cloud
o NCI is engaged with many partners providing Cloud-based HPC and HTC solutions to researchers. These are usually released as Open Source.
o Slurm-Cluster
o Enables a researcher to quickly and easily build a cluster in the cloud backed by the Slurm scheduler. It is targeted at the Tenjin and NeCTAR clouds, but should work on any OpenStack deployment.
https://github.com/NCI-Cloud/slurm-cluster
o Intel Grant for Cluster in the Cloud
o Worked with Amazon via LinkDigital
o Raijin in a Box is in pre-production and will be made available on the AWS Marketplace.
o How to build a supercomputer on AWS with spot instances.
https://www.youtube.com/watch?v=KG3SKaf7yEw
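Whatever tool creates the instances, a Slurm cluster in the cloud ultimately needs slurm.conf entries describing the new nodes. The helper below is hypothetical and is not taken from the NCI-Cloud/slurm-cluster repository; the hostnames, CPU count and memory are placeholders, while NodeName, CPUs, RealMemory and PartitionName are standard slurm.conf keywords.

```python
from typing import Sequence


def slurm_conf_fragment(hostnames: Sequence[str], cpus: int, real_memory_mb: int,
                        partition: str = "cloud") -> str:
    """Render NodeName and PartitionName lines for a set of cloud instances.

    Hypothetical helper: the values are placeholders for whatever the
    OpenStack flavour actually provides.
    """
    if not hostnames:
        raise ValueError("need at least one compute instance")
    lines = [
        f"NodeName={host} CPUs={cpus} RealMemory={real_memory_mb} State=UNKNOWN"
        for host in hostnames
    ]
    lines.append(
        f"PartitionName={partition} Nodes={','.join(hostnames)} "
        "Default=YES MaxTime=INFINITE State=UP"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(slurm_conf_fragment(["vm-01", "vm-02"], cpus=16, real_memory_mb=60000))
```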
33. nci.org.au33
NCI’s commitment to HPC in the Cloud
o Applying NCI’s depth of expertise in HPC application tuning to deliver high-performance, secure computing environments in the Cloud for Australian researchers.
o Bringing “Cloud to HPC”
o Containers
o Docker
o “Bring your own workflow” model
34. nci.org.au
o We can support seamless high-performance research workloads with large data-access requirements across multiple platforms
o Parallel jobs can run on the Cloud, but is it HPC?
o Not at the moment.
o Cloud is suited to high-throughput computing (HTC), ease of provisioning and specific workloads
o Traditional HPC provides the best performance for larger parallel applications with MPI requirements.
o A common underlying hardware architecture shared between our HPC and Cloud platforms provides application portability and flexibility in provisioning a system in either role.
o QPI and NUMA can have a large impact on performance
o Single-node performance is on par with bare metal (if the application is not memory-bound)
o Locality-aware scheduling (NUMA and network awareness)
o Our benchmarks were limited by the QPI performance of Sandy Bridge.
o NCI plans to deploy bare-metal provisioning using Ironic
Conclusion
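Locality-aware placement matters because traffic between sockets crosses QPI. The toy model below counts, for a 1-D nearest-neighbour exchange on one 16-core, two-socket node, how many neighbouring rank pairs land on different sockets under block versus round-robin placement. It is a hypothetical illustration, not NCI's scheduler; on bare metal, Open MPI's `--map-by` and `--bind-to` options provide this kind of control, whereas a VM that does not expose its NUMA topology gives the MPI library little to map against.

```python
from typing import List


def socket_map(n_ranks: int, n_sockets: int, policy: str) -> List[int]:
    """Socket each MPI rank lands on within one node, for two simple policies."""
    if n_ranks % n_sockets:
        raise ValueError("ranks must divide evenly across sockets")
    if policy == "block":        # fill socket 0, then socket 1, ...
        per_socket = n_ranks // n_sockets
        return [rank // per_socket for rank in range(n_ranks)]
    if policy == "round_robin":  # alternate sockets rank by rank
        return [rank % n_sockets for rank in range(n_ranks)]
    raise ValueError(f"unknown policy {policy!r}")


def cross_socket_neighbours(sockets: List[int]) -> int:
    """Count rank pairs (i, i + 1) whose halo exchange must cross the socket link."""
    return sum(1 for a, b in zip(sockets, sockets[1:]) if a != b)


if __name__ == "__main__":
    for policy in ("block", "round_robin"):
        print(policy, cross_socket_neighbours(socket_map(16, 2, policy)))
```

Block placement crosses QPI once (between ranks 7 and 8); round-robin crosses it for all 15 neighbour pairs.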