Huawei outlines requirements for developing a competitive ARM-based HPC solution. It plans a two-phase strategy: existing Hi1616 platforms first, followed by more powerful Hi1620 platforms. Requirements include high-performance CPUs, an optimized software stack, support for applications and ISVs, and cloud deployment. Huawei aims to demonstrate ARM's value in HPC in the 2018-2020 window through partnerships and turnkey solutions.
1. Huawei’s requirements for ARM based HPC solution readiness
Joshua.Mora@Huawei.com
Chief Architect, Microprocessors and Applications for HPC and BigData
R&D IT Product Line.
Futurewei, Santa Clara, USA
Arm Architecture HPC Workshop by Linaro and HiSilicon, 7/26/2018
2. Objectives of the presentation
• A high-level review of a wide range of requirements for architecting a competitive ARM based HPC solution.
• The review combines industry views and Huawei’s own views, with the intent to:
  • openly communicate our alignment with, and support for, ongoing efforts carried out by other key ARM players;
  • brief on the areas of differentiation in which Huawei is investing for the research, development and deployment of homegrown ARM based HPC solution(s).
3. Market opportunities and timelines
• ARM, its partners, and HW and SW vendors are creating a set of competitive products that customers are evaluating and investing in, with visibility in 2018-2020.
• ARM based HPC initiatives and business cases, currently led by customers in research institutions, are a clear server-market reaction to the stagnation of x86 based solutions over the past ~4 years. The result is competitive performance from ARM cores and SoCs, together with the growth and maturity of the core SW, with the help of key entities such as Linaro and the ARM vendors.
• At Huawei, we believe that 2018-2020 is a crucial window of opportunity to demonstrate the value of ARM based solutions, in the HPC space among others.
4. Execution model
• Our execution model will allow Huawei to participate in the aforementioned window of opportunity.
• The strategy for developing ARM based HPC solutions has 2 phases:
  • Phase 1 (development ready): a variety of Hi1616 based platforms (reliable and performant, roughly at Broadwell level) have been available to enable partners to build both the HW and SW ecosystems (the core components of the HPC solution), including applications.
  • Phase 2 (business ready): a similar number of Hi1620 based platforms (with competitive performance against currently available x86 CPUs) will soon become available, allowing a “smooth/quick” update/transition from phase 1.
5. Very high level requirements
• Ultimate objective: turnkey HPC solutions.
• Define/architect from the HW perspective:
  • compute tier, ARM based, with and without accelerators
  • storage tier, ARM based, with and without accelerators
  • networking, with support for IB and RoCE and “smart” capabilities
• Define/architect from the SW perspective:
  • BIOS/FW, platform specific
  • OS tuning (incl. drivers and system libraries) and certification, platform agnostic
  • HPC SW stack, optimized and certified for specific platforms
  • applications, optimized and certified for specific platforms
• Deployment models: on premise and cloud
6. Speeding up ARM based HPC solution adoption
• Focus development all the way up to turnkey solutions (such as cluster management, containers and applications), not just on core components (such as drivers, OS, MPI, compilers and math libraries).
• Invest in deploying ARM based HPC solutions as a service in the cloud: customers should not need to know which architecture is delivering the HPC service (i.e. HPC application execution must meet performance targets affordably).
• This effort requires alignment among ARM, HW vendors, SW vendors and cloud providers, and clear communication to customers through a variety of events such as this one.
• It cannot easily be driven by a single ARM based vendor alone.
• Huawei therefore acknowledges and supports these activities, which will pave the road for ARM based business.
7. HW requirements for ARM based HPC solution
• CPU
  • Race for memory bandwidth: a window of opportunity with 8 memory channels per CPU and memory frequencies up to 3200MHz, leading to a 2P system memory bandwidth >300GB/s (measured).
  • Large core count with competitive performance, ~64 cores/CPU (without SMT/HT), at high core frequencies of up to 3GHz.
  • Vector instructions wider than 128 bits.
  • Low local and remote random memory access latency: <90nsec and <200nsec, respectively.
  • Efficient hardware prefetchers to reach high single-core bandwidth (>20GB/s), so that a few cores in a NUMA node can saturate the memory controller bandwidth (good for core-licensed applications).
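The bandwidth and prefetching figures above can be sanity-checked with a short Python sketch. It assumes DDR4-class channels with an 8-byte (64-bit) data bus, reads “3200MHz” as 3200 MT/s, and assumes 64-byte cache lines; the function names are hypothetical. Peak 2P bandwidth is 8 channels × 3200 MT/s × 8 B × 2 sockets = 409.6 GB/s, so >300GB/s measured is above ~73% of peak. Little's law (bytes in flight = bandwidth × latency) shows why prefetchers matter: sustaining 20GB/s from one core at ~90ns latency needs 1800 bytes, i.e. 29 cache lines, outstanding at all times, a level of memory-level parallelism that likely depends on hardware prefetching.

```python
import math


def peak_dram_bandwidth_gbs(channels: int, mega_transfers_s: float,
                            bus_bytes: int = 8, sockets: int = 1) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    MT/s x bytes per transfer = MB/s; dividing by 1000 gives GB/s.
    Assumption: an 8-byte (64-bit) data bus per channel, as on DDR4 DIMMs.
    """
    return channels * mega_transfers_s * bus_bytes * sockets / 1000


def lines_in_flight(bandwidth_gbs: float, latency_ns: float,
                    line_bytes: int = 64) -> int:
    """Little's law: outstanding cache lines needed to sustain a bandwidth.

    bytes in flight = bandwidth x latency, and 1 GB/s x 1 ns = 1 byte.
    Assumption: 64-byte cache lines.
    """
    bytes_in_flight = bandwidth_gbs * latency_ns
    return math.ceil(bytes_in_flight / line_bytes)


peak_2p = peak_dram_bandwidth_gbs(channels=8, mega_transfers_s=3200, sockets=2)
efficiency = 300.0 / peak_2p  # ">300GB/s (measured)" taken as a lower bound
per_core_lines = lines_in_flight(bandwidth_gbs=20.0, latency_ns=90.0)

print(f"2P peak: {peak_2p:.1f} GB/s; measured/peak >= {efficiency:.0%}")
# 2P peak: 409.6 GB/s; measured/peak >= 73%
print(f"64-byte lines in flight for 20 GB/s at 90 ns: {per_core_lines}")
# 64-byte lines in flight for 20 GB/s at 90 ns: 29
```

The same helpers can be rerun with other channel counts or latencies (e.g. the <200nsec remote figure) to see how much more concurrency remote accesses would require.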
8. SW requirements for ARM based HPC solution
• Ongoing development efforts in 2 complementary solutions:
  • Fully open source: “community based support / you are mostly on your own”
  • Fully commercial: “we support you everywhere”
[Diagram: open source solution vs. commercial solution, positioned by value added (basic performance, support, optimizations) and cost/revenue/margin.]
9. SW requirements for ARM based HPC solution
• For either solution, the SW stack looks as follows:
  • BIOS, OS, drivers
  • Cluster management for monitoring, provisioning and scheduling
  • Containers for application deployment
  • Development tools (compiler, profiler, debugger)
  • Libraries (math, MPI)
  • Applications (different verticals)
  • Parallel file system
• The open source effort is built around OpenHPC (an activity reviewed with Linaro), and the application focus is driven by current business opportunities in areas such as CFD, weather, bioinformatics and astrophysics.
10. SW requirements for ARM based HPC solution
• Partnerships with ISVs:
  • ISVs are fundamental to the healthy growth of the ARM business if we are pursuing turnkey solutions.
  • While our final objective is to deliver good performance on our platform, we encourage ISVs to reach out to the other ARM vendors in order to grow the portfolio of ARM based solutions available to customers in 2018-2020.
  • We follow the 2-phase execution model with the ISVs.
  • Reseller agreements facilitate the adoption of high-quality software stacks optimized for ARM.
  • We intend to deploy turnkey solutions with those ISVs both on premise and in the cloud to speed up adoption.
11. SW requirements for ARM based HPC solution
• Math libraries supporting hybrid MPI + OpenMP for multi-chip-module SoCs, with low communication/synchronization overheads within a node.
• Optimized multithreaded libraries based on task scheduling of a DAG (Directed Acyclic Graph) to leverage the high-core-count CPU.
• Opportunities to reduce bandwidth requirements and make the libraries more scalable on large-core-count architectures.
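DAG-based task scheduling can be illustrated with a minimal Python sketch, assuming tasks are plain callables and dependencies are given as (before, after) pairs; `dag_levels` and `run_dag` are hypothetical names, not part of any library mentioned in the slides. Tasks are grouped into levels with Kahn's topological sort, and each level runs concurrently. This level-synchronous variant puts a barrier between levels, which is simpler than (and not identical to) a fully dynamic DAG scheduler such as OpenMP tasks with `depend` clauses, where a task starts as soon as its own inputs are ready.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def dag_levels(num_tasks, edges):
    """Group tasks 0..num_tasks-1 into levels (wavefronts) with Kahn's algorithm.

    edges: (before, after) pairs meaning `after` depends on `before`.
    Tasks in the same level have no dependencies on each other.
    """
    indegree = [0] * num_tasks
    successors = defaultdict(list)
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1
    level = [t for t in range(num_tasks) if indegree[t] == 0]
    levels = []
    while level:
        levels.append(level)
        ready = []
        for t in level:
            for s in successors[t]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    ready.append(s)
        level = sorted(ready)
    if sum(len(lv) for lv in levels) != num_tasks:
        raise ValueError("dependency graph has a cycle")
    return levels


def run_dag(tasks, edges, workers=4):
    """Run callables level by level; tasks inside one level run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for level in dag_levels(len(tasks), edges):
            list(pool.map(lambda t: tasks[t](), level))  # implicit barrier


# Diamond DAG: 0 -> {1, 2} -> 3
print(dag_levels(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [[0], [1, 2], [3]]
```

In a high-core-count setting the wider a level is, the more cores it can keep busy; the barrier between levels is the synchronization overhead a dynamic scheduler tries to avoid.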
12. SW requirements for ARM based HPC solution
HPCG: Leveraging a DAG for efficient OpenMP execution of the Gauss-Seidel algorithm
13. SW requirements for ARM based HPC solution
HPCG: Leveraging a DAG for efficient OpenMP execution of the Gauss-Seidel algorithm
14. SW requirements for ARM based HPC solution
HPCG: Leveraging a DAG for efficient OpenMP execution of the Gauss-Seidel algorithm
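A minimal sketch of how a DAG exposes parallelism in a forward Gauss-Seidel sweep, assuming a 2D 5-point Laplacian stored as rows of (column, value) pairs instead of HPCG's 3D 27-point stencil; all function names are hypothetical and this is not the implementation behind the slides. Row r depends on every neighbour c < r, so rows are grouped into levels (for this grid, the anti-diagonals). Rows within a level can be updated in any order, or in parallel, and still give exactly the same result as the sequential sweep. Python threads only illustrate the schedule here; CPython's GIL prevents a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def laplacian_2d(n):
    """5-point Laplacian on an n x n grid (natural ordering), as rows of (col, value)."""
    rows = []
    for i in range(n):
        for j in range(n):
            entries = [(i * n + j, 4.0)]
            for di, dj in ((-1, 0), (0, -1), (0, 1), (1, 0)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    entries.append((ii * n + jj, -1.0))
            rows.append(entries)
    return rows


def gs_row_update(rows, b, x, r):
    """x[r] = (b[r] - sum_{c != r} a_rc * x[c]) / a_rr, using the current x."""
    s, diag = b[r], 0.0
    for c, a in rows[r]:
        if c == r:
            diag = a
        else:
            s -= a * x[c]
    x[r] = s / diag


def forward_gs_sequential(rows, b, x):
    for r in range(len(rows)):
        gs_row_update(rows, b, x, r)


def row_levels(rows):
    """DAG levels: row r must wait for its lower neighbours c < r."""
    level = [0] * len(rows)
    for r, entries in enumerate(rows):
        level[r] = max((level[c] + 1 for c, _ in entries if c < r), default=0)
    groups = {}
    for r, lv in enumerate(level):
        groups.setdefault(lv, []).append(r)
    return [groups[lv] for lv in sorted(groups)]


def forward_gs_levels(rows, b, x, workers=4):
    """Same sweep, but the rows inside one level are updated concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for level in row_levels(rows):
            list(pool.map(lambda r: gs_row_update(rows, b, x, r), level))


rows = laplacian_2d(4)
b = [1.0] * len(rows)
x_seq, x_dag = [0.0] * len(rows), [0.0] * len(rows)
forward_gs_sequential(rows, b, x_seq)
forward_gs_levels(rows, b, x_dag)
print(len(row_levels(rows)), x_seq == x_dag)  # 7 True
```

The equality holds because same-level rows are never neighbours: lower neighbours were finalized in earlier levels and upper neighbours are untouched until later levels, exactly as in the sequential order.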
15. SW requirements for ARM based HPC solution
HPCG: Leveraging a DAG to fuse Gauss-Seidel with the residual computation (bandwidth reduction)
[Figure, two panels, problem size 96x96x96 on Hi1616, variants FW GS + R, FW FS GS and FW FS GS Opt: (left) relative performance increase vs. #cores (1, 2, 4, 8, 16, 32); (right) speedup vs. #cores, compared with ideal scaling. Annotations: superlinear cache effects w.r.t. 1 core; memory bandwidth saturation at 12/16 cores in a NUMA node; 1.8X better.]
FW: forward pass; similar benefits for the backward pass.
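One way to fuse the residual computation r = b - Ax into the forward sweep is sketched below in Python, under the same assumptions as a plain level-scheduling example (2D 5-point Laplacian, hypothetical function names); the slides do not show how Huawei implements the fusion. The residual of row r needs only final x values, so it can be computed right after the sweep updates r's last (highest-index) neighbour, while that part of the matrix and vector is likely still in cache, instead of in a second full pass over A and x.

```python
def laplacian_2d(n):
    """5-point Laplacian on an n x n grid (natural ordering), as rows of (col, value)."""
    rows = []
    for i in range(n):
        for j in range(n):
            entries = [(i * n + j, 4.0)]
            for di, dj in ((-1, 0), (0, -1), (0, 1), (1, 0)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    entries.append((ii * n + jj, -1.0))
            rows.append(entries)
    return rows


def gs_row_update(rows, b, x, r):
    """Forward Gauss-Seidel update of x[r] using the current x."""
    s, diag = b[r], 0.0
    for c, a in rows[r]:
        if c == r:
            diag = a
        else:
            s -= a * x[c]
    x[r] = s / diag


def residual_row(rows, b, x, r):
    """Entry r of b - A x."""
    s = b[r]
    for c, a in rows[r]:
        s -= a * x[c]
    return s


def forward_gs_then_residual(rows, b, x):
    """Unfused: the sweep, then a second full pass over A and x for the residual."""
    for k in range(len(rows)):
        gs_row_update(rows, b, x, k)
    return [residual_row(rows, b, x, r) for r in range(len(rows))]


def forward_gs_fused_residual(rows, b, x):
    """Fused: row r's residual is computed right after its last neighbour is updated."""
    n = len(rows)
    ready_after = [[] for _ in range(n)]
    for r, entries in enumerate(rows):
        ready_after[max(c for c, _ in entries)].append(r)
    res = [0.0] * n
    for k in range(n):
        gs_row_update(rows, b, x, k)
        for r in ready_after[k]:  # every x[c] used by row r is now final
            res[r] = residual_row(rows, b, x, r)
    return res


rows = laplacian_2d(5)
b = [1.0] * len(rows)
x1, x2 = [0.0] * len(rows), [0.0] * len(rows)
r1 = forward_gs_then_residual(rows, b, x1)
r2 = forward_gs_fused_residual(rows, b, x2)
print(x1 == x2, r1 == r2)  # True True
```

For a banded matrix the residual trails the sweep by roughly one bandwidth (here one grid row), which is what makes the data reuse plausible; the same rows could also be attached as extra DAG tasks depending on their last neighbour's update.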
16. SW requirements for ARM based HPC solution
• MPI validation, optimization and certification across a set of configurations:
  • inter-node communication with NIC type: IB, RoCE
  • intra-node communication
  • operating systems: open source and commercial
  • compilers: open source and commercial
  • MPI primitives: P2P, collectives
• Platform optimization and certification
• ISV + MPI optimization and certification
• Integration of ISV + MPI with cluster management
• Participating in OpenUCX to drive features and optimization on ARM
• Providing early access to clusters through the HPC-AI Advisory Council
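The configuration space on this slide grows multiplicatively. A minimal Python sketch, assuming one validation run per combination and using illustrative labels (not Huawei's actual test plan), enumerates the matrix: 3 transports × 2 operating system types × 2 compiler types × 2 primitive classes gives 24 configurations, before platforms and ISV applications multiply it further.

```python
from itertools import product

# Dimensions taken from the slide; the string labels are illustrative only.
TRANSPORTS = ("IB", "RoCE", "intra-node")
OPERATING_SYSTEMS = ("open source OS", "commercial OS")
COMPILERS = ("open source compiler", "commercial compiler")
PRIMITIVES = ("P2P", "collectives")


def mpi_validation_matrix():
    """Every (transport, OS, compiler, primitive) combination to validate."""
    return list(product(TRANSPORTS, OPERATING_SYSTEMS, COMPILERS, PRIMITIVES))


matrix = mpi_validation_matrix()
print(len(matrix))  # 24
```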
17. Services requirements for ARM based HPC solution
• A dedicated, seasoned team with HPC skills (yes, we are hiring!) spread across China, the EU and the US to optimize ARM based HPC solutions, delivering:
  • optimizations of open source applications
  • support to ISVs in their porting, optimization and certification efforts
  • training on the ARM CPU, platforms and software stacks
  • a benchmarking team for business support
• That very same team interacts closely with the HiSilicon team to squeeze performance out of applications and to drive new features for next-generation HPC CPUs.
18. Services requirements for ARM based HPC solution
• Dedicated HPC team with multidisciplinary and overlapping skills:
  • Group skills 1, CPU centric: CPU architecture, compiler technology, algorithms, performance modeling, profiling
  • Group skills 2, system centric: CPU architecture, system architecture, networking, parallel file systems, operating system and driver tuning
  • Group skills 3, math centric: linear algebra, statistics, algorithms, data structures, MPI, OpenMP, partial differential equations, numerical methods, and sometimes one of the verticals
  • Group skills 4, vertical centric: individuals with vertical market experience who are also strong in linear algebra, partial differential equations and numerical methods
19. If you want to know more
• Both vendors and customers are encouraged to sign an NDA for disclosure of the details and availability timelines of Huawei’s ARM based HPC solutions.
• We plan to progressively unveil more details during 2H18 at multiple events, such as SC18, including both open source and commercial application demos.