Today, data scientists are turning to the cloud for AI and HPC workloads. However, AI/HPC applications require high computational throughput, and generic cloud resources often do not suffice. There is a strong demand for OpenStack to support hardware acceleration devices in a dynamic model.
In this session, we will introduce the OpenStack Acceleration Service – Cyborg, which provides a management framework for accelerator devices (e.g. FPGA, GPU, NVMe SSD). We will also discuss Rack Scale Design (RSD) technology and explain how physical hardware resources can be dynamically aggregated to meet AI/HPC requirements. The ability to “compose on the fly” with workload-optimized hardware and accelerator devices through an API allows data center managers to manage these resources in an efficient, automated manner.
We will also introduce an enhanced telemetry solution with Gnocchi, bandwidth discovery, and smart scheduling, leveraging RSD technology, for efficient workload management in an HPC/AI cloud.
2. Agenda
3. OpenStack for Scientific Research
• HPC/AI workloads make cloud acceleration a requirement rather than an interesting option.
• Hardware acceleration isn’t new: GPU, ASIC, NVMe, FPGA, etc.
• Current issues with using accelerators in OpenStack
[Figure: the hyper-connected world and the deluge of data – 5G wireless and optical networking, cloud processing and analytics, 50B connected devices by 2020.]
4. Intel® RSD Vision
Benefits increase over time: better utilization, greater flexibility, more vendor choice, lower cost of ownership.
• Implement Standards: Compute and Rack API, Storage Management API, Network Device Management API
• Enable Solutions: Resource Pooling – NVMe over PCIe, NVMe over Fabrics, FPGA Accelerators
• Partner with OEMs & ISVs: Interface with Orchestrators, Build Validated Solution Stacks
5. Intel® RSD Roadmap (2016–2018)
• Intel RSD v1.2 – Modern Manageability (available now): open and modern HW management APIs (Redfish*), pod-level architecture APIs, networked-storage services APIs
• Intel RSD v2.1 – NVMe Pooling over PCIe (available now): physical storage disaggregation and composability APIs, storage pooling over PCIe (direct-attach), pooled node controller discovery
• Intel RSD v2.2 – Advanced Manageability (available now): Intel Xeon Scalable processor support, out-of-band telemetry APIs, TPM support, FPGA discovery over PCIe
• Intel RSD v2.3 – NVMe Pooling over Ethernet (in development, target Q2 2018): NVMe over Fabrics* (Ethernet, RDMA), standards-based storage management (SNIA Swordfish*), telemetry for NVMe over Fabrics*
• Features under investigation: Intel persistent memory support, FPGA accelerator pooling over PCIe, network card pooling, standards-based network management (Yang-to-Redfish*)
Any forward-looking information provided here is subject to change without notice.
* Other names and brands may be claimed as the property of others.
** Dates above refer to delivery of Intel RSD reference code. OEM delivery of solutions typically launches 1–2 quarters later.
7. Storage Pooling – PCIe Direct Attach
[Diagram: a Pod Manager manages compute servers and a storage tray over 1Gb Ethernet management links to each PSME; the compute servers carry PCIe re-timer cards, and the storage tray’s PCIe switch connects the pooled drives to the compute servers over PCIe data links.]
• Storage Pool: 16 drives typically
• Compute Radix: generally 8–16 x4
• Latency: adds ~1 μSec per I/O
• Bandwidth: <50% oversubscribed
• Use Case: hottest-tier storage
• Cost: low
• Shipping now with v2.1
Direct-attach PCIe storage pooling requires PCIe re-timer cards in the compute servers.
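As a rough illustration of how such a pooled storage tray might be inspected programmatically, the sketch below walks a Pod Manager’s Redfish-style REST tree and lists the drives it exposes. The base URL, credentials, and the Chassis/Drives resource paths are assumptions for illustration; the actual schema depends on the Intel RSD release and its Pod Manager API specification.

```python
# Minimal sketch, assuming a Redfish-style Pod Manager endpoint:
# enumerate chassis resources and print the drives they reference.
# URL, credentials, and resource paths are illustrative assumptions.
import requests

PODM_URL = "https://podm.example.com:8443"   # hypothetical Pod Manager address
AUTH = ("admin", "admin")                    # hypothetical credentials


def get(path):
    """GET a Redfish resource (relative path) and return its JSON body."""
    resp = requests.get(PODM_URL + path, auth=AUTH, verify=False)
    resp.raise_for_status()
    return resp.json()


def list_pooled_drives():
    """Walk Chassis -> Drives and print id, protocol, and capacity."""
    chassis_collection = get("/redfish/v1/Chassis")
    for member in chassis_collection.get("Members", []):
        chassis = get(member["@odata.id"])
        # Assumed location of drive references; adjust to the deployed schema.
        for drive_ref in chassis.get("Links", {}).get("Drives", []):
            drive = get(drive_ref["@odata.id"])
            print(drive.get("Id"), drive.get("Protocol"), drive.get("CapacityBytes"))


if __name__ == "__main__":
    list_pooled_drives()
```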
8. Storage Pooling – NVMe over Fabric
Software-based storage pooling over Ethernet using the “NVMe over Fabrics” protocol.
[Diagram: a Pod Manager manages compute servers and storage servers over 1Gb Ethernet management links to each PSME; each storage server pairs an x86 CPU and PCIe switch with an RDMA NIC that carries the data path to the compute servers.]
• Storage Pool: unlimited
• Compute Radix: unlimited
• Latency: adds ~20 μSec per I/O
• Bandwidth: may be oversubscribed
• Use Case: warm-tier storage
• Cost: medium
• Coming in v2.3
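On the data path, a compute server consumes such a pool by attaching the remote NVMe subsystem over RDMA. The sketch below wraps the standard nvme-cli connect command; the target NQN and address are placeholders, and in an RSD setup they would come from the Pod Manager / PSME metadata rather than being hard-coded.

```python
# Minimal sketch, assuming nvme-cli is installed on the compute server:
# attach a remote NVMe namespace exported over RDMA ("NVMe over Fabrics").
# The NQN and target address below are hypothetical placeholders.
import subprocess

TARGET_NQN = "nqn.2017-01.com.example:storage-pool-0"   # hypothetical subsystem NQN
TARGET_ADDR = "192.168.10.20"                           # hypothetical target IP
TARGET_SVC = "4420"                                     # default NVMe-oF port


def connect_remote_nvme():
    """Connect this host to the remote NVMe-oF subsystem via nvme-cli."""
    subprocess.check_call([
        "nvme", "connect",
        "-t", "rdma",        # transport: RDMA over Ethernet
        "-n", TARGET_NQN,    # NVMe Qualified Name of the remote subsystem
        "-a", TARGET_ADDR,   # target address
        "-s", TARGET_SVC,    # transport service id (port)
    ])


if __name__ == "__main__":
    connect_remote_nvme()
```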
9. Storage Pooling – NVMe over Fabric
Hardware-based storage pooling over Ethernet using the “NVMe over Fabrics” protocol.
[Diagram: a Pod Manager manages compute servers and storage appliances over 1Gb Ethernet management links to each PSME; each storage appliance pairs a PCIe switch with a hardware NVMe-oF bridge and an RDMA NIC that carries the data path to the compute servers.]
• Storage Pool: unlimited
• Compute Radix: unlimited
• Latency: adds ~2 μSec per I/O
• Bandwidth: may be oversubscribed
• Use Case: warm-tier storage
• Cost: low
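To make the “compose on the fly” idea concrete, here is a rough sketch of requesting a logical node with a remote NVMe drive from the Pod Manager composition API. The Allocate endpoint and payload fields are assumptions loosely modeled on the RSD Pod Manager REST API; the exact action name and schema vary between RSD releases.

```python
# Minimal sketch, assuming an RSD-style composition API:
# compose a logical node with CPU, memory, and a remote NVMe drive.
# Endpoint, credentials, and payload fields are illustrative assumptions.
import requests

PODM_URL = "https://podm.example.com:8443"   # hypothetical Pod Manager address
AUTH = ("admin", "admin")                    # hypothetical credentials


def compose_node():
    """Ask the Pod Manager to allocate a composed node; return its URI."""
    payload = {
        "Name": "ai-training-node",
        "Processors": [{"TotalCores": 16}],
        "Memory": [{"CapacityMiB": 65536}],
        "RemoteDrives": [{
            "CapacityGiB": 800,
            "Protocol": "NVMeOverFabrics",   # assumed enum value
        }],
    }
    resp = requests.post(PODM_URL + "/redfish/v1/Nodes/Actions/Allocate",
                         json=payload, auth=AUTH, verify=False)
    resp.raise_for_status()
    # The Location header typically points at the newly composed node resource.
    return resp.headers.get("Location")


if __name__ == "__main__":
    print("Composed node:", compose_node())
```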
10. OpenStack Acceleration Service – Cyborg
Cyborg is an OpenStack project that aims to provide a general-purpose management framework for acceleration resources (i.e. various types of accelerators such as crypto cards, GPUs, FPGAs, NVMe/NVMe-oF SSDs, ODP, DPDK/SPDK, and so on). Cyborg is therefore a good fit for managing the high-speed NVMe storage devices in an Intel RSD rack, by treating them as one kind of accelerator.
[Diagram: Cyborg architecture – cyborg-api, cyborg-conductor, and the Cyborg database run on the control node; cyborg-agent with a local cache runs on each compute node and loads the cyborg-generic-driver or vendor-specific drivers (e.g. a vendor-X FPGA driver, a vendor-Z IPsec driver, a vendor-Y NVMe SSD driver).]
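As an illustration of how a vendor driver could plug into this framework, the sketch below enumerates NVMe devices on a host so that an agent could publish them as accelerator resources. The class and method names are illustrative assumptions in the spirit of the generic driver, not the actual Cyborg driver interface.

```python
# Minimal sketch, assuming a Linux host: discover NVMe SSDs via sysfs so
# a Cyborg-style agent could report them as accelerator resources.
# Class and method names are illustrative, not the real Cyborg driver API.
import glob
import os


class NVMeSSDDriver(object):
    """Discover NVMe controllers exposed under /sys/class/nvme."""

    def discover(self):
        devices = []
        for ctrl in glob.glob("/sys/class/nvme/nvme*"):
            devices.append({
                "name": os.path.basename(ctrl),   # e.g. "nvme0"
                "type": "NVMe_SSD",
                "model": self._read(ctrl, "model"),
                "serial": self._read(ctrl, "serial"),
            })
        return devices

    @staticmethod
    def _read(ctrl, attr):
        """Read a sysfs attribute, returning None if it is absent."""
        try:
            with open(os.path.join(ctrl, attr)) as f:
                return f.read().strip()
        except IOError:
            return None


if __name__ == "__main__":
    for dev in NVMeSSDDriver().discover():
        print(dev)
```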
11. The Workflow of Cyborg Services
Responsibilities across the Driver, Agent, Database/Conductor, and API services:
• Driver: detects accelerators, installs drivers, attaches/detaches accelerators for Nova, and manages per-host software.
• Agent: monitors hardware/software updates and accelerator utilization.
• Database/Conductor: maintains the accelerator list and usage statistics.
• API: provides scheduling hints to Nova and serves user requests for instances and operator requests to manage hardware.
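To show what this workflow looks like from a client’s perspective, the following sketch lists accelerator resources through a Cyborg REST endpoint. The service URL, port, token handling, and the /v1/accelerators path are assumptions for illustration; the actual resource names depend on the Cyborg API version deployed.

```python
# Minimal sketch, assuming a cyborg-api endpoint and a valid Keystone token:
# list accelerator resources registered with Cyborg.
# URL, port, and resource path are illustrative assumptions.
import requests

CYBORG_URL = "http://controller:6666/v1"   # hypothetical cyborg-api endpoint
TOKEN = "gAAAA..."                          # Keystone token obtained separately


def list_accelerators():
    """Return the accelerator list reported by the Cyborg API."""
    resp = requests.get(
        CYBORG_URL + "/accelerators",
        headers={"X-Auth-Token": TOKEN, "Accept": "application/json"},
    )
    resp.raise_for_status()
    return resp.json().get("accelerators", [])


if __name__ == "__main__":
    for acc in list_accelerators():
        print(acc.get("name"), acc.get("device_type"), acc.get("state"))
```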