In this deck, Johann Lombardi from Intel presents: DAOS - Scale-Out Software-Defined Storage for HPC/Big Data/AI Convergence.
"Intel has been building an entirely open source software ecosystem for data-centric computing, fully optimized for Intel® architecture and non-volatile memory (NVM) technologies, including Intel Optane DC persistent memory and Intel Optane DC SSDs. Distributed Asynchronous Object Storage (DAOS) is the foundation of the Intel exascale storage stack. DAOS is an open source software-defined scale-out object store that provides high bandwidth, low latency, and high I/O operations per second (IOPS) storage containers to HPC applications. It enables next-generation data-centric workflows that combine simulation, data analytics, and AI."
Unlike traditional storage stacks that were primarily designed for rotating media, DAOS is architected from the ground up to make use of new NVM technologies, and it is extremely lightweight because it operates end-to-end in user space with full operating system bypass. DAOS offers a shift away from an I/O model designed for block-based, high-latency storage to one that inherently supports fine-grained data access and unlocks the performance of next-generation storage technologies.
Watch the video: https://youtu.be/wnGBW31yhLM
Learn more: https://www.intel.com/content/www/us/en/high-performance-computing/daos-high-performance-storage-brief.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
3. DAOS Architecture
Intel Proprietary
[Diagram: conventional storage systems vs. the DAOS storage engine]

Conventional storage systems:
• High-latency communications, P2P operations, no HW acceleration
• Data & metadata pass through the Linux kernel's block I/O interface to Intel® 3D-NAND storage, Intel® 3D-XPoint storage, and HDDs

DAOS storage engine:
• Low-latency, high-message-rate communications
• Collective operations & in-storage computing
• Metadata, low-latency I/Os & indexing/query on Intel® 3D-XPoint storage via the PMDK memory interface
• Bulk data on 3D-NAND/XPoint storage via the SPDK NVMe interface
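The data-placement split above can be sketched in a few lines. This is an illustration of the idea, not DAOS internals; the threshold and function names are assumptions for the sketch only.

```python
# Illustrative sketch of the placement policy the diagram describes:
# metadata and small, latency-sensitive I/Os go to persistent memory
# (byte-granular, PMDK-style memory interface), while bulk data goes to
# NVMe SSDs (block-granular, SPDK-style userspace NVMe interface).
SMALL_IO_THRESHOLD = 4096  # bytes; assumed cutoff, for illustration only

def place_io(kind: str, size: int) -> str:
    """Return the storage tier an I/O would be routed to."""
    if kind == "metadata" or size <= SMALL_IO_THRESHOLD:
        return "pmem"   # 3D-XPoint via memory interface (load/store)
    return "nvme"       # 3D-NAND/XPoint via NVMe interface (high bandwidth)

print(place_io("metadata", 64))   # pmem
print(place_io("data", 256))      # pmem (small I/O stays low latency)
print(place_io("data", 1 << 20))  # nvme (bulk data)
```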
4. DAOS Deployments
DAOS Nodes (DNs): Intel® Xeon servers with DCPMM & NVMe SSDs
Gateway Nodes (GNs): Intel® Xeon servers with no local storage
Capacity Tier: PFS, cloud object store, …

[Diagram: pooled and hyperconverged deployment options — dual-socket Xeon servers (CPUs linked by UPI), each socket with a PCIe x16 fabric NIC and PCIe x4 storage devices]
5. DAOS Tier Anatomy
DAOS Tier
• Globally accessible from any compute node
• Large capacity (hundreds of PB)
DAOS Nodes
• COTS Intel® Xeon servers running the DAOS service
• RNIC attached for communications
• Multiple RNICs per server supported, to sustain backend storage IOPS/bandwidth
• Mix of storage technologies attached
• Intel® Optane™ DC Persistent Memory (DCPMM)
• NVMe SSD (*NAND, Intel® Optane™ SSDs)
[Diagram: compute nodes run AI/analytics/simulation workflows (File, HDF5, TensorFlow, …) linked against the DAOS library; they talk to the DAOS service on DAOS nodes backed by DCPMM and Intel® QLC 3D NAND SSDs]
6. Network Support
Performance-critical I/O path over libfabric
• Low-latency messaging
• End-to-end in userspace
• Native support for RDMA
• True zero-copy I/O
• Non-blocking
• Scalable collective communications
Out-of-band channel for administration
• Manage hardware, service & pools
• Telemetry & troubleshooting
• Secured with TLS & certificates
[Diagram: within the DAOS storage engine, the control plane uses gRPC over TCP/IP, while the data plane uses OFI/libfabric over Sockets, InfiniBand, OPA, RoCE, Slingshot, or AWS EFA]
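The "non-blocking" property of the data plane follows a submit-then-poll pattern: issue many operations without waiting, then gather completions as they arrive. The sketch below uses plain Python futures as a loose stand-in for DAOS event queues; all names are illustrative.

```python
# Sketch of the non-blocking I/O pattern: submit all operations up front,
# then collect completions in whatever order they finish.
from concurrent.futures import ThreadPoolExecutor, as_completed

def rpc(op_id: int) -> str:
    """Placeholder for a low-latency RPC over the fabric."""
    return f"op{op_id}:done"

with ThreadPoolExecutor(max_workers=8) as pool:
    events = [pool.submit(rpc, i) for i in range(16)]   # no blocking on submit
    results = sorted(f.result() for f in as_completed(events))

print(len(results))  # 16 completions, gathered as they arrive
```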
7. Storage Virtualization & Multi-tenancy
Distributed storage reservation
• Intel® Optane™ DC Persistent Memory (DCPMM)
• NVMe SSD
Predictable capacity
• Can be resized
• Can be extended to span more servers
Multi-tenancy
• NFSv4-type ACLs
• Typically 1 pool = 1 project
• Can have a single pool or hundreds
[Diagram: three pools striped across four storage nodes, each node exposing four targets — Pool 1 (DCPMM + NVMe), Pool 2 (DCPMM only), Pool 3 (DCPMM + NVMe)]
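The pool/target model can be sketched as a reservation divided evenly over a set of targets. The `create_pool` helper and the capacities below are illustrative assumptions, not the DAOS management API.

```python
# Sketch of the pool/target model: each storage node exposes a fixed set of
# targets, and a pool is a reservation of SCM (DCPMM) and NVMe capacity
# striped across a subset of them. Sizes are illustrative.
from itertools import product

nodes = ["node1", "node2", "node3", "node4"]
targets = [f"{n}/target{t}" for n, t in product(nodes, range(1, 5))]

def create_pool(scm_bytes: int, nvme_bytes: int, tgts: list) -> dict:
    """Divide a pool reservation evenly over its targets."""
    per = len(tgts)
    return {t: {"scm": scm_bytes // per, "nvme": nvme_bytes // per} for t in tgts}

pool1 = create_pool(16 << 30, 256 << 30, targets)   # DCPMM + NVMe, all 16 targets
pool2 = create_pool(8 << 30, 0, targets[:8])        # DCPMM only, first 2 nodes
print(len(pool1), pool1["node1/target1"]["scm"] == 1 << 30)
```

Resizing or spanning more servers then amounts to recomputing the same reservation over a larger target set.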
8. POSIX I/O Support
[Diagram: POSIX I/O stack — an application/framework accesses DAOS either through dfuse or through an interception library loaded into its single process address space; both sit on the DAOS File System library (libdfs) and the DAOS library (libdaos), which reaches the DAOS storage engine via RPC and RDMA. The path is end-to-end userspace, with no system calls.]
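Conceptually, the interception library redirects POSIX-style calls to userspace file-system logic inside the same process, so no system call is ever issued on the I/O path. The sketch below models that idea with a hypothetical `DFSFile` stand-in; it is not the real libdfs API.

```python
# Conceptual sketch of interception: POSIX-style open/write/read calls are
# redirected, inside the same process address space, to file-system logic
# that never enters the kernel. DFSFile is an illustrative stand-in.
class DFSFile:
    def __init__(self):
        self._buf = bytearray()
    def write(self, data: bytes) -> int:
        self._buf += data            # userspace only: no system call issued
        return len(data)
    def read(self) -> bytes:
        return bytes(self._buf)

namespace = {}                        # stand-in for a container namespace

def intercepted_open(path: str) -> DFSFile:
    """What an interception layer would return instead of a kernel fd."""
    return namespace.setdefault(path, DFSFile())

f = intercepted_open("/daos/output.dat")
f.write(b"hello")
print(intercepted_open("/daos/output.dat").read())  # b'hello' -- no kernel I/O
```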
9. POSIX I/O Limitations
[Diagram: a key/value dataset (key1→val1, key2→val2, key3→val3) serialized into a POSIX file — header, size fields, keys, and values are flattened into consecutive blocks (Block 1 … Block 6), with large values like val2 spilling across block boundaries. HPC applications face the same POSIX serialization through I/O middleware (HDF5, SCR/VeloC, MPI-IO, …), whose data and metadata likewise end up serialized into blocks.]
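The cost of POSIX serialization can be made concrete with a toy length-prefixed format (invented here purely for illustration): once the key/value structure is flattened into a byte stream, reading one value means walking the stream from the start, and values straddle arbitrary block boundaries.

```python
# Toy illustration of the limitation: a KV dataset serialized into a flat
# POSIX-style byte stream. The [len][key][len][val] format is invented here.
import struct

dataset = {"key1": b"v1", "key2": b"x" * 5000, "key3": b"v3"}

# Serialize: length-prefixed keys and values, back to back.
blob = b"".join(
    struct.pack("<I", len(k)) + k.encode() + struct.pack("<I", len(v)) + v
    for k, v in dataset.items()
)

def lookup(blob: bytes, key: str) -> bytes:
    """Fetch one value: must parse the whole stream from the beginning."""
    off = 0
    while off < len(blob):
        (klen,) = struct.unpack_from("<I", blob, off); off += 4
        k = blob[off:off + klen].decode(); off += klen
        (vlen,) = struct.unpack_from("<I", blob, off); off += 4
        v = blob[off:off + vlen]; off += vlen
        if k == key:
            return v
    raise KeyError(key)

print(lookup(blob, "key3"))  # b'v3', but only after scanning past key2's 5000 bytes
```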
10. DAOS API
[Diagram: with the native DAOS API, the application's key/value tree (root → key1/val1, key2/val2, key3/val3) is stored directly by DAOS on NVMe SSD rather than being serialized into a flat file]

Native support for structured, semi-structured & unstructured data models
• Built on top of DCPMM
• Unconstrained by POSIX serialization
• Data access times orders of magnitude faster (µs)
• Scalable concurrent updates & high IOPS
• Enables in-storage computing

Data model library: Array, KV Store, Multi-level KV Store — layered on the DAOS storage engine (open source, Apache 2.0 license)
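The multi-level KV model named above can be sketched as a store addressed by two keys, a distribution key (dkey) and an attribute key (akey), so an application keeps its tree structure instead of flattening it into a file. This dict-of-dicts version is purely illustrative, not the libdaos API.

```python
# Illustrative sketch of a multi-level KV store: values are addressed by a
# (dkey, akey) pair, preserving the application's structure.
class MultiLevelKV:
    def __init__(self):
        self._obj = {}
    def put(self, dkey: str, akey: str, value: bytes) -> None:
        # Independent keys can be updated concurrently without serialization.
        self._obj.setdefault(dkey, {})[akey] = value
    def get(self, dkey: str, akey: str) -> bytes:
        return self._obj[dkey][akey]   # direct, fine-grained access

obj = MultiLevelKV()
obj.put("row:1", "name", b"alice")
obj.put("row:1", "score", b"42")
obj.put("row:2", "name", b"bob")
print(obj.get("row:1", "score"))  # b'42' -- no POSIX serialization in the path
```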
11. Application Interface
[Diagram: HPC apps (POSIX I/O, HDF5, MPI-IO, VeloC) and analytics/AI apps (Apache Spark, Apache Arrow, (No)SQL, TensorFlow) all run on the DAOS storage engine (open source, Apache 2.0 license); a dataset mover bridges DAOS and the capacity tier (Lustre, S3, HSM, …)]
12. DAOS & Big Data/AI
[Diagram: Hadoop reaches DAOS storage through the Hadoop FileSystem abstract class, either via HDFS or via a Java wrapper over the DAOS POSIX API (the DAOS file system). Spark/Arrow analytics use an Apache Arrow data source through a DAOS/Arrow wrapper, or a DAOS native data source through a DAOS API wrapper, for scan/parse of Parquet, ORC, CSV, etc., followed by project, join, ML, and so on. The DAOS library uses PMDK/SPDK, with kernel bypass and zero copy.]
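The plug-in pattern behind this integration is that big-data frameworks program against a file-system abstraction, and a DAOS-backed implementation is slotted in underneath (in Hadoop's case via a Java wrapper over the DAOS POSIX API). The classes below are illustrative stand-ins for that pattern, not the actual connector.

```python
# Sketch of the integration pattern: frameworks code against an abstract
# file-system interface; a DAOS-backed implementation plugs in underneath.
from abc import ABC, abstractmethod

class AbstractFileSystem(ABC):          # role of Hadoop's FileSystem class
    @abstractmethod
    def create(self, path: str, data: bytes) -> None: ...
    @abstractmethod
    def open(self, path: str) -> bytes: ...

class DaosFileSystem(AbstractFileSystem):
    def __init__(self):
        self._store = {}                 # stands in for a DAOS POSIX container
    def create(self, path: str, data: bytes) -> None:
        self._store[path] = data
    def open(self, path: str) -> bytes:
        return self._store[path]

fs: AbstractFileSystem = DaosFileSystem()   # framework code is unchanged
fs.create("/data/part-0000.parquet", b"...")
print(fs.open("/data/part-0000.parquet"))
```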
13. DAOS: Primary Storage for Aurora
"The Argonne Leadership Computing Facility will be the first major production deployment of the DAOS storage system as part of Aurora, the first US exascale system coming in 2021. The DAOS storage system is designed to provide the levels of metadata operation rates and bandwidth required for I/O-intensive workloads on an exascale-level machine."
— Susan Coghlan, ALCF-X Project Director / Exascale Computing Systems Deputy Director
Aurora DAOS configuration
• Capacity: 230 PB
• Bandwidth: >25 TB/s
14. DAOS Community Roadmap
All information provided in this roadmap is subject to change without notice.

Timeline: 1Q19–3Q22, progressing from petascale to exascale-ready, with partner engagement & PoCs throughout. Releases: pre-1.0 releases & RCs, then 1.0, 1.2, 1.4, 2.0, 2.2, and 2.4. Reading the feature groups in timeline order:

1.0 — DAOS: NVMe & DCPMM support; Python/Go API bindings; per-pool ACL; Lustre integration. I/O middleware: MPI-IO driver; HDF5 DAOS connector; POSIX I/O support; Spark.
1.2 — DAOS: end-to-end data integrity; per-container ACL; improved control plane.
1.4 — DAOS: online server addition; advanced control plane. I/O middleware: POSIX data mover; async HDF5 operations over DAOS.
2.0 — DAOS: erasure code; telemetry & per-job statistics; multiple OFI provider support; distributed transactions. I/O middleware: advanced POSIX I/O support; advanced data mover.
2.2 — DAOS: progressive layout / GIGA+; placement optimizations; checksum scrubbing. I/O middleware: Apache Arrow (not POR).
2.4 — DAOS: catastrophic recovery tools.