2. RDMA for High Performance Data Movement
Network I/O operations are costly:
− CPU load
− Context switching
− Memory latency
Zero-copy networking
− NIC copies data directly to/from application
memory
IB transport (HPC applications)
iWARP (TCP stack / TOE)
3. RDMA model
One sided operations
Get/Put semantics
Send/receive
Direct data placement
RDMA Write
RDMA Read
Asynchronous
− Work Queue (send queue – receive queue)
− Completion Queue
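The asynchronous model above (post a work request, return immediately, collect the result later from a completion queue) can be sketched as a toy simulation. This is only an illustration: `ToyQueuePair`, `WorkRequest`, `post_send`, `process`, and `poll_cq` are hypothetical names chosen for this sketch, not the real verbs API, and the "remote memory" is just a local `bytearray`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int          # caller-chosen id, echoed back in the completion
    opcode: str         # "RDMA_WRITE" (put) or "RDMA_READ" (get) in this toy model
    offset: int         # offset into the remote buffer
    data: bytes = b""   # payload for a write
    length: int = 0     # number of bytes to fetch for a read


class ToyQueuePair:
    """Illustrative stand-in for a queue pair; not a real RDMA interface."""

    def __init__(self, remote_memory: bytearray):
        self.remote = remote_memory
        self.send_queue = deque()
        self.completion_queue = deque()

    def post_send(self, wr: WorkRequest) -> None:
        # Posting only enqueues the request; no data moves yet.
        self.send_queue.append(wr)

    def process(self) -> None:
        # Stands in for the NIC draining the send queue without the
        # remote application being involved (one-sided operations).
        while self.send_queue:
            wr = self.send_queue.popleft()
            if wr.opcode == "RDMA_WRITE":
                self.remote[wr.offset:wr.offset + len(wr.data)] = wr.data
                result = None
            elif wr.opcode == "RDMA_READ":
                result = bytes(self.remote[wr.offset:wr.offset + wr.length])
            else:
                raise ValueError(f"unknown opcode: {wr.opcode}")
            self.completion_queue.append((wr.wr_id, result))

    def poll_cq(self) -> list:
        # Non-blocking: return whatever completions are ready right now.
        done = list(self.completion_queue)
        self.completion_queue.clear()
        return done
```

In this sketch the application can keep posting requests and only later call `poll_cq()` to learn which ones finished, which is the property that lets transfers overlap with other work.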
5. RDMA/iWARP
Implicit RDMA support
Explicit RDMA support
iWARP
− encapsulate RDMA traffic at a high level
− Use TCP stack
− Is it beneficial without TOE?
6. Alternative Approaches
RDMA over Converged Ethernet (RoCE)
− Lightweight RDMA transport over Ethernet
Widely deployed technology
Support kernel bypass
OFED 1.5.1 supports RoCE
SoftRDMAs...
− SoftRoCE (OFED 1.5.1 supports softRoCE)
− SoftiWARP (new TCP kernel stack)
8. Challenges in Bulk Transfer
Application Level Adjustments
Request Aggregation
− Small data files
− Is an FTP-like transfer mechanism appropriate for RDMA?
File System Overhead
− Asynchronous Operations
Connection Caching / Multiple Connection?
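One way to picture request aggregation for small data files is to pack several files into a single transfer buffer so that each RDMA operation moves a large block. The sketch below is a minimal illustration under assumptions: `aggregate_requests` is a hypothetical helper, files are described only by `(name, size)` pairs, and a simple first-fit-in-order policy is used.

```python
def aggregate_requests(file_sizes, buffer_size):
    """Group (name, size) pairs, in order, into batches of at most
    buffer_size bytes. A file larger than buffer_size gets its own batch."""
    batches = []
    current, used = [], 0
    for name, size in file_sizes:
        # Start a new batch when the next file would overflow the buffer.
        if current and used + size > buffer_size:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches
```

For example, with a 1000-byte buffer, files of 400 and 300 bytes would share one batch, while a 2000-byte file would be sent on its own.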
9. Local Area / Wide Area
IB RDMA designed for local area
− How does RDMA perform in Wide Area?
iWARP
− No promising results - Over TCP (with TOE?)
− SoftiWARP ???
RoCE
− Isolated traffic ? / much less CPU usage
− softRoCE?
10. GridFTP over RDMA
XIO driver for GridFTP
− Experimented using Chelsio cards (cxgb3)
− 10GE
− WAN testing in progress!
− Local area: 910 MBps – 1175 MBps
− Much better than GridFTP over TCP
Much less CPU load (1/2)
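To put the local-area numbers in context, they can be compared with the raw 10GE line rate. The sketch below assumes the reported figures are in megabytes per second (decimal, 1 MB = 10^6 bytes) and ignores protocol overhead; `link_utilization` is a hypothetical helper for this arithmetic only.

```python
def link_utilization(throughput_MBps, link_gbps=10.0):
    """Fraction of the raw link rate used by a given throughput.

    Assumes decimal units: 10 Gb/s = 10,000 Mb/s = 1250 MB/s.
    """
    line_rate_MBps = link_gbps * 1000 / 8
    return throughput_MBps / line_rate_MBps


low = link_utilization(910)    # about 0.73 of the raw 10GE rate
high = link_utilization(1175)  # about 0.94 of the raw 10GE rate
```

Under these assumptions the reported range corresponds to roughly 73% to 94% of the raw link rate.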
11. FTP100 – FTP over RDMA
Experimented with Mellanox Cards
− Local area – 10GE
− iWARP
Did not perform well compared to TCP
− No significant gain
− RoCE tests
In progress (have some initial results)
Limited by the disk performance
Mem2mem:
− Can already saturate the 10GE link
12. What is Next?
Experiment with the RDMA model over WAN
SoftiWARP from IBM Zurich
− TCP kernel stack implementing/defining RDMA
iverbs
SoftRoCE – OFED 1.5.2-rxe distribution
− Multiple connections?
13. Transfer Applications over RDMA
Simple Client/Server:
− Developing a prototype for transferring climate
datasets using RDMA protocols
− Asynchronous memory management module
Application level tuning?
− Memory regions (max/min?)
− Multiple QPs
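The tuning questions above (how large to make memory regions, how many QPs to use) can be explored with a simple planning function that splits a transfer into region-sized chunks and spreads them over several queue pairs. This is a sketch under assumptions: `plan_transfer` is a hypothetical helper, chunks are assigned round-robin, and no real memory registration or QP creation takes place.

```python
def plan_transfer(total_bytes, region_size, num_qps):
    """Split a transfer into chunks of at most region_size bytes and
    assign them round-robin to num_qps queue pairs.

    Returns a list of (qp_index, offset, length) tuples.
    """
    if region_size <= 0 or num_qps <= 0:
        raise ValueError("region_size and num_qps must be positive")
    plan = []
    offset = 0
    chunk_index = 0
    while offset < total_bytes:
        length = min(region_size, total_bytes - offset)
        plan.append((chunk_index % num_qps, offset, length))
        offset += length
        chunk_index += 1
    return plan
```

Varying `region_size` and `num_qps` in such a plan is one way to set up experiments on how chunk size and parallelism affect throughput.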
14. Climate Analysis
Climate Applications are Data-Intensive
Shared data repository:
− Data files need to be downloaded for further
processing and analysis
− Data retrieval is the main bottleneck
− Multiple clients (working as VM instances)
Cannot depend on HW support
SoftRoCE ? softiWARP
15. What can we do for WAN testing?
Q&A?
→ https://sdm.lbl.gov/climate100/