3. Why extend main memory with flash?
• To overcome DRAM scaling limitations and offer large working memory
• To reduce total cost of ownership (acquisition and operation)
• Flash has no seek time
• Flash has much lower latency than HDD
Two approaches toward memory extension
• Non-transparent approach: the application has to change
• Transparent approach: the application is NOT aware of the underlying flash
Introduction
4. Current swap algorithm is optimized for HDD
Paging for a fast device
• Fast and Simple vs. Heavy and Accurate
Motivation
5. Swap entry search
• A new search algorithm
I/O path optimization
• Swap read-ahead
• I/O scheduler
• Swappiness
Swap device as backing store: Inclusive vs. Exclusive
• We adjust the swap-entry free policy so that the swap device always
“includes” all swapped-out pages
Optimized SWAP
6. Tree search
• “Bit tree”: no pointers; each node is just one byte
• Fan-out degree is 8 (each bit points to one child node)
• An 8-level tree covers multiple terabytes of swap space
• Search cost: O(log N)
• Reduces swap structure size
– Current swap mechanism vs. O-Swap: roughly 10MB vs. 2MB (to support 32GB
of swap space)
Optimized SWAP
7. Read-ahead
• No read-ahead (due to randomness)
• Note also that SSD has no seek time
I/O scheduler
• NOOP (due to randomness and fast response requirements)
• Bypass
Swappiness
• swappiness : 0
Swap entry reclaim policy
• Avoid freeing swap entries whenever possible
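On a stock Linux system, the read-ahead, scheduler, and swappiness settings above map onto standard tunables; a minimal sketch (the NVMe device name is an assumption, and the in-kernel entry-reclaim change has no user-space knob):

```shell
# Disable swap read-ahead: vm.page-cluster is the log2 of the pages
# read per swap-in, so 0 means exactly one page per fault.
sysctl -w vm.page-cluster=0

# Swappiness 0: prefer reclaiming file cache over swapping anonymous
# pages.
sysctl -w vm.swappiness=0

# NOOP I/O scheduler (named "none" on blk-mq kernels, which NVMe uses):
echo none > /sys/block/nvme0n1/queue/scheduler
```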
Optimized SWAP
8. Evaluation - Memcached
System
  CPU       Xeon E5-2665 (HT disabled)
  # Cores   16
  Network   10Gb Ethernet
  SSD       Samsung XS1715 (NVMe)
Workload
  Benchmark            YCSB
  DB size              30GB
  Value length         2048B
  # memcached threads  64
  # Clients            320
  Get : Update         95% : 5%
Memory configurations
  SWAP       DRAM 8GB + SSD swap 32GB
  OSWAP      DRAM 8GB + SSD swap 32GB
  Full DRAM  DRAM 32GB
15. Rack scale architecture
High performance memory + High capacity memory
Future Work
[Figure: a compute node (CPUs + DRAM) connected over a PCIe memory cable to a memory device containing controllers and memory modules]