1. Hybrid Storage Pools: Using Disk and Flash with ZFS
(Now with the benefit of hindsight!)
Adam Leventhal @ahl
2. Flash Emerges
• Storage medium invented in 1980
– Very fast reads (~50us)
– Fast writes (~300us)
– High IOPS / low latency
– Limited number of write cycles
• 2004: flash cost as much as DRAM
• 2007: flash cost was right between DRAM and disk
3. Disk is dead… just like tape
• Many predicted the death of disk or relegation of disk to backup
• Didn't happen
• All-flash solutions still trying to gain mass adoption
4. ZFS circa 2007
• Sun was developing a ZFS-based storage appliance (Fishworks)
• ZFS: enterprise-class storage on commodity hardware
• Problem: enterprise storage was a lot faster
• Looked at traditional solutions
– NV-DRAM to accelerate writes
– Massive DRAM to cache reads
• But it was just the right time for flash…
5. Hybrid Storage Pool (HSP)
• Use flash as a storage tier
• Between DRAM and disk in cost, capacity, latency, throughput
• Use commodity disks
– 7200 RPM
– Good throughput
– Great $/GB and watts/GB
• Combine disk, flash, DRAM into a hybrid pool
• In ZFS (example below):
– ZFS intent log (ZIL) for write acceleration
– L2ARC to extend the reach of the ZFS cache
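A minimal sketch of how the two flash roles are attached to a pool, assuming a pool named tank and hypothetical device names; log devices absorb synchronous writes for the ZIL, while cache devices back the L2ARC:

    # Mirror two write-optimized SSDs as a separate intent log (slog)
    zpool add tank log mirror c1t0d0 c1t1d0

    # Add a read-optimized SSD as an L2ARC cache device
    zpool add tank cache c1t2d0

    # Log and cache devices show up as their own sections in the layout
    zpool status tank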
8. ZFS Caching
• Adaptive Replacement Cache (ARC) as the primary DRAM cache
• L2ARC developed by Brendan Gregg to use external (flash) devices
• Takes into account optimal I/O patterns for flash
– Random, small writes = hastened failure
– Sequential, large writes = happy SSDs
– Throttles writes to preserve longevity (tunables sketched below)
• Uses predictive eviction to identify blocks to cache
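The write throttle is governed by a handful of tunables. The parameter names below exist in OpenZFS (shown Linux-style; on illumos the same names are set via /etc/system), but the values are purely illustrative and defaults vary by release:

    # Bytes the L2ARC feed thread may write to a device per pass
    echo 8388608 > /sys/module/zfs/parameters/l2arc_write_max

    # Extra headroom allowed while the cache is still cold
    echo 8388608 > /sys/module/zfs/parameters/l2arc_write_boost

    # Seconds between feed-thread passes
    echo 1 > /sys/module/zfs/parameters/l2arc_feed_secs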
9. L2ARC Problems
• Non-persistent
– After a reboot or fatal system failure, the cache is empty
• Slow to warm up
– Will only write to one device at a time -> best case 1TB/hour
– Real-world example: 2TB in 24 hours (arithmetic below)
• Conceptually most of the way there
• No real way to tune it to a workload
• Not much real-world testing and tuning done
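Back-of-envelope arithmetic behind those warmup numbers (the 2TB cache size is taken from the example above):

    # Best case, ~1 TB/hour to a single device, in MiB/s:
    echo "scale=1; 1024^2 / 3600" | bc      # ~291.3 MiB/s of feed writes
    # Real-world example, 2 TB over 24 hours, in MiB/s:
    echo "scale=1; 2 * 1024^2 / 86400" | bc # ~24.3 MiB/s effective rate

So even in the best case a 2TB cache takes two hours to fill, and in practice the throttle, eviction, and gaps in the workload stretch that to a day.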
10. Changing Landscape
• DRAM prices have dropped dramatically
• Large-memory systems available (3TB+)
• NAND flash is getting trickier to build around
• Endurance and performance decrease as lithography and price decrease
– MLC and “TLC” (volume flash) have particularly short lives
• Running into size limitations
– 32nm in 2008
– 19nm today
– Supposed floor around 11nm
• SSDs are becoming increasingly complex
11. What to do today?
• The L2ARC can help
– The SSD space is large and highly varied
– Generally, cheap laptop SSDs suffice for the L2ARC
– Give it enough time to warm up (hours or days)
– Measure the impact on your actual workload (see below)
• The ARC is great and relatively simple
– Load up on DRAM
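One way to make that measurement is with the arcstat utility that ships with illumos and OpenZFS (field names can differ slightly between releases):

    # Sample ARC and L2ARC activity every 5 seconds while the workload runs
    arcstat -f time,read,hit%,arcsz,l2read,l2hit%,l2size 5

If l2hit% is still near zero after a day of warmup, the cache devices are doing little for that workload.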
12. Next for ZFS
• For the L2ARC to be viable, it needs to be persistent
• Lots of performance work needed
– Run it through a bunch of real-world use cases
– Make it easy to collect coherent, relevant data
– Create the right knobs for users to turn
• There are a few companies using the L2ARC
• Hopefully they will take up the mantle