[RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door
1. The Lion of Storage Systems - Open the New Door
Yosuke Hara
Oct 26, 2013 (rev 2.2)
2. LeoFS is "Unstructured Big Data Storage for the Web":
a highly available, distributed, eventually consistent storage system.
Organizations can use LeoFS to store large amounts of data
efficiently, safely, and inexpensively.
Started as an OSS project on July 4, 2012
www.leofs.org
4. Motivation
As of 2010: get away from using "expensive H/W-based storage"
Expensive storage problems:
1. High costs (initial costs, running costs)
2. Possibility of a "SPOF"
3. Does NOT scale easily:
storage expansion is difficult during periods of increasing data
5. The Lion of Storage Systems
LeoFS: 3 Vs in 3 HIGHs
- HIGH Availability: Non-Stop
- HIGH Cost-Performance Ratio: Velocity (Low Latency, Minimum Resources)
- HIGH Scalability: Volume (Petabyte / Exabyte), Variety (Photo, Movie, Unstructured Data)
REST-API / AWS S3-API
11. LeoFS Overview - Storage
The Gateway automatically replicates an object and its metadata to remote node(s).
Choosing replica target node(s):
- KEY = "bucket/leofs.key"
- Hash = md5(Filename)
- RING: 2^128 (MD5) address space
- # of replicas = 3: Primary Node, Secondary-1, Secondary-2
Uses "Consistent Hashing" for replication in the storage cluster ("P2P")
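The replica-selection scheme above can be sketched as follows. This is a minimal, illustrative consistent-hash ring in Python, assuming MD5 hashing over a 2^128 space and 3 replicas as stated on the slide; the node names and the virtual-node count are hypothetical, not LeoFS's actual configuration.

```python
import hashlib
from bisect import bisect_right

VNODES = 128  # virtual nodes per physical node (an assumption)

def md5_int(s: str) -> int:
    # Map a string onto the 2^128 MD5 ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes):
    # Each physical node owns many points on the ring, sorted by hash.
    return sorted((md5_int(f"{n}:{i}"), n) for n in nodes for i in range(VNODES))

def replica_targets(ring, key, n_replicas=3):
    """Walk clockwise from the key's hash, collecting distinct nodes:
    the first is the Primary, the rest are Secondary-1, Secondary-2."""
    points = [p for p, _ in ring]
    idx = bisect_right(points, md5_int(key)) % len(ring)
    targets = []
    while len(targets) < n_replicas:
        node = ring[idx][1]
        if node not in targets:
            targets.append(node)
        idx = (idx + 1) % len(ring)
    return targets

ring = build_ring(["storage_0", "storage_1", "storage_2", "storage_3"])
print(replica_targets(ring, "bucket/leofs.key"))
```

Because the walk always starts at the key's hash, every gateway computes the same primary and secondaries without coordination, which is what makes the scheme "P2P".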
12. LeoFS Overview - Storage
The Storage Engine consists of "Object Storage" and "Metadata Storage".
Built-in "Replicator" and "Repairer" with a queue provide eventual consistency.
Requests from the Gateway are handled by Storage Engine Workers,
each managing a Metadata Storage and an Object Container:
- Metadata Storage: keeps an in-memory index of all data
- Object Container: manages a "Log-Structured File"
13. LeoFS Storage Engine - Retrieve an object from the storage
A Storage Engine Worker looks up the key in the Metadata Storage,
then reads the object from the Object Container at the recorded position.
<Metadata>: ID, Filename, Offset, Size, Checksum (MD5), Version#
<Object Container entry>: Header | File | Footer
14. LeoFS Storage Engine - Insert an object into the storage
A Storage Engine Worker appends the object into the Object Container
and inserts its metadata into the Metadata Storage.
<Metadata>: ID, Filename, Offset, Size, Checksum (MD5), Version#
<Object Container entry>: Header | File | Footer
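The insert and retrieve flows on slides 13-14 can be sketched together: an append-only "object container" file plus an in-memory metadata index mapping filename to (offset, size, checksum). The class and field layout are a simplified illustration, not LeoFS's actual on-disk format.

```python
import hashlib

class ObjectStore:
    def __init__(self, path):
        self.path = path
        self.index = {}           # filename -> (offset, size, checksum)
        open(path, "ab").close()  # ensure the container file exists

    def put(self, filename, data: bytes):
        # Insert: append the object, then record its metadata.
        with open(self.path, "ab") as f:
            offset = f.tell()
            f.write(data)         # append-only: old versions stay on disk
        self.index[filename] = (offset, len(data),
                                hashlib.md5(data).hexdigest())

    def get(self, filename) -> bytes:
        # Retrieve: metadata lookup, then a single positioned read.
        offset, size, checksum = self.index[filename]
        with open(self.path, "rb") as f:
            f.seek(offset)
            data = f.read(size)
        assert hashlib.md5(data).hexdigest() == checksum
        return data

store = ObjectStore("/tmp/leofs_container_demo.avs")
store.put("bucket/leofs.key", b"hello")
print(store.get("bucket/leofs.key"))  # b'hello'
```

Note that overwriting a key only updates the index entry; the superseded bytes remain in the container until compaction (slide 15) reclaims them.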
15. LeoFS Storage Engine - Remove unnecessary objects from the storage
Compaction: a Storage Engine Worker copies live objects and metadata from the
old Object Container/Metadata into a new Object Container/Metadata,
discarding removed and superseded objects.
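The compaction step can be sketched in a self-contained way: scan the old container in append order (so the last write for each key wins), then rewrite only the keys that are still live, rebuilding their offsets. The data structures here are illustrative, not LeoFS's real format.

```python
def compact(old_entries, live_keys):
    """old_entries: list of (key, data) in append order, possibly containing
    superseded duplicates. live_keys: keys not deleted. Returns the new
    container contents and a rebuilt offset index."""
    latest = {}
    for key, data in old_entries:        # later appends supersede earlier ones
        latest[key] = data
    new_entries, new_index, offset = [], {}, 0
    for key in live_keys:                # deleted keys are dropped entirely
        data = latest[key]
        new_entries.append((key, data))
        new_index[key] = (offset, len(data))
        offset += len(data)
    return new_entries, new_index

old = [("a", b"v1"), ("b", b"x"), ("a", b"v2")]  # "b" was deleted, "a" rewritten
entries, index = compact(old, live_keys=["a"])
print(entries)  # only the latest version of "a" survives
```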
16. LeoFS Overview - Storage - Data Structure / Relationship of an Object
<Metadata> (for retrieving a file/object):
{VNodeId, Key}, KeySize, Custom-Meta Size, File Size, Offset, Version, Timestamp, Checksum
<Needle> (for sync):
- Header (metadata, fixed length): Checksum, KeySize, User-Meta Size, DataSize, Offset, Version, Timestamp
- Body (variable length): {VNodeId, Key}, User-Meta, Actual File
- Footer (8B)
<Object Container>: Super-block, Needle-1, Needle-2, Needle-3, Needle-4, Needle-5, ...
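A fixed-length header like the needle header above could be encoded as shown below. The field widths chosen here are assumptions for illustration only; LeoFS's real on-disk layout may use different sizes and ordering.

```python
import struct

# Hypothetical fixed-length needle header: checksum (MD5, 16 bytes),
# KeySize, User-Meta Size, DataSize, Offset, Version, Timestamp.
# Widths are assumed, big-endian, no padding.
HEADER = struct.Struct(">16s I I Q Q I Q")

def pack_header(checksum16, ksize, msize, dsize, offset, version, ts):
    return HEADER.pack(checksum16, ksize, msize, dsize, offset, version, ts)

hdr = pack_header(b"\x00" * 16, 15, 0, 5, 0, 1, 1382745600)
print(len(hdr))  # 52: fixed length means the body's position is computable
```

A fixed-length header is what lets a sync/recovery scan walk the container needle by needle: read the header, skip `KeySize + User-Meta Size + DataSize` body bytes plus the 8-byte footer, and the next needle begins.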
17. LeoFS Overview - Storage - Large Object Support
Goals:
- Equalize disk usage of every storage node
- Realize high I/O efficiency and high availability
[ WRITE Operation ]
The Gateway splits a large object from the client(s) into chunked objects
(chunk-0, chunk-1, chunk-2, chunk-3, ...) for the storage cluster.
Every chunked object and its metadata are replicated in the cluster.
The original object's metadata records: original object name, original object size, # of chunks.
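The write path above can be sketched as a simple splitter. The chunk size and the chunk-naming scheme used here are assumptions for illustration, not LeoFS's actual defaults.

```python
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB per chunk (an assumed value)

def split_into_chunks(name: str, data: bytes):
    """Split one large object into named chunks and build the parent
    metadata the slide describes: original name, size, and chunk count."""
    chunks = [(f"{name}\n{i}", data[off:off + CHUNK_SIZE])
              for i, off in enumerate(range(0, len(data), CHUNK_SIZE))]
    parent_meta = {"name": name, "size": len(data), "chunks": len(chunks)}
    return parent_meta, chunks

meta, chunks = split_into_chunks("bucket/movie.mp4", b"x" * (12 * 1024 * 1024))
print(meta["chunks"])  # 3
```

Because each chunk is an independently named object, it hashes to its own position on the ring, which is how chunking equalizes disk usage across storage nodes.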
22. New Features - LeoInsight (v1.0)
Gives insight into the state of LeoFS:
1. Controls requests from clients to LeoFS
2. Checks "traffic info" and the "state of every node"
to keep availability
23. New Features - LeoInsight (v1.0)
Architecture:
- Notifier: receives "Traffic-Info" and node notifications from Gateway / Storage / Manager
- Distributed Queue (ElkDB): LeoInsight consumes the notified messages from the queue
- TimeSeriesDB (Savannah): persists the calculated statistics data
- REST-API (JSON): retrieves the statistics
Operators operate LeoFS through the Manager; the Gateway and Storage Cluster report their state.
25. New Features - Multi Data Center Data Replication (v1.0)
HIGH Scalability + HIGH Availability + easy operation for admins
Regions: Tokyo, Singapore, US, Europe
NO SPOF, NO performance degradation
26. v1.0 - Multi Data Center Data Replication
[ 3 Regions & 5 Replicas ]
Client application(s) send requests to the target region.
DC-1 Configuration:
- Consistency Level: local-quorum [N=3, W=2, R=1, D=2]
- Method of MDC-Replication:
  - Async: bulked transfer
  - Sync + Tran: consensus algorithm
- # of target DC(s): 2
- # of replicas per DC: 1
>> Total replicas: 5 (DC-1 [replicas:3], DC-2 [replicas:1], DC-3 [replicas:1])
The Manager cluster monitors and replicates each "RING" and "System Configuration"
across the "Leo Storage Platform".
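The local-quorum setting quoted on this slide (N=3, W=2, R=1, D=2) can be read as: with 3 replicas, a write needs 2 acks, a read needs 1, and a delete needs 2. A minimal sketch of that check, with helper names that are illustrative rather than LeoFS's API:

```python
# N = replicas in the local region; W/R/D = acks required per operation.
QUORUM = {"N": 3, "W": 2, "R": 1, "D": 2}

def quorum_met(op: str, acks: int) -> bool:
    """op is 'W' (write), 'R' (read), or 'D' (delete);
    acks is how many replicas responded successfully."""
    return acks >= QUORUM[op]

print(quorum_met("W", 2))  # True: 2 of 3 replicas acked the write
print(quorum_met("R", 0))  # False: a read needs at least 1 replica
```

With R=1, reads stay fast and available even when two replicas are unreachable; W=2 keeps writes durable without waiting for all three, and the Repairer queue later brings the lagging replica back in sync.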
27. v1.0 - Multi Data Center Data Replication
[ 3 Regions & 5 Replicas ]
1) 3 replicas are written in the "Local Region":
DC-1 [replicas:3]; DC-2 and DC-3 each hold [replicas:1].
28. v1.0 - Multi Data Center Data Replication
[ 3 Regions & 5 Replicas ]
2) Sync (or async) replication to the other region(s):
- Leader / Primary: DC1.node_0
- Local followers: DC1.node_1, DC1.node_2
- Remote followers: DC2.node_3 (DC-2), DC3.node_4 (DC-3)
29. v1.0 - Multi Data Center Data Replication
[ 3 Regions & 5 Replicas ]
3) Replication for geographical optimization:
each local region (Tokyo, Singapore, US, Europe = DC-1 .. DC-4) replicates to
two remote regions (e.g. local region Tokyo -> Remote-1 Singapore, Remote-2 US).
33. The Lion of Storage Systems
LeoFS: 3 Vs in 3 HIGHs
- HIGH Availability: Non-Stop
- HIGH Cost-Performance Ratio: Velocity (Low Latency, Minimum Resources)
- HIGH Scalability: Volume (Petabyte / Exabyte), Variety (Photo, Movie, Unstructured Data)
REST-API / AWS S3-API
34. Set Sail for “Cloud Storage”
Website: www.leofs.org
Twitter: @LeoFastStorage
Facebook: www.facebook.com/org.leofs