SlideShare una empresa de Scribd logo
1 de 80
Descargar para leer sin conexión
Experiences building a distributed
shared-log on RADOS
Noah Watkins
UC Santa Cruz
@noahdesu
2
● Graduate student
● UC Santa Cruz
● Data management,
file systems, HPC,
QoS
● Ceph as a
prototyping
platform
About me
● Graduate student
● UC Santa Cruz
3
● Graduate student
● UC Santa Cruz
● Data management,
file systems, HPC,
QoS
● Ceph as a
prototyping
platform
About me
● Graduate student
● UC Santa Cruz
● Data management
● Storage systems
● High-performance computing
● Quality of service
4
● Graduate student
● UC Santa Cruz
● Data management,
file systems, HPC,
QoS
● Ceph as a
prototyping
platform
About me
● Graduate student
● UC Santa Cruz
● Data management
● Storage systems
● High-performance computing
● Quality of service
● Ceph shop for prototyping
Ceph storage interfaces, a familiar sight...
5
RADOS
LIBRADOS RADOSGW RBD CEPHFS
Our research: programmability in storage
6
RADOS
LIBRADOS RADOSGW RBD CEPHFSMatrix Log
Our research: programmability in storage
7
RADOS
LIBRADOS RADOSGW RBD CEPHFSMatrix Log
The agenda
● Why should I care about logs?
● How to build a really, really fast log
● Mapping techniques for fast logs onto Ceph
● A few usage examples
8
Why you should care about logs
9
A log is an ordered list of data elements
● General abstract form of a log
● Ordered list of elements
10
0 1 2 3 4 5 6 7 8 9 10 11 12
A log is an ordered list of data elements
● General abstract form of a log
● Ordered list of elements
● New entries are appended to the log
11
0 1 2 3 4 5 6 7 8 9 10 11 12
clients
append
A log is an ordered list of data elements
● General abstract form of a log
● Ordered list of elements
● New entries are appended to the log
● Log entries can be read directly
12
0 1 2 3 4 5 6 7 8 9 10 11 12
clients
append
Everyone is going bananas for logs
13
● Very high throughput logs
● No global ordering
● Strongly consistent, global ordering
● Write batching
● Single writer
Apache
BookKeeper
Strongly consistent shared-log
14
0 1 2 3 4 5 6 7 8 9 10 11 12
Database management systems Replicated state machines Cloud-based metadata services
Strongly consistent, high-performance shared-log
15
0 1 2 3 4 5 6 7 8 9 10 11 12
Database management systems Replicated state machines Cloud-based metadata services
Shared-logs are challenging to scale
Balakrishnan et al., “Tango: Distributed Data Structures over a Shared Log”, SOSP ‘13
16
Shared-logs are challenging to scale
Balakrishnan et al., “Tango: Distributed Data Structures over a Shared Log”, SOSP ‘13
17
Shared-logs are challenging to scale
clients
0 1
2
Paxos
append
18
0 1 2 3 4 5 6 7 8 9 10 11 12
Shared-logs are challenging to scale
clients
0 1
2
Paxos
append
Writes are
funneled through
a total ordering
engine
19
0 1 2 3 4 5 6 7 8 9 10 11 12
Shared-logs are challenging to scale
clients
0 1
2
Paxos
append
Writes are
funneled through
a total ordering
engine
20
0 1 2 3 4 5 6 7 8 9 10 11 12
Parallel reads
supported when
log is distributed
Random read
workloads perform
poorly on HDDs
CORFU: A Shared Log Design for Flash Clusters
21
Balakrishnan, et. al, NSDI 2011
CORFU decouples I/O from ordering
22
clients
1, 2, 3, 4...
0 1 2 3 4 5 6 7 8 9 10 11 12
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
reads
CORFU decouples I/O from ordering
23
clients
1, 2, 3, 4...
0 1 2 3 4 5 6 7 8 9 10 11 12
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
100K-1M pos/sec
reads
CORFU decouples I/O from ordering
24
clients
1, 2, 3, 4...
0 1 2 3 4 5 6 7 8 9 10 11 12
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
100K-1M pos/sec
parallel appendreads
CORFU stripes log across a cluster of flash devices
25
clients
1, 2, 3, 4...
0 1 2 3 4 5 6 7 8 9 10 11 12
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
100K-1M pos/sec
parallel appendreads
You might be thinking...
clients
1, 2, 3, 4, 5, ….
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
read append Decouple append I/O from
assignment of ordering● It can’t possibly be this simple....
26
100K-1M pos/sec
You might be thinking...
clients
1, 2, 3, 4, 5, ….
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
read append Decouple append I/O from
assignment of ordering● It can’t possibly be this simple.... you’d be correct
27
100K-1M pos/sec
You might be thinking...
clients
1, 2, 3, 4, 5, ….
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
read append Decouple append I/O from
assignment of ordering● It can’t possibly be this simple.... you’d be correct
● Isn’t this supposed to be a Ceph talk?
28
100K-1M pos/sec
The CORFU I/O path
LogInterface(e.g.Append)
29
Cluster of flash devices
Append( ) ??
CORFU uses a cluster map and striping function
LogInterface(e.g.Append)
30
Cluster of flash devices
Append( )
Striping and
Addressing
CORFU replicates log entries across the cluster
LogInterface(e.g.Append)
31
Cluster of flash devices
Append( )
Striping and
Addressing
Fault tolerance
Replication
The CORFU protocol relies on custom I/O interface
LogInterface(e.g.Append)
32
Cluster of flash devices
Append( )
Striping and
Addressing
Fault tolerance
Custom
hardware
interface
Replication
A dedicated CORFU cluster is expensive...
● Duplicated services
○ Fault-tolerance
○ Metadata management
● Dedicated / over-provisioned hardware
○ You need a full cluster of flash devices!
● Custom storage interfaces
○ Hardware or software
● Already have a Ceph cluster?
○ Too bad
● Can we use software-defined storage?
33
Mapping the components of CORFU onto Ceph
LogInterface(e.g.Append)
34
Cluster of flash devices
Append( )
Striping and
Addressing
Fault tolerance
Custom
hardware
interface
Replication
A cluster of OSDs rather than raw storage devices
LogInterface(e.g.Append)
35
Cluster of OSDs
Append( )
Striping and
Addressing
Fault tolerance
Custom
hardware
interface
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
Ceph handles fault-tolerance transparently
LogInterface(e.g.Append)
36
Cluster of OSDs
Append( )
Striping and
Addressing
Fault tolerance
Custom
hardware
interface
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
Replication
Log distribution becomes object naming issue
LogInterface(e.g.Append)
37
Append( )
Striping and
Addressing
Object naming
Custom
hardware
interface
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
Replication
Cluster of OSDs
Storage interface built using RADOS object classes
LogInterface(e.g.Append)
38
Append( )
Striping and
Addressing
Object naming
RADOS
object
class
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
OSD OSD OSD
Replication
Cluster of OSDs
ZLog is an implementation of CORFU on Ceph
osd osd osd
39
clients
1, 2, 3, 4...
0 1 2 3 4 5 6 7 8 9 10 11 12
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
100K-1M pos/sec
parallel appendreads
BlueStore RocksDBLMDB
ZLog is an implementation of CORFU on Ceph
osd osd osd
40
clients
1, 2, 3, 4...
0 1 2 3 4 5 6 7 8 9 10 11 12
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
100K-1M pos/sec
parallel appendreads
BlueStore RocksDBLMDB
Transparent
changes to
hardware,
software, tuning
ZLog is an implementation of CORFU on Ceph
osd osd osd
41
clients
1, 2, 3, 4...
0 1 2 3 4 5 6 7 8 9 10 11 12
RPC: Tail, Next
std::atomic<u64> seq;
tail = seq;
next = seq++;
Sequencer
100K-1M pos/sec
parallel appendreads
BlueStore RocksDBLMDB
Transparent
changes to
hardware,
software, tuning
Integration with
features like
tiering, erasure
coding, and I/O
hinting
The ZLog project is open-source on Github
● https://github.com/cruzdb/zlog
● Development backend (LMDB)
● Tools to build Ceph plugin
● High and low-level APIs
42
// build a backend instance
auto backend = std::unique_ptr<zlog::Backend>(
new zlog::storage::ceph::CephBackend(ioctx));
// open the log
zlog::Log log;
int ret = zlog::Log::CreateWithBackend(std::move(backend),
"mylog", &log);
// append to the log
uint64_t pos;
log.Append(Slice(“foo”), &pos);
The ZLog project is open-source on Github
● https://github.com/cruzdb/zlog
● Development backend (LMDB)
● Tools to build Ceph plugin
● High and low-level APIs
43
// build a backend instance
auto backend = std::unique_ptr<zlog::Backend>(
new zlog::storage::ceph::CephBackend(ioctx));
// open the log
zlog::Log log;
int ret = zlog::Log::CreateWithBackend(std::move(backend),
"mylog", &log);
// append to the log
uint64_t pos;
log.Append(Slice(“foo”), &pos);
The ZLog project is open-source on Github
● https://github.com/cruzdb/zlog
● Development backend (LMDB)
● Tools to build Ceph plugin
● High and low-level APIs
44
// build a backend instance
auto backend = std::unique_ptr<zlog::Backend>(
new zlog::storage::ceph::CephBackend(ioctx));
// open the log
zlog::Log log;
int ret = zlog::Log::CreateWithBackend(std::move(backend),
"mylog", &log);
// append to the log
uint64_t pos;
log.Append(Slice(“foo”), &pos);
The ZLog project is open-source on Github
● https://github.com/cruzdb/zlog
● Development backend (LMDB)
● Tools to build Ceph plugin
● High and low-level APIs
45
// build a backend instance
auto backend = std::unique_ptr<zlog::Backend>(
new zlog::storage::ceph::CephBackend(ioctx));
// open the log
zlog::Log log;
int ret = zlog::Log::CreateWithBackend(std::move(backend),
"mylog", &log);
// append to the log
uint64_t pos;
log.Append(Slice(“foo”), &pos);
Building applications on RADOS using
object classes
46
Design decisions for the ZLog I/O path
47
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
Design decisions for the ZLog I/O path
48
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
● Log entry → Object
Design decisions for the ZLog I/O path
49
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
● Log entry → Object
● Stripe log across objects
Design decisions for the ZLog I/O path
50
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
● Log entry → Object
● Stripe log across objects
● … but not too many objects
Design decisions for the ZLog I/O path
51
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
● Log entry → Object
● Stripe log across objects
● … but not too many objects
● Changes to striping policy
Design decisions for the ZLog I/O path
52
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
● Log entry → Object
● Stripe log across objects
● … but not too many objects
● Changes to striping policy
● Coordinated metadata update
Design decisions for the ZLog I/O path
53
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
● Log entry → Object
● Stripe log across objects
● … but not too many objects
● Changes to striping policy
● Coordinated metadata update
● Treat an object like a
consensus API
Design decisions for the ZLog I/O path
54
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
● Log entry → Object
● Stripe log across objects
● … but not too many objects
● Changes to striping policy
● Coordinated metadata update
● Treat an object like a
consensus API
The CORFU storage interface
● write(obj, position, data)
● read(obj, position)
● invalidate(obj, position)
● trim(obj, position)
○ Mark for GC
● seal(obj, epoch)
55
The CORFU storage interface
● write(obj, position, data)
● read(obj, position)
● invalidate(obj, position)
● trim(obj, position)
○ Mark for GC
● seal(obj, epoch)
56
The CORFU storage interface
● write(obj, position, data)
○ Write-once: position must be open
● read(obj, position)
● invalidate(obj, position)
● trim(obj, position)
○ Mark for GC
● seal(obj, epoch)
57
The CORFU storage interface
● write(obj, position, data)
○ Write-once: position must be open
○ Request must be tagged with up-to-date epoch
● read(obj, position)
● invalidate(obj, position)
● trim(obj, position)
○ Mark for GC
● seal(obj, epoch)
58
The CORFU storage interface
● write(obj, position, data)
○ Write-once: position must be open
○ Request must be tagged with up-to-date epoch
○ Position must not fall within a GC’d range
● read(obj, position)
● invalidate(obj, position)
● trim(obj, position)
○ Mark for GC
● seal(obj, epoch)
59
The CORFU storage interface
● write(obj, position, data)
○ Write-once: position must be open
○ Request must be tagged with up-to-date epoch
○ Position must not fall within a GC’d range
○ Optionally update maximum position observed
● read(obj, position)
● invalidate(obj, position)
● trim(obj, position)
○ Mark for GC
● seal(obj, epoch)
60
The CORFU storage interface
● write(obj, position, data)
○ Write-once: position must be open
○ Request must be tagged with up-to-date epoch
○ Position must not fall within a GC’d range
○ Optionally update maximum position observed
○ Atomic update
● read(obj, position)
● invalidate(obj, position)
● trim(obj, position)
○ Mark for GC
● seal(obj, epoch)
61
librados won’t [currently] cut it for this job
● write(obj, position, data)
○ Write-once: position must be open
○ Request must be tagged with up-to-date epoch
○ Position must not fall within a GC’d range
○ Optionally update maximum position observed
○ Atomic update
● read(obj, position)
● invalidate(obj, position)
● trim(obj, position)
○ Mark for GC
● seal(obj, epoch)
62
CORFU write interface as an object class
63
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
int write(context_t hctx, bufferlist *in, bufferlist *out) {
// ensure up-to-date client
if (epoch_guard(header, op.epoch(), false)) {
return -ESPIPE;
}
// read entry based on request position
ceph::bufferlist bl;
int ret = cls_cxx_map_get_val(hctx, key, &bl);
if (ret < 0)
return ret;
if (entry && !decode(bl, entry))
return -EIO;
return 0;
// write entry if position is not taken
if (ret == -ENOENT) {
entry_write_entry(hctx, key, entry);
if (!header.has_max_pos() || (op.pos() > header.max_pos()))
header.set_max_pos(op.pos());
ret = entry_write_header(hctx, header);
} else {
// handle errors associated with write-once semantics
}
}
CORFU write interface as an object class
64
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
int write(context_t hctx, bufferlist *in, bufferlist *out) {
// ensure up-to-date client
if (epoch_guard(header, op.epoch(), false)) {
return -ESPIPE;
}
// read entry based on request position
ceph::bufferlist bl;
int ret = cls_cxx_map_get_val(hctx, key, &bl);
if (ret < 0)
return ret;
if (entry && !decode(bl, entry))
return -EIO;
return 0;
// write entry if position is not taken
if (ret == -ENOENT) {
entry_write_entry(hctx, key, entry);
if (!header.has_max_pos() || (op.pos() > header.max_pos()))
header.set_max_pos(op.pos());
ret = entry_write_header(hctx, header);
} else {
// handle errors associated with write-once semantics
}
}
connection to client
CORFU write interface as an object class
65
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
int write(context_t hctx, bufferlist *in, bufferlist *out) {
// ensure up-to-date client
if (epoch_guard(header, op.epoch(), false)) {
return -ESPIPE;
}
// read entry based on request position
ceph::bufferlist bl;
int ret = cls_cxx_map_get_val(hctx, key, &bl);
if (ret < 0)
return ret;
if (entry && !decode(bl, entry))
return -EIO;
return 0;
// write entry if position is not taken
if (ret == -ENOENT) {
entry_write_entry(hctx, key, entry);
if (!header.has_max_pos() || (op.pos() > header.max_pos()))
header.set_max_pos(op.pos());
ret = entry_write_header(hctx, header);
} else {
// handle errors associated with write-once semantics
}
}
connection to client
CORFU write interface as an object class
66
RADOS object classes
CORFU storage interface
OSD
librados
ZLog client library
Striping policy
int write(context_t hctx, bufferlist *in, bufferlist *out) {
// ensure up-to-date client
if (epoch_guard(header, op.epoch(), false)) {
return -ESPIPE;
}
// read entry based on request position
ceph::bufferlist bl;
int ret = cls_cxx_map_get_val(hctx, key, &bl);
if (ret < 0)
return ret;
if (entry && !decode(bl, entry))
return -EIO;
return 0;
// write entry if position is not taken
if (ret == -ENOENT) {
entry_write_entry(hctx, key, entry);
if (!header.has_max_pos() || (op.pos() > header.max_pos()))
header.set_max_pos(op.pos());
ret = entry_write_header(hctx, header);
} else {
// handle errors associated with write-once semantics
}
}
connection to client
Getting started with object classes
1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc)
a. Super well documented, easy examples
67
Getting started with object classes
1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc)
a. Super well documented, easy examples
2. Build and load the plugin into Ceph
68
Getting started with object classes
1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc)
a. Super well documented, easy examples
2. Build and load the plugin into Ceph
3. Old way to manage plugins
a. Manage your own Ceph fork
69
Getting started with object classes
1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc)
a. Super well documented, easy examples
2. Build and load the plugin into Ceph
3. Old way to manage plugins
a. Manage your own Ceph fork
4. New way to manage plugins with SDK (credit: Neha Ojha @ b7215b0)
a. dnf install rados-objclass-devel
b. #include <rados/objclass.h>
c. compile cls_hello.cc as object library
70
Getting started with object classes
1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc)
a. Super well documented, easy examples
2. Build and load the plugin into Ceph
3. Old way to manage plugins
a. Manage your own Ceph fork
4. New way to manage plugins with SDK (credit: Neha Ojha @ b7215b0)
a. dnf install rados-objclass-devel
b. #include <rados/objclass.h>
c. compile cls_hello.cc as object library
5. Distribute your plugin to OSDs at <libdir>/rados-classes/cls_hello.so
6. Starting making requests ioctx::exec(“hello”, …)
71
The CORFU seal(obj, epoch) interface is tougher
● Semantics are do the following atomically
○ Verify and update the stored epoch
○ Return the largest position written
● RADOS object classes don’t support this mix of ops
○ Currently
● Solution
○ Split the operation
○ Verify
● Luckily this isn’t a fast path
72
How we have been using ZLog
73
PostgreSQL logical replication with ZLog
74
0 1 2 3 4 5 6 7 8 9 10 11 12
osd osd osd
INSERT UPDATE DELETEReplay
Replay
Replay
Replay
Replay
C C C C
Queries
ZLog
https://github.com/cruzdb/pg_zlog
CruzDB is an immutable key-value store on ZLog
75
0 1 2 3 4 5 6 7 8 9 10 11 12
osd osd osd
GET PUT DELETEReplay
Replay
Replay
Replay
Replay
C C C C
Key/Value Transactions
ZLog
https://github.com/cruzdb/cruzdb
CruzDB is an immutable key-value store on ZLog
76
0 1 2 3 4 5 6 7 8 9 10 11 12
osd osd osd
GET PUT DELETEReplay
Replay
Replay
Replay
Replay
C C C C
Key/Value Transactions
ZLog
Database is a
copy-on-write
red-black tree
https://github.com/cruzdb/cruzdb
CruzDB is an immutable key-value store on ZLog
77
0 1 2 3 4 5 6 7 8 9 10 11 12
osd osd osd
GET PUT DELETEReplay
Replay
Replay
Replay
Replay
C C C C
Key/Value Transactions
ZLog
Database is a
copy-on-write
red-black tree
Efficient read-only
snapshots
https://github.com/cruzdb/cruzdb
CruzDB is an immutable key-value store on ZLog
78
0 1 2 3 4 5 6 7 8 9 10 11 12
osd osd osd
GET PUT DELETEReplay
Replay
Replay
Replay
Replay
C C C C
Key/Value Transactions
ZLog
Database is a
copy-on-write
red-black tree
Efficient read-only
snapshotsTime travel
capabilities
https://github.com/cruzdb/cruzdb
Wrapping up
● We use Ceph as a prototyping system
○ Application-specific interfaces
● There is a broad range of interests in log-oriented interfaces
● We’ve built a Ceph-based implementation of a high-performance log
○ ZLog @ https://github.com/cruzdb/zlog
● Using it for several real-world use cases
○ PostgreSQL logical replication (https://github.com/cruzdb/pg_zlog)
○ CruzDB immutable database (https://github.com/cruzdb/cruzdb)
79
Thank you and questions
● Noah Watkins (@noahdesu)
● Learning resources
○ Object class SDK documentation
■ http://docs.ceph.com/docs/master/rados/api/objclass-sdk/
○ ZLog is the only user of the SDK I’m aware of
■ https://github.com/cruzdb/zlog/blob/master/src/storage/ceph/cls_zlog.cc
○ The Ceph tree contains a ton of object classes for reference
■ https://github.com/ceph/ceph/tree/master/src/cls
○ Writing object classes using Lua
■ https://ceph.com/geen-categorie/dynamic-object-interfaces-with-lua/
■ https://nwat.xyz/blog/2018/03/13/video-of-my-talk-at-lua-workshop-2017/
80

Más contenido relacionado

La actualidad más candente

Everything you wanted to know about RadosGW - Orit Wasserman, Matt Benjamin
Everything you wanted to know about RadosGW - Orit Wasserman, Matt BenjaminEverything you wanted to know about RadosGW - Orit Wasserman, Matt Benjamin
Everything you wanted to know about RadosGW - Orit Wasserman, Matt BenjaminCeph Community
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
 
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John HaanBasic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John HaanCeph Community
 
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu ChaiRADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu ChaiCeph Community
 
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangCeph Community
 
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...Ceph Community
 
Designing for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleDesigning for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleJames Saint-Rossy
 
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang HuiStor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang HuiCeph Community
 
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...Ceph Community
 
Unified readonly cache for ceph
Unified readonly cache for cephUnified readonly cache for ceph
Unified readonly cache for cephzhouyuan
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightColleen Corrice
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph Community
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Divejoshdurgin
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsColleen Corrice
 
Build a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
Build a High Available NFS Cluster Based on CephFS - Shangzhong ZhuBuild a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
Build a High Available NFS Cluster Based on CephFS - Shangzhong ZhuCeph Community
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversLinaro
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanPerformance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanCeph Community
 

La actualidad más candente (20)

Everything you wanted to know about RadosGW - Orit Wasserman, Matt Benjamin
Everything you wanted to know about RadosGW - Orit Wasserman, Matt BenjaminEverything you wanted to know about RadosGW - Orit Wasserman, Matt Benjamin
Everything you wanted to know about RadosGW - Orit Wasserman, Matt Benjamin
 
Red Hat Storage Roadmap
Red Hat Storage RoadmapRed Hat Storage Roadmap
Red Hat Storage Roadmap
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)
 
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John HaanBasic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
Basic and Advanced Analysis of Ceph Volume Backend Driver in Cinder - John Haan
 
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu ChaiRADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
 
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
 
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...
 
Designing for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleDesigning for High Performance Ceph at Scale
Designing for High Performance Ceph at Scale
 
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang HuiStor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
 
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
 
Unified readonly cache for ceph
Unified readonly cache for cephUnified readonly cache for ceph
Unified readonly cache for ceph
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer Spotlight
 
librados
libradoslibrados
librados
 
Ceph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der SterCeph for Big Science - Dan van der Ster
Ceph for Big Science - Dan van der Ster
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Dive
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
 
Build a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
Build a High Available NFS Cluster Based on CephFS - Shangzhong ZhuBuild a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
Build a High Available NFS Cluster Based on CephFS - Shangzhong Zhu
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM servers
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanPerformance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
 

Similar a Experiences building a distributed shared log on RADOS - Noah Watkins

Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockScyllaDB
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...Linaro
 
Docker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xDocker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xrkr10
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Docker, Inc.
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013dotCloud
 
Let's Containerize New York with Docker!
Let's Containerize New York with Docker!Let's Containerize New York with Docker!
Let's Containerize New York with Docker!Jérôme Petazzoni
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container EraSadayuki Furuhashi
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysisDivante
 
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...OpenShift Origin
 
Benchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesBenchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesDoKC
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Community
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205Linaro
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...VMware Tanzu
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...Yandex
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKevin Lynch
 

Similar a Experiences building a distributed shared log on RADOS - Noah Watkins (20)

Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with Sherlock
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
 
Docker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xDocker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12x
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
Let's Containerize New York with Docker!
Let's Containerize New York with Docker!Let's Containerize New York with Docker!
Let's Containerize New York with Docker!
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
 
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
 
Benchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesBenchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetes
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 

Último

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Último (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Experiences building a distributed shared log on RADOS - Noah Watkins

  • 1. Experiences building a distributed shared-log on RADOS Noah Watkins UC Santa Cruz @noahdesu
  • 2. 2 ● Graduate student ● UC Santa Cruz ● Data management, file systems, HPC, QoS ● Ceph as a prototyping platform About me ● Graduate student ● UC Santa Cruz
  • 3. 3 ● Graduate student ● UC Santa Cruz ● Data management, file systems, HPC, QoS ● Ceph as a prototyping platform About me ● Graduate student ● UC Santa Cruz ● Data management ● Storage systems ● High-performance computing ● Quality of service
  • 4. 4 ● Graduate student ● UC Santa Cruz ● Data management, file systems, HPC, QoS ● Ceph as a prototyping platform About me ● Graduate student ● UC Santa Cruz ● Data management ● Storage systems ● High-performance computing ● Quality of service ● Ceph shop for prototyping
  • 5. Ceph storage interfaces, a familiar sight... 5 RADOS LIBRADOS RADOSGW RBD CEPHFS
  • 6. Our research: programmability in storage 6 RADOS LIBRADOS RADOSGW RBD CEPHFSMatrix Log
  • 7. Our research: programmability in storage 7 RADOS LIBRADOS RADOSGW RBD CEPHFSMatrix Log
  • 8. The agenda ● Why should I care about logs? ● How to build a really, really fast log ● Mapping techniques for fast logs onto Ceph ● A few usage examples 8
  • 9. Why you should care about logs 9
  • 10. A log is an ordered list of data elements ● General abstract form of a log ● Ordered list of elements 10 0 1 2 3 4 5 6 7 8 9 10 11 12
  • 11. A log is an ordered list of data elements ● General abstract form of a log ● Ordered list of elements ● New entries are appended to the log 11 0 1 2 3 4 5 6 7 8 9 10 11 12 clients append
  • 12. A log is an ordered list of data elements ● General abstract form of a log ● Ordered list of elements ● New entries are appended to the log ● Log entries can be read directly 12 0 1 2 3 4 5 6 7 8 9 10 11 12 clients append
  • 13. Everyone is going bananas for logs 13 ● Very high throughput logs ● No global ordering ● Strongly consistent, global ordering ● Write batching ● Single writer Apache BookKeeper
  • 14. Strongly consistent shared-log 14 0 1 2 3 4 5 6 7 8 9 10 11 12 Database management systems Replicated state machines Cloud-based metadata services
  • 15. Strongly consistent, high-performance shared-log 15 0 1 2 3 4 5 6 7 8 9 10 11 12 Database management systems Replicated state machines Cloud-based metadata services
  • 16. Shared-logs are challenging to scale Balakrishnan et al., “Tango: Distributed Data Structures over a Shared Log”, SOSP ‘13 16
  • 17. Shared-logs are challenging to scale Balakrishnan et al., “Tango: Distributed Data Structures over a Shared Log”, SOSP ‘13 17
  • 18. Shared-logs are challenging to scale clients 0 1 2 Paxos append 18 0 1 2 3 4 5 6 7 8 9 10 11 12
  • 19. Shared-logs are challenging to scale clients 0 1 2 Paxos append Writes are funneled through a total ordering engine 19 0 1 2 3 4 5 6 7 8 9 10 11 12
  • 20. Shared-logs are challenging to scale clients 0 1 2 Paxos append Writes are funneled through a total ordering engine 20 0 1 2 3 4 5 6 7 8 9 10 11 12 Parallel reads supported when log is distributed Random read workloads perform poorly on HDDs
  • 21. CORFU: A Shared Log Design for Flash Clusters 21 Balakrishnan, et. al, NSDI 2011
  • 22. CORFU decouples I/O from ordering 22 clients 1, 2, 3, 4... 0 1 2 3 4 5 6 7 8 9 10 11 12 RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer reads
  • 23. CORFU decouples I/O from ordering 23 clients 1, 2, 3, 4... 0 1 2 3 4 5 6 7 8 9 10 11 12 RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer 100K-1M pos/sec reads
  • 24. CORFU decouples I/O from ordering 24 clients 1, 2, 3, 4... 0 1 2 3 4 5 6 7 8 9 10 11 12 RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer 100K-1M pos/sec parallel appendreads
  • 25. CORFU stripes log across a cluster of flash devices 25 clients 1, 2, 3, 4... 0 1 2 3 4 5 6 7 8 9 10 11 12 RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer 100K-1M pos/sec parallel appendreads
  • 26. You might be thinking... clients 1, 2, 3, 4, 5, …. RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer read append Decouple append I/O from assignment of ordering● It can’t possibly be this simple.... 26 100K-1M pos/sec
  • 27. You might be thinking... clients 1, 2, 3, 4, 5, …. RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer read append Decouple append I/O from assignment of ordering● It can’t possibly be this simple.... you’d be correct 27 100K-1M pos/sec
  • 28. You might be thinking... clients 1, 2, 3, 4, 5, …. RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer read append Decouple append I/O from assignment of ordering● It can’t possibly be this simple.... you’d be correct ● Isn’t this supposed to be a Ceph talk? 28 100K-1M pos/sec
  • 29. The CORFU I/O path LogInterface(e.g.Append) 29 Cluster of flash devices Append( ) ??
  • 30. CORFU uses a cluster map and striping function LogInterface(e.g.Append) 30 Cluster of flash devices Append( ) Striping and Addressing
  • 31. CORFU replicates log entries across the cluster LogInterface(e.g.Append) 31 Cluster of flash devices Append( ) Striping and Addressing Fault tolerance Replication
  • 32. The CORFU protocol relies on custom I/O interface LogInterface(e.g.Append) 32 Cluster of flash devices Append( ) Striping and Addressing Fault tolerance Custom hardware interface Replication
  • 33. A dedicated CORFU cluster is expensive... ● Duplicated services ○ Fault-tolerance ○ Metadata management ● Dedicated / over-provisioned hardware ○ You need a full cluster of flash devices! ● Custom storage interfaces ○ Hardware or software ● Already have a Ceph cluster? ○ Too bad ● Can we use software-defined storage? 33
  • 34. Mapping the components of CORFU onto Ceph LogInterface(e.g.Append) 34 Cluster of flash devices Append( ) Striping and Addressing Fault tolerance Custom hardware interface Replication
  • 35. A cluster of OSDs rather than raw storage devices LogInterface(e.g.Append) 35 Cluster of OSDs Append( ) Striping and Addressing Fault tolerance Custom hardware interface OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD
  • 36. Ceph handles fault-tolerance transparently LogInterface(e.g.Append) 36 Cluster of OSDs Append( ) Striping and Addressing Fault tolerance Custom hardware interface OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD Replication
  • 37. Log distribution becomes object naming issue LogInterface(e.g.Append) 37 Append( ) Striping and Addressing Object naming Custom hardware interface OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD Replication Cluster of OSDs
  • 38. Storage interface built using RADOS object classes LogInterface(e.g.Append) 38 Append( ) Striping and Addressing Object naming RADOS object class OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD OSD Replication Cluster of OSDs
  • 39. ZLog is an implementation of CORFU on Ceph osd osd osd 39 clients 1, 2, 3, 4... 0 1 2 3 4 5 6 7 8 9 10 11 12 RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer 100K-1M pos/sec parallel appendreads BlueStore RocksDBLMDB
  • 40. ZLog is an implementation of CORFU on Ceph osd osd osd 40 clients 1, 2, 3, 4... 0 1 2 3 4 5 6 7 8 9 10 11 12 RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer 100K-1M pos/sec parallel appendreads BlueStore RocksDBLMDB Transparent changes to hardware, software, tuning
  • 41. ZLog is an implementation of CORFU on Ceph osd osd osd 41 clients 1, 2, 3, 4... 0 1 2 3 4 5 6 7 8 9 10 11 12 RPC: Tail, Next std::atomic<u64> seq; tail = seq; next = seq++; Sequencer 100K-1M pos/sec parallel appendreads BlueStore RocksDBLMDB Transparent changes to hardware, software, tuning Integration with features like tiering, erasure coding, and I/O hinting
  • 42. The ZLog project is open-source on Github ● https://github.com/cruzdb/zlog ● Development backend (LMDB) ● Tools to build Ceph plugin ● High and low-level APIs 42 // build a backend instance auto backend = std::unique_ptr<zlog::Backend>( new zlog::storage::ceph::CephBackend(ioctx)); // open the log zlog::Log log; int ret = zlog::Log::CreateWithBackend(std::move(backend), "mylog", &log); // append to the log uint64_t pos; log.Append(Slice(“foo”), &pos);
  • 43. The ZLog project is open-source on Github ● https://github.com/cruzdb/zlog ● Development backend (LMDB) ● Tools to build Ceph plugin ● High and low-level APIs 43 // build a backend instance auto backend = std::unique_ptr<zlog::Backend>( new zlog::storage::ceph::CephBackend(ioctx)); // open the log zlog::Log log; int ret = zlog::Log::CreateWithBackend(std::move(backend), "mylog", &log); // append to the log uint64_t pos; log.Append(Slice(“foo”), &pos);
  • 44. The ZLog project is open-source on Github ● https://github.com/cruzdb/zlog ● Development backend (LMDB) ● Tools to build Ceph plugin ● High and low-level APIs 44 // build a backend instance auto backend = std::unique_ptr<zlog::Backend>( new zlog::storage::ceph::CephBackend(ioctx)); // open the log zlog::Log log; int ret = zlog::Log::CreateWithBackend(std::move(backend), "mylog", &log); // append to the log uint64_t pos; log.Append(Slice(“foo”), &pos);
  • 45. The ZLog project is open-source on Github ● https://github.com/cruzdb/zlog ● Development backend (LMDB) ● Tools to build Ceph plugin ● High and low-level APIs 45 // build a backend instance auto backend = std::unique_ptr<zlog::Backend>( new zlog::storage::ceph::CephBackend(ioctx)); // open the log zlog::Log log; int ret = zlog::Log::CreateWithBackend(std::move(backend), "mylog", &log); // append to the log uint64_t pos; log.Append(Slice(“foo”), &pos);
  • 46. Building applications on RADOS using object classes 46
  • 47. Design decisions for the ZLog I/O path 47 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy
  • 48. Design decisions for the ZLog I/O path 48 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy ● Log entry → Object
  • 49. Design decisions for the ZLog I/O path 49 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy ● Log entry → Object ● Stripe log across objects
  • 50. Design decisions for the ZLog I/O path 50 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy ● Log entry → Object ● Stripe log across objects ● … but not too many objects
  • 51. Design decisions for the ZLog I/O path 51 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy ● Log entry → Object ● Stripe log across objects ● … but not too many objects ● Changes to striping policy
  • 52. Design decisions for the ZLog I/O path 52 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy ● Log entry → Object ● Stripe log across objects ● … but not too many objects ● Changes to striping policy ● Coordinated metadata update
  • 53. Design decisions for the ZLog I/O path 53 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy ● Log entry → Object ● Stripe log across objects ● … but not too many objects ● Changes to striping policy ● Coordinated metadata update ● Treat an object like a consensus API
  • 54. Design decisions for the ZLog I/O path 54 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy ● Log entry → Object ● Stripe log across objects ● … but not too many objects ● Changes to striping policy ● Coordinated metadata update ● Treat an object like a consensus API
  • 55. The CORFU storage interface ● write(obj, position, data) ● read(obj, position) ● invalidate(obj, position) ● trim(obj, position) ○ Mark for GC ● seal(obj, epoch) 55
  • 56. The CORFU storage interface ● write(obj, position, data) ● read(obj, position) ● invalidate(obj, position) ● trim(obj, position) ○ Mark for GC ● seal(obj, epoch) 56
  • 57. The CORFU storage interface ● write(obj, position, data) ○ Write-once: position must be open ● read(obj, position) ● invalidate(obj, position) ● trim(obj, position) ○ Mark for GC ● seal(obj, epoch) 57
  • 58. The CORFU storage interface ● write(obj, position, data) ○ Write-once: position must be open ○ Request must be tagged with up-to-date epoch ● read(obj, position) ● invalidate(obj, position) ● trim(obj, position) ○ Mark for GC ● seal(obj, epoch) 58
  • 59. The CORFU storage interface ● write(obj, position, data) ○ Write-once: position must be open ○ Request must be tagged with up-to-date epoch ○ Position must not fall within a GC’d range ● read(obj, position) ● invalidate(obj, position) ● trim(obj, position) ○ Mark for GC ● seal(obj, epoch) 59
  • 60. The CORFU storage interface ● write(obj, position, data) ○ Write-once: position must be open ○ Request must be tagged with up-to-date epoch ○ Position must not fall within a GC’d range ○ Optionally update maximum position observed ● read(obj, position) ● invalidate(obj, position) ● trim(obj, position) ○ Mark for GC ● seal(obj, epoch) 60
  • 61. The CORFU storage interface ● write(obj, position, data) ○ Write-once: position must be open ○ Request must be tagged with up-to-date epoch ○ Position must not fall within a GC’d range ○ Optionally update maximum position observed ○ Atomic update ● read(obj, position) ● invalidate(obj, position) ● trim(obj, position) ○ Mark for GC ● seal(obj, epoch) 61
  • 62. librados won’t [currently] cut it for this job ● write(obj, position, data) ○ Write-once: position must be open ○ Request must be tagged with up-to-date epoch ○ Position must not fall within a GC’d range ○ Optionally update maximum position observed ○ Atomic update ● read(obj, position) ● invalidate(obj, position) ● trim(obj, position) ○ Mark for GC ● seal(obj, epoch) 62
  • 63. CORFU write interface as an object class 63 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy int write(context_t hctx, bufferlist *in, bufferlist *out) { // ensure up-to-date client if (epoch_guard(header, op.epoch(), false)) { return -ESPIPE; } // read entry based on request position ceph::bufferlist bl; int ret = cls_cxx_map_get_val(hctx, key, &bl); if (ret < 0) return ret; if (entry && !decode(bl, entry)) return -EIO; return 0; // write entry if position is not taken if (ret == -ENOENT) { entry_write_entry(hctx, key, entry); if (!header.has_max_pos() || (op.pos() > header.max_pos())) header.set_max_pos(op.pos()); ret = entry_write_header(hctx, header); } else { // handle errors associated with write-once semantics } }
  • 64. CORFU write interface as an object class 64 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy int write(context_t hctx, bufferlist *in, bufferlist *out) { // ensure up-to-date client if (epoch_guard(header, op.epoch(), false)) { return -ESPIPE; } // read entry based on request position ceph::bufferlist bl; int ret = cls_cxx_map_get_val(hctx, key, &bl); if (ret < 0) return ret; if (entry && !decode(bl, entry)) return -EIO; return 0; // write entry if position is not taken if (ret == -ENOENT) { entry_write_entry(hctx, key, entry); if (!header.has_max_pos() || (op.pos() > header.max_pos())) header.set_max_pos(op.pos()); ret = entry_write_header(hctx, header); } else { // handle errors associated with write-once semantics } } connection to client
  • 65. CORFU write interface as an object class 65 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy int write(context_t hctx, bufferlist *in, bufferlist *out) { // ensure up-to-date client if (epoch_guard(header, op.epoch(), false)) { return -ESPIPE; } // read entry based on request position ceph::bufferlist bl; int ret = cls_cxx_map_get_val(hctx, key, &bl); if (ret < 0) return ret; if (entry && !decode(bl, entry)) return -EIO; return 0; // write entry if position is not taken if (ret == -ENOENT) { entry_write_entry(hctx, key, entry); if (!header.has_max_pos() || (op.pos() > header.max_pos())) header.set_max_pos(op.pos()); ret = entry_write_header(hctx, header); } else { // handle errors associated with write-once semantics } } connection to client
  • 66. CORFU write interface as an object class 66 RADOS object classes CORFU storage interface OSD librados ZLog client library Striping policy int write(context_t hctx, bufferlist *in, bufferlist *out) { // ensure up-to-date client if (epoch_guard(header, op.epoch(), false)) { return -ESPIPE; } // read entry based on request position ceph::bufferlist bl; int ret = cls_cxx_map_get_val(hctx, key, &bl); if (ret < 0) return ret; if (entry && !decode(bl, entry)) return -EIO; return 0; // write entry if position is not taken if (ret == -ENOENT) { entry_write_entry(hctx, key, entry); if (!header.has_max_pos() || (op.pos() > header.max_pos())) header.set_max_pos(op.pos()); ret = entry_write_header(hctx, header); } else { // handle errors associated with write-once semantics } } connection to client
  • 67. Getting started with object classes 1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc) a. Super well documented, easy examples 67
  • 68. Getting started with object classes 1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc) a. Super well documented, easy examples 2. Build and load the plugin into Ceph 68
  • 69. Getting started with object classes 1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc) a. Super well documented, easy examples 2. Build and load the plugin into Ceph 3. Old way to manage plugins a. Manage your own Ceph fork 69
  • 70. Getting started with object classes 1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc) a. Super well documented, easy examples 2. Build and load the plugin into Ceph 3. Old way to manage plugins a. Manage your own Ceph fork 4. New way to manage plugins with SDK (credit: Neha Ojha @ b7215b0) a. dnf install rados-objclass-devel b. #include <rados/objclass.h> c. compile cls_hello.cc as object library 70
  • 71. Getting started with object classes 1. The hello world object class (in Ceph: src/cls/hello/cls_hello.cc) a. Super well documented, easy examples 2. Build and load the plugin into Ceph 3. Old way to manage plugins a. Manage your own Ceph fork 4. New way to manage plugins with SDK (credit: Neha Ojha @ b7215b0) a. dnf install rados-objclass-devel b. #include <rados/objclass.h> c. compile cls_hello.cc as object library 5. Distribute your plugin to OSDs at <libdir>/rados-classes/cls_hello.so 6. Starting making requests ioctx::exec(“hello”, …) 71
  • 72. The CORFU seal(obj, epoch) interface is tougher ● Semantics are do the following atomically ○ Verify and update the stored epoch ○ Return the largest position written ● RADOS object classes don’t support this mix of ops ○ Currently ● Solution ○ Split the operation ○ Verify ● Luckily this isn’t a fast path 72
  • 73. How we have been using ZLog 73
  • 74. PostgreSQL logical replication with ZLog 74 0 1 2 3 4 5 6 7 8 9 10 11 12 osd osd osd INSERT UPDATE DELETEReplay Replay Replay Replay Replay C C C C Queries ZLog https://github.com/cruzdb/pg_zlog
  • 75. CruzDB is an immutable key-value store on ZLog 75 0 1 2 3 4 5 6 7 8 9 10 11 12 osd osd osd GET PUT DELETEReplay Replay Replay Replay Replay C C C C Key/Value Transactions ZLog https://github.com/cruzdb/cruzdb
  • 76. CruzDB is an immutable key-value store on ZLog 76 0 1 2 3 4 5 6 7 8 9 10 11 12 osd osd osd GET PUT DELETEReplay Replay Replay Replay Replay C C C C Key/Value Transactions ZLog Database is a copy-on-write red-black tree https://github.com/cruzdb/cruzdb
  • 77. CruzDB is an immutable key-value store on ZLog 77 0 1 2 3 4 5 6 7 8 9 10 11 12 osd osd osd GET PUT DELETEReplay Replay Replay Replay Replay C C C C Key/Value Transactions ZLog Database is a copy-on-write red-black tree Efficient read-only snapshots https://github.com/cruzdb/cruzdb
  • 78. CruzDB is an immutable key-value store on ZLog 78 0 1 2 3 4 5 6 7 8 9 10 11 12 osd osd osd GET PUT DELETEReplay Replay Replay Replay Replay C C C C Key/Value Transactions ZLog Database is a copy-on-write red-black tree Efficient read-only snapshotsTime travel capabilities https://github.com/cruzdb/cruzdb
  • 79. Wrapping up ● We use Ceph as a prototyping system ○ Application-specific interfaces ● There is a broad range of interests in log-oriented interfaces ● We’ve built a Ceph-based implementation of a high-performance log ○ ZLog @ https://github.com/cruzdb/zlog ● Using it for several real-world use cases ○ PostgreSQL logical replication (https://github.com/cruzdb/pg_zlog) ○ CruzDB immutable database (https://github.com/cruzdb/cruzdb) 79
  • 80. Thank you and questions ● Noah Watkins (@noahdesu) ● Learning resources ○ Object class SDK documentation ■ http://docs.ceph.com/docs/master/rados/api/objclass-sdk/ ○ ZLog is the only user of the SDK I’m aware of ■ https://github.com/cruzdb/zlog/blob/master/src/storage/ceph/cls_zlog.cc ○ The Ceph tree contains a ton of object classes for reference ■ https://github.com/ceph/ceph/tree/master/src/cls ○ Writing object classes using Lua ■ https://ceph.com/geen-categorie/dynamic-object-interfaces-with-lua/ ■ https://nwat.xyz/blog/2018/03/13/video-of-my-talk-at-lua-workshop-2017/ 80