21. 21
CEPH OBJECT GATEWAY
A powerful S3- and Swift-compatible gateway that brings the power of the Ceph Object Store to modern applications
CEPH BLOCK DEVICE
A distributed virtual block device that delivers high-performance, cost-effective storage for virtual machines and legacy applications
CEPH FILESYSTEM
A distributed, scale-out filesystem with POSIX semantics that provides storage for legacy and modern applications
OBJECTS | VIRTUAL DISKS | FILES & DIRECTORIES
CEPH STORAGE CLUSTER
A reliable, easy-to-manage, next-generation distributed object store that provides storage of unstructured data for applications
22. 22
RADOS
A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS
A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD
A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS
A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW
A bucket-based REST gateway, compatible with S3 and Swift
APP | APP | HOST/VM | CLIENT
26. 26
Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Run as a small, odd number (e.g., 3 or 5)
• Do not serve stored objects to clients
OSDs:
• 10s to 10,000s in a cluster
• One per disk (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer to perform replication and recovery tasks
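To make that division of labor concrete, here is a minimal sketch using the python-rados bindings: the client contacts the monitors only to obtain the cluster map and consensus state, while object data is served by the OSDs. It assumes python-rados is installed and that a readable /etc/ceph/ceph.conf and client keyring are present on the host; names and paths are illustrative.

```python
# Minimal sketch: query cluster state through librados (python-rados).
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()                      # contacts the monitors to fetch the cluster map

stats = cluster.get_cluster_stats()    # aggregate usage reported by the OSDs
print("kB used:", stats['kb_used'], "objects:", stats['num_objects'])
print("pools:", cluster.list_pools())

cluster.shutdown()
```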
29. 29
LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
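As an illustration of that direct path, here is a minimal librados sketch in Python that writes and reads an object with no HTTP gateway in between. The pool name 'mypool' is a placeholder and is assumed to exist already.

```python
# Minimal sketch: store and fetch an object directly through librados.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('mypool')                     # I/O context bound to one pool
try:
    ioctx.write_full('hello-object', b'Hello, RADOS!')   # whole-object write
    print(ioctx.read('hello-object'))                    # -> b'Hello, RADOS!'
    ioctx.set_xattr('hello-object', 'lang', b'en')       # per-object attribute
finally:
    ioctx.close()
    cluster.shutdown()
```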
32. 32
RADOS Gateway:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications
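Because the gateway speaks the S3 API, ordinary S3 client libraries can talk to it. The sketch below uses boto3 as one such client; the endpoint URL and the access/secret keys are placeholders for values you would create for an RGW user (for example with radosgw-admin).

```python
# Minimal sketch: use the RADOS Gateway through its S3-compatible API.
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',   # hypothetical RGW endpoint
    aws_access_key_id='RGW_ACCESS_KEY',           # placeholder credentials
    aws_secret_access_key='RGW_SECRET_KEY',
)

s3.create_bucket(Bucket='demo-bucket')
s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'stored via RGW')
print(s3.get_object(Bucket='demo-bucket', Key='hello.txt')['Body'].read())
```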
37. 37
RADOS Block Device:
• Storage of disk images in RADOS
• Decouples VMs from the host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux kernel (2.6.39+)
  • QEMU/KVM; native Xen coming soon
  • OpenStack, CloudStack, OpenNebula, Proxmox
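Here is a minimal sketch of those snapshot and clone features through the python-rbd bindings. It assumes python-rados and python-rbd are installed, and it uses a hypothetical pool named 'rbd' and illustrative image names.

```python
# Minimal sketch: create an RBD image, snapshot it, and make a copy-on-write clone.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')               # hypothetical pool for images
try:
    r = rbd.RBD()
    # Format-2 image with layering enabled so it can act as a clone parent.
    r.create(ioctx, 'vm-base', 4 * 1024**3,
             old_format=False, features=rbd.RBD_FEATURE_LAYERING)

    image = rbd.Image(ioctx, 'vm-base')
    try:
        image.create_snap('golden')             # point-in-time snapshot
        image.protect_snap('golden')            # protect it before cloning
    finally:
        image.close()

    # Copy-on-write clone: shares unmodified data with the 'golden' snapshot.
    r.clone(ioctx, 'vm-base', 'golden', ioctx, 'vm-clone-01',
            features=rbd.RBD_FEATURE_LAYERING)
finally:
    ioctx.close()
    cluster.shutdown()
```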
40. 40
Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem
  • Directory hierarchy
  • File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for the shared filesystem
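To see that split in practice, here is a minimal sketch using the libcephfs Python bindings: directory and file metadata operations go through an MDS, while file contents flow between the client and the OSDs. It assumes a cluster with a running MDS and python-cephfs installed; the paths are examples.

```python
# Minimal sketch: create a directory and file in Ceph FS via libcephfs.
import cephfs

fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()                                   # asks an MDS for the filesystem root
try:
    fs.mkdir('/demo', 0o755)                 # metadata-only operation (handled by the MDS)
    fd = fs.open('/demo/hello.txt', 'w', 0o644)
    fs.write(fd, b'hello from ceph fs', 0)   # file data goes to OSDs, not the MDS
    fs.close(fd)
    print(fs.stat('/demo/hello.txt'))        # owner, timestamps, mode, size
finally:
    fs.unmount()
    fs.shutdown()
```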
74. Getting Started With Ceph
Read about the latest version of Ceph.
• The latest stuff is always at http://ceph.com/get
Deploy a test cluster using ceph-deploy.
• Read the quick-start guide at http://ceph.com/qsg
Deploy a test cluster on the AWS free-tier using Juju.
• Read the guide at http://ceph.com/juju
Read the rest of the docs!
• Find docs for the latest release at http://ceph.com/docs
74
Have a working cluster up quickly.
75. Getting Involved With Ceph
Most project discussion happens on the mailing list.
• Join or view archives at http://ceph.com/list
IRC is a great place to get help (or help others!)
• Find details and historical logs at http://ceph.com/irc
The tracker manages our bugs and feature requests.
• Register and start looking around at http://ceph.com/tracker
Doc updates and suggestions are always welcome.
• Learn how to contribute docs at http://ceph.com/docwriting
75
Help build the best storage system around!
76. Ceph Cuttlefish (v0.61.x)
1. New ceph-deploy provisioning tool
2. New Chef cookbooks
3. Fully-tested packages for RHEL (in EPEL)
4. RGW authentication management API
5. RADOS pool quotas
6. New ceph df
7. RBD incremental snapshots
76
Best Ceph ever.
Hi, welcome to my talk. I’m really happy that you chose to join me for this, given your many other choices. Believe me, I’m going to tell you things that will literally tear your head off. Ok, not literally. That would be really messy.
Working through a computer means that we can store more information, and we can store it more quickly. But it also means that we’re separated from the information we’ve created.
Ceph was designed to be self-managing. Lots of distributed storage systems require operator intervention when something goes wrong.
RADOS is a distributed object store, and it’s the foundation for Ceph. On top of RADOS, the Ceph team has built three applications that allow you to store data and do fantastic things. But before we get into all of that, let’s start at the beginning of the story.
But that’s a lot to digest all at once. Let’s start with RADOS.
Remember all that meta-data we talked about in the beginning? Feels so long ago. It has to be stored somewhere! Something has to keep track of who created files, when they were created, and who has the right to access them. And something has to remember where they live within a tree. Enter MDS, the Ceph Metadata Server. Clients accessing Ceph FS data first make a request to an MDS, which provides what they need to get files from the right OSDs.
If you aren’t running Ceph FS, you don’t need to deploy metadata servers.
So now that you know what Ceph is, I’m going to tell you what makes it different.
All of that metadata for Ceph FS has to be stored somewhere. It’s a giant diary, keeping track of where everything is and who owns it.
MDSs store all of their data within RADOS itself, but there’s still a problem…
There are multiple MDSs!
So how do you have one tree and multiple servers?
If there’s just one MDS (which is a terrible idea), it manages metadata for the entire tree.
When the second one comes along, it will intelligently partition the work by taking a subtree.
When the third MDS arrives, it will attempt to split the tree again.
Same with the fourth.
An MDS can even take just a single directory or file, if the load is high enough. This all happens dynamically based on load and the structure of the data, and it’s called “dynamic subtree partitioning”.
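The following is only a toy sketch of that idea, not Ceph’s actual MDS code: each rank owns a set of subtrees, and when a new rank joins, the hottest subtree migrates to it. The class names, load metric, and paths are invented for illustration.

```python
# Toy illustration of dynamic subtree partitioning (not Ceph's implementation).
from dataclasses import dataclass, field

@dataclass
class Subtree:
    path: str
    load: float                     # e.g. recent metadata ops per second (invented metric)

@dataclass
class MDSRank:
    name: str
    subtrees: list = field(default_factory=list)

    def total_load(self):
        return sum(s.load for s in self.subtrees)

def add_rank(ranks, new_rank):
    """When a new MDS rank joins, hand it the single hottest subtree."""
    donor = max(ranks, key=lambda r: r.total_load())
    hottest = max(donor.subtrees, key=lambda s: s.load)
    donor.subtrees.remove(hottest)
    new_rank.subtrees.append(hottest)
    ranks.append(new_rank)

# One MDS owning everything, then a second joins and takes the busy subtree.
ranks = [MDSRank('mds.a', [Subtree('/', 5.0), Subtree('/home', 80.0), Subtree('/var', 20.0)])]
add_rank(ranks, MDSRank('mds.b'))
for r in ranks:
    print(r.name, [s.path for s in r.subtrees])   # mds.a: ['/', '/var'], mds.b: ['/home']
```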