3.
•Ceph in <30s
•Ceph, a little bit more
•Ceph in the wild
•Orchestration
•Community status
•What’s Next?
•Questions
The plan, Stan
Welcome!
4.
On commodity hardware
Ceph can run on any infrastructure, metal or virtualized, to provide a cheap and powerful storage cluster.
Object, block, and file
Low overhead doesn't mean just hardware; it means people too!
Awesomesauce
An infrastructure-aware placement algorithm lets you do really cool stuff.
Huge and beyond
Designed for exabyte scale; current deployments are in the multi-petabyte range. HPC, Big Data, Cloud, raw storage.
…besides wicked-awesome?
What is Ceph?
Software All-in-1 CRUSH Scale
5.
Find out more!
Ceph.com
…but you can find out more
Use it today
Dreamhost.com/cloud/DreamObjects
Get Support
Inktank.com
That WAS fast
6.
OBJECTS VIRTUAL DISKS FILES & DIRECTORIES
CEPH FILE SYSTEM
A distributed, scale-out filesystem with POSIX semantics that provides storage for legacy and modern applications
CEPH GATEWAY
A powerful S3- and Swift-compatible gateway that brings the power of the Ceph Object Store to modern applications
CEPH BLOCK DEVICE
A distributed virtual block device that delivers high-performance, cost-effective storage for virtual machines and legacy applications
CEPH OBJECT STORE
A reliable, easy-to-manage, next-generation distributed object store that provides storage of unstructured data for applications
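All three interfaces sit on top of the object store (RADOS), which applications can also talk to directly through librados. A minimal sketch using the Python librados bindings; the conffile path and the 'data' pool name are assumptions for illustration:

```python
import rados

# Connect using a local cluster config; the conffile path and the
# 'data' pool name are placeholders for your environment.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('data')

ioctx.write_full('greeting', b'Hello, RADOS!')  # store an object
print(ioctx.read('greeting'))                   # read it back

ioctx.close()
cluster.shutdown()
```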
19.
Object && Block
Via RBD and RGW (Swift API)
Our BFF
Identity
Via Keystone
More coming!
Work continues with updates in Havana and Icehouse.
OpenStack
20.
Block
Alternate primary and secondary storage
Community maintained
Community
Wido from 42on.com
More coming in 4.2!
Snapshot & backup support
Cloning (layering) support
No NFS for system VMs
Secondary/Backup storage (S3)
CloudStack
21.
A blatant ripoff!
Primary Storage Flow
•The mgmt server never talks to the Ceph cluster
•One mgmt server can manage 1000s of hypervisors
•Mgmt server can be clustered
•Multiple Ceph clusters/pools can be added to a CloudStack cluster
22.
A pretty package
A commercially packaged OpenStack solution backed by Ceph.
RADOS for Archipelago
A virtual server management tool on top of Xen or KVM.
RBD backed
Complete virtualization management with KVM and containers.
BBC territory
Talk next week in
Berlin
So many delicious flavors
Other Cloud
SUSE Cloud Ganeti Proxmox OpenNebula
23.
Since 2.6.35
Kernel clients for RBD and CephFS. Active development as a Linux file system.
iSCSI ahoy!
One of the Linux iSCSI target frameworks. Emulates: SBC (disk), SMC (jukebox), MMC (CD/DVD), SSC (tape), OSD.
Getting creative
A creative community member used Ceph to back their VMware infrastructure via Fibre Channel.
You can always use more friends
Project Intersection
Kernel STGT VMware
Love me!
Slightly out of date. Some work has been done, but it could use some love.
Wireshark
24.
CephFS
CephFS can serve as a drop-in replacement for HDFS.
Upstream
The Ceph VFS module is upstream in Samba.
CephFS or RBD
Re-exporting CephFS or RBD for NFS/CIFS.
MOAR projects
Project Intersection
Hadoop Samba Ganesha
Recently Open Source
A commercially supported product from Citrix, recently open sourced. Still a bit of a tech preview.
XenServer
25.
Support for libvirt
XenServer can manipulate Ceph!
Don’t let the naming fool you, it’s easy
Blktap{2,3,asplode}
Qemu; new boss, same as the old boss (but not really)
What’s in a name?
Ceph :: XenServer :: Libvirt
Block device :: VDI :: storage vol
Pool :: Storage Repo :: storage pool
Doing it with Xen*
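To ground the libvirt column of that mapping, below is a hedged sketch of defining an RBD-backed libvirt storage pool from Python. The pool name, monitor host, and backing Ceph pool are placeholders, cephx auth is omitted for brevity, and the qemu:///system URI is shown for familiarity (the XenServer libvirt build is experimental, as noted above):

```python
import libvirt

# An RBD-backed storage pool definition. The libvirt pool name, the
# monitor host, and the backing Ceph pool ('rbd') are placeholders;
# cephx auth is omitted for brevity.
pool_xml = """
<pool type='rbd'>
  <name>ceph-rbd</name>
  <source>
    <name>rbd</name>
    <host name='mon1.example.com' port='6789'/>
  </source>
</pool>
"""

conn = libvirt.open('qemu:///system')          # URI shown for familiarity
pool = conn.storagePoolDefineXML(pool_xml, 0)  # register the pool
pool.create(0)                                 # start it; volumes map to RBD images
print(pool.isActive())
```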
26.
Thanks David Scott!
XenServer host arch
[Architecture diagram: a client (CloudStack, OpenStack, XenDesktop) talks to xapi/XenAPI; xenopsd and storage-manager adapters sit above libvirt and libxl, which drive libxenguest, libxc, and qemu on top of Xen; the libvirt storage adapters back onto Ceph and OCFS2.]
27.
Come for the block
Stay for the object and file
No matter what you use!
Reduced Overhead
Easier to manage one cluster
“Other Stuff”
CephFS prototypes
fast development profile
ceph-devel
lots of partner action
Gateway Drug
28.
Squash Hotspots
Multiple hosts = parallel workload
But what does that mean?
Instant Clones
No time to boot for many images
Live migration
Shared storage allows you to move instances between compute nodes transparently.
Blocks are delicious
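To make the "instant clones" point concrete: with RBD's layering feature, a protected snapshot can be cloned copy-on-write, so a new image is ready without copying any data up front. A minimal sketch with the python-rbd bindings; the pool and image names are placeholders:

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')       # pool name is a placeholder

rbd_inst = rbd.RBD()
rbd_inst.create(ioctx, 'golden', 10 * 1024 ** 3,   # 10 GiB base image
                old_format=False, features=rbd.RBD_FEATURE_LAYERING)

image = rbd.Image(ioctx, 'golden')
image.create_snap('base')
image.protect_snap('base')              # clones require a protected snapshot
image.close()

# Copy-on-write clone: usable immediately, no data copied up front.
rbd_inst.clone(ioctx, 'golden', 'base', ioctx, 'vm-0001',
               features=rbd.RBD_FEATURE_LAYERING)

ioctx.close()
cluster.shutdown()
```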
29.
Flexible APIs
Native support for Swift and S3
And less filling!
Secondary Storage
Coming with 4.2
Horizontal Scaling
Easy with HAProxy or others
Objects can juggle
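Since the gateway speaks the S3 wire protocol, stock clients work unmodified. A small sketch using boto against a hypothetical RGW endpoint; the host and credentials are placeholders:

```python
import boto
import boto.s3.connection

# The endpoint and credentials are placeholders for your RGW setup.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='rgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('demo-bucket')
bucket.new_key('hello.txt').set_contents_from_string('Hello from RGW!')
print([b.name for b in conn.get_all_buckets()])
```

Pointing the same client at HAProxy in front of several gateway instances is one common way to get the horizontal scaling mentioned above.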
30.
Neat prototypes
Image distribution to hypervisors
You can dress them up, but you can’t take them anywhere
Still early
You can fix that!
Outside uses
Great way to combine resources.
Files are tricksy
32.
Procedural, Ruby
Written in Ruby, this is more of the dev side of DevOps. Once you get past the learning curve, though, it's powerful.
Model-driven
Aimed more at the sysadmin, this declarative tool has very wide penetration (even on Windows!).
Agentless, whole stack
Using the OpenSSH built into your OS, this super-easy tool goes further up the stack than most.
Fast, 0MQ
Built on ZeroMQ, this tool is designed for massive scale and is fast, fast, fast. Unfortunately, 0MQ has no built-in encryption.
The new hotness
Orchestration
Chef Puppet Ansible Salt
33.
Canonical Unleashed
Being language agnostic, this tool can completely encapsulate a service. It can also handle provisioning all the way down to the hardware.
Dell has skin in the game
A complete operations platform that can dive all the way down to the BIOS/RAID level.
Others are joining in
Custom provisioning and orchestration, just one example of how busy this corner of the market is.
Doing it w/o a tool
If you prefer not to use a tool, Ceph gives you an easy way to deploy your cluster by hand.
MOAR HOTNESS
Orchestration Cont’d
Juju Crowbar ComodIT Ceph-deploy
40.
An ongoing process
While the first pass at disaster recovery is done, we want to get to built-in, world-wide replication.
Reception efficiency
Currently underway in the community!
Headed to dynamic
Can already do this in a static, pool-based setup. Looking to get to use-based migration.
Making it open-er
We've been talking about it forever. The time is coming!
Hop on board!
The Ceph Train
Geo-Replication Erasure Coding Tiering Governance
41.
Quarterly Online Summit
The online summit puts the core devs together with the Ceph community.
Not just for NYC
More planned, including Santa Clara and London. Keep an eye out: http://inktank.com/cephdays/
Geek-on-duty
During the week there are times when Ceph experts are available to help. Stop by oftc.net/ceph
Email makes the world go
Our mailing lists are very active; check out ceph.com for details on how to join in!
Open Source is Open!
Get Involved!
CDS Ceph Day IRC Lists
42.
http://wiki.ceph.com/04Development/Project_Ideas
Lists, blueprints, sideboard, paper cuts, etc.
http://tracker.ceph.com/
All the things!
New #ceph-devel
Splitting off developer chatter to make it easier to filter discussions.
http://ceph.com/resources/mailing-list-irc/
Our mailing lists are very active; check out ceph.com for details on how to join in!
Patches welcome
Projects
Wiki Redmine IRC Lists
43.
Comments? Anything for the good of the cause?
Questions?
E-MAIL
patrick@inktank.com
WEBSITE
Ceph.com
SOCIAL
@scuttlemonkey
@ceph
Facebook.com/cephstorage
Editor's notes
The way CRUSH is configured is somewhat unique. Instead of defining pools for different data types, workgroups, subnets, or applications, CRUSH is configured with the physical topology of your storage network. You tell it how many buildings, rooms, shelves, racks, and nodes you have, and you tell it how you want data placed. For example, you could tell CRUSH that it’s okay to have two replicas in the same building, but not on the same power circuit. You also tell it how many copies to keep.
With CRUSH, the first thing that happens is the data gets split into a certain number of sections. These are called “placement groups”. The number of placement groups is configurable. Then, the CRUSH algorithm is invoked, passing along the latest cluster map and a set of placement rules, and it determines where the placement group belongs in the cluster. This is a pseudo-random calculation, but it’s also repeatable; given the same cluster state and rule set, it will always return the same results.
Each placement group is run through CRUSH and stored in the cluster. Notice how no node has received more than one copy of a placement group, and no two nodes contain the same information? That’s important.
When it comes time to store an object in the cluster (or retrieve one), the client calculates where it belongs.
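To make the "repeatable pseudo-random" idea concrete, here's a toy Python sketch of hash-based placement. This is not the real CRUSH algorithm (no hierarchy, no buckets, no weights, no placement rules); it only illustrates how any client can compute the same placement from the same inputs:

```python
import hashlib

def place(obj_name, pg_count, osds, replicas=2):
    """Toy stand-in for CRUSH: map an object to a placement group, then
    map the PG to distinct OSDs. Deterministic and repeatable: the same
    inputs always give the same answer, so any client can compute it."""
    pg = int(hashlib.md5(obj_name.encode()).hexdigest(), 16) % pg_count
    chosen, seed = [], pg
    while len(chosen) < replicas:
        seed = int(hashlib.md5(str(seed).encode()).hexdigest(), 16)
        osd = osds[seed % len(osds)]
        if osd not in chosen:            # never two copies on one node
            chosen.append(osd)
    return pg, chosen

# Same cluster state + same rules -> same placement, everywhere.
print(place('myfile', pg_count=128, osds=['osd.0', 'osd.1', 'osd.2', 'osd.3']))
```

Because the calculation is a pure function of the cluster map and rules, there is no central lookup table to query or keep consistent.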
What happens, though, when a node goes down? The OSDs are always talking to each other (and the monitors), and they know when something is amiss. The third and fifth node on the top row have noticed that the second node on the bottom row is gone, and they are also aware that they have replicas of the missing data.
The OSDs collectively use the CRUSH algorithm to determine how the cluster should look based on its new state, and move the data to where clients running CRUSH expect it to be.
Because of the way placement is calculated instead of centrally controlled, node failures are transparent to clients.
4.2 ready (working on RBD Java bindings). QEMU and libvirt are creating images in format 1; hacky stuff to make format 2. RBD for Primary and RGW S3 for Secondary (templates, backups, ISOs).
You can have a management server which communicates with all of your agents (hypervisors). Management servers can be clustered for HA/failover or performance.
Client -> XenAPI -> Domain manager -> Xen control library -> standard Xen libraries && "upstream" qemu. Storage plugins -> libvirt support (experimental build) -> ceph && ocfs2.