Ceph: a decade in the making and still going strong
Sage Weil
Today the part of Sage Weil will be played by...
[Timeline: RESEARCH → INCUBATION → INKTANK]
Research beginnings
UCSC research grant
● “Petascale object storage”
● DOE: LANL, LLNL, Sandia
● Scalability, reliability, performance
● HPC file system workloads
● Scalable metadata management
● First line of Ceph code
● Summer internship at LLNL
● High security national lab environment
● Could write anything, as long as it was OSS
The rest of Ceph
● RADOS – distributed object storage cluster (2005)
● EBOFS – local object storage (2004/2006)
● CRUSH – hashing for the real world (2005)
● Paxos monitors – cluster consensus (2006)
→ emphasis on consistent, reliable storage
→ scale by pushing intelligence to the edges
→ a different but compelling architecture
Industry black hole
● Many large storage vendors
● Proprietary solutions that don't scale well
● Few open source alternatives (2006)
● Very limited scale, or
● Limited community and architecture (Lustre)
● No enterprise feature sets (snapshots, quotas)
● PhD grads all built interesting systems...
● ...and then went to work for NetApp, DDN, EMC, Veritas.
● They want you, not your project
A different path?
● Change the storage world with open source
● Do what Linux did to Solaris, Irix, Ultrix, etc.
● License
● LGPL: share changes, okay to link to proprietary code
● Avoid unfriendly practices
● Dual licensing
● Copyright assignment
● Platform
● Remember sourceforge.net?
Incubation
DreamHost!
● Move back to LA, continue hacking
● Hired a few developers
● Pure development
● No deliverables
Ambitious feature set
● Native Linux kernel client (2007-)
● Per-directory snapshots (2008)
● Recursive accounting (2008)
● Object classes (2009)
● librados (2009)
● radosgw (2009)
● strong authentication (2009)
● RBD: rados block device (2010)
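For a sense of what the librados item above provides, here is a minimal sketch using the Python rados bindings; the pool name 'rbd', the object name 'greeting', and the default /etc/ceph/ceph.conf path are assumptions, not part of the original slides.

    import rados

    # Minimal librados sketch: connect, write an object, read it back.
    # Assumes a reachable cluster and a client keyring referenced by ceph.conf.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')            # pool name is a placeholder
        ioctx.write_full('greeting', b'hello ceph')  # store a whole object
        print(ioctx.read('greeting'))                # read it back
        ioctx.close()
    finally:
        cluster.shutdown()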
The kernel client
● ceph-fuse was limited, not very fast
● Build native Linux kernel implementation
● Began attending Linux file system developer events (LSF)
● Early words of encouragement from ex-Lustre dev
● Engage Linux fs developer community as peer
● Initial merge attempts rejected by Linus
● Not sufficient evidence of user demand
● A few fans and would-be users chimed in...
● Eventually merged for v2.6.34 (early 2010)
Part of a larger ecosystem
● Ceph need not solve all problems as monolithic stack
● Replaced ebofs object file system with btrfs
● Same design goals; avoid reinventing the wheel
● Robust, supported, well-optimized
● Kernel-level cache management
● Copy-on-write, checksumming, other goodness
● Contributed some early functionality
● Cloning files
● Async snapshots
Budding community
● #ceph on irc.oftc.net, ceph-devel@vger.kernel.org
● Many interested users
● A few developers
● Many fans
● Too unstable for any real deployments
● Still mostly focused on the right architecture and technical solutions
Road to product
● DreamHost decides to build an S3-compatible object
storage service with Ceph
● Stability
● Focus on core RADOS, RBD, radosgw
● Paying back some technical debt
● Build testing automation
● Code review!
● Expand engineering team
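Since radosgw speaks the S3 protocol, the service described above can be exercised with stock S3 tooling; this is a hedged sketch using the boto library, where the endpoint, credentials, and bucket name are hypothetical.

    import boto
    import boto.s3.connection

    # Hypothetical radosgw endpoint and credentials; radosgw exposes an S3-compatible API.
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='objects.example.com',
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.create_bucket('demo-bucket')        # bucket name is a placeholder
    key = bucket.new_key('hello.txt')
    key.set_contents_from_string('served by radosgw')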
The reality
● Growing incoming commercial interest
● Early attempts from organizations large and small
● Difficult to engage with a web hosting company
● No means to support commercial deployments
● Project needed a company to back it
● Fund the engineering effort
● Build and test a product
● Support users
● Orchestrated a spin out of DreamHost in 2012
Inktank
Do it right
● How do we build a strong open source company?
● How do we build a strong open source community?
● Models?
● Red Hat, SUSE, Cloudera, MySQL, Canonical, …
● Initial funding from DreamHost, Mark Shuttleworth
Goals
● A stable Ceph release for production deployment
● DreamObjects
● Lay foundation for widespread adoption
● Platform support (Ubuntu, Red Hat, SUSE)
● Documentation
● Build and test infrastructure
● Build a sales and support organization
● Expand engineering organization
Branding
● Early decision to engage professional agency
● Terms like
● “Brand core”
● “Design system”
● Company vs Project
● Inktank != Ceph
● Establish a healthy relationship with the community
● Aspirational messaging: The Future of Storage
Slick graphics
● broken PowerPoint template
Traction
● Too many production deployments to count
● We don't know about most of them!
● Too many customers (for me) to count
● Growing partner list
● Lots of buzz
● OpenStack
Quality
● Increased adoption means increased demands on robust
testing
● Across multiple platforms
● Include platforms we don't use
● Upgrades
● Rolling upgrades
● Inter-version compatibility
Developer community
● Significant external contributors
● First-class feature contributions from external developers
● Non-Inktank participants in daily stand-ups
● External access to build/test lab infrastructure
● Common toolset
● GitHub
● Email (kernel.org)
● IRC (oftc.net)
● Linux distros
CDS: Ceph Developer Summit
● Community process for building project roadmap
● 100% online
● Google hangouts
● Wikis
● Etherpad
● First was in Spring 2013, 6th was in October, next in March
● Great feedback, growing participation
● Indoctrinating our own developers to an open
development model
And then...
s/Red Hat of Storage/Storage of Red Hat/
Calamari
● Inktank strategy was to package Ceph for the Enterprise
● Inktank Ceph Enterprise (ICE)
● Ceph: a hardened, tested, validated version
● Calamari: management layer and GUI (proprietary!)
● Enterprise integrations: SNMP, Hyper-V, VMware
● Support SLAs
● Red Hat model is pure open source
● Open sourced Calamari
The Present
Tiering
● Client side caches are great, but only buy so much.
● Can we separate hot and cold data onto different storage
devices?
● Cache pools: promote hot objects from an existing pool into a fast (e.g., FusionIO) pool
● Cold pools: demote cold data to a slow, archival pool (e.g., erasure coding, NYI)
● Very Cold Pools (efficient erasure coding, compression, OSD spin-down to save power) OR tape/public cloud
● How do you identify what is hot and cold?
● Common in enterprise solutions; not found in open source
scale-out systems
→ cache pools new in Firefly, better in Giant, continued in Hammer
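As a rough illustration of the cache-pool setup described above, the following sketch drives the ceph CLI from Python; the pool names 'cold' and 'hot' are hypothetical, both pools are assumed to already exist, and this is not the only way to wire up a cache tier.

    import subprocess

    def ceph(*args):
        # Assumes the ceph CLI is installed and admin credentials are available.
        subprocess.check_call(['ceph'] + list(args))

    # 'cold' is the existing base pool, 'hot' a fast (e.g., SSD/FusionIO-backed) pool.
    ceph('osd', 'tier', 'add', 'cold', 'hot')                   # attach the cache tier
    ceph('osd', 'tier', 'cache-mode', 'hot', 'writeback')       # absorb writes in the cache
    ceph('osd', 'tier', 'set-overlay', 'cold', 'hot')           # route client I/O via the cache
    ceph('osd', 'pool', 'set', 'hot', 'hit_set_type', 'bloom')  # track which objects are hot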
Erasure coding
● Replication for redundancy is flexible and fast
● For larger clusters, it can be expensive
● We can trade recovery performance for storage
● Erasure coded data is hard to modify, but ideal for cold or
read-only objects
● Cold storage tiering
● Will be used directly by radosgw
                 Storage overhead   Repair traffic   MTTDL (days)
3x replication   3x                 1x               2.3 E10
RS (10, 4)       1.4x               10x              3.3 E13
LRC (10, 6, 5)   1.6x               5x               1.2 E15
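The overhead column follows from (k+m)/k: RS(10,4) stores 14 chunks for every 10 data chunks, hence 1.4x. Below is a hedged sketch of creating such a pool with the ceph CLI from Python; the profile name, pool name, and PG count are placeholders.

    import subprocess

    def ceph(*args):
        # Assumes the ceph CLI is installed and admin credentials are available.
        subprocess.check_call(['ceph'] + list(args))

    # RS(10,4): 10 data chunks + 4 coding chunks -> 14/10 = 1.4x storage overhead.
    ceph('osd', 'erasure-code-profile', 'set', 'ec-10-4', 'k=10', 'm=4')
    # 'ecpool' and 128 placement groups are placeholder values.
    ceph('osd', 'pool', 'create', 'ecpool', '128', '128', 'erasure', 'ec-10-4')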
Erasure coding (cont'd)
● In Firefly
● LRC in Giant
● Intel ISA-L (optimized library) in Giant, maybe backported
to Firefly
● Talk of ARM optimized (NEON) jerasure
Async Replication in RADOS
● Clinic project with Harvey Mudd
● Group of students working on real world project
● Reason about the bounds on clock drift so we can achieve point-in-time consistency across a distributed set of nodes
CephFS
● Dogfooding for internal QA infrastructure
● Learning lots
● Many rough edges, but working quite well!
● We want to hear from you!
The Future
CephFS
→ This is where it all started – let's get there
● Today
● QA coverage and bug squashing continues
● NFS and CIFS now largely complete and robust
● Multi-MDS stability continues to improve
● Need
● QA investment
● Snapshot work
● Amazing community effort
The larger ecosystem
Storage backends
● Backends are pluggable
● Recent work to use rocksdb everywhere leveldb can be
used (mon/osd); can easily plug in other key/value store
libraries
● Other possibilities include LMDB or NVMKV (from FusionIO)
● Prototype kinetic backend
● Alternative OSD backends
● KeyValueStore – put all data in a k/v db (Haomai @ UnitedStack)
● KeyFileStore initial plans (2nd gen?)
● Some partners looking at backends tuned to their hardware
Governance
How do we strengthen the project community?
● Acknowledge Sage's role as maintainer / BDL
● Recognize project leads
● RBD, RGW, RADOS, CephFS, Calamari, etc.
● Formalize processes around CDS, community roadmap
● Formal foundation?
● Community build and test lab infrastructure (getting IPs this
week!)
● Build and test for broad range of OSs, distros, hardware
Technical roadmap
● How do we reach new use-cases and users?
● How do we better satisfy existing users?
● How do we ensure Ceph can succeed in enough markets
for business investment to thrive?
● Enough breadth to expand and grow the community
● Enough focus to do well
Performance
● Lots of work with partners to improve performance
● High-end flash back ends. Optimize hot paths to limit CPU
usage, drive up IOPS
● Improve threading, fine-grained locks
● Low-power processors. Run well on small ARM devices
(including those new-fangled ethernet drives)
Ethernet Drives
● Multiple vendors are building 'ethernet drives'
● Normal hard drives w/ small ARM host on board
● Could run OSD natively on the drive, completely remove
the “host” from the deployment
● Many different implementations, some vendors need help w/ open
architecture and ecosystem concepts
● Current devices are hard disks; no reason they couldn't
also be flash-based, or hybrid
● This is exactly what we were thinking when Ceph was
originally designed!
Big data
Why is “big data” built on such a weak storage model?
● Move computation to the data
● Evangelize RADOS classes
● librados case studies and proof points
● Build a general purpose compute and storage platform
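To make the "move computation to the data" point concrete: a RADOS object class runs inside the OSD and is invoked by the client over librados. This is a sketch only, assuming your python-rados build exposes Ioctx.execute (the wrapper around rados_exec) and that a demo object class is loaded on the OSDs; the class name 'hello' and method 'say_hello' are illustrative, not taken from the slides.

    import rados

    # Sketch: invoke an OSD-side object class method instead of shipping data to the client.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')      # pool name is a placeholder
        ioctx.write_full('greeting', b'')      # make sure the target object exists
        # Class and method names are illustrative; the result is computed on the OSD.
        result = ioctx.execute('greeting', 'hello', 'say_hello', b'')
        print(result)
        ioctx.close()
    finally:
        cluster.shutdown()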
The enterprise
How do we pay for all our toys?
● Support legacy and transitional interfaces
● iSCSI, NFS, pNFS, CIFS
● VMware, Hyper-V
● Identify the beachhead use-cases
● Only takes one use-case to get in the door
● Single platform – shared storage resource
● Bottom-up: earn respect of engineers and admins
● Top-down: strong brand and compelling product
Why we can beat the old guard
● It is hard to compete with free and open source software
● Unbeatable value proposition
● Ultimately a more efficient development model
● It is hard to manufacture community
● Strong foundational architecture
● Native protocols, Linux kernel support
● Unencumbered by legacy protocols like NFS
● Move beyond traditional client/server model
● Ongoing paradigm shift
● Software defined infrastructure, data center
Thanks!