SlideShare una empresa de Scribd logo
1 de 22
Descargar para leer sin conexión
Disaster Recovery and Ceph Block Storage
Introducing Multi-Site Mirroring
Jason Dillaman
RBD Project Technical Lead
Vault 2017
WHAT IS CEPH ALL ABOUT
2
▪ Software-defined distributed storage
▪ All components scale horizontally
▪ No single point of failure
▪ Self-manage whenever possible
▪ Hardware agnostic, commodity hardware
▪ Object, block, and file in a single cluster
▪ Open source (LGPL)
RGW
A web services gateway for
object storage, compatible with
S3 and Swift
LIBRADOS
A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
A software-based, reliable, autonomous, distributed object store comprised of
self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD
A reliable, fully-distributed
block device with cloud
platform integration
CEPHFS
A distributed file system with
POSIX semantics and scale-out
metadata management
OBJECT BLOCK FILE
CEPH COMPONENTS
3
▪ Block device abstraction
▪ Striped over fixed-size objects
▪ Highlights
▪ Broad integration
▪ Thinly provisioned
▪ Snapshots
▪ Copy-on-write clones
RADOS BLOCK DEVICE
4
LIBRADOS
LIBRBD
M M
M
RADOS CLUSTER
RADOS BLOCK DEVICE
5
KRBD
User-space Client Linux Host
LIBCEPH
IMAGE UPDATES
MIRRORING MOTIVATION
6
▪ Massively scalable and fault tolerant design
▪ What about data center failures?
▪ Data is the “special sauce”
▪ Failure to plan is planning to fail
▪ Snapshot-based incremental backups
▪ Desire online, continuous backups
▪ a la RBD Mirroring
▪ Replication needs to be asynchronous
▪ IO shouldn’t be blocked due to slow WAN
▪ Support transient connectivity issues
▪ Replication needs to be crash consistent
▪ Respect write barriers
▪ Easy management
▪ Expect failure can happen anytime
MIRRORING DESIGN PRINCIPLES
7
▪ Journal-based approach to log all modifications
▪ Support access by multiple clients
▪ Client-side operation
▪ Event logs appended into journal objects
▪ Delay all image modifications until event safe
▪ Commit journal events once image modification safe
▪ Provides an ordered view of all updates to an image
MIRRORING DESIGN FOUNDATION
8
▪ Typical IO path with journaling:
a. Create an event to describe the update
b. Asynchronously append event to journal object
c. Update in-memory image cache
d. Blocks cache writeback for affected extent
e. Completes IO to client
f. Unblock writeback once event is safe
JOURNAL DESIGN
9
▪ IO path with journaling w/o cache:
a. Create an event to describe the update
b. Asynchronously append event to journal object
c. Asynchronously update image once event is safe
d. Complete IO to client once update is safe
JOURNAL DESIGN
10
JOURNAL DESIGN
11
LIBRADOS
LIBRBD
M M
M
RADOS CLUSTER
User-space Client
WRITES READS
JOURNAL EVENTS
IMAGE UPDATES
▪ Mirroring peers are configured per-pool
▪ Enabling mirroring is a per-image property
▪ Requires that the journal image feature is enabled
▪ Image is primary (R/W) or non-primary (R/O)
▪ Replication is handled by rbd-mirror daemon
MIRRORING OVERVIEW
12
RBD-MIRROR DAEMON
▪ Requires access to remote and local clusters
▪ Responsible for image (re)sync
▪ Replays journal to achieve consistency
▪ Pulls journal event from remote cluster
▪ Applies event to local cluster
▪ Commits journal event
▪ Trim journal
▪ Transparently handles failover/failback
▪ Two-way replication between two sites
▪ One-way replication between N sites
13
RBD-MIRROR DAEMON
14
SITE B
Client
LIBRBD
M
CLUSTER A
MM M
CLUSTER B
MM
SITE A
RBD-MIRROR
LIBRBD
JOURNAL
EVENTS
IMAGE
UPDATES
IMAGE
UPDATES
MIRRORING SETUP
15
▪ Deploy rbd-mirror daemon on each cluster
▪ Jewel/Kraken only support single active daemon per-cluster
▪ Provide uniquely named “ceph.conf” for each remote cluster
▪ Create pools with same name on each cluster
▪ Enable pool mirroring (rbd mirror pool enable)
▪ Specify peer cluster via rbd CLI (rbd mirror pool peer add)
▪ Enable image journaling feature (rbd feature enable journaling)
▪ Enable image mirroring (rbd mirror image enable)
▪ Per-image failover / failback
▪ Coordinated demotion / promotion
▪ (rbd mirror image demote)
▪ (rbd mirror image promote)
▪ Uncoordinated promotion + resync
▪ (rbd mirror image promote --force)
▪ Resync from force-promotion / split-brain
▪ (rbd mirror image resync)
SITE FAILOVER
16
▪ Write IOs have worst-case 2x performance hit
▪ Journal event append
▪ Image object write
▪ In-memory cache can mask hit if working set fits
▪ Only supported by librbd-based clients
CAVEATS
17
▪ Use a small SSD/NVMe-backed pool for journals
▪ ‘rbd journal pool = <fast pool name>’
▪ Batch multiple events into a single journal append
▪ ‘rbd journal object flush age = <seconds>’
▪ Increase journal data width to match queue depth
▪ ‘rbd journal splay width = <number of objects>’
▪ Future work: potentially parallelize journal append + image
write between write barriers
MITIGATION
18
▪ Active/Active rbd-mirror daemons
▪ Deferred replication and deletion
▪ “Deep Scrub” of replicated images
▪ Smarter image resynchronization
▪ Improved health status reporting
▪ Improved pool promotion process
FUTURE FEATURES
19
▪ Design for failure
▪ Ceph now provides additional tools for full-scale disaster
recovery
▪ RBD workloads can seamlessly relocate between geographic
sites
▪ Feedback is welcome
BE PREPARED
20
Questions?
21
THANK YOU!
Jason Dillaman
RBD Project Tech Lead
dillaman@redhat.com
22

Más contenido relacionado

La actualidad más candente

Ceph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Community
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Divejoshdurgin
 
Demistifying open stack storage
Demistifying open stack storageDemistifying open stack storage
Demistifying open stack storageopenstackindia
 
Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Community
 
Openstack Summit HK - Ceph defacto - eNovance
Openstack Summit HK - Ceph defacto - eNovanceOpenstack Summit HK - Ceph defacto - eNovance
Openstack Summit HK - Ceph defacto - eNovanceeNovance
 
OpenNebula TechDay Waterloo 2015 - Hyperconvergence and OpenNebula
OpenNebula TechDay Waterloo 2015 - Hyperconvergence  and  OpenNebulaOpenNebula TechDay Waterloo 2015 - Hyperconvergence  and  OpenNebula
OpenNebula TechDay Waterloo 2015 - Hyperconvergence and OpenNebulaOpenNebula Project
 
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...NETWAYS
 
What's New In Grizzly: Nova
What's New In Grizzly: NovaWhat's New In Grizzly: Nova
What's New In Grizzly: NovaMirantis
 
Ceph & OpenStack talk given @ OpenStack Meetup @ Bangalore, June 2015
Ceph & OpenStack talk given @ OpenStack Meetup @ Bangalore, June 2015Ceph & OpenStack talk given @ OpenStack Meetup @ Bangalore, June 2015
Ceph & OpenStack talk given @ OpenStack Meetup @ Bangalore, June 2015Deepak Shetty
 
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanOpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanNETWAYS
 
Deploying openstack using ansible
Deploying openstack using ansibleDeploying openstack using ansible
Deploying openstack using ansibleopenstackindia
 
Building a Microsoft cloud with open technologies
Building a Microsoft cloud with open technologiesBuilding a Microsoft cloud with open technologies
Building a Microsoft cloud with open technologiesAlessandro Pilotti
 
OpenNebula Conf 2014 | OpenNebula and MooseFS for disaster recovery: real clo...
OpenNebula Conf 2014 | OpenNebula and MooseFS for disaster recovery: real clo...OpenNebula Conf 2014 | OpenNebula and MooseFS for disaster recovery: real clo...
OpenNebula Conf 2014 | OpenNebula and MooseFS for disaster recovery: real clo...NETWAYS
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...NETWAYS
 
Into the cold - Object Storage in SWITCHengines
Into the cold - Object Storage in SWITCHenginesInto the cold - Object Storage in SWITCHengines
Into the cold - Object Storage in SWITCHenginesSimon Leinen
 
3 ubuntu open_stack_ceph
3 ubuntu open_stack_ceph3 ubuntu open_stack_ceph
3 ubuntu open_stack_cephopenstackindia
 

La actualidad más candente (19)

Ceph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOceanCeph Tech Talk: Ceph at DigitalOcean
Ceph Tech Talk: Ceph at DigitalOcean
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices: A Deep DiveCeph Block Devices: A Deep Dive
Ceph Block Devices: A Deep Dive
 
Demistifying open stack storage
Demistifying open stack storageDemistifying open stack storage
Demistifying open stack storage
 
Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt Ceph Performance and Optimization - Ceph Day Frankfurt
Ceph Performance and Optimization - Ceph Day Frankfurt
 
Openstack Summit HK - Ceph defacto - eNovance
Openstack Summit HK - Ceph defacto - eNovanceOpenstack Summit HK - Ceph defacto - eNovance
Openstack Summit HK - Ceph defacto - eNovance
 
OpenNebula TechDay Waterloo 2015 - Hyperconvergence and OpenNebula
OpenNebula TechDay Waterloo 2015 - Hyperconvergence  and  OpenNebulaOpenNebula TechDay Waterloo 2015 - Hyperconvergence  and  OpenNebula
OpenNebula TechDay Waterloo 2015 - Hyperconvergence and OpenNebula
 
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
OpenNebula Conf 2014 | Cloud Automation for OpenNebula by Kishorekumar Neelam...
 
What's New In Grizzly: Nova
What's New In Grizzly: NovaWhat's New In Grizzly: Nova
What's New In Grizzly: Nova
 
Ceph & OpenStack talk given @ OpenStack Meetup @ Bangalore, June 2015
Ceph & OpenStack talk given @ OpenStack Meetup @ Bangalore, June 2015Ceph & OpenStack talk given @ OpenStack Meetup @ Bangalore, June 2015
Ceph & OpenStack talk given @ OpenStack Meetup @ Bangalore, June 2015
 
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan KoomanOpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman
 
Deploying openstack using ansible
Deploying openstack using ansibleDeploying openstack using ansible
Deploying openstack using ansible
 
Ceph meetup montreal
Ceph meetup montrealCeph meetup montreal
Ceph meetup montreal
 
Building a Microsoft cloud with open technologies
Building a Microsoft cloud with open technologiesBuilding a Microsoft cloud with open technologies
Building a Microsoft cloud with open technologies
 
OpenStack Kolla project update rocky release
OpenStack Kolla project update rocky releaseOpenStack Kolla project update rocky release
OpenStack Kolla project update rocky release
 
OpenNebula Conf 2014 | OpenNebula and MooseFS for disaster recovery: real clo...
OpenNebula Conf 2014 | OpenNebula and MooseFS for disaster recovery: real clo...OpenNebula Conf 2014 | OpenNebula and MooseFS for disaster recovery: real clo...
OpenNebula Conf 2014 | OpenNebula and MooseFS for disaster recovery: real clo...
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
 
Into the cold - Object Storage in SWITCHengines
Into the cold - Object Storage in SWITCHenginesInto the cold - Object Storage in SWITCHengines
Into the cold - Object Storage in SWITCHengines
 
Hello, Docker!
Hello, Docker!Hello, Docker!
Hello, Docker!
 
3 ubuntu open_stack_ceph
3 ubuntu open_stack_ceph3 ubuntu open_stack_ceph
3 ubuntu open_stack_ceph
 

Destacado

Building a Game With JavaScript (Thinkful LA)
Building a Game With JavaScript (Thinkful LA)Building a Game With JavaScript (Thinkful LA)
Building a Game With JavaScript (Thinkful LA)Thinkful
 
ACCIDENTES DE TRABAJO, CAUSAS, EFECTOS Y PREVENCIÓN.
ACCIDENTES DE TRABAJO, CAUSAS, EFECTOS Y PREVENCIÓN.ACCIDENTES DE TRABAJO, CAUSAS, EFECTOS Y PREVENCIÓN.
ACCIDENTES DE TRABAJO, CAUSAS, EFECTOS Y PREVENCIÓN.YERSON DAVID MARTINEZ CEPEDA
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InSage Weil
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about cephEmma Haruka Iwao
 
Ejemplos de-preguntas-saber-3-lenguaje-2015
Ejemplos de-preguntas-saber-3-lenguaje-2015Ejemplos de-preguntas-saber-3-lenguaje-2015
Ejemplos de-preguntas-saber-3-lenguaje-2015Colegio Tierra Del Fuego
 
Augmented Reality in Education: Present Accomplishments, Future Visions
Augmented Reality in Education: Present Accomplishments, Future VisionsAugmented Reality in Education: Present Accomplishments, Future Visions
Augmented Reality in Education: Present Accomplishments, Future VisionsJulie Evans
 
Connecting the Dots for Digital Learning: Listening to the Ideas and Views of...
Connecting the Dots for Digital Learning: Listening to the Ideas and Views of...Connecting the Dots for Digital Learning: Listening to the Ideas and Views of...
Connecting the Dots for Digital Learning: Listening to the Ideas and Views of...Julie Evans
 
3Com 7030-10082
3Com 7030-100823Com 7030-10082
3Com 7030-10082savomir
 
2017 SCA (Qld) Into the Future
2017 SCA (Qld)  Into the Future 2017 SCA (Qld)  Into the Future
2017 SCA (Qld) Into the Future StrataMax
 
Анализа на оддалечена експлоатациjа во Linux кернел
Анализа на оддалечена експлоатациjа во Linux кернелАнализа на оддалечена експлоатациjа во Linux кернел
Анализа на оддалечена експлоатациjа во Linux кернелZero Science Lab
 
Secrets of building a debuggable runtime: Learn how language implementors sol...
Secrets of building a debuggable runtime: Learn how language implementors sol...Secrets of building a debuggable runtime: Learn how language implementors sol...
Secrets of building a debuggable runtime: Learn how language implementors sol...Dev_Events
 

Destacado (20)

Ceph Object Store
Ceph Object StoreCeph Object Store
Ceph Object Store
 
Building a Game With JavaScript (Thinkful LA)
Building a Game With JavaScript (Thinkful LA)Building a Game With JavaScript (Thinkful LA)
Building a Game With JavaScript (Thinkful LA)
 
ACCIDENTES DE TRABAJO, CAUSAS, EFECTOS Y PREVENCIÓN.
ACCIDENTES DE TRABAJO, CAUSAS, EFECTOS Y PREVENCIÓN.ACCIDENTES DE TRABAJO, CAUSAS, EFECTOS Y PREVENCIÓN.
ACCIDENTES DE TRABAJO, CAUSAS, EFECTOS Y PREVENCIÓN.
 
BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
Didáctica Crítica
Didáctica Crítica Didáctica Crítica
Didáctica Crítica
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about ceph
 
Review of the survey
Review of the surveyReview of the survey
Review of the survey
 
Myriam ortiz
Myriam ortizMyriam ortiz
Myriam ortiz
 
Chicago SD Drawings
Chicago SD DrawingsChicago SD Drawings
Chicago SD Drawings
 
Didáctica Crítica
Didáctica Crítica Didáctica Crítica
Didáctica Crítica
 
Ejemplos de-preguntas-saber-3-lenguaje-2015
Ejemplos de-preguntas-saber-3-lenguaje-2015Ejemplos de-preguntas-saber-3-lenguaje-2015
Ejemplos de-preguntas-saber-3-lenguaje-2015
 
Augmented Reality in Education: Present Accomplishments, Future Visions
Augmented Reality in Education: Present Accomplishments, Future VisionsAugmented Reality in Education: Present Accomplishments, Future Visions
Augmented Reality in Education: Present Accomplishments, Future Visions
 
Connecting the Dots for Digital Learning: Listening to the Ideas and Views of...
Connecting the Dots for Digital Learning: Listening to the Ideas and Views of...Connecting the Dots for Digital Learning: Listening to the Ideas and Views of...
Connecting the Dots for Digital Learning: Listening to the Ideas and Views of...
 
Presentación3
Presentación3Presentación3
Presentación3
 
Kudryashova Irina potfolio
Kudryashova Irina potfolioKudryashova Irina potfolio
Kudryashova Irina potfolio
 
3Com 7030-10082
3Com 7030-100823Com 7030-10082
3Com 7030-10082
 
proyecto de quimica
proyecto de quimicaproyecto de quimica
proyecto de quimica
 
2017 SCA (Qld) Into the Future
2017 SCA (Qld)  Into the Future 2017 SCA (Qld)  Into the Future
2017 SCA (Qld) Into the Future
 
Анализа на оддалечена експлоатациjа во Linux кернел
Анализа на оддалечена експлоатациjа во Linux кернелАнализа на оддалечена експлоатациjа во Linux кернел
Анализа на оддалечена експлоатациjа во Linux кернел
 
Secrets of building a debuggable runtime: Learn how language implementors sol...
Secrets of building a debuggable runtime: Learn how language implementors sol...Secrets of building a debuggable runtime: Learn how language implementors sol...
Secrets of building a debuggable runtime: Learn how language implementors sol...
 

Similar a Disaster Recovery with Ceph Block Storage Multi-Site Mirroring

RBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason DillamanRBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason DillamanCeph Community
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep DiveRed_Hat_Storage
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETNikos Kormpakis
 
Cache Tiering and Erasure Coding
Cache Tiering and Erasure CodingCache Tiering and Erasure Coding
Cache Tiering and Erasure CodingShinobu KINJO
 
Cache Tiering and Erasure Coding
Cache Tiering and Erasure CodingCache Tiering and Erasure Coding
Cache Tiering and Erasure CodingShinobu Kinjo
 
20121102 ceph-in-the-cloud
20121102 ceph-in-the-cloud20121102 ceph-in-the-cloud
20121102 ceph-in-the-cloudCeph Community
 
OSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemOSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemNETWAYS
 
Cloud-Native Drupal: a survival guide
Cloud-Native Drupal: a survival guideCloud-Native Drupal: a survival guide
Cloud-Native Drupal: a survival guidesparkfabrik
 
Experts Live Europe 2017 - Why you should care about Docker - an introduction
Experts Live Europe 2017 - Why you should care about Docker - an introductionExperts Live Europe 2017 - Why you should care about Docker - an introduction
Experts Live Europe 2017 - Why you should care about Docker - an introductionMarc Müller
 
Rook - cloud-native storage
Rook - cloud-native storageRook - cloud-native storage
Rook - cloud-native storageKarol Chrapek
 
Ceph Day LA - RBD: A deep dive
Ceph Day LA - RBD: A deep dive Ceph Day LA - RBD: A deep dive
Ceph Day LA - RBD: A deep dive Ceph Community
 
Introduction to React Native
Introduction to React NativeIntroduction to React Native
Introduction to React NativeWaqqas Jabbar
 
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH Ceph Community
 
What's new in Luminous and Beyond
What's new in Luminous and BeyondWhat's new in Luminous and Beyond
What's new in Luminous and BeyondSage Weil
 
DevOps Fusion 2019: Docker - Why the future takes place in containers
DevOps Fusion 2019: Docker - Why the future takes place in containersDevOps Fusion 2019: Docker - Why the future takes place in containers
DevOps Fusion 2019: Docker - Why the future takes place in containersMarc Müller
 
Data Scotland 2019: You can run SQL Server on AWS
Data Scotland 2019: You can run SQL Server on AWSData Scotland 2019: You can run SQL Server on AWS
Data Scotland 2019: You can run SQL Server on AWSJohn McCormack
 
Docker primer and tips
Docker primer and tipsDocker primer and tips
Docker primer and tipsSamuel Chow
 

Similar a Disaster Recovery with Ceph Block Storage Multi-Site Mirroring (20)

RBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason DillamanRBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason Dillaman
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
Open Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNETOpen Source Storage at Scale: Ceph @ GRNET
Open Source Storage at Scale: Ceph @ GRNET
 
Cache Tiering and Erasure Coding
Cache Tiering and Erasure CodingCache Tiering and Erasure Coding
Cache Tiering and Erasure Coding
 
Cache Tiering and Erasure Coding
Cache Tiering and Erasure CodingCache Tiering and Erasure Coding
Cache Tiering and Erasure Coding
 
20121102 ceph-in-the-cloud
20121102 ceph-in-the-cloud20121102 ceph-in-the-cloud
20121102 ceph-in-the-cloud
 
OSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage SystemOSDC 2015: John Spray | The Ceph Storage System
OSDC 2015: John Spray | The Ceph Storage System
 
XenSummit - 08/28/2012
XenSummit - 08/28/2012XenSummit - 08/28/2012
XenSummit - 08/28/2012
 
Cloud-Native Drupal: a survival guide
Cloud-Native Drupal: a survival guideCloud-Native Drupal: a survival guide
Cloud-Native Drupal: a survival guide
 
Experts Live Europe 2017 - Why you should care about Docker - an introduction
Experts Live Europe 2017 - Why you should care about Docker - an introductionExperts Live Europe 2017 - Why you should care about Docker - an introduction
Experts Live Europe 2017 - Why you should care about Docker - an introduction
 
Block Storage For VMs With Ceph
Block Storage For VMs With CephBlock Storage For VMs With Ceph
Block Storage For VMs With Ceph
 
Rook - cloud-native storage
Rook - cloud-native storageRook - cloud-native storage
Rook - cloud-native storage
 
Ceph Day LA - RBD: A deep dive
Ceph Day LA - RBD: A deep dive Ceph Day LA - RBD: A deep dive
Ceph Day LA - RBD: A deep dive
 
Introduction to React Native
Introduction to React NativeIntroduction to React Native
Introduction to React Native
 
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH CEPH DAY BERLIN - WHAT'S NEW IN CEPH
CEPH DAY BERLIN - WHAT'S NEW IN CEPH
 
What's new in Luminous and Beyond
What's new in Luminous and BeyondWhat's new in Luminous and Beyond
What's new in Luminous and Beyond
 
Node.js an Exectutive View
Node.js an Exectutive ViewNode.js an Exectutive View
Node.js an Exectutive View
 
DevOps Fusion 2019: Docker - Why the future takes place in containers
DevOps Fusion 2019: Docker - Why the future takes place in containersDevOps Fusion 2019: Docker - Why the future takes place in containers
DevOps Fusion 2019: Docker - Why the future takes place in containers
 
Data Scotland 2019: You can run SQL Server on AWS
Data Scotland 2019: You can run SQL Server on AWSData Scotland 2019: You can run SQL Server on AWS
Data Scotland 2019: You can run SQL Server on AWS
 
Docker primer and tips
Docker primer and tipsDocker primer and tips
Docker primer and tips
 

Último

Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 

Último (20)

Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 

Disaster Recovery with Ceph Block Storage Multi-Site Mirroring

  • 1. Disaster Recovery and Ceph Block Storage Introducing Multi-Site Mirroring Jason Dillaman RBD Project Technical Lead Vault 2017
  • 2. WHAT IS CEPH ALL ABOUT 2 ▪ Software-defined distributed storage ▪ All components scale horizontally ▪ No single point of failure ▪ Self-manage whenever possible ▪ Hardware agnostic, commodity hardware ▪ Object, block, and file in a single cluster ▪ Open source (LGPL)
  • 3. RGW A web services gateway for object storage, compatible with S3 and Swift LIBRADOS A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP) RADOS A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors RBD A reliable, fully-distributed block device with cloud platform integration CEPHFS A distributed file system with POSIX semantics and scale-out metadata management OBJECT BLOCK FILE CEPH COMPONENTS 3
  • 4. ▪ Block device abstraction ▪ Striped over fixed-size objects ▪ Highlights ▪ Broad integration ▪ Thinly provisioned ▪ Snapshots ▪ Copy-on-write clones RADOS BLOCK DEVICE 4
  • 5. LIBRADOS LIBRBD M M M RADOS CLUSTER RADOS BLOCK DEVICE 5 KRBD User-space Client Linux Host LIBCEPH IMAGE UPDATES
  • 6. MIRRORING MOTIVATION 6 ▪ Massively scalable and fault tolerant design ▪ What about data center failures? ▪ Data is the “special sauce” ▪ Failure to plan is planning to fail ▪ Snapshot-based incremental backups ▪ Desire online, continuous backups ▪ a la RBD Mirroring
  • 7. ▪ Replication needs to be asynchronous ▪ IO shouldn’t be blocked due to slow WAN ▪ Support transient connectivity issues ▪ Replication needs to be crash consistent ▪ Respect write barriers ▪ Easy management ▪ Expect failure can happen anytime MIRRORING DESIGN PRINCIPLES 7
  • 8. ▪ Journal-based approach to log all modifications ▪ Support access by multiple clients ▪ Client-side operation ▪ Event logs appended into journal objects ▪ Delay all image modifications until event safe ▪ Commit journal events once image modification safe ▪ Provides an ordered view of all updates to an image MIRRORING DESIGN FOUNDATION 8
  • 9. ▪ Typical IO path with journaling: a. Create an event to describe the update b. Asynchronously append event to journal object c. Update in-memory image cache d. Blocks cache writeback for affected extent e. Completes IO to client f. Unblock writeback once event is safe JOURNAL DESIGN 9
  • 10. ▪ IO path with journaling w/o cache: a. Create an event to describe the update b. Asynchronously append event to journal object c. Asynchronously update image once event is safe d. Complete IO to client once update is safe JOURNAL DESIGN 10
  • 11. JOURNAL DESIGN 11 LIBRADOS LIBRBD M M M RADOS CLUSTER User-space Client WRITES READS JOURNAL EVENTS IMAGE UPDATES
  • 12. ▪ Mirroring peers are configured per-pool ▪ Enabling mirroring is a per-image property ▪ Requires that the journal image feature is enabled ▪ Image is primary (R/W) or non-primary (R/O) ▪ Replication is handled by rbd-mirror daemon MIRRORING OVERVIEW 12
  • 13. RBD-MIRROR DAEMON ▪ Requires access to remote and local clusters ▪ Responsible for image (re)sync ▪ Replays journal to achieve consistency ▪ Pulls journal event from remote cluster ▪ Applies event to local cluster ▪ Commits journal event ▪ Trim journal ▪ Transparently handles failover/failback ▪ Two-way replication between two sites ▪ One-way replication between N sites 13
  • 14. RBD-MIRROR DAEMON 14 SITE B Client LIBRBD M CLUSTER A MM M CLUSTER B MM SITE A RBD-MIRROR LIBRBD JOURNAL EVENTS IMAGE UPDATES IMAGE UPDATES
  • 15. MIRRORING SETUP 15 ▪ Deploy rbd-mirror daemon on each cluster ▪ Jewel/Kraken only support single active daemon per-cluster ▪ Provide uniquely named “ceph.conf” for each remote cluster ▪ Create pools with same name on each cluster ▪ Enable pool mirroring (rbd mirror pool enable) ▪ Specify peer cluster via rbd CLI (rbd mirror pool peer add) ▪ Enable image journaling feature (rbd feature enable journaling) ▪ Enable image mirroring (rbd mirror image enable)
  • 16. ▪ Per-image failover / failback ▪ Coordinated demotion / promotion ▪ (rbd mirror image demote) ▪ (rbd mirror image promote) ▪ Uncoordinated promotion + resync ▪ (rbd mirror image promote --force) ▪ Resync from force-promotion / split-brain ▪ (rbd mirror image resync) SITE FAILOVER 16
  • 17. ▪ Write IOs have worst-case 2x performance hit ▪ Journal event append ▪ Image object write ▪ In-memory cache can mask hit if working set fits ▪ Only supported by librbd-based clients CAVEATS 17
  • 18. ▪ Use a small SSD/NVMe-backed pool for journals ▪ ‘rbd journal pool = <fast pool name>’ ▪ Batch multiple events into a single journal append ▪ ‘rbd journal object flush age = <seconds>’ ▪ Increase journal data width to match queue depth ▪ ‘rbd journal splay width = <number of objects>’ ▪ Future work: potentially parallelize journal append + image write between write barriers MITIGATION 18
  • 19. ▪ Active/Active rbd-mirror daemons ▪ Deferred replication and deletion ▪ “Deep Scrub” of replicated images ▪ Smarter image resynchronization ▪ Improved health status reporting ▪ Improved pool promotion process FUTURE FEATURES 19
  • 20. ▪ Design for failure ▪ Ceph now provides additional tools for full-scale disaster recovery ▪ RBD workloads can seamlessly relocate between geographic sites ▪ Feedback is welcome BE PREPARED 20
  • 22. THANK YOU! Jason Dillaman RBD Project Tech Lead dillaman@redhat.com 22