SlideShare una empresa de Scribd logo
1 de 42
Descargar para leer sin conexión
Everything You Thought
You Already Knew About
Orchestration
Laura Frank
Director of Engineering,
Codeship
Managing Distributed State with Raft
Quorum 101
Leader Election
Log Replication
Service Scheduling
Failure Recovery
Agenda
bonus debugging tips!
They’re trying to get a collection of nodes to behave
like a single node.
• How does the system maintain state?
• How does work get scheduled?
The Big Problem(s)
What are tools like Swarm and Kubernetes trying to do?
Manager
leader
WorkerWorker
Manager
follower
Manager
follower
WorkerWorker Worker
raft consensus group
So You Think You
Have Quorum
Quorum
The minimum number of votes that a consensus
group needs in order to be allowed to perform
an operation.
Without quorum, your system can’t do work.
Math! Managers Quorum Fault Tolerance
1 1 0
2 2 0
3 2 1
4 3 1
5 3 2
6 4 2
7 4 3
(N/2) + 1
In simpler terms, it
means a majority
Math! Managers Quorum Fault Tolerance
1 1 0
2 2 0
3 2 1
4 3 1
5 3 2
6 4 2
7 4 3
(N/2) + 1
In simpler terms, it
means a majority
Having two managers instead of
one actually doubles your
chances of losing quorum.
Pay attention to datacenter topology when placing
managers.
Quorum With Multiple Regions
Manager Nodes Distribution across 3 Regions
3 1-1-1
5 1-2-2
7 3-2-2
9 3-3-3
magically works with
Docker for AWS
Let’s talk about Raft!
I think I’ll just write my
own distributed
consensus algorithm.
-no sensible person
Log replication
Leader election
Safety (won’t talk about this much today)
Raft is responsible for…
Being easier to understand
Orchestration systems typically use a key/value
store backed by a consensus algorithm
In a lot of cases, that algorithm is Raft!
Raft is used everywhere…
…that etcd is used
SwarmKit implements the
Raft algorithm directly.
In most cases, you don’t want to run
work on your manager nodes
docker node update --availability drain <NODE>
Participating in a Raft consensus group is work, too.
Make your manager nodes unavailable for tasks:
*I will run work on managers for educational purposes
Leader Election &
Log Replication
Manager
leader
Manager
candidate
Manager
follower
Manager
offline
demo.consensus.group
The log is the source of truth for
your application.
In the context of distributed computing (and this
talk), a log is an append-only, time-based record
of data.
2 10 30 25 5 12first entry append entry here!
This log is for computers, not humans.
2 10 30 25 5 12
Server
12
Client
12
In simple systems, the log is pretty straightforward.
In a manager group, that log entry can only “become
truth” once it is confirmed from the majority of
followers (quorum!)
Client
12
Manager
follower
Manager
follower
Manager
leader
demo.consensus.group
In distributed computing, it’s essential that you
understand log replication.
bit.ly/logging-post
Debugging Tip
Watch the Raft logs.
Monitor via inotifywait OR
just read them directly!
Scheduling
HA application
problems
scheduling
problems
orchestrator problems
Scheduling constraints
Restrict services to specific nodes, such as specific
architectures, security levels, or types
docker service create 
--constraint 'node.labels.type==web' my-app
New in 17.04.0-ce
Topology-aware scheduling!!1!
Implements a spread strategy over nodes that belong to a
certain category.
Unlike --constraint, this is a “soft” preference
—placement-pref ‘spread=node.labels.dc’
Swarm will not rebalance
healthy tasks when a new
node comes online
Debugging Tip
Add a manager to your
Swarm running with
--availability drain
and in Engine debug mode
Failure Recovery
Losing quorum
• Bring the downed nodes back online (derp)
Regain quorum
• On a healthy manager, run 

docker swarm init --force-new-cluster
This will create a new cluster with one healthy manager
• You need to promote new managers
The datacenter is
on fire
• Bring up a new manager and stop Docker
• sudo rm -rf /var/lib/docker/swarm
• Copy backup to /var/lib/docker/swarm
• Start Docker
• docker swarm init (--force-new-cluster)
Restore from a backup in 5 easy steps!
• In general, users shouldn’t be allowed to modify IP
addresses of nodes
• Restoring from a backup == old IP address for node1
• Workaround is to use elastic IPs with ability to reassign
But wait, there’s a bug… or a feature
Thank You!
@docker
#dockercon

Más contenido relacionado

La actualidad más candente

Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
Jun Rao
 

La actualidad más candente (20)

Shifter: Containers in HPC Environments
Shifter: Containers in HPC EnvironmentsShifter: Containers in HPC Environments
Shifter: Containers in HPC Environments
 
DevOps Foundations
DevOps FoundationsDevOps Foundations
DevOps Foundations
 
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...
Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...
Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...
 
Real-time in the real world: DIRT in production
Real-time in the real world: DIRT in productionReal-time in the real world: DIRT in production
Real-time in the real world: DIRT in production
 
Docker Swarm for Beginner
Docker Swarm for BeginnerDocker Swarm for Beginner
Docker Swarm for Beginner
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance Analysis
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
LinuxKit Deep Dive
LinuxKit Deep DiveLinuxKit Deep Dive
LinuxKit Deep Dive
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
Docker Tips And Tricks at the Docker Beijing Meetup
Docker Tips And Tricks at the Docker Beijing MeetupDocker Tips And Tricks at the Docker Beijing Meetup
Docker Tips And Tricks at the Docker Beijing Meetup
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream Processing
 

Similar a Everything You Thought You Already Knew About Orchestration

BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up
Craig Schumann
 
Drools Presentation for Tallink.ee
Drools Presentation for Tallink.eeDrools Presentation for Tallink.ee
Drools Presentation for Tallink.ee
Anton Arhipov
 

Similar a Everything You Thought You Already Knew About Orchestration (20)

Dzone performancemonitoring2016-mastercode.vn
Dzone performancemonitoring2016-mastercode.vnDzone performancemonitoring2016-mastercode.vn
Dzone performancemonitoring2016-mastercode.vn
 
Management and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesManagement and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - Slides
 
Architecting for Failure in a Containerized World
Architecting for Failure in a Containerized WorldArchitecting for Failure in a Containerized World
Architecting for Failure in a Containerized World
 
BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up
 
程序员实践之路
程序员实践之路程序员实践之路
程序员实践之路
 
arch_mtg_sqlsig_hcotter_replication.ppt
arch_mtg_sqlsig_hcotter_replication.pptarch_mtg_sqlsig_hcotter_replication.ppt
arch_mtg_sqlsig_hcotter_replication.ppt
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
Os rtos.ppt
Os rtos.pptOs rtos.ppt
Os rtos.ppt
 
SharePoint Administration: Tips from the Field
SharePoint Administration: Tips from the FieldSharePoint Administration: Tips from the Field
SharePoint Administration: Tips from the Field
 
Drools Presentation for Tallink.ee
Drools Presentation for Tallink.eeDrools Presentation for Tallink.ee
Drools Presentation for Tallink.ee
 
Windows 7 client performance talk - Jeff Stokes
Windows 7 client performance talk - Jeff StokesWindows 7 client performance talk - Jeff Stokes
Windows 7 client performance talk - Jeff Stokes
 
Start Up Austin 2017: Manual vs Automation - When to Start Automating your Pr...
Start Up Austin 2017: Manual vs Automation - When to Start Automating your Pr...Start Up Austin 2017: Manual vs Automation - When to Start Automating your Pr...
Start Up Austin 2017: Manual vs Automation - When to Start Automating your Pr...
 
Realtime search at Yammer
Realtime search at YammerRealtime search at Yammer
Realtime search at Yammer
 
Real-time Search at Yammer - By Aleksandrovsky Boris
Real-time Search at Yammer - By Aleksandrovsky BorisReal-time Search at Yammer - By Aleksandrovsky Boris
Real-time Search at Yammer - By Aleksandrovsky Boris
 
Real Time Search at Yammer
Real Time Search at YammerReal Time Search at Yammer
Real Time Search at Yammer
 
Acc 340 Preview Full Course
Acc 340 Preview Full Course Acc 340 Preview Full Course
Acc 340 Preview Full Course
 
Acc 340 Preview Full Course
Acc 340 Preview Full CourseAcc 340 Preview Full Course
Acc 340 Preview Full Course
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability Workshop
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
 

Más de Laura Frank Tacho

Más de Laura Frank Tacho (8)

The Container Shame Spiral
The Container Shame SpiralThe Container Shame Spiral
The Container Shame Spiral
 
Using Docker For Development
Using Docker For DevelopmentUsing Docker For Development
Using Docker For Development
 
Deploying a Kubernetes App with Amazon EKS
Deploying a Kubernetes App with Amazon EKSDeploying a Kubernetes App with Amazon EKS
Deploying a Kubernetes App with Amazon EKS
 
Building Efficient Parallel Testing Platforms with Docker
Building Efficient Parallel Testing Platforms with DockerBuilding Efficient Parallel Testing Platforms with Docker
Building Efficient Parallel Testing Platforms with Docker
 
Efficient Parallel Testing with Docker
Efficient Parallel Testing with DockerEfficient Parallel Testing with Docker
Efficient Parallel Testing with Docker
 
Stop Being Lazy and Test Your Software
Stop Being Lazy and Test Your SoftwareStop Being Lazy and Test Your Software
Stop Being Lazy and Test Your Software
 
Happier Teams Through Tools
Happier Teams Through ToolsHappier Teams Through Tools
Happier Teams Through Tools
 
Rails Applications with Docker
Rails Applications with DockerRails Applications with Docker
Rails Applications with Docker
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

Everything You Thought You Already Knew About Orchestration