Applications today are increasingly designed using a shared-nothing, microservices architecture that is resilient to the failure of individual components, even when built atop cloud infrastructure that can suffer infrequent but massive outages. Yet many supporting tools for application monitoring, observability, configuration management, and release management still use a centralized “orchestration” approach that depends on pushing changes to unreliable distributed systems.
In this Sensu Summit 2018 talk, Chef’s Julian Dunn and Fletcher Nichol give a primer on promise theory and the autonomous actor model that underlies the design of products like Sensu and Habitat, and explain why it leads not only to higher overall system reliability but also to easier human comprehension and operations. They argue that you should consider designing all of your applications and supporting systems this way. They may even show a demo or two to illustrate how inverting the design radically changes the notion of “application release orchestration”, so that you can retain orchestration-type semantics even with an eventually-consistent system design.
Pull, don’t push: Architectures for monitoring and configuration in a microservices era
1. Pull, don’t push!
Architectures for monitoring and configuration in a microservices era
Julian Dunn, Director of Product Marketing, Chef
@julian_dunn
Fletcher Nichol, Senior Software Development Engineer, Chef
@fnichol
2.
3. • Modular, self-contained, pre-fabricated components
• Neighbors share components
• The complex shares services as a whole
14. Deployment Resilience
for i in app01 app02 app03; do
  do-deploy.sh --server $i
  if [ $? -ne 0 ]; then
    failed=$i
    break
  fi
done
# what goes down here?
# roll back $failed?
# roll back all others?
# ignore it?
25. Distributed, Autonomous Systems
• Make progress towards promised desired state
• Expose interfaces to allow others to verify promises
• Can promise to take certain behaviors in the face of failure of others
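These promises boil down to a reconciliation loop each actor runs on its own. A minimal sketch, with hypothetical state values and no real remediation logic: the actor compares its actual state to its locally held desired state and either reports the promise as kept or takes one corrective step, with no central coordinator involved.

```shell
#!/bin/sh
# One reconciliation pass of an autonomous actor (illustrative only):
# compare actual state to the promised desired state, report the
# promise as kept, or take a corrective action toward it.

converge_step() {
  desired=$1
  actual=$2
  if [ "$actual" = "$desired" ]; then
    echo "promise-kept"
  else
    # e.g. restart the process, re-render config, fetch a new release
    echo "corrective-action: $actual -> $desired"
  fi
}
```

The output itself is the "interface" other actors can inspect to verify the promise; the actor never waits to be told what to do.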
27. The Design of Sensu vs. Traditional “Monitoring”
Nagios model (orchestrated):
• Nagios master → Agent 1, Agent 2
• 1. Poll (orchestrate)
• 2. Run checks

Sensu model (autonomous):
• Agent 1, Agent 2 → Sensu Backend
• 1. Run checks
• 2. Post data
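As a concrete illustration of the agent-driven side, a Sensu 1.x-era standalone check definition (illustrative; the check name, plugin, and thresholds here are made up) has the agent schedule and run the check itself and post results to the backend, rather than waiting to be polled:

```json
{
  "checks": {
    "check_disk": {
      "command": "check-disk-usage.rb -w 80 -c 90",
      "standalone": true,
      "interval": 60
    }
  }
}
```

The `"standalone": true` flag is the whole inversion in one line: scheduling lives at the edge, not in the master.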
28. Habitat supervisor in a nutshell
• Network-connected supervision system
• Like systemd + consul/etcd (process supervision with lifecycle hooks + shared state for reactive realtime change management)
• Eventually-consistent global state using the SWIM masterless (peer-to-peer) membership protocol
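To make the supervisor model concrete, here is a sketch of a typical session, roughly as of Habitat in 2018 (the package name, peer address, and file names are placeholders, and flags may have changed in later releases):

```sh
# Start a supervisor on this node; it gossips with its peers
hab sup run --peer 10.0.0.5

# Load a service into the supervisor
hab svc load core/nginx

# Inspect supervisor state over its HTTP interface
curl http://localhost:9631/services

# Inject a config change into the service group; the incarnation
# number (here: 2) lets peers converge on the newest version
hab config apply nginx.default 2 updated.toml
```

Note that `hab config apply` can be run against any reachable supervisor; the change is gossiped to the rest of the group rather than pushed to each node.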
30. Let’s See it in Action!
Demo: Sensu running under Habitat
31. • Modern architectures demand a choreographed rather than an orchestrated approach
• At scale, fleet management and cognitive complexity are the biggest problems
• Habitat and Sensu are both examples of edge-centric, autonomous actor systems, and they work well together 😺
Editor's notes
Fletcher and I were part of the original team that launched Habitat by Chef in 2016; I was the product manager and Fletcher was one of the lead engineers. We both have technical backgrounds, except that we do different jobs now. Fletcher’s computer boots into Linux and mine boots into PowerPoint.
So this is a talk about architecture and systems design, and if we’re going to talk about architecture maybe a good way to think about good architecture is via, well, actual architecture.
One of the most famous buildings in the world is the Habitat 67 complex in Montreal, built, as you can see, for Expo 67, which was Canada’s 100th anniversary. Shout out, by the way, to the Canadians in the room, including Sean Porter, Sensu’s CTO; Fletcher and I are both Canadians so we have to make a pitch for the Great White North anytime we're up here. Universal health care! One year of paid maternity leave! Super-hot prime minister! Ok, that's enough of that
Anyway, Habitat 67 was such an iconic building that Canada Post put it on the stamp for Canada’s 150th anniversary last year.
Here’s another picture, in its full glory.
They probably would have used actual shipping containers today, but remember, standardized (TEU) containerization didn’t arrive until the late 1960s. The components were standardized nonetheless, as you can see from the middle versus the right.
One unit’s roof is the other neighbor’s garden
Shopping, schools, common services built into the ground floor of each complex
These things sound a lot like software architectural principles
Every component is responsible for its own resiliency (like Bezos’ infamous memo)
Components declare peer-to-peer level dependencies
All components share a base substrate of services and management (e.g. deployment, monitoring, observability, etc.)
The Habitat 67 complex is actually quite large
I wanted to put the big pictures up of Habitat 67 because, well, architecture starts to look a lot like architecture, right? These are visual diagrams (probably several years old) of microservice architectures at Amazon and Netflix. When you have complex systems this big, there are architectural patterns you’ll need to put in place to deal with it. Because when you get to something big and complex, your issue isn’t adding more to it – your issue becomes how do you manage this.
Today’s talk is really about how you design complex systems so that you can _manage_ them. It’s better to design systems with these characteristics built in up front rather than to try and bolt them on later.
Which brings me to the patterns of management for complex systems. Traditionally we have managed things, and in many scenarios continue to manage them, using a centralized approach, which I call “orchestration”.
So does everyone else, unfortunately, so let me define what I mean by this.
IBM Cloud Orchestrator
HP Operations Orchestration
VMWare vRealize Orchestrator
But since I’m in the orchestration track I’d better try to define it so that I actually have a talk, right?
Here is the definition I'll be using for the rest of the talk.
And then I’m still going to tell you how and why that breaks down.
This is a trivial example of orchestration. Last year I said I at least hope you’re doing your orchestration in code, if you’re doing orchestration, because this is pretty awful.
And as you can see, it causes downtime because you need to wait for the previous thing to complete before you can proceed with the next one. You can add more fancy error checking and branching to orchestration to try and handle no-downtime deploys, but that orchestration gets really complicated – more complexity means more error conditions means more things that need to be handled.
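To make that concrete, here is a sketch of what the "fancier" orchestration looks like once you add rollback handling. The `deploy` and `rollback` helpers are hypothetical stand-ins for do-deploy.sh; the point is how quickly the branching multiplies.

```shell
#!/bin/sh
# Rolling deploy with rollback -- the "more fancy error checking" the
# slide warns about. deploy/rollback are hypothetical helpers.

deploy()   { ssh "$1" /opt/app/deploy.sh; }    # hypothetical
rollback() { ssh "$1" /opt/app/rollback.sh; }  # hypothetical

rolling_deploy() {
  deployed=""
  for node in "$@"; do
    if ! deploy "$node"; then
      echo "deploy failed on $node; rolling back: $deployed" >&2
      for d in $deployed; do
        # ...and what happens if the rollback itself fails?
        rollback "$d" || echo "rollback ALSO failed on $d" >&2
      done
      return 1
    fi
    deployed="$deployed $node"
  done
}
```

Even this still ignores cases like a node that accepted the deploy but never became healthy, or a network partition mid-rollback. The argument of the talk is that this complexity belongs in the nodes themselves, not in a central script.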
Treating machines all connected via an unreliable network as an atomic unit to which updates must be applied in full, or not at all
This *used* to work when you had a small fleet and/or your network was mostly reliable (e.g. on a LAN) - not so good in a cloud
An atomic set that is assumed to succeed as a whole or not at all. What happens when it doesn't? A lot of complexity in failure conditions needs to be encapsulated and dealt with. Or, more commonly, the approach is to drop it all in the operator's lap and have them deal with it.
Modern orchestration systems try to get around this fundamental issue by creating more disposability and just throwing away larger and larger parts of the infrastructure. The theory goes, let’s get the exact right “new” setup first, and then cut over to it. The problem is that while this mostly works, it is an incredibly complicated and slow way to make changes – you’re saying that for every config change or deployment I have to stand up a whole new production environment and cut over everything to it? For example, how do I do things like quiesce writes to a database? I think this creates more complexity even though the interfaces seem really attractive.
Orchestration systems treat application components as dumb entities to be scheduled. Those entities don’t know about each other except through the orchestration system. This means that if components fail, they depend on the orchestration backplane (and here I’m picking on Kubernetes again) to manage their lifecycle. They also depend on the orchestration backplane to tell them where the other entities are (like where the database server is, if I’m the app server). The apps themselves are deliberately kept in the dark about their execution context.
Now remember, we’re running in the cloud now – a place where machines and networks can go down at any time. And we’re trying to build reliable applications on top of that unreliable fabric.
Now who does such a system design benefit? It only benefits the person or organization that is running the orchestration backplane – that is, if it’s external to the unreliable vagaries of the “cloud”. In other words, if it’s, say, a hosted service provided by your cloud vendor?
Kubernetes and other orchestration systems soften you up for that approach so that when you run into the inherent resilience limitations, you outsource.
Therefore I believe Google has never intended that you run a Kubernetes cluster on your own, but to buy it from someone (hopefully them) as a managed service. And don’t get me wrong, it’s an amazing business model, and, if you can offer your developers an experience on top of all this that’s just “push a container and it runs”, then that’s great.
This is why there has been this Precambrian explosion of hosted Kubernetes solutions... Because these vendors know that this architectural model locks you into building applications on their platform. When your app is operationally dumb and the backplane is operationally smart, they have your money forever.
I don’t have that much to say about this one other than that orchestration systems or operations become really difficult to understand the more entities you’re trying to address. In particular because an orchestration activity (“play”) is intended to run to completion, atomically, trying to debug failures halfway through and figure out what to do is really hard.
When things go wrong, it’s easier for the human brain to try and understand a small part of the system – where the fault is – rather than the entire global state. We know this with computer programming (“locality of reference”) and that’s why we have techniques like “information hiding” (i.e. abstracting logic).
We used to show this slide as part of old Opscode training materials when I first started at Chef. I’m sure you’ve seen slides like this before, where we talk about the # of nodes running applications, etc, and how they grow over time. While this is all true, I think these graphs neglect one key thing, which is not that the *quantity* of machines increases over time, but the fact that systems as a whole tend towards becoming more *distributed*. By "distributed" I mean that more of the computing runs at the "edge" if you will and not in a centralized way.
It’s not a straight line, though.
<Talk through the build>
Cloud: ML, databases, etc. – now starting to centralize more stuff into the cloud.
The more that our systems become distributed, the less a centralized approach makes sense. This is true not only for data processing (why can’t it happen at the edge), but also to configuration updates and even software upgrades.
https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow-keras-react-native-ef03260747f3
Tensorflow, Keras, React Native
First version was centralized – too much latency
So the final version runs an entire neural network on your phone.
Nike HyperAdapt shoe
Number of devices continues to increase
Machine Learning, Analytics, AI
Latency becomes currency
At-scale problems will re-emerge just like they did with Client/Server and the Web
Distributed devices need distributed management
Sounds a lot like where we started with convergent configuration management and this guy, right? Everything old is new again.
Using SWIM rather than something like RAFT, because SWIM is masterless
This slide will be a build to show some of Habitat’s terminology, specifically:
Service group
Contains one or more entities that share a configuration template and run the same workload
Leaders and followers are in the same group
Have a name
Supervisors are responsible for [re-]writing configuration of the workload and restarting the process, possibly in coordination with other supervisors in that group
Supervisors have a REST interface that allows you to modify their config (inject new configs as rumors into the network – they will be propagated. Can use any authorized supervisor as an entrypoint, doesn’t have to be the group we care about)
External service groups can be subscribed to the configuration of this service group using binding
Talk about communication protocol across the fleet – SWIM membership protocol/failure detector, with a gossip layer on top for distributed consensus
Because we get asked a lot of questions about the protocol: it's an implementation of SWIM+Infection+Suspicion for membership, with a ZeroMQ-based, newscast-inspired gossip protocol on top.
Goals
Eventually consistent. Over a long enough time horizon, every living member will converge on the same state.
Reasonably efficient. The protocol avoids any back-chatter; messages are sent but never confirmed.
Reliable. As a building block, it should be safe and reliable to use.
Config changes: injected into any peer, ACL is checked, and if accepted, gossiped around the network. No SPOF.
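The eventual-consistency claim above can be shown with a toy model (purely illustrative; Habitat's real protocol is SWIM plus the gossip layer described earlier): each peer holds a config incarnation number, and each round adopts the higher incarnation of itself and its clockwise neighbor on a ring, so a change injected at any single peer eventually reaches them all.

```shell
#!/bin/sh
# Toy rumor-spreading on a ring of peers. Each argument is one peer's
# current config incarnation; one_round prints the next round's values,
# where every peer adopts the higher of its own incarnation and its
# clockwise neighbor's.

one_round() {
  first=$1
  out=""
  while [ $# -gt 0 ]; do
    self=$1
    shift
    if [ $# -gt 0 ]; then next=$1; else next=$first; fi  # wrap the ring
    if [ "$next" -gt "$self" ]; then self=$next; fi      # adopt newer rumor
    out="$out $self"
  done
  echo $out
}
```

Injecting incarnation 3 at the last of three peers (`one_round 1 1 3` gives `1 3 3`, and another round gives `3 3 3`) illustrates the "over a long enough time horizon" wording: no single round is guaranteed to reach everyone, but every living member converges, and there is no special peer to fail.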