Sensu and Sensibility 
Tomas Doran
@bobtfish
2014-09-23
2 
Sensu and Sensibility 
I’m part of the SRE team at Yelp. 
One of my jobs is “don’t break the site, ever” 
Another job is to enable developer productivity and fast innovation. 
These two things can be in conflict.
Cycle of failure and 
disappointment 
• Manually edited and deployed monitoring 
• Changes require two teams 
• Low developer visibility about production 
3 
This talk is about one particular instance of this conflict - monitoring. 
We used nagios. It sucked. This is half to do with nagios, half to do with the way we used it.
4 
This leads to developers being separated from production. 
Pager details out of date. Not all hosts running a service monitored as services move. 
Permissions issues so developers can’t ack alerts. No sane acks system.
Cycle of failure and 
disappointment 
• Manually edited and deployed monitoring 
• Changes require two teams 
• Low developer visibility about production 
• Escalation of issues is hard 
• Ops ignore alerts from services 
• Postmortems 
5 
Ops have a lot of pain too. Alerts are too noisy, and when they’re for services we can’t triage them. Host issues end up with ops sending email to developers@ and praying.
Ops get alert fatigue, stuff gets missed, and everything is terrible.
6 
If monitoring is ‘ops problem’, everything looks on fire all the time. 
It’s very hard to know what’s actually broken. 
There is no situational awareness, and when people expect broken windows they stop taking responsibility.
Cycle of failure and 
disappointment 
• Manually edited and deployed monitoring 
• Changes require two teams 
• Low developer visibility about production 
• Escalation of issues is hard 
• Ops ignore alerts from services 
• Postmortems 
• High friction, low trust, low visibility. 
7 
Both sides are actually being reasonable. 
This isn’t even a Hanlon’s razor situation - everyone is really trying.
“Normality”
8
Image: http://gunshowcomic.com/648
It’s just the way we’ve built our monitoring system is killing us with a thousand cuts. 
And we’ve got Stockholm syndrome.
“Normality”
This is dysfunctional
9
Image: http://gunshowcomic.com/648
I’m painting a bleak picture here - not actually saying that everything was _this_ bad in our organization. 
But these were the types of problems we identified.
10 
Sensibility 
Monitoring is about enabling communication.
11 
Sensibility 
One of our core competencies is getting monitoring right! 
So, we decided to change everything!!!!1111
“51% viewed their ERP implementation as unsuccessful”
The Robbins-Gioia Survey (2001)
12
Why the hell would we do that? It’s clearly a massive project.
“40% of the projects failed to achieve their business case within one year of going live”
The Conference Board Survey (2001)
13
And pretty high risk.
If we screw the monitoring up, well, let’s just not do that?
McKinsey & Company in conjunction with the University of Oxford (2012)
• “17 percent of large IT projects go so badly that they can threaten the very existence of the company”
• “On average, large IT projects run 45 percent over budget and 7 percent over time, while delivering 56 percent less value than predicted”
14
This is actually really scary.
Failure is an option
Image: blog.parasoft.com/single-greatest-barrier-with-sw-delivery
15 
You’re not gonna get it right first time 
Different teams want to work in different ways. 
Different environments are different 
How do you test your monitoring system?
Sensibility 
16 
Large team + many teams - decentralized (multiple time zones for some teams) 
Integration - we can’t pick a product off of a shelf (and get the level of value we need)
17 
Sensibility 
No big bang change, has to be incremental. 
We don’t know what our requirements are (beyond that the current system doesn’t meet them) 
Iteration is absolutely key to project success
Why Sensu? 
• Designed to be pluggable / extensible 
• Arbitrary check metadata 
• Simple model 
• Components do exactly one thing 
• Ruby 
• Not afraid to extend (or fork!) 
18 
So why did we choose Sensu - Nagios is workable, right? 
Want to work with the monitoring system to integrate it into our infra, not hack around it.
‘industry standard’ 
‘enterprise class’ 
19 
So we do have / did have nagios. 
It’s workable. In fact, it works fine, and scales pretty well (to a point). 
This is not a hate on nagios. It _could_ do all the things I talk about here….
Cheap shot 
20 
It’s ugly
21 
It tries to solve the full-stack monitoring problem. 
We’d already migrated most alert contacts to PagerDuty, with the rest to follow.
Half of its objects are useless to us. It’s monolithic.
status.dat 
cmd.dat 
22 
The data formats are gross.
cmd.dat 
23
24 
Centralized 
Ephemeral clients are a problem. 
Whitelisting (needing to explicitly add hosts/services) is a problem 
Exported resources are horrible (slow + bad for ephemeral envs)
25 
To be fair, this diagram does Sensu no favors at all :)
How we use Sensu 
• Don’t use all of this! 
• ‘Standalone’ checks only 
• Default in the puppet module 
26 
We don’t use it like this, much simpler model!
Sensu data flow 
• Sensu client runs checks on each machine 
• Pushes results to RabbitMQ 
• Clustered, clients/messages will fail over. 
• Sensu server (multiple, ha) 
• Processes check results, invokes handlers 
• Writes state to redis 
• Redis + sentinel 
• Read by API (2 instances) 
• All layers behind haproxy 
27
Quis custodiet ipsos custodes? 
28 
“Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.”
Nagios does all of these things, itself. 
With no introspection - ‘how deep are my queues, why are things not getting scheduled’
Mutually assured monitoring 
• Multiple independent Sensu installs (per-datacenter) 
• Monitor each other! 
29 
We have a big environment, we run a Sensu per DC, they can monitor each other.
Machine readable config 
• /etc/sensu/conf.d/checks/check_name.json 
• Extensible with arbitrary metadata 
• Hash merge 
• Never edit by hand! 
30 
One of (IMO) the nice decisions is the use of JSON for config. 
JSON is a terrible format for hand-edited config, but we deploy all the config by puppet.
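The “hash merge” bullet is worth a quick illustration: many small JSON files get folded into one configuration hash. A minimal sketch of that idea, in Python purely for illustration (Sensu itself is Ruby, and its real merge rules, e.g. for arrays, may differ). `deep_merge` and both sample fragments are hypothetical.

```python
import json


def deep_merge(base, override):
    """Recursively merge two dicts; values from `override` win on conflict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are hashes: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are simply replaced in this sketch.
            merged[key] = value
    return merged


# Two hypothetical fragments, as they might live in separate files.
fragment_a = json.loads('{"checks": {"disk": {"command": "check_disk", "interval": 60}}}')
fragment_b = json.loads('{"checks": {"disk": {"team": "operations"}}}')

config = deep_merge(fragment_a, fragment_b)
```

Because the merge is keyed on the check name, arbitrary extra metadata can ride along in the same hash without the core needing to know about it.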
monitoring_check 
monitoring_check { 'systems-apache-external':
  page          => true,
  command       => "/usr/lib/nagios/plugins/check_tcp -H ${external_ip_address} -p 443",
  check_every   => '5m',
  alert_after   => '30m',
  realert_every => 10,
  runbook       => 'y/apache',
}
31 
This is our interface to Sensu in puppet. 
It’s a custom define which applies our business rules.
32 
Checks default to not paging people (for sanity), but page => true turns that on easily.
A check automatically goes to the default team (whoever owns the box). This can be overridden.
33 
We didn’t like Sensu’s alert scheduling logic, so we rewrote it :) (This is easy - it’s just in the base class.) That is what alert_after and realert_every control.
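One plausible reading of how alert_after and realert_every could interact, sketched in Python. This is not the actual handler code: `should_alert`, its arguments, and the interpretation that realert_every counts failing check runs are all assumptions, and special values (such as the "-1" seen in generated JSON) are deliberately not handled.

```python
def should_alert(occurrences, interval, alert_after, realert_every):
    """Decide whether the Nth consecutive failure should notify anyone.

    occurrences   -- consecutive failing runs so far (1 = first failure)
    interval      -- seconds between check runs
    alert_after   -- seconds a check must keep failing before the first alert
    realert_every -- after the first alert, notify on every Nth failing run
    """
    if realert_every < 1:
        raise ValueError("this sketch only handles realert_every >= 1")
    # Failing runs that are swallowed while waiting out alert_after.
    grace_runs = alert_after // interval if interval > 0 else 0
    failing_past_grace = occurrences - grace_runs
    if failing_past_grace < 1:
        return False
    # First run past the grace period alerts, then every Nth run after it.
    return (failing_past_grace - 1) % realert_every == 0
```

With the slide’s values (check every 300 s, alert after 1800 s, realert every 10), the first six failures stay quiet, the seventh notifies, and the next notification comes ten runs later.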
34 
The runbook is mandatory documentation!
sensu::check 
• monitoring_check wraps this 
• Writes a JSON file for each check 
• Comment safe 
35 
We do use the Sensu official puppet module. 
“Comment safe” - if you comment the puppet code out, the check goes away. 
Working on auto-resolving checks that are deleted now!
"disk_ro_mounts": {
  "standalone": true, "handlers": ["default"], "subscribers": [],
  "command": "/usr/lib/nagios/plugins/yelp/check_ro_mounts",
  "interval": 60,
  "alert_after": 0, "realert_every": "-1",
  "dependencies": [],
  "runbook": "http://lmgtfy.com/?q=linux+read+only+disk",
  "annotation": "https://gitweb.yelpcorp.com/?p=puppet.git;a=blob;f=modules/profile/manifests/server.pp#l80",
  "team": "operations",
  "irc_channels": "operations-notifications",
  "notification_email": "undef",
  "ticket": true,
  "project": "OPS",
  "page": false,
  "tip": false
}
36 
This is what an actual auto-generated check JSON looks like.
BIG BLOB OF JSON! 
Don’t stress, we’ll work through it.
37 
The first line (standalone, handlers, subscribers) looks the same for all of our Sensu checks.
This is us using ‘simple mode’ and turning off half the features - servers can’t/don’t trigger checks on clients, it’s all client scheduled.
38 
alert_after and realert_every are custom (handled in our base handler) - as noted before in the define.
Times are converted to seconds (in puppet) so that all time intervals in JSON are seconds.
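The conversion itself happens in Puppet, but the idea is small enough to sketch in Python. `to_seconds` and the set of accepted suffixes (s, m, h, d) are assumptions for illustration; the slides only show values like '5m' and '30m'.

```python
import re

# Seconds per unit suffix; a bare number is taken to be seconds already.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value):
    """Convert '5m', '30m', '2h' or a bare number into integer seconds."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit, 1)
```

Normalising at generation time means every consumer of the JSON can treat intervals as plain integers.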
39 
Every check has to have a runbook!
40 
The annotation is generated by a custom function.
It goes up the parser stack and finds where it was called from.
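The real function walks Puppet’s parser stack. As a loose analogy only, here is the same “find out who declared me” trick in Python using the interpreter’s call stack. `declared_at` and this stand-in `monitoring_check` are hypothetical names, and the file#lNN format simply mimics the annotation shown on the slide.

```python
import inspect


def declared_at(depth=1):
    """Return 'file#lNN' for the frame `depth` levels above this function."""
    frame = inspect.stack()[depth]
    return f"{frame.filename}#l{frame.lineno}"


def monitoring_check(name):
    # depth=2 skips this wrapper and reports the line that declared the check.
    return {"name": name, "annotation": declared_at(depth=2)}


check = monitoring_check("disk_ro_mounts")
```

The payoff is that every alert can link straight back to the line of config that created it, with nobody having to maintain that link by hand.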
41 
The rest (more than half the check!) is our custom metadata.
Every alert has a team owning it.
We can report in IRC, JIRA, email (why? but some people do want this) or page!
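A minimal sketch of how a handler might turn that metadata into notification targets. The field names come from the JSON on the slide, but their exact semantics are guesses from the names: `notification_targets` and `as_list` are hypothetical, and treating the string "undef" as “not set” is an assumption.

```python
def as_list(value):
    """Accept a single string or a list of strings; always return a list."""
    if not value:
        return []
    return [value] if isinstance(value, str) else list(value)


def notification_targets(check):
    """List the (channel, destination) pairs a check result should go to."""
    targets = []
    if check.get("page"):
        targets.append(("pagerduty", check.get("team")))
    if check.get("ticket"):
        targets.append(("jira", check.get("project")))
    for channel in as_list(check.get("irc_channels")):
        targets.append(("irc", channel))
    email = check.get("notification_email")
    if email and email != "undef":
        targets.append(("email", email))
    return targets


# The metadata fields from the disk_ro_mounts example.
check = {
    "team": "operations",
    "irc_channels": "operations-notifications",
    "notification_email": "undef",
    "ticket": True,
    "project": "OPS",
    "page": False,
}
targets = notification_targets(check)
```

For this check the result is a JIRA ticket in OPS plus an IRC message, and nobody gets paged.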
Check scripts 
• Same as nagios checks 
• Simple (text) output 
• Exit code 
• Result sent to server, along with check definition 
• Including all the custom metadata 
• Our handlers use the extra data. 
42 
So, to recap - checks are scheduled and run on the client.
It pushes the results to RabbitMQ, sending both its results and the check definitions to the server.
This is then all piped to the handlers we have set up.
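For anyone who hasn’t written a Nagios-style check: it is one line of text plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A small sketch in Python; the disk thresholds and the function names are made up for illustration.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a disk-usage percentage onto a (status, message) pair."""
    if used_pct >= crit_pct:
        return CRITICAL, f"CRITICAL: disk {used_pct:.0f}% full"
    if used_pct >= warn_pct:
        return WARNING, f"WARNING: disk {used_pct:.0f}% full"
    return OK, f"OK: disk {used_pct:.0f}% full"


def check_disk(path="/"):
    """Measure usage at `path`; anything we cannot measure is UNKNOWN."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"UNKNOWN: {path} reports zero size"
    return classify(100.0 * usage.used / usage.total)


def main():
    status, message = check_disk()
    print(message)   # one line of text for humans
    return status    # the exit code is what the client acts on

# A real script would finish with: raise SystemExit(main())
```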
Handlers 
• base 
• JIRA 
• email 
• irc 
• pagerduty 
• awsprune 
43
How do checks get run? 
• Every machine runs the client. 
• Client managed by puppet 
• Client has a TCP socket you can send JSON to 
• Custom checks + pysensu-yelp 
44 
Check scripts are simple (as per nagios). Can write them in shell/ruby/python/whatever. 
More complex things can send data to the local socket. We have a python library for this (also use the ruby libraries from the sensu project)
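Sending a result to the local client socket can be sketched in a few lines of Python. This is not the pysensu-yelp API: `build_result` and `send_result` are hypothetical helpers, and although 127.0.0.1:3030 is the usual default for the Sensu client socket, treat the address as an assumption about your install.

```python
import json
import socket


def build_result(name, status, output, **metadata):
    """Build a check result dict; extra keyword args ride along as metadata."""
    result = {"name": name, "status": status, "output": output}
    result.update(metadata)
    return result


def send_result(result, host="127.0.0.1", port=3030, timeout=5.0):
    """Send one JSON-encoded check result to the local Sensu client socket."""
    payload = json.dumps(result).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


result = build_result("cron_backup", 0, "OK: backup finished", team="operations")
# On a machine running the client: send_result(result)
```

The point is that anything able to open a TCP socket and write JSON can raise or resolve an alert, carrying the same custom metadata as a scheduled check.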
45 
Sensu servers know which machine is the master right now (they run their own leadership election).
We deploy some checks to the Sensu servers (e.g. CloudWatch checks!) and run them on the master.
The results use a fake hostname!
Situational awareness 
46 
Send alerts about dev box resource usage to the developers using that box. 
Why page OPS because a developer used 90% of the disk?
Single source of truth 
• DNS is canonical for sensu servers 
• Configure things in one place! 
47 
One place can be DNS, or hiera, or whatever - but not multiple places. 
DNS AND hiera sucks
48 
puppet-netstdlib 
structured facts
Automatic monitoring 
• E.g. cron jobs - check successful recently! 
• cron::d 
49 
There are a bunch of general patterns where you can automate monitoring. 
Who hates ‘cron spam’? 
We use a custom define which defaults to /dev/null 
Check jobs completed successfully (with Sensu) - make JIRA tickets!
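One way to check that a job “succeeded recently” is to have the job touch a marker file on success and have the check look at its age. The slides don’t say how this is done at Yelp, so the marker-file mechanism, `check_recent_success`, and its parameters are all assumptions. A Python sketch:

```python
import os
import time


def check_recent_success(stamp_path, max_age_seconds, now=None):
    """Return (status, message) based on the mtime of a success marker file."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(stamp_path)
    except OSError:
        # No marker at all: the job has never reported success.
        return 2, f"CRITICAL: {stamp_path} missing, no successful run recorded"
    if age > max_age_seconds:
        return 2, f"CRITICAL: last success {int(age)}s ago (limit {max_age_seconds}s)"
    return 0, f"OK: last success {int(age)}s ago"
```

A wrapper that generates both the cron entry and a check like this one is what lets the cron output itself go to /dev/null.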
50 
Generic handling! 
Annotations!
Generate monitoring_check 
51 
And under the hood this runs create_resources to generate monitoring_checks 
create_resources is your friend!
User specified monitoring 
52 
This is a cunning one. 
The check returns OK (assuming it can hit graphite), but also emits a bunch of additional check results to the local socket
User specified monitoring
• Data lives in the service config
• Next to the code to emit metrics!
53
This is awesome, as it reads our service configs.
Developers can add their own alerts.
User specified monitoring
• Simple checks for free!
54
This example is in ruby :)
User specified monitoring 
• Data lives in the service config 
• Next to the code to emit metrics 
• Next to metadata about SLAs and LB timeouts 
• Developers can push without OPS 
55 
Allowing developers to add their own monitoring is awesome. 
Putting the config for the monitoring in their application codebase is awesome.
Cluster checks 
• We’re working on this currently 
• Assert some % of machines are healthy. 
• Use to reduce alert noise. 
• If a service becomes fully unavailable to clients, you want to page someone.
• If one machine goes belly up, you don’t (make a JIRA ticket for handling later!)
56
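The cluster-check idea can be sketched as a simple aggregation over per-host results. Since this was still work in progress at the time of the talk, everything here is hypothetical: `cluster_status`, the 50% default threshold, and the choice to treat “no hosts reporting” as an outage.

```python
def cluster_status(host_statuses, min_healthy_pct=50.0):
    """Aggregate per-host check statuses into one cluster-level decision.

    host_statuses maps hostname -> exit status (0 = healthy).
    Returns 'ok', 'ticket' (handle later) or 'page' (wake someone up).
    """
    if not host_statuses:
        return "page"  # nothing reporting at all is treated as an outage
    healthy = sum(1 for status in host_statuses.values() if status == 0)
    healthy_pct = 100.0 * healthy / len(host_statuses)
    if healthy_pct < min_healthy_pct:
        return "page"
    if healthy < len(host_statuses):
        return "ticket"
    return "ok"
```

One dead box out of four yields a ticket; three dead boxes out of four yields a page.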
WIP 
• This is all still a work in progress. 
• We’ve not 100% migrated off of Nagios 
• Open sourcing the pieces 
57
Thanks! 
• Slides will be online shortly: 
• slideshare.net/bobtfish 
• @bobtfish 
• Some (most?) of our code is open source: 
• https://github.com/Yelp/sensu/commit/aa5c43c2fdfde5e8739952c0b8082000934f3ad2
• https://github.com/Yelp/puppet-monitoring_check 
• https://github.com/Yelp/puppet-netstdlib 
• https://github.com/Yelp/sensu_handlers 
• https://github.com/Yelp/pysensu-yelp 
58

Troubleshooting: A High-Value Asset For The Service-Provider Discipline
 
DevOps Days Ohio
DevOps Days OhioDevOps Days Ohio
DevOps Days Ohio
 
What DevOps Isn't
What DevOps Isn'tWhat DevOps Isn't
What DevOps Isn't
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum Japan
 
MongoDB World 2018: Tutorial - MongoDB Meets Chaos Monkey
MongoDB World 2018: Tutorial - MongoDB Meets Chaos MonkeyMongoDB World 2018: Tutorial - MongoDB Meets Chaos Monkey
MongoDB World 2018: Tutorial - MongoDB Meets Chaos Monkey
 
Creating Havoc using Human Interface Device
Creating Havoc using Human Interface DeviceCreating Havoc using Human Interface Device
Creating Havoc using Human Interface Device
 
JUST EAT: Embracing DevOps
JUST EAT: Embracing DevOpsJUST EAT: Embracing DevOps
JUST EAT: Embracing DevOps
 

Más de Puppet

Puppet camp2021 testing modules and controlrepo
Puppet camp2021 testing modules and controlrepoPuppet camp2021 testing modules and controlrepo
Puppet camp2021 testing modules and controlrepoPuppet
 
Puppetcamp r10kyaml
Puppetcamp r10kyamlPuppetcamp r10kyaml
Puppetcamp r10kyamlPuppet
 
2021 04-15 operational verification (with notes)
2021 04-15 operational verification (with notes)2021 04-15 operational verification (with notes)
2021 04-15 operational verification (with notes)Puppet
 
Puppet camp vscode
Puppet camp vscodePuppet camp vscode
Puppet camp vscodePuppet
 
Modules of the twenties
Modules of the twentiesModules of the twenties
Modules of the twentiesPuppet
 
Applying Roles and Profiles method to compliance code
Applying Roles and Profiles method to compliance codeApplying Roles and Profiles method to compliance code
Applying Roles and Profiles method to compliance codePuppet
 
KGI compliance as-code approach
KGI compliance as-code approachKGI compliance as-code approach
KGI compliance as-code approachPuppet
 
Enforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationEnforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationPuppet
 
Keynote: Puppet camp compliance
Keynote: Puppet camp complianceKeynote: Puppet camp compliance
Keynote: Puppet camp compliancePuppet
 
Automating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowAutomating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowPuppet
 
Puppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet
 
Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Puppet
 
Accelerating azure adoption with puppet
Accelerating azure adoption with puppetAccelerating azure adoption with puppet
Accelerating azure adoption with puppetPuppet
 
Puppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkPuppet
 
Take control of your dev ops dumping ground
Take control of your  dev ops dumping groundTake control of your  dev ops dumping ground
Take control of your dev ops dumping groundPuppet
 
100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy SoftwarePuppet
 
Puppet User Group
Puppet User GroupPuppet User Group
Puppet User GroupPuppet
 
Continuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsContinuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsPuppet
 
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyThe Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyPuppet
 

Más de Puppet (20)

Puppet camp2021 testing modules and controlrepo
Puppet camp2021 testing modules and controlrepoPuppet camp2021 testing modules and controlrepo
Puppet camp2021 testing modules and controlrepo
 
Puppetcamp r10kyaml
Puppetcamp r10kyamlPuppetcamp r10kyaml
Puppetcamp r10kyaml
 
2021 04-15 operational verification (with notes)
2021 04-15 operational verification (with notes)2021 04-15 operational verification (with notes)
2021 04-15 operational verification (with notes)
 
Puppet camp vscode
Puppet camp vscodePuppet camp vscode
Puppet camp vscode
 
Modules of the twenties
Modules of the twentiesModules of the twenties
Modules of the twenties
 
Applying Roles and Profiles method to compliance code
Applying Roles and Profiles method to compliance codeApplying Roles and Profiles method to compliance code
Applying Roles and Profiles method to compliance code
 
KGI compliance as-code approach
KGI compliance as-code approachKGI compliance as-code approach
KGI compliance as-code approach
 
Enforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationEnforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automation
 
Keynote: Puppet camp compliance
Keynote: Puppet camp complianceKeynote: Puppet camp compliance
Keynote: Puppet camp compliance
 
Automating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNowAutomating it management with Puppet + ServiceNow
Automating it management with Puppet + ServiceNow
 
Puppet: The best way to harden Windows
Puppet: The best way to harden WindowsPuppet: The best way to harden Windows
Puppet: The best way to harden Windows
 
Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020Simplified Patch Management with Puppet - Oct. 2020
Simplified Patch Management with Puppet - Oct. 2020
 
Accelerating azure adoption with puppet
Accelerating azure adoption with puppetAccelerating azure adoption with puppet
Accelerating azure adoption with puppet
 
Puppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael PinsonPuppet catalog Diff; Raphael Pinson
Puppet catalog Diff; Raphael Pinson
 
ServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin ReeuwijkServiceNow and Puppet- better together, Kevin Reeuwijk
ServiceNow and Puppet- better together, Kevin Reeuwijk
 
Take control of your dev ops dumping ground
Take control of your  dev ops dumping groundTake control of your  dev ops dumping ground
Take control of your dev ops dumping ground
 
100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software100% Puppet Cloud Deployment of Legacy Software
100% Puppet Cloud Deployment of Legacy Software
 
Puppet User Group
Puppet User GroupPuppet User Group
Puppet User Group
 
Continuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsContinuous Compliance and DevSecOps
Continuous Compliance and DevSecOps
 
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick MaludyThe Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
The Dynamic Duo of Puppet and Vault tame SSL Certificates, Nick Maludy
 

Último

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Último (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #monitoringlove - PuppetConf 2014

  • 1. Sensu and Sensibility. Tomas Doran (@bobtfish), 2014-09-23
  • 2. Sensu and Sensibility. I’m part of the SRE team at Yelp. One of my jobs is “don’t break the site, ever”. Another job is to enable developer productivity and fast innovation. These two things can be in conflict.
  • 3. Cycle of failure and disappointment: manually edited and deployed monitoring; changes require two teams; low developer visibility about production. This talk is about one particular instance of this conflict - monitoring. We used Nagios. It sucked. This is half to do with Nagios, half to do with the way we used it.
  • 4. This leads to developers being separated from production. Pager details are out of date. Not all hosts running a service are monitored as services move. Permissions issues mean developers can’t ack alerts. There is no sane ack system.
  • 5. Cycle of failure and disappointment: manually edited and deployed monitoring; changes require two teams; low developer visibility about production; escalation of issues is hard; ops ignore alerts from services; postmortems. Ops have a lot of pain too. Alerts are too noisy, and when they’re for services we can’t triage them. Host issues end up with ops sending email to developers@ and praying. Ops get alert fatigue, stuff gets missed, everything is terrible.
  • 6. If monitoring is an ‘ops problem’, everything looks on fire all the time. It’s very hard to know what’s actually broken. A lack of situational awareness and an expectation of broken windows stop people taking responsibility.
  • 7. Cycle of failure and disappointment: manually edited and deployed monitoring; changes require two teams; low developer visibility about production; escalation of issues is hard; ops ignore alerts from services; postmortems; high friction, low trust, low visibility. Both sides are actually being reasonable. This isn’t even a Hanlon’s razor situation - everyone is really trying.
  • 8. “Normality” (http://gunshowcomic.com/648). It’s just that the way we’ve built our monitoring system is killing us with a thousand cuts. And we’ve got Stockholm syndrome.
  • 9. “Normality”: this is dysfunctional (http://gunshowcomic.com/648). I’m painting a bleak picture here - I’m not actually saying that everything was _this_ bad in our organization. But these were the types of problems we identified.
  • 10. Sensibility. Monitoring is about enabling communication.
  • 11. Sensibility. One of our core competencies is getting monitoring right! So, we decided to change everything!!!!1111
  • 12. “51% viewed their ERP implementation as unsuccessful” (The Robbins-Gioia Survey, 2001). Why the hell would we do that? It’s clearly a massive project.
  • 13. “40% of the projects failed to achieve their business case within one year of going live” (The Conference Board Survey, 2001). And it’s pretty high risk. If we screw the monitoring up... well, let’s just not do that?
  • 14. McKinsey & Company in conjunction with the University of Oxford (2012): “17 percent of large IT projects go so badly that they can threaten the very existence of the company”; “On average, large IT projects run 45 percent over budget and 7 percent over time, while delivering 56 percent less value than predicted”. This is actually really scary.
  • 15. Failure is an option (blog.parasoft.com/single-greatest-barrier-with-sw-delivery). You’re not gonna get it right first time. Different teams want to work in different ways. Different environments are different. How do you test your monitoring system?
  • 16. Sensibility. Large team + many teams - decentralized (multiple time zones for some teams). Integration - we can’t pick a product off the shelf (and get the level of value we need).
  • 17. Sensibility. No big bang change; it has to be incremental. We don’t know what our requirements are (beyond that the current system doesn’t meet them). Iteration is absolutely key to project success.
  • 18. Why Sensu? Designed to be pluggable / extensible; arbitrary check metadata; simple model; components do exactly one thing; Ruby; not afraid to extend (or fork!). So why did we choose Sensu - Nagios is workable, right? We want to work with the monitoring system to integrate it into our infra, not hack around it.
  • 19. ‘Industry standard’, ‘enterprise class’. So we do have / did have Nagios. It’s workable. In fact, it works fine, and scales pretty well (to a point). This is not a hate on Nagios. It _could_ do all the things I talk about here….
  • 20. Cheap shot: it’s ugly.
  • 21. It tries to solve the full-stack monitoring problem. We’d already migrated most contacting to PagerDuty, with the rest to follow. Half the objects are useless to us. Monolithic.
  • 22. status.dat, cmd.dat: the data formats are gross.
  • 24. Centralized. Ephemeral clients are a problem. Whitelisting (needing to explicitly add hosts/services) is a problem. Exported resources are horrible (slow + bad for ephemeral envs).
  • 25. To be fair, this diagram does Sensu no favors at all :)
  • 26. How we use Sensu: don’t use all of this! ‘Standalone’ checks only; default in the puppet module. We don’t use it like this; ours is a much simpler model!
  • 27. Sensu data flow: the Sensu client runs checks on each machine and pushes results to RabbitMQ (clustered; clients/messages will fail over). The Sensu server (multiple, HA) processes check results, invokes handlers and writes state to Redis (Redis + Sentinel), which is read by the API (2 instances). All layers are behind HAProxy.
  • 28. Quis custodiet ipsos custodes? “Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.” Nagios does all of these things itself, with no introspection - ‘how deep are my queues, why are things not getting scheduled?’
  • 29. Mutually assured monitoring: multiple independent Sensu installs (per-datacenter); monitor each other! We have a big environment, so we run a Sensu per DC, and they can monitor each other.
  • 30. Machine readable config: /etc/sensu/conf.d/checks/check_name.json; extensible with arbitrary metadata; hash merge; never edit by hand! One of (IMO) the nice decisions is the use of JSON for config. JSON is a terrible format for hand-edited config, but we deploy all the config by puppet.
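The “hash merge” point can be illustrated with a minimal Python sketch. It assumes that config fragments are JSON objects merged key by key, with later fragments winning on conflicts; the function names and sample fragments are hypothetical, and Sensu’s real merge rules (for arrays, for example) are not reproduced here.

```python
import json


def deep_merge(base, override):
    """Recursively merge two dicts; values from override win on conflict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(fragments):
    """Fold a sequence of JSON documents into one configuration dict."""
    config = {}
    for text in fragments:
        config = deep_merge(config, json.loads(text))
    return config


# Hypothetical stand-ins for two files under /etc/sensu/conf.d/checks/.
FRAGMENTS = [
    '{"checks": {"disk_ro_mounts": {"interval": 60, "team": "operations"}}}',
    '{"checks": {"check_apache": {"interval": 300, "page": true}}}',
]
CONFIG = load_config(FRAGMENTS)
```

Because each check lives in its own file and the files merge into one `checks` hash, adding or removing a check is just adding or removing a file, which is what makes this layout convenient to manage from puppet.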
  • 31. monitoring_check. This is our interface to Sensu in puppet. It’s a custom define which applies our business rules.
      monitoring_check { 'systems-apache-external':
        page          => true,
        command       => "/usr/lib/nagios/plugins/check_tcp -H ${external_ip_address} -p 443",
        check_every   => '5m',
        alert_after   => '30m',
        realert_every => 10,
        runbook       => 'y/apache',
      }
  • 32. monitoring_check (same definition as slide 31). Default to not paging people (for sanity), but turn that on easily. Automatically uses the default team (whoever owns the box). Can be overridden.
  • 33. monitoring_check (same definition as slide 31). We didn’t like Sensu’s alert scheduling logic. So we rewrote it :) (This is easy - just in the base class)
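The slides do not show the rewritten scheduling logic, so the following Python sketch only illustrates how parameters such as alert_after and realert_every could drive it. Every rule here is an assumption: alert_after is treated as a grace period in seconds, realert_every as “notify on every Nth failing run”, and -1 as exponential backoff. The function name is hypothetical.

```python
def should_alert(occurrences, interval, alert_after, realert_every):
    """Decide whether this failing run should notify anyone.

    occurrences   -- consecutive failing runs so far (1 = first failure)
    interval      -- seconds between check runs
    alert_after   -- seconds a check must keep failing before the first alert
    realert_every -- notify on every Nth failing run after that;
                     -1 is treated here as exponential backoff (assumption)
    """
    grace_runs = alert_after // interval if interval > 0 else 0
    failing_runs = occurrences - grace_runs
    if failing_runs < 1:
        return False  # still inside the alert_after grace period
    if realert_every == -1:
        # Notify on the 1st, 2nd, 4th, 8th... failing run (powers of two).
        return failing_runs & (failing_runs - 1) == 0
    return (failing_runs - 1) % realert_every == 0
```

With the values from the slide converted to seconds (a 300 second interval and an 1800 second alert_after), this sketch stays silent for the first six failing runs, notifies on the seventh, and then again every tenth run.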
  • 34. monitoring_check (same definition as slide 31). Mandatory documentation!
  • 35. sensu::check: monitoring_check wraps this; writes a JSON file for each check; comment safe. We do use the official Sensu puppet module. “Comment safe” means that if you comment the puppet code out, the check goes away. We’re now working on auto-resolving checks that are deleted!
  • 36. This is what an actual auto-generated check JSON looks like. BIG BLOB OF JSON! Don’t stress, we’ll work through it.
      "disk_ro_mounts": {
        "standalone": true,
        "handlers": ["default"],
        "subscribers": [],
        "command": "/usr/lib/nagios/plugins/yelp/check_ro_mounts",
        "interval": 60,
        "alert_after": 0,
        "realert_every": "-1",
        "dependencies": [],
        "runbook": "http://lmgtfy.com/?q=linux+read+only+disk",
        "annotation": "https://gitweb.yelpcorp.com/?p=puppet.git;a=blob;f=modules/profile/manifests/server.pp#l80",
        "team": "operations",
        "irc_channels": "operations-notifications",
        "notification_email": "undef",
        "ticket": true,
        "project": "OPS",
        "page": false,
        "tip": false
      }
  • 37. (Same check JSON as slide 36.) The top of the definition looks the same for all of our Sensu checks. This is using ‘simple mode’ ("standalone": true) and turning off half the features - servers can’t/don’t trigger checks on clients, it’s all client scheduled.
  • 38. (Same check JSON as slide 36.) alert_after and realert_every are custom (handled in our base handler), as noted before in the define. Times are converted to seconds (in puppet) so that all time intervals in JSON are seconds.
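The conversion itself happens in puppet and is not shown in the slides. As a rough Python sketch of the same idea, assuming values such as '5m' and '30m' use single-letter unit suffixes (the function name, the unit table and the error handling are all hypothetical):

```python
import re

# Assumed unit suffixes; the talk only shows minute values such as '5m'.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value):
    """Convert '5m', '30m', '2h' or a plain number of seconds to an int."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError("unrecognised time interval: %r" % (value,))
    number, unit = match.groups()
    # A bare number with no suffix is taken to be seconds already.
    return int(number) * UNITS.get(unit, 1)
```

Normalising at generation time means every consumer of the JSON (client, server, handlers) can treat interval-like fields as plain integers without parsing units again.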
  • 39. (Same check JSON as slide 36.) Every check has to have a runbook!
  • 40. (Same check JSON as slide 36.) The annotation is generated by a custom function. It goes up the parser stack and finds where the check was called from.
  • 41. (Same check JSON as slide 36.) This stuff (more than half the check!) is the custom metadata. Every alert has a team owning it. We can report in irc, JIRA, email (why? but some people do want this) or page!
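To make the role of the custom metadata concrete, here is a small Python sketch of how a handler might turn those fields into notification targets. The field names come from the JSON on the slide, but the routing rules, the function name and the tuple format are assumptions made for illustration, not Yelp’s actual handler code.

```python
def notification_targets(check):
    """Return (channel, destination) pairs implied by a check's metadata."""
    targets = []
    if check.get("page"):
        targets.append(("pagerduty", check.get("team")))
    if check.get("ticket"):
        targets.append(("jira", check.get("project")))
    if check.get("irc_channels"):
        targets.append(("irc", check["irc_channels"]))
    email = check.get("notification_email")
    # The slide shows the literal string "undef" when no address is set.
    if email and email != "undef":
        targets.append(("email", email))
    return targets


# Custom metadata copied from the disk_ro_mounts check on the slide.
DISK_RO_MOUNTS = {
    "team": "operations",
    "irc_channels": "operations-notifications",
    "notification_email": "undef",
    "ticket": True,
    "project": "OPS",
    "page": False,
    "tip": False,
}
TARGETS = notification_targets(DISK_RO_MOUNTS)
```

Under these assumed rules the read-only-mount check would open a ticket in the OPS project and post to the operations IRC channel, but would not page anyone.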
  • 42. Check scripts: same as nagios checks; simple (text) output; exit code; result sent to server, along with check definition, including all the custom metadata; our handlers use the extra data. So, to recap - checks are scheduled and run on the client. It pushes the results to RabbitMQ, sending its results and definitions to the server. This is then all piped to the handlers we have set up.
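A check script in this style only needs to print one line of text and return a status code. The Python sketch below follows the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the disk-usage subject, the thresholds and the function names are illustrative assumptions rather than one of the checks from the talk.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a disk usage percentage to (exit_code, one-line message)."""
    if percent_used >= crit:
        return CRITICAL, "CRITICAL: disk %.1f%% used" % percent_used
    if percent_used >= warn:
        return WARNING, "WARNING: disk %.1f%% used" % percent_used
    return OK, "OK: disk %.1f%% used" % percent_used


def main(path="/"):
    """Print the check output and return the exit code for the caller."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        print("UNKNOWN: %s" % err)
        return UNKNOWN
    status, message = evaluate(100.0 * usage.used / usage.total)
    print(message)
    return status
```

Run as a script, the last line would be `sys.exit(main())`, so that the client picks up the exit code as the check status and the printed line as its output.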
  • 43. Handlers: base, JIRA, email, irc, pagerduty, awsprune.
  • 44. How do checks get run? • Every machine runs the client. • Client managed by puppet • Client has a TCP socket you can send JSON to • Custom checks + pysensu-yelp 44 Check scripts are simple (as per nagios). Can write them in shell/ruby/python/whatever. More complex things can send data to the local socket. We have a python library for this (also use the ruby libraries from the sensu project)
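A minimal Python sketch of sending a result to the local client socket. This is not the pysensu-yelp API; `build_result` and `send_result` are hypothetical helpers. It assumes the client is listening on 127.0.0.1 port 3030 (the usual Sensu client socket default) and that the payload needs at least `name`, `status` and `output`, with any custom metadata passed through as extra keys.

```python
import json
import socket


def build_result(name, status, output, **metadata):
    """Build a check result dict; extra keyword arguments become custom metadata."""
    result = {"name": name, "status": status, "output": output}
    result.update(metadata)
    return result


def send_result(result, host="127.0.0.1", port=3030, timeout=5.0):
    """Send one JSON-encoded result to the local client socket (assumed address)."""
    payload = json.dumps(result).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Example: a critical result owned by the operations team, ticket only.
example = build_result(
    "my_custom_check",  # hypothetical check name
    2,
    "CRITICAL: something is wrong",
    team="operations",
    page=False,
    ticket=True,
)
```

Calling `send_result(example)` on a machine running the client would hand the result over as if a scheduled check had produced it.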
  • 45. 45 Sensu servers know which machine is the master right now (their own leadership election). Deploy some checks to sensu servers (e.g. cloudwatch checks!), run on the master. Fake hostname!
  • 46. Situational awareness 46 Send alerts about dev box resource usage to the developers using that box. Why page OPS because a developer used 90% of the disk?
  • 47. Single source of truth • DNS is canonical for sensu servers • Configure things in one place! 47 One place can be DNS, or hiera, or whatever - but not multiple places. DNS AND hiera sucks
  • 48. Single source of truth • DNS is canonical for sensu servers • Configure things in one place! 48 puppet-netstdlib structured facts
  • 49. Automatic monitoring • E.g. cron jobs - check successful recently! • cron::d 49 There are a bunch of general patterns where you can automate monitoring. Who hates ‘cron spam’? We use a custom define which defaults to /dev/null Check jobs completed successfully (with Sensu) - make JIRA tickets!
  • 50. Automatic monitoring • E.g. cron jobs - check successful recently! • cron::d 50 Generic handling! Annotations!
  • 51. Generate monitoring_check 51 And under the hood this runs create_resources to generate monitoring_checks create_resources is your friend!
  • 52. User specified monitoring 52 This is a cunning one. The check returns OK (assuming it can hit graphite), but also emits a bunch of additional check results to the local socket
  • 53. User specified monitoring 53 • Data lives in the service config • Next to the code to emit metrics! This is awesome, as it reads our service configs. Developers can add their own alerts.
  • 54. • Simple checks for free! 54 User specified monitoring This example is in ruby :)
  • 55. User specified monitoring • Data lives in the service config • Next to the code to emit metrics • Next to metadata about SLAs and LB timeouts • Developers can push without OPS 55 Allowing developers to add their own monitoring is awesome. Putting the config for the monitoring in their application codebase is awesome.
  • 56. Cluster checks • We’re working on this currently • Assert some % of machines are healthy. • Use to reduce alert noise. • If a service becomes fully unavailable to clients, you want to page someone. • If one machine goes belly up, you don’t (make a JIRA ticket for handling later!) 56
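The idea can be sketched in a few lines of Python. This is only an illustration of "assert some % of machines are healthy", not the implementation being worked on; the function name `evaluate_cluster`, the 80% default threshold, and the mapping of CRITICAL to a page and WARNING to a ticket are assumptions.

```python
def evaluate_cluster(host_statuses, min_healthy_pct=80.0):
    """Aggregate per-host check statuses into one cluster-level result.

    host_statuses maps hostname -> exit status of the per-host check (0 = OK).
    Returns (exit_code, message) using nagios-style codes.
    """
    total = len(host_statuses)
    if total == 0:
        return 3, "UNKNOWN: no hosts reported"
    healthy = sum(1 for status in host_statuses.values() if status == 0)
    pct = 100.0 * healthy / total
    summary = "%d/%d hosts healthy (%.0f%%)" % (healthy, total, pct)
    if pct < min_healthy_pct:
        # Too few healthy machines: the service itself is at risk, so page.
        return 2, "CRITICAL: " + summary
    if healthy < total:
        # Some machines are down but the service is fine: ticket for later.
        return 1, "WARNING: " + summary
    return 0, "OK: " + summary
```

For example, one unhealthy host out of four with a 50% threshold yields a WARNING, which would become a JIRA ticket rather than a page.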
  • 57. WIP • This is all still a work in progress. • We’ve not 100% migrated off of Nagios • Open sourcing the pieces 57
  • 58. Thanks! • Slides will be online shortly: • slideshare.net/bobtfish • @bobtfish • Some (most?) of our code is open source: • https://github.com/Yelp/sensu/commit/aa5c43c2fdfde5e8739952c0b8082000934f3ad2 • https://github.com/Yelp/puppet-monitoring_check • https://github.com/Yelp/puppet-netstdlib • https://github.com/Yelp/sensu_handlers • https://github.com/Yelp/pysensu-yelp 58