2. This New Hype
• Docker, Docker, Docker, Docker, Docker, Docker...
• Kubernetes, Kubernetes, Kubernetes, Kubernetes, Kubernetes, Kubernetes,
• O11y, o11y, o11y, o11y, o11y, o11y, o11y,
• We’re all doing this, right?
• This is the new default, right?
O11y 2
3. A real-life story:
A large government agency
• Large Checkmk setup
• Lots of custom checks
• No automation, checks are created manually
• Custom CMDB, which is out of sync with reality
4. The Unhappy On-Call (SRE) Team
• Happy with their tool, not happy with:
• Being left out of the information loop (e.g. when a service would be decommissioned)
• No known ownership for services
• Management wants “observability”
5. Their Plan
• Start from scratch
• Move to Prometheus (<- insert shiny new tool here)
• A one-year effort focused on the new technology stacks (k8s)
• Then migrate the old monitoring
6. Result
• The old tool is still the primary alerting tool
• Rather than moving forward, they added another tool to manage
• Now they manage two stacks
• No real observability ever happened
• 12 months later, the Prometheus stack is unmaintained
7. This is NOT an isolated case
• I have encountered multiple similar cases
• The pattern:
• while true; do
• This tool stinks, let’s do this over again with a new tool.
• We implement exactly the same broken setup, but with a different tool.
• done
8. Where is the real observability?
• Often we have metrics
• But only for a week
• Often we have lost our long-term metrics
• Often we have logs
• But no derived metrics
• We are only alerting on those metrics
• We are not learning from our metrics
• We’ve regressed
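Deriving metrics from logs does not require a new platform. A minimal sketch in Python, assuming web access logs in a Common Log Format-style layout (the regex, the sample lines and `derive_status_counts` are hypothetical, for illustration only): count responses per status class and compute an error ratio you can alert on and keep as a long-term metric.

```python
import re
from collections import Counter

# Hypothetical access-log layout: client ident user [timestamp] "request" status size
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def derive_status_counts(lines):
    """Turn raw log lines into counts per status class (2xx, 4xx, 5xx, ...)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            # Count parse failures too: broken shipping or a changed format shows up here first.
            counts["unparsed"] += 1
            continue
        counts[match.group("status")[0] + "xx"] += 1
    return counts


sample = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /health HTTP/1.1" 200 2',
    '10.0.0.2 - - [10/Oct/2023:13:55:37 +0000] "GET /api/orders HTTP/1.1" 500 128',
    '10.0.0.3 - - [10/Oct/2023:13:55:38 +0000] "POST /api/orders HTTP/1.1" 201 64',
    'garbage line',
]

counts = derive_status_counts(sample)
parsed = sum(v for k, v in counts.items() if k != "unparsed")
error_ratio = counts["5xx"] / parsed if parsed else 0.0
print(dict(counts), round(error_ratio, 3))
```

In a real pipeline these counts would be exported as a counter by your log shipper or an exporter rather than printed; counting unparsed lines is a deliberate choice, since a jump there is often the first sign that log shipping broke.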
9. What’s your goal in observability?
• We expect performance problems
• We really have performance problems
• We have chaos and want better insight into what we run
• Gartner told us to.
• We need more Hipster Credits
• We just want Prometheus, Loki and Tempo
10. First Steps
• Fix your monitoring
• Create a single source of truth
• No manual monitoring configuration (automation)
• Create clear and actionable alerts
• Keep it GREEN
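One way to get to "no manual monitoring configuration" is to generate monitoring targets from the single source of truth. A minimal sketch, assuming Prometheus with `file_sd_configs` (which reads files holding a list of `{"targets": [...], "labels": {...}}` groups) and a hypothetical inventory exported from your CMDB; the hostnames, fields and the refuse-if-no-owner rule are illustrative assumptions.

```python
import json

# Hypothetical inventory: in practice this comes from the single source of truth
# that knows which services exist, who owns them and whether they are still live.
INVENTORY = [
    {"host": "db01.example.internal", "port": 9104, "service": "mysql", "owner": "team-data", "state": "live"},
    {"host": "web01.example.internal", "port": 9100, "service": "web", "owner": "team-shop", "state": "live"},
    {"host": "old01.example.internal", "port": 9100, "service": "legacy", "owner": None, "state": "decommissioned"},
]


def to_file_sd(inventory):
    """Render live inventory entries as Prometheus file_sd target groups, grouped by service and owner."""
    groups = {}
    for item in inventory:
        if item["state"] != "live":
            continue  # decommissioned hosts drop out of monitoring automatically
        if not item["owner"]:
            raise ValueError(f"{item['host']} has no owner; fix the source of truth first")
        key = (item["service"], item["owner"])
        groups.setdefault(key, []).append(f"{item['host']}:{item['port']}")
    return [
        {"targets": sorted(targets), "labels": {"service": service, "owner": owner}}
        for (service, owner), targets in sorted(groups.items())
    ]


target_groups = to_file_sd(INVENTORY)
print(json.dumps(target_groups, indent=2))
```

Regenerating this file whenever the inventory changes means a decommissioned service disappears from monitoring without anyone editing config by hand, and failing on a missing owner forces the ownership question the on-call team in the story never got answered.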
11. Fix your metrics / logs ...
• Fix your metrics
• I bet you have regressions in shipping your metrics
• I bet your log shipping is partially broken
• I bet you have broken dashboards
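A cheap guard against silently broken metric or log shipping is to alert on staleness itself. A minimal sketch, assuming you can obtain the Unix timestamp of the last sample or log line received per source (how you collect that depends on your stack and is left open here); `find_stale_sources` and the source names are hypothetical.

```python
import time


def find_stale_sources(last_seen, max_age_seconds, now=None):
    """Return the sources whose newest data is older than max_age_seconds, stalest first.

    last_seen maps a source name (e.g. "web01/logs") to the Unix timestamp
    of the most recent sample or log line received from it.
    """
    now = time.time() if now is None else now
    ages = {source: now - ts for source, ts in last_seen.items()}
    stale = [source for source, age in ages.items() if age > max_age_seconds]
    # Longest silence first, so the worst gap is on top.
    return sorted(stale, key=lambda source: ages[source], reverse=True)


# A fixed "now" keeps the example reproducible.
NOW = 1_700_000_000
last_seen = {
    "web01/metrics": NOW - 30,   # 30 s ago: healthy
    "web01/logs": NOW - 7200,    # nothing for two hours: shipping broken?
    "db01/metrics": NOW - 600,   # ten minutes of silence
    "db01/logs": NOW - 45,
}
stale = find_stale_sources(last_seen, max_age_seconds=300, now=NOW)
print(stale)
```

Feeding this result into an alert, rather than a dashboard nobody looks at, is what helps keep it green.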
12. Ask
• Who wants observability?
• Devs / Management / Ops ?
• What do they really want ?
• Get them in one room
• Ask them what is really hurting them
• Where do they need help?
• Listen
• This sounds trivial... yet after more than 10 years of DevOps, still...
13. What is still missing?
• Probably nothing
• This might be sufficient for your use cases.
• Unless it isn’t.
• You might need traces
14. The Tooling Ecosystem
• Choose an open source observability stack
• Beware of fauxpen source
• Build your automated observability infrastructure
• Monitor it
• Pick a project to start investigating
• Build dashboards together with your peers.
15. Will this fit my ecosystem?
• My proprietary vendor claims it works out of the box.
• But my developers say it doesn’t.
• Trust your devs ;)
17. Pitfalls of Observability
• You will DDoS yourselves
• PromQL queries over every MySQL parameter from the MySQL exporter
• Flood your disks, kill your LTS (long-term storage)
• Trace all the things
• You will DDoS yourselves
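If you do trace, sampling is the usual way to avoid flooding your own storage. A minimal sketch of head-based, trace-ID-hash sampling in Python: every service hashing the same trace ID with the same rate makes the same keep/drop decision, so kept traces stay complete. This illustrates the technique, not any tool's implementation; OpenTelemetry SDKs provide a trace-ID-ratio-based sampler built on a similar idea, which you would normally use instead. `keep_trace` and the 10% rate are assumptions.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether to keep a trace, given its ID and a rate in [0, 1]."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Keep 53 bits of the hash so the division is exact and bucket lies in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < sample_rate


# Simulate 10,000 traces with W3C-style 32-hex-character trace IDs.
trace_ids = [f"{i:032x}" for i in range(10_000)]
kept = [tid for tid in trace_ids if keep_trace(tid, 0.10)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID and the rate, services configured with the same rate keep or drop the same traces, so you cut volume without ending up with half-recorded traces.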
18. Remember
• You might not need Observability (yet)
• But you DO need to fix your monitoring
• And then you can think about o11y
• But just adopting o11y will not fix your broken culture.
19. Kris Buytaert
• I used to be a developer
• Then I became an Ops person
• Chief Trolling/Travel/Technical Officer @ Inuits.eu
• Chief Yak Shaver @ o11y.eu
• Organiser of #devopsdays, #cfgmgmtcamp, #loadays, ...
• Cofounder of all of the above
• Everything is a Freaking DNS Problem
• DNS: DevOps Needs Sushi
• @krisbuytaert on Twitter/GitHub
20. Kris Buytaert @krisbuytaert kris@inuits.eu
o11y, a subdivision of Inuits
Essensteenweg 31 2930 Brasschaat Belgium
info@o11y.eu