Talk by Lead Icinga 2 Developer Michael Friedrich at the Icinga meetup on 22nd of August at OÖ Gesundheitsholding, Goethestraße 89, 4020 Linz - https://www.ooeg.at/
4. Responsibilities Contact Personal
Icinga 2 Lead Developer
Community Manager
Vagrant Boxes
michael.friedrich@icinga.com
@dnsmichi on Twitter
A taste of Austria
#drageekeksi
#lego & #perryrhodan
Michael Friedrich
Chief Evangelist
7. Icinga 2 Core
Scalable infrastructure monitoring
icinga.com/docs/icinga2/
Combine high availability clusters with a
distributed setup, and you have a best
practice scenario for large and complex
environments.
Monitoring as code with dynamic
configurations.
8. Icinga Director
Our configuration and orchestration solution
icinga.com/docs/director/
The Director aims to be the favorite Icinga
config deployment tool.
Director is designed for those who want to
automate their configuration deployment
and those who want to grant their “point &
click” users easy access to the
configuration.
9. Elasticsearch
Keep in touch with all your logs all the time
icinga.com/docs/elasticsearch/
The Elasticsearch module for Icinga Web
2 gives you access to this data, embedded
in your Icinga Web 2 interface.
Custom filters allow you to limit the data
that should be displayed. You can give your
users access to certain data types without
revealing everything stored in
Elasticsearch.
Module for Elasticsearch
10. Graphite for Icinga
Quick access to your monitoring metrics
icinga.com/docs/graphite/
Add graphs from your Graphite metrics
backend directly into the host/service detail
view.
This module also provides a new menu
section with two general overviews for
hosts and services.
12. Icinga Module for vSphere®
Analyze your VMware vSphere® infrastructure
icinga.com/docs/vspheredb/
The easiest way to monitor a VMware
vSphere® environment. Configure a
connection to your vCenter® or ESXi™
host and you're ready to go.
This module provides a lot of context, deep
insight and great oversight. Fast drill-down
possibilities, valuable hints and reports.
27. Running projects
Next to feature & bugfix releases for Icinga 2 and Icinga Web 2
01 Reporting
• User feedback from early adopter releases
• PDF templates – our trainee project
02 Icinga DB
• Core: Writer feature to Redis
• IcingaDB: Daemon which syncs Redis & DB
• Web: New Monitoring module
03 Integrations
• AWS Director Import
• Graphite
• Icingabeat
28. Future projects
To be defined in our strategy workshop – more at OSMC
01 Core | 02 Web | 03 Integrations
• Performance: Embedded plugins
• DSL: Formatting
• Logging capabilities
• Metrics – plugin API
• Reporting based on IcingaDB
• Cloud modules
• Director packages & core feature
• Plugins: Windows
• Graphite, InfluxDB fields and tags
• Notifications, Events & Incidents
30. Network Stack
Rewrite core parts: The story.
https://github.com/Icinga/icinga2/issues/7041
01 Boost
Boost 1.66+ allows the usage of additional libraries for socket/network I/O, thread pools and HTTP server/clients.
Package Boost on platforms which don’t have this in EPEL/Backports.
Status: Done
02 I/O Engine
Replace the current TLS socket I/O implementation and its custom event handling (poll, epoll) with Boost ASIO.
Use IoBoundWork and CpuBoundWork thread pools.
Status: Done
03 HTTP API
Replace custom HTTP handling with Boost ASIO & Boost Beast.
Use Beast buffers, HTTP verbs and more, so that mistakes surface as compile-time errors rather than at runtime.
Replace HTTP clients (InfluxDB, Elasticsearch, CLI commands, check_nscp_api) with the Boost implementation.
Status: Done
31. Icinga 2.11
More goodness
01 HA & Failover
• Feature HA: https://github.com/Icinga/icinga2/issues/2941
• Elasticsearch, Graphite, InfluxDB, etc.
• Failover in HA zones
• Object authority update every 10s (was 30s)
• DB IDO failover_timeout 30s (was 60s)
• More logging
• Status: Done
32. Icinga 2
More goodness
02 Cluster Config
• Story: https://github.com/Icinga/icinga2/issues/6716
• Coming from #10000 😜
• Tackle existing problems
• Staged sync, no broken config after restart
• Don't include deleted zones on startup
• Deal with race conditions on sync
• Status: Done
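The "staged sync" bullet can be illustrated with a small sketch: write the incoming config into a staging directory, validate it there, and only then swap it into place, so a failed validation never touches the production config. This is not Icinga 2's implementation; `apply_staged`, the `validate` callback and the directory layout are hypothetical names chosen for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict


def apply_staged(production: Path, files: Dict[str, str],
                 validate: Callable[[Path], bool]) -> bool:
    """Write files to a staging directory; promote it only if validation passes."""
    # Stage next to the production directory so the final rename stays on one filesystem.
    stage = Path(tempfile.mkdtemp(prefix="stage-", dir=production.parent))
    try:
        for name, content in files.items():
            (stage / name).write_text(content, encoding="utf-8")
        if not validate(stage):
            return False  # production directory stays untouched
        backup = production.with_name(production.name + ".old")
        shutil.rmtree(backup, ignore_errors=True)
        if production.exists():
            os.replace(production, backup)  # keep the previous config until the swap is done
        os.replace(stage, production)
        shutil.rmtree(backup, ignore_errors=True)
        return True
    finally:
        # After a successful swap the staging path no longer exists; otherwise clean it up.
        if stage.exists():
            shutil.rmtree(stage)
```

A rejected stage leaves the old files in place, which is the "no broken config after restart" property the slide names.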
33. Icinga 2.11
Runtime Objects in API config packages
03 Runtime Objects
• Story: https://github.com/Icinga/icinga2/issues/7119
• Runtime objects (downtimes, etc.) are missing after restart (broken config package).
• Uses _api package internally
• Active-stage is read from disk every time
• Race condition: can be empty
• Incomplete object file path on disk
• Repair broken active stage (timer)
• Logs & troubleshooting docs
• Status: Done (since Friday)
34. Icinga 2.11
Fixes, crashes, and code quality – all done
Crashes | Bugs | Quality
• Permission filters API crashes #6874 (ref/NC)
• Logrotate timer crash #6737
• Replay log not cleared #6932
• Windows agent 100% CPU/logging #3029
• JSON library: YAJL -> Nlohmann #6684
• UTF8 sanitizing #4703
• Boost Filesystem for I/O #7102
• Boost Asio Thread Pool (checks, etc.) #6988
Done
35. Icinga 2.11
Status in CW 30 – RC Week
Test | Fix | Profit
• Customer issues
  • Recovery notifications missing on restart (HA paused problem)
  • Problem notification after downtime ends
  • Killed processes on reload, KillMode=mixed
• API
  • TLS v1.2+ & hardened cipher lists
• Bugfixes
  • Cluster staging checksums
  • Unit tests unstable
Done
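To make the "TLS v1.2+ & hardened cipher lists" item concrete, here is a minimal sketch using Python's standard `ssl` module. It only illustrates the general idea of pinning a minimum protocol version and restricting cipher suites; the cipher string is an assumption for illustration and is not the list Icinga 2 ships.

```python
import ssl


def hardened_server_context(cipher_list: str = "ECDHE+AESGCM:DHE+AESGCM") -> ssl.SSLContext:
    """Build a server-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # OpenSSL cipher string (illustrative): AES-GCM suites with ECDHE or DHE key exchange.
    # This call restricts TLS 1.2 suites; TLS 1.3 suites are configured separately by OpenSSL.
    ctx.set_ciphers(cipher_list)
    return ctx


ctx = hardened_server_context()
enabled = [cipher["name"] for cipher in ctx.get_ciphers()]
```

Listing `enabled` is a quick way to see which suites a peer could actually negotiate, which helps when older agents fail the handshake.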
36. Icinga 2.11
Status in CW 30 – RC Week
Last minute fixes
• Reload handling broken
• Systemd kills process groups after reload/stop
• CW28 decision: PoC and rewrite
• Umbrella process managing main+helper
• Bonus: Run in Docker w/o magic tricks
• https://icinga.com/docs/icinga2/snapshot/doc/19-technical-concepts/#core-reload-handling
Done
37. Icinga 2.11
Status in CW 30 – RC Week
Docs = qa--
• Docs: https://icinga.com/docs/icinga2/snapshot/
• Service Monitoring & Plugin API (our version!)
• Distributed: s/client/agent/ + images
• Basics: s/custom attribute/custom variable/
• Command Arguments
• Development docs for trainees
• Upgrading: https://icinga.com/docs/icinga2/snapshot/doc/16-upgrading-icinga-2/
Done
38. 2.11 RC Feedback
https://github.com/Icinga/icinga2/issues/7380
01 Ciphers
Add ciphers for non-ECDH support (el7, Windows 2.10, Debian/Ubuntu). We cannot patch older agents immediately. Added detailed troubleshooting docs.
02 Cluster sync
Binary sync is NOT supported. Detect and prevent this on the master with UTF8 sanitizing. New checksums for config change detection would result in an “always change loop” otherwise.
03 Reload process
Fix logging for systemd errors, now prints config errors again.
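The cluster sync point (reject binaries via a UTF-8 check, then detect changes by checksum) can be sketched as follows. This is an illustration only: the function names are hypothetical, and SHA-256 over sorted paths is an assumption, not a description of how Icinga 2 computes its checksums.

```python
import hashlib
from typing import Dict, Optional


def is_utf8_text(data: bytes) -> bool:
    """Treat content as syncable only if it decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def config_checksum(files: Dict[str, bytes]) -> Optional[str]:
    """Return one checksum over all files, or None if any file looks binary."""
    digest = hashlib.sha256()
    for path in sorted(files):  # sorted, so the result does not depend on dict order
        content = files[path]
        if not is_utf8_text(content):
            return None  # refuse to sync instead of producing an unstable checksum
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(content + b"\0")
    return digest.hexdigest()
```

Two nodes holding the same text files get the same checksum and skip the sync; a file that fails the UTF-8 check is rejected up front rather than triggering a change on every comparison.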
41. Downtime Cluster Loop
https://github.com/Icinga/icinga2/issues/7198#issuecomment-521253984
01 Analysis
It is not related to the object version but to object activation/deactivation in HA-enabled cluster zones. Affects all config object create/delete operations.
02 Fix
Whenever config::UpdateObject and config::DeleteObject messages are sent, make sure to pass the “origin” handler to the config object creation/deletion. This ensures that ConfigObject->SetActive() or OnActiveChanged doesn’t start a “return to sender” with the cluster message.
03 Tests
Stressed the HA master with a long delay of messages (replay log and live).
For a downtime which expires during a reload, ensure that the secondary master processes CREATE/DELETE after the first has finally deleted the object.
All tests prove that the fix works. Added to 2.11.
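The "return to sender" problem and the role of the origin can be shown with a toy model. This is a deliberately simplified sketch, not Icinga 2 code: the `Node` class and its methods are hypothetical, and real cluster messages travel over the network rather than via direct method calls.

```python
from typing import List, Optional, Set


class Node:
    """Toy cluster endpoint that relays object creation to its peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["Node"] = []
        self.objects: Set[str] = set()
        self.sent = 0  # number of cluster messages this node sent

    def create_object(self, obj: str, origin: Optional["Node"] = None) -> None:
        if obj in self.objects:
            return
        self.objects.add(obj)
        self.on_active_changed(obj, origin)

    def on_active_changed(self, obj: str, origin: Optional["Node"]) -> None:
        for peer in self.peers:
            if peer is origin:
                continue  # the sender already knows: no "return to sender"
            self.sent += 1
            peer.create_object(obj, origin=self)


master1, master2 = Node("master1"), Node("master2")
master1.peers.append(master2)
master2.peers.append(master1)
master1.create_object("downtime-1")
```

Because `master2` receives the object together with its origin, it activates the object locally without echoing the message back to `master1`.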
42. Performance
Max concurrent checks
01 Concurrent Checks
• Fork errors with “too many open files”
• Raise number of open files (systemd, Icinga)
• Main process has a pipe stream for the child process output
• https://github.com/Icinga/icinga2/issues/7425
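Every running check keeps a pipe open in the main process, so the open-files limit caps how many checks can run at once. The sketch below shows how a process can inspect and raise its own soft limit with Python's `resource` module (Unix only). The helper name is hypothetical; for a service started by systemd, the limit is normally raised in the unit via `LimitNOFILE=` instead.

```python
import resource


def raise_open_files_limit(wanted: int) -> int:
    """Raise the soft RLIMIT_NOFILE towards `wanted` and return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process can only raise the soft limit up to the hard limit.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


current_soft, current_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
```

Comparing `current_soft` with the expected number of concurrent checks (plus sockets and log files) gives a first hint whether "too many open files" is to be expected.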
43. Performance
Max concurrent checks
02 Spawn Helper
• Process Spawn Helper creates child process
• Waits for events
• 4 IO threads, 1 process
• More IO threads and processes
• More context switches
• No real performance gain
44. Performance
Max concurrent checks
03 Ideas
• Process class with Fibers & Coroutines
• Less thread context switches
• Combined with ASIO
• PoC in the works
• Embedded Perl
• Subroutines, caching
• Experimental tests
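The coroutine idea can be illustrated outside of C++ as well. The sketch below uses Python's `asyncio` rather than Boost ASIO, so it is an analogy and not the PoC mentioned above: a single thread waits on many child processes, and a semaphore caps how many run at once. The function names and the `max_concurrent_checks` parameter are hypothetical.

```python
import asyncio
import sys
from typing import List, Sequence, Tuple


async def run_check(command: Sequence[str], limit: asyncio.Semaphore) -> Tuple[int, str]:
    """Run one plugin-like command and return (exit code, output)."""
    async with limit:  # wait here while too many checks are already running
        proc = await asyncio.create_subprocess_exec(
            *command,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        output, _ = await proc.communicate()
        return proc.returncode, output.decode("utf-8", errors="replace").strip()


async def run_checks(commands: Sequence[Sequence[str]],
                     max_concurrent_checks: int) -> List[Tuple[int, str]]:
    limit = asyncio.Semaphore(max_concurrent_checks)
    return list(await asyncio.gather(*(run_check(c, limit) for c in commands)))


def demo() -> List[Tuple[int, str]]:
    # Stand-in for a real check plugin: a tiny Python one-liner.
    plugin = [sys.executable, "-c", "print('OK - demo check')"]
    return asyncio.run(run_checks([plugin] * 4, max_concurrent_checks=2))
```

While a child process runs, its coroutine is merely suspended, so no extra thread sits blocked on the pipe.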
45. 2.11 Metrics
1061 Commits, 17 Contributors, +43450 / -27330
https://github.com/Icinga/icinga2/compare/support%2F2.10...master?diff=split#files_bucket
Sep 2018: Start cluster config sync implementation.
Oct 2018: Feature HA.
Feb 2019: Network Stack PoC by Alexander Klimov
Mar 2019: 2.10.4
Apr 2019: Boost packages by Markus Frosch (includes infra move to GitLab)
Apr 2019: Windows wizard improvements by Michael Insel
Apr 2019: Ongoing Boost ASIO in features, CLI commands, testing
May 2019: Reload deactivates IDO hosts -> requested 2.10.5
May 2019: Merge fixes for broken _api package
May 2019: 2.10.5
Jun 2019: TLS 1.2 & cipher lists
Jun 2019: Finish and merge cluster config sync
Jul 2019: Rewrite failing unit tests for TPs
Jul 2019: Re-send suppressed notifications in HA clusters
Jul 2019: Reload would kill plugin process with systemd, last minute fixes
Jul 2019: Renaming the docs: client->agent, custom attrs->vars
Jul 2019: 2.11.0 RC1
Aug 2019: TLS ciphers for older agents
Aug 2019: Refresh Windows agent for 2.11
Aug 2019: Deny syncing binaries with the cluster config sync
Aug 2019: Fix logs with systemd
Aug 2019: Fix cluster downtime loop
Aug 2019: Analyse check performance with max concurrent checks