John Murphy's presentation on well-designed Nagios configurations.
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
7. Contacts
User Definition
define contact {
    contact_name                    vu-jsmurphy
    contactgroups                   vg-team
    use                             read-contact
}

define contactgroup {
    contactgroup_name               vg-team
    alias                           Kmart Team
}

define contactgroup {
    contactgroup_name               cg-main
    alias                           Kmart Contact
    contactgroup_members            vg-team
}

define contact {
    name                            read-contact
    host_notifications_enabled      0
    service_notifications_enabled   0
    host_notification_period        none
    service_notification_period     none
    host_notification_options       n
    service_notification_options    n
    host_notification_commands      check_none
    service_notification_commands   check_none
    register                        0
}
8. Contacts
LDAP/AD For Nagios Core
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
    SetEnv TZ "Australia/Melbourne"
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Core"
    AuthType Basic
    # AuthUserFile /usr/local/nagios/etc/htpasswd.users
    # Require valid-user
    AuthBasicProvider ldap
    AuthName "Nagios server"
    AuthzLDAPAuthoritative off
    AuthLDAPBindDN "CN=bindAccount,OU=User,DC=domain,DC=com"
    AuthLDAPBindPassword xxxxxxxxx
    AuthLDAPURL ldaps://domain.com/OU=User,DC=Domain,DC=com?sAMAccountName?sub?(objectClass=user)
    AuthLDAPGroupAttribute member
    AuthLDAPGroupAttributeIsDN on
    Require ldap-group CN=NagiosAccessGroup,OU=Groups,DC=domain,DC=com
</Directory>
9. Contacts Summary
Distinguish between your users and your contacts.
Use an existing authentication source for your user logins.
Consider the end-user experience… try to ensure it’s easy to get the information they need.
11. Hosts
Focus on minimizing host configuration to make automation easier.
Use templates to assign user view information.
Create host groups based on shared monitoring profiles.
12. Hosts
Host Definitions
define host {
    host_name               exchange01
    use                     srv-template
    alias                   Exchange server
    address                 exchange01
    parents                 switch001,switch002
    hostgroups              srv-exchange,srv-windows
    icon_image              exchange.png
    register                1
}

define hostgroup {
    hostgroup_name          srv-windows
    alias                   Windows group
}

define host {
    name                    srv-template
    alias                   Server host template
    check_command           check_icmp!250.0,60%!500.0,80%
    max_check_attempts      3
    check_interval          10
    retry_interval          2
    check_period            24x7
    contact_groups          cg-main
    notification_interval   60
    notification_period     24x7
    notification_options    d,f
    notifications_enabled   1
    register                0
}
13. Hosts Summary
Minimize configuration in host objects to make automation easier.
Hostnames allow for easier maintenance than IP addresses.
Create logical host groupings that will make service assignment easier, e.g. OS type, location, applications served.
15. Services
Keep services as generic as possible to prevent the need for duplicate services.
Minimizing service templates allows for easier management and baseline changes.
Use service groups for applications.
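As a minimal sketch of the service-group advice above, the services that make up one application could be tagged as shown here. The group name app-exchange, the service description, and the Windows service name MSExchangeIS are illustrative assumptions rather than part of the presentation, and check_nt is assumed to be defined as in the sample Nagios configuration.

```
# Hypothetical service group collecting everything that belongs to one application.
define servicegroup {
    servicegroup_name       app-exchange
    alias                   Exchange application services
}

# Hypothetical service; it joins the group through the servicegroups directive.
define service {
    service_description     Exchange information store
    use                     main-service-template
    hostgroup_name          srv-exchange
    check_command           check_nt!SERVICESTATE!-l MSExchangeIS
    servicegroups           app-exchange
    contact_groups          cg-main
    register                1
}
```

Tagging the service rather than listing members in the group keeps the group definition stable as services come and go.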
16. Services
Service Definitions
define service {
    service_description     Windows C: usage
    use                     main-service-template
    hostgroup_name          srv-windows,srv-v-windows
    check_command           check_nt!USEDDISKSPACE!-w 80 -c 90
    contact_groups          cg-main,cg-main-SMS
    register                1
}

define service {
    name                    main-service-template
    service_description     main service template
    max_check_attempts      3
    check_interval          10
    retry_interval          2
    check_period            24x7
    notification_interval   60
    notification_period     24x7
    notification_options    c
    register                0
}
18. Services Summary
Strike a balance between your service templates and your service definitions.
Service groups are a very useful feature when used appropriately; used inappropriately, they are an administrative burden.
Device life-cycle happens; ensure your configuration isn’t burdened by over-complexity.
20. Good Parenting (or how to not get woken up 20 times at ~3am)
Parenting
Use host parenting. Use host parenting. Use host parenting.
Service Dependencies
Parent indirectly monitored services with service dependencies.
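A minimal sketch of host parenting, assuming a hypothetical upstream device named router001 (not in the presentation) and reusing srv-template purely for brevity; a real setup would more likely give network devices their own template.

```
# Hypothetical core device at the top of the chain.
define host {
    host_name       router001
    use             srv-template
    alias           Core router
    address         router001
    register        1
}

# The switch sits behind the router, so the router is its parent.
define host {
    host_name       switch001
    use             srv-template
    alias           Access switch
    address         switch001
    parents         router001
    register        1
}
```

With parents defined, a host whose parents are all down or unreachable is reported as UNREACHABLE rather than DOWN. Because srv-template sets notification_options to d,f, unreachable hosts send no notifications, so only the device that actually failed should page anyone.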
21. Indirect Services
…And the art of dependencies
A typical ESX monitoring setup…
Q. But what happens when the vSphere server fails?
22. Indirect Services
…And the art of dependencies
A. Something like this
23. Indirect Services
…And the art of dependencies
define service {
    host_name                       vSphereServer
    service_description             Ping dependency
    use                             main-service-template
    check_command                   check_ping!100,80%!200,90%
    register                        1
}

define service {
    service_description             CPU Usage
    use                             main-service-template
    hostgroup_name                  srv-v-windows
    check_command                   check_esx!CPU
    contact_groups                  cg-main
    register                        1
}

define servicedependency {
    dependent_hostgroup_name        srv-v-windows
    dependent_service_description   CPU Usage
    host_name                       vSphereServer
    service_description             Ping dependency
    inherits_parent                 1
    execution_failure_criteria      w,u,c,p
    notification_failure_criteria   w,u,c
    dependency_period               24x7
}
24. Managing Exceptions
Clearly label exceptions in your config.
Make sure you can use the same solution again if necessary.
Image by Mike Bade:
http://robotseatingpies.blogspot.com.au/2011/06/robots-dont-have-feelings_16.html
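One way a labeled, repeatable exception might look, as a sketch only. The "EXCEPTION" comment convention, the reason given, and the alternative thresholds are hypothetical. It also assumes that a `!` exclusion in host_name removes the host from the hostgroup expansion in the same definition, which is worth verifying against your Nagios version.

```
# Baseline: applies to every Windows server except the labeled exception.
define service {
    service_description     Windows C: usage
    use                     main-service-template
    hostgroup_name          srv-windows,srv-v-windows
    host_name               !exchange01
    check_command           check_nt!USEDDISKSPACE!-w 80 -c 90
    contact_groups          cg-main,cg-main-SMS
    register                1
}

# EXCEPTION (hypothetical): exchange01 needs higher disk thresholds.
# Same service description and template; only the thresholds differ.
define service {
    service_description     Windows C: usage
    use                     main-service-template
    host_name               exchange01
    check_command           check_nt!USEDDISKSPACE!-w 90 -c 95
    contact_groups          cg-main,cg-main-SMS
    register                1
}
```

Keeping the exception next to the baseline, with a searchable label, makes it easy to find and to copy for the next host that needs the same treatment.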
25. Automation (or intrapreneurship ideas for the lazy)
Every piece of infrastructure is a potential data source… make use of it!
AD/LDAP Servers.
Virtual infrastructure APIs.
Patching systems.
Asset databases.
Network management platforms.
Network LLDP/CDP tables.
SNMP enabled servers.
Help I’m running out of space!
I work for Kmart Australia as a server engineer, etc. Philosophical discussion: no wrong/right, but sometimes… unique ways of handling problems.
Basics: Core triumvirate of objects. Pretend web developers (user experience first (contact/user distinction), making stuff work second (hosts/services)). Services: Object scaling and host/service relationship. Advanced: A brief look at more advanced topics. Parents and dependencies: How to not get spammed. Exceptions: How to deal with non-uniform requests. Automation: Using network resources to automate.
Separate user and contact objects, so you can provide UAC and handle complex assignments in a manageable fashion. A contact is an object used to notify a person or a team of a problem. One-to-one relationship of contact to contact-group. A user is a human, a real person… a dummy account to match a Nagios login for access control. Many-to-one relationship of user to user-group and user-group to contact-group. AD, LDAP, database, etc. integration.
Minimize contact template objects; this reduces future work. A directive defined on the contact overrides the template's value. Not going to touch on contact escalation.
User objects are basically a contact definition that will receive nothing. Separate users into specific view groups, attach this when a user needs to see something… add the contact group if he needs to be contacted.
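For contrast with the silent user objects, a contact that actually notifies might look like the sketch below. The contact name, alias, and email address are placeholders, the notification options are one plausible choice, and the two notification commands are assumed to exist as in the sample Nagios commands.cfg.

```
# Hypothetical notifying contact: one mailbox per team, attached to cg-main.
define contact {
    contact_name                    ct-team-email
    alias                           Kmart Team mailbox
    email                           team@example.com
    contactgroups                   cg-main
    host_notifications_enabled      1
    service_notifications_enabled   1
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       d,u,r
    service_notification_options    w,u,c,r
    host_notification_commands      notify-host-by-email
    service_notification_commands   notify-service-by-email
}
```

User objects such as vu-jsmurphy still receive nothing themselves. They gain visibility through vg-team, while the actual alerts go to the contact object.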
This is for Nagios Core; XI already has an AD login component. When a logon occurs, Nagios will match the Apache HTTP user with a contact when possible. Use this to assign a user a “world view”.
Logical configuration groupings (base groupings on OS, Location, Application). Minimize configuration in host for automation purposes and move as much as possible to templates. Assigning views to user objects.
Use hostnames whenever possible. Regex matching can be used to place hosts in host groups.
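A hedged sketch of regex-based group membership. It assumes use_regexp_matching is enabled in the main configuration file and that regex expansion applies to the hostgroup members directive in your Nagios version. The pattern itself is hypothetical, and it relies on a consistent naming convention.

```
# nagios.cfg (main configuration file): enable regex matching in object definitions.
use_regexp_matching=1

# Object configuration: every host named exchange01, exchange02, ... joins the group,
# so the individual host definitions no longer need to list it.
define hostgroup {
    hostgroup_name  srv-exchange
    alias           Exchange group
    members         exchange[0-9]+
}
```

If this works in your environment, new hosts that follow the convention pick up the group's services without any change to the group definition.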
One-to-many relationship of services to host groups. Use service groups for applications; minimize service templates.
Usually you can find a check period that will work for 95% of your checks. Unlike hosts, do not add contacts at the template… add them instead at the actual service definition. Services change contact frequently, hosts do not.
Last pieces of the puzzle/complete picture. Arrow directions dictate which object references which other object.
Jim trips on a network cable, causing Europe to fail, and email gets spammed. Ensure parents are defined and use multi-tenancy. Ensure service dependencies are defined when one piece of infrastructure relies on another. Indirectly monitored services = e.g. CPU usage on VMware infrastructure via the VMware API.
A happy ESX environment with vSphere and working monitoring.
A sad ESX environment when vSphere fails and those services stop working.
You can use hostgroups to do broad strokes with service-dependencies.
Despite perfect design, someone is going to “kick your sand castle”. Ensure that exceptions are properly labeled. Ensure that each exception is reusable, so that future exceptions are handled consistently.
Importance of naming conventions. Use AD to get computer accounts. Use virtualization APIs to get virtual infrastructure. Patching systems or resource databases. Use SNMP to get network tables/device type and/or LLDP/CDP tables to walk networks. Network management systems (e.g. CiscoWorks, NSM, etc.).