2. Have you been stalking your servers?
Marji Cermak
Sysadmin & DevOps Engineer at Morpht
marji@morpht.com
@cermakm
3. The rule of 3 things
picture: http://www.flickr.com/photos/helenaperezgarcia/5692392667/
4. The rule of 3 things
1. What monitoring is and why you want to monitor
2. Some monitoring tools available to you
3. It is easy to start monitoring
7. Monitoring
"Monitoring is an intermittent (regular or irregular) series of observations in time, carried out to show the extent of compliance with a formulated standard or degree of deviation from an expected norm."
J. M. Hellawell (1991), modified by A. Brown (2000), http://jncc.defra.gov.uk/page-2268
(a definition from the field of nature conservation)
10. Why you need to monitor
● to learn about bad news before your customers (or your boss) do
● to scale up your servers in advance
● to tune your app
11. Why you need to monitor (cont.)
● to prove your uptime of 99.999% :)
12. The fun of the nines
Source: http://en.wikipedia.org/wiki/High_availability
Nines: http://en.wikipedia.org/wiki/List_of_unusual_units_of_measurement#Nines
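To make the nines concrete, here is the yearly downtime budget they allow, assuming a 365.25-day year (so one year is 525 960 minutes) and availability A expressed as a fraction:

```latex
\begin{aligned}
D &= (1 - A) \times 365.25 \times 24 \times 60\ \text{min} = (1 - A) \times 525\,960\ \text{min} \\
A = 0.999 &\;\Rightarrow\; D = 10^{-3} \times 525\,960\ \text{min} \approx 526\ \text{min} \approx 8.77\ \text{h} \\
A = 0.99999 &\;\Rightarrow\; D = 10^{-5} \times 525\,960\ \text{min} \approx 5.26\ \text{min}
\end{aligned}
```

Five nines leaves barely five minutes of outage per year, which is why you need monitoring just to know whether you met it.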
14. Why you need to monitor (cont.)
● to prove your uptime of 99.999% :)
● to minimise downtime (downtime is expensive)
● to capture customer information
15. Why you need to monitor (cont.)
● to have data and metrics for diagnosing problems
33. Meet Nagios, Munin and others
● Nagios
● Munin
● APC dashboard
● related Drupal modules
35. Nagios /ˈnɑːɡiːoʊs/
● system, network and infrastructure monitoring software application
● monitors and alerts
● many plugins
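Nagios checks are small external programs ("plugins") that print one line of status text and report the result through their exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. Text after a "|" is optional performance data. The sketch below is a hypothetical disk-usage check written as a shell function; the name `check_disk_used` and its default thresholds are illustrative assumptions (the official plugin set already ships a `check_disk`), and it assumes a POSIX `df -P`.

```shell
#!/bin/sh
# Hypothetical Nagios-style plugin: warns when a filesystem is too full.
# Nagios reads the first line of output and the exit (here: return) code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_disk_used() {
  mount_point=${1:-/}   # filesystem to check (default: /)
  warn=${2:-80}         # WARNING at or above this percentage used
  crit=${3:-90}         # CRITICAL at or above this percentage used

  # With df -P, line 2, column 5 is the "Use%" value, e.g. "42%".
  used=$(df -P "$mount_point" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

  # Anything that is not a plain number means we could not read the value.
  case $used in
    '' | *[!0-9]*)
      echo "DISK UNKNOWN - cannot read usage of $mount_point"
      return 3
      ;;
  esac

  # Text after "|" is performance data: label=value;warn;crit
  perf="used=${used}%;${warn};${crit}"
  if [ "$used" -ge "$crit" ]; then
    echo "DISK CRITICAL - ${used}% used on $mount_point | $perf"
    return 2
  elif [ "$used" -ge "$warn" ]; then
    echo "DISK WARNING - ${used}% used on $mount_point | $perf"
    return 1
  fi
  echo "DISK OK - ${used}% used on $mount_point | $perf"
  return 0
}
```

To use it as a real plugin you would save it as an executable script ending with `check_disk_used "$@"; exit $?` and reference it from a Nagios command definition.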
36. Nagios /ˈnɑːɡiːoʊs/
Name and Pronunciation:
● originally called NetSaint; "Nagios" is a recursive acronym: "Nagios Ain't Gonna Insist On Sainthood"
● "agios" is a transliteration of the Greek word άγιος (saint)
37. Nagios /ˈnɑːɡiːoʊs/
● alerts by email/pager/IM...
● alerts to different contacts
● notification escalation
● service / host dependencies
● soft / hard states
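Soft and hard states are easiest to see with a toy model. When a check starts failing, Nagios re-checks it up to `max_check_attempts` times; until that count is reached the problem is in a SOFT state, and notifications go out only once it becomes HARD. The function below is a simplified sketch of that rule only (the name `classify_states` is hypothetical, and real Nagios also handles retry intervals, flapping and soft recoveries).

```shell
#!/bin/sh
# Simplified model (not Nagios source code) of SOFT vs HARD states.
# First argument: max_check_attempts. Remaining arguments: check results
# in time order, each OK, WARNING or CRITICAL.
classify_states() {
  max_check_attempts=$1
  shift
  attempt=0
  for result in "$@"; do
    if [ "$result" = "OK" ]; then
      attempt=0                # a good result resets the attempt counter
      echo "OK"
    else
      attempt=$((attempt + 1))
      if [ "$attempt" -ge "$max_check_attempts" ]; then
        echo "$result HARD"    # notifications are sent for hard states
      else
        echo "$result SOFT (attempt $attempt/$max_check_attempts)"
      fi
    fi
  done
}

classify_states 3 OK CRITICAL CRITICAL CRITICAL OK
```

With three attempts, the first two failures stay SOFT and only the third becomes HARD, so a single blip does not wake anyone up.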
42. Munin
● master / node architecture
● the master connects to all nodes at regular intervals (every 5 minutes by default)
● it stores the data with RRDtool (round-robin database tool, which handles time-series data)
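On each node, munin-node answers the master by running plugins: small executables that print `graph_*` and `<field>.label` lines when called with `config`, and `<field>.value` lines otherwise. A minimal sketch, assuming a Linux node (it reads `/proc/loadavg`); the plugin name `load_sketch` is hypothetical, and Munin already ships its own `load` plugin.

```shell
#!/bin/sh
# Minimal Munin plugin sketch (hypothetical name "load_sketch").
# "config" describes the graph; no argument prints the current value.
load_sketch() {
  case "$1" in
    config)
      echo "graph_title Load average (sketch)"
      echo "graph_vlabel load"
      echo "graph_category system"
      echo "load.label load"
      return 0
      ;;
  esac
  # The first field of /proc/loadavg (Linux only) is the 1-minute load.
  echo "load.value $(cut -d ' ' -f 1 /proc/loadavg)"
}

load_sketch config
load_sketch
```

As a real plugin, the file would end with `load_sketch "$@"`, be linked into `/etc/munin/plugins/`, checked with `munin-run load_sketch`, and picked up after restarting munin-node.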
48. Nagios & Munin
● they complement each other
● Nagios normally alerts on a single “service”
● Munin can be used to correlate different metrics
50. APC - what is it?
The Alternative PHP Cache (APC) is a free
and open opcode cache for PHP.
Its goal is to provide a free, open, and robust framework for caching and optimising PHP intermediate code.
It runs inside your web server (it is not a web cache).
57. How to install these tools?
Munin
sudo apt-get install munin munin-node
Nagios
sudo apt-get install nagios3
APC dashboard
the apc.php script from the php-apc package
58. How to configure these?
● It is a bit fiddly
● There are many guides targeting beginners
● You don’t want to do it again and again
59. puppet – a quick way to start
a system for automating system administration tasks
62. puppet – a quick way to start
● a declarative language for expressing system configuration,
● a client and server for distributing it,
● and a library for realising the configuration.
63. puppet – a quick way to start
package { 'munin-node': ensure => installed }
service { 'munin-node':
  ensure  => running,
  enable  => true,
  require => Package['munin-node'],
}
64. puppet – a quick way to start
1. clone the stalk-your-box repo
2. run puppet apply on the code
3. monitor!
65. A quick way to start
$ git clone git://github.com/morpht/stalk-your-box.git /tmp/stalk-your-box
Cloning into '/tmp/stalk-your-box'...
remote: Counting objects: 23, done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 23 (delta 1), reused 23 (delta 1)
Receiving objects: 100% (23/23), 11.35 KiB, done.
Resolving deltas: 100% (1/1), done.
66. A quick way to start
$ cd /tmp/stalk-your-box/
$ sudo puppet apply --modulepath=modules manifest.pp
notice: /Stage[main]/Nagios::Server/Package[nagios3]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Nagios::Server/File[/etc/nagios3/htpasswd.users]/ensure: created
notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: Adding password for user nagiosadmin
notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: executed successfully
notice: /Stage[main]/Munin::Node/Package[libcache-cache-perl]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Munin::Node/Package[munin-node]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Munin::Node/File[munin-node.conf]/content: content changed '{md5}e486786f866d7d7e025dea401c300e7b' to '{md5}dbf97a87a8da86ef68155815ecae3c1c'
notice: /Stage[main]/Munin::Server/Service[apache2]: Triggered 'refresh' from 1 events
notice: Finished catalog run in 44.26 seconds
70. Manifest.pp
# Execute apt-get update before any package is installed:
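exec { 'apt-update':
  command => 'apt-get update',
  # but don't execute it more than once a day:
  unless  => 'test $(find /var/cache/apt/pkgcache.bin -mtime 0 | wc -l) -eq 1',
  # exec needs a search path because the commands are not fully qualified:
  path    => ['/usr/bin', '/bin'],
}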
Exec['apt-update'] -> Package <| |>
# Include minimal apache2 installation. Munin server, nagios
# and APC dashboard depend on it.
include 'apache2'
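The `unless` guard in the `apt-update` exec relies on `find -mtime 0`, which matches files modified less than 24 hours ago: if the apt package cache was refreshed today, the test succeeds and Puppet skips `apt-get update`; if the cache is older or missing, the update runs. The sketch below reproduces that check on a temporary file so both outcomes are visible. `is_fresh` is a hypothetical helper name, and `touch -d` assumes GNU coreutils.

```shell
#!/bin/sh
# Same freshness test as the apt-update "unless" clause, for any file.
# find -mtime 0 matches files modified less than 24 hours ago.
is_fresh() {
  test "$(find "$1" -mtime 0 2>/dev/null | wc -l)" -eq 1
}

demo_file=$(mktemp)
is_fresh "$demo_file" && echo "fresh: Puppet would skip apt-get update"

touch -d '2 days ago' "$demo_file"   # GNU touch date syntax
is_fresh "$demo_file" || echo "stale: Puppet would run apt-get update"
rm -f "$demo_file"
```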
71. Manifest.pp
# Install munin node and munin server:
class { 'munin::node': }
class { 'munin::server':
  htuser => 'munin',       # Username for basic access auth.
  htpass => 'Prague2013',  # Password for basic access auth (change it).
}
# Install nagios:
class { 'nagios::server':
  contact_email => 'root@localhost',  # Email to send alerts to.
  htpass        => 'Prague2013',      # Password for the nagiosadmin username (change it).
}
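The demo manifest uses one hard-coded password for both web interfaces. A minimal sketch of generating a random value for the `htpass` parameters instead, assuming `head` and `base64` (coreutils) are available; `gen_password` is a hypothetical helper, and its output contains only letters, digits, `+` and `/`.

```shell
#!/bin/sh
# Hypothetical helper: print a random password for the htpass parameters.
# 12 random bytes encode to exactly 16 base64 characters, with no "=" padding.
gen_password() {
  head -c 12 /dev/urandom | base64
}

gen_password
```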
76. Questions
Here is the repository to get started with monitoring:
https://github.com/morpht/stalk-your-box
Marji Cermak
Sysadmin & DevOps Engineer at Morpht
marji@morpht.com
@cermakm
78. THANK YOU!
WHAT DID YOU THINK?
Locate this session at the
DrupalCon Prague website:
http://prague2013.drupal.org/schedule
Click the “Take the survey” link