2. Barking at daemons
A small open source utility to
monitor Unix systems with
automatic error recovery
capabilities.
3. What Monit can monitor
Files, Dirs and Filesystems
Monitor these items for changes,
such as timestamp changes,
checksum changes or size
changes.
Hosts
Monitor network connections to
various servers, either on
localhost or on remote hosts.
TCP, UDP and Unix Domain
Sockets are supported. Network
tests can be performed at the
protocol level.
System
General system resources on
localhost such as overall CPU
usage, Memory and Load
Average.
Processes
Daemon processes or similar
programs running on localhost,
such as those started at system
boot time from /etc/init.d/.
Programs and scripts
Run programs or scripts at
certain times, much like cron.
In addition, you can test the
exit value of a program and
perform an action or send an
alert if the exit value indicates an
error.
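The System category is the only one without an example later in this deck. A minimal sketch, assuming the `check system` syntax of recent Monit 5.x releases; the thresholds are illustrative values, not recommendations:

```
# $HOST is expanded by Monit to the local host name
check system $HOST
if loadavg (5min) > 3 then alert
if memory usage > 80% for 4 cycles then alert
if swap usage > 20% for 4 cycles then alert
if cpu usage (user) > 80% for 2 cycles then alert
```

Older releases may expect a slightly different spelling for the CPU test, so check the manual for the installed version.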
5. Configuration (i)
◉ Global configuration file at /etc/monitrc.
◉ Sample global configuration:
○ Check services at 30-second intervals:
set daemon 30
# with start delay 240 # optional: delay the first check by 4 minutes (by
# # default Monit checks immediately after it starts)
6. Configuration (ii)
◉ Set Monit’s logfile:
set logfile /var/log/monit.log
◉ Mail configuration:
set mailserver localhost
# By default Monit will drop alert events if no mail servers are available.
# If you want to keep the alerts for later delivery retry, you can use the
# EVENTQUEUE statement.
set eventqueue
basedir /var/monit # set the base directory where events will be stored
slots 100 # optionally limit the queue size
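A mail server setting only tells Monit how to deliver mail. Alerts are sent to recipients declared with an alert statement, either globally or per service. A minimal sketch of the global form, assuming placeholder addresses (both are hypothetical):

```
# Hypothetical recipient for all alerts
set alert sysadmin@example.com
# Hypothetical sender address used in alert mails
set mail-format { from: monit@example.com }
```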
8. Configuration (iv)
◉ HTTP interface:
set httpd port 2812 and
allow admin:monit # require user 'admin' with password 'monit'
◉ Additional configuration files:
include /etc/monit.d/*
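The HTTP interface shown above listens on every address. A more restrictive sketch, assuming you only need access from the machine itself (for example, for the monit command line tool, which in recent versions talks to the daemon through this interface):

```
set httpd port 2812 and
use address localhost # only accept connections on the loopback address
allow localhost # allow connections from localhost
allow admin:monit # still require user 'admin' with password 'monit'
```

The credentials are the same sample values used in this deck and should be changed on a real system.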
10. Basic commands (i)
Monit is controlled from the command line with the monit command:
◉ Start Monit daemon: $ monit
◉ Exit Monit: $ monit quit
◉ Status summary: $ monit summary
◉ Disable monitoring of a named service or all services:
$ monit unmonitor name
$ monit unmonitor all
◉ Enable monitoring:
$ monit monitor name
$ monit monitor all
11. Basic commands (ii)
◉ Start named service or all services:
$ monit start name
$ monit start all
◉ Stop named service or all services:
$ monit stop name
$ monit stop all
◉ Restart named service or all services:
$ monit restart name
$ monit restart all
14. Proactive process monitoring
check process tomcat-8 with pidfile /var/run/tomcat-8.pid
start program = "/etc/init.d/tomcat-8 start"
stop program = "/etc/init.d/tomcat-8 stop"
15. Restart process if it has stopped accepting
connections
check process tomcat-8 with pidfile /var/run/tomcat-8.pid
start program = "/etc/init.d/tomcat-8 start"
stop program = "/etc/init.d/tomcat-8 stop"
restart program = "/etc/init.d/tomcat-8 restart"
if failed port 8080 protocol http then restart
16. Restart process if it has stopped accepting
connections avoiding false positives
check process tomcat-8 with pidfile /var/run/tomcat-8.pid
start program = "/etc/init.d/tomcat-8 start"
stop program = "/etc/init.d/tomcat-8 stop"
restart program = "/etc/init.d/tomcat-8 restart"
if failed port 8080 protocol http for 2 cycles then restart
17. Check process response to requests
check process apache with pidfile /usr/local/apache/logs/httpd.pid
start program = "/etc/init.d/httpd start"
stop program = "/etc/init.d/httpd stop"
if failed host www.tildeslash.com port 80 protocol http
and request "/somefile.html"
then restart
if failed port 443 type tcpssl protocol http
with timeout 15 seconds
then restart
18. Avoid noisy alarms
check process apache with pidfile /usr/local/apache/logs/httpd.pid
start program = "/etc/init.d/httpd start"
stop program = "/etc/init.d/httpd stop"
if failed host www.tildeslash.com port 80 protocol http
and request "/somefile.html"
then restart
if failed port 443 type tcpssl protocol http
with timeout 15 seconds
then restart
if 3 restarts within 5 cycles then unmonitor
19. Check resources used by process (e.g. DoS attacks)
check process apache with pidfile /usr/local/apache/logs/httpd.pid
start program = "/etc/init.d/httpd start" with timeout 60 seconds
stop program = "/etc/init.d/httpd stop"
if cpu > 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 200.0 MB for 5 cycles then restart
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if failed host www.tildeslash.com port 80 protocol http
and request "/somefile.html"
then restart
if failed port 443 type tcpssl protocol http
with timeout 15 seconds
then restart
if 3 restarts within 5 cycles then unmonitor
20. Monitor filesystem space and inode usage
check filesystem datafs with path /dev/sdb1
start program = "/bin/mount /data"
stop program = "/bin/umount /data"
if space usage > 80% for 5 times within 15 cycles then alert
if space usage > 99% then stop
if inode usage > 30000 then alert
if inode usage > 99% then stop
21. Monitor file checksum (e.g. rootkits)
check file apache with path /usr/sbin/httpd
if failed checksum then alert
if failed uid root then alert
if failed gid root then alert
if failed permission 755 then alert
22. Monitor a directory that should change
check directory incoming with path /var/data/ftp
if timestamp > 1 hour then alert
23. Check network interface status
check network eth0 with interface eth0
start program = '/etc/init.d/net.eth0 start'
stop program = '/etc/init.d/net.eth0 stop'
if failed link then restart
24. Check network link capacity changes
check network eth0 with interface eth0
if changed link capacity then alert
25. Check network link usage (saturation,
bandwidth)
check network eth0 with interface eth0
if saturation > 90% then alert
if upload > 500 kB/s then alert
if total download > 1 GB in last 2 hours then alert
if total download > 10 GB in last day then alert
26. Check remote host availability by issuing a
ping test
check host osoco.es with address osoco.es
if failed ping then alert
27. Check the content of a response from a web
server
check host myserver with address 192.168.1.1
if failed port 80 protocol http
and request /some/path with content = "a string"
then alert
28. Check connection with custom protocol
(MySQL)
check host databaserver with address 192.168.1.1
if failed ping then alert
if failed
port 3306
protocol mysql username foo password bar
then alert
29. Check custom program status output
check program myscript with path /usr/local/bin/myscript.sh
if status != 0 then alert
30. Check custom program every workday at 8AM
check program checkOracleDatabase
with path /var/monit/programs/checkoracle.pl
every "* 8 * * 1-5"
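Note that the cron-style string defines a time window rather than a single instant: with an asterisk in the minutes field, the program is run on each Monit cycle that falls between 8:00 and 8:59 on workdays. The every statement also has two other forms. A minimal sketch, assuming hypothetical services named nginx and mysqld with illustrative pidfile paths:

```
# Check only on every second cycle
check process nginx with pidfile /var/run/nginx.pid
every 2 cycles

# Skip checks during a maintenance window (Sundays, 0:00 to 3:59)
check process mysqld with pidfile /var/run/mysqld.pid
not every "* 0-3 * * 0"
```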
31. Check service dependencies before
start/stop/monitor/unmonitor
check process apache
with pidfile "/usr/local/apache/logs/httpd.pid"
...
depends on httpd
check file httpd with path /usr/local/apache/bin/httpd
if failed checksum then unmonitor
32. Hierarchy of dependencies
check process apache
...
depends on tomcat
check process tomcat
...
depends on mysql
check process mysql
...
depends on datafs
check filesystem datafs with path /dev/sdb1
start program = "/bin/mount /data"
stop program = "/bin/umount /data"
35. One interface to rule them all
◉ M/Monit:
○ Monitoring and
management of all
your Monit hosts.
○ Also works on mobile
devices.
○ One-time payment with a
perpetual license.
36. One interface to rule them all
◉ Monittr:
○ https://github.com/karmi/monittr
○ Free and very basic option.
38. Thanks!
This work is licensed under a Creative Commons
Attribution 4.0 International License.
You can find me at
◉ @rafael_luque
◉ rafael.luque@osoco.es
Cover photo licensed by Edward Conte under a Creative Commons by-nc license:
https://www.flickr.com/photos/edwardconde/11447139646/