Zabbix is an open source monitoring solution that can monitor all elements of an IT infrastructure including operating systems, databases, applications and network devices. It collects metrics using agents and SNMP and analyzes the data to detect problems, generate alerts and trigger automatic actions. Zabbix uses triggers to define problems and can forecast and predict issues. It provides visualizations, reporting and centralized management for monitoring large, complex environments.
6. What’s wrong with the script?
• Hard to maintain and extend
• It did not scale well
• Provided no advanced problem detection
• Any change required script modifications
• Etc etc etc
19. What do we need to monitor?
OS metrics
• CPU, memory and network utilization
• Disk IO time
• Available disk space
Database metrics
• Configuration related: max connections, buffer sizes, sync mode
• Performance: QPS, query performance, cache hit rate, slow queries, buffer pool usage
• Availability: DB is up, connections, log files
• Consistency: DB encoding, replication
• Security: SSL enabled, opened ports, log files
20. Linux (OS) metrics: Zabbix Agent
shell> apt-get install zabbix-agent
or
shell> rpm -Uvh zabbix-agent
* For AIX, Solaris, HP-UX, *BSD and Windows: download pre-compiled agents from www.zabbix.com
21. Active vs Passive
Pull
• Service checks
• Passive agent
• SSH and Telnet
Push
• Active agent
• Zabbix Trapper and SNMP Traps
• Monitoring of log files
49. Performance: MySQL is overloaded
{MySQL_001:mysql.status[Questions].last()} > 5000
Availability: MySQL is not available
{MySQL_001:mysql.ping.last()} = 0
Junior level
53. Properly define problem conditions and think carefully!
MySQL is overloaded
MySQL is not available
Running out of disk space
What does each of these really mean?
54. Take advantage of history
MySQL is overloaded
{MySQL_001:mysql.status[Questions].min(10m)} > 5000
MySQL node is not available
{MySQL_001:mysql.ping.max(#3)} = 0
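The difference between last() and a history-based function such as min(10m) can be sketched in Python. This is a toy model, not Zabbix code: samples are assumed to be (timestamp, value) pairs collected once per minute, and the helper names last_value and window_min are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value); an assumed layout

def last_value(samples: List[Sample]) -> float:
    """Toy stand-in for last(): the most recent value only."""
    return samples[-1][1]

def window_min(samples: List[Sample], now: int, period: int) -> float:
    """Toy stand-in for min(period): smallest value in the last `period` seconds."""
    return min(v for t, v in samples if now - period < t <= now)

# Ten minutes of data: steady load, then a single spike in the newest sample.
now = 600
samples = [(t, 1200.0) for t in range(60, 600, 60)] + [(600, 7000.0)]

# A last()-style condition fires on the spike alone.
spike_fires_last = last_value(samples) > 5000
# A min(10m)-style condition fires only if every sample in the window is high.
spike_fires_min = window_min(samples, now, 600) > 5000
```

With this data the last()-style check reports a problem while the min(10m)-style check does not, because the load was above 5000 for only one sample.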
56. A few examples
Problem: Queries per second > 5000
Now: 4999 Resolved?
Problem: Disk space < 10%
Now: 9.95% Resolved?
Problem: MySQL is not available
Now: last check returned Up Resolved?
57. A few examples
Problem: Queries per second > 5000
Now: 4999 Resolved?
Problem: Disk space < 10%
Now: 10.05% Resolved?
Problem: MySQL is not available
Now: last check returned Up Resolved?
59. Different conditions for problem and recovery
Before:
{MySQL_001:mysql.status[Questions].last()} > 5000
Better alternative:
Problem: {MySQL_001:mysql.status[Questions].last()} > 5000
Recovery: {MySQL_001:mysql.status[Questions].last()} < 3000
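The effect of separate problem and recovery conditions (hysteresis) can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not how Zabbix evaluates triggers internally: the function names are hypothetical, and the single-threshold case is approximated by setting both thresholds to 5000.

```python
def next_state(in_problem: bool, value: float,
               problem_above: float = 5000, recover_below: float = 3000) -> bool:
    """Return the new state (True = problem) given the latest value."""
    if not in_problem:
        return value > problem_above
    # Once in the problem state, stay there until the value drops below the recovery level.
    return not (value < recover_below)

def count_state_changes(values, **thresholds) -> int:
    """Count how many times the state flips while walking through the values."""
    state, changes = False, 0
    for v in values:
        new_state = next_state(state, v, **thresholds)
        changes += new_state != state
        state = new_state
    return changes

# Load hovering around 5000, then a real drop.
values = [4990, 5010, 4999, 5005, 4998, 5020, 2900]

single_threshold = count_state_changes(values, problem_above=5000, recover_below=5000)
with_hysteresis = count_state_changes(values)  # problem > 5000, recovery < 3000
```

On this series the single threshold flips state six times, while the problem/recovery pair produces one problem and one recovery.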
60. Several examples
System is overloaded
Problem: {MySQL_001:mysql.status[Questions].min(2m)} > 5000
Recovery: {MySQL_001:mysql.status[Questions].max(10m)} < 3000
MySQL server is not available
Problem: {MySQL_001:mysql.ping.max(#3)} = 0
Recovery: {MySQL_001:mysql.ping.min(#10)} = 1
61. No flapping. No false positives.
Suddenly we trust our monitoring!
62. Anomaly detection
Compare with a norm, where the norm is the system state in the past.
The average number of queries per second for the last hour is less than half the average for the same period a week ago:
2 * {HA Proxy:Questions.avg(1h)} < {HA Proxy:Questions.avg(1h,7d)}
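The comparison against a past norm, as described in the sentence above (current average more than two times lower than the same hour a week ago), can be sketched in Python. The function name, the sample values and the factor parameter are illustrative assumptions, not Zabbix internals.

```python
from statistics import mean

def is_anomalous(current_hour, same_hour_week_ago, factor=2.0):
    """True when the current average is more than `factor` times lower than the baseline."""
    return factor * mean(current_hour) < mean(same_hour_week_ago)

# Made-up queries-per-second samples for illustration only.
baseline = [1000, 1100, 900, 1000]  # same hour, one week ago
normal = [950, 1050, 980, 1020]     # close to the baseline
dropped = [400, 450, 380, 420]      # well under half of the baseline

normal_flag = is_anomalous(normal, baseline)
dropped_flag = is_anomalous(dropped, baseline)
```

Only the second series is flagged, since its average is below half of last week's average for the same hour.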
68. Possible reactions
• Automatic problem resolution
• Sending alerts to user and user group
• Opening tickets in Helpdesk systems
• Unlimited number of possible reactions
69. Escalate!
Escalation timeline (0, 5, 10, 15, 20 min) for the problem "MySQL Cluster is down":
• Repeated Email
• SMS and ticket in Helpdesk system
• Restart HA Proxy
• SMS to manager
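An escalation of this kind can be modeled as an ordered list of delayed steps. The Python sketch below is illustrative only: the pairing of each action with a specific delay is an assumption, since the slide lists the actions and the time marks without making the mapping explicit, and the names ESCALATION_STEPS and due_actions are hypothetical.

```python
from typing import List, Tuple

# (minutes since the problem started, action). The delay assigned to each
# action is assumed for illustration; only the action names come from the slide.
ESCALATION_STEPS: List[Tuple[int, str]] = [
    (5, "Repeated Email"),
    (10, "SMS and ticket in Helpdesk system"),
    (15, "Restart HA Proxy"),
    (20, "SMS to manager"),
]

def due_actions(minutes_unresolved: int) -> List[str]:
    """Actions that should already have run for a problem open this long."""
    return [action for delay, action in ESCALATION_STEPS if delay <= minutes_unresolved]

after_12_min = due_actions(12)
```

The point of the design is that a problem resolved early never reaches the later, more disruptive steps.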
71. All-in-one solution
• Trend prediction
• Data collection
• Problem detection
• Automatic actions
• Agent based monitoring
• Encryption
• Anomaly detection
• Maintenance
• Event correlation
• Scalability
• Visualization
• Auto discovery
• Trigger dependencies
• Centralized management
• Service checks
• IoT and embedded
• Distributed monitoring
• Zabbix API
• Alerting
• Escalations
• User permissions
• Integration with AD, OpenLDAP
• LLD
• Agent-less monitoring
• … and more
72. Focus on quality and ease of maintenance
All components are compatible within one major release
Virtually no third party dependencies
Zabbix Agents are backward compatible since Zabbix 1.0!
73. Benefits of Zabbix
Free and Open Source Software
Extremely flexible
Easy to adopt, use commercial services if needed
No License Fees
Extremely low TCO
No vendor lock in
75. The Universal Open Source Enterprise Level Monitoring Solution
Thank you!
Twitter: @avladishev
Email: alex@zabbix.com
Learn more at Zabbix booth or www.zabbix.com