Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.

Nagios Conference 2013 - Sheeri Cabral - Alerting With MySQL and Nagios

1.683 visualizaciones

Publicado el

Sheeri Cabral's presentation on Alerting With MySQL and Nagios.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna

  • Sé el primero en comentar

Nagios Conference 2013 - Sheeri Cabral - Alerting With MySQL and Nagios

  1. 1. Alerting With MySQL and Nagios http://bit.ly/nagios_mysql2013 Sheeri Cabral Senior DB Admin/Architect Mozilla @sheeri
  2. 2. What is Monitoring?  Threshold alerting  Graphing/trending
  3. 3. Why Monitor?  Problem alerting  Find patterns – capacity planning – troubleshooting  Early warning for potential issues
  4. 4. What to Alert?  Problems you can fix – max_connections – long running queries – locked queries – backup disk space
  5. 5. Nagios is great because... ...anyone can write a plugin
  6. 6. The problem with Nagios... ...anyone can write a plugin
  7. 7. Official Nagios Plugins for MySQL  check_mysql  check_mysql_query
  8. 8. check_mysql  db connectivity  slave running  slave lag using seconds_behind_master
  9. 9. check_mysql_query  Checks the output of a query is within a certain range (numerical)
  10. 10. System vars Status vars Caching Calculations check_mysql yes yes no no check_mysql (standard) no yes no no check_mysqld_status no yes no no check_mysql_stats yes no yes no check_mysqld no many no no check_mysql_health yes yes no Hard-coded check_mysql yes yes no Hard-coded Third party plugins
  11. 11. “Let us know what you'd like to see!”
  12. 12. What I wanted System vars Status vars Caching Calculations mysql_health_check.pl yes yes yes flexible
  13. 13. mysql_health_check.pl
  14. 14. Caching  Save information to a file
  15. 15. Caching  Save information to a file  --cache-dir /path/to/dir/
  16. 16. Caching  Save information to a file  --cache-dir /path/to/dir/  Use the file instead of connecting again
  17. 17. Caching  Save information to a file  --cache-dir /path/to/dir/  Use the file instead of connecting again --max-cache-age <seconds>
  18. 18. Caching  Save information to a file  --cache-dir /path/to/dir/  Use the file instead of connecting again --max-cache-age <seconds> --no-cache to force connection
  19. 19. --mode=varcomp  %metadata{varstatus}
  20. 20. --mode=varcomp  %metadata{varstatus}  SHOW GLOBAL VARIABLES  SHOW GLOBAL STATUS
  21. 21. --mode=varcomp  %metadata{varstatus}  SHOW GLOBAL VARIABLES  SHOW GLOBAL STATUS  --expression allows word replacement
  22. 22. --mode=varcomp  %metadata{varstatus}  SHOW GLOBAL VARIABLES  SHOW GLOBAL STATUS  --expression allows word replacement  --warning --critical are flexible
  23. 23. Sample Command Definition define command { command_name check_mysql_tmp_tables command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass
  24. 24. Sample Command Definition define command { command_name check_mysql_tmp_tables command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass --cache-dir=/var/lib/nagios/mysql_cache --max-cache-age=300
  25. 25. Sample Command Definition define command { command_name check_mysql_cxns command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass --cache-dir=/var/lib/nagios/mysql_cache --max-cache-age=300 --mode=varcomp --expression= "Max_used_connections/max_connections * 100" --warning=">80" –critical=">90" }
  26. 26. Sample Command Definition command_name check_mysql_cxns --mode=varcomp --expression= "Max_used_connections/max_connections * 100" --warning=">80" –critical=">90" }
  27. 27. Sample Service Definition define service { use generic-service host_name __HOSTNAME__ service_description MySQL Connections check_command check_mysql_cxns }
  28. 28. Rates  Compare to last run
  29. 29. Rates  Compare to last run  mode=lastrun-varcomp
  30. 30. Rates  Compare to last run  mode=lastrun-varcomp  current{expr}
  31. 31. Rates  Compare to last run  mode=lastrun-varcomp  current{expr}  lastrun{expr}
  32. 32. Rates  Compare to last run  mode=lastrun-varcomp  current{expr}  lastrun{expr}  --comparison, no warn/crit
  33. 33. Query Rate mysql_health_check.pl [host,user,pass]
  34. 34. mysql_health_check.pl [host,user,pass] --mode lastrun-varcomp Query Rate
  35. 35. mysql_health_check.pl [host,user,pass] --mode lastrun-varcomp --expression "(current{Queries} - lastrun{Queries}) Query Rate
  36. 36. mysql_health_check.pl [host,user,pass] --mode lastrun-varcomp --expression "(current{Queries} - lastrun{Queries}) / (current{Uptime} – lastrun{Uptime})" Query Rate
  37. 37. mysql_health_check.pl [host,user,pass] --mode lastrun-varcomp --expression "abs((current{Queries} - lastrun{Queries}) / (current{Uptime} – lastrun{Uptime}))*100" --comparison ">80" Query Rate
  38. 38. define command { command_name check_mysql_query_rate command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass --cache-dir=/var/lib/nagios/mysql_cache --max-cache-age=300 --mode=lastrun-varcomp --expression= "abs((current{Queries} - lastrun{Queries}) / (current{Uptime} – lastrun{Uptime}))*100" --comparison > 100 }
  39. 39. Other Modes  --mode=long-query  --mode=locked-query  %metadata{proc_list}  SHOW FULL PROCESSLIST
  40. 40. Sample Command Definition define command { command_name check_mysql_locked_queries command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --user myuser --password mypass --cache-dir=/var/lib/nagios/mysql_cache --max-cache-age=300 --mode=locked-query --warning=$ARG1$ --critical=$ARG2$ }
  41. 41. Extending Information sub fetch_server_meta_data {} add a new hash key to %metadata $metadata{varstatus} = $dbh->selectall_arrayref( q|SHOW GLOBAL VARIABLES|);
  42. 42. Extending Information sub fetch_server_meta_data {} add a new hash key to %metadata $metadata{innodb_status} = $dbh->selectall_arrayref( q|SHOW ENGINE INNODB STATUS|);
  43. 43. For example • %metadata{innodb_status} – SHOW ENGINE INNODB STATUS • Already exists, unused
  44. 44. For example • %metadata{innodb_status} – SHOW ENGINE INNODB STATUS • Already exists, unused • %metadata{master_status} – SHOW MASTER STATUS
  45. 45. For example • %metadata{innodb_status} – SHOW ENGINE INNODB STATUS • Already exists, unused • %metadata{master_status} – SHOW MASTER STATUS • %metadata{slave_status} – SHOW SLAVE STATUS
  46. 46. “Standard” checks % max connections --expression 'Threads_connected/max_connections*100'
  47. 47. “Standard” checks % max connections --expression 'Threads_connected/max_connections*100' InnoDB enabled --expression “have_innodb” --comparison=“ne 'YES'”
  48. 48. define command { command_name check_mysql_connections command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="Threads_connected/max_connections * 100" --comparison=">80" } define command { command_name check_mysql_innodb command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="have_innodb" --comparison="ne 'YES'" }
  49. 49. Did MySQL Crash? Nagios set to check every 5 minutes
  50. 50. Did MySQL Crash? Nagios set to check every 5 minutes Might miss a crash
  51. 51. Did MySQL Crash? Nagios set to check every 5 minutes Might miss a crash Uptime! --expression 'Uptime' --comparison=“<1800”
  52. 52. Did MySQL Crash? define command { command_name check_mysql_uptime command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="Uptime" --comparison="<1800" }
  53. 53. “Standard” checks read_only for slaves --expression “read_only” --comparison=“ne 'YES'”
  54. 54. “Standard” checks read_only for slaves --expression “read_only” --comparison=“ne 'YES'” % of sleeping connections # connected, # running, # max connections
  55. 55. “Standard” checks read_only for slaves --expression “read_only” --comparison=“ne 'YES'” % of sleeping connections # connected, # running, # max connections --expression="(Threads_connected- Threads_running)/max_connections * 100" --comparison=">$ARG1$"
  56. 56. define command { command_name check_mysql_read_only command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="read_only" --comparison="ne 'YES'" } define command { command_name check_mysql_connections command_line $USER1$/mysql_health_check.pl --hostname $HOSTADDRESS$ --port 3306 --user <user> --password <password> --mode=varcomp --expression="(Threads_connected- Threads_running)/max_connections * 100" --comparison=">$ARG1$"
  57. 57. Limitations  One check/calculation per Nagios service – But, you can use many variables – Cached output  Does not output for performance data – Not hard to modify, just no need yet
  58. 58. Where to get it https://github.com/palominodb/PalominoDB- Public-Code-Repository/tree/master/nagios/ www.palominodb.com->Community->Projects
  59. 59. Other Nagios Plugins
  60. 60. Nagios Plugin for Partitions table_partitions.pl --host --user --pass --database --table --range [days|weeks|months] --verify #
  61. 61. Nagios Plugin for Partitions table_partitions.pl --host db1.mozilla.com --user nagiosuser --pass nagiospass --database addons --table user_addons --range months --verify 3
  62. 62. After using pt-table-checksum (master) Slaves have a table with checksum this_crc vs master_crc Nagios Plugin for Checksums
  63. 63. Nagios Plugin for Checksums check_table_checksums.pl -H host --user username --pass password -T checksum_table -I checksum_freshness -b dbs,to,skip
  64. 64. More Resources www.palominodb.com->Community->Projects www.mysqlmarinate.com scabral@mozilla.com OurSQL podcast (oursql.com) slides: http://bit.ly/nagios_mysql2013 MySQL Administrator's Bible youtube.com/tcation http://planet.mysql.com

×