4. Architecture of talkbits service
One way to configure service, logs, metrics.
One way to package and deploy service.
One way to launch service.
Bundled in one-jar.
5. Delivery
One delivery unit. Contains:
• Java service: a single executable fat-jar.
• Installation script: [re]installs the service on the machine and registers it in /etc/init.d.
• Init.d script: contains instructions to start, stop, and restart the JVM and to get a quick status.
6. Logging
Configuration
• SLF4J as the logging API; all other logging libraries are redirected to it
• Logback as a logging implementation
• Each service logs to /var/log/talkbits/... (application logs, GC logs)
• Daily rotation policy applied
• Also sent to loggly.com for aggregation, grouping etc.
Aggregation
• loggly.com
• sshfs for analyzing logs with Linux tools such as grep, tail,
less, etc.
Aggregation alternatives
Splunk.com, Flume, Scribe, etc.
7. Metrics
Application metrics and health checks are implemented with the Coda Hale
Metrics library (metrics.codahale.com), which reports metrics via JMX.
The Jolokia JVM agent (www.jolokia.org/agent/jvm.html) exposes JMX beans
via REST (JSON over HTTP), using the JVM's internal HTTP server.
The monitoring agent uses the Jolokia REST interface to fetch metrics and
sends them to the monitoring system.
All metrics are divided into common metrics (HW, JVM, etc) and
service-specific metrics.
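The fetch step can be sketched roughly as follows. This is a minimal illustration of reading one JMX attribute through Jolokia's GET "read" operation, not the actual talkbits agent. The base URL (port 8778 is assumed here), the MBean name, and the function names are all assumptions made for the example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def jolokia_read_url(base_url, mbean, attribute):
    """Build a Jolokia GET 'read' URL for a single MBean attribute."""
    # ':', '=' and ',' are part of JMX object names, so keep them unescaped.
    return "{}/read/{}/{}".format(
        base_url.rstrip("/"), quote(mbean, safe=":=,"), attribute
    )


def extract_value(response_text):
    """Return the 'value' field of a Jolokia JSON response, or raise on error."""
    payload = json.loads(response_text)
    if payload.get("status") != 200:
        raise RuntimeError("Jolokia error: {}".format(payload.get("error")))
    return payload["value"]


def fetch_metric(base_url, mbean, attribute, timeout=5):
    """Fetch one attribute over HTTP (requires a running Jolokia agent)."""
    url = jolokia_read_url(base_url, mbean, attribute)
    with urlopen(url, timeout=timeout) as resp:
        return extract_value(resp.read().decode("utf-8"))
```

A call such as fetch_metric("http://localhost:8778/jolokia", "java.lang:type=Memory", "HeapMemoryUsage") would then return the heap usage of the JVM, assuming an agent is listening on that address.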
8. Deployment
Fabric (http://fabfile.org) is used for
environment provisioning and
service deployment.
Process
• The Fabric script provisions a new environment
(or uses an existing one) according to the cluster
scheme
• Amazon instances are
automatically tagged with
a list of services (i.e., instance roles)
• The Fabric script reads the instance roles
and deploys (or redeploys) the
appropriate components.
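The role-reading step above might look something like the sketch below. It is pure Python with no Fabric or AWS calls. The instance dictionaries, the comma-separated format of the roles tag, and the service names are assumptions for illustration. The result is a role-to-hosts mapping of the shape that Fabric 1.x accepts in env.roledefs.

```python
def parse_roles(tag_value):
    """Split a comma-separated roles tag into a clean list of role names."""
    return [role.strip() for role in tag_value.split(",") if role.strip()]


def build_roledefs(instances):
    """Map each role to the sorted list of hosts that carry it.

    `instances` is a list of dicts such as
    {"host": "10.0.0.1", "tags": {"roles": "auth,search"}}
    (a hypothetical shape, not the EC2 API response format).
    """
    roledefs = {}
    for instance in instances:
        roles = parse_roles(instance.get("tags", {}).get("roles", ""))
        for role in roles:
            roledefs.setdefault(role, []).append(instance["host"])
    return {role: sorted(hosts) for role, hosts in roledefs.items()}
```

A deployment task could then iterate over this mapping and install each component only on the hosts listed for its role.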
9. Monitoring
As our monitoring platform we chose Datadog (Datadoghq.com). Datadog is a SaaS
that is easy to integrate into your infrastructure. The Datadog agent is
open source and implemented in Python. Many predefined
check sets (plugins, or integrations) for popular products are available out of the box,
including JVM, Cassandra, Zookeeper and ElasticSearch.
Datadog provides a REST API.
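As a rough illustration of the REST API, the sketch below builds a request that submits one gauge value. It assumes Datadog's v1 "series" endpoint and the DD-API-KEY header. The metric name, host, tags, and helper names are hypothetical, and nothing is actually sent.

```python
import json
import time
from urllib.request import Request

# Assumed endpoint: Datadog's v1 metrics submission API.
DATADOG_SERIES_URL = "https://api.datadoghq.com/api/v1/series"


def build_series_payload(metric, value, host, tags=None, timestamp=None):
    """Build the JSON body for submitting a single gauge point."""
    ts = int(timestamp if timestamp is not None else time.time())
    return {
        "series": [
            {
                "metric": metric,
                "points": [[ts, value]],
                "type": "gauge",
                "host": host,
                "tags": list(tags or []),
            }
        ]
    }


def build_request(payload, api_key):
    """Wrap the payload in a POST request; the caller decides when to send it."""
    return Request(
        DATADOG_SERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )
```

Passing the resulting request to urllib.request.urlopen would submit the point, given a valid API key.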
Alternatives
• Nagios, Zabbix - these need a bearded admin on the team. We wanted to
go SaaS and outsource infrastructure as far as possible.
• Amazon CloudWatch, LogicMonitor, ManageEngine, etc.
Process
On a single machine, each service has its own monitoring agent instance. If
a node has the 'monitoring-agent' role in the roles tag of its EC2 instance,
a monitoring agent is installed for each service on this node.
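The rule above can be expressed as a small decision function. This is a sketch only: the function name and the example service names are hypothetical, and the roles are assumed to have already been parsed from the EC2 tag into a list.

```python
AGENT_ROLE = "monitoring-agent"


def agents_to_install(roles):
    """Return the services on a node that should each get a monitoring agent.

    If the node lacks the 'monitoring-agent' role, no agents are installed.
    Otherwise every other role is treated as a service needing its own agent.
    """
    if AGENT_ROLE not in roles:
        return []
    return [role for role in roles if role != AGENT_ROLE]
```

For example, a node tagged with the roles auth, search, and monitoring-agent would get two agent instances, one for auth and one for search.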