Our Head of DevOps gave this presentation at Container World 2019 about our experience scaling Elasticsearch with Kubernetes, including pro tips on configuration files, index templates, and more.
2. Fast multi-cloud logging
What is Elasticsearch (ES) and why would I use it?
In brief:
● Elasticsearch is a distributed full-text search engine that is queryable via a JSON API
● It’s the ‘E’ in the popular ELK stack and allows easy searching of unstructured data
● Native distributed clustering support makes adding Elasticsearch nodes easy
● You’ve been watching the Elasticsearch hype train and want to hop aboard
Presentation by Ryan Staatz
3. Fast multi-cloud logging
What is Kubernetes (k8s) and why would I run ES on it?
In brief:
● Kubernetes is an open-source container orchestration platform developed by Google
● Scheduling & distributing application workloads onto hardware resources is automatic
● Configuration as code & static docker images enforce consistent pod behaviors
● You’ve been watching the Kubernetes hype train ship and want to hop aboard
4. Fast multi-cloud logging
At LogDNA we run ES on k8s at scale
● We needed a consistent way to deploy our software across varying infrastructures
● There are a number of custom modifications we have made to Elasticsearch interfaces
● We run in-house versions of the L (Logstash) and K (Kibana) of the ELK stack
● Kubernetes enables easier automation for versioning, CI/CD, and maintenance
Both cloud and on-prem!
5. Fast multi-cloud logging
So managing ES with Kubernetes should be easy, right?
These are some of the steps involved in running ES on Kubernetes:
● Choose the appropriate Elasticsearch version and select the correct settings (there are hundreds of settings)
● Learn the expansive query language for Elasticsearch and integrate it into your workflows
● Set up a Kubernetes environment with access to appropriately sized hardware
● Configure the Elasticsearch k8s workload to request the appropriate resources, including disks
● Ensure the correct index templates and cluster settings are applied after launching your ES cluster
● Create k8s services such that Elasticsearch pods can find each other
● Troubleshoot all remaining issues as they arise and continue to manage and scale the cluster
Sounds great, let’s get started!
6. Fast multi-cloud logging
Getting started
● ES version 5.5 & Kubernetes cluster v1.11+ (for preemption)
● Hardware resources (k8s nodes) with at least 64 GB of RAM and 16 vCPUs (depends on your volume)
● Statefulsets and Services yaml configurations (we need identity, disks, and networking)
● Basic, but important cluster settings & a good starter index template
● Deploy an ES cluster management GUI (cerebro) to help with troubleshooting
Maybe let’s just start with some sane defaults
7. Fast multi-cloud logging
A tale of too (many) yamls
● Two ConfigMaps:
○ The elasticsearch configuration file
○ A start script used to configure ulimits, permissions, and JVM heap size
● Three ES role types (statefulsets)
○ Master - handles lightweight cluster-wide actions (does not require disk)
○ Hot - handles incoming writes to active indices (higher cpu to disk ratio)
○ Cold - stores and queries older indices (lower cpu to disk ratio)
There’s going to be a lot of these, but configuration as code is good!
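The start script itself is not shown in the slides. As a rough illustration of the heap-size step, here is a minimal Python sketch. It assumes the common guidance of giving the JVM about half of the container's memory and capping it at roughly 31 GB; the function names, the fraction, and the cap are assumptions for illustration, not LogDNA's actual values.

```python
def jvm_heap_mb(container_memory_mb, heap_fraction=0.5, max_heap_mb=31 * 1024):
    """Pick a JVM heap size (in MB) for a container memory limit.

    Assumption: half of the container memory goes to the heap, capped at
    about 31 GB. The rest is left for the OS file system cache.
    """
    if container_memory_mb <= 0:
        raise ValueError("container memory must be positive")
    heap = int(container_memory_mb * heap_fraction)
    return min(heap, max_heap_mb)


def es_java_opts(container_memory_mb):
    """Build a value for ES_JAVA_OPTS with equal min and max heap."""
    heap = jvm_heap_mb(container_memory_mb)
    return "-Xms{0}m -Xmx{0}m".format(heap)


if __name__ == "__main__":
    # A 64 GB node: half would be 32 GB, so the cap applies.
    print(es_java_opts(64 * 1024))
```

A start script could export the resulting string as ES_JAVA_OPTS before launching Elasticsearch.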
8. Fast multi-cloud logging
Important ES configuration notes
● Use the alpine flavor of ES to reduce image size: elasticsearch:5.5.2-alpine
● Configure volumeClaimTemplates to dynamically provision disks
● Ensure the correct security context settings are specified in each statefulset
● Use k8s pod priority and preemption to ensure your ES pods get scheduling priority
● Create a startup script to set the correct configuration prior to starting the JVM
Pro tip: this slide contains several pro tips
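To make these tips concrete, here is a hedged sketch that builds a StatefulSet and a PriorityClass manifest as Python dictionaries (kubectl accepts JSON as well as yaml). Only the image tag comes from the slide. The resource names, labels, storage class, disk size, priority value, script path, and the specific capabilities are hypothetical placeholders. The PriorityClass uses scheduling.k8s.io/v1, which assumes k8s 1.14+; a 1.11 cluster would use v1beta1.

```python
import json


def priority_class(name="es-high-priority", value=1000000):
    """PriorityClass so ES pods can preempt lower-priority pods."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": "Scheduling priority for Elasticsearch pods",
    }


def hot_statefulset(replicas=3, disk_size="500Gi", storage_class="fast-ssd"):
    """StatefulSet for the 'hot' role with a dynamically provisioned disk."""
    labels = {"app": "es-hot"}
    container = {
        "name": "elasticsearch",
        "image": "elasticsearch:5.5.2-alpine",
        # Hypothetical start script mounted from a ConfigMap.
        "command": ["/bin/sh", "/scripts/start.sh"],
        # Assumption: capabilities needed for memory locking and ulimits.
        "securityContext": {
            "capabilities": {"add": ["IPC_LOCK", "SYS_RESOURCE"]}
        },
        "ports": [
            {"name": "http", "containerPort": 9200},
            {"name": "transport", "containerPort": 9300},
        ],
        "volumeMounts": [
            {"name": "data", "mountPath": "/usr/share/elasticsearch/data"},
            {"name": "scripts", "mountPath": "/scripts"},
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "es-hot", "labels": labels},
        "spec": {
            "serviceName": "es-hot",
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": "es-high-priority",
                    "containers": [container],
                    "volumes": [
                        {"name": "scripts",
                         "configMap": {"name": "es-start-script"}}
                    ],
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "storageClassName": storage_class,
                        "resources": {"requests": {"storage": disk_size}},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(hot_statefulset(), indent=2))
```

The printed JSON could be piped to kubectl apply -f - once the placeholder names match your cluster.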
9. Fast multi-cloud logging
Service discovery
● ES hot and cold have a single load-balanced cluster IP service endpoint for insertions
● ES masters have 2 services
○ 1 load-balanced cluster IP for transport (9300) and http API requests (9200)
○ 1 clusterIP: None used for ES unicast discovery
● 2 important settings for clusterIP: None
○ Ensure DNS is publishable immediately
○ No sessionAffinity ensures up-to-date addresses
Leverage Kubernetes’ native services
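A sketch of what the clusterIP: None discovery service might look like, again as a Python dictionary. It assumes that "DNS is publishable immediately" maps to the publishNotReadyAddresses field of the Service spec; the slides do not name the setting. The service name, namespace, selector label, and the cluster.local domain are hypothetical.

```python
def master_discovery_service(name="es-master-discovery", namespace="default"):
    """Headless Service used by ES masters for unicast discovery."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Headless: DNS returns the pod IPs instead of one virtual IP.
            "clusterIP": "None",
            # Assumption: publish pod addresses before readiness checks pass,
            # so masters can find each other while the cluster is forming.
            "publishNotReadyAddresses": True,
            # No session affinity, so lookups are not pinned to stale pods.
            "sessionAffinity": "None",
            "selector": {"app": "es-master"},
            "ports": [
                {"name": "transport", "port": 9300, "targetPort": 9300}
            ],
        },
    }


def discovery_hostname(service, cluster_domain="cluster.local"):
    """DNS name that resolves to the pods behind the headless Service."""
    meta = service["metadata"]
    return "{}.{}.svc.{}".format(meta["name"], meta["namespace"], cluster_domain)


if __name__ == "__main__":
    print(discovery_hostname(master_discovery_service()))
```

The hostname returned by discovery_hostname is what the ES unicast discovery setting would point at.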
10. Fast multi-cloud logging
ES startup settings
● Ensure memory_lock is on
● Adjust the min master nodes based on the total number of masters you have
● The clusterIP: None service from the last slide is referenced by unicast settings
● Set the correct ES role
● Specify the number of cores
Just the ones we use
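The settings on this slide were shown as a screenshot that did not survive in the transcript. As a stand-in, this Python sketch renders a plausible set of ES 5.x settings for each role. The quorum formula (masters / 2 + 1) is the usual rule for minimum master nodes. The box_type attribute name and the example hostname are assumptions, not taken from the talk.

```python
import json


def minimum_master_nodes(master_count):
    """Quorum of master-eligible nodes: (N / 2) + 1, to avoid split brain."""
    if master_count < 1:
        raise ValueError("need at least one master-eligible node")
    return master_count // 2 + 1


def es_settings(role, master_count, discovery_host, cores):
    """Settings for one ES node; role is 'master', 'hot', or 'cold'."""
    if role not in ("master", "hot", "cold"):
        raise ValueError("unknown role: " + role)
    is_master = role == "master"
    settings = {
        "bootstrap.memory_lock": True,
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_count),
        # Points at the clusterIP: None service for the masters.
        "discovery.zen.ping.unicast.hosts": [discovery_host],
        "node.master": is_master,
        "node.data": not is_master,
        "processors": cores,
    }
    if not is_master:
        # Hypothetical custom attribute used to tell hot and cold nodes apart.
        settings["node.attr.box_type"] = role
    return settings


def render_yaml(settings):
    """Render flat settings as yaml lines (JSON values are valid yaml)."""
    return "\n".join(
        "{}: {}".format(key, json.dumps(value)) for key, value in settings.items()
    )


if __name__ == "__main__":
    print(render_yaml(es_settings("hot", 3, "es-master-discovery", 16)))
```

With three masters the quorum is two, so the cluster keeps a master through the loss of one master pod.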
11. Fast multi-cloud logging
Configuring an index template
● Configure index.routing.allocation.total_shards_per_node based on your expected load
○ Optimizing shards can increase performance and reduce cluster state overhead
● Set a refresh_interval that works for you
○ Higher refresh intervals offer better throughput performance at the cost of latency
○ We typically use 15-30 seconds
● Change translog.durability to async (allow asynchronous translog writes)
○ We regret not discovering this setting sooner, as it gave us a 5-10x increase in performance
● Note: index templates MUST be applied AFTER the ES masters are already running
Index templates can have a huge impact on your cluster performance
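A minimal sketch of an index template body carrying the three settings above, written for the ES 5.x template format (which uses a "template" pattern field). The index pattern, shard and replica counts, and the shards-per-node value are placeholders to tune for your own load; only the 15-30 second refresh range and the async durability come from the slide.

```python
import json


def log_index_template(pattern="logs-*", shards=5, replicas=1,
                       shards_per_node=2, refresh_seconds=30):
    """Body for PUT /_template/<name> on ES 5.x."""
    return {
        "template": pattern,
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": replicas,
            # Caps how many shards of one index land on a single node.
            "index.routing.allocation.total_shards_per_node": shards_per_node,
            # Longer interval: better indexing throughput, slower visibility.
            "index.refresh_interval": "{}s".format(refresh_seconds),
            # Translog is fsynced in the background instead of per request.
            "index.translog.durability": "async",
        },
    }


if __name__ == "__main__":
    print(json.dumps(log_index_template(refresh_seconds=15), indent=2))
```

Note that async durability trades safety for speed: writes acknowledged since the last background sync can be lost if a node crashes.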
12. Fast multi-cloud logging
Manage ES the GUI way: Cerebro
● Cerebro connects to your ES service endpoint(s)
● Contains an ES node/pod list and their health stats
● View indices and shards across the available data nodes
● Modify index settings, templates, and data
● Move shards around (important)
● Not all options are accessible via Cerebro
Formerly kopf, which is what you’d use with ES v2.x or lower
13. Fast multi-cloud logging
Manage ES the API way
● We use Insomnia (a REST API GUI to share API calls)
● Curl works too!
● API calls we commonly use:
○ /_cluster/health
○ /_cat/pending_tasks?v
○ /_flush?force & /_cluster/reroute?retry_failed=true
A bit more work to start on, but automation is much easier
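For automation, the same calls can be scripted. This sketch uses only the Python standard library to build the requests listed above. The base URL is a hypothetical in-cluster service address, and the choice of POST for flush and reroute is an assumption based on how those endpoints are normally called.

```python
import urllib.request


def es_request(base_url, path, method="GET"):
    """Build (but do not send) a request against an ES endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)


def common_requests(base_url):
    """The API calls from the slide, in order."""
    return [
        es_request(base_url, "/_cluster/health"),
        es_request(base_url, "/_cat/pending_tasks?v"),
        es_request(base_url, "/_flush?force", method="POST"),
        es_request(base_url, "/_cluster/reroute?retry_failed=true", method="POST"),
    ]


def send(request, timeout=10):
    """Send a request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical service address; replace with your own endpoint.
    for req in common_requests("http://es-master:9200"):
        print(req.get_method(), req.full_url)
```

Calling send on the first request returns the cluster health JSON, which is a convenient check to run before and after a reroute.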
14. Fast multi-cloud logging
Wrap up
That was a lot of info, but here’s what to walk away with:
● ES requires some coaxing to properly run inside a container
○ Use the correct security context, ulimit, and vm settings
● There are native concepts in Kubernetes that can make running ES easier
○ Service discovery, volumeClaimTemplates, preemption, and more
○ ...or you could just use an operator! (your mileage may vary)
● Index templates have a big impact on how well your ES cluster runs
● GUIs (cerebro) and ES APIs are extremely useful for tuning performance