How LogDNA Scaled Elasticsearch on Kubernetes

Scaling Elasticsearch on Kubernetes
By Ryan Staatz

Fast multi-cloud logging
What is Elasticsearch (ES) and why would I use it?
In brief:
● Elasticsearch is a distributed full-text search engine that is queryable via a JSON API
● It’s the ‘E’ in the popular ELK stack and allows easy searching of unstructured data
● Native distributed clustering support makes adding Elasticsearch nodes easy
● You’ve been watching the Elasticsearch hype train and want to hop aboard
Presentation by Ryan Staatz

What is Kubernetes (k8s) and why would I run ES on it?
In brief:
● Kubernetes is an open-source container orchestration platform originally developed by Google
● Scheduling & distributing application workloads onto hardware resources is automatic
● Configuration as code & static Docker images enforce consistent pod behavior
● You’ve been watching the Kubernetes hype train and want to hop aboard

At LogDNA we run ES on k8s at scale
● We needed a consistent way to deploy our software across varying infrastructures
● We have made a number of custom modifications to Elasticsearch interfaces
● We run in-house versions of the L (Logstash) and K (Kibana) of the ELK stack
● Kubernetes enables easier automation for versioning, CI/CD, and maintenance
Both cloud and on-prem!

So managing ES with Kubernetes should be easy, right?
These are some of the steps involved in running ES on Kubernetes:
● Choose the appropriate Elasticsearch version and select the correct settings (there are hundreds of settings)
● Learn the expansive query language for Elasticsearch and integrate it into your workflows
● Set up a Kubernetes environment with access to appropriately sized hardware
● Configure the Elasticsearch k8s workload to request the appropriate resources, including disks
● Ensure the correct index templates and cluster settings are applied after launching your ES cluster
● Create k8s services such that Elasticsearch pods can find each other
● Troubleshoot all remaining issues as they arise and continue to manage and scale the cluster
Sounds great, let’s get started!

Getting started
● ES version 5.5 & Kubernetes cluster v1.11+ (for preemption)
● Hardware resources (k8s nodes) with at least 64 GB of RAM and 16 vCPUs (depends on your volume)
● StatefulSet and Service YAML configurations (we need identity, disks, and networking)
● Basic, but important cluster settings & a good starter index template
● Deploy an ES cluster management GUI (cerebro) to help with troubleshooting
Maybe let’s just start with some sane defaults

A tale of too (many) yamls
● Two ConfigMaps:
○ The elasticsearch configuration file
○ A start script used to configure ulimits, permissions, and JVM heap size
● Three ES role types (one StatefulSet each)
○ Master - handles lightweight cluster-wide actions (does not require disk)
○ Hot - handles incoming writes to active indices (higher CPU-to-disk ratio)
○ Cold - stores and queries older indices (lower CPU-to-disk ratio)
There’s going to be a lot of these, but configuration as code is good!
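The start script's JVM heap step is usually plain arithmetic. Below is a minimal Python sketch, assuming Elastic's general heap guidance (equal min and max heap, about half of the available memory, capped below the ~32 GB compressed-pointer threshold); the ratio, the 31 GiB cap, and the function names are our assumptions, not values from this talk.

```python
# Hypothetical helper mirroring the "JVM heap size" step of the start script.
# The 50% ratio and the ~31 GiB cap are assumptions drawn from Elastic's
# general heap-sizing guidance, not values taken from this talk.

COMPRESSED_OOPS_CAP_MB = 31 * 1024  # stay under ~32 GB so the JVM keeps compressed object pointers


def heap_mb(container_memory_mb: int) -> int:
    """Give the JVM half of the container's memory, leaving the rest to the OS page cache."""
    return min(container_memory_mb // 2, COMPRESSED_OOPS_CAP_MB)


def es_java_opts(container_memory_mb: int) -> str:
    """Build an ES_JAVA_OPTS value with equal min and max heap so the heap never resizes."""
    heap = heap_mb(container_memory_mb)
    return f"-Xms{heap}m -Xmx{heap}m"


if __name__ == "__main__":
    # e.g. a pod with a 16 GiB memory limit
    print(es_java_opts(16 * 1024))  # -Xms8192m -Xmx8192m
```

The start script would export this value as `ES_JAVA_OPTS` before launching Elasticsearch; the memory the JVM does not claim is what the OS page cache uses to serve Lucene segment files.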

Important ES configuration notes
● Use the alpine flavor of ES to reduce image size: elasticsearch:5.5.2-alpine
● Configure volumeClaimTemplates to dynamically provision disks
● Ensure the correct security context settings are specified in each statefulset
● Use k8s preemption to ensure your ES pods get scheduling priority
● Create a startup script to set the correct configuration prior to starting the JVM
Pro tip: this slide contains several pro tips
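To make these tips concrete, here is a sketch that builds one StatefulSet per role as a Python dict (serializable to JSON or YAML). Everything named here is an assumption for illustration: the `es-critical` PriorityClass, the `es-<role>` names and labels, the `fsGroup` of 1000 (check your image's uid/gid), and the busybox init container that sets `vm.max_map_count`. Masters get an `emptyDir`, while hot and cold nodes get disks from `volumeClaimTemplates`.

```python
import json

# Hypothetical PriorityClass; scheduling.k8s.io/v1beta1 is the beta API in k8s 1.11.
PRIORITY_CLASS = {
    "apiVersion": "scheduling.k8s.io/v1beta1",
    "kind": "PriorityClass",
    "metadata": {"name": "es-critical"},
    "value": 1000000,  # higher value = scheduled first, may preempt lower-priority pods
    "globalDefault": False,
    "description": "Elasticsearch pods",
}


def es_statefulset(role: str, replicas: int, disk_gi: int = 0) -> dict:
    """Sketch of one StatefulSet per ES role ("master", "hot", or "cold")."""
    name = f"es-{role}"
    pod_spec = {
        "priorityClassName": PRIORITY_CLASS["metadata"]["name"],
        "securityContext": {"fsGroup": 1000},  # assumed ES group id; check your image
        "initContainers": [{
            # ES bootstrap checks require vm.max_map_count >= 262144 outside development mode
            "name": "sysctl",
            "image": "busybox",
            "command": ["sysctl", "-w", "vm.max_map_count=262144"],
            "securityContext": {"privileged": True},  # the setting is host-wide
        }],
        "containers": [{
            "name": "elasticsearch",
            "image": "elasticsearch:5.5.2-alpine",
            # IPC_LOCK for memory locking, SYS_RESOURCE so the start script can raise ulimits
            "securityContext": {"capabilities": {"add": ["IPC_LOCK", "SYS_RESOURCE"]}},
            "volumeMounts": [{"name": "data", "mountPath": "/usr/share/elasticsearch/data"}],
        }],
    }
    spec = {
        "serviceName": name,  # assumed name of the governing Service
        "replicas": replicas,
        "selector": {"matchLabels": {"app": name}},
        "template": {"metadata": {"labels": {"app": name}}, "spec": pod_spec},
    }
    if role == "master":
        # masters don't need a persistent disk
        pod_spec["volumes"] = [{"name": "data", "emptyDir": {}}]
    else:
        # hot and cold nodes get one dynamically provisioned PersistentVolumeClaim per pod
        spec["volumeClaimTemplates"] = [{
            "metadata": {"name": "data"},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": f"{disk_gi}Gi"}},
            },
        }]
    return {"apiVersion": "apps/v1", "kind": "StatefulSet",
            "metadata": {"name": name}, "spec": spec}


if __name__ == "__main__":
    print(json.dumps(es_statefulset("hot", replicas=3, disk_gi=500), indent=2))
```

Generating manifests from one function keeps the three role types in sync; only replicas, disk size, and the volume source differ per role.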

Service discovery
● ES hot and cold nodes have a single load-balanced ClusterIP service endpoint for insertions
● ES masters have 2 services:
○ 1 load-balanced ClusterIP service for transport (9300) and HTTP API requests (9200)
○ 1 headless service (clusterIP: None) used for ES unicast discovery
● 2 important settings for the clusterIP: None service:
○ Ensure DNS records are published immediately, even before pods are ready
○ Disable sessionAffinity so clients always get up-to-date addresses
Leverage Kubernetes’ native services
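The two master Services can be sketched as Python dicts. The slide does not name the fields, so this is our reading: "DNS publishable immediately" maps to `publishNotReadyAddresses: true` (older clusters used the `service.alpha.kubernetes.io/tolerate-unready-endpoints` annotation), which matters because masters must discover each other before any of them can pass a readiness check. The names `es-master` and `es-master-discovery` are hypothetical.

```python
import json


def master_services(name: str = "es-master") -> list:
    """Sketch of the two Services in front of the ES master StatefulSet."""
    selector = {"app": name}  # must match the master pods' labels
    # Load-balanced ClusterIP (the default Service type) for HTTP and transport traffic
    load_balanced = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": selector,
            "ports": [
                {"name": "http", "port": 9200},
                {"name": "transport", "port": 9300},
            ],
        },
    }
    # Headless Service: DNS returns one A record per master pod for unicast discovery
    discovery = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{name}-discovery"},
        "spec": {
            "clusterIP": "None",
            "publishNotReadyAddresses": True,  # publish pod DNS records before readiness passes
            "sessionAffinity": "None",
            "selector": selector,
            "ports": [{"name": "transport", "port": 9300}],
        },
    }
    return [load_balanced, discovery]


if __name__ == "__main__":
    for svc in master_services():
        print(json.dumps(svc, indent=2))
```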

ES startup settings
● Ensure memory_lock is on
● Adjust the minimum master nodes setting based on the total number of masters you have
● The clusterIP: None service from the previous slide is referenced by the unicast discovery settings
● Set the correct ES role
● Specify the number of cores
Just the ones we use
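In ES 5.x these map to `bootstrap.memory_lock`, `discovery.zen.minimum_master_nodes`, `discovery.zen.ping.unicast.hosts`, `node.master`/`node.data`, and `processors`. The minimum master count should be a quorum, floor(N / 2) + 1, so that a network partition cannot elect two masters. A minimal sketch that renders these per pod; the discovery Service name and the `node.attr.box_type` attribute for hot/cold placement are hypothetical.

```python
def minimum_master_nodes(master_count: int) -> int:
    """Quorum of master-eligible nodes, floor(N / 2) + 1, which prevents split brain."""
    return master_count // 2 + 1


def es_settings(role: str, master_count: int, cores: int,
                discovery_service: str = "es-master-discovery") -> dict:
    """Sketch of per-pod elasticsearch.yml settings for an ES 5.x node."""
    settings = {
        "bootstrap.memory_lock": True,  # lock the heap in RAM so it is never swapped
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_count),
        "discovery.zen.ping.unicast.hosts": discovery_service,  # headless Service DNS name
        "node.master": role == "master",
        "node.data": role in ("hot", "cold"),
        # the JVM may report the host's cores rather than the pod's CPU limit
        "processors": cores,
    }
    if role in ("hot", "cold"):
        settings["node.attr.box_type"] = role  # hypothetical attribute for hot/cold allocation
    return settings


def to_yml(settings: dict) -> str:
    """Render flat settings as elasticsearch.yml lines (booleans in lowercase)."""
    def fmt(value):
        return str(value).lower() if isinstance(value, bool) else str(value)
    return "\n".join(f"{key}: {fmt(value)}" for key, value in settings.items())


if __name__ == "__main__":
    print(to_yml(es_settings("master", master_count=3, cores=4)))
```

With 3 masters the quorum is 2; with 4 it is 3, which is why odd master counts are the usual choice.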

Configuring an index template
● Configure index.routing.allocation.total_shards_per_node based on your expected load
○ Optimizing shards can increase performance and reduce cluster state overhead
● Set a refresh_interval that works for you
○ Longer refresh intervals offer better indexing throughput at the cost of search latency (new documents take longer to become searchable)
○ We typically use 15-30 seconds
● Change index.translog.durability to async (allow asynchronous translog writes)
○ We regret not discovering this setting sooner, as it gave us a 5-10x increase in performance
○ The trade-off: a node crash can lose up to one index.translog.sync_interval (5s by default) of acknowledged writes
● Note: index templates MUST be applied AFTER the ES masters are already running
Index templates can have a huge impact on your cluster performance
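A sketch of such a template body for ES 5.x, which is sent with `PUT /_template/<name>` once the masters are up. It also checks one piece of arithmetic worth doing before picking a shard limit: if `total_shards_per_node` is below ceil(shards × (1 + replicas) / data_nodes), some shard copies can never be assigned. The pattern `logs-*`, the shard counts, and the helper names are assumptions for illustration.

```python
import json


def min_shards_per_node(shards: int, replicas: int, data_nodes: int) -> int:
    """Smallest per-node limit that still lets every primary and replica be assigned."""
    total_copies = shards * (1 + replicas)
    return -(-total_copies // data_nodes)  # ceiling division


def log_index_template(pattern: str, shards: int, replicas: int, data_nodes: int,
                       shards_per_node: int, refresh_interval: str = "30s") -> dict:
    """Sketch of an ES 5.x index template body with the settings from this slide."""
    floor = min_shards_per_node(shards, replicas, data_nodes)
    if shards_per_node < floor:
        raise ValueError(f"total_shards_per_node={shards_per_node} leaves shards unassigned; "
                         f"need at least {floor}")
    return {
        "template": pattern,  # ES 5.x field; 6.0+ renamed it to "index_patterns"
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": replicas,
            "index.routing.allocation.total_shards_per_node": shards_per_node,
            "index.refresh_interval": refresh_interval,
            # fsync the translog in the background instead of on every request
            "index.translog.durability": "async",
        },
    }


if __name__ == "__main__":
    # 6 primaries + 6 replicas on 4 data nodes need a limit of at least 3;
    # 4 leaves headroom so shards can still relocate when a node fails.
    body = log_index_template("logs-*", shards=6, replicas=1, data_nodes=4, shards_per_node=4)
    print(json.dumps(body, indent=2))
```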

Manage ES the GUI way: Cerebro
● Cerebro connects to your ES service endpoint(s)
● Contains an ES node/pod list and their health stats
● View indices and shards across the available data nodes
● Modify index settings, templates, and data
● Move shards around (important)
● Not all options are accessible via Cerebro
Use kopf (Cerebro’s predecessor) if you’re running ES v2.x or lower

Manage ES the API way
● We use Insomnia (a REST API GUI to share API calls)
● Curl works too!
● API calls we commonly use:
○ /_cluster/health
○ /_cat/pending_tasks?v
○ /_flush?force & /_cluster/reroute?retry_failed=true
A bit more work up front, but much easier to automate
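As a taste of that automation, here is a minimal standard-library Python sketch around two of the calls listed above: it reads `/_cluster/health`, turns the response into a list of problems, and can trigger `/_cluster/reroute?retry_failed=true`. The function names are hypothetical, and the code is an illustration rather than LogDNA's tooling.

```python
import json
import urllib.request


def get_json(base_url: str, path: str) -> dict:
    """GET an ES endpoint that returns JSON, e.g. /_cluster/health."""
    with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=10) as resp:
        return json.load(resp)


def retry_failed_allocations(base_url: str) -> dict:
    """Retry shards that exceeded the allocation retry limit (POST /_cluster/reroute)."""
    req = urllib.request.Request(base_url.rstrip("/") + "/_cluster/reroute?retry_failed=true",
                                 data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def health_problems(health: dict) -> list:
    """Summarize a /_cluster/health response as a list of human-readable problems."""
    problems = []
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')}")
    if health.get("unassigned_shards", 0) > 0:
        problems.append(f"{health['unassigned_shards']} unassigned shards")
    if health.get("number_of_pending_tasks", 0) > 0:
        problems.append(f"{health['number_of_pending_tasks']} pending tasks")
    return problems
```

For example, `health_problems(get_json("http://es-master:9200", "/_cluster/health"))` (with a hypothetical in-cluster address) returns an empty list for a healthy cluster, and a cron job or CI step can alert or call `retry_failed_allocations` when it does not.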

Wrap up
That was a lot of info, but here’s what to walk away with:
● ES requires some coaxing to run properly inside a container
○ Use the correct security context, ulimit, and vm settings
● There are native concepts in Kubernetes that can make running ES easier
○ Service discovery, volumeClaimTemplates, preemption, and more
○ ...or you could just use an operator! (your mileage may vary)
● Index templates have a big impact on how well your ES cluster runs
● GUIs (Cerebro) and ES APIs are extremely useful for tuning performance

Fast Multi-Cloud Logging
Visit Booth #215
ryan@logdna.com
