So you are in production (or soon will be), with Elasticsearch running and powering important application features, or perhaps serving as centralized logging for effective debugging.
Was your Elastic cluster deployed correctly? Is it stable? Can it hold the throughput you expect it to?
How did you do capacity planning? How can you tell whether the cluster is healthy, and what should you monitor? How do you apply effective multi-tenancy? And what would be an ideal cluster topology and data ingestion architecture?
3. What does it take?
• Cluster deployed using best practices
• Thorough monitoring
• Inspect. Fix. Repeat.
• Good capacity planning
• Memory management
• Indexing and sharding strategy
• Security
5. Deployments
• Prefer immutable images & scripted deployments
• For AWS see https://github.com/synhershko/elasticsearch-cloud-deploy/
• GCP coming soon
6. Backups
• Very efficient
• Very important
• Several storages supported
• To a shared file system
• HDFS
• Azure / GCP / AWS repositories via plugins
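As a rough illustration of the shared-file-system option, the sketch below builds the request bodies for registering a repository and taking a dated snapshot, assuming the snapshot REST endpoints (`PUT /_snapshot/<repo>` and `PUT /_snapshot/<repo>/<snapshot>`). The repository name, the mount path and both helper names are hypothetical, and nothing is sent over the network.

```python
import json
from datetime import date


def fs_repository_request(repo_name, location):
    """Return (method, path, body) that registers a shared-file-system repository.

    `location` must be a path that every node can reach (hypothetical here).
    """
    body = {"type": "fs", "settings": {"location": location, "compress": True}}
    return "PUT", f"/_snapshot/{repo_name}", body


def snapshot_request(repo_name, day, indices="*"):
    """Return (method, path, body) that snapshots `indices` under a dated name."""
    name = f"snapshot-{day:%Y.%m.%d}"
    path = f"/_snapshot/{repo_name}/{name}?wait_for_completion=false"
    return "PUT", path, {"indices": indices, "include_global_state": True}


if __name__ == "__main__":
    for method, path, body in (
        fs_repository_request("nightly_backups", "/mnt/es-backups"),
        snapshot_request("nightly_backups", date(2016, 11, 23)),
    ):
        print(method, path, json.dumps(body))
```

The tuples can be handed to whatever HTTP client the deployment scripts already use.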
7. What to monitor (on the cluster, per host)?
• CPU load
• Memory utilization
• Heap utilization
• GC time
• Disk utilization
• Disk IOPs
• Merges
• Deleted docs
• Requests per sec (indexing, search)
• Load average < number of cores
• Network in / out
• Thread pool rejections
• Number of nodes
• Cache sizes
• Cache evictions
• Cluster state / health
• Number of shards per type
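A few of the checks above can be sketched as a simple rule function over a per-host metrics snapshot. The dictionary keys and the heap and disk thresholds are illustrative assumptions, not values from the slides; only the "load average < number of cores" rule comes from the list.

```python
def check_host(metrics, cores):
    """Return a list of warnings for one host's metrics snapshot.

    `metrics` is a plain dict with hypothetical key names; thresholds for
    heap and disk are illustrative and should be tuned per cluster.
    """
    warnings = []
    if metrics["load_average"] >= cores:
        warnings.append("load average is not below the number of cores")
    if metrics["heap_used_percent"] > 75:
        warnings.append("heap utilization is high")
    if metrics["disk_used_percent"] > 85:
        warnings.append("disk utilization is high")
    if metrics["thread_pool_rejections"] > 0:
        warnings.append("thread pool rejections observed")
    return warnings


if __name__ == "__main__":
    snapshot = {
        "load_average": 5.2,
        "heap_used_percent": 81,
        "disk_used_percent": 40,
        "thread_pool_rejections": 0,
    }
    for warning in check_host(snapshot, cores=4):
        print("WARN:", warning)
```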
9. Grafana dashboards
• More fine-grained, cluster-wide view
• Provided with metrics polling script (Python)
https://github.com/synhershko/elasticsearch-grafana-monitoring
10. Monitoring Destination
• To the same cluster
• To a different cluster (Recommended)
• External systems (e.g. Graphite) – only if already in use in your organization
• X-Pack subscribers can now send metrics to Elastic Cloud
15. Boosting slow operations
• Search or Indexing heavy?
• Measure operations also from applications side!
• Slow searches
• Queries need optimization
• Scoring (not using filters)
• Numeric ranges (pre-5.x)
• Scripts
• Slow indexing
• Sharding strategy
• Use bulk indexing (optimize for 10-15MB of data, regardless of number of documents / operations)
• Slow analyzers affect both! (e.g. n-grams)
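The bulk-sizing advice can be sketched as a generator that cuts batches by payload size rather than by document count. This is a minimal sketch: the function name is hypothetical, the action line is written without a `_type` (versions before 7.x would need one), and the 10MB default simply follows the range quoted above.

```python
import json


def bulk_batches(index, docs, max_bytes=10 * 1024 * 1024):
    """Yield newline-delimited bulk payloads of at most about max_bytes each.

    Batches are cut by encoded size, not by document count. A single
    document larger than max_bytes still goes out, alone in its own batch.
    """
    action = json.dumps({"index": {"_index": index}})
    lines, size = [], 0
    for doc in docs:
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        if lines and size + entry_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_size
    if lines:
        yield "".join(lines)


if __name__ == "__main__":
    events = ({"n": i, "message": "x" * 200} for i in range(1000))
    for payload in bulk_batches("logs-2016.11.23", events, max_bytes=64 * 1024):
        print(len(payload.encode("utf-8")), "bytes")
```

Each yielded string is a complete body for one `_bulk` request, so measuring indexing throughput per batch from the application side stays straightforward.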
16. Don’t use NGrams!
• Often used for “contains” search
• You ain’t gonna need it, use WordDelimiter Token Filter instead
• Useful for fuzzy search / auto-correction
• Best used via Elasticsearch’s Suggesters
• Useful for languages without spaces, or with compound words
• min_gram, max_gram
18. Memory Allocation
• ES_HEAP_SIZE (from 5.x, set -Xms / -Xmx via jvm.options or ES_JAVA_OPTS)
• DocValues used?
• Fielddata usage
• Query cache (for queries in filter context)
• Request cache (for aggregations and count queries)
• Never over 32GB!
• Default cache sizes do not always fit your usage
• Set appropriate static configs in elasticsearch.yml
• At least 50% of memory to file-system cache
• Usually more
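The two sizing rules above (never over 32GB of heap, at least half of memory left to the file-system cache) can be sketched as a small calculation. The 31GB cap is an assumption chosen to stay safely under the 32GB limit, and the function name is hypothetical.

```python
def recommend_heap_gb(ram_gb, heap_cap_gb=31):
    """Return (heap_gb, file_system_cache_gb) for a node with ram_gb of memory.

    Heap is half of RAM, capped below 32GB (cap value is an assumption);
    everything else is left to the operating system's file-system cache.
    """
    heap = min(ram_gb // 2, heap_cap_gb)
    return heap, ram_gb - heap


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        heap, cache = recommend_heap_gb(ram)
        print(f"{ram}GB RAM: -Xms{heap}g -Xmx{heap}g, {cache}GB left for the OS cache")
```

On large machines the cap means far more than 50% goes to the file-system cache, which matches the "usually more" note.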
19. Server Sizing
• Master nodes
• 1-2 cores, 2-4 GB memory, 50% ES_HEAP_SIZE
• Data nodes
• > 4 cores, measure and preserve disk/mem ratio (can start with 1/24)
• ES_HEAP_SIZE as per previous slide
• Client nodes
• CPU and network heavy, 4GB memory should be enough for most use cases
20. Index Management Patterns
• A Monolith Index
• Search façade on top of your data
• Record linkage
• Anomaly detection
• Rolling indexes (time based events)
• Centralized logging
• Auditing
• IoT
logs-2016.11.19 logs-2016.11.20 logs-2016.11.21 logs-2016.11.22 logs-2016.11.23
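The daily naming scheme shown above can be sketched with a few helpers: one to name the index for a given day, one to list the indices a time-range query should hit, and one to find indices past a retention window. The helper names and the retention policy are hypothetical.

```python
from datetime import date, timedelta


def daily_index(prefix, day):
    """Name of the rolling index that holds events for `day`."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_for_range(prefix, start, end):
    """All daily index names from `start` to `end`, inclusive."""
    days = (end - start).days
    return [daily_index(prefix, start + timedelta(days=i)) for i in range(days + 1)]


def expired_indices(existing, prefix, today, keep_days):
    """Indices older than `keep_days`; zero-padded dates sort lexicographically."""
    cutoff = daily_index(prefix, today - timedelta(days=keep_days))
    return sorted(
        name for name in existing
        if name.startswith(prefix + "-") and name < cutoff
    )


if __name__ == "__main__":
    names = indices_for_range("logs", date(2016, 11, 19), date(2016, 11, 23))
    print(names)
    print(expired_indices(names, "logs", date(2016, 11, 23), keep_days=2))
```

Dropping a whole expired index is much cheaper than deleting its documents one by one, which is one reason this pattern suits time-based events.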
21. Optimal shard size
• A few million documents per shard, for search performance
• A bit more if only doing aggregations
• 5-8GB on disk max, for startup times and network reallocation
• doc_values are enabled by default, turn off for non-aggs fields to save space
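Since the shard count is fixed at index creation, the size guideline above has to be turned into a number up front. A minimal sketch, assuming the expected on-disk size of the index is already known from measurement; the function name is hypothetical and the 8GB default is the upper end of the range above.

```python
import math


def primary_shards_for(index_size_gb, max_shard_gb=8):
    """Smallest number of primary shards that keeps each shard under max_shard_gb."""
    if index_size_gb <= 0 or max_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / max_shard_gb))


if __name__ == "__main__":
    for size in (3, 40, 41, 200):
        print(f"{size}GB index: {primary_shards_for(size)} primary shard(s)")
```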
22. Sharding
• Index Shards
• Resharding / auto-sharding not supported
• Index-level sharding
• Avoid using types (deprecated from 6.x)
• Multi-tenancy
• Rollover API (> 5.x)
• Cluster level
• Cluster per project
• Cross-cluster search capability
23. Multitenancy
• Silos – Every tenant gets its own index
• Index sizes vary
• Potentially wasting resources
• Pool – All tenants are in one big index
• Sharding isn’t dynamic
• Effects on tf/idf, aggregations, throughput
• Hybrid – Big tenants in their own index, pool(s) for small ones
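The hybrid layout can be sketched as a routing function that picks an index per tenant. The size threshold, the number of pools and the index naming are all hypothetical; a stable hash (`zlib.crc32`) is used because Python's built-in `hash()` varies between processes.

```python
import zlib


def index_for_tenant(tenant_id, doc_count, big_tenant_docs=1_000_000, pools=4):
    """Pick the index for a tenant: a dedicated index for big tenants, a shared pool otherwise.

    Threshold, pool count and naming are illustrative assumptions.
    """
    if doc_count >= big_tenant_docs:
        return f"tenant-{tenant_id}"
    pool = zlib.crc32(tenant_id.encode("utf-8")) % pools
    return f"pool-{pool}"


if __name__ == "__main__":
    for tenant, docs in (("acme", 5_000_000), ("tiny-shop", 1_200), ("corner-cafe", 300)):
        print(tenant, "->", index_for_tenant(tenant, docs))
```

Documents in a pooled index still need a tenant field to filter on, and a tenant that grows past the threshold has to be reindexed into its own index, so the mapping from tenant to index is best stored rather than recomputed.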
24. Use Explicit Mapping (aka Avoid Schemaless)
• In one of two ways:
• Disable dynamic mapping in settings (index.mapper.dynamic: false). Will refuse indexing.
• Create catch-all dynamic template with enabled:false mapping
• Why?
• Avoids hundreds of fields by mistake
• Saves effort on indexing and disk space
• Defaults are bad anyhow, don’t rely on them
• Prefer using index templates (especially for rolling indices)
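A sketch of what an explicit-mapping index template for rolling indices might look like. Note that this uses a third option, `"dynamic": "strict"` in the mapping, rather than either of the two approaches listed above. The body is written in the typeless style of 7.x (older versions wrap the mapping in a type name, and 5.x uses `template` instead of `index_patterns`); the field names and helper names are hypothetical.

```python
def strict_template(pattern, properties):
    """Build an index template body whose mapping rejects unmapped fields."""
    return {
        "index_patterns": [pattern],
        "mappings": {"dynamic": "strict", "properties": properties},
    }


def unmapped_fields(doc, template):
    """Top-level fields of `doc` that the template's mapping does not declare.

    A local pre-check only; it mimics what a strict mapping would reject.
    """
    known = template["mappings"]["properties"]
    return sorted(field for field in doc if field not in known)


if __name__ == "__main__":
    template = strict_template(
        "logs-*",
        {
            "@timestamp": {"type": "date"},
            "level": {"type": "keyword"},
            "message": {"type": "text"},
        },
    )
    print(unmapped_fields({"message": "hi", "usr": "bob"}, template))
```

Running such a check in the application catches stray fields before they reach the cluster, which is where "hundreds of fields by mistake" usually start.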
25. Re-balancing is your enemy
• Lock down shard rebalancing
• cluster.routing.rebalance.enable
• none
• cluster.routing.allocation.enable
• primaries
• new_primaries
• none
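The two settings above are applied through the cluster settings API (`PUT /_cluster/settings`). The sketch below only builds and validates the request body; the function name is hypothetical, and the sets list the values each setting accepts.

```python
REBALANCE_VALUES = {"all", "primaries", "replicas", "none"}
ALLOCATION_VALUES = {"all", "primaries", "new_primaries", "none"}


def routing_settings(rebalance=None, allocation=None, persistent=False):
    """Build a cluster-settings body for the rebalance / allocation switches."""
    settings = {}
    if rebalance is not None:
        if rebalance not in REBALANCE_VALUES:
            raise ValueError(f"invalid rebalance value: {rebalance!r}")
        settings["cluster.routing.rebalance.enable"] = rebalance
    if allocation is not None:
        if allocation not in ALLOCATION_VALUES:
            raise ValueError(f"invalid allocation value: {allocation!r}")
        settings["cluster.routing.allocation.enable"] = allocation
    return {"persistent" if persistent else "transient": settings}


if __name__ == "__main__":
    # Lock rebalancing down for good, and pause allocation during a restart.
    print(routing_settings(rebalance="none", persistent=True))
    print(routing_settings(allocation="none"))
```

Transient settings are lost on a full cluster restart, so a temporary allocation freeze fits `transient` while a permanent rebalance lock fits `persistent`.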
26. More safe configs
• action.disable_delete_all_indices: true
• action.auto_create_index: false
27. Deep paging (don’t!)
• Don’t page deep with from / size
• search_after (> 5.x)
• Scroll and sliced-scroll (> 5.x)
• Not for normal operation
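A minimal sketch of `search_after` paging: each page repeats the same query and sort, and passes the `sort` values of the last hit from the previous page instead of a growing `from` offset. The field names (`@timestamp`, `event_id`) are hypothetical; the sort needs a unique tiebreaker field so that no hits are skipped or repeated.

```python
def next_page(query, sort, size, last_hit=None):
    """Build the search body for the first page, or for the page after `last_hit`.

    `last_hit` is the final hit of the previous response; sorted searches
    return each hit's sort values under its "sort" key.
    """
    body = {"size": size, "query": query, "sort": sort}
    if last_hit is not None:
        body["search_after"] = last_hit["sort"]
    return body


if __name__ == "__main__":
    query = {"match": {"message": "timeout"}}
    sort = [{"@timestamp": "asc"}, {"event_id": "asc"}]  # event_id: unique tiebreaker
    print(next_page(query, sort, 100))
    previous_last = {"_id": "a1", "sort": [1479859200000, "evt-42"]}
    print(next_page(query, sort, 100, last_hit=previous_last))
```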
28. Deletions
• Deletions have an overhead
• Slow searches
• Segmentation
• More work on segment merging
• Non-exact tf/idf
• Every document update is a deletion
• No need to avoid it completely, just design accordingly
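To "design accordingly" it helps to know how much of an index is made up of deleted documents. A small sketch, assuming the live and deleted document counts are taken from index stats (`docs.count` and `docs.deleted`); the function name is hypothetical.

```python
def deleted_ratio(docs_count, docs_deleted):
    """Fraction of documents in the segments that are marked as deleted."""
    total = docs_count + docs_deleted
    if total == 0:
        return 0.0
    return docs_deleted / total


if __name__ == "__main__":
    # Hypothetical numbers for an update-heavy index.
    print(f"{deleted_ratio(900_000, 100_000):.1%} of documents are deleted")
```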
29. Geographic Distribution
• Never span regions with a single cluster!
• Cross-cluster search (formerly Tribe Node)
• For geographic sharding
• Different indexes in different regions
• Cross-datacenter replication (xDCR) for HA / DR
• Can be solved by infra – replicating queues (Kafka), DBs
• Solution coming in X-Pack
30. Your ingestion architecture?
• Favor external ingestion, relieve Elastic from that responsibility
• Upgrade Logstash to 5.x
• Consider using FileBeat instead of logstash for log-tailing
• Prefer logstash machines over ingest nodes
• Use queues (Kafka, Redis) to protect against surges
32. Protecting your cluster
• Don’t bind to a public IP
• Use only private IPs / DNS names, preferably in subnets (e.g. AWS VPC)
• network.host in elasticsearch.yml
• Proxy all client requests to ES
• Disable HTTP where not needed
• + Don’t use default ports
• Secure publicly available client nodes
• Access via VPN only
• At the very least SSL + authentication if VPN not an option
• Disable dynamic scripting (pre-5.x)
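The "don’t bind to a public IP" rule can be checked mechanically before a node is deployed. A minimal sketch using the standard `ipaddress` module; the function name is hypothetical, and it only handles literal IP addresses (hostnames and special values such as `_local_` raise `ValueError`).

```python
import ipaddress


def bind_address_ok(host):
    """True if `host` is a private or loopback IP, i.e. acceptable for network.host.

    The wildcard address (0.0.0.0 or ::) is rejected explicitly, because it
    binds to every interface, public ones included.
    """
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        return False
    return addr.is_private or addr.is_loopback


if __name__ == "__main__":
    for host in ("10.0.1.12", "127.0.0.1", "0.0.0.0", "8.8.8.8"):
        print(host, "ok" if bind_address_ok(host) else "REFUSE")
```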
33. Securing Indexes and Documents
• Heavy Kibana user?
• Authentication and authorization
• Index, Document and Field level security
• Requires X-Pack Security
• Application level authentication and authorization
• Application filtering of content (fields, documents)
• Index level (e.g. index per tenant)
• Document level (using permissions)
• Inter-node comms, encryption at rest (X-Pack only)
34. Upcoming in ES land
• Elasticsearch 6
• Machine Learning
• Anomaly detection on time series data
• Enterprise Cloud
• Elastic Cloud deployed on-premise
• Any plugin authors in the crowd?
35. Elasticsearch Training
Elasticsearch for Developers &
Maintaining Elasticsearch in Production
• September (10,11,17/9)
• November (12,13,16/11)
http://bdbq.co.il/courses
Consultancy and Development services
http://bdbq.co.il/services/elasticsearch
36. Questions?
@synhershko on social (Twitter, github, …)
Blog at http://code972.com
Training and consultancy at
http://BigDataBoutique.co.il