10.
Healthy Advice
• Rename your cluster from “elasticsearch” to something else.
When you end up with two Elasticsearch clusters on your network, you’ll
be glad you did.
• Oops, deleted all the indices again!
Set action.destructive_requires_name: true in elasticsearch.yml (see the snippet below)
• Always use SSDs. This is not optional.
• If you’re watching this talk, you probably need 10G networking too.
• Use curator. We developed our own version before it was available.
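For reference, the first two tips as they might appear in elasticsearch.yml (the cluster name is illustrative):

    cluster.name: logging-prod               # anything but the default "elasticsearch"
    action.destructive_requires_name: true   # no more wildcard index deletes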
11.
ACT 1, SCENE 1
Sizing up your Elasticsearch Cluster
12.
What resources influence cluster make-up?
• CPU
- Cores > clock speed
• Memory
- Number of documents
- Number of shards
• Disk I/O
- SSD sustained write rates
• Network bandwidth
- 10G mandatory on large installations for fast recovery / relocation
13.
What resources influence cluster memory?
• Memory
- Segment memory: ~4 bytes of RAM per document = ~4GB per billion log lines
- Field data memory: Approximately same as segment memory
(less for older, less accessed data)
- Filter cache: ~1/4 to 1/2 of segment memory, depending on searches
- All the rest (at least 50% of system memory) for OS file cache
- You can never have too much memory!
14.
What resources influence cluster I/O?
• Disk I/O
- SSD sustained write rates
- Calculate shard recovery speed if one node fails:
- Shard size = (daily storage / number of shards)
- Recovery time = (shards per node * shard size) / (disk write speed / shards per node)
• E.g. 30GB shards, 2 shards per node, 250MB/s write speed:
- (2 * 30GB) / (250MB/s / 2) = 60GB / 125MB/s ≈ 8 minutes (see the sketch below)
• How long are you comfortable running without full resilience?
• How many nodes are you comfortable losing?
• Multiple nodes per server increase recovery time
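That recovery math as a quick Ruby sketch, using the example’s numbers:

    # Rough shard-recovery time estimate -- illustrative numbers only.
    shard_size_gb   = 30.0   # GB per shard
    shards_per_node = 2
    disk_write_mb_s = 250.0  # sustained SSD write speed, MB/s

    data_to_recover_mb = shards_per_node * shard_size_gb * 1024
    per_shard_write    = disk_write_mb_s / shards_per_node
    seconds            = data_to_recover_mb / per_shard_write

    puts "~#{(seconds / 60).round} minutes to restore full resilience"  # => ~8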
15.
What resources influence cluster networking?
• Network bandwidth
- 10G mandatory on large installations for fast recovery / relocation
- 10 minute recovery vs 50+ minute recovery:
• 1G Bottleneck: Network uplink
• 10G Bottleneck: Disk speed
16.
ACT 1, SCENE 2
Sizing up your Logstash Cluster
21.
Marvel:
• Easy to use
• Data saved to ES
• So many metrics!
• No integration
• Costs $$$
Roll your own:
• Time to develop
• Integrates with your systems
• Re-inventing the wheel
• Free (libre, not gratis)
22.
Monitoring: Elasticsearch
• Metrics are exposed in several places:
- _cat API
Covers most metrics, human readable
- _stats API, _nodes API
Covers everything, JSON, easy to parse
• Send to Graphite
• Create dashboards
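As an illustration, a minimal Ruby poller that ships one JVM metric from the _nodes stats API to Graphite’s plaintext listener (the Graphite host and metric prefix are placeholders):

    require 'net/http'
    require 'json'
    require 'socket'

    # Pull per-node JVM heap usage and forward it to Graphite.
    stats = JSON.parse(Net::HTTP.get(URI('http://localhost:9200/_nodes/stats/jvm')))
    now = Time.now.to_i

    TCPSocket.open('graphite.example.com', 2003) do |sock|
      stats['nodes'].each do |_id, node|
        heap = node['jvm']['mem']['heap_used_percent']
        sock.puts "elasticsearch.#{node['name'].tr('.', '_')}.heap_used_percent #{heap} #{now}"
      end
    end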
23.
Monitoring: Systems
• SSD endurance
• Monitor how often Logstash says the pipeline is blocked
If it happens frequently, find out why; the possible causes are covered
later in this talk
24.
Monitoring: Systems
• Dynamic disk space thresholds
• ((num_servers - failure_capacity) / num_servers) - 15%
- 100 servers
- Allow up to 6 to fail
- Disk space alert threshold = ((100 - 6) / 100) - 15%
Disk space alert threshold = 79%
• Let your configuration management system tune this up and down for
you, as you add and remove nodes from your cluster.
• The additional 15% is to give you some extra time to order or build more
nodes.
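That calculation as a minimal Ruby sketch, with the example’s values:

    # Derive the disk alert threshold from current cluster size,
    # e.g. fed from your configuration management inventory.
    num_servers      = 100
    failure_capacity = 6     # nodes you can afford to lose
    headroom         = 0.15  # time to order or build more nodes

    threshold = (num_servers - failure_capacity).to_f / num_servers - headroom
    puts "alert when disk usage exceeds #{(threshold * 100).round}%"  # => 79%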
26.
Scaling Logstash: What impacts performance?
• Line length
• Grok pattern complexity - regex is slow
• Plugins used
• Garbage collection
- Increase heap size
• Hyperthreading
- Measure, then turn it off
27.
Scaling Logstash: Measure Twice
• Writing your logs as JSON has little benefit unless you do away with
grok, kv, etc. Logstash still has to convert the incoming string to a
Ruby hash anyway.
29.
Scaling Logstash: Garbage Collection
• Defaults are usually OK
• Make sure you’re graphing GC
• Ruby LOVES to generate objects: monitor your GC as you scale
• Write plugins thoughtfully with GC in mind:
- Bad:  1_000_000.times { "This is a string" }
             user     system      total        real
   time  0.130000   0.000000   0.130000  (  0.132482)
- Good: foo = 'This is a string'; 1_000_000.times { foo }
             user     system      total        real
   time  0.060000   0.000000   0.060000  (  0.055005)
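Timings like these come from Ruby’s Benchmark module; a minimal harness reproducing the comparison:

    require 'benchmark'

    foo = 'This is a string'
    Benchmark.bm do |x|
      x.report('bad')  { 1_000_000.times { "This is a string" } }  # new object per iteration
      x.report('good') { 1_000_000.times { foo } }                 # reuses one object
    end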
31.
Scaling Logstash: Plugin Performance: Baseline
• How to establish a baseline
• Measure again with some filters
• Measure again with more filters
• Establish the costs of each filter
• Community filters are written for the general case
- Write your own for your specific case
- Easy to do
• Run all benchmarks for at least 5 mins, with a large data set
41.
Scaling Logstash: Plugin Performance
• kv is slow, so we wrote a `splitkv` plugin for query strings, etc.:
    kvarray = text.split(@field_split).map { |afield|
      pairs = afield.split(@value_split)
      # Skip malformed pairs, keys that start with a digit, and short
      # keys we haven't explicitly chosen to preserve.
      if pairs[0].nil? || !(pairs[0] =~ /^[0-9]/).nil? || pairs[1].nil? ||
         (pairs[0].length < @min_key_length && !@preserve_keys.include?(pairs[0]))
        next
      end
      if !@trimkey.nil?
        # 2 if's are faster (0.26s) than gsub (0.33s)
        #pairs[0] = pairs[0].slice(1..-1) if pairs[0].start_with?(@trimkey)
        #pairs[0].chop! if pairs[0].end_with?(@trimkey)
        # BUT! in-place tr is 6% faster than 2 if's (0.52s vs 0.55s)
        pairs[0].tr!(@trimkey, '') if pairs[0].start_with?(@trimkey)
      end
      if !@trimval.nil?
        pairs[1].tr!(@trimval, '') if pairs[1].start_with?(@trimval)
      end
      pairs
    }
    # Drop the entries we skipped with `next` (they come back as nil).
    kvarray.delete_if { |x| x == nil }
    return Hash[kvarray]
43.
Scaling Logstash: Elasticsearch Output
• Logstash output settings directly impact CPU on Logstash machines
- Increase flush_size from 500 to 5000, or more.
- Increase idle_flush_time from 1s to 5s
- Increase output workers
- Results vary by log lines - test for yourself:
• Make a change, wait 15 minutes, evaluate
• With the default flush_size of 500, we peaked at 50% CPU on the
Logstash cluster and ~40k log lines/sec. Bumping it to 10,000 and
increasing idle_flush_time from 1s to 5s got us over 150k log
lines/sec at 25% CPU.
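As a sketch, those knobs in a Logstash 1.5-era elasticsearch output (host name and worker count are illustrative):

    output {
      elasticsearch {
        host            => "es-lb.example.com"
        protocol        => "transport"
        flush_size      => 10000   # default 500
        idle_flush_time => 5       # default 1s
        workers         => 8       # e.g. num_cpu / 2
      }
    }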
52.
Scaling Logstash: Testing Configuration Changes
# Is the logstash package installed? (RedHat family only)
describe package('logstash'),
  :if => os[:family] == 'redhat' do
  it { should be_installed }
end
# Converge cleanly with configuration management first
describe command('chef-client') do
  its(:exit_status) { should eq 0 }
end
# The candidate config must pass a syntax check...
describe command('logstash -t -f ls.conf.test') do
  its(:exit_status) { should eq 0 }
end
# ...and must not produce parse failures on test input
describe command('logstash -f ls.conf.test') do
  its(:stdout) { should_not match(/parse_fail/) }
end
# Restart the service and give it time to come up
describe command('restart logstash') do
  its(:exit_status) { should eq 0 }
end
describe command('sleep 15') do
  its(:exit_status) { should eq 0 }
end
# The service should be enabled and running...
describe service('logstash'),
  :if => os[:family] == 'redhat' do
  it { should be_enabled }
  it { should be_running }
end
# ...and listening on its input port
describe port(5555) do
  it { should be_listening }
end
56.
Scaling Logstash: Summary
• Faster CPUs matter
- CPU cores > CPU clock speed
• Increase pipeline size
• Lots of memory
- 18GB+ heap to prevent frequent garbage collection
• Scale horizontally
• Add context to your log lines
• Write your own plugins, share with the world
• Benchmark everything
61.
Scaling Elasticsearch: What impacts indexing performance?
• Line length and analysis, default mapping
• doc_values - required, but not a magic fix:
- Uses more CPU time
- Uses more disk space, and disk I/O at indexing
- Helps avoid blowing out memory
- If you start using too much memory for fielddata, look at the biggest
memory hogs and move them to doc_values
• Available network bandwidth for recovery
65.
Scaling Elasticsearch: Where does memory go?
• Example memory distribution with a 32GB heap:
- Field data: 10%
- Filter cache: 10%
- Index buffer: 500MB
- Segment cache (~4 bytes per doc):
How many docs can you store per node?
• 32GB - (32GB / 10) - (32GB / 10) - 500MB = ~25GB for segment cache
• 25GB / 4 bytes = 6.7bn docs across all shards
• 10bn docs / day, 200 shards = 50m docs/shard
1 daily shard per node: 6.7bn / 50m / 1 = 134 days
5 daily shards per node: 6.7bn / 50m / 5 = 26 days
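The same retention math as a small Ruby sketch (the slide’s numbers, GiB-based):

    # Days of retention per node, given the memory budget above.
    segment_gb     = 25                       # 32GB heap - 10% - 10% - 500MB
    docs_per_node  = segment_gb * (2**30) / 4 # ~6.7bn docs at ~4 bytes/doc
    docs_per_shard = 10_000_000_000 / 200     # 10bn docs/day over 200 shards

    [1, 5].each do |shards_per_day|
      days = docs_per_node / (docs_per_shard * shards_per_day)
      puts "#{shards_per_day} daily shard(s)/node: ~#{days} days"  # => 134, 26
    end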
66.
Scaling Elasticsearch: Doc Values
• Doc values help reduce memory
• Doc values cost CPU and storage
- Some fields with doc_values:
1.7G Aug 11 18:42 logstash-2015.08.07/7/index/_1i4v_Lucene410_0.dvd
- All fields with doc_values:
106G Aug 13 20:33 logstash-2015.08.12/38/index/_2a9p_Lucene410_0.dvd
• Don't blindly enable Doc Values for every field
- Find your most frequently used fields, and convert them to Doc Values
- curl -s 'http://localhost:9200/_cat/fielddata?v' | less -S
68.
Scaling Elasticsearch: Memory
• Run instances with 128GB or 256GB RAM
• Configure RAM for the optimal hardware configuration
- Haswell/Skylake Xeon CPUs have 4 memory channels
• Multiple instances of Elasticsearch
- Do you name your instances by hostname?
Give each instance its own node.name!
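A minimal sketch of per-instance settings in each instance’s elasticsearch.yml, assuming two instances per host (names, paths, and ports are illustrative):

    node.name: "es42-a"     # hostname plus a per-instance suffix
    path.data: /data/es-a
    http.port: 9200         # second instance: es42-b, /data/es-b, 9201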
70.
Scaling Elasticsearch: CPUs
• CPU intensive activities
- Indexing: analysis, merging, compression
- Searching: computations, decompression
• For write-heavy workloads
- Number of CPU cores impacts number of concurrent index operations
- Choose more cores over higher clock speed
87.
Scaling Elasticsearch: Disk I/O
• Uncommon advice
- Good SSDs are important
Cheap SSDs will make you very, very sad
- Don’t use multiple data paths, use RAID 0 instead
Heavy translog writes to one disk will bottleneck
- If you have heavy merging, but CPU and disk I/O to spare:
Extreme case: increase index.merge.scheduler.max_thread_count
(But try not to…)
88.
Scaling Elasticsearch: Disk I/O
• Uncommon advice
- Reduced durability
index.translog.durability: async
Translog fsync() every 5s, may be sufficient with replication
- Cluster recovery eats disk I/O
Be prepared to tune it up and down during recovery, eg:
indices.recovery.max_bytes_per_sec: 300mb
cluster.routing.allocation.cluster_concurrent_rebalance: 24
cluster.routing.allocation.node_concurrent_recoveries: 2
- Any amount of consistent I/O wait indicates a suboptimal state
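These are dynamic settings, so they can be raised during a recovery and lowered afterwards through the cluster settings API, e.g.:

    curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
      "transient": {
        "indices.recovery.max_bytes_per_sec": "300mb",
        "cluster.routing.allocation.cluster_concurrent_rebalance": 24,
        "cluster.routing.allocation.node_concurrent_recoveries": 2
      }
    }'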
99.
Scaling Elasticsearch: Multi-tiered Storage
• Put your most accessed indices across more servers, with more
memory, and faster CPUs.
• Spec out “cold” storage
- SSDs still necessary! Don't even think about spinning platters
- Cram bigger SSDs per server
• Set index.codec: best_compression
• Move indices, re-optimize
• elasticsearch-curator makes this easy
105.
Scaling Elasticsearch: Custom Mapping
• A small improvement here; unfortunately, the server was already maxed out!
Normally, expect this to have a bigger impact :-)
107.
Scaling Elasticsearch: Indexing Performance
• Increasing bulk thread pool queue can help under bursty indexing
- Be aware of the consequences, you're hiding a performance problem
• Increase index buffer
• Increase refresh_interval from 1s to 5s
• Spread indexing requests to multiple hosts
• Increase output workers until you stop seeing improvements
We use num_cpu/2 with transport protocol
• Increase flush_size until you stop seeing improvements
We use 10,000
• Disk I/O performance
108.
Scaling Elasticsearch: Indexing Performance
• Indexing protocols
- HTTP
- Node
- Transport
• Transport still slightly more performant, but HTTP has closed the gap.
• Node is generally not worth it. Longer start up, more resources, more
fragile, more work for the cluster.
109.
Scaling Elasticsearch: Indexing Performance
• Custom mapping template
- Default template creates an additional not_analyzed .raw field for
every field.
- Every field is analyzed, which eats CPU
- Extra field eats more disk
- Dynamic fields and Hungarian notation
• Use a custom template which has dynamic fields enabled, but has them
not_analyzed
Ditch .raw fields, unless you really need them
• This change dropped Elasticsearch cluster CPU usage from 28% to 15%
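A sketch of such a template for the ES 1.x/2.x era (template name and index pattern are illustrative): dynamic string fields stay enabled but are indexed not_analyzed, with no extra .raw copy:

    curl -XPUT 'http://localhost:9200/_template/logstash' -d '{
      "template": "logstash-*",
      "mappings": {
        "_default_": {
          "dynamic_templates": [{
            "strings_not_analyzed": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": { "type": "string", "index": "not_analyzed" }
            }
          }]
        }
      }
    }'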
110.
Scaling Elasticsearch: Indexing Performance
• Message complexity matters.
Adding new log lines of ~20KB, compared to the 1.5KB average, tanked the
indexing rate for all log lines.
114.
Scaling Elasticsearch: Indices
• Tune shards per index
- num_shards = (num_nodes - failed_node_limit) / (number_of_replicas + 1)
- With 50 nodes, allowing 4 to fail at any time, and 1x replication:
num_shards = (50 - 4) / (1 + 1) = 23
• If your shards are larger than 25GB, increase shard count accordingly.
• Tune indices.memory.index_buffer_size
- index_buffer_size = num_active_shards * 500MB
- “Active shards”: any shard updated in the last 5 minutes
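Those two formulas as a quick Ruby sketch (the example’s numbers; the active-shard count is illustrative):

    # Shards per index, leaving room for failed nodes.
    num_nodes         = 50
    failed_node_limit = 4
    replicas          = 1
    num_shards = (num_nodes - failed_node_limit) / (replicas + 1)
    puts "shards per index: #{num_shards}"        # => 23

    # Index buffer per node, from shards updated in the last 5 minutes.
    active_shards_per_node = 2
    puts "index buffer: #{active_shards_per_node * 500}MB"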
115.
Scaling Elasticsearch: Indices
• Tune refresh_interval
- Defaults to 1s - way too frequent!
- Increase to 5s
- Tuning higher may cause more disk thrashing
- Goal: flush as much as your disk’s buffer can take
• Example: Samsung SM863 SSDs:
- DRAM buffer: 1GB
- Flush speed: 500MB/sec
119.
Scaling Elasticsearch: Optimize Indices
Unoptimized:
• 5230 segments
• 29GB memory
• 10.5TB disk space
Optimized:
• 124 segments
• 23GB memory
• 10.1TB disk space
120.
The Easy Way:
ruby {
  # runs on every event, allocating a new string even when
  # nothing needs truncating
  code => "event['message'] = event['message'].slice!(0,10240)"
}
The Thoughtful Way:
ruby {
  # only touches messages that are actually oversized -- far less GC churn
  code => "if event['message'].length > 10240
             event['message'] = event['message'].slice!(0,10240)
           end"
}