Elasticsearch Sharding Strategy at Tubular Labs

How we arrived at a sharding strategy

Our Elasticsearch Infrastructure?

Our Elasticsearch Clusters
• 3 clusters for search/aggregations
• 1 small autocomplete cluster
• 1 medium-sized cluster for internal use
• 1 Elastic Stack cluster
© 2016 Tubular Labs

Our Largest Cluster
• 2.5 billion documents
• 4 TB, not including replicas
• Constant indexing load with periodic spikes
• Queries range from simple search requests to heavy terms aggregations
• Not many concurrent queries, but queries can be demanding
• Cluster is very CPU heavy
• Recently migrated from Elasticsearch 1.7 to 2.3

Migrating to 2.x is a good time to reconsider sharding
• We have to reindex anyway
• Our dataset has grown substantially
• Performance wasn’t great
• We don’t want to have to reindex in the near future

Sharding Questions...
● How many shards should I have per index?
● How large should my shards be?
● How many shards should I have per node?
● What hardware/instance type should I use?

It Depends...
• How large is your dataset?
• How fast will your dataset grow?
• What kinds of queries are you running?
• How fast will usage grow?
• When do you want to reindex next?
• I’m sure there are more...

How do we get answers?

Repeatable Elasticsearch Experiments

Repeatable Elasticsearch Experiments: What We Want
• Repeatable
• Others can easily run the same tests and should get about the same results
• Easily modified
• Easy to define and understand
• Easy to run
• Understandable results

What is Rally?
• Benchmarking framework for Elasticsearch
• Easily define a set of repeatable tests
• Tests are defined in JSON
• Compare different configurations
• Sets up a single-node cluster for tests, or targets existing (external) clusters
• Targeting external clusters is not fully supported, and you’ll get warnings telling you as much

What is Rally? Terms
• Track - a benchmarking scenario
• Car - the system (Elasticsearch) configuration for a benchmark
• Challenge - which benchmarks are run and how they are configured
• Race - an actual run of the benchmark
• Tournament - a way to analyze the impact of changes

How does Rally work?
Example track config:
https://gist.github.com/mdelaney/b710fb3d25fabf7818f471bd4abe70a5

How we used this at Tubular Labs
NOTE: The following experiments are written as we would do them next time. Due to time constraints we had to do some of this in parallel. I’ll also mention where we deviated from what is in the next few slides.
• We’re still pretty new at running benchmarks with Elasticsearch, so we’re still learning the best way to do this.
• Running these tests answered a lot of questions (and raised brand new ones).

How big should my shards be?
Determining a good shard size

The experiment
1. Obtain a realistic data set
2. Write the Rally config to:
• Index your data (single shard)
• Run a set of common queries
3. Run benchmark with different document counts
4. Graph the results
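Before graphing, it can help to reduce each run to a couple of numbers per document count. The following is a minimal sketch, not part of the original deck: it assumes the query latencies from each benchmark run have already been collected into a dictionary keyed by document count, and the function name `summarize` and the sample values are hypothetical.

```python
import statistics

def summarize(latencies_by_doc_count):
    """Return rows of (doc_count, median_ms, p90_ms), sorted by doc_count.

    latencies_by_doc_count maps a document count to a list of measured
    query latencies in milliseconds (at least two samples per count).
    """
    rows = []
    for doc_count in sorted(latencies_by_doc_count):
        samples = latencies_by_doc_count[doc_count]
        median = statistics.median(samples)
        # quantiles(n=10) returns 9 cut points; the last is the 90th percentile.
        p90 = statistics.quantiles(samples, n=10)[-1]
        rows.append((doc_count, median, p90))
    return rows

# Hypothetical sample values, for illustration only (not measured results).
example = {
    5_000_000: [110, 120, 125, 130, 180],
    10_000_000: [230, 240, 250, 260, 400],
}
for doc_count, median, p90 in summarize(example):
    print(f"{doc_count:>12,} docs  median={median} ms  p90={p90:.0f} ms")
```

Plotting median and p90 against document count, one line per query, makes it easier to see where latency starts to climb faster than shard size.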

The queries we used
• Query A and B:
• Very similar but aggregate on a slightly different set of terms
• Hits about 10% of our dataset
• Query C and D:
• Same aggregations as queries A and B
• Full dataset

Our results

We need to consider
• How fast do you need each query to be?
• How much do you expect your data set to grow before you want to look at reindexing again?
• Your use case likely will have other concerns as well
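One way to combine these considerations is sketched below. It is an illustration under stated assumptions, not the method used in the deck: it assumes you have (documents per shard, latency) measurements for a query, a latency budget, and an expected growth factor, and it interpolates linearly between measured points. The function names, the latency figures, the 400 ms budget, and the 2x growth factor are all hypothetical; only the 2.5 billion document total comes from the slides.

```python
import math

def max_docs_within_budget(measurements, budget_ms):
    """Largest docs-per-shard whose interpolated latency stays within budget_ms.

    measurements: list of (docs_per_shard, latency_ms) pairs from benchmark runs.
    Stops at the first point where the budget is exceeded, so it stays
    conservative if latency is not strictly increasing.
    Returns None if even the smallest measured shard exceeds the budget.
    """
    points = sorted(measurements)
    if points[0][1] > budget_ms:
        return None
    best = points[0][0]
    for (d0, l0), (d1, l1) in zip(points, points[1:]):
        if l1 <= budget_ms:
            best = d1
            continue
        # The budget is crossed between d0 and d1: interpolate linearly.
        fraction = (budget_ms - l0) / (l1 - l0)
        return int(d0 + fraction * (d1 - d0))
    return best

def shards_needed(total_docs, growth_factor, docs_per_shard):
    """Primary shard count that keeps shards under docs_per_shard after growth."""
    return math.ceil(total_docs * growth_factor / docs_per_shard)

# Hypothetical latencies, for illustration only (not the results from this deck).
measured = [(5_000_000, 120.0), (10_000_000, 250.0), (20_000_000, 520.0)]
limit = max_docs_within_budget(measured, budget_ms=400.0)
print("max docs per shard:", limit)
print("shards for 2.5 billion docs at 2x growth:",
      shards_needed(2_500_000_000, 2.0, limit))
```

The slowest query you care about should drive the budget, since the shard size has to work for every query in the set.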

How many shards per node?
Determining how many shards per node

The experiment (almost the same as before)
1. Obtain a dataset of realistic data
2. Write the Rally config to:
• Index your data
• Run a set of common queries
3. Run benchmark with different shard counts
4. Graph the results

What we did differently this time (time constraints)
• Used the Apache HTTP Benchmark Tool with a script to run the queries.
• Our production cluster had 26 data nodes with about 200 million documents each
• Wanted to avoid expanding the cluster further if at all possible (c3.8xlarge is pricey!)
• 10 total shards per node (about 20 million docs/shard)
• 16 total shards per node (about 12.5 million docs/shard)
• 32 total shards per node (about 6.25 million docs/shard)
• Tested on 3 node clusters (2 data nodes, 1 client/master)
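The per-shard document counts above follow directly from dividing the roughly 200 million documents on each node by the number of shards on that node. A small sketch of that arithmetic, assuming documents are spread evenly across shards (the function name is ours, not from the deck):

```python
# Figure from the slide: about 200 million documents per data node.
DOCS_PER_NODE = 200_000_000

def docs_per_shard(docs_per_node: int, shards_per_node: int) -> float:
    """Average documents per shard when a node's documents are spread evenly."""
    return docs_per_node / shards_per_node

for shards in (10, 16, 32):
    millions = docs_per_shard(DOCS_PER_NODE, shards) / 1_000_000
    print(f"{shards} shards/node -> about {millions:g} million docs/shard")
```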

Our Results - Testing Number of Shards per node
[Two charts: query response by shard count (C 1) and query response by shard count (C 3)]

Our Results - Testing Number of Shards per node
[Two charts: query response, production vs test (C 1) and production vs test (C 3)]
Production - 26 data nodes
Test Cluster - 2 data nodes

New Questions Raised
• Significant performance drop at each level of testing. Why?
• A single shard on a single node performed much better than our multiple-shards-per-node tests
• The fully loaded 3 node cluster performed much better than our full cluster in production
• Impact of moving to a machine with more memory
• Will the extra file system cache make a large difference?

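Current path of performance investigation
Query load isn’t evenly distributed
[Diagram: shard copies in three groups of ten, apparently one group per node; * markers as in the original]
Group 1: 1, 4, 3*, 2*, 5*, 8*, 10, 13*, 11, 6*
Group 2: 2, 5, 7*, 4*, 10*, 9*, 11*, 12*, 14, 15
Group 3: 3, 6, 1*, 9, 13, 8, 12, 7, 15*, 14*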
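To see how uneven the per-node load can get with a layout like the one in the diagram, here is a small simulation sketch. It rests on assumptions the slide does not state: that the numbers fall into three groups of ten (one per node), that `*` marks a replica copy, and that each query is served by one randomly chosen copy of every shard. That last point is a simplification for illustration, not a description of how Elasticsearch actually selects copies. The node names and function names are hypothetical.

```python
import random
from collections import Counter

# Shard copies per node, transcribed from the slide diagram.
# Assumption: "*" marks a replica copy; the slide does not say so explicitly.
LAYOUT = {
    "node1": ["1", "4", "3*", "2*", "5*", "8*", "10", "13*", "11", "6*"],
    "node2": ["2", "5", "7*", "4*", "10*", "9*", "11*", "12*", "14", "15"],
    "node3": ["3", "6", "1*", "9", "13", "8", "12", "7", "15*", "14*"],
}

def copies_by_shard(layout):
    """Map shard id -> list of nodes holding a copy (primary or replica)."""
    holders = {}
    for node, copies in layout.items():
        for copy in copies:
            holders.setdefault(copy.rstrip("*"), []).append(node)
    return holders

def shards_hit_per_node(layout, rng):
    """Simulate one query: pick one copy of every shard at random, count per node."""
    picks = Counter({node: 0 for node in layout})
    for nodes in copies_by_shard(layout).values():
        picks[rng.choice(nodes)] += 1
    return picks

rng = random.Random(42)  # fixed seed so the sketch is repeatable
for _ in range(5):
    print(dict(shards_hit_per_node(LAYOUT, rng)))
```

Each simulated query touches 15 shards in total, but the split across the three nodes varies from query to query, so the busiest node can end up holding back the whole request.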

Problems We Encountered?
Rally related
• With nested documents, the document count in track.json != the document count Rally checks at the end of indexing.
• Multi-node support not yet available

Problems We Encountered?
Non-Rally related
• Performance in reality wasn’t as good as our testing suggested it should be
• We haven’t found the reason for this yet
• We’ve noticed a correlation between the number of shards a query hits per node and the time taken to run the query on the shard, but have not yet identified the bottleneck.
• We were able to mitigate this by adding additional data nodes
