Elasticsearch Sharding Strategy at Tubular Labs
Tubular Labs
Presentation given by Matthew Delaney at the Elasticsearch meetup on 9/13/16.
1.
Elasticsearch Sharding Strategy at Tubular Labs
How we arrived at a sharding strategy
2.
Our Elasticsearch Infrastructure
3.
Our Elasticsearch Clusters
• 3 clusters for search/aggregations
• 1 small autocomplete cluster
• 1 medium-sized cluster for internal use
• 1 Elastic Stack cluster
© 2016 Tubular Labs
4.
Our Largest Cluster
• 2.5 billion documents
• 4TB, not including replicas
• Constant indexing load with periodic spikes
• Queries range from simple search requests to heavy terms aggregations
• Not many concurrent queries, but queries can be demanding
• The cluster is very CPU-heavy
• Recently migrated from Elasticsearch 1.7 to 2.3
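The slide's figures imply an average document size, which is handy for translating a target shard size into a document count. A minimal sketch, assuming "4TB" means 4 × 10^12 bytes of primary data and that documents are roughly uniform in size; the helper names and the 30 GB target are hypothetical, not from the talk.

```python
def avg_doc_bytes(total_bytes: float, total_docs: int) -> float:
    """Average on-disk bytes per document (primaries only, replicas excluded)."""
    return total_bytes / total_docs


def docs_for_shard_size(target_shard_bytes: float, bytes_per_doc: float) -> int:
    """Documents that fit in a shard of the given size, assuming uniform documents."""
    return int(target_shard_bytes // bytes_per_doc)


# Figures from the slide: 2.5 billion documents, 4TB not including replicas.
TOTAL_BYTES = 4e12           # assumption: decimal terabytes
TOTAL_DOCS = 2_500_000_000

per_doc = avg_doc_bytes(TOTAL_BYTES, TOTAL_DOCS)   # 1600.0 bytes per document
# Hypothetical 30 GB shard target, for illustration only.
docs_30gb = docs_for_shard_size(30e9, per_doc)     # 18,750,000 documents
```

Real document sizes vary, so treat the result as a rough guide rather than a hard limit.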
5.
Migrating to 2.x is a good time to reconsider sharding
• We have to reindex anyway
• Our dataset has grown substantially
• Performance wasn’t great
• We don’t want to have to reindex in the near future
6.
Sharding Strategy
7.
Sharding Questions...
• How many shards should I have per index?
• How large should my shards be?
• How many shards should I have per node?
• What hardware/instance type should I use?
8.
It Depends...
• How large is your dataset?
• How fast will your dataset grow?
• What kinds of queries are you running?
• How fast will usage grow?
• When do you want to reindex next?
• I’m sure there are more...
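In Elasticsearch 2.x the number of primary shards is fixed when an index is created, so the answers to "how fast will your dataset grow?" and "when do you want to reindex next?" combine into a shard count. A sketch, assuming compound monthly growth and a per-shard document limit you have already chosen; the function names, the 5% growth rate, the 12-month horizon, and the 20 million docs/shard limit below are hypothetical.

```python
import math


def projected_docs(current_docs: int, monthly_growth: float, months: int) -> float:
    """Compound the current document count forward by a monthly growth rate."""
    return current_docs * (1.0 + monthly_growth) ** months


def primary_shards_needed(current_docs: int, monthly_growth: float,
                          months_until_reindex: int, max_docs_per_shard: int) -> int:
    """Smallest primary shard count that keeps every shard under the
    per-shard document limit until the next planned reindex."""
    future = projected_docs(current_docs, monthly_growth, months_until_reindex)
    return math.ceil(future / max_docs_per_shard)


# Hypothetical inputs: 2.5B docs today, 5% growth per month, reindex in 12 months.
shards = primary_shards_needed(2_500_000_000, 0.05, 12, 20_000_000)  # -> 225
```

Under those assumptions the dataset reaches roughly 4.49 billion documents after a year, which needs 225 primary shards at 20 million documents each.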
9.
How do we get answers?
10.
Repeatable Elasticsearch Experiments
11.
Repeatable Elasticsearch Experiments: What We Want
• Repeatable: others can easily run the same tests and should get about the same results
• Easily modified
• Easy to define and understand
• Easy to run
• Understandable results
12.
What is Rally?
• Benchmarking framework for Elasticsearch
• Easily define a set of repeatable tests
• Tests are defined in JSON
• Compare different configurations
• Sets up a single-node cluster for tests, or targets existing (external) clusters
• Targeting external clusters is not fully supported, and you’ll get warnings telling you as much
13.
What is Rally? Terms
• Track - a benchmarking scenario
• Car - the system (Elasticsearch) configuration for a benchmark
• Challenge - which benchmarks are run and their configuration
• Race - an actual run of the benchmark
• Tournament - a way to analyze the impact of changes
14.
How does Rally work?
Example track config: https://gist.github.com/mdelaney/b710fb3d25fabf7818f471bd4abe70a5
15.
Our Experiments and Results
16.
How we used this at Tubular Labs
NOTE: The following experiments are written as we would do them next time. Due to time constraints, we had to do some of this in parallel. I’ll also mention where we deviated from what is in the next few slides.
• We’re still pretty new at running benchmarks with Elasticsearch, so we’re still learning the best way to do this.
• Running these tests answered a lot of questions (and raised brand new ones).
17.
How big should my shards be?
Determining a good shard size
18.
Determining a good shard size
The experiment:
1. Obtain a realistic data set
2. Write the Rally config to:
   • Index your data (single shard)
   • Run a set of common queries
3. Run the benchmark with different document counts
4. Graph the results
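Steps 3 and 4 amount to a sweep: for each document count, rebuild a single-shard index, time the common queries, and record a summary statistic to graph. The talk used Rally for this; the sketch below is a plain-Python stand-in, not Rally's API. `load_docs` and the query callables are hypothetical hooks you would implement yourself, for example by recreating an index with `number_of_shards: 1` and `number_of_replicas: 0` and bulk-loading that many documents.

```python
import statistics
import time
from typing import Callable, Dict, List


def shard_size_sweep(
    doc_counts: List[int],
    load_docs: Callable[[int], None],
    queries: Dict[str, Callable[[], None]],
    repetitions: int = 5,
    warmup: int = 1,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, Dict[str, float]]:
    """Median latency (ms) of each query at each single-shard document count."""
    results: Dict[int, Dict[str, float]] = {}
    for count in doc_counts:
        # Hypothetical hook: rebuild a 1-shard index holding `count` documents.
        load_docs(count)
        per_query: Dict[str, float] = {}
        for name, run_query in queries.items():
            for _ in range(warmup):
                run_query()  # untimed runs so caches are warm before measuring
            samples = []
            for _ in range(repetitions):
                start = clock()
                run_query()
                samples.append((clock() - start) * 1000.0)
            per_query[name] = statistics.median(samples)
        results[count] = per_query
    return results
```

Reporting the median rather than the mean keeps a single slow outlier (a GC pause, say) from dominating the graph.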
19.
Determining a good shard size
The queries we used:
• Queries A and B:
   • Very similar, but aggregate on a slightly different set of terms
   • Hit about 10% of our dataset
• Queries C and D:
   • Same aggregations as queries A and B
   • Run over the full dataset
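Queries of this shape can be sketched as Elasticsearch 2.x search bodies: a `terms` aggregation with `size: 0` (aggregations only), either restricted to a subset with a `bool` filter (A/B style) or run over everything with `match_all` (C/D style). The talk doesn't show the actual queries; the field names (`channel_id`, `category`, `language`, `platform`), the filter value, and the helper name are hypothetical.

```python
from typing import Any, Dict, List, Optional


def terms_agg_query(agg_fields: List[str],
                    subset_filter: Optional[Dict[str, Any]] = None,
                    bucket_size: int = 100) -> Dict[str, Any]:
    """Build an aggregation-only search body (Elasticsearch 2.x query DSL)."""
    if subset_filter is None:
        query: Dict[str, Any] = {"match_all": {}}  # C/D style: full dataset
    else:
        # A/B style: non-scoring filter restricting the search to a slice of the data
        query = {"bool": {"filter": subset_filter}}
    aggs = {f"by_{field}": {"terms": {"field": field, "size": bucket_size}}
            for field in agg_fields}
    return {"size": 0, "query": query, "aggs": aggs}


# Hypothetical fields: A and B differ only in the set of aggregated terms.
SUBSET = {"term": {"platform": "youtube"}}
query_a = terms_agg_query(["channel_id", "category"], SUBSET)
query_b = terms_agg_query(["channel_id", "language"], SUBSET)
query_c = terms_agg_query(["channel_id", "category"])  # same aggs as A, full dataset
```

Each body would be sent as the JSON payload of `POST /<index>/_search`.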
20.
Determining a good shard size
Our results
[Figure: benchmark results graph]
21.
Determining a good shard size
We need to consider:
• How fast do you need each query to be?
• How much do you expect your dataset to grow before you want to look at reindexing again?
• Your use case will likely have other concerns as well
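The first question can be answered from the shard-size measurements: pick a latency budget and find the largest per-shard document count that stays within it. A sketch, assuming latency grows with document count and that linear interpolation between measured points is acceptable; the function name and the sample numbers in the usage note are hypothetical, not the talk's results.

```python
from typing import List, Tuple


def max_docs_within_budget(points: List[Tuple[int, float]], budget_ms: float) -> int:
    """Largest per-shard document count whose latency stays within budget.

    `points` are (doc_count, latency_ms) measurements from a single-shard
    benchmark. Between the two measurements that straddle the budget we
    interpolate linearly; beyond the largest measurement we don't extrapolate.
    Returns 0 if even the smallest measured count is over budget.
    """
    pts = sorted(points)
    if not pts or pts[0][1] > budget_ms:
        return 0
    for (d0, l0), (d1, l1) in zip(pts, pts[1:]):
        if l1 > budget_ms:
            # Budget is crossed between d0 (within) and d1 (over), so l1 > l0.
            frac = (budget_ms - l0) / (l1 - l0)
            return int(d0 + frac * (d1 - d0))
    return pts[-1][0]  # every measurement is within budget
```

For example, with hypothetical measurements of 200 ms at 10 million, 400 ms at 20 million, and 900 ms at 40 million documents, a 650 ms budget gives 30 million documents per shard. Dividing the expected dataset size at the next reindex by that limit gives a primary shard count.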
22.
How many shards per node?
Determining how many shards per node
23.
Determining how many shards per node
The experiment (almost the same as before):
1. Obtain a dataset of realistic data
2. Write the Rally config to:
   • Index your data
   • Run a set of common queries
3. Run the benchmark with different shard counts
4. Graph the results
24.
Determining how many shards per node
What we did differently this time (time constraints):
• Used the Apache HTTP server benchmarking tool (ab) with a script to run the queries
• Our production cluster had 26 data nodes with about 200 million documents each
• Wanted to avoid expanding the cluster further if at all possible (c3.8xlarge is pricey!)
• 10 total shards per node (about 20 million docs/shard)
• 16 total shards per node (about 12.5 million docs/shard)
• 32 total shards per node (about 6.25 million docs/shard)
• Tested on 3-node clusters (2 data nodes, 1 client/master)
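The docs-per-shard figures follow from dividing each node's roughly 200 million documents by the number of shards per node. A small sketch of that arithmetic; `shard_layout` is a hypothetical helper, and the cluster-wide totals simply multiply the slide's per-node shard counts by its 26 data nodes.

```python
from typing import Tuple

PRODUCTION_NODES = 26          # data nodes in production (from the slide)
DOCS_PER_NODE = 200_000_000    # "about 200 million documents each"


def shard_layout(docs_per_node: int, nodes: int, shards_per_node: int) -> Tuple[float, int]:
    """Return (documents per shard, total shards across the cluster)."""
    return docs_per_node / shards_per_node, nodes * shards_per_node


for shards in (10, 16, 32):
    docs, total = shard_layout(DOCS_PER_NODE, PRODUCTION_NODES, shards)
    print(f"{shards} shards/node: {docs / 1e6:g}M docs/shard, {total} shards in total")
```

This prints 20M, 12.5M, and 6.25M docs/shard, matching the slide, with 260, 416, and 832 shards in total.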
25.
Our Results - Testing Number of Shards per Node
[Figure panels: "Query response by shard count (C 1)" and "Query response by shard count (C 3)"]
26.
Our Results - Testing Number of Shards per Node
[Figure panels: "Query response production vs test (C 1)" and "Query response production vs test (C 3)"; series: Production (26 data nodes) and Test Cluster (2 data nodes)]
27.
New Questions Raised
• Significant performance drop at each level of testing. Why?
• A single shard on a single node performed much better than our multiple-shards-per-node tests
• The fully loaded 3-node cluster performed much better than our full cluster in production
• Impact of moving to a machine with more memory
• Will the extra file system cache make a large difference?
28.
Current path of performance investigation
Query load isn’t evenly distributed
[Figure: layout of shard copies numbered 1 to 15; each shard appears twice, once marked with *]
29.
Problems We Encountered
30.
Problems We Encountered: Rally Related
• With nested documents, the document count in track.json doesn’t match the document count Rally checks at the end of indexing
• Multi-node support not yet available
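A plausible explanation (an assumption; the slide doesn't state the cause): Elasticsearch stores each object in a `nested` field as its own hidden Lucene document, so Lucene-level counts such as `docs.count` in `_cat/indices` include nested documents, while the `_count` API counts only top-level documents. The sketch below estimates the Lucene-level count for a corpus, given the full paths of the fields mapped as `nested`; the function names and example fields are hypothetical.

```python
from typing import Any, Dict, Iterable, Set


def _nested_children(obj: Dict[str, Any], nested_paths: Set[str], prefix: str) -> int:
    """Count hidden nested Lucene documents beneath one JSON object."""
    count = 0
    for field, value in obj.items():
        path = prefix + field
        if path in nested_paths and value is not None:
            items = value if isinstance(value, list) else [value]
            for item in items:
                # Each nested object is one Lucene doc, plus anything nested inside it.
                count += 1 + _nested_children(item, nested_paths, path + ".")
        elif isinstance(value, dict):
            # Plain object: flattened into its parent, but may contain nested fields.
            # (Arrays of plain objects are not traversed in this sketch.)
            count += _nested_children(value, nested_paths, path + ".")
    return count


def lucene_doc_count(docs: Iterable[Dict[str, Any]], nested_paths: Set[str]) -> int:
    """Top-level documents plus every hidden nested document they expand into."""
    return sum(1 + _nested_children(doc, nested_paths, "") for doc in docs)


docs = [
    {"title": "a", "comments": [{"text": "x"}, {"text": "y"}]},
    {"title": "b"},
]
print(lucene_doc_count(docs, {"comments"}))  # 2 source docs, 4 Lucene docs
```

If the count a tool checks is the Lucene-level one, a track's expected count would need to be this expanded number rather than the number of source documents.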
31.
Problems We Encountered: Non-Rally Related
• Performance in reality wasn’t as good as our testing suggested it should be
• We haven’t found the reason for this yet
• We’ve noticed a correlation between the number of shards a query hits per node and the time taken to run the query on the shard, but have not yet identified the bottleneck
• We were able to mitigate this by adding more data nodes
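One way to see why adding data nodes helped: with a fixed number of shards, each extra node lowers how many of them the busiest node holds, and therefore how many a query touches there. A sketch, assuming shards are spread as evenly as possible and the query hits every shard; the 260-shard total reuses the earlier slide's 26 nodes × 10 shards per node, and the larger node counts are hypothetical.

```python
import math


def max_shards_per_node(total_shards: int, data_nodes: int) -> int:
    """Shards on the busiest node when spread as evenly as possible."""
    return math.ceil(total_shards / data_nodes)


TOTAL_SHARDS = 260  # 26 data nodes x 10 shards per node
for nodes in (26, 30, 35):  # 30 and 35 are hypothetical cluster sizes
    print(nodes, "data nodes ->", max_shards_per_node(TOTAL_SHARDS, nodes), "shards on the busiest node")
```

With replicas, a query searches only one copy of each shard, so the real per-node count also depends on which copies get selected.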
32.
Thank You! Questions?