SlideShare a Scribd company logo
1 of 35
Download to read offline
Lessons Learned
While Scaling
Elasticsearch at
$ whoami
{
"name": "Dainius Jocas",
"company": {
"name": "Vinted",
"mission": "Make second-hand the first choice worldwide"
},
"role": "Staff Engineer",
"website": "https://www.jocas.lt",
"twitter": "@dainius_jocas",
"github": "dainiusjocas",
"author_of_oss": ["lucene-grep"]
}
2
Agenda
1. Intro
2. Scale as of January, 2020
3. IDs as keywords
4. Dates
5. function_score
6. Scale as of January, 2021
7. Discussion
3
Intro: Vinted and Elasticsearch
- Vinted is a second-hand clothes marketplace
- Operates in 10+ countries, 10+ languages
- Elasticsearch is in use since 2014
- Elasticsearch 1.4.1
- Today I’ll share lessons learned while scaling Elasticsearch at Vinted
4
Elasticsearch Scale @Vinted as of January 2020 (1)
- Elasticsearch 5.8.6 (~end-of-life at that time)
- 1 cluster of ~420 data nodes (each 14 CPU with 64GB RAM, bare metal)
- ~300K requests per minute (RPM) during peak hours
- ~160M documents
- 84 shards with 4 replicas
- p99 latency during peak hours was ~250 ms
- Slow Log (i.e latency >100ms) queries skyrockets during peak hours
Elasticsearch Scale @Vinted as of January 2020 (2)
- The company see Elasticsearch as a risk(!)
- Back-end developers didn’t want to touch it
- The usage of the Vinted platform was expected to at least double by October
(similar increase in usage of Elasticsearch and more servers is a bad idea)
- Functionality on top of Elasticsearch just accumulated over the years, no
oversight, no clear ownership
- SRE were on the Elasticsearch duty (hint: server restart doesn’t help all that
much when the Elasticsearch cluster is overloaded)
Replayed 10k queries during the “easy Tuesday”
Adventures
IDs as keywords (1)
{
"query": {
"bool": {
"filter": [
{
"terms": {
"country_id": [2,4]
}
}
]
}
}
}
IDs as keywords (2): context
- Elasticsearch indexed data from MySQL (check vinted.engineering blog on
details for that)
- Common practice for Ruby on Rails apps is to create database tables with
primary keys as auto-increment integers
- Q: Which Elasticsearch data type to use?
- A: integer, because why not?
IDs as keywords (3)
IDs as keywords (4): TL;DR of the blog post
- Before: integers were indexed as padded string terms
- After: integers indexed as block k-d tree (BKD)
- Change in Lucene get into Elasticsearch since 5.0
- Numeric data types were optimized for range queries
- Numeric data types still support terms queries
IDs as keywords (5): from Vinted point of view
- On IDs we don’t do range queries
- We use IDs for simple terms filtering
- The “optimized” integer data type for our use case degraded performance
- How much?
- For our workload it was a ~15% instant decrease in p99 latency, ~20ms
- We use around 10 such fields for filtering in every search query
- The required change was as simple as changing the index mappings and
reindexing the data
IDs as keywords (6): summary
- Remember that Vinted uses Elasticsearch since pre 5.0
- At that time it was OK to index IDs as Elasticsearch integers
- Post 5.0 IDs as integers became a performance issue
- Such a change that brakes nothing can easily slip under the regular
developers radar and then could backfire badly
- Regular developers think that Elasticsearch performs badly
Dates
- Date math
- Date filters
Date Math (1)
{
"query": {
"bool": {
"filter": [
{
"range": {
"created_at": {
"gte": "now-7d"
}
}
}
]
}
}
}
Date Math (2)
- Most queries that use now (see Date Math) cannot be cached
- From developer POV, it is simple, you just hardcode `now-7d`
- If most of your queries are using Date Math then CPU usage increases
- With more traffic the cluster starts to have “hot nodes”, queueing
- My advice: always use timestamps in production
- Cached queries -> massive gains in the cluster throughput
Date filters (1)
{
"query": {
"bool": {
"filter": [
{
"range": {
"created_at": {
"lte": "2021-05-27T09:55:00Z"
}
}
}
]
}
}
}
Date filters (2)
- This query clause asks Elasticsearch to “collect docs that are not newer than
X”
- What if the X is meant to be now and your documents are accumulated over
last 10 year?
- Then this filter matches ~99% of all docs in your indices
- Not a good “filter”
Date filters (3)
{
"query": {
"bool": {
"must_not": [
{
"range": {
"created_at": {
"gt": "2021-05-27T09:55:00Z"
}
}
}
]
}
}
}
Date filters (4)
- This query clause asks Elasticsearch to “collect docs that are not newer than
X”
- What if the X is now and your documents are accumulated over 10 year?
- Then this filter matches ~1% of all docs in your indices
- A good “filter”, i.e. more specific filter is a good filter
- From docs: “if you must filter by timestamp, use a coarse granularity (e.g.
round timestamp to 5 minutes) so the query value changes infrequently”
- For our setup it reduced the p99 by ~15%, ~10ms
Dates: summary
- Don’t use Date Math in production
- Write filters on timestamp in a way that it matches fewer documents
Function score (1)
{
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": {
"params": {
"now": 1579605387087
},
"source": "now > doc['user_updated_at'].value ? 1 /
ln(now - doc['user_updated_at'].value + 1) :
0",
"lang": "expression"
}
}
}
]
}
}
}
Function score (2)
- Models boosting on novelty
- Multiplicates _score
- Why a custom script score? Why not some decay function?
- Decay functions performed worse in our benchmarks
Function score (3): problems
- This clause was the biggest contributor to the query latency
- The script is executed on every hit
- Problematic for queries that have many hits, i.e. > 10k
- Sharding helped keep the latency down (remember we had 84 shards for
160M documents)
- But at some point sharding turns into oversharding and no longer helps
- The task was to come up with a way to replace the function_score with a
more performant solution while preserving the ranking logic
Function score (4): distance_feature
{
"query": {
"bool": {
"should": [
{
"distance_feature": {
"field": "user_updated_at",
"origin": "2021-05-27T10:59:00Z",
"pivot": "7d"
}
}
]
}
}
}
Function score (5): distance_feature
- A similar boost on novelty
- Multiplicative boost turned into additive boost (big change in logic)
- Unlike the function_score query or other ways to change relevance scores,
the distance_feature query efficiently skips non-competitive hits
(remember those 10 year old documents)
- NOTE: the origin value is not set to `now` because of the Date Math!
Function score (5): summary
- Function score provides a flexible scoring options
- However, when applying on big indices you must be carefull
- Function score is applied on all hits
- No way for Elasticsearch to skip hits
- Caching is the single most important thing for a good Elasticsearch performance and the
function score query doesn’t play well with it
- Distance feature query clause might be an option to replace the function
score if you have issues with performance
Elasticsearch Scale @Vinted as of January 2021 (1)
- ES 7.9.3
- 3 clusters each ~160 data nodes (each 16 CPU with 64GB RAM, bare metal)
- A offline cluster of similar size for testing (upgrades, cluster setup, etc.)
- ~1000K RPM during peak hours
- ~360M documents
- p99 latency during peaks ~150 ms
- Timeouts (>500ms) are 0.0367% of all queries
Elasticsearch Scale @Vinted as of January 2021 (2)
- Team (8 people strong) is responsible for Elasticsearch
- Regular capacity testing for 2x load in terms of
- document count
- query throughput
- Elasticsearch is seen as the system that can handle the growth
- Functionality is tested performance wise before releasing to production
- The team members rotate on duty
- Keeping clusters operational
- Maintenance tasks from backlog
- Is everything perfect? No.
- Elasticsearch is resource hungry
- Version upgrades still has to be checked before releasing
- Offline testing cluster helps with that
- Machine Learning engineers insist that Elasticsearch is not up for their tasks
- Despite the fact that the search ranking data is used for their model training
- Search re-ranking is done outside of Elasticsearch (e.g. operational complexity)
- Elasticsearch default installation offers very few tools for search relevance
work
Discussion (1)
Discussion (2)
Thank You!

More Related Content

What's hot

Natural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache SparkNatural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache Spark
Databricks
 

What's hot (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and D...
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learning
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframes
 
Natural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache SparkNatural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache Spark
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkr
 
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
 

Similar to Lessons Learned While Scaling Elasticsearch at Vinted

Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
 
SF Big Analytics meetup : Hoodie From Uber
SF Big Analytics meetup : Hoodie  From UberSF Big Analytics meetup : Hoodie  From Uber
SF Big Analytics meetup : Hoodie From Uber
Chester Chen
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
Sematext Group, Inc.
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
Nikolay Samokhvalov
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
琛琳 饶
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
MongoDB APAC
 

Similar to Lessons Learned While Scaling Elasticsearch at Vinted (20)

Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
SF Big Analytics meetup : Hoodie From Uber
SF Big Analytics meetup : Hoodie  From UberSF Big Analytics meetup : Hoodie  From Uber
SF Big Analytics meetup : Hoodie From Uber
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
 
About elasticsearch
About elasticsearchAbout elasticsearch
About elasticsearch
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
ElasticSearch.pptx
ElasticSearch.pptxElasticSearch.pptx
ElasticSearch.pptx
 
Viadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on MesosViadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on Mesos
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San JoseThe Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
 

Recently uploaded

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 

Recently uploaded (20)

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 

Lessons Learned While Scaling Elasticsearch at Vinted

  • 2. $ whoami { "name": "Dainius Jocas", "company": { "name": "Vinted", "mission": "Make second-hand the first choice worldwide" }, "role": "Staff Engineer", "website": "https://www.jocas.lt", "twitter": "@dainius_jocas", "github": "dainiusjocas", "author_of_oss": ["lucene-grep"] } 2
  • 3. Agenda 1. Intro 2. Scale as of January, 2020 3. IDs as keywords 4. Dates 5. function_score 6. Scale as of January, 2021 7. Discussion 3
  • 4. Intro: Vinted and Elasticsearch - Vinted is a second-hand clothes marketplace - Operates in 10+ countries, 10+ languages - Elasticsearch is in use since 2014 - Elasticsearch 1.4.1 - Today I’ll share lessons learned while scaling Elasticsearch at Vinted 4
  • 5. Elasticsearch Scale @Vinted as of January 2020 (1) - Elasticsearch 5.8.6 (~end-of-life at that time) - 1 cluster of ~420 data nodes (each 14 CPU with 64GB RAM, bare metal) - ~300K requests per minute (RPM) during peak hours - ~160M documents - 84 shards with 4 replicas - p99 latency during peak hours was ~250 ms - Slow Log (i.e latency >100ms) queries skyrockets during peak hours
  • 6. Elasticsearch Scale @Vinted as of January 2020 (2) - The company see Elasticsearch as a risk(!) - Back-end developers didn’t want to touch it - The usage of the Vinted platform was expected to at least double by October (similar increase in usage of Elasticsearch and more servers is a bad idea) - Functionality on top of Elasticsearch just accumulated over the years, no oversight, no clear ownership - SRE were on the Elasticsearch duty (hint: server restart doesn’t help all that much when the Elasticsearch cluster is overloaded)
  • 7. Replayed 10k queries during the “easy Tuesday”
  • 9. IDs as keywords (1) { "query": { "bool": { "filter": [ { "terms": { "country_id": [2,4] } } ] } } }
  • 10. IDs as keywords (2): context - Elasticsearch indexed data from MySQL (check vinted.engineering blog on details for that) - Common practice for Ruby on Rails apps is to create database tables with primary keys as auto-increment integers - Q: Which Elasticsearch data type to use? - A: integer, because why not?
  • 12. IDs as keywords (4): TL;DR of the blog post - Before: integers were indexed as padded string terms - After: integers indexed as block k-d tree (BKD) - Change in Lucene get into Elasticsearch since 5.0 - Numeric data types were optimized for range queries - Numeric data types still support terms queries
  • 13. IDs as keywords (5): from Vinted point of view - On IDs we don’t do range queries - We use IDs for simple terms filtering - The “optimized” integer data type for our use case degraded performance - How much? - For our workload it was a ~15% instant decrease in p99 latency, ~20ms - We use around 10 such fields for filtering in every search query - The required change was as simple as changing the index mappings and reindexing the data
  • 14. IDs as keywords (6): summary - Remember that Vinted uses Elasticsearch since pre 5.0 - At that time it was OK to index IDs as Elasticsearch integers - Post 5.0 IDs as integers became a performance issue - Such a change that brakes nothing can easily slip under the regular developers radar and then could backfire badly - Regular developers think that Elasticsearch performs badly
  • 15. Dates - Date math - Date filters
  • 16. Date Math (1) { "query": { "bool": { "filter": [ { "range": { "created_at": { "gte": "now-7d" } } } ] } } }
  • 17. Date Math (2) - Most queries that use now (see Date Math) cannot be cached - From developer POV, it is simple, you just hardcode `now-7d` - If most of your queries are using Date Math then CPU usage increases - With more traffic the cluster starts to have “hot nodes”, queueing - My advice: always use timestamps in production - Cached queries -> massive gains in the cluster throughput
  • 18. Date filters (1) { "query": { "bool": { "filter": [ { "range": { "created_at": { "lte": "2021-05-27T09:55:00Z" } } } ] } } }
  • 19. Date filters (2) - This query clause asks Elasticsearch to “collect docs that are not newer than X” - What if the X is meant to be now and your documents are accumulated over last 10 year? - Then this filter matches ~99% of all docs in your indices - Not a good “filter”
  • 20. Date filters (3) { "query": { "bool": { "must_not": [ { "range": { "created_at": { "gt": "2021-05-27T09:55:00Z" } } } ] } } }
  • 21. Date filters (4) - This query clause asks Elasticsearch to “collect docs that are not newer than X” - What if the X is now and your documents are accumulated over 10 year? - Then this filter matches ~1% of all docs in your indices - A good “filter”, i.e. more specific filter is a good filter - From docs: “if you must filter by timestamp, use a coarse granularity (e.g. round timestamp to 5 minutes) so the query value changes infrequently” - For our setup it reduced the p99 by ~15%, ~10ms
  • 22.
  • 23.
  • 24. Dates: summary - Don’t use Date Math in production - Write filters on timestamp in a way that it matches fewer documents
  • 25. Function score (1) { "query": { "function_score": { "functions": [ { "script_score": { "script": { "params": { "now": 1579605387087 }, "source": "now > doc['user_updated_at'].value ? 1 / ln(now - doc['user_updated_at'].value + 1) : 0", "lang": "expression" } } } ] } } }
  • 26. Function score (2) - Models boosting on novelty - Multiplicates _score - Why a custom script score? Why not some decay function? - Decay functions performed worse in our benchmarks
  • 27. Function score (3): problems - This clause was the biggest contributor to the query latency - The script is executed on every hit - Problematic for queries that have many hits, i.e. > 10k - Sharding helped keep the latency down (remember we had 84 shards for 160M documents) - But at some point sharding turns into oversharding and no longer helps - The task was to come up with a way to replace the function_score with a more performant solution while preserving the ranking logic
  • 28. Function score (4): distance_feature { "query": { "bool": { "should": [ { "distance_feature": { "field": "user_updated_at", "origin": "2021-05-27T10:59:00Z", "pivot": "7d" } } ] } } }
  • 29. Function score (5): distance_feature - A similar boost on novelty - Multiplicative boost turned into additive boost (big change in logic) - Unlike the function_score query or other ways to change relevance scores, the distance_feature query efficiently skips non-competitive hits (remember those 10 year old documents) - NOTE: the origin value is not set to `now` because of the Date Math!
  • 30. Function score (5): summary - Function score provides a flexible scoring options - However, when applying on big indices you must be carefull - Function score is applied on all hits - No way for Elasticsearch to skip hits - Caching is the single most important thing for a good Elasticsearch performance and the function score query doesn’t play well with it - Distance feature query clause might be an option to replace the function score if you have issues with performance
  • 31. Elasticsearch Scale @Vinted as of January 2021 (1) - ES 7.9.3 - 3 clusters each ~160 data nodes (each 16 CPU with 64GB RAM, bare metal) - A offline cluster of similar size for testing (upgrades, cluster setup, etc.) - ~1000K RPM during peak hours - ~360M documents - p99 latency during peaks ~150 ms - Timeouts (>500ms) are 0.0367% of all queries
  • 32. Elasticsearch Scale @Vinted as of January 2021 (2) - Team (8 people strong) is responsible for Elasticsearch - Regular capacity testing for 2x load in terms of - document count - query throughput - Elasticsearch is seen as the system that can handle the growth - Functionality is tested performance wise before releasing to production - The team members rotate on duty - Keeping clusters operational - Maintenance tasks from backlog
  • 33. - Is everything perfect? No. - Elasticsearch is resource hungry - Version upgrades still has to be checked before releasing - Offline testing cluster helps with that - Machine Learning engineers insist that Elasticsearch is not up for their tasks - Despite the fact that the search ranking data is used for their model training - Search re-ranking is done outside of Elasticsearch (e.g. operational complexity) - Elasticsearch default installation offers very few tools for search relevance work Discussion (1)