Maintaining a high load Python project
for newcomers
Viacheslav Kakovskyi
PyCon Ukraine 2016
Me!
@kakovskyi
Python Developer at SoftServe
Contributor of Atlassian HipChat — Python 2, Twisted
Maintainer of KPIdata — Python 3, asyncio
2
Agenda
● What project is `high load`?
● High loaded projects from my experience
● Case study: show last 5 feedbacks for a university course
● Developer's checklist
● Tools that help to make customers happy
● Summary
● Further reading
3
What project is `high load`?
4
What project is `high load`?
● 2+ nodes?
● 10 000 connections?
● 200 000 RPS?
● 1 000 000 daily active users?
● monitoring?
● scalability?
● continuous deployment?
● disaster recovery?
● sharding?
● clustering?
● ???
5
What project is `high load`?
a project where an inefficient solution or a tiny bug
has a huge impact on your business
→ causes an increase of costs $$$ (due to a lack of resources)
→ or a loss of reputation (due to performance degradation)
6
High loaded Python projects from my experience
● Instant messenger:
○ 100 000+ connected users
○ 100+ nodes
○ 100+ developers
● Embedded system for traffic analysis:
○ scaling and upgrade options are unavailable
7
Some examples of issues from my experience
● usage of a less-efficient library: json VS ujson
● usage of a more complex serialization format: XML vs JSON
● usage of a wrong data format for a certain case: JPEG vs BMP
● usage of a wrong protocol: TCP vs UDP
● usage of legacy code without understanding how it works under the hood:
100 PostgreSQL queries instead of 1
● spawning a lot of objects that aren't destroyed by the garbage collector
● ...
● deployment of a new feature which does not fit well with the load
on your production environment
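The "100 queries instead of 1" issue can be sketched with the standard library alone. This is a minimal illustration, not the code from the project: it uses an in-memory SQLite database instead of PostgreSQL, and the table and function names are hypothetical.

```python
import sqlite3


def make_db(n_courses=100):
    """Create an in-memory database with one feedback row per course."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feedback (course_id INTEGER, body TEXT)")
    conn.executemany(
        "INSERT INTO feedback VALUES (?, ?)",
        [(i, "feedback for course %d" % i) for i in range(n_courses)],
    )
    return conn


def fetch_one_by_one(conn, course_ids):
    """Anti-pattern: one round trip to the database per course."""
    rows = []
    for course_id in course_ids:
        cur = conn.execute(
            "SELECT course_id, body FROM feedback WHERE course_id = ?",
            (course_id,),
        )
        rows.extend(cur.fetchall())
    return rows


def fetch_in_one_query(conn, course_ids):
    """Same rows with a single query that uses an IN clause."""
    placeholders = ",".join("?" for _ in course_ids)
    cur = conn.execute(
        "SELECT course_id, body FROM feedback "
        "WHERE course_id IN (%s) ORDER BY course_id" % placeholders,
        list(course_ids),
    )
    return cur.fetchall()


conn = make_db()
ids = list(range(100))
slow = fetch_one_by_one(conn, ids)    # 100 queries
fast = fetch_in_one_query(conn, ids)  # 1 query
```

Both functions return the same rows; with a networked database the first one pays the round-trip latency 100 times.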
8
Terms
9
● Elasticsearch - a search server that provides a full-text search engine
● Redis - an in-memory data structure server
● Capacity planning - a process that determines the amount of resources
that will be needed over some future period of time
● StatsD - a daemon for stats aggregation
● Feature flag - an ability to turn on/off some functionality of an application
without deployment
Case study
Let's imagine some application for assessing the quality of higher education
● A university has faculties
● A faculty has departments
● A department has directions
● A direction has groups
● A group has students
● A student takes courses
10
Case study
● A student leaves feedback about courses
● Feedbacks are stored in Elasticsearch for full-text search
A feedback looks like this:
Introduction to Software Engineering. Faculty of Applied Math
Good for ones who don't have any previous experience with programming and
algorithms. Optional for prepared folks. They should request additional tasks to
stay in a good shape.
11
Case study: show recent 5 feedbacks for the course
12
INTRODUCTION TO SOFTWARE ENGINEERING
100500
Recent feedbacks
Software engineering is about teams and it is about quality. The problems to
solve are so complex or large, that a single developer cannot solve them
anymore.
See https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering
Faculties
Case study: obvious solution
Request the last 5 feedbacks directly from Elasticsearch
13
from elasticsearch import Elasticsearch

es = Elasticsearch()

def fetch_feedback(es, course_id, amount):
    query = _build_es_filter_query(doc_type='course',
                                   id=course_id,
                                   amount=amount)
    # blocking call to Elasticsearch
    entries = es.search(index='kpi', body=query)
    result = _validate_and_adapt(entries)
    return result
Case study
OK, just implement the solution, test on
staging, and deploy to production.
14
15
YOUR OPS KNOWS WHERE YOU LIVE
16
EsRejectedExecutionException[rejected execution (queue capacity 1000)
on org.elasticsearch.search.action.SearchServiceTransportAction]
Case study: optimization
Hypotheses:
● configure Elasticsearch properly for the case
● cache responses from Elasticsearch for some time
● use double writes:
○ write each feedback to Elasticsearch and to a size-limited Redis queue
○ fetch from Redis first
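The second hypothesis, caching responses for some time, could look roughly like the sketch below. It is a minimal in-process cache under stated assumptions: the class name, the 30-second TTL and the fake search function are hypothetical, and a real deployment with several nodes would more likely use a shared cache.

```python
import time


class TTLCache:
    """Keep computed values for ttl_seconds, then recompute on next access."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no call to the backend
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


# Usage with a fake clock and a fake Elasticsearch call.
calls = []

def fake_search():
    calls.append(1)
    return ["feedback"]

fake_now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
key = ("course", 100500, 5)
first = cache.get_or_compute(key, fake_search)
second = cache.get_or_compute(key, fake_search)   # served from the cache
fake_now[0] = 31.0                                # the entry has expired
third = cache.get_or_compute(key, fake_search)    # hits the backend again
```

The trade-off is staleness: a new feedback may stay invisible for up to the TTL.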
17
Case study: prerequisites from our domain*
● up to 1000 characters allowed for a feedback
● 50 000 feedbacks expected just for Kyiv Polytechnic Institute every year
● 300 000+ applicants in 2016
● 100+ universities in Ukraine if we decide to scale
18
*it's just an assumption for the example case study
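These prerequisites already allow a rough upper bound on the yearly text volume. The arithmetic below is only a sketch: it assumes every feedback uses the full 1000 characters, that the text is mostly Cyrillic (2 bytes per character in UTF-8), and that each of the 100 universities is as active as Kyiv Polytechnic Institute.

```python
MAX_FEEDBACK_CHARS = 1000
FEEDBACKS_PER_YEAR_KPI = 50_000
UNIVERSITIES = 100
BYTES_PER_CHAR = 2  # assumption: Cyrillic letters take 2 bytes in UTF-8


def yearly_text_bytes(feedbacks,
                      max_chars=MAX_FEEDBACK_CHARS,
                      bytes_per_char=BYTES_PER_CHAR):
    """Worst-case raw text size, ignoring index and storage overhead."""
    return feedbacks * max_chars * bytes_per_char


kpi_bytes = yearly_text_bytes(FEEDBACKS_PER_YEAR_KPI)
all_bytes = kpi_bytes * UNIVERSITIES

print("KPI: %.0f MB per year" % (kpi_bytes / 1e6))
print("100 universities: %.0f GB per year" % (all_bytes / 1e9))
```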
Case study: let's measure the current load on production
Operations:
● add a feedback
● retrieve last 5 feedbacks
● find a feedback by a phrase
19
Case study: let's measure the current load on production
Application metrics:
● add a feedback
○ stats.count.feedback.course.added.es
○ stats.timing.feedback.course.added.es
● retrieve latest 5 feedbacks
○ stats.count.feedback.course.fetched.es
○ stats.timing.feedback.course.fetched.es
● find a feedback by a phrase
○ stats.count.feedback.course.found.es
○ stats.timing.feedback.course.found.es
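Before wiring a real client, the counter and timer pairs above can be tried with a recording stand-in. This is a sketch for local experiments and tests: `RecordingStatsClient` and `fetch_feedback_instrumented` are hypothetical names, and the class only mimics the `incr()` and `timer()` calls used in this talk while storing lines in the StatsD text format (`<name>:<value>|c` for counters, `<name>:<value>|ms` for timers).

```python
import time
from contextlib import contextmanager


class RecordingStatsClient:
    """Test double that records StatsD-style lines instead of sending them."""

    def __init__(self):
        self.lines = []

    def incr(self, name, count=1):
        # counter line: <name>:<value>|c
        self.lines.append("%s:%d|c" % (name, count))

    @contextmanager
    def timer(self, name):
        started = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - started) * 1000.0
            # timer line: <name>:<milliseconds>|ms
            self.lines.append("%s:%.3f|ms" % (name, elapsed_ms))


def fetch_feedback_instrumented(statsd, search):
    """Count every call and time only the blocking part."""
    statsd.incr('feedback.course.fetched.es')
    with statsd.timer('feedback.course.fetched.es'):
        return search()


stats = RecordingStatsClient()
result = fetch_feedback_instrumented(stats, lambda: ['a feedback'])
```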
20
Case study: how to add a metric to your code
21
from elasticsearch import Elasticsearch
from statsd import StatsClient

es = Elasticsearch()
statsd = StatsClient()

def fetch_feedback(statsd, es, course_id, amount):
    statsd.incr('feedback.course.fetched.es')
    # don't perform any query, just collect stats
    return
    # the code below is unreachable while the early return above is in place
    query = _build_es_filter_query(doc_type='course', id=course_id,
                                   amount=amount)
    with statsd.timer('feedback.course.fetched.es'):
        # blocking call to Elasticsearch
        entries = es.search(index='kpi', body=query)
    result = _validate_and_adapt(entries)
    return result
Case study: how to add a metric to your code
22
def write_feedback_to_elasticsearch(statsd, es, course_id, doc):
    statsd.incr('feedback.course.added.es')
    with statsd.timer('feedback.course.added.es'):
        # blocking call to Elasticsearch
        result = es.index(index='kpi', doc_type='course',
                          id=course_id, body=doc)

def find_feedback(statsd, es, phrase, course_id=None):
    statsd.incr('feedback.course.found.es')
    query = _build_es_search_query(doc_type='course',
                                   id=course_id, phrase=phrase)
    with statsd.timer('feedback.course.found.es'):
        # blocking call to Elasticsearch
        entries = es.search(index='kpi', body=query)
    result = _validate_and_adapt(entries)
    return result
Visualize metrics: RPS, feature-related operations
23
[Graph legend: Add feedback, Find feedback, Fetch feedback]
Visualize metrics: Course feedback request performance
24
[Graph legend: Add feedback, Find feedback, Fetch feedback]
Case study: visualize collected metrics
Outcomes:
● we know the frequency of operations
● we know the timing of operations
● we know what to optimize
● we can perform capacity planning for a new flow
25
Optimization: double writes
● continue using Elasticsearch as a storage for feedbacks
● duplicate writing of a feedback to Elasticsearch and Redis
● store last 5 feedbacks in Redis for faster retrieval
● use Elasticsearch for custom queries and full-text search
26
Optimization
27
from elasticsearch import Elasticsearch

REDIS_FEEDBACK_QUEUE_SIZE = 5  # feedbacks kept in Redis per course

es = Elasticsearch()

def fetch_feedback(es, redis, course_id, amount):
    result = None
    if amount <= REDIS_FEEDBACK_QUEUE_SIZE:
        result = _fetch_feedback_from_redis(redis, course_id, amount)
    if not result:
        result = _fetch_feedback_from_elasticsearch(es, course_id, amount)
    return result
Optimization
28
def _fetch_feedback_from_elasticsearch(es, course_id, amount):
    query = _build_es_filter_query(doc_type='course', id=course_id,
                                   amount=amount)
    # blocking call to Elasticsearch
    entries = es.search(index='kpi', body=query)
    result = _validate_and_adapt(entries)
    return result

def _fetch_feedback_from_redis(redis, course_id, amount):
    queue = redis.get_queue(entity='course', id=course_id)
    # blocking call to Redis
    result = queue.get(amount)
    return result
Optimization
29
def add_feedback(es, redis, course_id, doc):
    _write_feedback_to_redis(redis, course_id, doc)
    _write_feedback_to_elasticsearch(es, course_id, doc)

def _write_feedback_to_elasticsearch(es, course_id, doc):
    # blocking call to Elasticsearch
    result = es.index(index='kpi', doc_type='course', id=course_id,
                      body=doc)

def _write_feedback_to_redis(redis, course_id, doc):
    queue = redis.get_queue(entity='course', id=course_id)
    # blocking call to Redis
    queue.push(doc)
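The double-write flow can be exercised without any running services. The sketch below replaces both stores with in-memory fakes; `FakeSearchStore` and `FakeRecentQueues` are hypothetical stand-ins, not Elasticsearch or Redis clients, and the size-limited queue is modelled with `collections.deque`.

```python
from collections import deque

REDIS_FEEDBACK_QUEUE_SIZE = 5


class FakeSearchStore:
    """Stands in for Elasticsearch: keeps every feedback."""

    def __init__(self):
        self.docs = {}

    def index(self, course_id, doc):
        self.docs.setdefault(course_id, []).append(doc)


class FakeRecentQueues:
    """Stands in for Redis: a newest-first queue of limited size per course."""

    def __init__(self, size):
        self._size = size
        self._queues = {}

    def push(self, course_id, doc):
        queue = self._queues.setdefault(course_id, deque(maxlen=self._size))
        queue.appendleft(doc)  # when full, the oldest entry is dropped

    def get(self, course_id, amount):
        return list(self._queues.get(course_id, ()))[:amount]


def add_feedback(search_store, recent, course_id, doc):
    """Double write: every feedback goes to both stores."""
    recent.push(course_id, doc)
    search_store.index(course_id, doc)


search_store = FakeSearchStore()
recent = FakeRecentQueues(REDIS_FEEDBACK_QUEUE_SIZE)
for n in range(7):
    add_feedback(search_store, recent, 100500, "feedback %d" % n)
latest = recent.get(100500, 5)
```

After seven writes the search store holds all seven feedbacks, while the queue holds only the five most recent ones.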
Optimization: potential impact on production
● Increased:
○ Insert feedback time
○ Redis capacity
○ Network traffic for Redis
● Reduced:
○ Fetch feedback time
○ Elasticsearch capacity
○ Network traffic for Elasticsearch
30
31
MEASURE ALL THE THINGS
Measure: timing of insert and fetch operations
32
def _fetch_feedback_from_elasticsearch(statsd, es, course_id, amount):
    statsd.incr('feedback.course.fetched.es')
    query = _build_es_filter_query(doc_type='course', id=course_id,
                                   amount=amount)
    with statsd.timer('feedback.course.fetched.es'):
        # blocking call to Elasticsearch
        entries = es.search(index='kpi', body=query)
    result = _validate_and_adapt(entries)
    return result

def _fetch_feedback_from_redis(statsd, redis, course_id, amount):
    statsd.incr('feedback.course.fetched.redis')
    queue = redis.get_queue(entity='course', id=course_id)
    with statsd.timer('feedback.course.fetched.redis'):
        # blocking call to Redis
        result = queue.get(amount)
    return result
Measure: timing of insert and fetch operations
33
def _write_feedback_to_elasticsearch(statsd, es, course_id, doc):
    statsd.incr('feedback.course.added.es')
    with statsd.timer('feedback.course.added.es'):
        # blocking call to Elasticsearch
        result = es.index(index='kpi', doc_type='course',
                          id=course_id, body=doc)

def _write_feedback_to_redis(statsd, redis, course_id, doc):
    statsd.incr('feedback.course.added.redis')
    queue = redis.get_queue(entity='course', id=course_id)
    with statsd.timer('feedback.course.added.redis'):
        # blocking call to Redis
        queue.push(doc)
Measure: Redis capacity
● A feedback - up to 1000 characters
● Redis is used for storing 5 feedbacks per course
● 10 000 courses for Kyiv Polytechnic Institute
● Key: feedback:course:<course_id>
● Data structure: List
● Commands:
○ LPUSH - O(1)
○ LRANGE - O(S+N), S=0, N=5
○ LTRIM - O(N)
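The semantics of the three commands can be tried on a plain Python list. This is only an emulation for illustration: it supports non-negative indices only, and inserting at the head of a Python list is O(n), unlike the O(1) LPUSH in Redis.

```python
def lpush(items, value):
    """LPUSH: insert at the head of the list, return the new length."""
    items.insert(0, value)
    return len(items)


def ltrim(items, start, stop):
    """LTRIM: keep only indices start..stop (stop is inclusive, as in Redis)."""
    items[:] = items[start:stop + 1]


def lrange(items, start, stop):
    """LRANGE: return indices start..stop (stop is inclusive)."""
    return items[start:stop + 1]


queue = []
for n in range(1, 8):
    lpush(queue, "feedback %d" % n)
    ltrim(queue, 0, 4)          # keep at most 5 newest feedbacks
recent = lrange(queue, 0, 4)
```

Pushing and trimming on every write keeps the list at 5 elements, which is why LRANGE here runs with S=0 and N=5.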
34
Measure: Redis capacity
● Don't trust benchmarks from the internet
● Run a benchmark for a production-like environment with your sample data
● Example:
○ FLUSHALL
○ define a sample feedback (string up to 1000 characters)
○ create N=10 000 lists with M=5 sample feedbacks
○ measure allocated memory
● You can run an approximated benchmark and calculate expected memory
size
35
Measure: Redis capacity
● 76.3 MB for 10000 courses, Kyiv Polytechnic Institute
● 7GB for 100 Ukrainian universities
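The extrapolation from one university to 100 is simple arithmetic, sketched below. It assumes decimal units (1 MB = 1000 KB) and that every university has the same number of courses as Kyiv Polytechnic Institute; both are assumptions for illustration.

```python
MEASURED_MB_KPI = 76.3        # measured for 10 000 courses
COURSES_KPI = 10_000
FEEDBACKS_PER_COURSE = 5
UNIVERSITIES = 100

kb_per_course = MEASURED_MB_KPI * 1000 / COURSES_KPI
kb_per_feedback = kb_per_course / FEEDBACKS_PER_COURSE
total_gb = MEASURED_MB_KPI * UNIVERSITIES / 1000

print("per course:   %.2f KB" % kb_per_course)
print("per feedback: %.2f KB" % kb_per_feedback)
print("100 universities: %.2f GB" % total_gb)
```

The result, about 7.6 GB, matches the rounded 7GB figure above.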
36
Measure: Network traffic for Redis
● Measure network traffic for send/receive operations:
○ add_feedback → LPUSH
○ fetch_feedback → LRANGE
● Revise Redis protocol (RESP)
● Calculate expected sent/received data for new Redis
operations:
○ How much data sent for LPUSH
○ How much data received for LRANGE
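The size of a request on the wire follows directly from RESP: a command is sent as an array of bulk strings. The encoder below is a minimal sketch of that framing for estimating sizes, not a replacement for a client library; the function names and the 1000 operations per second figure are hypothetical.

```python
def encode_resp_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]              # array header: *<count>
    for arg in args:
        if isinstance(arg, str):
            arg = arg.encode("utf-8")
        # bulk string: $<length in bytes>\r\n<payload>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(parts)


def mbps(bytes_per_op, ops_per_second):
    """Convert payload size and request rate to megabits per second."""
    return bytes_per_op * 8 * ops_per_second / 1_000_000


lpush = encode_resp_command("LPUSH", "feedback:course:100500",
                            "MY_AWESOME_FEEDBACK")
lpush_size = len(lpush)
traffic = mbps(lpush_size, 1000)  # hypothetical rate of 1000 LPUSH per second
```

A multi-bulk LRANGE reply uses the same array framing, so the same function can approximate the received size for 5 feedbacks. TCP/IP overhead is not included.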
37
Measure: Network traffic for Redis
from aioredis.util import encode_command

# encode_command takes the command name and each argument separately
add_feedback_size = len(encode_command(b'LPUSH',
                                       b'feedback:course:100500',
                                       b'MY_AWESOME_FEEDBACK'))
https://github.com/aio-libs/aioredis/blob/master/aioredis/util.py
38
Measure: Network traffic for Redis (optional) *
● MAX add_feedback_traffic = 1.5 Mbps
● AVG add_feedback_traffic = 0.8 Mbps
● MAX fetch_feedback_traffic = 30 Mbps
● AVG fetch_feedback_traffic = 10 Mbps
* This step is optional and depends on your architecture
39
Summary of the investigation around double writes
● 90% of fetch feedback requests could be served from Redis
● The initial issue, Elasticsearch running out of search queue capacity,
should be avoided
40
Summary of the investigation
● Reduced:
○ Fetch feedback time
■ 2 ms per fetch for 90% of cases
● Increased:
○ Insert feedback time
■ 16 ms per insert
○ Redis capacity
■ 76.3 MB for 10000 courses, Kyiv Polytechnic Institute
■ 7GB for 100 Ukrainian universities
○ Network traffic for Redis
■ 11 Mbps
41
Making a decision
42
● Implement a prototype
● Discuss collected stats with Ops
● Discuss them with business stakeholders
● Implement the solution
● Deploy under a feature flag
Adding a feature flag
43
from feature import Feature

feature = Feature()

def fetch_feedback(feature, statsd, es, redis, course_id, amount):
    result = None
    if (feature.is_enabled('fetch_feedback_from_redis')
            and amount <= REDIS_FEEDBACK_QUEUE_SIZE):  # 5 feedbacks in queue
        result = _fetch_feedback_from_redis(statsd, redis, course_id, amount)
    if feature.is_enabled('fetch_feedback_from_elasticsearch') and not result:
        result = _fetch_feedback_from_elasticsearch(statsd, es, course_id,
                                                    amount)
    return result
Rolling out the feature only to a subset of users
44
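One common way to pick the subset is to hash the user id into a stable bucket from 0 to 99 and enable the feature for buckets below the rollout percentage. The sketch below shows that idea only; it is not how any particular feature flag library works, and both function names are hypothetical.

```python
import hashlib


def rollout_bucket(feature_name, user_id):
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    key = ("%s:%s" % (feature_name, user_id)).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled_for(feature_name, user_id, percent):
    """Enable the feature for roughly `percent` percent of users."""
    return rollout_bucket(feature_name, user_id) < percent


users = ["user-%d" % n for n in range(1000)]
enabled_at_1 = {u for u in users
                if is_enabled_for('fetch_feedback_from_redis', u, 1)}
enabled_at_50 = {u for u in users
                 if is_enabled_for('fetch_feedback_from_redis', u, 50)}
```

Because the bucket depends only on the feature name and the user id, a user who got the feature at 1% keeps it when the rollout grows to 50% or 100%.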
RPS. Feature "Fetch last 5 feedbacks about a course".
Rolled out for 1% of users.
46
[Graph legend: Fetch from Elasticsearch, Fetch from Redis]
Incremental rollout prevented the incident
EsRejectedExecutionException[rejected execution (queue capacity 1000)
on org.elasticsearch.search.action.SearchServiceTransportAction]
47
Investigation
48
● Disable the feature
● Run investigation
○ Only recent feedbacks are retrieved from Redis
○ Legacy feedbacks are fetched directly from Elasticsearch
● Solution
○ Write legacy feedbacks to Redis using a background job
Fixing missing data in Redis
49
def fetch_feedback(feature, statsd, es, redis, course_id, amount):
    fetched_from_redis, result = False, None
    if (feature.is_enabled('fetch_feedback_from_redis')
            and amount <= REDIS_FEEDBACK_QUEUE_SIZE):
        fetched_from_redis = True
        result = _fetch_feedback_from_redis(statsd, redis, course_id, amount)
    if feature.is_enabled('fetch_feedback_from_elasticsearch') and not result:
        result = _fetch_feedback_from_elasticsearch(statsd, es, course_id,
                                                    amount)
        if fetched_from_redis:  # redis was empty for the course
            fill_redis(redis, result, amount=REDIS_FEEDBACK_QUEUE_SIZE)
    return result
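The backfill-on-miss behaviour can be checked with in-memory stand-ins. In this sketch a plain dict plays the role of Redis and `fake_search` plays the role of Elasticsearch; all names are hypothetical, and the function assumes `amount` never exceeds the queue size.

```python
def fetch_feedback(recent, search, course_id, amount, queue_size=5):
    """Return (feedbacks, source); fill the cache after a miss."""
    cached = recent.get(course_id)
    if cached:
        return cached[:amount], "redis"
    # Cache miss: fetch a full queue's worth so later requests are served
    # from the cache regardless of the amount they ask for.
    fetched = search(course_id, queue_size)
    if fetched:
        recent[course_id] = list(fetched)
    return fetched[:amount], "elasticsearch"


search_calls = []

def fake_search(course_id, amount):
    search_calls.append(course_id)
    return ["old feedback %d" % n for n in range(amount)]


recent = {}
first = fetch_feedback(recent, fake_search, 100500, 3)
second = fetch_feedback(recent, fake_search, 100500, 3)
```

Only the first request for a course with legacy feedbacks reaches the search backend; the second one is served from the cache.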
RPS. Feature "Fetch last 5 feedbacks about a course".
Fixed and rolled out for 1% of users.
50
[Graph legend: Fetch from Elasticsearch, Fetch from Redis]
RPS. Feature "Fetch last 5 feedbacks about a course".
Fixed and rolled out for 100% of users.
51
[Graph legend: Fetch from Elasticsearch, Fetch from Redis]
Feature has been deployed for 100% of users
52
Developer's checklist for adding a feature to a high loaded project
● discover which services are hit by the feature
○ database
○ cache
○ storage
○ whatever
● measure the impact of the feature on the existing environment
○ call frequency
○ amount of memory
○ traffic
○ latency
53
Developer's checklist for adding a feature to a high loaded project (2)
● calculate allowed load for the feature
○ requests per second for the existing environment
○ request processing time
● calculate the additional load for the feature
○ latency for additional requests
○ how to deal with a lack of resources
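A rough estimate of the allowed load can be derived from the number of workers and the mean processing time. The sketch below assumes blocking workers that each handle one request at a time; the worker count and the current RPS are hypothetical numbers, and the 16 ms latency is borrowed from the insert timing measured earlier.

```python
def max_rps(workers, mean_latency_ms):
    """Upper bound on throughput for blocking workers."""
    return workers * 1000 / mean_latency_ms


def headroom_rps(workers, mean_latency_ms, current_rps):
    """How many extra requests per second fit before saturation."""
    return max_rps(workers, mean_latency_ms) - current_rps


limit = max_rps(workers=8, mean_latency_ms=16)
spare = headroom_rps(workers=8, mean_latency_ms=16, current_rps=350)
```

With these inputs the environment tops out at 500 requests per second, leaving 150 for the new feature. Real systems degrade before the theoretical limit, so plan with a safety margin.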
54
Developer's checklist for adding a feature to a high loaded project (3)
● discuss the acceptability of the solution
○ with peers
○ with Ops
○ with business owners
● consider alternatives if needed
● perform load testing on staging
● roll out the feature to production incrementally
55
Tools that help to make customers happy
● profiling:
○ cProfile
○ kcachegrind
○ memory_profiler
○ guppy
○ objgraph
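As a starting point, cProfile from the standard library needs no installation. The sketch below profiles a single call and returns the report as text; `build_report` is a hypothetical workload and `profile_call` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def build_report(n):
    """Hypothetical workload to profile."""
    return sorted(str(i) for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # show the 10 entries with the highest cumulative time
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(build_report, 1000)
print(report)
```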
56
Tools that help to make customers happy (2)
● metrics
○ StatsD
● graphs and dashboards
○ Grafana
○ Graphite
● logging
○ Elasticsearch
○ Logstash
○ Kibana
57
Tools that help to make customers happy (3)
● feature flags:
○ Gargoyle and Gutter from Disqus
○ Flask-FeatureFlags
○ Switchboard
● alerting:
○ elastalert
○ monit
○ graphite-beacon
○ cabot
58
Summary
59
Summary
● Be careful with calls to external services
● Collect metrics about the state of your production environment
● Perform capacity planning for "serious" changes
● Use application metrics and measure potential load
● Roll out new code incrementally with feature flags
● Set up proper monitoring; it can prevent the majority of incidents
● Use the tools; it's really easy
● Be ready to roll back fast
60
To be continued
● asynchronous programming
● infrastructure as a service
● testing
● monitoring and alerting
● dealing with bursty traffic
● OS and hardware metrics
● scaling
● distributed applications
● continuous integration
61
Further reading
● How HipChat Stores and Indexes Billions of Messages Using
● Continuous Deployment at Instagram
● How Twitter Uses Redis To Scale
● Why Leading Companies Dark Launch - LaunchDarkly Blog
● Lessons Learned From A Year Of Elasticsearch ... - Tech blog
● Notes on Redis Memory Usage
● Using New Relic to Understand Redis Performance: The 7 Key Metrics
● A guide to analyzing Python performance
62
Questions?
63
Viacheslav Kakovskyi
viach.kakovskyi@gmail.com
@kakovskyi

Más contenido relacionado

La actualidad más candente

Assignment of pseudo code
Assignment of pseudo codeAssignment of pseudo code
Assignment of pseudo codeBurhan Chaudhry
 
Translating Qt Applications
Translating Qt ApplicationsTranslating Qt Applications
Translating Qt Applicationsaccount inactive
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flinkFlink Forward
 
Vasia Kalavri – Training: Gelly School
Vasia Kalavri – Training: Gelly School Vasia Kalavri – Training: Gelly School
Vasia Kalavri – Training: Gelly School Flink Forward
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logsStefan Krawczyk
 
С++ without new and delete
С++ without new and deleteС++ without new and delete
С++ without new and deletePlatonov Sergey
 
Asynchronous single page applications without a line of HTML or Javascript, o...
Asynchronous single page applications without a line of HTML or Javascript, o...Asynchronous single page applications without a line of HTML or Javascript, o...
Asynchronous single page applications without a line of HTML or Javascript, o...Robert Schadek
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Taiwan User Group
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsFlink Forward
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleStefan Marr
 
Property-based Testing and Generators (Lua)
Property-based Testing and Generators (Lua)Property-based Testing and Generators (Lua)
Property-based Testing and Generators (Lua)Sumant Tambe
 
Network programming with Qt (C++)
Network programming with Qt (C++)Network programming with Qt (C++)
Network programming with Qt (C++)Manohar Kuse
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep DiveVasia Kalavri
 
Building High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low EffortBuilding High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low EffortStefan Marr
 
Flink Batch Processing and Iterations
Flink Batch Processing and IterationsFlink Batch Processing and Iterations
Flink Batch Processing and IterationsSameer Wadkar
 
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...University of California, San Diego
 

La actualidad más candente (20)

Assignment of pseudo code
Assignment of pseudo codeAssignment of pseudo code
Assignment of pseudo code
 
NANO266 - Lecture 9 - Tools of the Modeling Trade
NANO266 - Lecture 9 - Tools of the Modeling TradeNANO266 - Lecture 9 - Tools of the Modeling Trade
NANO266 - Lecture 9 - Tools of the Modeling Trade
 
Translating Qt Applications
Translating Qt ApplicationsTranslating Qt Applications
Translating Qt Applications
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
Vasia Kalavri – Training: Gelly School
Vasia Kalavri – Training: Gelly School Vasia Kalavri – Training: Gelly School
Vasia Kalavri – Training: Gelly School
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
 
С++ without new and delete
С++ without new and deleteС++ without new and delete
С++ without new and delete
 
Asynchronous single page applications without a line of HTML or Javascript, o...
Asynchronous single page applications without a line of HTML or Javascript, o...Asynchronous single page applications without a line of HTML or Javascript, o...
Asynchronous single page applications without a line of HTML or Javascript, o...
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API Basics
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with Truffle
 
Property-based Testing and Generators (Lua)
Property-based Testing and Generators (Lua)Property-based Testing and Generators (Lua)
Property-based Testing and Generators (Lua)
 
Network programming with Qt (C++)
Network programming with Qt (C++)Network programming with Qt (C++)
Network programming with Qt (C++)
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Building High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low EffortBuilding High-Performance Language Implementations With Low Effort
Building High-Performance Language Implementations With Low Effort
 
Flink Batch Processing and Iterations
Flink Batch Processing and IterationsFlink Batch Processing and Iterations
Flink Batch Processing and Iterations
 
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
 
MAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodianMAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodian
 

Similar a PyCon Ukraine 2016: Maintaining a high load Python project for newcomers

Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning serviceRuth Yakubu
 
Automating Performance Monitoring at Microsoft
Automating Performance Monitoring at MicrosoftAutomating Performance Monitoring at Microsoft
Automating Performance Monitoring at MicrosoftThousandEyes
 
MongoDB & The McGraw-Hill Education Learning Analytics Platform
MongoDB & The McGraw-Hill Education Learning Analytics PlatformMongoDB & The McGraw-Hill Education Learning Analytics Platform
MongoDB & The McGraw-Hill Education Learning Analytics PlatformMongoDB
 
Eventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real WorldEventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real WorldBeyondTrees
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple stepsRenjith M P
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0Russell Jurney
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overviewAmit Juneja
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopQuantUniversity
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBLisa Roth, PMP
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic ResearchMiklos Koren
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleAaron (Ari) Bornstein
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionFabian Hadiji
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbMongoDB APAC
 
MySql_PlSQL_7yrs_CV
MySql_PlSQL_7yrs_CVMySql_PlSQL_7yrs_CV
MySql_PlSQL_7yrs_CVsathisha D R
 

Similar a PyCon Ukraine 2016: Maintaining a high load Python project for newcomers (20)

Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning service
 
Automating Performance Monitoring at Microsoft
Automating Performance Monitoring at MicrosoftAutomating Performance Monitoring at Microsoft
Automating Performance Monitoring at Microsoft
 
MongoDB & The McGraw-Hill Education Learning Analytics Platform
MongoDB & The McGraw-Hill Education Learning Analytics PlatformMongoDB & The McGraw-Hill Education Learning Analytics Platform
MongoDB & The McGraw-Hill Education Learning Analytics Platform
 
Eventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real WorldEventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real World
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Nagacv
NagacvNagacv
Nagacv
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshop
 
Resume
ResumeResume
Resume
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
 
Unsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at ScaleUnsupervised Aspect Based Sentiment Analysis at Scale
Unsupervised Aspect Based Sentiment Analysis at Scale
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to production
 
SQL Optimizer vs Hive
SQL Optimizer vs Hive SQL Optimizer vs Hive
SQL Optimizer vs Hive
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
MySql_PlSQL_7yrs_CV
MySql_PlSQL_7yrs_CVMySql_PlSQL_7yrs_CV
MySql_PlSQL_7yrs_CV
 



PyCon Ukraine 2016: Maintaining a high load Python project for newcomers

  • 1. Maintaining a high load Python project for newcomers Viacheslav Kakovskyi PyCon Ukraine 2016
  • 2. Me! @kakovskyi Python Developer at SoftServe Contributor of Atlassian HipChat — Python 2, Twisted Maintainer of KPIdata — Python 3, asyncio 2
  • 3. Agenda ● What project is `high load`? ● High loaded projects from my experience ● Case study: show last 5 feedbacks for a university course ● Developer's checklist ● Tools that help to make customers happy ● Summary ● Further reading 3
  • 4. What project is `high load`? 4
  • 5. What project is `high load`? ● 2+ nodes? ● 10 000 connections? ● 200 000 RPS? ● 1 000 000 daily active users? ● monitoring? ● scalability? ● continuous deployment? ● disaster recovery? ● sharding? ● clustering? ● ??? 5
  • 6. What project is `high load`? a project where an inefficient solution or a tiny bug has a huge impact on your business (due to a lack of resources)→ → causes an increase of costs $$$ or loss of reputation (due to performance degradation) 6
  • 7. High loaded Python projects from my experience ● Instant messenger: ○ 100 000+ connected users ○ 100+ nodes ○ 100+ developers ● Embedded system for traffic analysis: ○ scaling and upgrade options are unavailable 7
  • 8. Some examples of issues from my experience ● usage of a less-efficient library: json VS ujson ● usage of a more complex serialization format: XML vs JSON ● usage of a wrong data format for a certain case: JPEG vs BMP ● usage of a wrong protocol: TCP vs UDP ● usage of legacy code without understanding how it works under the hood: 100 PostgreSQL queries instead of 1 ● spawning a lot of objects that aren't destroyed by garbage collector ● ... ● deployment of a new feature which does not fit well with the load on your production environment 8
  • 9. Terms 9 ● Elasticsearch - a search server that provides a full-text search engine ● Redis - an in-memory data structure server ● Capacity planning - a process aimed to determine an amount of resources that will be needed over some future period of time ● StatsD - a daemon for stats aggregation ● Feature flag - an ability to turn on/off some functionality of an application without deployment
  • 10. Case study Let's imagine some application for assessing the quality of higher education ● A university has faculties ● A faculty has departments ● A department has directions ● A direction has groups ● A group has students ● A student learns courses 10
  • 11. Case study ● A student leaves feedback about courses ● Feedbacks are stored in Elasticsearch for full-text search A feedback looks like this: Introduction to Software Engineering. Faculty of Applied Math Good for ones who don't have any previous experience with programming and algorithms. Optional for prepared folks. They should request additional tasks to stay in a good shape. 11
  • 12. Case study: show recent 5 feedbacks for the course 12 INTRODUCTION TO SOFTWARE ENGINEERING 100500 Recent feedbacks Software engineering is about teams and it is about quality. The problems to solve are so complex or large, that a single developer cannot solve them anymore. See https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering Faculties
  • 13. Case study: obvious solution. Request the last 5 feedbacks directly from Elasticsearch:

        from elasticsearch import Elasticsearch

        es = Elasticsearch()

        def fetch_feedback(es, course_id, amount):
            query = _build_es_filter_query(doc_type='course', id=course_id, amount=amount)
            # blocking call to Elasticsearch
            entries = es.search(index='kpi', body=query)
            result = _validate_and_adapt(entries)
            return result
  • 14. Case study OK, just implement the solution, test on staging, and deploy to production. 14
  • 16. EsRejectedExecutionException [rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction]
  • 17. Case study: optimization Hypotheses: ● configure Elasticsearch properly for the case ● cache responses from Elasticsearch for some time ● use double writes: ○ write a feedback to Elasticsearch and Redis queue with a limited size ○ fetch from Redis at first 17
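The second hypothesis (caching Elasticsearch responses for some time) can be illustrated with a small time-based cache. This is only a sketch under stated assumptions: `TTLCache`, `get_or_compute`, and the injectable `clock` are hypothetical names that do not appear in the slides, and the cache lives in one process, so each node would keep its own copy.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after `ttl` seconds (hypothetical helper)."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: skip the expensive call
        value = compute()        # e.g. the blocking Elasticsearch query
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller could wrap the original function as `cache.get_or_compute(('course', course_id, amount), lambda: fetch_feedback(es, course_id, amount))`. The trade-off is staleness: a new feedback stays invisible until the entry expires.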
  • 18. Case study: prerequisites from our domain* ● up to 1000 characters allowed for a feedback ● 50 000 feedbacks expected just for Kyiv Polytechnic Institute every year ● 300 000+ applicants in 2016 ● 100+ universities in Ukraine if we decide to scale 18 *it's just an assumption for the example case study
  • 19. Case study: let's measure the current load on production Operations: ● add a feedback ● retrieve last 5 feedbacks ● find a feedback by a phrase 19
  • 20. Case study: let's measure the current load on production Application metrics: ● add a feedback ○ stats.count.feedback.course.added.es ○ stats.timing.feedback.course.added.es ● retrieve latest 5 feedbacks ○ stats.count.feedback.course.fetched.es ○ stats.timing.feedback.course.fetched.es ● find a feedback by a phrase ○ stats.count.feedback.course.found.es ○ stats.timing.feedback.course.found.es 20
  • 21. Case study: how to add a metric to your code

        from elasticsearch import Elasticsearch
        from statsd import StatsClient

        es = Elasticsearch()
        statsd = StatsClient()

        def fetch_feedback(statsd, es, course_id, amount):
            statsd.incr('feedback.course.fetched.es')
            # don't perform any query, just collect stats
            # (the code below is unreachable until this early return is removed)
            return
            query = _build_es_filter_query(doc_type='course', id=course_id, amount=amount)
            with statsd.timer('feedback.course.fetched.es'):
                # blocking call to Elasticsearch
                entries = es.search(index='kpi', body=query)
            result = _validate_and_adapt(entries)
            return result
  • 22. Case study: how to add a metric to your code

        def write_feedback_to_elasticsearch(statsd, es, course_id, doc):
            statsd.incr('feedback.course.added.es')
            with statsd.timer('feedback.course.added.es'):
                # blocking call to Elasticsearch
                result = es.index(index='kpi', doc_type='course', id=course_id, body=doc)

        def find_feedback(statsd, es, phrase, course_id=None):
            statsd.incr('feedback.course.found.es')
            query = _build_es_search_query(doc_type='course', id=course_id, phrase=phrase)
            with statsd.timer('feedback.course.found.es'):
                # blocking call to Elasticsearch
                entries = es.search(index='kpi', body=query)
            result = _validate_and_adapt(entries)
            return result
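To make the cost of these metric calls concrete, here is a minimal sketch of what a StatsD client puts on the wire: one short text datagram per metric, `<name>:<value>|c` for a counter and `<name>:<value>|ms` for a timing, sent over UDP (port 8125 by default). The helper names `format_counter`, `format_timing`, and `send_metric` are hypothetical; a real client library also handles prefixes, sampling, and batching.

```python
import socket


def format_counter(name, value=1):
    # StatsD counter line: <name>:<value>|c
    return f"{name}:{value}|c".encode("ascii")


def format_timing(name, millis):
    # StatsD timing line: <name>:<milliseconds>|ms
    return f"{name}:{millis}|ms".encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    # Fire-and-forget UDP datagram: the sender does not wait for a reply,
    # so a slow or absent StatsD daemon does not block the request path.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Because delivery is UDP, metrics can be lost under pressure, which is usually an acceptable trade for not slowing down the application.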
  • 23. Visualize metrics: RPS, feature-related operations [chart; series: Add feedback, Find feedback, Fetch feedback]
  • 24. Visualize metrics: Course feedback request performance [chart; series: Add feedback, Find feedback, Fetch feedback]
  • 25. Case study: visualize collected metrics Outcomes: ● we know frequency of operations ● we know timing of operations ● we know what to optimize ● we can perform a capacity planning for a new flow 25
  • 26. Optimization: double writes ● continue using Elasticsearch as a storage for feedbacks ● duplicate writing of a feedback to Elasticsearch and Redis ● store last 5 feedbacks in Redis for faster retrieval ● use Elasticsearch for custom queries and full-text search 26
  • 27. Optimization

        from elasticsearch import Elasticsearch

        es = Elasticsearch()
        REDIS_FEEDBACK_QUEUE_SIZE = 5

        def fetch_feedback(es, redis, course_id, amount):
            result = None
            if amount <= REDIS_FEEDBACK_QUEUE_SIZE:
                result = _fetch_feedback_from_redis(redis, course_id, amount)
            if not result:
                # Redis had nothing (or more entries were requested than it keeps)
                result = _fetch_feedback_from_elasticsearch(es, course_id, amount)
            return result
  • 28. Optimization

        def _fetch_feedback_from_elasticsearch(es, course_id, amount):
            query = _build_es_filter_query(doc_type='course', id=course_id, amount=amount)
            # blocking call to Elasticsearch
            entries = es.search(index='kpi', body=query)
            result = _validate_and_adapt(entries)
            return result

        def _fetch_feedback_from_redis(redis, course_id, amount):
            queue = redis.get_queue(entity='course', id=course_id)
            # blocking call to Redis
            result = queue.get(amount)
            return result
  • 29. Optimization

        def add_feedback(es, redis, course_id, doc):
            _write_feedback_to_redis(redis, course_id, doc)
            _write_feedback_to_elasticsearch(es, course_id, doc)

        def _write_feedback_to_elasticsearch(es, course_id, doc):
            # blocking call to Elasticsearch
            result = es.index(index='kpi', doc_type='course', id=course_id, body=doc)

        def _write_feedback_to_redis(redis, course_id, doc):
            queue = redis.get_queue(entity='course', id=course_id)
            # blocking call to Redis
            queue.push(doc)
  • 30. Optimization: potential impact on production ● Increased: ○ Insert feedback time ○ Redis capacity ○ Network traffic for Redis ● Reduced: ○ Fetch feedback time ○ Elasticsearch capacity ○ Network traffic for Elasticsearch 30
  • 32. Measure: timing of insert and fetch operations

        def _fetch_feedback_from_elasticsearch(statsd, es, course_id, amount):
            statsd.incr('feedback.course.fetched.es')
            query = _build_es_filter_query(doc_type='course', id=course_id, amount=amount)
            with statsd.timer('feedback.course.fetched.es'):
                # blocking call to Elasticsearch
                entries = es.search(index='kpi', body=query)
            result = _validate_and_adapt(entries)
            return result

        def _fetch_feedback_from_redis(statsd, redis, course_id, amount):
            statsd.incr('feedback.course.fetched.redis')
            queue = redis.get_queue(entity='course', id=course_id)
            with statsd.timer('feedback.course.fetched.redis'):
                # blocking call to Redis
                result = queue.get(amount)
            return result
  • 33. Measure: timing of insert and fetch operations

        def _write_feedback_to_elasticsearch(statsd, es, course_id, doc):
            statsd.incr('feedback.course.added.es')
            with statsd.timer('feedback.course.added.es'):
                # blocking call to Elasticsearch
                result = es.index(index='kpi', doc_type='course', id=course_id, body=doc)

        def _write_feedback_to_redis(statsd, redis, course_id, doc):
            statsd.incr('feedback.course.added.redis')
            queue = redis.get_queue(entity='course', id=course_id)
            with statsd.timer('feedback.course.added.redis'):
                # blocking call to Redis
                queue.push(doc)
  • 34. Measure: Redis capacity ● A feedback - up to 1000 characters ● Redis is used for storing 5 feedbacks per course ● 10 000 courses for Kyiv Polytechnic Institute ● Key: feedback:course:<course_id> ● Data structure: List ● Commands: ○ LPUSH - O(1) ○ LRANGE - O(S+N), S=0, N=5 ○ LTRIM - O(N) 34
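The key layout and the three list commands above could be combined as follows. This is a sketch, not the talk's implementation: `push_feedback` and `last_feedbacks` are hypothetical names, and they accept any client exposing `lpush`, `ltrim`, and `lrange` (redis-py's client has methods with these names). `InMemoryListClient` is a stand-in written only so the example runs without a Redis server; it mimics the commands for non-negative indexes.

```python
QUEUE_SIZE = 5  # feedbacks kept per course, as on the slide


def feedback_key(course_id):
    return f"feedback:course:{course_id}"


def push_feedback(client, course_id, text, size=QUEUE_SIZE):
    key = feedback_key(course_id)
    client.lpush(key, text)          # LPUSH: newest entry goes to the head
    client.ltrim(key, 0, size - 1)   # LTRIM: drop everything past the newest `size`


def last_feedbacks(client, course_id, amount=QUEUE_SIZE):
    # LRANGE with inclusive bounds, newest first
    return client.lrange(feedback_key(course_id), 0, amount - 1)


class InMemoryListClient:
    """Test stand-in (an assumption of this sketch), not a real Redis client."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, *values):
        items = self._lists.setdefault(key, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def ltrim(self, key, start, end):
        self._lists[key] = self._lists.get(key, [])[start:end + 1]
        return True

    def lrange(self, key, start, end):
        return list(self._lists.get(key, [])[start:end + 1])
```

With a real server the list can briefly hold `size + 1` entries between the two commands; sending them in one transaction would close that window. A real client also returns bytes unless it is configured to decode responses.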
  • 35. Measure: Redis capacity ● Don't trust benchmarks from the internet ● Run a benchmark for a production-like environment with your sample data ● Example: ○ FLUSHALL ○ define a sample feedback (string up to 1000 characters) ○ create N=10 000 lists with M=5 sample feedbacks ○ measure allocated memory ● You can run an approximated benchmark and calculate expected memory size 35
  • 36. Measure: Redis capacity ● 76.3 MB for 10000 courses, Kyiv Polytechnic Institute ● 7GB for 100 Ukrainian universities 36
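The extrapolation from one university to 100 is plain arithmetic, sketched below. Two assumptions are mine rather than the slides': every university is taken to have as many courses as Kyiv Polytechnic Institute, and a feedback character is counted as one byte when estimating the raw payload. The slides do not say whether "MB" is decimal or binary, so the result is kept in the same unit as the measurement.

```python
FEEDBACK_MAX_BYTES = 1000          # assumption: 1 byte per character
FEEDBACKS_PER_COURSE = 5
COURSES_PER_UNIVERSITY = 10_000
UNIVERSITIES = 100
MEASURED_MB_PER_UNIVERSITY = 76.3  # benchmark result quoted on the slide

# Lower bound: payload only, ignoring Redis keys and per-entry overhead.
raw_payload_mb = FEEDBACK_MAX_BYTES * FEEDBACKS_PER_COURSE * COURSES_PER_UNIVERSITY / 1e6

# How much larger the measured footprint is than the raw payload.
overhead_factor = MEASURED_MB_PER_UNIVERSITY / raw_payload_mb

# Linear scale-up to all universities, in the measurement's own unit.
total_mb = MEASURED_MB_PER_UNIVERSITY * UNIVERSITIES

print(f"raw payload: {raw_payload_mb:.1f} MB, overhead x{overhead_factor:.2f}")
print(f"total for {UNIVERSITIES} universities: about {total_mb:.0f} MB")
```

The total of roughly 7630 MB is in line with the rounded 7 GB on the slide, and the gap between the 50 MB payload bound and the 76.3 MB measurement shows why a benchmark on real sample data is preferable to a back-of-the-envelope payload estimate.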
  • 37. Measure: Network traffic for Redis ● Measure network traffic for send/receive operations: ○ add_feedback → LPUSH ○ fetch_feedback → LRANGE ● Revise Redis protocol (RESP) ● Calculate expected sent/received data for new Redis operations: ○ How much data sent for LPUSH ○ How much data received for LRANGE 37
  • 38. Measure: Network traffic for Redis

        from aioredis.util import encode_command

        # pass the command name, key, and value as separate arguments
        add_feedback = len(encode_command(b'LPUSH', b'feedback:course:100500', b'MY_AWESOME_FEEDBACK'))

    https://github.com/aio-libs/aioredis/blob/master/aioredis/util.py
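The same measurement can be done without any client library, because RESP encodes a command as an array of bulk strings: `*<argc>\r\n` followed by `$<length>\r\n<bytes>\r\n` for each argument. The sketch below is a hand-written encoder, not aioredis code; `encode_resp_command` and `traffic_mbps` are hypothetical names, and the request rate in the usage lines is an invented number for illustration only.

```python
def encode_resp_command(*args):
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def traffic_mbps(bytes_per_request, requests_per_second):
    """Payload bandwidth in megabits per second (protocol payload only)."""
    return bytes_per_request * 8 * requests_per_second / 1_000_000


# Worst case from the slides: a feedback of 1000 characters (1 byte each assumed).
lpush = encode_resp_command(b"LPUSH", b"feedback:course:100500", b"x" * 1000)
print(len(lpush))                  # 1053 bytes on the wire for one LPUSH
print(traffic_mbps(len(lpush), 50))  # 50 requests/s is a made-up rate
```

The figure excludes TCP/IP headers and the server's reply, so it is a lower bound on real traffic.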
  • 39. Measure: Network traffic for Redis (optional)* ● MAX add_feedback_traffic = 1.5 Mbps ● AVG add_feedback_traffic = 0.8 Mbps ● MAX fetch_feedback_traffic = 30 Mbps ● AVG fetch_feedback_traffic = 10 Mbps * This step is optional and depends on your architecture
  • 40. Summary of the investigation around double writes ● 90% of fetch feedback requests could be processed by Redis ● Initial issue when Elasticsearch is out of queue capacity should be avoided 40
  • 41. Summary of the investigation ● Reduced: ○ Fetch feedback time ■ 2 ms per fetch for 90% of cases ● Increased: ○ Insert feedback time ■ 16 ms per insert ○ Redis capacity ■ 76.3 MB for 10000 courses, Kyiv Polytechnic Institute ■ 7GB for 100 Ukrainian universities ○ Network traffic for Redis ■ 11 Mbps
  • 42. Making a decision ● Implement a prototype ● Discuss the collected stats with Ops ● Discuss them with the business owners as well ● Implement the solution ● Deploy under a feature flag
  • 43. Adding a feature flag

        from feature import Feature

        feature = Feature()

        def fetch_feedback(feature, statsd, es, redis, course_id, amount):
            result = None
            # the Redis queue holds 5 feedbacks
            if feature.is_enabled('fetch_feedback_from_redis') and amount <= REDIS_FEEDBACK_QUEUE_SIZE:
                result = _fetch_feedback_from_redis(statsd, redis, course_id, amount)
            if feature.is_enabled('fetch_feedback_from_elasticsearch') and not result:
                result = _fetch_feedback_from_elasticsearch(statsd, es, course_id, amount)
            return result
  • 44. Rolling the feature only for a subset of users 44
  • 45. Rolling the feature only for a subset of users 45
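One common way to enable a flag for only a subset of users is to hash each user into a stable bucket from 0 to 99 and compare it with the rollout percentage. The slides do not show how their feature-flag library selects users, so this is an illustrative sketch; `rollout_bucket` and `is_enabled_for` are hypothetical names.

```python
import hashlib


def rollout_bucket(feature_name, user_id):
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled_for(feature_name, user_id, percent):
    """True when the user falls inside the first `percent` buckets."""
    return rollout_bucket(feature_name, user_id) < percent
```

Hashing instead of drawing a random number keeps the decision stable across requests and nodes, and raising the percentage from 1 to 100 only ever adds users, so nobody flips back and forth during the rollout. Including the feature name in the hash keeps different features from always picking the same early users.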
  • 46. RPS. Feature "Fetch last 5 feedbacks about a course". Rolled out for 1% of users. [chart; series: Fetch from Elasticsearch, Fetch from Redis]
  • 47. Incremental rollout prevented the incident EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction] 47
  • 48. Investigation 48 ● Disable the feature ● Run investigation ○ Only recent feedbacks are retrieved from Redis ○ Legacy feedbacks are fetched directly from Elasticsearch ● Solution ○ Write legacy feedbacks to Redis using a background job
  • 49. Fixing missed data in Redis

        def fetch_feedback(feature, statsd, es, redis, course_id, amount):
            fetched_from_redis, result = False, None
            if feature.is_enabled('fetch_feedback_from_redis') and amount <= REDIS_FEEDBACK_QUEUE_SIZE:
                fetched_from_redis = True
                result = _fetch_feedback_from_redis(statsd, redis, course_id, amount)
            if feature.is_enabled('fetch_feedback_from_elasticsearch') and not result:
                result = _fetch_feedback_from_elasticsearch(statsd, es, course_id, amount)
                if fetched_from_redis:
                    # redis was empty for the course
                    fill_redis(redis, result, amount=REDIS_FEEDBACK_QUEUE_SIZE)
            return result
  • 50. RPS. Feature "Fetch last 5 feedbacks about a course". Fixed and rolled out for 1% of users. [chart; series: Fetch from Elasticsearch, Fetch from Redis]
  • 51. RPS. Feature "Fetch last 5 feedbacks about a course". Fixed and rolled out for 100% of users. [chart; series: Fetch from Elasticsearch, Fetch from Redis]
  • 52. Feature has been deployed for 100% users 52
  • 53. Developer's checklist for adding a feature to a high loaded project ● discover which services are hit by the feature ○ database ○ cache ○ storage ○ whatever ● measure the impact of the feature on the existing environment ○ call frequency ○ amount of memory ○ traffic ○ latency 53
  • 54. Developer's checklist for adding a feature to a high loaded project (2) ● calculate allowed load for the feature ○ requests per second for the existing environment ○ a timing of request processing ● calculate the additional load for the feature ○ latency for additional requests ○ how to deal with a lack of resources 54
  • 55. Developer's checklist for adding a feature to a high loaded project (3) ● discuss the acceptability of the solution ○ with peers ○ with Ops ○ with business owners ● consider alternatives if needed ● perform load testing on staging ● roll out the feature to production incrementally
  • 56. Tools that help to make customers happy ● profiling: ○ cProfile ○ kcachegrind ○ memory_profiler ○ guppy ○ objgraph 56
  • 57. Tools that help to make customers happy (2) ● metrics ○ StatsD ● graphs and dashboards ○ Grafana ○ Graphite ● logging ○ Elasticsearch ○ Logstash ○ Kibana
  • 58. Tools that help to make customers happy (3) ● feature flags: ○ Gargoyle and Gutter from Disqus ○ Flask-FeatureFlags ○ Switchboard ● alerting: ○ elastalert ○ monit ○ graphite-beacon ○ cabot 58
  • 60. Summary ● Be careful with calls to external services ● Collect metrics about the state of your production environment ● Perform capacity planning for "serious" changes ● Use application metrics and measure potential load ● Roll out new code incrementally with feature flags ● Set up proper monitoring; it can prevent the majority of incidents ● Use the tools; it's really easy ● Be ready to roll back fast
  • 61. To be continued ● asynchronous programming ● infrastructure as a service ● testing ● monitoring and alerting ● dealing with bursty traffic ● OS and hardware metrics ● scaling ● distributed applications ● continuous integration 61
  • 62. Further reading ● How HipChat Stores and Indexes Billions of Messages Using ● Continuous Deployment at Instagram ● How Twitter Uses Redis To Scale ● Why Leading Companies Dark Launch - LaunchDarkly Blog ● Lessons Learned From A Year Of Elasticsearch ... - Tech blog ● Notes on Redis Memory Usage ● Using New Relic to Understand Redis Performance: The 7 Key Metrics ● A guide to analyzing Python performance 62