Everyone dreams of being ‘Web Scale’, but we start out small. We — most of us — don’t launch a service and expect it to serve millions of requests from Day 1. This means that we don’t think about the ways in which our stack will blow up when the number of requests does start climbing. This talk lists simple patterns and checks that Development and Operations teams should implement from Day 1 in order to ensure a robust distributed system.
6. Monitoring 101: Logs!
• Request URL
• UUID
• Resource being queried
• Time taken (ms)
• Size of response (bytes)
• Human-readable identifiers for the data store, type of operation
7. Monitoring 101: Logs!
• Top 5 slowest DB calls
• $ sort -k6 -r -n <logs> | cut -f3- -d ' '
• Top 5 popular URLs
• $ sort -k4,4 -u <logs> | sort -k3 | cut -f3 -d ' ' | uniq -c | sort -k1 -n -r
• Top 5 routes making the maximum number of DB calls
• $ sort -k4 <logs> | cut -f2-4 -d ' ' | uniq -f1 -c | sort -k1 -n -r
20. Integration Points and Domino Effects
[Diagram: a typical stack (Web Server, App Server, DB, a Queue feeding an Analytics Consumer, plus outbound Network Calls), highlighting the integration points where cascading failures begin.]
21. Integration Points and Domino Effects
[The same stack diagram, with failing components marked X: a failure at one integration point knocks over its neighbours like dominoes.]
22. Timeouts and Circuit Breakers
[Circuit-breaker state diagram:
• Closed (everything is operational) moves to Open on failure / resource timeout
• Open (resource has failed): wait for some time; in the meanwhile, fail fast and gracefully degrade
• Open moves to Half-Open (has it recovered?) on attempt reset
• Half-Open moves back to Closed on success, or back to Open on failure]
25. Revisiting our Stack
[The stack diagram again, with each integration point annotated with the patterns that guard it (combinations of CB, T, GD and BP). Caption: "You Shall Not Pass"]
• CB: Circuit Breaker
• T: Timeouts
• BP: Back Pressure
• GD: Graceful Degradation
Hey guys, how's everyone today? I'm looking forward to some kick-ass sessions today and tomorrow! My name's Vedang.
Helpshift is a 3-year-old startup in the Mobile CRM space. As a startup, your greatest weapon is your agility: shipping faster is how you compete with established companies. Any structural and architectural process needs to be balanced against the need to keep shipping.
The aim of this talk is to discuss scalability patterns that balance these constraints: they are relatively simple to implement, and they bake resilience into the system.
So this is the agenda for my talk today.
Monitoring your system is important; I hope we're all agreed on that. Without monitoring, you are basically flying blind.
Having said that, monitoring systems can be complex to set up.
Logs are very effective for monitoring system behaviour!
For example, you can add logging around every network call you make, and record stats like this.
In a runtime like Clojure, you can even do this on the fly, so you can collect a reasonable sample in production and then turn the logging off to avoid the performance penalty.
With logs like this and simple UNIX tools like sort, cut, uniq and grep, you can gain deep insight into what your system is doing. These are the low-hanging fruit that you can fix quickly. If you know awk and sed as well, then you're a wizard and you can do what you want.
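To make the field positions concrete, here is one hypothetical log line carrying those stats (timestamp, level, URL, UUID, resource, time in ms, size in bytes, data store, operation); the exact layout is up to you, as long as the -k and -f field numbers in the commands above match it:

2015-06-10T10:22:31 INFO /api/issues 3f2ac81d mongo 12.4 532 issues-db find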
This is a simple macro to do something like this in Clojure.
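The macro itself isn't reproduced here; a minimal sketch of the idea, assuming clojure.tools.logging is on the classpath, might look like this:

(ns example.logging
  (:require [clojure.tools.logging :as log]))

;; Wrap any network/DB call, log the operation name and elapsed ms,
;; and return the call's result unchanged.
(defmacro with-call-logging
  [op-name & body]
  `(let [start# (System/nanoTime)
         result# (do ~@body)
         elapsed-ms# (/ (- (System/nanoTime) start#) 1e6)]
     (log/info ~op-name "took" elapsed-ms# "ms")
     result#))

;; Usage (find-user is hypothetical):
;; (with-call-logging "db.users/find" (find-user db user-id))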
Unbounded calls are calls where you haven't put a bound on the size of the response or on the number of requests you make. No matter how hard you try, your Dev and QA environments are never going to match production. Real-world usage is hard to predict, and we rarely think about the effects of data piling up over time.
Build default batch sizes into your DB request abstractions.
Real World: I have seen programmers explicitly override this and ask the system to "return everything". Catch this in Code Review!
Build abstractions and contracts for chunked requests: limit-skip, total count.
=scan-and-scroll=
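A sketch of what such an abstraction could look like in Clojure; default-batch-size, query-fn, and the map keys are assumptions for illustration, not a real client API:

;; A default batch size is always applied; "return everything" is not
;; something the caller can express by accident.
(def default-batch-size 100)

;; Lazily walk a result set in fixed-size chunks (limit/skip style),
;; in the spirit of scan-and-scroll.
(defn scan-all [query-fn]
  (letfn [(step [offset]
            (lazy-seq
             (let [chunk (query-fn {:offset offset
                                    :limit default-batch-size})]
               (when (seq chunk)
                 (concat chunk (step (+ offset (count chunk))))))))]
    (step 0)))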
I don’t know if it’s a functional paradigm thing, but a _lot_ of code gets written without any thought about the side-effects.
Catch this in Code Review and fix your functions.
Facebook recently open-sourced a library called Haxl, which provides developers with safe abstractions for accessing remote data.
When you are building your data structures, think about how your data will flow through the system today, as well as in the future you are planning for. Every message queue and cache it passes through imposes a serialization/deserialization penalty. The slide shows an example of a data structure which stores dates as objects versus one which stores them as longs; the second is more than twice as fast to serialize and deserialize.
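A rough illustration of the idea (not the slide's exact code):

;; Same event, two representations of the date.
(def event-as-object {:id 42 :created-at (java.util.Date.)})
(def event-as-long   {:id 42 :created-at (System/currentTimeMillis)})

;; The long round-trips through any wire format (JSON, edn, msgpack)
;; with no custom handlers; the Date object typically needs a
;; format-specific encoder and decoder at every hop.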
When you’re young and write things like “became Core Java expert in 6 months” on your resume, you also believe that the Network “just works”.
Once you start working with Distributed Systems though…
We’re going to have full sessions dedicated to network flakiness, I trust that everyone here will definitely attend them. For the purposes of this talk, I’d just like to say…
Just avoid them. Network calls are slow, and they are the number 1 reason for cascading failures in your system. If your data size is small and it doesn’t change too often, cache it in memory. If your data size is large, cache it on local disk.
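For the small, rarely-changing case, even an atom is enough. A minimal sketch, where fetch-fn stands in for your (assumed) expensive network call:

(defonce cache (atom nil))

;; Fetch once, then serve from memory; refresh by resetting the atom.
;; (A race on first use may fetch twice, which is harmless here.)
(defn cached-fetch [fetch-fn]
  (or @cache
      (reset! cache (fetch-fn))))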
Integration points are the #1 cause of cascading failure in the system.
(Explain the diagram.)
Two powerful patterns help combat cascading failures: timeouts and circuit breakers.
Every call to a resource should be configured to time out.
Make sure that the default timeouts are sane (e.g., Monger).
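When a client library doesn't expose a timeout option, a generic wrapper is a crude fallback. A sketch:

;; Run f on another thread; give up after timeout-ms.
;; Prefer the library's native connect/socket timeouts when they exist.
(defn call-with-timeout [timeout-ms f]
  (let [fut    (future (f))
        result (deref fut timeout-ms ::timed-out)]
    (if (= result ::timed-out)
      (do (future-cancel fut)
          (throw (ex-info "resource call timed out"
                          {:timeout-ms timeout-ms})))
      result)))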
Circuit breakers track the health of your resource, and avoid badgering it when it is unresponsive.
You can now fall back to a secondary source, or just fail fast.
If you know you will fail eventually, you might as well fail immediately.
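To make the state machine from slide 22 concrete, here is a toy sketch (an atom holding the state), just to illustrate the transitions:

(defn make-breaker
  "Opens after max-failures consecutive failures; allows a trial
  (half-open) call once reset-ms have elapsed."
  [max-failures reset-ms]
  (atom {:state :closed :failures 0 :opened-at 0
         :max-failures max-failures :reset-ms reset-ms}))

(defn call! [breaker f]
  (let [{:keys [state opened-at max-failures reset-ms]} @breaker
        now (System/currentTimeMillis)]
    (if (and (= state :open) (< (- now opened-at) reset-ms))
      (throw (ex-info "circuit open: failing fast" {}))    ;; Open: fail fast
      (try
        (let [result (f)]
          (swap! breaker assoc :state :closed :failures 0) ;; success closes it
          result)
        (catch Exception e
          (swap! breaker
                 (fn [{:keys [failures] :as b}]
                   (let [n (inc failures)]
                     (if (>= n max-failures)               ;; trip to Open
                       (assoc b :state :open :failures n :opened-at now)
                       (assoc b :failures n)))))
          (throw e))))))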
Figure out which circuit breakers you need.
Hystrix gives you a nice implementation of circuit breakers, along with a whole host of other scalability patterns.
A Health Check is a way to tell if your production service is responsive or not, and is essential to support features like auto-scaling. The idea is to wait until the machine passes the health check before sending production traffic to it.
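A minimal Ring-style sketch; ping-db! is an assumed function that cheaply checks the service's critical dependency:

;; Returns 200 only when the dependency answers; the load balancer
;; holds back production traffic until it does.
(defn health-handler [ping-db!]
  (fn [_request]
    (if (try (ping-db!) (catch Exception _ false))
      {:status 200 :headers {} :body "ok"}
      {:status 503 :headers {} :body "unhealthy"})))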
Circuit breakers also act as a poor man’s health check.
With our patterns in place, we can contain failures and stop them from infecting the rest of the system.