SlideShare una empresa de Scribd logo
1 de 58
Descargar para leer sin conexión
Auto-scaling your API
Insights and Tips from the Zalando Team
JBCNConf 2015

Barcelona, Spain
Sean Patrick Floyd - @oldJavaGuy

Luis Mineiro - @voidmaze
15 countries
3 fulfilment centers
15+ million active customers
2.2+ billion € revenue 2014
130+ million visits per month
8.000+ employees
ONE of EUROPE’S LARGEST ONLINE FASHION RETAILERS
Visit us: tech.zalando.com
Tech hubs in Berlin, Dublin, Dortmund
and Helsinki
Our Scale
Based on https://flic.kr/p/bX5E4c
2 datacenters
thousands of production instances
serving 15 countries
Zalando stack
Credits to our colleague Kolja Wilcke
Credits to our colleague Kolja Wilcke
Credits to our colleague Kolja Wilcke
The Shop
Monolith
We call it “Jimmy”
http://blog.codinghorror.com/new-programming-jargon/
Thousands of Java classes, undocumented features
Business logic on all layers (including the database)
https://flic.kr/p/nBhvfy
https://flic.kr/p/tW4sus
It’s 2013…



Zalando wants to
sponsor hackathons…
We need a REST API!
Hackathon

API
First Step
Let’s create some REST-(ish) Spring
controllers inside our monolithic web
application
Deploy a couple of Jimmy instances just to
serve requests for this “API”
Pros
The infrastructure is already there!
A few days of coding and we’re all set
Cons
This “API” cannot be deployed independently
of Jimmy.
Jimmy is infested with business logic
everywhere.
New
Requirements
New Requirements
We needed this new API for our existing
frontends, also (very high traffic).
Some third-party apps also wanted to use this
API as their backend.
Shop Public API
We decided to build a “real” API as a separate
standalone project.
Of course, there were some dependencies…
Data Sources
1.Catalog - SOLr
2.Stock - Memcached
3.Reviews - REST(-ish) API
4.Recommendations - RPC
5.Database - PostgreSQL
Shop Public API
Catalog Stock Reviews Recommendations Database
All of them in the same data centers.
This should be no problem…
Tech
Company
Paradigm Shift
Until 2014
We’re a Fashion Company that has a lot 

of tech knowledge
2015
Suddenly we’re a Tech Company, providing
“Fashion as a Service”
We established
5 principles
API FIRST
API First
Document and peer review your API in a
format like Swagger before writing a single
line of code.
Ideally, either generate either your server
interfaces or your test data (or both) from the
Swagger spec
REST
REST
Manipulate resources, don't call methods
Expose nouns, not verbs
Use HTTP verbs
GET, PUT, POST, DELETE, PATCH
SAAS
Identity Management
Our backend services don’t expose APIs yet
Company-wide IAM strategy is not ready yet
Can only expose read-only features for now
MICRO-
SERVICES
Microservices
We already have a Service-Oriented
Architecture
It was mostly SOAP, though…
And definitely not micro
CLOUD
Road to AWS
Some challenges ahead…
1. Backend services not available yet
2. SOLr also not available
3. We can’t just move the databases there
How we

did it
Challenges
Catalog and Stock datasources are latency critical
and they’re owned by different teams!
Can we solve that?
Step 1 - move critical datasources to AWS
“Move” Data Sources to AWS
We can’t just move them!
Jimmy also needs them in the data centers,
you insensitive clod!
Ok… we just replicate them.
and then…?
Replicating Data Sources
SOLr has its own replication mechanism
over HTTP
memcached should be easy …
You wish!
Step 2 - Build memcached replication
Stock Relay
Memcached #1
Memcached #2
Memcached #3
Stock Relay(s)
SNS Topic
…
Datacenter
eu-central-1
Memcached ElasticCacheStock Repeater(s) Shop Public APISQS Queue
eu-west-1
Memcached ElasticCacheStock Repeater(s) Shop Public APISQS Queue
us-east-1
Memcached ElasticCacheStock Repeater(s) Shop Public APISQS Queue
SET / DELETE
GET
GET
GET
and then…?
AWS SOLr Repeater
Datacenter Master
eu-central-1
eu-west-1
us-east-1
SOLr Slaves Shop Public APIRegion Repeater
SOLr Slaves Shop Public APIAWS Master Repeater
SOLr Slaves Shop Public APIRegion Repeater
Autoscaling SOLr and API
http://www.docstoc.com/docs/109290533/Lucid-Imagination# by Erick Erickson
Region repeater
Slave #1
Autoscaling Group
Slave #2
Slave n
API
Node #1
Autoscaling Group
API
Node #2
API
Node n
https://api.zalando.com
Scaling Limitations
Edited from https://flic.kr/p/zuBXM
Region repeater
Autoscaling Group
Slave #1
Slave #2
Slave #3
Slave #3
Slave #3
Slave #3
Slave #11
Slave #18
Slave #3
Slave #34Slave #3
Slave #3
Slave #59
Slave #3
Slave #3
Slave #123
Slave #3
Slave #3
Slave #3
Slave #3
Slave #3
Slave #325
Slave #3
Slave #3
Slave #3
Slave #3
Slave
#42815
Y U no scale!
S3 Bucket for Replication
Datacenter Master
eu-central-1
eu-west-1
us-east-1
SOLr Slaves Shop Public API
Master Repeater SOLr Slaves Shop Public API
SOLr Slaves Shop Public API
S3replicationS3replication
I see…
“Onion layers” for Replication
• Start with a single repeater layer -
layer0.solr
• Setup a Route53 CNAME repeater
that links to it - repeater.solr
• Entire slave fleet in the ASG is
always configured to replicate from
repeater.solr
CNAME
repeater
Slave #1
Autoscaling Group
Slave #2
Slave n
slave.solr
layer0-repeater.solr
“Onion layers” for Replication
• Setup CloudWatch alarms for
relevant metrics in the repeater layer.
Give them some slack space.
• Configure it to send notifications to
the onion-layers-topic.
• Bring up your instance of onion-
layers.
CloudWatch
CNAME
repeater
Slave #1
Autoscaling Group
Slave #2
Slave n
slave.solr
layer0-repeater.solr
“Onion layers” for Replication
• onion-layers creates a new ASG
layer. Calculates currentLayer =
currentLayer + 1
• The new layer starts replicating from
layer<currentLayer-1>-repeater
• Adds a new Route53 recordset
layer<currentLayer>-repeater.
CNAME
repeater
CloudWatch
Autoscaling Group
layer1-repeater.solr
layer0-repeater.solr onion-layers
Autoscaling Group
slave.solr
“Onion layers” for Replication
• After replication, the CNAME
repeater is updated to link to
layer<currentLayer>-repeater.
• onion-layers acts on alarms for the
connection between current layer
and previous layer.
• You can still have your own AS
alarms for the connection between
slaves and current layer.
CNAME
repeater
CloudWatch
Autoscaling Group
layer1-repeater.solr
layer0-repeater.solr
Autoscaling Group
slave.solr
Yes. These tools will be open-sourced soon
Thank you for listening
Check out our blog https://tech.zalando.com
Our many open source products https://github.com/zalando
The STUPS stack https://stups.io
Got more questions? You can reach us on twitter
@ZalandoTech
We’re hiring!
Special thanks to Jessie Dude.
No Continuum Transfunctioners were harmed during the production of these slides.

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Scapyで作る・解析するパケット
Scapyで作る・解析するパケットScapyで作る・解析するパケット
Scapyで作る・解析するパケット
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Autoscaling with Kubernetes
Autoscaling with KubernetesAutoscaling with Kubernetes
Autoscaling with Kubernetes
 
分散トレーシング技術について(Open tracingやjaeger)
分散トレーシング技術について(Open tracingやjaeger)分散トレーシング技術について(Open tracingやjaeger)
分散トレーシング技術について(Open tracingやjaeger)
 
DDD&Scalaで作られたプロダクトはその後どうなったか?(Current state of products made with DDD & Scala)
DDD&Scalaで作られたプロダクトはその後どうなったか?(Current state of products made with DDD & Scala)DDD&Scalaで作られたプロダクトはその後どうなったか?(Current state of products made with DDD & Scala)
DDD&Scalaで作られたプロダクトはその後どうなったか?(Current state of products made with DDD & Scala)
 
Vue.js で XSS
Vue.js で XSSVue.js で XSS
Vue.js で XSS
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
 
Linux on Power と x86 Linux との技術的な相違点
Linux on Power と x86 Linux との技術的な相違点Linux on Power と x86 Linux との技術的な相違点
Linux on Power と x86 Linux との技術的な相違点
 
比較サイトの検索改善(SPA から SSR に変換)
比較サイトの検索改善(SPA から SSR に変換)比較サイトの検索改善(SPA から SSR に変換)
比較サイトの検索改善(SPA から SSR に変換)
 
OpenAPIを利用したPythonWebアプリケーション開発
OpenAPIを利用したPythonWebアプリケーション開発OpenAPIを利用したPythonWebアプリケーション開発
OpenAPIを利用したPythonWebアプリケーション開発
 
ストリーム処理勉強会 大規模mqttを支える技術
ストリーム処理勉強会 大規模mqttを支える技術ストリーム処理勉強会 大規模mqttを支える技術
ストリーム処理勉強会 大規模mqttを支える技術
 
マイクロサービスに至る歴史とこれから - XP祭り2021
マイクロサービスに至る歴史とこれから - XP祭り2021マイクロサービスに至る歴史とこれから - XP祭り2021
マイクロサービスに至る歴史とこれから - XP祭り2021
 
AWSで透過プロキシをやってみた
AWSで透過プロキシをやってみたAWSで透過プロキシをやってみた
AWSで透過プロキシをやってみた
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introduction
 
Scaling Push Messaging for Millions of Devices @Netflix
Scaling Push Messaging for Millions of Devices @NetflixScaling Push Messaging for Millions of Devices @Netflix
Scaling Push Messaging for Millions of Devices @Netflix
 
React & GraphQL
React & GraphQLReact & GraphQL
React & GraphQL
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
openstack+cephインテグレーション
openstack+cephインテグレーションopenstack+cephインテグレーション
openstack+cephインテグレーション
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 

Similar a Auto-scaling your API: Insights and Tips from the Zalando Team

SAP portal: breaking and forensicating
SAP portal: breaking and forensicating SAP portal: breaking and forensicating
SAP portal: breaking and forensicating
ERPScan
 
AWS Startup Webinar | Developing on AWS
AWS Startup Webinar | Developing on AWSAWS Startup Webinar | Developing on AWS
AWS Startup Webinar | Developing on AWS
Amazon Web Services
 

Similar a Auto-scaling your API: Insights and Tips from the Zalando Team (20)

Increasing velocity via serless semantics
Increasing velocity via serless semanticsIncreasing velocity via serless semantics
Increasing velocity via serless semantics
 
Varun-CV-J
Varun-CV-JVarun-CV-J
Varun-CV-J
 
SAP portal: breaking and forensicating
SAP portal: breaking and forensicating SAP portal: breaking and forensicating
SAP portal: breaking and forensicating
 
Serverless Microservices
Serverless MicroservicesServerless Microservices
Serverless Microservices
 
AWS Serverless API Management - Meetup
AWS Serverless API Management - MeetupAWS Serverless API Management - Meetup
AWS Serverless API Management - Meetup
 
Oracle Code Javaday Sao Paulo Nowadays Architecture Trends, from Monolith to ...
Oracle Code Javaday Sao Paulo Nowadays Architecture Trends, from Monolith to ...Oracle Code Javaday Sao Paulo Nowadays Architecture Trends, from Monolith to ...
Oracle Code Javaday Sao Paulo Nowadays Architecture Trends, from Monolith to ...
 
DevOps Tooling - Pop-up Loft TLV 2017
DevOps Tooling - Pop-up Loft TLV 2017DevOps Tooling - Pop-up Loft TLV 2017
DevOps Tooling - Pop-up Loft TLV 2017
 
AWS Summit Barcelona 2015 - Introducing Amazon API Gateway
AWS Summit Barcelona 2015 - Introducing Amazon API GatewayAWS Summit Barcelona 2015 - Introducing Amazon API Gateway
AWS Summit Barcelona 2015 - Introducing Amazon API Gateway
 
Effective DevSecOps
Effective DevSecOpsEffective DevSecOps
Effective DevSecOps
 
Cloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a CacheCloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a Cache
 
Open-Falcon: A Distributed and High-Performance Monitoring System
Open-Falcon: A Distributed and High-Performance Monitoring SystemOpen-Falcon: A Distributed and High-Performance Monitoring System
Open-Falcon: A Distributed and High-Performance Monitoring System
 
AWS Observability Made Simple
AWS Observability Made SimpleAWS Observability Made Simple
AWS Observability Made Simple
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult Steps
 
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg
PLNOG15: The Power of the Open Standards SDN API’s - Mikael Holmberg
 
Knolx session
Knolx sessionKnolx session
Knolx session
 
Oracle Developer Tour Latam Nowadays Architecture Trends, from Monolith to Mi...
Oracle Developer Tour Latam Nowadays Architecture Trends, from Monolith to Mi...Oracle Developer Tour Latam Nowadays Architecture Trends, from Monolith to Mi...
Oracle Developer Tour Latam Nowadays Architecture Trends, from Monolith to Mi...
 
Alberto Paro - Hands on Scala.js
Alberto Paro - Hands on Scala.jsAlberto Paro - Hands on Scala.js
Alberto Paro - Hands on Scala.js
 
Scala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJSScala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJS
 
Securing Serverless Architecture
Securing Serverless ArchitectureSecuring Serverless Architecture
Securing Serverless Architecture
 
AWS Startup Webinar | Developing on AWS
AWS Startup Webinar | Developing on AWSAWS Startup Webinar | Developing on AWS
AWS Startup Webinar | Developing on AWS
 

Más de Zalando Technology

Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Zalando Technology
 

Más de Zalando Technology (14)

Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
 
How We Made our Tech Organization and Architecture Converge Towards Scalability
How We Made our Tech Organization and Architecture Converge Towards ScalabilityHow We Made our Tech Organization and Architecture Converge Towards Scalability
How We Made our Tech Organization and Architecture Converge Towards Scalability
 
Powering Radical Agility with Docker
Powering Radical Agility with Docker Powering Radical Agility with Docker
Powering Radical Agility with Docker
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
 
High Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando PatroniHigh Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando Patroni
 
Reactive Design Patterns: a talk by Typesafe's Dr. Roland Kuhn
Reactive Design Patterns: a talk by Typesafe's Dr. Roland KuhnReactive Design Patterns: a talk by Typesafe's Dr. Roland Kuhn
Reactive Design Patterns: a talk by Typesafe's Dr. Roland Kuhn
 
Zalando Tech: From Java to Scala in Less Than Three Months
Zalando Tech: From Java to Scala in Less Than Three MonthsZalando Tech: From Java to Scala in Less Than Three Months
Zalando Tech: From Java to Scala in Less Than Three Months
 
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj TalkSpark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
 
Building a Reactive RESTful API with Akka Http & Slick
Building a Reactive RESTful API with Akka Http & SlickBuilding a Reactive RESTful API with Akka Http & Slick
Building a Reactive RESTful API with Akka Http & Slick
 
Radical Agility with Autonomous Teams and Microservices
Radical Agility with Autonomous Teams and MicroservicesRadical Agility with Autonomous Teams and Microservices
Radical Agility with Autonomous Teams and Microservices
 
Order Processing at Scale: Zalando at Camunda Community Day
Order Processing at Scale: Zalando at Camunda Community DayOrder Processing at Scale: Zalando at Camunda Community Day
Order Processing at Scale: Zalando at Camunda Community Day
 
ZMON: Monitoring Zalando's Engineering Platform
ZMON: Monitoring Zalando's Engineering PlatformZMON: Monitoring Zalando's Engineering Platform
ZMON: Monitoring Zalando's Engineering Platform
 
Mobile Testing Challenges at Zalando Tech
Mobile Testing Challenges at Zalando TechMobile Testing Challenges at Zalando Tech
Mobile Testing Challenges at Zalando Tech
 
Radical Agility with Autonomous Teams and Microservices in the Cloud
Radical Agility with Autonomous Teams and Microservices in the CloudRadical Agility with Autonomous Teams and Microservices in the Cloud
Radical Agility with Autonomous Teams and Microservices in the Cloud
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Auto-scaling your API: Insights and Tips from the Zalando Team

  • 1. Auto-scaling your API Insights and Tips from the Zalando Team JBCNConf 2015
 Barcelona, Spain Sean Patrick Floyd - @oldJavaGuy
 Luis Mineiro - @voidmaze
  • 2. 15 countries 3 fulfilment centers 15+ million active customers 2.2+ billion € revenue 2014 130+ million visits per month 8.000+ employees ONE of EUROPE’S LARGEST ONLINE FASHION RETAILERS Visit us: tech.zalando.com Tech hubs in Berlin, Dublin, Dortmund and Helsinki
  • 3. Our Scale Based on https://flic.kr/p/bX5E4c 2 datacenters thousands of production instances serving 15 countries
  • 4. Zalando stack Credits to our colleague Kolja Wilcke
  • 5. Credits to our colleague Kolja Wilcke
  • 6.
  • 7. Credits to our colleague Kolja Wilcke
  • 9. We call it “Jimmy” http://blog.codinghorror.com/new-programming-jargon/
  • 10. Thousands of Java classes, undocumented features Business logic on all layers (including the database) https://flic.kr/p/nBhvfy
  • 11. https://flic.kr/p/tW4sus It’s 2013…
 
 Zalando wants to sponsor hackathons… We need a REST API!
  • 12.
  • 14. First Step Let’s create some REST-(ish) Spring controllers inside our monolithic web application Deploy a couple of Jimmy instances just to serve requests for this “API”
  • 15. Pros The infrastructure is already there! A few days of coding and we’re all set
  • 16. Cons This “API” cannot be deployed independently of Jimmy. Jimmy is infested with business logic everywhere.
  • 18.
  • 19. New Requirements We needed this new API for our existing frontends, also (very high traffic). Some third-party apps also wanted to use this API as their backend.
  • 20. Shop Public API We decided to build a “real” API as a separate standalone project. Of course, there were some dependencies…
  • 21. Data Sources 1.Catalog - SOLr 2.Stock - Memcached 3.Reviews - REST(-ish) API 4.Recommendations - RPC 5.Database - PostgreSQL Shop Public API Catalog Stock Reviews Recommendations Database
  • 22. All of them in the same data centers. This should be no problem…
  • 24. Paradigm Shift Until 2014 We’re a Fashion Company that has a lot 
 of tech knowledge 2015 Suddenly we’re a Tech Company, providing “Fashion as a Service”
  • 27. API First Document and peer review your API in a format like Swagger before writing a single line of code. Ideally, either generate either your server interfaces or your test data (or both) from the Swagger spec
  • 28. REST
  • 29. REST Manipulate resources, don't call methods Expose nouns, not verbs Use HTTP verbs GET, PUT, POST, DELETE, PATCH
  • 30. SAAS
  • 31. Identity Management Our backend services don’t expose APIs yet Company-wide IAM strategy is not ready yet Can only expose read-only features for now
  • 33. Microservices We already have a Service-Oriented Architecture It was mostly SOAP, though… And definitely not micro
  • 34. CLOUD
  • 35. Road to AWS Some challenges ahead… 1. Backend services not available yet 2. SOLr also not available 3. We can’t just move the databases there
  • 37. Challenges Catalog and Stock datasources are latency critical and they’re owned by different teams!
  • 38. Can we solve that?
  • 39. Step 1 - move critical datasources to AWS
  • 40. “Move” Data Sources to AWS We can’t just move them! Jimmy also needs them in the data centers, you insensitive clod! Ok… we just replicate them.
  • 42. Replicating Data Sources SOLr has its own replication mechanism over HTTP memcached should be easy …
  • 44. Step 2 - Build memcached replication
  • 45. Stock Relay Memcached #1 Memcached #2 Memcached #3 Stock Relay(s) SNS Topic … Datacenter eu-central-1 Memcached ElasticCacheStock Repeater(s) Shop Public APISQS Queue eu-west-1 Memcached ElasticCacheStock Repeater(s) Shop Public APISQS Queue us-east-1 Memcached ElasticCacheStock Repeater(s) Shop Public APISQS Queue SET / DELETE GET GET GET
  • 47. AWS SOLr Repeater Datacenter Master eu-central-1 eu-west-1 us-east-1 SOLr Slaves Shop Public APIRegion Repeater SOLr Slaves Shop Public APIAWS Master Repeater SOLr Slaves Shop Public APIRegion Repeater
  • 48. Autoscaling SOLr and API http://www.docstoc.com/docs/109290533/Lucid-Imagination# by Erick Erickson Region repeater Slave #1 Autoscaling Group Slave #2 Slave n API Node #1 Autoscaling Group API Node #2 API Node n https://api.zalando.com
  • 49. Scaling Limitations Edited from https://flic.kr/p/zuBXM Region repeater Autoscaling Group Slave #1 Slave #2 Slave #3 Slave #3 Slave #3 Slave #3 Slave #11 Slave #18 Slave #3 Slave #34Slave #3 Slave #3 Slave #59 Slave #3 Slave #3 Slave #123 Slave #3 Slave #3 Slave #3 Slave #3 Slave #3 Slave #325 Slave #3 Slave #3 Slave #3 Slave #3 Slave #42815
  • 50. Y U no scale!
  • 51. S3 Bucket for Replication Datacenter Master eu-central-1 eu-west-1 us-east-1 SOLr Slaves Shop Public API Master Repeater SOLr Slaves Shop Public API SOLr Slaves Shop Public API S3replicationS3replication
  • 53. “Onion layers” for Replication • Start with a single repeater layer - layer0.solr • Setup a Route53 CNAME repeater that links to it - repeater.solr • Entire slave fleet in the ASG is always configured to replicate from repeater.solr CNAME repeater Slave #1 Autoscaling Group Slave #2 Slave n slave.solr layer0-repeater.solr
  • 54. “Onion layers” for Replication • Setup CloudWatch alarms for relevant metrics in the repeater layer. Give them some slack space. • Configure it to send notifications to the onion-layers-topic. • Bring up your instance of onion- layers. CloudWatch CNAME repeater Slave #1 Autoscaling Group Slave #2 Slave n slave.solr layer0-repeater.solr
  • 55. “Onion layers” for Replication • onion-layers creates a new ASG layer. Calculates currentLayer = currentLayer + 1 • The new layer starts replicating from layer<currentLayer-1>-repeater • Adds a new Route53 recordset layer<currentLayer>-repeater. CNAME repeater CloudWatch Autoscaling Group layer1-repeater.solr layer0-repeater.solr onion-layers Autoscaling Group slave.solr
  • 56. “Onion layers” for Replication • After replication, the CNAME repeater is updated to link to layer<currentLayer>-repeater. • onion-layers acts on alarms for the connection between current layer and previous layer. • You can still have your own AS alarms for the connection between slaves and current layer. CNAME repeater CloudWatch Autoscaling Group layer1-repeater.solr layer0-repeater.solr Autoscaling Group slave.solr
  • 57. Yes. These tools will be open-sourced soon
  • 58. Thank you for listening Check out our blog https://tech.zalando.com Our many open source products https://github.com/zalando The STUPS stack https://stups.io Got more questions? You can reach us on twitter @ZalandoTech We’re hiring! Special thanks to Jessie Dude. No Continuum Transfunctioners were harmed during the production of these slides.