This document discusses using Netflix's microservices stack on AWS. It describes Netflix's architecture of using hundreds of microservices across multiple regions to handle billions of requests per day. It outlines the principles of Netflix's stack including stateless services, auto-scaling, no single points of failure, and designing for failures. Key technologies in Netflix's open source stack are explained like Eureka for service discovery, Ribbon for load balancing, Hystrix for latency and fault tolerance, RxJava for reactive programming, and Dynomite for distributed caching. Chaos engineering practices like fault injection testing are also covered.
5. Why Netflix?
Billions of requests per day
~1/3 of US internet bandwidth
~10k EC2 Instances
Multi-Region
100s of Microservices
Innovation + Solid Service
SOA, Microservices and DevOps Benchmark
Social Products: Social Networks, Video, Docs, Apps, Chat
Scalability
Distributed Teams
Could reach some Web Scale
Netflix's problems could become my problems
8. Principles
Stateless Services
Ephemeral Instances
Everything fails all the time
Auto Scaling / Down Scaling
Multi-AZ and Multi-Region
No SPOF (Single Point of Failure)
Design for Failure (expected)
SOA
Microservices
No Central Database
NoSQL
Lightweight Serializable Objects
Latency-tolerant Protocols
DevOps Enabler
Immutable Infrastructure
Anti-Fragility
32. RxJava: Reactive Extensions for the JVM
Async/event-based programming
Observer Pattern
Less than 1 MB
Heavily used by the Netflix OSS stack
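RxJava's push-based model generalizes the Observer pattern mentioned above. The real RxJava API (Observable, Subscriber, operators) is far richer, but a minimal plain-Java sketch of the underlying pattern might look like this (all names here are illustrative, not RxJava's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal push-based observable: subscribers register callbacks
// and are notified whenever a new value is emitted.
class SimpleObservable<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    void subscribe(Consumer<T> onNext) {
        subscribers.add(onNext);
    }

    void emit(T value) {
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(value);
        }
    }
}
```

Usage: `clicks.subscribe(v -> handle(v)); clicks.emit(event);` — the producer pushes values to consumers instead of consumers polling, which is the async/event-based style the slide refers to.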
33. Archaius
Configuration Management Solution
Dynamic and Typed Properties
High Throughput and Thread Safety
Callbacks: Notifications of config changes
JMX Beans
Dynamic Config Sources: File, DB, DynamoDB, ZooKeeper
Based on Apache Commons Configuration
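Archaius's core idea is a typed property whose value can change at runtime, with callbacks firing on change. A plain-Java sketch of that idea (not Archaius's real API, whose entry point is `DynamicPropertyFactory`; this class and its methods are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative dynamic typed property: the value can be swapped at
// runtime and registered callbacks are notified on every change.
// Archaius layers this idea over pluggable config sources.
class DynamicProperty<T> {
    private final AtomicReference<T> value;
    private final List<Runnable> callbacks = new ArrayList<>();

    DynamicProperty(T defaultValue) {
        this.value = new AtomicReference<>(defaultValue);
    }

    T get() { return value.get(); }

    void addCallback(Runnable callback) { callbacks.add(callback); }

    // Called by a config-source poller when the property changes.
    void update(T newValue) {
        value.set(newValue);
        callbacks.forEach(Runnable::run);
    }
}
```

Code reads the property via `get()` on every use, so a config push (e.g. a new properties file) takes effect without a redeploy.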
34. Archaius + Git
Diagram: a central internal Git repository holds the property files; each microservice runs a slave sidecar that syncs those properties down to the local file system, where Archaius picks them up.
39. Dynomite
Implements Amazon's Dynamo design
Similar to Cassandra, Riak and DynamoDB
Strong Consistency – Quorum-like – No Data Loss
Pluggable
Scalable
Redis / Memcached
Multiple client languages via Dyno
Supports most Redis commands
Integrated with Eureka via Prana
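"Quorum-like" consistency means a write is acknowledged only once a majority of replicas confirm it, which is how Dynamo-style systems avoid data loss on a single node failure. A rough illustration of the majority rule (not Dynomite's actual implementation; the replica interface here is a stand-in):

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative quorum write: send the value to every replica and
// report success only if a majority (W = N/2 + 1) acknowledge.
class QuorumWriter {
    static <T> boolean write(List<Predicate<T>> replicas, T value) {
        int acks = 0;
        for (Predicate<T> replica : replicas) {
            if (replica.test(value)) {  // true = replica ack'd the write
                acks++;
            }
        }
        int quorum = replicas.size() / 2 + 1;
        return acks >= quorum;
    }
}
```

With three replicas the quorum is two, so the write survives one replica being down or failing.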
49. Chaos Results and Learnings
Retry configuration and Timeouts in Ribbon
Right class in Zuul 1.x (by default only SocketException is retried)
RequestSpecificRetryHandler (HttpClient exceptions)
zuul.client.ribbon.MaxAutoRetries=1
zuul.client.ribbon.MaxAutoRetriesNextServer=1
zuul.client.ribbon.OkToRetryOnAllOperations=true
Eureka Timeouts
It Works
Everything needs to have redundancy
ASG (Auto Scaling Group) is your friend :-)
Stateless Services FTW
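The two Ribbon settings above compose: MaxAutoRetries governs retries against the same server, and MaxAutoRetriesNextServer then fails over to another server. A simplified sketch of that retry loop (illustrative, not Ribbon's code; servers are modeled as a success/failure predicate):

```java
import java.util.List;
import java.util.function.Predicate;

// Simplified Ribbon-style retry: per-server retries plus failover to
// subsequent servers, mirroring MaxAutoRetries=1 and
// MaxAutoRetriesNextServer=1 from the config above.
class RetryHandler {
    static String call(List<Predicate<String>> servers, String request,
                       int maxAutoRetries, int maxNextServer) {
        int serversToTry = Math.min(servers.size(), 1 + maxNextServer);
        for (int s = 0; s < serversToTry; s++) {
            // initial attempt plus maxAutoRetries retries on this server
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                if (servers.get(s).test(request)) {
                    return "served by server " + s;
                }
            }
        }
        return "failed after all retries";
    }
}
```

With both settings at 1, a request gets up to two attempts on the first server and two on the next before failing — four total, which is why retry budgets and timeouts must be tuned together.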
51. Chaos Results and Learnings
Before:
Data was not reaching Elasticsearch
Producers were losing data
After:
No Data Loss
It Works
Changes:
No logging on the microservice :( (logging was added)
Event-publishing code wrapped in a try-catch
Kafka producer retry config raised from 0 to 5
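The producer fix above maps to the Kafka producer's `retries` and `acks` settings. A sketch of the relevant configuration (property keys are from the Kafka producer API; the broker address is a placeholder, and `retries=0` was only the default in older clients):

```java
import java.util.Properties;

// Kafka producer settings reflecting the fix above: raise retries
// from 0 to 5, and require acks from all in-sync replicas so
// published events are not silently lost under failure.
class ProducerConfigSketch {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // placeholder address
        props.put("acks", "all");   // wait for all in-sync replicas
        props.put("retries", "5");  // was 0 before the chaos exercise
        return props;
    }
}
```

These Properties would be passed to a `KafkaProducer` constructor; combined with the try-catch around publishing, transient broker failures no longer drop events.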