12. Best Practices for Service Migration
๏Avoid Changes of Functionalities
๏Test-Driven Development
‣ Unit Test
‣ Integration Test
๏Dark Traffic
‣ Comparison
‣ Capacity
๏Light Traffic
26. Service Proxy Rate Limiting
๏Guava RateLimiter (Token Bucket based) and ServerSet
๏No coordination among servers in the cluster
๏Rate limiting rules
‣ Max QPS to service
‣ Max QPS to a particular API endpoint
‣ Max QPS from a specific client to service
‣ Max QPS from a specific client to a particular API endpoint
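The four rule types above can be sketched as a set of token buckets keyed by client and endpoint, where a request is admitted only if every applicable bucket has a token. This is a minimal sketch, not the actual proxy code: it uses a small hand-written token bucket as a dependency-free stand-in for Guava's RateLimiter, and the class names (TokenBucket, ProxyRateLimiter), the key format, and the idea of dividing each cluster-wide limit by the ServerSet size to get a per-server limit are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Minimal token bucket; a stand-in for Guava's RateLimiter (assumption). */
class TokenBucket {
    final double ratePerSec;  // refill rate in tokens per second
    final double capacity;    // burst size: one second of traffic, at least 1
    double tokens;
    long lastNanos;

    TokenBucket(double ratePerSec, long nowNanos) {
        this.ratePerSec = ratePerSec;
        this.capacity = Math.max(1.0, ratePerSec);
        this.tokens = capacity;
        this.lastNanos = nowNanos;
    }

    /** Adds the tokens earned since the last refill, capped at capacity. */
    void refill(long nowNanos) {
        double elapsedSec = (nowNanos - lastNanos) / 1e9;
        if (elapsedSec > 0) {
            tokens = Math.min(capacity, tokens + elapsedSec * ratePerSec);
            lastNanos = nowNanos;
        }
    }
}

/** Per-server limiter: no coordination, each server enforces its own share. */
class ProxyRateLimiter {
    private final Map<String, Double> localQps = new HashMap<>();
    private final Map<String, TokenBucket> buckets = new HashMap<>();
    private final int serverCount; // hypothetical: size of the ServerSet

    ProxyRateLimiter(int serverCount) {
        this.serverCount = serverCount;
    }

    /** Registers a cluster-wide limit; a null client or endpoint means "any". */
    synchronized void addRule(String client, String endpoint, double clusterQps) {
        localQps.put(key(client, endpoint), clusterQps / serverCount);
    }

    private static String key(String client, String endpoint) {
        return (client == null ? "*" : client) + "|" + (endpoint == null ? "*" : endpoint);
    }

    /** Admits the request only if all applicable rules have a token left. */
    synchronized boolean tryAcquire(String client, String endpoint, long nowNanos) {
        Set<String> keys = new LinkedHashSet<>();
        keys.add(key(null, null));         // max QPS to service
        keys.add(key(null, endpoint));     // max QPS to an endpoint
        keys.add(key(client, null));       // max QPS from a client
        keys.add(key(client, endpoint));   // max QPS from a client to an endpoint

        List<TokenBucket> applicable = new ArrayList<>();
        for (String k : keys) {
            Double qps = localQps.get(k);
            if (qps == null) {
                continue; // no rule of this type configured
            }
            TokenBucket b = buckets.computeIfAbsent(k, x -> new TokenBucket(qps, nowNanos));
            b.refill(nowNanos);
            if (b.tokens < 1.0) {
                return false; // reject before consuming from any bucket
            }
            applicable.add(b);
        }
        for (TokenBucket b : applicable) {
            b.tokens -= 1.0;
        }
        return true;
    }
}
```

Checking every bucket before consuming from any of them keeps a request rejected by a narrow rule (for example, one client on one endpoint) from eating into the broader service-wide budget. In a real proxy the caller would pass System.nanoTime() as the timestamp.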
About me:
Joined 3 years ago, early engineer on Infra team at Pinterest
Storage systems on Twitter’s core data sets: users, tweets, social graph
CRM, Chatter, Platform as a Service at Salesforce
Pinterest is where you go when you want to be inspired and when you want to discover.
What should I have for lunch?
What art pieces might I love?
Emphasize “Picked for you”
Explore what may interest you. Explore interests, related interests, discover things.
Follow interests
Discovery at scale
Why SOA?
3 founders (Ben, Evan, and Paul), 1 engineer (Yash)
December, 2014
Currently 600+ employees, 200+ engineers
We are scaling our organization really rapidly
We want to foster long term ownership.
Accountability.
SLA
Responsive: timeout setting possible at many levels
Resilient: retries, connection pooling, bad host detection/eviction/retry, queue policies
Elastic: dynamic service discovery
Message Driven: Async RPC, thrift, protobuf, http, memcache, redis, etc.
Efficiency: not Python (slow, not statically type-checked, single process, handles one request at a time); a JVM-based platform that serves tens of thousands of QPS per host
Future: better than callbacks, and far better than the Java Future of the time
Community: very well supported, plus we know the committers :-)
For both the dark and light stages, a mechanism that can easily dial traffic to the new service up or down is a key component of the system.
Dark traffic usually adds load on top of the existing traffic, so capacity needs to be watched closely. Light traffic and traffic to the original service are typically mutually exclusive, so in most cases there is no such concern.
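The dial described above can be sketched as a percentage gate applied per request. This is a rough sketch under stated assumptions: TrafficDial, MigrationRouter, the request-id bucketing, and the counters are hypothetical names and choices, and the dark call is made inline for simplicity where a real system would most likely issue it asynchronously.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Percentage dial (0 to 100) that can be changed at runtime. */
class TrafficDial {
    private volatile int percent = 0;

    void set(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        this.percent = percent;
    }

    /** Buckets the request id into 0..99 and selects it if below the dial. */
    boolean selects(long requestId) {
        return Math.floorMod(requestId, 100L) < percent;
    }
}

/** Routes between the original service and the new one during migration. */
class MigrationRouter<Q, R> {
    final TrafficDial dark = new TrafficDial();   // duplicate traffic, compare results
    final TrafficDial light = new TrafficDial();  // real traffic served by the new service
    final AtomicLong darkCalls = new AtomicLong();
    final AtomicLong mismatches = new AtomicLong();

    private final Function<Q, R> oldService;
    private final Function<Q, R> newService;

    MigrationRouter(Function<Q, R> oldService, Function<Q, R> newService) {
        this.oldService = oldService;
        this.newService = newService;
    }

    R handle(long requestId, Q request) {
        // Light traffic: the new service answers instead of the old one.
        if (light.selects(requestId)) {
            return newService.apply(request);
        }
        R expected = oldService.apply(request);
        // Dark traffic: extra call to the new service; its result is only compared.
        if (dark.selects(requestId)) {
            darkCalls.incrementAndGet();
            try {
                R actual = newService.apply(request);
                if (!Objects.equals(expected, actual)) {
                    mismatches.incrementAndGet();
                }
            } catch (RuntimeException e) {
                mismatches.incrementAndGet(); // dark failures never reach the caller
            }
        }
        return expected;
    }
}
```

The dark path calls both services for the same request, which is where the extra load comes from, while the light path calls exactly one of them. The darkCalls and mismatches counters correspond to the capacity and comparison checks respectively.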
Today we have ~100 services in production. The majority are Finagle based, but we also have services in Go, C++, and Python. The challenge is that we want to build the same framework features for all platforms, and that takes time and effort.
Use Java Dynamic Proxy for code injection
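As a rough illustration of code injection with a Java dynamic proxy, the sketch below wraps a service interface with java.lang.reflect.Proxy so that every call is timed and failures are recorded without touching the service implementation. The PinService interface, the LoggingHandler class, the log line format, and the in-memory log list are hypothetical and only stand in for whatever the real framework injects.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical service interface used only for this example. */
interface PinService {
    String getPin(long id);
}

/** Intercepts every interface call to add slow-request and failure logging. */
class LoggingHandler implements InvocationHandler {
    final List<String> log = new ArrayList<>(); // stand-in for a real logger
    private final Object target;
    private final long slowThresholdNanos;

    LoggingHandler(Object target, long slowThresholdNanos) {
        this.target = target;
        this.slowThresholdNanos = slowThresholdNanos;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // Unwrap so callers see the service's own exception type.
            Throwable cause = e.getCause();
            log.add("FAILED " + method.getName() + " " + cause.getClass().getSimpleName());
            throw cause;
        } finally {
            long elapsed = System.nanoTime() - start;
            if (elapsed > slowThresholdNanos) {
                log.add("SLOW " + method.getName());
            }
        }
    }

    /** Builds a proxy that implements iface and routes calls through this handler. */
    <T> T proxyFor(Class<T> iface) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, this);
        return iface.cast(proxy);
    }
}
```

A caller would build the proxy once, for example `new LoggingHandler(realService, 50_000_000L).proxyFor(PinService.class)`, and hand it out in place of the real implementation. Because the handler sees the method name and exception type on every call, the same hook can also feed per-endpoint and per-exception metrics.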
Sampled access logging, slow request logging, and failed request logging make it much easier for service owners to debug and monitor their services.
Dimensions: client, server host, api endpoint, exception type
Massive productivity gain: several hours down to a couple of minutes
A good investment for a team that creates a lot of services, and also a building block for one-step service generation
Shell: JLine2 with file path autocompletion; FB Swift and the Python ptsd package for Thrift IDL parsing
screenshots of statsboard generated, finagle server/client generation, thrift client generation