How do effective large-scale service ecosystems work? Keynote Presentation at Istanbul Tech Talks 2018
How to Design Services
* Systems of record
* Interface specification
* Interface backward / forward compatibility
Service Ecosystems
* Layered services
* "Standardization" through encouragement
* Vendor-customer relationships between teams
Operating and Deploying Services
* Data Migration
* Automated Pipelines
* Incremental Deployment
* Feature Flags
2. Background
• VP Engineering at WeWork
o Physical space as a service
• VP Engineering at Stitch Fix
o Revolutionizing retail using data science and machine learning
• Director of Engineering for Google App Engine
o World’s largest Platform-as-a-Service
• Chief Engineer at eBay
o Multiple generations of eBay’s infrastructure
10. Service as System of Record
• Single System of Record
o Every piece of data is owned by a single service
o That service is the canonical system of record for that data
• Every other copy is a read-only, non-authoritative cache
@randyshoup linkedin.com/in/randyshoup
customer-service
styling-service
customer-search
billing-service
11. Service Interface
• A service interface includes
o Synchronous request-response (REST, gRPC, etc)
o Events the service produces
o Events the service consumes
o Bulk reads and writes (ETL)
• Events are a first-class part of a service interface
• The interface includes any mechanism for getting data in or
out of the service (!)
@randyshoup linkedin.com/in/randyshoup
12. Service Interfaces / Prototypes
• Define Service Interface (Formally!)
o Propose
o Discuss with client(s)
o Agree
• Prototype Implementation
o Limited investment of time and people
o Simplest thing that could possibly work
o Client can integrate with prototype
o Implementor can learn what works and what does not
13. Service Interfaces / Prototypes
• Real Implementation
o Throw away the prototype (!)
• Rinse and Repeat
14. Maintaining Interface Stability
• Backward / forward compatibility of interfaces
o Can *never* break your clients’ code
• Semantic versioning (major.minor.patch)
o Often multiple interface versions
o Sometimes multiple deployments
• Explicit deprecation policy
o Strong incentive to wean customers off old versions (!)
15. Service Design Anti-Patterns
• The “Mega-Service”
o Overbroad area of responsibility is difficult to reason about, change
o Leads to more upstream / downstream dependencies
• “Leaky Abstraction” Service
o Interface reflects provider’s implementation, not the consumer’s model
o Consumer’s model is typically more aligned with the domain, simpler, more abstract
o Leaking provider’s model in the interface constrains evolution of the implementation
16. Service Design Anti-Patterns
• “Client-less” Service
o Service built without a specific client in mind
o Very difficult to design a minimal, complete interface without a tangible use-case
o Opportunity cost and wasted effort
o Usage is the true metric of the value of a service
• Shared persistence
o Breaks encapsulation, encourages “backdoor” interface violations
o Unhealthy and near-invisible coupling of services
o (-) Initial eBay SOA efforts
18. Ecosystem of Services
• Hundreds to thousands of
independent services
• Many layers of dependencies, no
strict tiers
• Graph of relationships, not a
hierarchy
C
B
A E
F
G
D
19. Google Service Layering
• Cloud Datastore: NoSQL service
o Strong transactional consistency
o SQL-like rich query capabilities
• Megastore: geo-scale structured database
o Multi-row transactions
o Synchronous cross-datacenter replication
• Bigtable: cluster-level structured storage
o (row, column, timestamp) -> cell contents
• Colossus: distributed file system
o Block distribution and replication
• Borg: cluster management infrastructure
o Task scheduling, machine assignment
Cloud Datastore
Megastore
Bigtable
Colossus
Cluster manager
20. Evolution, not Central Control
• No centralized design or approval
o Most technology decisions made locally instead of globally
• Build services as needed
o Create / extract new services when needed to solve a problem
o Services justify their continued existence through usage
o Deprecate services when no longer used
• Appearance of clean layering is an emergent property
21. “Every service at Google is
either deprecated or not ready
yet.”
-- Google engineering proverb
22. Standardization
• Standardized communication
o Network protocols
o Data formats
o Interface schema / specification
• Standardized infrastructure
o Source control
o Configuration management
o Cluster management
o Monitoring, alerting, diagnosing, etc.
24. The easiest way to “enforce” a
standard practice is with
working code.
25. In a healthy service
ecosystem, standards become
standards by being better than
the alternatives.
26. Service Relationships
• Vendor – Customer Relationship
o Friendly and cooperative, but structured
o Clear ownership and division of responsibility
• Customer Focus
o Value of service comes from its value to its customers
• Customer can choose to use service or not (!)
o Must be strictly better than the alternatives of build, buy, borrow
27. Service Relationships
• Competition keeps system healthy
o If a common service is no longer useful, we can redeploy its resources
• Incentives are aligned
o Usage is an objective measure of a service’s success
29. Goals of a Service Owner
• Meet the needs of my clients …
• Functionality
• Quality
• Performance
• Stability and reliability
• Constant improvement over time
• … at minimum cost and effort
• Leverage common tools and infrastructure
• Leverage other services
• Automate building, deploying, and operating my service
• Optimize for efficient use of resources
30. Bounded Context
• Primary focus on my service
o Clients which depend on my service
o Services which my service depends on
• Very little worry about
o The complete ecosystem
o The underlying infrastructure
• Cognitive load is very bounded
• Small, nimble service teams
Service
Client
A
Client
B
Client
C
31. Incremental Change
• Decompose every change into incremental steps
• Each step maintains backward / forward compatibility of
data and interfaces
• Multiple service versions commonly coexist
o Every change is a rolling upgrade
o Transitional states are normal, not exceptional
32. Example: Data Migration
• Requires dual data processing and storage (“dual
writes”)
• Careful set of transitional steps
o A A | B B | A B
A
A
A A A A
A
BB B
B
BB BB B
B
BB A
A
A BB B
B
33. Service Deployment
• Automated Pipeline
o Push-button, repeatable pipeline of build, test, deploy
• Independent Deployment
o Each service deployed independently, *not* as a group
• Incremental Deployment
o Canary testing
o Staged rollouts by %
34. Service Deployment
• “Feature Flags”
o Decouple code deployment from feature deployment
o Rapidly turn on / off features without redeploying code
o Typically deploy with feature turned off, then turn on as a separate step