What if we designed our organizations like we design our systems? Applying scalability principles that we know from building large-scale distributed systems, as well as practical lessons learned at eBay and Google, this session covers how we can design and evolve our engineering organizations to scale.
4. Universal Scalability Law
System throughput is limited by
• Contention
o Queueing on a shared resource, O(N)
• Coherence
o Coordination and communication between all nodes, O(N²)
http://www.perfdynamics.com/Manifesto/USLscalability.html
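A minimal sketch of the law in its usual form, C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α weights contention and β weights coherence. The function names and the coefficient values below are hypothetical and chosen only for illustration; they are not measurements from the talk.

```python
import math


def usl_throughput(n: int, alpha: float, beta: float) -> float:
    """Relative capacity C(N) under the Universal Scalability Law.

    alpha: contention coefficient (queueing on a shared resource, linear term).
    beta:  coherence coefficient (all-to-all coordination, quadratic term).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_n(alpha: float, beta: float) -> float:
    """N at which C(N) peaks, from setting dC/dN = 0 (requires beta > 0)."""
    return math.sqrt((1 - alpha) / beta)


# Hypothetical coefficients, for illustration only.
ALPHA = 0.05
BETA = 0.001

for n in (1, 8, 32, 128):
    print(n, round(usl_throughput(n, ALPHA, BETA), 2))
print("peak near N =", round(usl_peak_n(ALPHA, BETA), 1))
```

With any positive β, throughput eventually falls as N grows, which is why the coherence term matters most at scale.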
5. Universal Scalability Law
• Implications
o Find ways to remove contention points
o Find ways to reduce or eliminate coordination overhead
o Increasing N means more contention and more coherence overhead
• Multicore processor design
o Fast to stay within a core
o Expensive to synchronize across cores
• Distributed system design
o Sharding
o Eventual Consistency
6. “What if we designed our organizations like we design our systems?”
8. Small “Service” Teams
• Amazon “2 Pizza” Teams
o No team should be larger than can be fed by 2 large pizzas
o Typically 3-5 people
o Mix of junior and senior people
• Team == Component | Service
o Clear, well-defined area of responsibility
o Single service or set of related services
o Minimal, well-defined “interface”
• Applying the Universal Scalability Law
o Reduce N within teams
o Well-defined responsibilities reduce synchronization / coordination points between teams
9. End-to-End Ownership
• Teams own their roadmap
• No separate maintenance or sustaining engineering team
• Engineers own service from design to deployment to retirement
10. Team Anti-Patterns
• Skill-based teams
o Based around tiers or technologies (e.g., front-end team, application team, DBA team, Ops team)
o (-) Every project crosses many team boundaries
o (-) No end-to-end ownership of the system
o (-) No end-to-end ownership of the customer experience
• Project-based teams
o Form ad-hoc team for a particular project, then disband
o (-) No long-term ownership of code, product, service
o (-) Encourages a short-term approach instead of sustainable management of technical debt
11. Team Anti-Patterns
• Large teams
o (-) Teams larger than 6-8 should be split
o (-) Communication and coordination overhead makes it increasingly difficult to sustain velocity
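A rough sketch of why coordination overhead grows with team size, assuming every pair of people within a team needs a communication channel and every pair of teams coordinates through one well-defined interface. Both assumptions, and the function names, are illustrative simplifications rather than anything stated in the talk.

```python
def pairwise_channels(n: int) -> int:
    """Number of distinct pairs among n participants: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def channels_after_split(team_sizes: list[int]) -> int:
    """Channels inside each team plus one interface per pair of teams."""
    within = sum(pairwise_channels(size) for size in team_sizes)
    between = pairwise_channels(len(team_sizes))  # assumption: one interface per team pair
    return within + between


one_large_team = pairwise_channels(12)
two_small_teams = channels_after_split([6, 6])
print(one_large_team, two_small_teams)
```

Under these assumptions, splitting one team of 12 into two teams of 6 cuts the channel count from 66 to 31.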
13. Autonomy and Accountability
• Give teams autonomy
o Freedom to choose technology, methodology, working environment
o Responsibility for the results of those choices
• Make teams self-sufficient
o Team has inside it all skill sets needed to do the job
o Depend on other teams for supporting services
• Hold team accountable for *results*
o Give a team a goal, not a solution
o Let team own the best way to achieve the goal
14. Autonomy and Accountability
• Clear “contract” provided to other teams
o Functionality: agreed-upon scope of responsibility
o Service levels and performance
15. Decision-Making Anti-Patterns
• Single authority
o Decisions made or approved by single person (CTO?)
o (-) Single bottleneck / contention point
o (-) Single point of failure
o (-) Unsustainable for the decision-maker
o (-) Discourages autonomy, ownership, growth
• Unanimity / Consensus
o Decisions made or approved by “everyone”
o (-) Constant need for coordination / coherence
o (-) Increasingly ineffective / counterproductive as organization grows
o (-) Discourages autonomy, ownership, growth
17. Effective Global Teams
• Local Ownership
o Well-defined area of responsibility
o Clean interface with the rest of the organization
• Individual teams are co-located
o High-bandwidth communication within a team
o Minimal coordination across teams
18. Global Team Anti-Patterns
• Anti-Pattern: Split Teams Over Geographies
o (-) Constant need for coordination over time zones
o (-) Local conversations become disruptive rather than helpful
o (-) No local pride of ownership
• Anti-Pattern: Remote Team as Job Shop
o (-) Constant need for management and task assignment
o (-) Resentment between first-tier and second-tier sites
o (-) No local pride of ownership
o Ex. eBay remote offices vs. Google remote offices
19. Distributed Teams
• Fully distributed *OR* fully co-located
o Distributed teams rely on virtual proximity (chat, hangouts, IRC)
o Co-located teams rely on physical proximity (co-working)
• Anti-Pattern: “Mostly” co-located
o (-) Co-located majority ends up determining communication methods
o (-) Remote individuals left out, less able to contribute, less productive