Puppet at Google
1. Puppet at Google
Gordon Rowell
Puppet Camp Sydney 2013
gordonr@google.com
2. Non-Goals
Not here to talk about:
● Hiring practices
● Release schedules
● Puppet configs
● Monitoring
● Compliance
● Auditing
● ...
See also Jason Wright's talk from PuppetConf 2011
3. Background
Puppet at Google is offered as an infrastructure service
● Run by a Site Reliability Engineering (SRE) team
● Customers are OS teams
● Does not manage Google's customer-facing infrastructure (search, Gmail, etc.)!
● Manages internal laptops, desktops and servers
4. How Many Nodes?
Clients:
● "Lots" of Mac desktops and laptops
● "Lots" of Ubuntu desktops, laptops and servers
● "Some" others
Servers:
● "Tens" of puppet config servers
● "Units" of puppet CAs
● Deployed in five globally distributed VIPs
● Clients use Anycast to find closest "server"
5. Scaling is fun
● We don't deploy "a server"
○ Servers break, power fails
○ Clients/DNS need to be reconfigured
● We don't deploy "a cluster"
○ Networks break, servers break, power fails
○ Clients/DNS need to be reconfigured
● We deploy redundant clusters
○ Attempt to send clients to nearest serving cluster
○ Anycast means unified client configuration
6. Load balancing is fun
Do you have enough capacity?
● How many backends do you need?
● What happens if half of your backends lose power?
● What about when half are already out for repairs?
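The capacity questions above can be turned into rough arithmetic. The following is a minimal sketch, not a description of Google's actual planning: the function name, the requests-per-second unit and all numbers are hypothetical, and it assumes the two failure fractions apply one after the other (a power loss hits the backends that are not already out for repairs).

```python
import math


def backends_needed(peak_rps, per_backend_rps,
                    in_repair_fraction=0.0, power_loss_fraction=0.0):
    """Return how many backends to provision so the survivors carry peak load.

    All parameters are illustrative assumptions:
      peak_rps             peak request rate the service must handle
      per_backend_rps      request rate one healthy backend can serve
      in_repair_fraction   share of backends already out for repairs
      power_loss_fraction  share of the remaining backends that lose power
    """
    surviving_fraction = (1.0 - in_repair_fraction) * (1.0 - power_loss_fraction)
    if surviving_fraction <= 0:
        raise ValueError("no backends would survive this scenario")
    # Backends that must still be alive to serve the peak.
    required_live = math.ceil(peak_rps / per_backend_rps)
    # Scale up so that many remain after the assumed failures.
    return math.ceil(required_live / surviving_fraction)
```

With these assumptions, losing half of the backends doubles the number to provision, and losing half while half are already in repair quadruples it.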
How do you send clients to the right cluster?
● Client configuration
● DNS round-robin (simple global load balancing)
● DNS views (give best answer for client IP)
● Anycast (portable IP, routed to "nearest" cluster)
● Consider: DNS views plus Anycast
7. Anycast is fun
● Anycast is "coarse-grained" load balancing
○ It normally sends traffic to closest serving cluster
● Networks break
○ Physical issues
○ Routing issues
○ Configuration issues
○ VIP load balancer bugs
● All clients could be sent to the same cluster
○ Be ready for that
○ Can a single cluster handle worldwide traffic?
○ What do you do if you can't?
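To explore the "all clients land on one cluster" scenario, a toy model can help. The sketch below is an assumption-laden illustration, not a model of real routing: it treats Anycast as "each client region goes to the first serving cluster in its preference list", and the region names, cluster names and load figures used with it are hypothetical.

```python
def route_load(client_load, preference, serving):
    """Return {cluster: load} when each region uses its nearest serving cluster.

    client_load  {region: load} (hypothetical units, e.g. puppet runs per minute)
    preference   {region: [cluster, ...]} ordered nearest first
    serving      set of clusters currently announcing the Anycast address
    """
    load = {cluster: 0 for cluster in serving}
    for region, demand in client_load.items():
        # Pick the nearest cluster that is still serving.
        target = next((c for c in preference[region] if c in serving), None)
        if target is None:
            raise RuntimeError(f"no serving cluster reachable from {region}")
        load[target] += demand
    return load


def overloaded(load, capacity):
    """Return the sorted names of clusters whose load exceeds their capacity."""
    return sorted(c for c, used in load.items() if used > capacity[c])
```

Running `route_load` with every cluster but one removed from `serving` shows the worldwide load that the survivor would receive; `overloaded` then answers whether it can handle it.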
8. Puppet problems: Thundering herds
● "Lots" + "lots" + "some" == "thundering herds"
● What if they all want to do a puppet run?
● What about every hour?
● What about every five minutes?
● Masterless puppet is being considered
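One common way to soften a thundering herd is to give every client its own offset within the run interval, so runs spread out instead of arriving together. The sketch below illustrates that idea only; it is not how the deck's team solved the problem and it is not Puppet's own implementation. The function name and the hostname-hash scheme are assumptions made for illustration.

```python
import hashlib


def run_offset_seconds(hostname, interval_seconds):
    """Return a stable per-host delay in [0, interval_seconds).

    Hashing the hostname gives each client a fixed slot inside the run
    interval, so clients do not all contact the servers at the same moment.
    The scheme is hypothetical; any well-mixed hash would do.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the interval.
    return int.from_bytes(digest[:8], "big") % interval_seconds
```

A client would sleep for `run_offset_seconds(its_hostname, 3600)` seconds after the top of the hour before starting an hourly run. The shorter the interval, the less room there is to spread runs, which is why the "every five minutes" question is the harder one.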
9. Puppet problems: Release tracks
● OS releases have unstable, testing, stable branches
○ Maintained by OS platform teams
● Addons also have unstable, testing, stable branches
○ Maintained by service owners
● Using different tracks for OS and addons is hard
○ However, that's a common need, e.g. when testing a new addon release
○ Puppet's global namespace is part of the problem
10. Puppet problems: Namespaces
● Lots of developers moving fast == conflicts
● Conflicts mean surprises
● Qualify everything
● Testing with rspec-puppet helps to catch issues early
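The effect of "qualify everything" can be shown with a toy model of a single global namespace. This is a sketch only: the owner and resource names are hypothetical, and the check is far simpler than what Puppet or rspec-puppet actually do. It borrows Puppet's `::` separator purely as a naming convention.

```python
from collections import defaultdict


def find_conflicts(declarations):
    """Return {name: [owners]} for names declared by more than one owner.

    declarations is an iterable of (owner, name) pairs, standing in for
    teams declaring names in one shared namespace (hypothetical model).
    """
    owners = defaultdict(set)
    for owner, name in declarations:
        owners[name].add(owner)
    return {name: sorted(who) for name, who in owners.items() if len(who) > 1}


def qualify(owner, name):
    """Prefix a name with its owner, e.g. 'os_team::proxy'."""
    return f"{owner}::{name}"
```

If two teams both declare a bare name such as `proxy`, `find_conflicts` reports it; after passing each name through `qualify`, the same declarations no longer collide.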