"Puppet at Pinterest", by Ryan Park, Operations Engineer at Pinterest. Talk from PuppetConf 2012.
Video of "Puppet at Pinterest": http://youtu.be/aU-bCbBq8zs
Abstract: A case study of how Pinterest uses Puppet to manage its infrastructure. Pinterest has hundreds of Amazon EC2 virtual servers and uses Puppet Dashboard as the “source of truth” about its server inventory. Pinterest built a REST API for this database, which powers tools and automated scripts that integrate Puppet with internal systems and with Amazon Web Services.
Speaker Bio: Ryan Park leads operations and infrastructure at Pinterest, one of 2012’s fastest-growing web sites. Pinterest’s entire infrastructure is in the cloud, built atop hundreds of Amazon EC2 virtual server instances. Ryan introduced Puppet to the infrastructure as soon as he joined the company, and Pinterest now uses Puppet as its primary infrastructure management tool. Prior to joining Pinterest, Ryan was the Head of Operations at PBworks, an online team collaboration service.
3. Ryan Park / rpark@pinterest.com
[Architecture diagram: Web Application Servers; Internal Web Services; Memcache, MySQL, Redis]
4. Ryan Park / rpark@pinterest.com
Before Puppet
‣ 150 virtual servers: web app, MySQL,
Memcache, Membase, Redis,
Elasticsearch...
‣ 12 Amazon Machine Images
‣ cut -f 1 ~/.ssh/known_hosts
5. Ryan Park / rpark@pinterest.com
Puppet Dashboard
‣ The “source of truth” about what’s
running in our infrastructure
‣ Alternatives we considered
‣ Puppet manifests: only useful in Puppet
‣ LDAP: difficult to set up
‣ Foreman: too much for our needs
8. Ryan Park / rpark@pinterest.com
Puppet Dashboard
‣ Problem: Some dependencies are
configured in Puppet Dashboard, others
in Puppet manifests.
‣ Solution: Define your dependencies in
Puppet manifests when possible.
9. Ryan Park / rpark@pinterest.com
Puppet Dashboard
‣ Node Groups are useful…
‣ …but more useful when you can use the
data to power other systems.
‣ …and even more useful when you
combine Puppet Dashboard data with
storedconfigs.
10. Ryan Park / rpark@pinterest.com
REST API
[ryan@mac:~]$ curl https://puppet-dashboard/api/
{
"nodes": "https://puppet-dashboard/api/node",
"node_classes": "https://puppet-dashboard/api/class",
"node_groups": "https://puppet-dashboard/api/group"
}
Self-documenting and
nicely formatted
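Because the index maps resource names to URLs, a client can discover endpoints instead of hard-coding them. The following is a minimal sketch, assuming the index is a flat JSON object exactly as shown above; `parse_api_index` and `SAMPLE_INDEX` are illustrative names, not part of Pinterest's code.

```python
import json

# Sample response copied from the slide above.
SAMPLE_INDEX = """
{
  "nodes": "https://puppet-dashboard/api/node",
  "node_classes": "https://puppet-dashboard/api/class",
  "node_groups": "https://puppet-dashboard/api/group"
}
"""


def parse_api_index(body):
    """Return a dict mapping resource names to endpoint URLs.

    Hypothetical helper: assumes the index is a flat JSON object of
    name -> URL pairs, as in the sample above.
    """
    index = json.loads(body)
    if not isinstance(index, dict):
        raise ValueError("expected a JSON object at the API root")
    # Keep only string values, which are the endpoint URLs.
    return {name: url for name, url in index.items() if isinstance(url, str)}


if __name__ == "__main__":
    for name, url in sorted(parse_api_index(SAMPLE_INDEX).items()):
        print(name, "->", url)
```

A client would pass the body of the real index response to `parse_api_index` and then request whichever endpoint it needs.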
21. Ryan Park / rpark@pinterest.com
Sample API Client
[ryan@mac:~]$ cat puppet_to_hosts.py
import json
import urllib.request


def download_and_decode(url):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def main():
    data = download_and_decode("http://puppet-dashboard/api/node/")
    # Print one /etc/hosts-style line for each node that has an IP address.
    for node in data['nodes']:
        if node.get('ipaddress'):
            print(node['ipaddress'] + " " + node['name'])


if __name__ == "__main__":
    main()
22. Ryan Park / rpark@pinterest.com
Sample API Client
[ryan@mac:~]$ python puppet_to_hosts.py
10.150.39.222 azkaban001
10.169.164.132 datalayer001
10.39.63.178 datalayer002
10.97.34.202 datalayer003
10.112.144.31 datalayer004
10.49.10.163 followerredis001a
10.18.185.220 followerredis001b
23. Ryan Park / rpark@pinterest.com
Our API Clients
‣ Generate /etc/hosts file
‣ Generate Monit configuration files
‣ Push hostnames to Amazon Route 53
DNS service
‣ Remove SSL certificates (puppetca
--clean) for nodes that have been deleted
from Puppet Dashboard
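The certificate cleanup in the last bullet comes down to a set difference: any signed certificate whose name is no longer in Puppet Dashboard is stale. A minimal sketch, assuming both name lists have already been fetched; `stale_certificates`, `clean_commands`, and the sample data are hypothetical, and the script only prints the commands rather than running them.

```python
def stale_certificates(signed_certs, dashboard_nodes):
    """Return certificate names with no matching node in the Dashboard."""
    return sorted(set(signed_certs) - set(dashboard_nodes))


def clean_commands(stale):
    """Build one `puppetca --clean` command line per stale certificate."""
    return ["puppetca --clean " + name for name in stale]


if __name__ == "__main__":
    # Illustrative data only; a real client would read these lists from
    # the puppet master and from the Dashboard API.
    signed = ["datalayer001", "datalayer002", "datalayer003"]
    in_dashboard = ["datalayer001", "datalayer003"]
    for command in clean_commands(stale_certificates(signed, in_dashboard)):
        print(command)
```

Printing the commands first makes it easy to review what would be removed before anything is actually cleaned.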
24. Ryan Park / rpark@pinterest.com
Our API Clients
‣ Source code deploy tools
‣ Monitoring dashboards
‣ Metrics dashboards
25. Ryan Park / rpark@pinterest.com
Puppet and
Amazon EC2
26. Ryan Park / rpark@pinterest.com
Bootstrapping EC2
‣ One custom image for all our instances
‣ Start with a basic Ubuntu AMI.
‣ Add packages facter, puppet, and
ec2-api-tools.
‣ Modify /etc/rc.local to run Puppet when
the instance launches.
27. Ryan Park / rpark@pinterest.com
We Cheat
‣ Problem: Using Puppet to install all our
dependencies is too slow—it would take
20 minutes to launch an instance.
‣ Solution: We pre-install about 60
Debian packages and 60 Python
packages.
28. Ryan Park / rpark@pinterest.com
EC2 Hostnames
‣ Problem: EC2 instance hostnames look
like “ip-10-113-111-43.ec2.internal”.
‣ Solution: Set the hostname when
booting the instance.
29. Ryan Park / rpark@pinterest.com
/etc/rc.local
[ryan@followerredis001a:~]$ cat /etc/rc.local
#!/bin/bash
# Use ec2-api-tools to determine our instance name.
# /etc/aws/cert.pem and /etc/aws/pk.pem must be present on the AMI,
# along with the Debian packages ec2-api-tools and facter.
export EC2_CERT=/etc/aws/cert.pem
export EC2_PRIVATE_KEY=/etc/aws/pk.pem
INSTANCE_ID=`facter ec2_instance_id`
INSTANCE_NAME=`ec2-describe-tags --filter "key=Name" \
    --filter "resource-type=instance" \
    --filter "resource-id=$INSTANCE_ID" | sed 's/.*\t//g'`
30.
# Set the hostname to $INSTANCE_NAME.example.com
hostname $INSTANCE_NAME
echo $INSTANCE_NAME > /etc/hostname
sed -i "s/^domain .*$/domain example.com/g" /etc/resolv.conf
sed -i "s/^search .*$/search example.com/g" /etc/resolv.conf
IP_ADDRESS=`facter ipaddress_eth0`
echo "# Additional entries added by bootstrap script" >> /etc/hosts
echo "$IP_ADDRESS $INSTANCE_NAME.example.com $INSTANCE_NAME" \
    >> /etc/hosts
# Puppet will configure this instance based on the classes in the
# Puppet Dashboard.
puppet agent --onetime
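The two text manipulations in this script, extracting the Name tag value and building the /etc/hosts line, can be sketched in Python. This is an illustration only, assuming that ec2-describe-tags prints tab-separated fields with the tag value last; the function names and the sample tag line (including its instance ID) are made up.

```python
def instance_name_from_tags(output):
    """Return the last tab-separated field of the first non-empty line.

    Roughly mirrors the sed expression in the script, under the
    assumption that the tag value is the final tab-separated column.
    """
    for line in output.splitlines():
        if line.strip():
            return line.split("\t")[-1].strip()
    return ""


def hosts_entry(ip_address, name, domain="example.com"):
    """Format an /etc/hosts line: IP, fully qualified name, short name."""
    return "{0} {1}.{2} {1}".format(ip_address, name, domain)


if __name__ == "__main__":
    # Illustrative tag output; the instance ID is a placeholder.
    sample = "TAG\tinstance\ti-00000000\tName\tfollowerredis001a\n"
    name = instance_name_from_tags(sample)
    print(hosts_entry("10.49.10.163", name))
```

An empty return value corresponds to an instance with no Name tag.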
31. Ryan Park / rpark@pinterest.com
EC2 Auto Scaling
[Chart: busy vs. provisioned instances over one day, 5AM to 2AM; vertical axis 0 to 80]
33. Ryan Park / rpark@pinterest.com
EC2 Auto Scaling
‣ Problem: When using Puppet
Dashboard as an external node
classifier, every host must be declared
explicitly in the Puppet Dashboard
database.
‣ Solution: When a new instance starts,
have it register itself in the Puppet
Dashboard using our REST API.
34. Ryan Park / rpark@pinterest.com
EC2 Auto Scaling
‣ A POST to /api/provision/<node_group>
adds a node to the Dashboard database
and returns the hostname.
[root@ip-10-88-155-31:~]# curl -X POST \
    https://puppet-dashboard/api/provision/datalayer
datalayer005
‣ This endpoint returns the hostname as
a string, not JSON.
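The slides do not show how the endpoint chooses a name. One scheme consistent with the example (datalayer001 through datalayer004 exist, and datalayer005 is returned) is to take the next unused three-digit number in the group. The sketch below is that assumption only, not Pinterest's implementation; `next_hostname` is a hypothetical name.

```python
import re


def next_hostname(node_group, existing_names):
    """Return node_group plus the next unused three-digit number.

    Assumed naming scheme: <group><NNN>, optionally followed by letters
    (for example "followerredis001a"), which are ignored here.
    """
    pattern = re.compile(r"^" + re.escape(node_group) + r"(\d{3})[a-z]*$")
    used = [int(m.group(1)) for m in map(pattern.match, existing_names) if m]
    return "{0}{1:03d}".format(node_group, max(used, default=0) + 1)


if __name__ == "__main__":
    names = ["azkaban001", "datalayer001", "datalayer002",
             "datalayer003", "datalayer004"]
    print(next_hostname("datalayer", names))
```

A real endpoint would also need to reserve the name atomically so that two instances booting at once do not receive the same hostname.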
35. Ryan Park / rpark@pinterest.com
EC2 Auto Scaling: /etc/rc.local
# If there's no hostname, there may be a node group name in the
# EC2 user-data string. Use the Puppet Dashboard API to request
# a hostname in that node group.
if [ -z "$INSTANCE_NAME" ]; then
    FILENAME="/var/lib/cloud/instances/$INSTANCE_ID/user-data.txt"
    if [ -f "$FILENAME" ]; then
        NODE_GROUP=`cat "$FILENAME"`
        if [ ! -z "$NODE_GROUP" ]; then
            INSTANCE_NAME=`curl -X POST \
                https://puppet-dashboard/api/provision/$NODE_GROUP`
        fi
    fi
fi
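The same fallback can be written as a small Python function, with the provisioning call passed in so the decision logic can be exercised without a network. This is a sketch under that assumption; `resolve_instance_name` and the fake provisioner are hypothetical.

```python
def resolve_instance_name(tag_name, user_data, provision):
    """Pick a hostname: the EC2 Name tag if set, else a provisioned one.

    tag_name  -- value of the Name tag, or "" if the instance has none
    user_data -- contents of the EC2 user-data string, or None if absent
    provision -- callable taking a node group and returning a hostname
                 (stands in for the POST to /api/provision/<node_group>)
    """
    if tag_name:
        return tag_name
    node_group = (user_data or "").strip()
    if node_group:
        return provision(node_group).strip()
    return ""


if __name__ == "__main__":
    # Fake provisioner for illustration; it does not contact any server.
    def fake_provision(group):
        return group + "005\n"

    print(resolve_instance_name("", "datalayer\n", fake_provision))
```

An instance launched by hand keeps the name from its tag, while an auto-scaled instance that carries only a node group in its user-data gets a freshly provisioned name.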
36. Ryan Park / rpark@pinterest.com
After Puppet
‣ Hundreds of virtual servers in 60 host
groups
‣ 1 Amazon Machine Image
‣ Dozens of scripts pull data from
Puppet Dashboard’s database
37. Ryan Park / rpark@pinterest.com
We’re Hiring!
http://pinterest.com/about/careers
38. Ryan Park / rpark@pinterest.com
Contact
rpark@pinterest.com
ryanpark
@StanfordRyan
Download slides and code samples at:
https://github.com/pinterest/puppetconf