Jason Cook discusses his experience setting up boot infrastructure for Fastly's caching clusters. He outlines how they moved from existing tools like Cobbler and Razor to building their own solution, Donner, which uses iPXE to boot machines over HTTP. Donner uses Chef to store machine metadata and configuration, which lets the boot process install operating systems, configure networking, and run Chef on first boot to provision machines.
8. Fastly “Mega” Design
• Single platform for caching clusters
• Deployed as a unit, limited to no incremental growth
• Same components for 4 to 32 machine clusters
• Able to justify management infrastructure
• Able to lean on convention
10. The “oob” machine
• Private link to internet
• Provides local provisioning
• DHCP
• Squid
• Donner
12. Why not existing?
• 20+ "datacenters"
• No backbone/internal network
• Too many moving pieces
• Host network complexity
13. Donner
• Sinatra app and cookbook for booting things over HTTP
• iPXE
• Chef as datastore
• Open Source soon (stupid heartbleed)
14. iPXE
• Open Source implementation of PXE
• Formerly known as both gPXE and Etherboot
• ROM image that can be burned into firmware
• Can boot off floppy/USB/hard disk/other PXE as well
15. Why iPXE?
• Boot off more than just TFTP targets
• HTTP, iSCSI, ATAoE, Fibre Channel
• Scriptable
• Minimal hardware and network inventory data
16. Why Chef for the datastore?
• Already available as a common service
• Multiple sources of truth suck
• Databags as integration point
17. Why databags?
• Hardware lifecycle is independent from the node object
• Searchable
• Easy to consume from other tools
18. Partial Search?
• Fast
• Somewhat convenient API
• I’m too lazy to deal with the databag api for reads
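Partial search returns only the requested attribute paths for each match rather than the whole object. The sketch below emulates that projection locally on a plain hash; it does not talk to a Chef server, and the item layout and the `partial_result` helper are assumptions for illustration.

```ruby
# Emulates the shape of a partial search result: each wanted key maps to
# an attribute path, and only those paths are pulled out of the item.
def partial_result(item, wanted)
  wanted.transform_values { |path| item.dig(*path) }
end

# Hypothetical data bag item for one machine.
item = {
  'id'       => 'sm1234',
  'serial'   => 'SM1234',
  'location' => { 'rack' => 'r1', 'unit' => 12 }
}

wanted = { 'serial' => ['serial'], 'rack' => ['location', 'rack'] }
puts partial_result(item, wanted).inspect
```

Missing paths come back as `nil`, so a consumer can tell "not recorded yet" apart from a failed lookup.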
19. The Workflow
• Shipment Manifest
• Racking/Cabling
• Map Serial to Real World Location
• Power on machines and wait
20. Vendor Data
• For each shipment vendor provides a spreadsheet
• Serial number
• MAC addresses
• Converted to data bag entries
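A minimal sketch of the spreadsheet-to-data-bag conversion, assuming the vendor sheet is exported as comma-separated text with a `serial` column and a `macs` column holding semicolon-separated addresses. The column names, the item layout, and `manifest_to_items` are hypothetical; the talk does not show the real format.

```ruby
require 'json'

# Turns exported manifest text into an array of data-bag-style hashes.
# Every data bag item needs an 'id'; the lowercased serial is used here.
def manifest_to_items(csv_text)
  header, *rows = csv_text.lines.map(&:strip).reject(&:empty?)
  columns = header.split(',').map(&:strip)
  rows.map do |row|
    record = columns.zip(row.split(',').map(&:strip)).to_h
    {
      'id'     => record.fetch('serial').downcase,
      'serial' => record.fetch('serial'),
      'macs'   => record.fetch('macs').split(';').map(&:downcase)
    }
  end
end

manifest = <<~CSV
  serial,macs
  SM1234,00:25:90:AA:BB:01;00:25:90:AA:BB:02
  SM1235,00:25:90:AA:BB:03
CSV

manifest_to_items(manifest).each { |item| puts JSON.generate(item) }
```

Each emitted JSON object could then be uploaded as one data bag item, keeping hardware records separate from node objects.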
22. Site Details
• Racking/Cabling done by remote hands
• Labels applied to physical position
• Labels mapped to serial numbers in data bags
23. From Bare Metal to Chef
1. Get address
2. Assign boot image
3. Build installer config
4. Build post-install config
5. Install
6. Run Chef on first boot
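Steps 2 and 3 amount to rendering a per-machine iPXE script that points the installer at its config. The sketch below uses only the standard-library ERB; the template text, image paths, kernel arguments, and the `machine` hash keys are assumptions, not Donner's actual templates.

```ruby
require 'erb'

# Hypothetical iPXE template: paths and kernel arguments are illustrative only.
BOOT_TEMPLATE = ERB.new(<<~'IPXE')
  #!ipxe
  echo Booting installer for: <%= machine['hostname'] %>
  kernel http://<%= server_ip %>/images/<%= machine['image'] %>/linux auto=true url=http://<%= server_ip %>/preseed/<%= machine['serial'] %>
  initrd http://<%= server_ip %>/images/<%= machine['image'] %>/initrd.gz
  boot
IPXE

# Renders the boot script for one machine record.
def render_boot_script(machine, server_ip)
  BOOT_TEMPLATE.result_with_hash(machine: machine, server_ip: server_ip)
end

machine = { 'hostname' => 'cache-01', 'image' => 'precise', 'serial' => 'SM1234' }
puts render_boot_script(machine, '172.16.16.7')
```

Because the script is generated per request, the same endpoint can hand different machines different images without touching the DHCP configuration.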
24. Getting iPXE in your PXE
• ISC dhcpd can do conditional responses
subnet 172.16.16.0 netmask 255.255.255.0 {
  range 172.16.16.225 172.16.16.254;
  # iPXE identifies itself via user-class, so hand it the HTTP boot script
  if exists user-class and option user-class = "iPXE" {
    filename "http://172.16.16.7/images/dhcpd.ipxe";
  # Network gear matched by MAC prefix gets the ZTP URL
  } elsif substring(hardware, 1, 3) = 01:1C:73 {
    option bootfile-name "http://172.16.16.7:1080/ztp";
  # Plain PXE ROMs chainload iPXE over TFTP
  } else {
    filename "undionly.kpxe";
  }
  option routers 172.16.16.7;
  option domain-name-servers 172.16.16.7;
}
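The conditional works as a two-pass chainload: a stock PXE ROM first receives `undionly.kpxe` over TFTP, and once iPXE is running it asks DHCP again, this time announcing user-class `iPXE`, and is pointed at the HTTP script. The Ruby sketch below only mirrors that decision for illustration; `boot_target` is a hypothetical helper, and the values are copied from the dhcpd configuration.

```ruby
# Values copied from the dhcpd configuration in this slide.
IPXE_SCRIPT_URL = 'http://172.16.16.7/images/dhcpd.ipxe'
ZTP_URL         = 'http://172.16.16.7:1080/ztp'
SWITCH_PREFIX   = '01:1c:73'

# Returns what the DHCP server would offer as the boot file.
def boot_target(mac:, user_class: nil)
  if user_class == 'iPXE'
    IPXE_SCRIPT_URL            # second pass: iPXE fetches its script over HTTP
  elsif mac.downcase.start_with?(SWITCH_PREFIX)
    ZTP_URL                    # network gear matched by MAC prefix
  else
    'undionly.kpxe'            # first pass: plain PXE ROM chainloads iPXE
  end
end

puts boot_target(mac: '00:25:90:aa:bb:01')
puts boot_target(mac: '00:25:90:aa:bb:01', user_class: 'iPXE')
```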
27. The Install
• Ubuntu with preseed in our case
• Another erb template
• Nothing special here
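A rough idea of what a templated preseed fragment might look like, rendered with standard-library ERB. The three debian-installer keys are real preseed settings, but which keys Fastly sets, the `machine` hash layout, and the proxy port (Squid's default, 3128, on the oob machine) are assumptions.

```ruby
require 'erb'

# Hypothetical preseed fragment; a real preseed file would carry many more keys.
PRESEED_TEMPLATE = ERB.new(<<~'PRESEED')
  d-i netcfg/get_hostname string <%= machine['hostname'] %>
  d-i netcfg/get_domain string <%= machine['domain'] %>
  d-i mirror/http/proxy string http://<%= server_ip %>:3128/
PRESEED

# Renders the preseed fragment for one machine record.
def render_preseed(machine, server_ip)
  PRESEED_TEMPLATE.result_with_hash(machine: machine, server_ip: server_ip)
end

puts render_preseed({ 'hostname' => 'cache-01', 'domain' => 'example.net' }, '172.16.16.7')
```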
28. The post-install
• Annoying amount of our magic happens here
• Lots of netconfig the installer can’t handle
• Install internal apt keys and repos
• Install our Chef package and kernels
• Configure Chef for first boot
• Generated from a template with access to Chef objects
29. For more than just installers
• BIOS/Firmware Update ISOs
• Boot a live debug image
• Network Gear
30. Boot an ISO
• FreeDOS ISO + vendor firmware
#!ipxe
echo Installing Supermicro Firmware for: <%= @machine['hostname'] %>

sleep 3
# memdisk needs the "iso" argument to treat the initrd as an ISO image
initrd http://<%= @serverip %>/images/current_firmware.iso || goto error
kernel http://<%= @serverip %>/images/memdisk iso || goto error
boot

:error
echo Something went wrong, dropping to a shell...
shell
31. Network Gear
• Arista supports DHCP + HTTP provisioning
get '/ztp' do
  # Rack exposes request headers in env with an HTTP_ prefix
  mac = request.env['HTTP_X_ARISTA_SYSTEMMAC']
  @device = lookup_device(mac)
  erb :ztp
end
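The talk does not show `lookup_device`. One plausible shape, sketched here against an in-memory list instead of Chef data bags, is to normalize the MAC and match it against stored records; the `DEVICES` list, the record layout, and `normalize_mac` are all assumptions.

```ruby
# Hypothetical device records; in Donner these would come from Chef.
DEVICES = [
  { 'id' => 'switch-a', 'hostname' => 'sw-a', 'mac' => '02:00:00:00:00:01' }
].freeze

# Strips separators and case so 02:00:..., 0200.0000.0001 and 02-00-... compare equal.
def normalize_mac(mac)
  mac.to_s.downcase.gsub(/[^0-9a-f]/, '')
end

# Returns the matching device record, or nil when the MAC is unknown or missing.
def lookup_device(mac, devices = DEVICES)
  wanted = normalize_mac(mac)
  return nil if wanted.empty?
  devices.find { |device| normalize_mac(device['mac']) == wanted }
end

puts lookup_device('0200.0000.0001').inspect
```

Returning `nil` for an unknown MAC lets the route decide whether to answer with an error or a default template.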