4. Where We Started
• The network worked, but it wasn’t easy
– Large L2 bridged architecture, minimal L3 segmentation, multiple NAT layers
– Two distinct “business units”
– Manual configuration, “tribal knowledge”
– Numerous single points of failure
5. Where We Went
• Needed to operate as a unified team
– Consistent support experience, improved RF spectral efficiency, coordinated IP allocations
• Standardized COTS equipment
– “CCIE off the street” factor, escalation path
• Standardized service offerings
– Org department handoffs are always wired gigE, delivered as aggregated “islands” or a single demarc
– Participant camps supply very prescriptive equipment, “self-install” provisioning
6. Where We Went
• Route, always
– No L2 segments past a single device
– OSPF everywhere, core backbone & “islands”
– Segment where possible, even over WiFi
• Automation
– Initially covering all routers & switches
– Target goal: cover any device with a config or supplemental service (DNS, monitoring); see the sketch below
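A minimal Python sketch of the “route everything, generate it from data” idea described above. The island names, addressing, and IOS-like syntax are invented for illustration; this is not the actual Black Rock City addressing plan or NCG’s output format.

# Hypothetical data model: each routed "island" gets its own prefix and OSPF area.
ISLANDS = {
    "center-camp": {"router_id": "10.0.0.1", "prefix": "10.10.0.0/24", "area": 0},
    "airport":     {"router_id": "10.0.0.2", "prefix": "10.20.0.0/24", "area": 1},
    "esplanade":   {"router_id": "10.0.0.3", "prefix": "10.30.0.0/24", "area": 2},
}

def wildcard(prefixlen):
    # Convert a prefix length to the OSPF-style inverse (wildcard) mask.
    host_bits = (1 << (32 - prefixlen)) - 1
    return ".".join(str((host_bits >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def render(name, island):
    # Emit one island router's OSPF stanza; islands are routed, never bridged together.
    net, plen = island["prefix"].split("/")
    return "\n".join([
        f"! {name} island router",
        "router ospf 1",
        f" router-id {island['router_id']}",
        f" network {net} {wildcard(int(plen))} area {island['area']}",
        " passive-interface default",
        " no passive-interface GigabitEthernet0/0",  # assumed backbone uplink
        "!",
    ])

for name, island in ISLANDS.items():
    print(render(name, island))
    print()

In practice the data model lived in files alongside the templates, so regenerating every island’s config was one command rather than per-device typing.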
7. Automation!
• Held a bakeoff (mid-2011 evaluation)
– Homegrown YAML config templates (generic template-plus-data sketch below)
– Prototyped NCG (see NANOG49 Tutorial “Automating Network Configuration”)
• NCG won (3 yrs ago)
– Open source, vendor agnostic
– Steep initial learning curve, then very easy to embrace
– Principal developer already a team member
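For a feel of the template-plus-data pattern both bakeoff candidates embody, here is a minimal Python sketch using the standard library’s string.Template. The device fields and the IOS-like template are invented for illustration; NCG’s actual template language and file formats differ.

from string import Template

# Hypothetical per-device record; in the homegrown approach this lived in YAML.
DEVICE = {
    "hostname": "sw-esplanade-01",
    "mgmt_ip": "10.99.0.11",
    "ntp_server": "10.99.0.1",
    "snmp_community": "example-ro",
}

# Illustrative IOS-like switch template (not NCG's actual template language).
SWITCH_TEMPLATE = Template("""\
hostname $hostname
interface Vlan99
 ip address $mgmt_ip 255.255.255.0
ntp server $ntp_server
snmp-server community $snmp_community RO
""")

print(SWITCH_TEMPLATE.substitute(DEVICE))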
10. Summary Overview
• Data modeling for automation isn’t easy
– Imagine all your inputs & outputs (device configs, DNS, monitoring, billing, etc.); see the sketch below
• Single source of truth
– Git, a wiki, fancy IPAM: choose what fits your organization’s workflow, stress level, & budget
• Start at L8 and L1, meet in the middle
– People + physical layer = organic processes
– End goal is to be efficient, not become a SW dev
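A small Python sketch of the “single source of truth” idea: one per-device record drives the device config, DNS, and monitoring outputs, so nothing is typed twice. Hostnames, addresses, and the Nagios-style syntax are illustrative only, not the event’s actual data.

# One record per device, kept in Git next to the templates.
DEVICES = [
    {"hostname": "rtr-center-camp", "ip": "10.10.0.1"},
    {"hostname": "sw-airport-01",   "ip": "10.20.0.2"},
]

def dns_records(devices):
    # Forward-zone fragments for the event's internal DNS.
    return [f"{d['hostname']}  IN  A  {d['ip']}" for d in devices]

def monitoring_hosts(devices):
    # Nagios-like host definitions for the monitoring system.
    return [f"define host {{ host_name {d['hostname']}; address {d['ip']} }}"
            for d in devices]

def config_stubs(devices):
    # Per-device config fragments (hostname only, for brevity).
    return [f"hostname {d['hostname']}" for d in devices]

for title, lines in (("DNS", dns_records(DEVICES)),
                     ("Monitoring", monitoring_hosts(DEVICES)),
                     ("Device configs", config_stubs(DEVICES))):
    print(f"### {title}")
    print("\n".join(lines) + "\n")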
Thank you for the opportunity this morning
The talk was originally just about automation; added background on the design and technology
Burning Man is an event … held on federal land … northern Nevada, 2hrs north-east of Reno
Called the playa or Black Rock City
Just under 70k participants
Described as social experiment, festival, party, … - it’s a city
Leave no trace event
Zero infrastructure before & after event
~3 weeks to build out & 3 weeks to tear down
BM provides basic life safety (port-a-potties, medical care) along with “guard rails” (mutual aid – law enforcement, ice for purchase)
Everything else – water, food, shelter – is Bring Your Own
Every city has infrastructure
This is what BM looks like to me
When I returned, took a Toyota manufacturing / Six Sigma approach: a consultant-style evaluation phase
Team = sysadmin/helpdesk, not network engineers or architects
Some level of routing, done with shell scripts that adjusted local routing tables
Limited investment in adding redundancy; power outages and physical stability were concerns
For historical reasons, two different customer bases
Departments pre-event, camps during = staff exhausted
Common team with a common goal of an effective service, regardless of whether the end user is a department or a camp
COTS is an old term, “commercial off the shelf” – products that offer technical support, warranty, known best practices
For switching - wanted fanless, active PoE, gigE
How the handoff is backhauled, wired or wireless, isn’t known to the customer
A truck roll is incredibly expensive
If you want the same L2 across the playa, bring your own VPN or tunnel mechanism
OSPF just works; most wireless devices are simple L2 bridges, so no need for exotic mesh networking
Tired of sitting in a shipping container for a week, too much hands-on L1 work to be done
Can’t afford to configure manually
Consistent service, standardized equipment = made it easier to automate!
You don’t have to buy all the same equipment, but in our circumstances, it helped
Only did switch + router configs, then added DNS, monitoring
Added wireless equipment later
Take baby steps
NCG + static configs in Git is very powerful: an offline, distributed database
Change is difficult for everyone to handle; tackle the people and the physical layer first and automation becomes a natural extension