Keeping your service configuration aligned over hundreds of hosts is not a simple task. In this talk, we illustrate how we automated the integration of HAProxy into our infrastructure at University of Paderborn.
As our current generation of commercial load balancer appliances approached end of life, we thought about replacement options and improving how we manage our services while being at it. The main goal was building a scaleable, consistent, active-active setup of load balancers which could be easily automated with open source tools.
We needed a way to define what a service is and how/where it should be configured, balanced and monitored we created a simple service defintion format in YAML and small Python library to help with parsing, inheritence, defaults etc. The automation framework bcfg2 was a given as it was already in use to manage hundreds of Linux and Windows systems and services. As it's written in Python it's easily extendable.
As load balacing options we implemented anycast (for examples for Kerberos KDCs) as well balancing by HAproxy nodes where the HAproxy frontend IPs might be anycasted as well. When running production services it's important to know when things break before the user does, so setting up monitoring for frontend and backend services is part of the picture, too. All bits of configuration for HAproxy, anycast, route reflection, monitoring with Icinga2, netfilter (nftables) rules, etc. are automagically generated based on the service configuration. This talk will lay out how all those parts fit together and are generated.
Of course, we also explain the pitfalls of this setup and what we (hopefully) learned from it.
3. Who's who Maximilian Wilhelm
Networker
OpenSource Hacker
Fanboy of
(Debian) Linux
ifupdown2
Occupation:
By day: Senior Infrastructure Architect, Uni Paderborn
By night: Infrastructure Archmage, Freifunk Hochstift
In between: Freelance Solution Architect for hire
Contact
@BarbarossaTM
max@sdn.clinic
3 / 24
5. Who's who
Context
Context
Paderborn University
20.000 students
2.500 employees
Lots of central IT services
IDM (LDAP, Kerberos, AD, …)
Mail (SMTP, IMAP, PMX, Mailman, Exchange)
An aweful lot of websites
eLearning things (Moodle, PAUL, …)
SharePoint
File services
The Internet
...
5 / 24
8. Who's who
Context
The Past
The Past
Cisco Nexus based L2 fabric
VLANs for service / backend networks
2x F5 Viprion 2400 LBs
Router / default gateway for all service networks
Prefixes for VIPs statically routed to VRRP IP
Prefixes for backend networks statically routed to VRRP IP
No ACLs between service networks
Out-of-everything end of 2018
Manually configured
Even monitoring
8 / 24
12. Who's who
Context
The Past
The Idea
The idea
A service as central config element
Can be balanced by
Anycast
HAProxy
If balanced
Service VIPs announced via BGP
Should be Active/Active
Monitoring configured automatically
Checks for frontends / VIPs as well as backends
Config of webserver(s) generated
Should additionally allow
H/A clusters
Caching layer for web stuff
Subnets of service nodes should be routed by DC routers
with ACLs
12 / 24
13. Who's who
Context
The Past
The Idea
Working DC network setup
All VLANs everywhere
BGP capable DC routers
Heavy automation for Linux boxes
bcfg2
Written in Python
Easily extendable
Config generators for Icinga2
Basic Apache2 templating
People not afraid of automation
On the contrary
What was in the cards?
13 / 24
14. Who's who
Context
The Past
The Idea
Now what IS a service
Has an FQDN
resolves to IP and/or Legacy-IP addresses
Has a proto and service
proto derived from service, if possible
e.g. tcp/http or tcp/80
Is provided by hosts of $bcfg2_group
e.g. kdc-production
May be anycasted
May be balanced
And the LBs anycasted
May be a web thing
With special http config
e.g. template, redirects and stuff
May have special monitoring config
14 / 24
15. Who's who
Context
The Past
The Idea
How does it look like?
mwilhelm@kili:/bcfg2/etc/services/imt/infrastructure/anycasted$ cat kerberos-kdc.srv
anycast: True
status: produktiv
name: kerberos-kdc
fqdn: kerberos.srv.imt.uni-paderborn.de
service: kerberos
bcfg2_srv_group: kdc-slave
monitoring:
virtual_bcfg2_groups:
- kdc
- imt-master
15 / 24
16. Who's who
Context
The Past
The Idea
Well OK, it has a defaulting mechanism, too
mwilhelm@kili:/bcfg2/etc/services/imt/infrastructure/anycasted$ cat defaults.yaml
anycast: True
status: produktiv
mwilhelm@kili:/bcfg2/etc/services/imt/infrastructure/anycasted$ cat kerberos-kdc.srv
name: kerberos-kdc
fqdn: kerberos.srv.imt.uni-paderborn.de
service: kerberos
bcfg2_srv_group: kdc-slave
monitoring:
virtual_bcfg2_groups:
- kdc
- imt-master
16 / 24
19. Who's who
Context
The Past
The Idea
The Now
Lessons learned
Bad NIC firmware is bad
BGP timeouts are long
Recovery times are bad when L2 is a black hole
BFD will solve this
HAProxy configuration is complex
Lots of switches have effect on other switches
No way to ask HAProxy what config options are active
19 / 24
20. Who's who
Context
The Past
The Idea
The Now
The good
Backends with support for Proxy Protocol
Apache2
Cyrus IMAP
Dovecot
Exim
Nginx
Postfix
Varnish
...
20 / 24
21. Who's who
Context
The Past
The Idea
The Now
The bad
OpenLDAP
No support for Proxy Protocol
Has to be DNATed by HAProxy when slapd should see client IPs
Therefore LDAP backends have to be routed by HAProxy
Exchange
Funny problems with timeouts (solved)
Funny problems with Outlook for Mac clients
SharePoint
Funny problems when you don't use tcp mode for some vHosts
I want this hour of my life back
21 / 24
22. Who's who
Context
The Past
The Idea
The Now
Bonus level: Packet lter con guration
We know what ports a service is using
We know where (backend, frontend)
Let's generate netfilter rules
Limiting access to source prefixes just came on top
Specifying additional_ports, too
mwilhelm@kili:/bcfg2/etc/services/imt/infrastructure/anycasted$ cat proxy.srv
name: proxy
fqdn: proxy.srv.imt.uni-paderborn.de
service: proxy
protos: tcp
port: 3128
bcfg2_srv_group: proxy-server-produktiv
acl:
allow_from:
- imt_thinclients
- imt_fw_mgmt
22 / 24
23. Who's who
Context
The Past
The Idea
The Now
Links
Further Reading
BGP / networking basics
https://myfirst.network
Anycast with Cisco Nexus 7000 and Debian Linux
https://blog.sdn.clinic/2018/02/anycasted-services-with-debian-bird-anycast-
healthchecker-and-cisco-nexus-7000/
Anycast all the things
https://www.slideshare.net/BarbarossaTM/anycast-all-the-things
23 / 24