- The document discusses an architecture for providing high availability and performance for a Plone site handling high traffic volumes with a requirement for 100% uptime.
- The proposed architecture uses multiple Plone instances behind a load balancer, with Relstorage on top of replicated MySQL providing database redundancy. Mod_wodan and Varnish are used for caching to improve performance. The design eliminates all single points of failure and allows automated failover.
15. Architecture Goals
● Must convince “file-based 100% uptime” sysadmins
● No SPOF
– eliminate all Single Points Of Failure
● Automated failover
– no manual intervention
● Extreme performance
● Extreme resilience
– killall -9 Plone
16. Meet Paul Stevens
● My brother
● mod_wodan + DBmail
● Plone developer
● pjstevns on irc/github/etc
NFG Net Facilities Group
● premium hosting
● 24/7 MySQL HA
– since stone age
● www.nfg.nl
22. Load Balancer
● Client provided hardware load balancer
● Alternative: Linux Virtual Server + HAproxy
– 2x HAproxy in active/passive config
● this would be an EXTRA layer of HAproxy not shown in diagram
– use highly available “virtual” IP address
– monitor with Heartbeat or comparable
– failover virtual IP address with arping broadcasts
● Alternative: AWS
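The active/passive pairing on this slide can be sketched as a small decision function. This is an illustration only, not the actual logic of Heartbeat or HAproxy: the `Node` type, the `choose_vip_holder` function, and the `missed_limit` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    missed_heartbeats: int  # consecutive heartbeats not received from this node


def choose_vip_holder(active: Node, passive: Node, missed_limit: int = 3) -> str:
    """Return the name of the node that should hold the virtual IP.

    Hypothetical policy: keep the VIP on the active node until it has missed
    `missed_limit` consecutive heartbeats, then move it to the passive node,
    but only if the passive node itself still looks healthy.
    """
    active_dead = active.missed_heartbeats >= missed_limit
    passive_alive = passive.missed_heartbeats < missed_limit
    if active_dead and passive_alive:
        return passive.name
    return active.name
```

Once the holder changes, the new node would announce the address to the network, which is the role of the arping broadcasts mentioned above.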
24. Ensure physical separation
● Ensure redundancy across physical servers
– no use to fail over on same machine
– separate machines in separate data centers
● Gotcha: moving virtuals around
– Disable HA facilities of virtualization platform
– We'll do our own HA
27. ZEO versus Relstorage
● ZEO
– ZEO protocol
– filestorage
– object pickles
● ZRS Replication
– $$$ at the time
– later opensourced
● No hot-failover
– slave → master reconfig
● Relstorage
– ZEO protocol
– MySQL or PostgreSQL
– object pickles: no alchemy!
● MySQL replication
– done that 24/7 since 2001
– widely used
● Hot failover
– multi-master
29. Blobstorage
● Not shown in diagram
● Client provided Netapp Metrocluster NFS disks
– no need to care about replication and HA for those
● Alternatives:
– DRBD + NFS
– AWS Elastic Block Store (EBS)
– fsniper + rsync + NFS
● Why not run database on that?
– disk replication + NFS + ZEO
– what can possibly go wrong?
32. mod_wodan
● Caching module for Apache
– C
– Originally by ICS for nu.nl
– Now maintained by NFG
● Store response body + headers on disk
● BOFH attitude to caching policies
● Used in anger
● Alternative: stxnext.staticdeployment
33. Varnish ↔ Wodan
Varnish:
● Proxy process
● RAM cache
– restart → empty cache
– expired → gone
● Plays nice
– request + response headers
– etag split-view
● purge API
– plone.app.caching
mod_wodan:
● Apache module
● Persistent disk cache
– restart → full cache
– expired → keep fallback
● BOFH
– my way or the highway
– single cache file per page
● Cron job maintenance
– crawl sitemap
– delete removed pages
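The "expired → keep fallback" behaviour of a persistent cache can be sketched as follows. This is not mod_wodan's implementation (which is an Apache module written in C); it is a minimal Python illustration in which a dict stands in for the cache files on disk, and the `serve` function, the `fetch` callback, and the `ttl` value are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# url -> (body, stored_at); an in-memory dict stands in for the files on disk
Cache = Dict[str, Tuple[bytes, float]]


def serve(url: str, cache: Cache, fetch: Callable[[str], Optional[bytes]],
          ttl: float = 300.0, now: Optional[float] = None) -> Optional[bytes]:
    """Serve `url`, preferring a fresh copy but falling back to a stale one.

    `fetch` returns the backend response body, or None if the backend is down.
    """
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry is not None and now - entry[1] < ttl:
        return entry[0]                  # fresh hit: backend not contacted
    body = fetch(url)
    if body is not None:
        cache[url] = (body, now)         # refresh the stored copy
        return body
    # Backend unavailable: keep serving the expired copy if there is one.
    return entry[0] if entry is not None else None
```

A RAM-only cache that drops expired entries would return nothing in the last branch, which is the contrast this slide draws between the two layers.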
38. Multi Master MySQL
● multi-master
– cross replication
● each slaves the other
– any can be master
● hot failover and failback
● Gotcha: use only 1 master at a time
– Relstorage is not multi-master
– avoid replication errors
● mmm_agent server (not shown in diagram)
– monitors mysql health and replication
– manages virtual MySQL HA ip address
● think: Heartbeat for MySQL
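The "only 1 master at a time" rule can be sketched as a selection function that a monitor would run on every health check. This is not the actual logic of mmm_agent: the `pick_writer` function, the node names, and the sticky policy are assumptions made for the example.

```python
from typing import Dict, Optional


def pick_writer(current: Optional[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the one node that should hold the MySQL writer IP.

    Hypothetical policy: stay on the current writer while it is healthy, so
    that a recovered node never takes the role back by itself; otherwise move
    to the first healthy node in name order. None means no node is usable.
    """
    if current is not None and healthy.get(current, False):
        return current
    for name in sorted(healthy):
        if healthy[name]:
            return name
    return None
```

Because the function always returns at most one name, Relstorage only ever writes to a single node, even though replication runs in both directions.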
42. Readonly Rescue Mode
● File-based content delivery
– mod_wodan
– full cache of all pages + resources
– cached search results (Subject / tag cloud)
● AJAX-driven graceful degradation
– detect backend down via non-cached lightweight view
● @@ipaddress not a full page: minimal rendering overhead
– disable interactive elements via CSS
● search bar, personal tools → display:none
● Gotcha: anonymous only
– down for authenticated users until manual reconfig
● Gotcha: ErrorDocument
– pre-cache nice page but preserve HTTP error status code
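The rescue-mode decision, including the ErrorDocument gotcha, can be sketched as a single function. This is an illustration of the idea rather than the Apache configuration used in the talk: the `rescue_response` function, its parameters, and the default 503 status for an unreachable backend are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def rescue_response(path: str, backend: Optional[Tuple[int, bytes]],
                    page_cache: Dict[str, bytes],
                    error_page: bytes) -> Tuple[int, bytes]:
    """Pick the (status, body) to send to an anonymous visitor.

    `backend` is the live (status, body) answer, or None if the backend
    could not be reached at all.
    """
    if backend is not None and backend[0] < 500:
        return backend                       # backend is fine: pass it through
    if path in page_cache:
        return 200, page_cache[path]         # rescue mode: serve the cached page
    # Nothing cached: show the pre-cached friendly page, but keep the error
    # status so clients and monitoring still see a failure.
    status = backend[0] if backend is not None else 503
    return status, error_page
```

Returning the friendly page with a 200 status in the last branch would hide the outage from crawlers and health checks, which is the reason for preserving the error code.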