2. Who am I?
• Chris Miller
• Huffington Post - Senior Developer
• CMS platform and API
• Started in systems/network administration before moving to code
3. What is Huffington Post?
• #87 most popular site in the world (Alexa)
• #3 most popular news site in world (Alexa)
• #19 most popular US site (Alexa)
• More traffic than nytimes.com
4. Our Platform: Today
• Everything! No, really.
• Perl: CMS core
• PHP “layer” integrated on top of Perl code
• MySQL data storage
• MongoDB for comments storage
• Hadoop for internal statistical analysis
• Memcache for lightweight caching
• Redis for more structured data types
• Varnish for caching!
5. Our Platform: Tomorrow
• Rethink tools and platform from the ground up
• Building new API
– Yes, OAuth 2.0!
– Complete REST approach
– Will be public!
• We can’t rewrite everything at once, so the API build has 4 phases:
– Build “bridge” middleware to allow access to existing functionality
– Refactor backend edit/admin tools
– Refactor frontend to use API
– Transparently and calmly refactor old code while keeping the API stable
6. So what about CI?
• New API is built on CodeIgniter
– Using Phil’s REST library as a starting point
• Thanks Phil!
• Backend editorial tools are being built on CI
• We love CI
– But it isn’t our only framework
– Different tools work better for different teams
– We use what works. You should too.
7. How we scale
• CDN: Akamai
– 80%+ cache hit rate
• Amazon S3 as the origin for static files
• Basic page layout/content is generated to a flat file
• The flat files still contain some dynamic content, in PHP
• Serving the basic page from a flat file means less overhead per request
• It also means that for certain changes, we have to “regenerate” the page. Ugh.
8. Varnish
• HTTP caching reverse proxy (“HTTP Accelerator”)
• Caching layer in front of your web server
• Stores complete responses in memory
• If the requested object is cached, serves it from memory
– Otherwise, forwards the request to the web server, then caches the response
• Works well with the Linux kernel by delegating memory allocation and management to the OS, where it belongs
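Here is a minimal sketch of that request flow in VCL. It assumes Varnish 3.x syntax, the same dialect as the snippets on the next slides, and a web server on 127.0.0.1:8080, which is a hypothetical address. The lookup-then-fetch behavior itself comes from Varnish's built-in VCL. These subroutines only name the backend and add a header so you can see from the client side whether a response came from memory:

```vcl
# Hypothetical origin web server that Varnish forwards cache misses to.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_deliver {
    # obj.hits counts how often this cached object has been served;
    # 0 means it was just fetched from the backend.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```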
9. Controlling Varnish
• Set custom TTLs for content:
C{
    #include <stdlib.h>  /* atoi() */
}C

sub vcl_fetch {
    if (beresp.http.X-HP-Cache-Control ~ "s-maxage=[0-9]+") {
        # Keep only the number of seconds, e.g. "public, s-maxage=300" becomes "300".
        set beresp.http.X-HP-Cache-Control = regsub(beresp.http.X-HP-Cache-Control, "^.*s-maxage=([0-9]+).*", "\1");
        // set the ttl.
        C{
            char *ttl;
            /* \023 is the octal length (19) of "X-HP-Cache-Control:" */
            ttl = VRT_GetHdr(sp, HDR_BERESP, "\023X-HP-Cache-Control:");
            VRT_l_beresp_ttl(sp, atoi(ttl));
        }C
        set beresp.http.X-Cacheable = "CUSTOM: " + beresp.ttl;
    } elsif (beresp.http.X-HP-Cache-Control ~ "(no-cache|private)" || beresp.http.pragma ~ "no-cache") {
        set beresp.ttl = 0s;
        set beresp.http.X-Cacheable = "NO-CACHE";
    } else {
        set beresp.http.X-Cacheable = "DEFAULT: 30s";
        set beresp.ttl = 30s;
    }
}
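If you'd rather avoid inline C, roughly the same TTL extraction can be done in plain VCL. This sketch assumes Varnish 3.x with the bundled std vmod. Its `std.duration()` parses a string such as "300s" into a duration and falls back to the second argument (30s here) if parsing fails:

```vcl
import std;

sub vcl_fetch {
    if (beresp.http.X-HP-Cache-Control ~ "s-maxage=[0-9]+") {
        # Turn "public, s-maxage=300" into "300s" and parse it as a duration.
        set beresp.ttl = std.duration(
            regsub(beresp.http.X-HP-Cache-Control, "^.*s-maxage=([0-9]+).*$", "\1s"),
            30s);
        set beresp.http.X-Cacheable = "CUSTOM: " + beresp.ttl;
    }
}
```

Unlike the inline C version, this leaves the original X-HP-Cache-Control header untouched.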
10. Controlling Varnish
• Refreshing content
sub process_refresh_requests {
    if (req.request == "REFRESH") {
        # Treat REFRESH as a GET that ignores any cached copy,
        # forcing a fresh fetch from the backend.
        set req.request = "GET";
        set req.hash_always_miss = true;
    }
}
• This is called early in the vcl_recv subroutine
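Here is one possible call site for that subroutine. It assumes Varnish 3.x and that process_refresh_requests is defined in the same VCL file. The refreshers ACL and its address are hypothetical. They are there so that not just anyone can force backend fetches:

```vcl
# Hypothetical list of hosts (e.g. CMS servers) allowed to force a refresh.
acl refreshers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.request == "REFRESH") {
        if (!client.ip ~ refreshers) {
            error 405 "Method not allowed";
        }
    }
    # Rewrites REFRESH into a GET with hash_always_miss set, so Varnish
    # fetches a fresh copy from the backend and caches it for later requests.
    call process_refresh_requests;
}
```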
11. Edge Side Includes
• Include cached content blocks into pages
<html>
  <body>
    <esi:include
      src="http://example.com/my_page1.html"
      alt="http://example.com/my_page2.html"
      onerror="continue"
    />
  </body>
</html>
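Varnish only parses `<esi:include>` tags when told to. In Varnish 3.x this is switched on per response in vcl_fetch with `beresp.do_esi`. In this sketch, the /templates/ URL prefix is a hypothetical convention for pages that contain ESI tags:

```vcl
sub vcl_fetch {
    # Parse ESI tags only in template pages; other responses are served as-is.
    if (req.url ~ "^/templates/") {
        set beresp.do_esi = true;
    }
}
```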
12. Edge Side Includes
• How to use ESI:
– Make complicated blocks independently accessible at their own URIs
– Create a “template” file with ESI includes that assembles the page
• Why this is powerful
– If multiple pages use different combinations of page components, some components may already be cached
– Reduces how often the entire page must be built by the backend; fetch only the components that are needed
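To get the "some may already be cached" benefit, each fragment URI can carry its own TTL, independent of the page that includes it. This is a sketch assuming Varnish 3.x. The URL patterns and TTL values are hypothetical:

```vcl
sub vcl_fetch {
    if (req.url ~ "^/templates/") {
        # The template is cheap to rebuild, so keep it only briefly...
        set beresp.ttl = 30s;
    } elsif (req.url ~ "^/fragments/") {
        # ...while shared, expensive components stay cached longer and are
        # reused by every page that includes them.
        set beresp.ttl = 10m;
    }
}
```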
13. Varnish Tricks
• Intelligently purge the cache when your content changes
– Allows you to increase TTLs without fear of serving outdated content
# In vcl_recv: only hosts in the purgers ACL may send PURGE.
if (req.request == "PURGE") {
    if (!client.ip ~ purgers) {
        error 405 "Method not allowed";
    }
    return (lookup);
}
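The vcl_recv snippet only lets PURGE requests through to the cache lookup. The purgers ACL still has to be defined, and the object still has to be evicted. This sketch follows the Varnish 3.x purge pattern, with hypothetical ACL addresses:

```vcl
# Hosts allowed to purge, e.g. the CMS servers (addresses are hypothetical).
acl purgers {
    "127.0.0.1";
    "10.0.0.0"/8;
}

sub vcl_hit {
    if (req.request == "PURGE") {
        # Evict this object and all of its variants from the cache.
        purge;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        # This exact variant isn't cached, but other variants may be.
        purge;
        error 200 "Purged.";
    }
}
```

The CMS then sends a PURGE request for a page's URL whenever that page is saved.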
14. Other Scaling Tips
• Hardware SSL offloading is your friend
• Consider mod_php
– CGI has huge overhead
– CGI with suEXEC has huge security advantages
– FastCGI is a happy medium for some
15. Other Scaling Tips
• Don’t try to do everything on one server/cluster
– Splitting your application is OK
– 1 cluster for frontend, 1 server/cluster for backend, etc.
• Keep an open mind about technologies, platforms, and tools
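Here is one way to split an application behind a single Varnish tier, sketched in Varnish 3.x VCL. The backend names, hosts, and /admin/ prefix are hypothetical. Editorial traffic goes to its own cluster and is never cached:

```vcl
# Hypothetical public frontend cluster.
backend frontend {
    .host = "10.0.1.10";
    .port = "80";
}

# Hypothetical backend/editorial cluster.
backend admin {
    .host = "10.0.2.10";
    .port = "80";
}

sub vcl_recv {
    if (req.url ~ "^/admin/") {
        # Editorial tools: route to their own cluster and skip the cache.
        set req.backend = admin;
        return (pass);
    }
    set req.backend = frontend;
}
```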
17. Guilds!
• What a guild is:
– Groups of people organized around a topic
– Membership/participation is encouraged, but not required
– Think of it as an internal Meetup
• Join to learn new things
• Join to talk about things you are interested in
• Examples: PHP, Front End, Python, Ruby, Management, Platform/Architecture, Big Data, etc.
18. Guilds!
• Experts to solve technology-specific problems
– Example: a front-end SWAT team to improve page load times hurt by slow or excessive JS
• Collectively give back to the community around your technology
• Help others learn, and learn from others
• Meet people on other teams