Alex Norcliffe from Condé Nast International Digital and the Umbraco Core Team, and Peter Miller from Condé Nast Digital UK, will discuss Umbraco on a large scale, cloud computing and scalability.
2. About us
Alex Norcliffe: International Tech Lead for Condé Nast International, now a consultant Technology Architect. Email: alex.norcliffe@boxbinary.com; Twitter: alex_norcliffe; URL: www.boxbinary.com
Peter Miller: Head of Tech & Development, Condé Nast UK. Email: peter.miller@condenast.co.uk; Twitter: petemill; Blog: http://wishfulcode.blogspot.com/
3. Scaling Umbraco to a large, high-traffic publishing environment
- Development environment
- Handling traffic: code scalability
- Handling traffic: servers & the cloud
5. Editors want...
Flexibility:
- Site structure
- Template configuration
Workflow:
- Notifications
- Granular permissions
A great experience:
- Convergence of tools
- Slick editing interface
6. Why Umbraco is a great choice for large publishers
- Rapid development
- Easy to extend
- Integration with existing systems
- Open source:
  - No license fees for the product
  - Share costs: become part of an evolving platform
  - Avoid buying limited access to a vendor
  - Leave more budget for creating great sites
- Re-use content across multiple sites and domains
- Host many sites on one server set-up
- Non-technical user interface
- Low cost of support
7. Umbraco in a large publishing environment
Use simple, custom (but generic!) DataTypes for component convergence. Our goal is to give editors one experience and consistent flexibility.
8. Umbraco in a large publishing environment
Highly configurable templates that are independent of layout and structure. Logical inheritance of templates, even in code: base classes for similar layout templates that share the same data template.
10. Handling Massive Traffic: what’s “massive”?
- Unpredictable traffic pattern
- Audience time zones spread across the globe
- Large amount of content: cache turnover rate
- High number of page views: CondeNet UK + Italy combined = >100m PVs per month
11. Handling Massive Traffic: scaling the code
- Code performance before adding caching
  - JetBrains dotTrace code profiler (screenshot)
  - Even String.Concat rather than String.Format!
- BoxBinary WebCacheManager framework (screenshot)
  - Memcached is great too
- Lucene indexing of external data sources: comments, image + video assets
- OutputCaching: “icing the cake”
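The WebCacheManager idea from the speaker notes (describe a cached object by its memory use, creation cost and refresh rate, and let the cache choose the lifetime, rather than hard-coding a timeout in minutes) can be illustrated with a minimal sketch. This is Python, not BoxBinary's .NET framework: a class decorator stands in for a .NET attribute, and the three-level scale, the `ttl_seconds` formula and all class names are hypothetical. It leaves out the asynchronous disk serialization and DFS distribution the notes also mention.

```python
# Hypothetical sketch of attribute-driven cache lifetimes (NOT the real
# WebCacheManager API). A class decorator plays the role of a .NET attribute.
import time
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable, Dict, Tuple


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class CacheHints:
    memory: Level         # how much memory an instance uses
    creation_cost: Level  # e.g. long-running DB query, web service call
    volatility: Level     # how often the data needs to be refreshed


def cache_hints(memory: Level, creation_cost: Level,
                volatility: Level) -> Callable[[type], type]:
    """Attach descriptive hints to a class instead of a timeout in minutes."""
    def wrap(cls: type) -> type:
        setattr(cls, "_cache_hints", CacheHints(memory, creation_cost, volatility))
        return cls
    return wrap


def ttl_seconds(hints: CacheHints, base: float = 60.0) -> float:
    """Assumed policy: costly objects live longer; big or volatile ones shorter."""
    return base * hints.creation_cost ** 2 / (hints.memory * hints.volatility)


class WebCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get_or_create(self, key: str, factory: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = factory()
        hints = getattr(type(value), "_cache_hints", None)
        if hints is not None:  # objects without hints are simply not cached
            self._items[key] = (now + ttl_seconds(hints), value)
        return value


@cache_hints(memory=Level.LOW, creation_cost=Level.HIGH, volatility=Level.LOW)
class DictionaryItems:
    """Hypothetical stand-in for Umbraco Dictionary data, which profiling flagged."""
    def __init__(self, entries: Dict[str, str]) -> None:
        self.entries = entries
```

With these assumed weights, a small, expensive, rarely changing object such as `DictionaryItems` is kept for 60 × 3² / (1 × 1) = 540 seconds, while a cheap, large, volatile object would be kept for under 7 seconds. The point of the design is that the code describing an object never states a duration; the policy can be tuned in one place.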
Intro by Alex and Pete
ALEX: My name is Alex, Pete (...) launches. Together we have worked on high-traffic sites with large content structures: >200,000 articles, 2 million images, >40 million page views a month.
Q&A?
PETE: We have used many CMSs in the past. Our experience of Umbraco: a great solution for medium to large publishing sites. We are not the biggest player, but we want to make sure we’re having the right discussions.
PETE: Current environment:
- 7+ sites
- Editorial team of 30+
- 10+ developers
- IND tools, shared components
Developers want convergence, and not to reinvent the wheel each time they build an app.
PETE: It’s not only us who are passionate: editors are passionate too, and they want a great experience, because they have great content.
AN: Why Umbraco
ALEX (3)
ALEX: “So, what are some ways we have put this into effect since the original Wired launch?”
- Avoid logins to several systems
- Custom DataTypes give a streamlined experience
- Integration with our existing systems: images, comments
ALEX: Super-generic Document Types. Generic, configurable code that knows about the site context but doesn’t depend on a restrictive site structure. Write code that can use the same data schema (doctype) with different layouts (templates).
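The pattern of one data schema with several layouts, where similar layout templates share a base class, can be sketched outside Umbraco. This is a minimal Python illustration under assumptions: `ArticleDoc`, `ArticleTemplate`, `FullArticle` and `Teaser` are hypothetical names, not Umbraco types, and a plain dataclass stands in for a document type.

```python
# Hypothetical sketch: one document type (data schema), several templates
# (layouts) sharing a base class. Not Umbraco's API.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class ArticleDoc:
    """Generic 'Article' schema shared by every site and every layout."""
    title: str
    body: str
    tags: List[str] = field(default_factory=list)


class ArticleTemplate(ABC):
    """Base class: shared data access and site context for article layouts."""

    def __init__(self, doc: ArticleDoc, site_name: str) -> None:
        self.doc = doc
        self.site_name = site_name  # site context, not a fixed tree position

    def heading(self) -> str:
        return f"{escape(self.doc.title)} | {escape(self.site_name)}"

    @abstractmethod
    def render(self) -> str:
        """Each layout decides only how to arrange the shared data."""


class FullArticle(ArticleTemplate):
    def render(self) -> str:
        tags = escape(", ".join(self.doc.tags))
        return f"<h1>{self.heading()}</h1><p>{escape(self.doc.body)}</p><p>{tags}</p>"


class Teaser(ArticleTemplate):
    def render(self) -> str:
        return f"<h2>{self.heading()}</h2><p>{escape(self.doc.body[:20])}...</p>"
```

Adding a new layout means adding one small subclass; the schema and the site-aware logic in the base class stay untouched, which is what lets the same doctype serve several sites.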
PETE:
- Constant red light / green light monitoring (shanselmann)
- Continuous deployment of the dev site to test servers
- Build on demand to the staging / editorial working area
- Push to live from the staging build
It’ll get interesting when we see where the code goes from there... into the cloud...
ALEX: What’s massive?
CondeNet websites outside the US total around 200 million page views per month across about 20 websites. Wired.com in the US runs at about xxx million per month, and the goal was to replicate this kind of success (spread across each country) on one central platform.
So massive is about squeezing as much as possible out of the same platform while coping with peaks:
- Wired has the kind of traffic pattern that can peak very suddenly, e.g. if a story gets picked up by Engadget.
- Once Wired was spread across the globe, the time zones of peak usage hitting one central platform meant sustained 24/7 load (although only the UK and Italy so far).
- Massive is also about the amount of content: caching is OK, but if you have tons of pages, your cache turnover rate is very high, so code performance is still paramount. This matters even more because CondeNet does not have a glut of spare servers hanging around waiting to pick up the slack for our lazy coding.
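For a rough sense of scale, the monthly figures can be turned into average page-view rates, assuming a 30-day month and perfectly even traffic (which, given the peaks described above, real traffic never is):

```latex
\[
\frac{200 \times 10^{6}\ \text{PV}}{30 \times 86\,400\ \text{s}}
  = \frac{200 \times 10^{6}}{2\,592\,000}\ \text{PV/s} \approx 77\ \text{PV/s},
\qquad
\frac{100 \times 10^{6}\ \text{PV}}{2\,592\,000\ \text{s}} \approx 39\ \text{PV/s}.
\]
```

That is roughly 77 page views per second averaged across the ~20 non-US sites, and over 39 per second for the >100 million on the combined UK + Italy platform. Peaks can be many times the average, and a single page view usually involves more than one HTTP request, so these averages understate the load to plan for.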
ALEX: Scaling the code
A few years ago Alex put in place a target policy of a TTFB (time to first byte) of 200 ms for ASP.NET pages under a load of 100 requests per second, BEFORE output caching. That means:
- Code profiling: JetBrains dotTrace 3.1 is a great tool from the same people who make ReSharper. [DEMO]
- Code-level caching of common data (e.g. profiling showed us that the Umbraco Dictionary needed caching).
- The WebCacheManager framework, available on Alex’s blog. Instead of writing code that sets a cache entry’s timeout in minutes, decorate objects with attributes that describe them:
  - How much memory does it use?
  - How expensive is it to create (e.g. a long-running DB query or a web service call)?
  - How often does the data need to be refreshed?
  The WebCacheManager then judges how long to cache the object for, based on the whole landscape. It also allows very expensive objects to be serialized to disk asynchronously, so that cache items survive application restarts. Using DFS, you can then distribute cache items to other machines, which monitor the cache folder and load the items into their own cache.
- Disconnecting data connections as soon as possible.
- Avoiding Session state like the plague. Do you really need server-side generated, user-specific content on the page? E.g. you can show login status using a jQuery callback.
- Page lifecycle caching: using per-request objects like the HttpContext to ensure you only fetch data once per page lifecycle.
- Using Lucene indexes for common data queries (e.g. external image and comment databases). MORE ON THIS LATER
- Always keep in mind that your code may run on more than one web server: be careful with file locks and replication.
- Even using String.Concat instead of String.Format!
ONLY when you’re happy with all this, THEN put in OutputCaching. But OutputCaching doesn’t work with Umbraco. Why?
There is a small bug in the requestModule: it sets the URL-rewriting path (Default.aspx) just before the framework stores the path for OutputCaching. We subclassed the requestModule to change the event at which Umbraco does this, which enables OutputCaching. This change will be in 4.1, but is available on our blog.
On a large site OutputCaching can have a high turnover, but it lets you prevent high CPU during a peak, and it covers the parts Macro caching doesn’t reach.
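Why output caching flattens a CPU peak, and why a large site sees high turnover, can be shown with a minimal generic sketch. This is not ASP.NET’s OutputCache and not the Umbraco request-module fix; it is a hypothetical Python cache of rendered pages keyed by the requested URL, with an assumed fixed duration and an assumed entry limit evicting the least recently used page.

```python
# Hypothetical sketch of whole-page output caching (not ASP.NET's OutputCache).
import time
from collections import OrderedDict
from typing import Callable, Tuple


class OutputCache:
    def __init__(self, duration: float, max_entries: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.duration = duration        # seconds a rendered page stays valid
        self.max_entries = max_entries  # memory bound: forces turnover on big sites
        self._clock = clock
        self._pages: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def serve(self, url: str, render: Callable[[str], str]) -> str:
        now = self._clock()
        hit = self._pages.get(url)
        if hit is not None and hit[0] > now:
            self._pages.move_to_end(url)  # mark as recently used
            return hit[1]                 # no rendering work at all on a hit
        html = render(url)                # expensive: templates, macros, DB calls
        self._pages[url] = (now + self.duration, html)
        self._pages.move_to_end(url)
        if len(self._pages) > self.max_entries:
            self._pages.popitem(last=False)  # evict the least recently used page
        return html
```

A story that suddenly gets picked up elsewhere costs one render per cache window however many requests arrive; but with many distinct URLs competing for a bounded cache, pages are evicted often, which is why the code underneath still has to be fast.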
ALEX - ref
ALEX - ref
ALEX - ref
PETE: When you’re too large to consider shared hosting, but not big enough (or crazy enough) to manage 12 data centres around the world, you have a few options: managed or co-located hosting.
Versus the cloud-provider model: pick and choose services. GREAT services, and the prices are comparatively amazing.
Put as much as possible in the hands of the experts, and use disposable instances for the rest.
ELB, S3 / Azure Blob, CF, EC2, Azure... Azure is even better: upload your app and metadata about its spec needs, and the cloud will handle the rest.
Redundancy everywhere! Backup everywhere!
And our personal tip: run Umbraco sites from a distributed repository, not a NAS... Git... Rollback!
Tools for editors to do stuff: do it in Umbraco!
PETE - ref
PETE - ref
Cloud providers operate through APIs. That means communities have already developed great tools to help you manage servers and view uptime and statistics... but to be honest, what we’re doing will be a lot easier when the full Azure platform comes out. You still have to RDP in and manage servers at Amazon... so we’ll be running tests with Umbraco.
PETE: Applying it
I’ve spent a lot of time looking at ways to keep your web farm in sync: Sync Framework providers, msdeploy, Alex’s patch... Azure will handle all this for us...