2. • How to build a bigger, faster, and more
reliable website
• You will learn the concepts of Speed and
Scalability
• Specific Examples of Caching, Load
Balancing and testing tools.
Introduction
Friday, November 12, 2010
3. • What is Scalability?
• Avoiding Failure
• High Availability?!?!?
• Monitoring
• Release Cycles
• Fault Tolerance
• Load Balancing
• Static Content
• Caching
• YSlow & PageSpeed
Agenda
4. • Horizontal Scalability
• Capacity can be
increased just by
adding more
hardware/software
• Best solution
• Does not
guarantee that you
are safe
• Up (Vertical) Scalability
• Capacity can be
increased by adding
more Disk Storage,
RAM, Processors
• Expensive
• Should only be
used if Horizontal
will not work for
you
What is Scalability?
5. • Capital investment will be made
• The system will be more complex
• Maintenance costs will increase
• Time will be required to act
Scalability
Considerations
6. • Good Planning
• Have a plan for whatever you are about to do to
your system and, most importantly, have a
rollback plan for when things do not work the
way you expected.
Avoiding Failure
7. • Functional and Unit Testing
• Automated tests do not catch everything that
can go wrong, but they are very good at catching
bugs introduced by changes elsewhere in your
code base
• Unit Testing (PHPUnit, SimpleTest)
• Functional Testing (Selenium)
Avoiding Failure
8. • Control Change (Version Control)
• USE IT! Even as a single developer, there is no
better way to keep your codebase safe from bad
changes
Avoiding Failure
9. • /trunk/
• Used for all mainline development
• /branches/
• Used to do development that needs to be
separate from the trunk code
• /tags/
• Holds copies of production ready code
• Do not use Version Control as a backup solution,
back up your VCS separately
Version Control in Action
10. • What is “five nines” (99.999%)?
• Do the math: 60 seconds * 60 minutes *
24 hours * 365 days
• 31,536,000 seconds in a year
• (1 - 0.99999) * 31,536,000 = 315.36 seconds
(about 5 minutes) of allowed downtime a year
High Availability?!?!?!
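The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `allowed_downtime_seconds` is made up for illustration, and it assumes a 365-day year as the slide does.

```python
# Seconds in a (non-leap) year: 60 s * 60 min * 24 h * 365 days.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Seconds of downtime per year permitted by an availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common targets; "five nines" is the last one.
for target in (99.0, 99.9, 99.99, 99.999):
    seconds = allowed_downtime_seconds(target)
    print(f"{target}% -> {seconds:,.2f} seconds of downtime per year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why five nines leaves only about five minutes a year.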
11. • Understand the goodness of “Planned
maintenance periods”
• There are things you will need to do to
your systems on a periodic basis, e.g.
Database Cleanup, Disk Defrag, Software/
Hardware Upgrades
High Availability?!?!?!
12. • If you have enough servers, you can stagger
your maintenance periods so that customers
see no downtime, just a reduction in
capacity
High Availability?!?!?!
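One way to picture staggered maintenance is to split the server pool into small batches and take only one batch offline at a time. The sketch below assumes hypothetical host names and a hypothetical `max_offline` limit; neither comes from the slides.

```python
def maintenance_batches(servers, max_offline):
    """Split servers into groups so at most max_offline are down at once."""
    if max_offline < 1:
        raise ValueError("max_offline must be at least 1")
    return [servers[i:i + max_offline]
            for i in range(0, len(servers), max_offline)]


# Hypothetical five-server web tier, at most two servers down at a time.
servers = ["web1", "web2", "web3", "web4", "web5"]
for batch in maintenance_batches(servers, max_offline=2):
    still_serving = len(servers) - len(batch)
    print(f"maintain {batch}; {still_serving} of {len(servers)} still serving")
```

Customers keep being served throughout; the cost is reduced capacity while each batch is out.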
13. • No matter how stable your code is or how
reliable your hardware, you will have failures
Monitoring
14. • Top Down (Business Monitors)
• Monitor the application as the customer
interacts with it
Monitoring Methods
15. Monitoring Methods
• Bottom Up (System Monitors)
• Most commonly used
• Monitors the base components of your application like
• Disk Space
• Network speed
• Database Statistics
• By no means bad, but without Business Monitoring you will not be able
to catch all failures
16. • SNMP Support
• Can support most systems out there
• Extensibility
• Ability to plug in custom monitoring packages
• Flexible notifications
• Handle notifying operators and escalating issues if they are not looked
into
• Custom reaction
• In the event of errors that cannot be diagnosed by computers, need to
be able to notify a human to do further investigation
Criteria For A Monitoring System
17. • Complex scheduling
• Ability to set the monitoring frequency and timing per monitoring item
• Maintenance scheduling
• Monitors should never be taken offline; they need to be smart enough to
know when a maintenance period is in effect
• Event acknowledgment
• Ability to understand when an event needs to be paged to a human at
2am, and when it shouldn't
• Service dependencies
• You need to monitor all points between your monitoring system and the
client. This includes Firewalls, Routers, Switches
Criteria For A Monitoring System
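The maintenance-scheduling and paging criteria can be sketched as a small decision function. Everything here is illustrative: `MaintenanceWindow`, `decide_action`, and the severity labels are assumed names, not part of any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MaintenanceWindow:
    start: datetime
    end: datetime

    def covers(self, moment: datetime) -> bool:
        return self.start <= moment < self.end


def decide_action(failed, severity, now, windows):
    """Return 'ok', 'suppress', 'page', or 'ticket' for one check result."""
    if not failed:
        return "ok"
    if any(window.covers(now) for window in windows):
        return "suppress"  # the monitor keeps running but stays quiet
    if severity == "critical":
        return "page"      # worth waking a human, even at 2am
    return "ticket"        # can wait until someone is at their desk


# Hypothetical planned maintenance from 01:00 to 03:00.
windows = [MaintenanceWindow(datetime(2010, 11, 13, 1), datetime(2010, 11, 13, 3))]
print(decide_action(True, "critical", datetime(2010, 11, 13, 2), windows))
print(decide_action(True, "critical", datetime(2010, 11, 13, 4), windows))
```

The monitor itself is never switched off; it only changes how it reacts while a window is in effect.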
18. • Basic Release Cycle
• Development
• Things are expected to break
• Staging
• QA and bug fixing a build before release
• Production
• Only serious bug fixes are pushed
Release Cycles
19. • Keep in mind that reality has priority over “Best
Practice”
• You can and will have to release from
development… it happens
Release Cycles
21. • Load Balancing is NOT HA
• Balancing is meant to spread the workload
of requests across the cluster
Load Balancing
22. • Round robin
• One request per server in a uniform
rotation
Balancing Approaches
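A minimal round-robin sketch, assuming a fixed list of hypothetical server names (real balancers also handle servers joining and leaving):

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers one at a time in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = cycle(list(servers))

    def pick(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
print([balancer.pick() for _ in range(5)])
# prints ['web1', 'web2', 'web3', 'web1', 'web2']
```

Every server receives the same number of requests regardless of how busy it is, which is the main weakness of this approach.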
23. • Least connections
• Each request goes to the server with the
fewest active connections
• The faster a machine processes
requests, the more it will receive
Balancing Approaches
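Least connections can be sketched by counting open requests per server. The class and method names below are illustrative assumptions, and ties are broken by list order for simplicity.

```python
class LeastConnectionsBalancer:
    """Sends each new request to the server with the fewest open requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request finishes.
        self.active[server] -= 1


balancer = LeastConnectionsBalancer(["fast", "slow"])
first = balancer.acquire()    # both idle, so "fast" (listed first)
second = balancer.acquire()   # "slow" now has fewer open requests
balancer.release(first)       # the fast server finishes early
third = balancer.acquire()    # so it receives the next request too
print(first, second, third)
```

Because a quick server releases its connections sooner, it naturally ends up with a larger share of the traffic.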
24. • Predictive
• Usually based on Round robin or Least
connections with some custom code
Balancing Approaches
25. • Available resources
• Not a good choice, bad performance
Balancing Approaches
26. • Random
• Pure random distribution of requests
• Weighted random
• Random with a preference to specific
machines
Balancing Approaches
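Both random strategies fit in a few lines using Python's standard `random` module. The server names and weights are hypothetical; a weight of 3 simply means "three times as likely as a weight of 1".

```python
import random


def pick_random(servers, rng=random):
    """Pure random: every server is equally likely."""
    return rng.choice(servers)


def pick_weighted(weights, rng=random):
    """Weighted random: servers with larger weights are chosen more often."""
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


# Hypothetical pool where "big" should take about three quarters of requests.
weights = {"big": 3, "small": 1}
sample = [pick_weighted(weights) for _ in range(1000)]
print(sample.count("big"), sample.count("small"))
```

The exact counts vary from run to run; only the long-run ratio follows the weights.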
27. • Static content examples
• Images
• CSS
• JS
• Any non dynamic element
Static Content
28. • Serving these items from a dedicated
server frees up your web processes for actual
dynamic code, in turn increasing your
capacity and response speed
• On your static server you can use
lighttpd, which is very quick at serving
static content compared to Apache
(although Apache 2.2.x is much better than
1.3.x)
Static Content
29. • Layered / Transport Cache
• “Transparent”
• Placed in front of your hardware, answering
cached requests before they hit your
web server
Types of Caching
30. • Integrated (Look-Aside) Cache
• Computational Reuse technique
• Used where the cost of storing the
results of a computation and later
finding them again is less expensive
than performing the computation again
Types of Caching
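A look-aside cache can be sketched as "check the cache, otherwise compute and store". The class below is an in-process illustration only; the `ttl_seconds` expiry is an added assumption, not something the slide specifies.

```python
import time


class LookAsideCache:
    """The application asks the cache first and computes only on a miss."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                     # hit: reuse the stored result
        value = compute()                       # miss: do the expensive work
        self._entries[key] = (now + self._ttl, value)
        return value


def expensive_report():
    # Stand-in for a slow query or calculation.
    return sum(i * i for i in range(100000))


cache = LookAsideCache(ttl_seconds=30)
print(cache.get_or_compute("report", expensive_report))  # computed
print(cache.get_or_compute("report", expensive_report))  # served from cache
```

This only pays off when looking a result up is cheaper than recomputing it, which is the condition the slide describes.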
31. • Write-Through Caches
• Application is responsible for updating
the Cache and Datastore when changes
are made
• Write-Back Caches
• All data changes are made to the cache
• Cache layer is responsible for modifying
the backend datastore
Types of Caching
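The difference between the two write strategies shows up in who touches the datastore and when. This sketch uses plain dictionaries as a stand-in for both the cache and the backend datastore; the class names are illustrative.

```python
class WriteThroughStore:
    """The application updates the datastore and the cache on every write."""

    def __init__(self, datastore):
        self.cache = {}
        self.datastore = datastore

    def write(self, key, value):
        self.datastore[key] = value
        self.cache[key] = value


class WriteBackCache:
    """Writes land in the cache; the cache layer updates the datastore later."""

    def __init__(self, datastore):
        self.cache = {}
        self.datastore = datastore
        self._dirty = set()  # keys changed since the last flush

    def write(self, key, value):
        self.cache[key] = value
        self._dirty.add(key)

    def flush(self):
        for key in self._dirty:
            self.datastore[key] = self.cache[key]
        self._dirty.clear()


backend = {}
write_back = WriteBackCache(backend)
write_back.write("user:1", "alice")
print(backend)        # still empty: the datastore has not been touched yet
write_back.flush()
print(backend)        # now holds the change
```

Write-back makes writes fast but risks losing unflushed changes if the cache dies, while write-through keeps the datastore current at the cost of slower writes.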
32. • Distributed Cache
• Using several machines to cache data,
distributing the data and load
• Memcached can do this very simply
Types of Caching
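Distributing a cache usually means the client decides which machine owns each key. The sketch below uses simple hash-modulo placement with hypothetical server addresses; it illustrates the idea and is not a description of how any specific memcached client is implemented.

```python
import hashlib


def server_for_key(key, servers):
    """Map a cache key to one server so every client agrees on its location."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Hypothetical cache pool.
servers = ["10.0.0.40:11211", "10.0.0.41:11211", "10.0.0.42:11211"]
for key in ("user:1", "user:2", "session:abc"):
    print(key, "->", server_for_key(key, servers))
```

With plain modulo placement, adding or removing a server moves most keys to a different machine; consistent hashing is the usual way to limit that reshuffling.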
33. • It is a high-performance, distributed object
caching system
Memcached
34. • It is simple to set up and use
• # ./memcached -d -m 2048 -l 10.0.0.40 -p 11211
Memcached
35. • It is not designed to be redundant
• If you lose cached data, the cache is
repopulated as the data is accessed
Memcached
36. • It provides no security to your cache
• “Memcached is the soft, doughy underbelly of
your application. Part of what makes the clients
and server lightweight is the complete lack of
authentication. New connections are fast, and
server configuration is nonexistent. If you wish
to restrict access, you may use a firewall, or have
memcached listen via unix domain sockets.”
Memcached
37. • Alternative PHP Cache
• The Alternative PHP Cache (APC) is a free and open
opcode cache for PHP. It was conceived of to provide a
free, open, and robust framework for caching and
optimizing PHP intermediate code.
• Just enabling APC will transparently cache your code as
you use it, no code changes required on your side
• Provides a cheap caching layer that can be shared
between all Apache processes on one machine
APC and why it’s your friend
38. • Based on 13 principles from
http://developer.yahoo.com/performance/rules.html
• 1.) Make fewer HTTP requests
• 80% of the end-user response time is
spent on the front-end. Most of this
YSlow
39. • Based on 13 principles from
• http://developer.yahoo.com/performance/rules.html
YSlow
40. • Make fewer HTTP requests
• Use a CDN
• Add an Expires header
• Gzip components
YSlow
41. • Put CSS at the top
• Put JS at the bottom
• Avoid CSS expressions
• Make JS and CSS External
YSlow
42. • Reduce DNS lookups
• Minify JS
• Avoid redirects
• Remove duplicate scripts
• Configure ETags
• Make AJAX cacheable
YSlow
43. • http://code.google.com/speed/page-speed
• The Page Speed family consists of several
products. Web developers can use the Page Speed
extension for Firefox/Firebug to analyze
performance issues while developing web pages.
Apache web hosters can use mod_pagespeed, a
module for the Apache™ HTTP Server that
automatically optimizes web pages and their
resources at serving time.
PageSpeed
44. • Adds client-side latency instrumentation.
• Improves cacheability.
• Removes unnecessary whitespace in HTML.
• Combines multiple <head> elements & CSS files into one.
• Moves CSS into the <head> element.
• Removes unnecessary attributes in HTML tags.
• Inlines small external CSS & JavaScript files.
mod_pagespeed
45. • Moves large inline <style> & <script> tags into external files for cacheability.
• Removes unnecessary quotes in HTML tags.
• Removes HTML comments.
• Minifies CSS.
• Rescales, and compresses images; inlines small ones.
• Minifies JavaScript.
mod_pagespeed
46. # enable expirations
ExpiresActive On
# expire images, CSS, and JavaScript 30 days after access
# (A = client access time, 2592000 = 30 * 24 * 60 * 60 seconds)
ExpiresByType image/gif A2592000
ExpiresByType image/jpeg A2592000
ExpiresByType text/css A2592000
ExpiresByType application/x-javascript A2592000
# disable ETags
FileETag None
Example Apache 2.x performance config
47. # Gzip Compression
# Insert filter
SetOutputFilter DEFLATE
# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png|mp3)$ no-gzip dont-vary
# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
Example Apache 2.x
performance config