7 Stages of Scaling Web Applications

Slides from LinuxWorld presentation by John Engates, CTO of Rackspace. Posted by permission.

  1. The 7 Stages of Scaling Web Apps: Strategies for Architects. John Engates, CTO, Rackspace. Presented: LinuxWorld Conference & Expo, San Francisco, August 6, 2008
  2. Agenda <ul><li>Desirable Properties in a Web App </li></ul><ul><li>Typical Growth Scenario </li></ul><ul><li>Best practices </li></ul><ul><li>Q & A </li></ul>
  3. Desirable Properties of a Web App <ul><li>Scalability </li></ul><ul><li>High Availability </li></ul><ul><li>Performance </li></ul><ul><li>Manageability </li></ul><ul><li>Low Cost </li></ul><ul><li>Feature Rich </li></ul><ul><li>Generates $$$ </li></ul>
  4. High Availability Defined <ul><li>High Availability (HA) is a design and implementation that ensures a certain degree of operational continuity. </li></ul><ul><li>In other words… </li></ul><ul><ul><li>The site is up </li></ul></ul><ul><ul><li>The users are happy </li></ul></ul><ul><ul><li>The business is not losing money due to outages </li></ul></ul><ul><ul><li>(And the system doesn’t cost more than it’s worth) </li></ul></ul>
  5. Scalability Defined <ul><li>What scalability is: </li></ul><ul><ul><li>Scalability is a desirable property of a system which indicates its ability to either handle growing amounts of work in a graceful manner, or to be readily enlarged as demands increase. </li></ul></ul><ul><li>What scalability is not: </li></ul><ul><ul><li>Raw speed or performance (2 GHz vs. 3 GHz) </li></ul></ul><ul><ul><li>About the operating system (Solaris vs. Linux) </li></ul></ul><ul><ul><li>About a particular software technology (Java vs. Python vs. Rails) </li></ul></ul><ul><ul><li>About a particular hardware platform (AMD vs. Intel) </li></ul></ul><ul><ul><li>About optimized code (10 lines of code vs. 1000) </li></ul></ul><ul><ul><li>About storage technology (SAN vs. NAS) </li></ul></ul>
  6. PERFORMANCE AND SCALABILITY ARE NOT THE SAME…
  7. Performance
  8. Scalability
  9.  
  10. Performance
  11. Scalability
  12. More Scalability
  13. Truth #1 <ul><li>It won’t scale if it’s not designed to scale. </li></ul>
  14. Truth #2 <ul><li>Even if it’s designed to scale, there’s going to be pain! </li></ul>
  15. Pain Scale Back
  16. Typical Growth Scenario <ul><li>Stage 1 – The Beginning </li></ul><ul><li>Simple architecture </li></ul><ul><ul><li>Firewall and load balancer </li></ul></ul><ul><ul><li>A pair of web servers </li></ul></ul><ul><ul><li>Database server </li></ul></ul><ul><ul><li>Internal storage </li></ul></ul><ul><li>Low complexity and overhead means quick development and lots of features, fast </li></ul><ul><li>No redundancy, low operational cost – great for startups </li></ul>
  17. Typical Growth Scenario <ul><li>Stage 2 – More of the same, just bigger </li></ul><ul><li>Business is becoming successful – risk tolerance low </li></ul><ul><li>Add redundant firewalls, load balancers </li></ul><ul><li>Add more web servers for performance </li></ul><ul><li>Scale up the database and optimize with DBA help </li></ul><ul><li>Add database redundancy </li></ul><ul><li>Database storage moves to SAN or DAS </li></ul><ul><li>Still relatively simple from an application perspective </li></ul>
  18. Typical Growth Scenario <ul><li>Stage 3 – The Pain Begins </li></ul><ul><li>Publicity hits (Digg, Slashdot) </li></ul><ul><li>Squid or Varnish reverse proxy, or high-end load balancers – to cache static content </li></ul><ul><li>Add even more web servers </li></ul><ul><ul><li>Managing content becomes painful </li></ul></ul><ul><li>Single database can’t cut it anymore </li></ul><ul><ul><li>Split reads and writes - all writes go to a single master server with read-only slaves </li></ul></ul><ul><li>May require some re-coding of the app </li></ul>
  19. Scaling Through Database Replication
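
The read/write split from Stage 3 can be sketched roughly as below. This is a minimal illustration, not the setup shown on the slide: the `ReadWriteRouter` class, the server names, and the verb list are all assumptions, and a real application would hand back database connections rather than strings.

```python
import itertools


class ReadWriteRouter:
    """Send writes to one master, spread reads across replicas.

    Hypothetical helper for illustration; server handles are plain
    strings here so the sketch runs without a database.
    """

    # Statements treated as writes (an assumed, incomplete list).
    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE",
                   "CREATE", "ALTER", "DROP")

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._replicas = itertools.cycle(replicas or [master])

    def pick(self, sql):
        """Return the server that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        # Round-robin over the read-only replicas.
        return next(self._replicas)


router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
```

Because replicas lag behind the master, a read issued right after a write may return stale data; that is one reason the slide warns that some re-coding of the app may be required.
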
  20. Typical Growth Scenario <ul><li>Stage 4 – The Pain Intensifies </li></ul><ul><li>Caching with memcached </li></ul><ul><li>Replication doesn’t work for everything </li></ul><ul><ul><li>Single “writes” database: too many writes, replication takes too long </li></ul></ul><ul><li>Database partitioning starts to make sense </li></ul><ul><ul><li>Certain features get their own database </li></ul></ul><ul><li>Shared storage makes sense for content </li></ul><ul><li>Requires significant re-architecting of the app and DB </li></ul><ul><ul><li>Devs may not have done this stuff before </li></ul></ul>
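
One common way to apply the Stage 4 caching idea is the cache-aside pattern, sketched below. To keep the example runnable without a server, `TTLCache` is an in-process stand-in for a memcached client, not memcached itself; `cached_fetch`, the key format, and the TTL values are assumptions made for illustration.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached client (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is past its TTL: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def cached_fetch(cache, key, load, ttl=60):
    """Cache-aside read: try the cache, fall back to load(), then store."""
    value = cache.get(key)
    if value is None:
        value = load()  # e.g. the expensive database query
        cache.set(key, value, ttl)
    return value
```

On a write, the application would typically call `cache.delete(key)` so the next read reloads fresh data from the database.
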
  21. Typical Growth Scenario <ul><li>Stage 5 – This Really Hurts! </li></ul><ul><li>Panic sets in. Hasn’t anyone done this before? </li></ul><ul><ul><li>Re-thinking entire application / business model </li></ul></ul><ul><ul><li>Why didn’t we architect this thing for scale? </li></ul></ul><ul><li>Can’t just partition on features – what else can we use? </li></ul><ul><ul><li>Partitioning based on geography, last name, user ID, etc </li></ul></ul><ul><ul><li>Create user-clusters </li></ul></ul><ul><li>All features available on each user-cluster </li></ul><ul><li>Use a hashing scheme or master DB for locating which user belongs to which cluster </li></ul>
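
The Stage 5 idea of locating a user's cluster with a hashing scheme or a master DB could look something like the sketch below. The cluster names, the `cluster_for` function, and the `overrides` mapping (standing in for a master lookup table) are all hypothetical.

```python
import hashlib

# Hypothetical user-clusters; each one hosts every feature for its users.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]


def cluster_for(user_id, clusters=CLUSTERS, overrides=None):
    """Map a user ID to a cluster.

    `overrides` plays the role of a master lookup table: users listed
    there are pinned to a cluster regardless of the hash.
    """
    if overrides and user_id in overrides:
        return overrides[user_id]
    # Use a stable hash; Python's built-in hash() of strings varies per process.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]
```

Plain modulo hashing reassigns most users whenever the number of clusters changes, so the lookup-table variant (or a consistent-hashing scheme) is usually easier to live with once clusters are added over time.
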
  22. Typical Growth Scenario <ul><li>Stage 6 – Getting (a little) less painful </li></ul><ul><li>Scalable application and database architecture </li></ul><ul><li>Acceptable performance </li></ul><ul><li>Starting to add new features again </li></ul><ul><li>Optimizing some of the code </li></ul><ul><li>Still growing, but it’s manageable </li></ul>
  23. Typical Growth Scenario <ul><li>Stage 7 – Entering the unknown… </li></ul><ul><li>Where are the remaining bottlenecks? </li></ul><ul><ul><li>Power, Space </li></ul></ul><ul><ul><li>Bandwidth, CDN, Hosting provider big enough? </li></ul></ul><ul><ul><li>Firewall, Load balancer bottlenecks </li></ul></ul><ul><ul><li>Storage </li></ul></ul><ul><ul><li>People and process </li></ul></ul><ul><ul><li>Database technology limits – scalable, key-value store anyone? </li></ul></ul><ul><li>All eggs in one basket? </li></ul><ul><ul><li>Single datacenter </li></ul></ul><ul><ul><li>Single instance of the data </li></ul></ul><ul><ul><li>Difficult to replicate data and load balance geographically </li></ul></ul>
  24. Good Practices <ul><li>Don’t re-invent the wheel, copy someone else </li></ul><ul><li>Think Simplicity </li></ul><ul><ul><li>“Everything should be made as simple as possible, but not simpler.” (A. Einstein) </li></ul></ul><ul><li>Think horizontal…not vertical…on everything </li></ul><ul><ul><li>“How many?” vs. “how fast?” </li></ul></ul><ul><li>Use commodity equipment </li></ul><ul><li>Make troubleshooting easy </li></ul><ul><ul><li>Design for operation </li></ul></ul><ul><ul><li>Isolate services </li></ul></ul><ul><ul><li>Don’t change lots of things at once </li></ul></ul>
  25. More good practices… <ul><li>Don’t spend your time over-optimizing </li></ul><ul><ul><li>Get your architecture right, adjust often, optimize later (or never) </li></ul></ul><ul><li>Test your ability to scale with appropriate load testing </li></ul><ul><ul><li>Get a baseline before you think you need it </li></ul></ul><ul><li>Use caching wherever it makes sense </li></ul><ul><li>Lots of memory and a 64-bit architecture help </li></ul><ul><li>Evaluate every feature vs. performance/scalability impact </li></ul><ul><ul><li>Nice to have vs. have to have </li></ul></ul>
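
Getting a baseline, as suggested above, can start very small. The sketch below times a single callable serially and summarizes the latencies; the `baseline` function and its result keys are assumptions, and it is no substitute for a real load-testing tool that drives many concurrent clients.

```python
import statistics
import time


def baseline(request, runs=50):
    """Call request() `runs` times (runs >= 1) and summarize latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        request()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "runs": runs,
        "median_ms": statistics.median(samples),
        # Rough 95th percentile: index into the sorted samples.
        "p95_ms": samples[min(runs - 1, int(runs * 0.95))],
        "max_ms": samples[-1],
    }
```

Recording these numbers before traffic grows gives a reference point for judging later architecture changes.
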
  26. Managing Change Protects Availability <ul><li>Don’t underestimate the need for process and documentation </li></ul><ul><li>Release Management </li></ul><ul><ul><li>Develop – Test – Release </li></ul></ul><ul><ul><li>Procedures in place to support these activities </li></ul></ul><ul><li>Source Control </li></ul><ul><ul><li>RCS, CVS, Subversion </li></ul></ul><ul><li>Issue Tracking </li></ul><ul><li>Coding Standards </li></ul><ul><li>Change Management </li></ul><ul><ul><li>Plan – Test – Implement </li></ul></ul><ul><ul><li>Critical for high availability infrastructure </li></ul></ul>
  27. <ul><li>Cloud Computing … </li></ul><ul><li>The Future? </li></ul>
  28. Questions? <ul><li>jengates “at” rackspace.com </li></ul>
  29. http://racklabs.com
  30. Help Wanted!