The Web Scale
Tuenti architecture to withstand
1500+ million pageviews / day
                           Guillermo Pérez - bisho@tuenti.com
                    Security & Backend Architecture Tech Lead
What is a scalable
    system?
What is scalability
Some Tuenti stats
Tuenti Stats


        13M users
     REALLY ACTIVE
   50%+ active weekly
  >1h browsing per DAY!
Tuenti Stats

 - Each month, over:
    40,000 M pageviews
    50,000 M requests
    100 M new photos
    2,000+ TB of photos served
 - On peaks:
    1,600 million pageviews/day
    35,000 requests/second
    6,000 million served photos/day
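A rough back-of-the-envelope check of these figures, assuming a 30-day month (the slide does not define "month") and treating the "over" values as lower bounds:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: not stated on the slide

# Figures taken from the slide (monthly values are "over", so lower bounds).
monthly_pageviews = 40_000e6
monthly_requests = 50_000e6
peak_pageviews_per_day = 1_600e6
peak_requests_per_second = 35_000


def average_per_day(monthly_total: float) -> float:
    """Average daily volume implied by a monthly total."""
    return monthly_total / DAYS_PER_MONTH


def average_per_second(monthly_total: float) -> float:
    """Average per-second rate implied by a monthly total."""
    return average_per_day(monthly_total) / SECONDS_PER_DAY


avg_pageviews_per_day = average_per_day(monthly_pageviews)
avg_requests_per_second = average_per_second(monthly_requests)

# How far the peaks sit above the monthly averages.
pageview_peak_ratio = peak_pageviews_per_day / avg_pageviews_per_day
request_peak_ratio = peak_requests_per_second / avg_requests_per_second
```

Under that assumption the average is about 1,333 million pageviews/day and about 19,300 requests/second, so the quoted peaks are roughly 1.2x and 1.8x the averages.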
Tuenti Stats

 - 1200+ servers
    ~500 FEs
    ~300 DBs
    ~100 MCs
    ~100 image servers
    Others: Chat, HBase, Queues, Processors...
How to scale?
No silver bullet
Monitor
Know your tools
Evolve, iterate
    Learn
Monitoring

 - Your crystal ball!
    Glimpse of the future
    Answer questions
 - Detect bottlenecks
 - Detect what needs to be optimized
    The 90/10 Rule
    No premature optimization
 - Detect bad usages
 - Detect browser patterns
 - Detect changes, issues
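One way to read the 90/10 rule in practice is to rank operations by total measured time and look only at the few that account for most of it. The sketch below is illustrative: the operation names and the sample format are assumptions, not Tuenti's monitoring data.

```python
from collections import defaultdict


def hot_spots(samples, share=0.9):
    """Return the smallest set of operations covering `share` of total time.

    `samples` is an iterable of (operation_name, elapsed_ms) pairs, as a
    monitoring system might collect them (hypothetical format).
    """
    totals = defaultdict(float)
    for name, elapsed_ms in samples:
        totals[name] += elapsed_ms

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    picked, covered = [], 0.0
    for name, elapsed_ms in ranked:
        if covered >= share * grand_total:
            break
        picked.append(name)
        covered += elapsed_ms
    return picked


samples = [
    ("db.photos", 8000), ("render", 500), ("db.photos", 1200),
    ("memcache", 200), ("auth", 100),
]
worth_optimizing = hot_spots(samples)
```

Anything outside the returned list is a candidate for "no premature optimization".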
  
Monitor
Know your tools
 Evolve, iterate
     Learn
Know your tools

 -   Stop reading blogs
 -   Read internals documentation
 -   Test software
 -   Test hardware
 -   Experiment
  
Know your tools

 - MySQL (InnoDB) IS fast
    photos table (photo_id, user_id, ...)
       Option A: PK (photo_id), KEY (user_id)
       Option B: PK (user_id, photo_id), KEY (photo_id)
       Usage: select * from photos where user_id=X
    sorting
    covering index
    Even No SQL :)
    Hardware limits, replication
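InnoDB stores rows clustered by primary key, so the second key layout on the slide, PK (user_id, photo_id) plus KEY photo_id, keeps each user's photos physically together and already sorted. The sketch below uses SQLite's `WITHOUT ROWID` tables as a stand-in, since they are also stored in primary-key order; it is not MySQL, and the `title` column and index name are made up for illustration.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory photos table clustered by (user_id, photo_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE photos (
            user_id  INTEGER NOT NULL,
            photo_id INTEGER NOT NULL,
            title    TEXT,
            PRIMARY KEY (user_id, photo_id)
        ) WITHOUT ROWID
        """
    )
    # Secondary key so lookups by photo_id alone stay indexed.
    conn.execute("CREATE INDEX photos_by_id ON photos (photo_id)")
    conn.executemany(
        "INSERT INTO photos (user_id, photo_id, title) VALUES (?, ?, ?)",
        [(2, 11, "b"), (1, 12, "c"), (1, 10, "a"), (3, 13, "d")],
    )
    return conn


def photos_of(conn: sqlite3.Connection, user_id: int):
    """The slide's main usage: all photos of one user, in photo_id order."""
    cursor = conn.execute(
        "SELECT photo_id, title FROM photos WHERE user_id = ? ORDER BY photo_id",
        (user_id,),
    )
    return cursor.fetchall()


def owner_of(conn: sqlite3.Connection, photo_id: int):
    """Lookup by photo_id alone, served by the secondary index."""
    row = conn.execute(
        "SELECT user_id FROM photos WHERE photo_id = ?", (photo_id,)
    ).fetchone()
    return row[0] if row else None
```

With this layout the common query reads one contiguous key range instead of following a secondary index to rows scattered across the table.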
Know your tools

 - Memcache
    Tons of persistent TCP connections eat your RAM
       UDP performance issues
          Single thread for UDP
          Multiport patch
       proxies
    Stresses the network to the max
       Driver issues, configuration
       Variable performance with net devices
Know your tools

 - No SQL
    Not magic!
    Good for heavy write loads
    Good for data processing
    Still needs tweaking: partitioning, schemas
Monitor
Know your tools
Evolve, iterate
    Learn
Evolve, iterate

 - All architectures scale up to a certain point
 - Then you must rethink everything
    Then, and only then!
    Remember premature optimization?
    Scale != efficient
    Future is hard to predict
  
  
Monitor
Know your tools
Evolve, iterate
    Learn
Learn

        Learn from:
        Experience
          Failure
          Others
Architecture
Architecture

 - Basic rules:
    Static: Add layers (easy caching)
    Dynamic: Move responsibility to edges
    General: Decentralize, redundancy
  
Architecture

 - Design for failure:
    Support disabling
    Nice degradation, fallbacks
    Controlled launches
 - Test with dark launches
 - Think about storage operations
 - Be able to migrate live
 - Focus on your core, use CDNs
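The "support disabling", "fallbacks" and "controlled launches" points can be sketched as a per-feature switch with a percentage rollout. This is a minimal illustration, not Tuenti's implementation: `FeatureSwitch` and `call_with_fallback` are hypothetical names.

```python
import zlib


class FeatureSwitch:
    """Kill switch plus percentage rollout for one feature (hypothetical)."""

    def __init__(self, enabled: bool = True, rollout_percent: int = 100):
        self.enabled = enabled
        self.rollout_percent = rollout_percent

    def is_on_for(self, user_id: int) -> bool:
        if not self.enabled:
            return False
        # Stable bucket per user, so the same users stay in the launch group.
        bucket = zlib.crc32(str(user_id).encode()) % 100
        return bucket < self.rollout_percent


def call_with_fallback(switch, user_id, feature, fallback):
    """Run `feature` if it is enabled for the user; degrade to `fallback`
    when the feature is disabled or fails."""
    if not switch.is_on_for(user_id):
        return fallback()
    try:
        return feature()
    except Exception:
        return fallback()
```

A dark launch could use the same switch: run the new code path for the selected users but discard its result, so only its load and errors are observed.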
Architecture

 - Move work to the browser:
   Request routing
   Templates
   Cache
   Prefetch
 - Move remaining to your FEs:
   Data relations
   Consistency
   Privacy, access check
   Live migrations
   Knowledge of the storage infrastructure
Architecture

 - All teams involved
   Frontend
      Good JS, templating, caching, prefetching
   Backend
      Data design, parallelization, optimizations
   Systems
      Iron benchmarks, tuning, networking
Dynamic site example
Scaling a website

 -   Setup: 1 server
 -   Bottleneck: CPU
  
 -   Solution: Add frontends
 -   Changes: Share sessions
Scaling a website

 -   Setup: N frontends, 1 DB
 -   Bottleneck: DB Reads
  
 -   Solution: Add DB slaves
 -   Changes: Split reads to slaves or DB proxy
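Splitting reads to slaves can be sketched as a small router that sends writes to the master and spreads reads over the slaves. The class and host names below are hypothetical, and the `need_fresh` flag is an assumption added to show how replication lag is usually handled.

```python
import random


class ReplicatedDB:
    """Chooses a host per statement: master for writes, a slave for reads."""

    def __init__(self, master, slaves, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self.rng = rng or random.Random()

    def host_for(self, sql: str, need_fresh: bool = False) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        # Slaves may lag behind the master, so reads that must see the
        # latest write are sent to the master as well.
        if is_read and not need_fresh and self.slaves:
            return self.rng.choice(self.slaves)
        return self.master
```

A DB proxy, the other option on the slide, applies the same routing outside the application.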
Scaling a website

 -   Setup: N frontends, 1 DB Master + N Slaves
 -   Bottleneck: Limited # of slaves, so DB Reads
  
 -   Solution: Chain replication / Add cache layer
 -   Changes: Big ones!
      Adding some caches in certain places is easy
      But a dynamic app ends up using Memcache as storage
      That makes your DB non-relational
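The cache layer is commonly added with a cache-aside pattern: read from the cache, fall back to the DB on a miss, and invalidate on write. The sketch below uses a plain dict in place of Memcache and hypothetical `load_from_db` / `save_to_db` callables; it assumes nothing about how Tuenti wired this.

```python
class CacheAside:
    """Cache-aside access to one table; a dict stands in for Memcache."""

    def __init__(self, cache: dict, load_from_db, save_to_db):
        self.cache = cache
        self.load_from_db = load_from_db
        self.save_to_db = save_to_db

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # hit: the DB is not touched
        value = self.load_from_db(key)      # miss: one DB read
        self.cache[key] = value
        return value

    def update(self, key, value):
        self.save_to_db(key, value)         # the DB stays the source of truth
        self.cache.pop(key, None)           # invalidate; next get reloads
```

Because lookups now go through keys rather than joins, the application starts treating the database as a key-value store, which is the "big change" the slide warns about.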
Scaling a website

 -   Setup: N FEs, 1 DB Master + N Slaves, Caches
 -   Bottleneck: DB Writes
  
 -   Solution: Split tables into DB clusters
 -   Changes: Add some DB abstraction
Scaling a website

 -   Setup: N FEs, N DB clusters, Caches
 -   Bottleneck: DB Writes on certain table
  
 -   Solution: Partition tables
 -   Changes: DB abstraction and big changes
      DB no longer relational, more key based
      Partition key limits queries
      Denormalization, duplication
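Why the partition key limits queries can be seen in a toy partitioned table. The sketch assumes a modulo scheme on `user_id`, which the slides do not specify, and uses in-memory dicts as shards.

```python
class PartitionedTable:
    """Photos table split across shards by user_id (assumed modulo scheme)."""

    def __init__(self, partitions: int):
        self.shards = [dict() for _ in range(partitions)]

    def shard_for(self, user_id: int) -> dict:
        return self.shards[user_id % len(self.shards)]

    def insert(self, user_id: int, photo_id: int, row: dict) -> None:
        self.shard_for(user_id)[(user_id, photo_id)] = row

    def photos_of(self, user_id: int):
        # Query includes the partition key: exactly one shard is touched.
        shard = self.shard_for(user_id)
        return sorted(pid for (uid, pid) in shard if uid == user_id)

    def find_photo(self, photo_id: int):
        # No partition key in the query: every shard has to be asked.
        for shard in self.shards:
            for (uid, pid), row in shard.items():
                if pid == photo_id:
                    return uid, row
        return None
```

Avoiding the fan-out in `find_photo` typically means keeping a second, denormalized mapping from photo_id to user_id, which is where the duplication comes from.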
       
Scaling a website

 -   Setup: N FEs, N partitioned DBs, Caches
 -   Bottleneck: Disk space, DB cost
  
 -   Solution: Archive tables
 -   Changes: DB abstraction + migration scripts
Scaling a website

 - Setup: N FEs, N partition+archive DBs, Cache
 - Bottleneck: Internal network traffic
  
 - Solution: 2-level caches, split services, cache affinity
 - Changes: Cache abstraction, browsers
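A two-level cache can be sketched as a short-lived local cache on each frontend sitting in front of the shared cache, so repeated reads of hot keys stop crossing the internal network. The TTL value, class name and dict-based remote cache are assumptions for illustration.

```python
import time


class TwoLevelCache:
    """Per-frontend local cache in front of a shared (remote) cache."""

    def __init__(self, remote: dict, local_ttl: float = 1.0, clock=time.monotonic):
        self.remote = remote          # stands in for the Memcache cluster
        self.local = {}               # key -> (expires_at, value)
        self.local_ttl = local_ttl
        self.clock = clock
        self.remote_reads = 0         # network round trips actually made

    def get(self, key):
        now = self.clock()
        entry = self.local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]           # served locally, no network traffic
        self.remote_reads += 1
        value = self.remote.get(key)
        if value is not None:
            self.local[key] = (now + self.local_ttl, value)
        return value
```

The trade-off is staleness: a value changed in the shared cache can still be served locally until the local TTL expires.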
Scaling a website

 - Setup: N FEs, N partition+archive DBs, multilayered Cache, services
 - Bottleneck: Datacenter
  
 - Solution:
    Split services
    Partition user data
 - Changes: Big ones!
    Greater replication lags, inconsistencies
The Tuenti Backend
    Framework
Backend Framework

 - Our mission:
    Provide an easy-to-use, productive,
    easy-to-debug, testable, fast, extensible,
    customizable, deterministic, reusable,
    instrumented (stats) framework and
    tools to ease developers' daily work and
    manage the infrastructure.
Backend Framework

 - From Request routing to Storage
 - Simple layers, clean responsibilities
 - Clean, organized codebase
 - Using:
    convention over configuration
    configuration over coding
 - Queuing system for async execution
 - Gathering stats from all levels
Backend Framework

 - Request routing:
    Multiple entry points
    Fast request parsers route to Agents
    Data centric agents
    Printers
Backend Framework

 - Domain API:
    Expose top-level business actions
    Clean, semantic API
    No state, no magic, all data in params
    Check privacy (the right place!)
     
Backend Framework

 - Domain Backend:
    Implement public/internal business actions
    Clean, semantic API
    No state, no magic, all data in params
    Coordinate transactions
    No privacy
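The split between Domain API and Domain Backend can be sketched as two thin layers: the API checks privacy and takes everything it needs as parameters, while the backend implements the action without any privacy logic. All names and the "owner or friend" rule below are hypothetical.

```python
class PermissionDenied(Exception):
    pass


class PhotoBackend:
    """Domain Backend: implements the action, performs no privacy checks."""

    def __init__(self, photos: dict):
        self.photos = photos  # photo_id -> {"owner": ..., "title": ...}

    def get_photo(self, photo_id: int) -> dict:
        return self.photos[photo_id]


class PhotoApi:
    """Domain API: the one place where privacy is enforced."""

    def __init__(self, backend: PhotoBackend, friends: dict):
        self.backend = backend
        self.friends = friends  # user_id -> set of friend user_ids

    def get_photo(self, viewer_id: int, photo_id: int) -> dict:
        # No hidden request state: the viewer is passed in explicitly.
        photo = self.backend.get_photo(photo_id)
        owner = photo["owner"]
        # Assumed rule for illustration: owner and friends may view.
        if viewer_id != owner and viewer_id not in self.friends.get(owner, set()):
            raise PermissionDenied(f"user {viewer_id} cannot view photo {photo_id}")
        return photo
```

Internal callers that already hold the right to act, such as batch jobs, can use the backend directly, while every external entry point goes through the API.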
     
Backend Framework

 - Domain Storages (ORM like)
    Configure storage access for a table
      Fields, validation, partitioning, primary
      key, caching techniques, custom queries.
    Provide access to storage via standard APIs:
      CRUD actions
      Cached Lists
      Cached Queries
      + Custom
    Data container
       
Backend Framework

 - Storage Strategies
    CRUD
    Cached Lists
    Cached Queries
    CUD Observers for custom actions
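A CUD observer can be pictured as a callback that the storage invokes after every create, update or delete. The sketch below uses one such observer to keep a cached per-user list of photo ids in sync; the classes are illustrative and not the framework's actual API.

```python
class Storage:
    """CRUD storage that notifies observers on create/update/delete."""

    def __init__(self):
        self.rows = {}
        self.observers = []

    def subscribe(self, observer) -> None:
        self.observers.append(observer)

    def _notify(self, action: str, key, row) -> None:
        for observer in self.observers:
            observer(action, key, row)

    def create(self, key, row) -> None:
        self.rows[key] = row
        self._notify("create", key, row)

    def read(self, key):
        return self.rows.get(key)

    def update(self, key, row) -> None:
        self.rows[key] = row
        self._notify("update", key, row)

    def delete(self, key) -> None:
        row = self.rows.pop(key, None)
        if row is not None:
            self._notify("delete", key, row)


class CachedList:
    """Observer that maintains user_id -> [photo_id, ...] as a cached list."""

    def __init__(self):
        self.lists = {}

    def __call__(self, action: str, key, row) -> None:
        ids = self.lists.setdefault(row["user_id"], [])
        if action == "create":
            ids.append(key)
        elif action == "delete" and key in ids:
            ids.remove(key)
```

Reads of the list never touch the rows themselves, and writers do not need to know which caches depend on them.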
        
     
Backend Framework

 - Storage Service
    Provides access to the different storage
    services:
       mysql, memcache, hbase...
    Coordinates transactions
    Abstracts the infrastructure complexities:
       partitioning, read/write, weights, hosts
    Handles transactions
     
Backend Framework

 - Storage Services (concrete ones)
    Abstracts the infrastructure complexities:
       partitioning, read/write, weights, hosts
    API close to the real one:
       Memcache: set, get, cas...
       MySQL: insert, select, update...
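The `cas` call deserves a short illustration, since it is what makes read-modify-write on a cache safe. In memcached, a read can return a token alongside the value, and `cas` stores the new value only if the token is still current. The in-memory fake below mimics that behaviour; its method signatures are simplified assumptions, not a real client library.

```python
class FakeMemcache:
    """In-memory stand-in showing set/get/gets/cas semantics."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def set(self, key, value) -> None:
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry is not None else None

    def gets(self, key):
        """Return (value, token); (None, None) when the key is missing."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, token) -> bool:
        """Store only if nobody changed the key since `token` was read."""
        entry = self._data.get(key)
        if entry is None or entry[1] != token:
            return False
        self._data[key] = (value, token + 1)
        return True


def append_item(cache: FakeMemcache, key, item) -> None:
    """Read-modify-write with retry; assumes the key already holds a list."""
    while True:
        items, token = cache.gets(key)
        if cache.cas(key, items + [item], token):
            return
```

When two frontends read the same token, only the first `cas` succeeds; the second gets `False` and retries with the fresh value instead of silently overwriting it.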
Backend Framework

 - Storage Drivers (concrete ones)
    Read config
    Manage PHP drivers
    Enhance API
Love challenges?
We are hiring!
      http://jobs.tuenti.com




      Stay tuned for our
  d...
An Tuenti Challenge 2!
     http://contest.tuenti.net
?
                                              Thanks!
                                    Guillermo Pérez - bisho@tuenti.com
                            Security & Backend Architecture Tech Lead
                                     Images Creative Commons from flickr:
heydanielle, eschipul, deanfotos66, nrbelex, mikolski, fdecomite, guldfisken
