SlideShare una empresa de Scribd logo
1 de 31
Descargar para leer sin conexión
Deploying large
       payloads at scale
        Ramon van Alteren




Friday, November 11, 2011
Hyves


           •            9,7M dutch members (16,7M population)
           •            ~7M unique visitors / month (Comscore 09/2011)
           •            ~2.3M unique visitors / day
           •            800.000 photo uploads / day
           •            7M chat messages / day




Friday, November 11, 2011
Hyves - Operational environment



            •               3500 node serverpark in 3 datacenters
            •               6Gbps daily outgoing traffic
            •               System Engineering team: 12
            •               Development team: 33




Friday, November 11, 2011
Weekend project




                            Result: 4.5 x speed/throughput increase



Friday, November 11, 2011
Weekend -> Company project




Friday, November 11, 2011
A Few Minor problems



          •             compilation took ~40-60 minutes
          •             resulting binary was 750MB
          •             Code issues
          •             gcc 4.5.2 required + added deps




Friday, November 11, 2011
Part I - Solving the build problem




                                    Jenkins to the rescue




                            Add distCC to speed up compilation times



Friday, November 11, 2011
Part I - Solving the build problem
                            Add some serious hardware




                            Compile / build in < 6 mins

Friday, November 11, 2011
Part II: Deploying

 Sequential ?
                              500MB @ 1Gb/s = 4 seconds
                             500MB @ 500Mb/s = 8 seconds
                            500MB @ 200Mb/s = 20 seconds
      450 servers * 8 seconds = 3600 seconds == 1 hour
      450 servers * 20 seconds = 9000 seconds == 2.5 hour

  Diffs ?
                      binary diff would be between 10KB - 400MB
                      Even on consecutive runs without change

Friday, November 11, 2011
Part II: Deploying - Bittorrent




Friday, November 11, 2011
Bittorrent - Previous experiences


                            Naive run using bittorrent to transport
                            300MB throughout our serverpark


                            •   Near-complete network outage due to
                                bandwidth starvation
                            •   Several crucial subsystems delayed or
                                unreachable due to network bandwidth
                                shortage



Friday, November 11, 2011
Bittorrent - Previous experiences




Friday, November 11, 2011
Bittorrent - The Problem


                            Every server has 1Gb/s link to every other
                                             server



                                        they don’t



Friday, November 11, 2011
Bittorrent - Actual bandwidth available




                            1-4Gb/s    Core
                                      Network




Friday, November 11, 2011
Bittorrent - Actual bandwidth available




                             Production
                               traffic
                                              Core
                            Administration
                                             Network
                               traffic




Friday, November 11, 2011
Murder - Why not ?

                            Murder uses two tricks:
                            •      Clients (including the seeder) capped
                                   to 1 upload peer
                            •      Every client receives every peer from
                                   the tracker
                            •      No download bandwidth cap
                                   (easy to add though)

                                Peers will still connect all over the place
                                It’s slow, timing run over 25 peers took 7 mins

Friday, November 11, 2011
DIY: Location aware tracker




Friday, November 11, 2011
SMDB - Location metadata


                            We have bandwidth information available in
                                our server management database


                        Build two-tier bittorrent swarms:
                        •       1 swarm with 2 peers / rack(uplink)
                        •       1 additional swarm per rack (uplink)
                        •       cap every client @ 96mbit/s



Friday, November 11, 2011
DIY: 2-tier swarms




Friday, November 11, 2011
DIY - Tracker


                            Tracker in python + Flask:
                            •   1100 lines of code (1900 with tests)
                            •   Stores transfer metadata in redis
                            •   Connects to our SMDB using REST
                            •   Exposes REST interface




Friday, November 11, 2011
DIY - Client

                            We use a slightly modified rtorrent client
                            Same things twitter modified:
                            •   Remove features related to operating
                                on the big bad internet.
                            •   Make various timeouts more aggresive
                            •   No DHT, UPNP etc.


                            Nice bonus: RPC remote API


Friday, November 11, 2011
Results - single deploy ~300 hosts


                            2000


                            1500


                            1000


                             500


                               0
                                   Without build      With build

                                   Classic deploy   Bittorrent deploy



Friday, November 11, 2011
Graphs - full webfarm cluster




Friday, November 11, 2011
Graphs - single node




Friday, November 11, 2011
Graphs - single rack (24 nodes)




Friday, November 11, 2011
Bitorrent - Statistics
 == release: 101482 expected: 287 actual: 107 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 ==
 == release: 101482 expected: 287 actual: 267 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 ==
 == release: 101482 expected: 287 actual: 286 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 1 progress: 0.35% start: 12:04:20 last_completed: none failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 2 progress: 42.11% start: 12:04:20 last_completed: 12:06:01 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 5 progress: 44.48% start: 12:04:20 last_completed: 12:06:05 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 34 progress: 45.80% start: 12:04:20 last_completed: 12:06:09 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 95 progress: 48.17% start: 12:04:20 last_completed: 12:06:10 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 95 progress: 48.17% start: 12:04:20 last_completed: 12:06:10 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 97 progress: 48.46% start: 12:04:20 last_completed: 12:06:15 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 101 progress: 49.23% start: 12:04:20 last_completed: 12:06:21 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 108 progress: 50.72% start: 12:04:20 last_completed: 12:06:24 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 118 progress: 52.95% start: 12:04:20 last_completed: 12:06:27 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 131 progress: 55.88% start: 12:04:20 last_completed: 12:06:30 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 172 progress: 67.77% start: 12:04:20 last_completed: 12:06:34 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 213 progress: 80.53% start: 12:04:20 last_completed: 12:06:37 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 239 progress: 87.36% start: 12:04:20 last_completed: 12:06:40 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 246 progress: 91.24% start: 12:04:20 last_completed: 12:06:43 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 265 progress: 97.55% start: 12:04:20 last_completed: 12:06:46 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 274 progress: 98.79% start: 12:04:20 last_completed: 12:06:49 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 280 progress: 99.07% start: 12:04:20 last_completed: 12:06:52 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 283 progress: 99.26% start: 12:04:20 last_completed: 12:06:55 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 284 progress: 99.36% start: 12:04:20 last_completed: 12:06:56 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 284 progress: 99.36% start: 12:04:20 last_completed: 12:06:56 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 285 progress: 99.52% start: 12:04:20 last_completed: 12:07:04 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 286 progress: 99.85% start: 12:04:20 last_completed: 12:07:06 failed: 0 ==
 == release: 101482 expected: 287 actual: 287 seeders: 287 progress: 100.00% start: 12:04:20 last_completed: 12:07:37 failed: 0 ==




Friday, November 11, 2011
Next Steps

                      Enable more projects
                      •     currently only three main projects
                      •     one codebase
                      •     Multi-transfer
                      Move to Continuous Delivery
                      •     currently doing continuous deployment
                      •     6-12 deploys a day
                      •     want to allow feature teams to deploy


Friday, November 11, 2011
Next Steps

            Open Source the tracker:
            •               Very closely tied to our infra
            •               Not overly clean code
            We will open source it,
            however some refactoring is needed
            contact me if you’re interested.
                                   Watch our github repository:
                            https://github.com/organizations/hyves-org


Friday, November 11, 2011
Next Steps - Deployment Glue

                      TODO:
                      [] prepare stuff
                      [] transport stuff
                      [] do some more stuff
                      [] Activate payload
                      [] post check




Friday, November 11, 2011
Next Steps - Deployment Glue

                      Simple from a single host perspective


                      Complex when executed in parallel,
                      remotely, with failure handling and proper
                      reporting




                                                        Fabric

Friday, November 11, 2011
Thank you, Questions ?

          Bram Cohen - http://www.bittorrent.com/
          Flask: http://flask.pocoo.org/
          Rtorrent: http://libtorrent.rakshasa.no/
          Twitter - Murder: https://github.com/lg/murder
          Boris, Cor, Lorenzo & others at hyves.nl
          Michael Tekel: https://github.com/mtekel/
          http://wiki.theory.org/BitTorrentSpecification


          You ? Weʼre hiring: http://werkenbijhyves.hyves.nl
          email: ramon@hyves.nl
          twitter: @ramonvanalteren

Friday, November 11, 2011

Más contenido relacionado

La actualidad más candente

SQLDay2013_Denny Cherry - SQLServer2012inaHighlyAvailableWorld
SQLDay2013_Denny Cherry - SQLServer2012inaHighlyAvailableWorldSQLDay2013_Denny Cherry - SQLServer2012inaHighlyAvailableWorld
SQLDay2013_Denny Cherry - SQLServer2012inaHighlyAvailableWorld
Polish SQL Server User Group
 
Dueling duplications RMAN vs Delphix
Dueling duplications RMAN vs DelphixDueling duplications RMAN vs Delphix
Dueling duplications RMAN vs Delphix
Kyle Hailey
 
Zabbix visión general del sistema - 04.12.2013
Zabbix   visión general del sistema - 04.12.2013Zabbix   visión general del sistema - 04.12.2013
Zabbix visión general del sistema - 04.12.2013
Emmanuel Arias
 
Cumulus networks - Overcoming traditional network limitations with open source
Cumulus networks - Overcoming traditional network limitations with open sourceCumulus networks - Overcoming traditional network limitations with open source
Cumulus networks - Overcoming traditional network limitations with open source
Nat Morris
 

La actualidad más candente (20)

January OpenNTF Webinar - Backup your Domino Server - New Options in V12
January OpenNTF Webinar - Backup your Domino Server - New Options in V12January OpenNTF Webinar - Backup your Domino Server - New Options in V12
January OpenNTF Webinar - Backup your Domino Server - New Options in V12
 
Deskpool making vdi cost effective for smb
Deskpool making vdi cost effective for smbDeskpool making vdi cost effective for smb
Deskpool making vdi cost effective for smb
 
Engage 2020 - Kubernetes for HCL Connections Component Pack - Build or Buy?
Engage 2020 - Kubernetes for HCL Connections Component Pack - Build or Buy?Engage 2020 - Kubernetes for HCL Connections Component Pack - Build or Buy?
Engage 2020 - Kubernetes for HCL Connections Component Pack - Build or Buy?
 
SQLDay2013_Denny Cherry - SQLServer2012inaHighlyAvailableWorld
SQLDay2013_Denny Cherry - SQLServer2012inaHighlyAvailableWorldSQLDay2013_Denny Cherry - SQLServer2012inaHighlyAvailableWorld
SQLDay2013_Denny Cherry - SQLServer2012inaHighlyAvailableWorld
 
April, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best PracticesApril, 2021 OpenNTF Webinar - Domino Administration Best Practices
April, 2021 OpenNTF Webinar - Domino Administration Best Practices
 
Introduction to Novell ZENworks Configuration Management Troubleshooting
Introduction to Novell ZENworks Configuration Management TroubleshootingIntroduction to Novell ZENworks Configuration Management Troubleshooting
Introduction to Novell ZENworks Configuration Management Troubleshooting
 
Xen Project Hypervisor for the Cloud
Xen Project Hypervisor for the CloudXen Project Hypervisor for the Cloud
Xen Project Hypervisor for the Cloud
 
Engage 2020 - HCL Notes V11 Performance Boost
Engage 2020 - HCL Notes V11 Performance BoostEngage 2020 - HCL Notes V11 Performance Boost
Engage 2020 - HCL Notes V11 Performance Boost
 
Juniper Network Automation for KrDAG
Juniper Network Automation for KrDAGJuniper Network Automation for KrDAG
Juniper Network Automation for KrDAG
 
1084: Planning and Completing an IBM Connections Upgrade
 1084: Planning and Completing an IBM Connections Upgrade 1084: Planning and Completing an IBM Connections Upgrade
1084: Planning and Completing an IBM Connections Upgrade
 
Introduction to the Cluster Infrastructure and the Systems Provisioning Engin...
Introduction to the Cluster Infrastructure and the Systems Provisioning Engin...Introduction to the Cluster Infrastructure and the Systems Provisioning Engin...
Introduction to the Cluster Infrastructure and the Systems Provisioning Engin...
 
IBM Domino / IBM Notes Performance Tuning
IBM Domino / IBM Notes Performance Tuning IBM Domino / IBM Notes Performance Tuning
IBM Domino / IBM Notes Performance Tuning
 
Unattended Deployment with Zero Touch Provisioning (ZTP)
Unattended Deployment with Zero Touch Provisioning (ZTP)Unattended Deployment with Zero Touch Provisioning (ZTP)
Unattended Deployment with Zero Touch Provisioning (ZTP)
 
A Tour of Internal Accumulo Testing
A Tour of Internal Accumulo TestingA Tour of Internal Accumulo Testing
A Tour of Internal Accumulo Testing
 
Automation Evolution with Junos
Automation Evolution with JunosAutomation Evolution with Junos
Automation Evolution with Junos
 
You don't want to do it like that
You don't want to do it like thatYou don't want to do it like that
You don't want to do it like that
 
Dueling duplications RMAN vs Delphix
Dueling duplications RMAN vs DelphixDueling duplications RMAN vs Delphix
Dueling duplications RMAN vs Delphix
 
Zabbix visión general del sistema - 04.12.2013
Zabbix   visión general del sistema - 04.12.2013Zabbix   visión general del sistema - 04.12.2013
Zabbix visión general del sistema - 04.12.2013
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERN
 
Cumulus networks - Overcoming traditional network limitations with open source
Cumulus networks - Overcoming traditional network limitations with open sourceCumulus networks - Overcoming traditional network limitations with open source
Cumulus networks - Overcoming traditional network limitations with open source
 

Similar a Deploying large payloads at scale

Ron Broersma dren-stavanger-22 nov2011
Ron Broersma dren-stavanger-22 nov2011Ron Broersma dren-stavanger-22 nov2011
Ron Broersma dren-stavanger-22 nov2011
IPv6no
 
GWAVACon 2013: Filr Pilot
GWAVACon 2013: Filr PilotGWAVACon 2013: Filr Pilot
GWAVACon 2013: Filr Pilot
GWAVA
 
Conquistando el Servidor con Node.JS
Conquistando el Servidor con Node.JSConquistando el Servidor con Node.JS
Conquistando el Servidor con Node.JS
Caridy Patino
 
Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011)
Leonardo Borges
 
Monitoring is easy, why are we so bad at it presentation
Monitoring is easy, why are we so bad at it  presentationMonitoring is easy, why are we so bad at it  presentation
Monitoring is easy, why are we so bad at it presentation
Theo Schlossnagle
 
Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...
Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...
Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...
Ontico
 

Similar a Deploying large payloads at scale (20)

Ron Broersma dren-stavanger-22 nov2011
Ron Broersma dren-stavanger-22 nov2011Ron Broersma dren-stavanger-22 nov2011
Ron Broersma dren-stavanger-22 nov2011
 
GWAVACon 2013: Filr Pilot
GWAVACon 2013: Filr PilotGWAVACon 2013: Filr Pilot
GWAVACon 2013: Filr Pilot
 
Netflix Update (MeetBSD California 2014 Lightning Talk)
Netflix Update (MeetBSD California 2014 Lightning Talk)Netflix Update (MeetBSD California 2014 Lightning Talk)
Netflix Update (MeetBSD California 2014 Lightning Talk)
 
Iwmn architecture
Iwmn architectureIwmn architecture
Iwmn architecture
 
Yeti DNS Project
Yeti DNS ProjectYeti DNS Project
Yeti DNS Project
 
Caridy patino - node-js
Caridy patino - node-jsCaridy patino - node-js
Caridy patino - node-js
 
Conquistando el Servidor con Node.JS
Conquistando el Servidor con Node.JSConquistando el Servidor con Node.JS
Conquistando el Servidor con Node.JS
 
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...
 
Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011)
 
Layer 7 denial of services attack mitigation
Layer 7 denial of services attack mitigationLayer 7 denial of services attack mitigation
Layer 7 denial of services attack mitigation
 
Ext osad initial-eval-march2015
Ext osad initial-eval-march2015Ext osad initial-eval-march2015
Ext osad initial-eval-march2015
 
DeltaV Development Systems in a Virtualized Environment
DeltaV Development Systems in a Virtualized EnvironmentDeltaV Development Systems in a Virtualized Environment
DeltaV Development Systems in a Virtualized Environment
 
Monitoring is easy, why are we so bad at it presentation
Monitoring is easy, why are we so bad at it  presentationMonitoring is easy, why are we so bad at it  presentation
Monitoring is easy, why are we so bad at it presentation
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great Success
 
Gears of Perforce: AAA Game Development Challenges
Gears of Perforce: AAA Game Development ChallengesGears of Perforce: AAA Game Development Challenges
Gears of Perforce: AAA Game Development Challenges
 
Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...
Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...
Как сделать высоконагруженный сервис, не зная количество нагрузки / Олег Обле...
 
Spotify: Playing for millions, tuning for more
Spotify: Playing for millions, tuning for moreSpotify: Playing for millions, tuning for more
Spotify: Playing for millions, tuning for more
 
Ch07
Ch07Ch07
Ch07
 
Puppet at DemonWare - Ruaidhri Power - Puppetcamp Dublin '12
Puppet at DemonWare - Ruaidhri Power - Puppetcamp Dublin '12Puppet at DemonWare - Ruaidhri Power - Puppetcamp Dublin '12
Puppet at DemonWare - Ruaidhri Power - Puppetcamp Dublin '12
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Deploying large payloads at scale

  • 1. Deploying large payloads at scale Ramon van Alteren Friday, November 11, 2011
  • 2. Hyves • 9,7M dutch members (16,7M population) • ~7M unique visitors / month (Comscore 09/2011) • ~2.3M unique visitors / day • 800.000 photo uploads / day • 7M chat messages / day Friday, November 11, 2011
  • 3. Hyves - Operational environment • 3500 node serverpark in 3 datacenters • 6Gbps daily outgoing traffic • System Engineering team: 12 • Development team: 33 Friday, November 11, 2011
  • 4. Weekend project Result: 4.5 x speed/throughput increase Friday, November 11, 2011
  • 5. Weekend -> Company project Friday, November 11, 2011
  • 6. A Few Minor problems • compilation took ~40-60 minutes • resulting binary was 750MB • Code issues • gcc 4.5.2 required + added deps Friday, November 11, 2011
  • 7. Part I - Solving the build problem Jenkins to the rescue Add distCC to speed up compilation times Friday, November 11, 2011
  • 8. Part I - Solving the build problem Add some serious hardware Compile / build in < 6 mins Friday, November 11, 2011
  • 9. Part II: Deploying Sequential ? 500MB @ 1Gb/s = 4 seconds 500MB @ 500Mb/s = 8 seconds 500MB @ 200Mb/s = 20 seconds 450 servers * 8 seconds = 3600 seconds == 1 hour 450 servers * 20 seconds = 9000 seconds == 2.5 hour Diffs ? binary diff would be between 10KB - 400MB Even on consecutive runs without change Friday, November 11, 2011
  • 10. Part II: Deploying - Bittorrent Friday, November 11, 2011
  • 11. Bittorrent - Previous experiences Naive run using bittorrent to transport 300MB throughout our serverpark • Near-complete network outage due to bandwidth starvation • Several crucial subsystems delayed or unreachable due to network bandwidth shortage Friday, November 11, 2011
  • 12. Bittorrent - Previous experiences Friday, November 11, 2011
  • 13. Bittorrent - The Problem Every server has 1Gb/s link to every other server they don’t Friday, November 11, 2011
  • 14. Bittorrent - Actual bandwidth available 1-4Gb/s Core Network Friday, November 11, 2011
  • 15. Bittorrent - Actual bandwidth available Production traffic Core Administration Network traffic Friday, November 11, 2011
  • 16. Murder - Why not ? Murder uses two tricks: • Clients (including the seeder) capped to 1 upload peer • Every client receives every peer from the tracker • No download bandwidth cap (easy to add though) Peers will still connect all over the place It’s slow, timing run over 25 peers took 7 mins Friday, November 11, 2011
  • 17. DIY: Location aware tracker Friday, November 11, 2011
  • 18. SMDB - Location metadata We have bandwidth information available in our server management database Build two-tier bittorrent swarms: • 1 swarm with 2 peers / rack(uplink) • 1 additional swarm per rack (uplink) • cap every client @ 96mbit/s Friday, November 11, 2011
  • 19. DIY: 2-tier swarms Friday, November 11, 2011
  • 20. DIY - Tracker Tracker in python + Flask: • 1100 lines of code (1900 with tests) • Stores transfer metadata in redis • Connects to our SMDB using REST • Exposes REST interface Friday, November 11, 2011
  • 21. DIY - Client We use a slightly modified rtorrent client Same things twitter modified: • Remove features related to operating on the big bad internet. • Make various timeouts more aggresive • No DHT, UPNP etc. Nice bonus: RPC remote API Friday, November 11, 2011
  • 22. Results - single deploy ~300 hosts 2000 1500 1000 500 0 Without build With build Classic deploy Bittorrent deploy Friday, November 11, 2011
  • 23. Graphs - full webfarm cluster Friday, November 11, 2011
  • 24. Graphs - single node Friday, November 11, 2011
  • 25. Graphs - single rack (24 nodes) Friday, November 11, 2011
  • 26. Bitorrent - Statistics == release: 101482 expected: 287 actual: 107 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 == == release: 101482 expected: 287 actual: 267 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 == == release: 101482 expected: 287 actual: 286 seeders: 0 progress: 0.00% start: 12:04:20 last_completed: none failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 1 progress: 0.35% start: 12:04:20 last_completed: none failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 2 progress: 42.11% start: 12:04:20 last_completed: 12:06:01 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 5 progress: 44.48% start: 12:04:20 last_completed: 12:06:05 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 34 progress: 45.80% start: 12:04:20 last_completed: 12:06:09 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 95 progress: 48.17% start: 12:04:20 last_completed: 12:06:10 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 95 progress: 48.17% start: 12:04:20 last_completed: 12:06:10 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 97 progress: 48.46% start: 12:04:20 last_completed: 12:06:15 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 101 progress: 49.23% start: 12:04:20 last_completed: 12:06:21 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 108 progress: 50.72% start: 12:04:20 last_completed: 12:06:24 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 118 progress: 52.95% start: 12:04:20 last_completed: 12:06:27 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 131 progress: 55.88% start: 12:04:20 last_completed: 12:06:30 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 172 progress: 67.77% start: 12:04:20 last_completed: 12:06:34 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 213 progress: 80.53% start: 12:04:20 last_completed: 12:06:37 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 239 progress: 87.36% start: 12:04:20 last_completed: 12:06:40 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 246 progress: 91.24% start: 12:04:20 last_completed: 12:06:43 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 265 progress: 97.55% start: 12:04:20 last_completed: 12:06:46 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 274 progress: 98.79% start: 12:04:20 last_completed: 12:06:49 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 280 progress: 99.07% start: 12:04:20 last_completed: 12:06:52 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 283 progress: 99.26% start: 12:04:20 last_completed: 12:06:55 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 284 progress: 99.36% start: 12:04:20 last_completed: 12:06:56 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 284 progress: 99.36% start: 12:04:20 last_completed: 12:06:56 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 285 progress: 99.52% start: 12:04:20 last_completed: 12:07:04 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 286 progress: 99.85% start: 12:04:20 last_completed: 12:07:06 failed: 0 == == release: 101482 expected: 287 actual: 287 seeders: 287 progress: 100.00% start: 12:04:20 last_completed: 12:07:37 failed: 0 == Friday, November 11, 2011
  • 27. Next Steps Enable more projects • currently only three main projects • one codebase • Multi-transfer Move to Continuous Delivery • currently doing continuous deployment • 6-12 deploys a day • want to allow feature teams to deploy Friday, November 11, 2011
  • 28. Next Steps Open Source the tracker: • Very closely tied to our infra • Not overly clean code We will open source it, however some refactoring is needed contact me if you’re interested. Watch our github repository: https://github.com/organizations/hyves-org Friday, November 11, 2011
  • 29. Next Steps - Deployment Glue TODO: [] prepare stuff [] transport stuff [] do some more stuff [] Activate payload [] post check Friday, November 11, 2011
  • 30. Next Steps - Deployment Glue Simple from a single host perspective Complex when executed in parallel, remotely, with failure handling and proper reporting Fabric Friday, November 11, 2011
  • 31. Thank you, Questions ? Bram Cohen - http://www.bittorrent.com/ Flask: http://flask.pocoo.org/ Rtorrent: http://libtorrent.rakshasa.no/ Twitter - Murder: https://github.com/lg/murder Boris, Cor, Lorenzo & others at hyves.nl Michael Tekel: https://github.com/mtekel/ http://wiki.theory.org/BitTorrentSpecification You ? Weʼre hiring: http://werkenbijhyves.hyves.nl email: ramon@hyves.nl twitter: @ramonvanalteren Friday, November 11, 2011