from disaster to stability
      the scaling challenges of my.opera.com
                       Surge 2010 – Version 3
[Chart, shown five times with the years 1999, 2001, 2004, 2007 and 2009 highlighted in turn: Servers and kUsers per year, x-axis 2000 to 2010; data labels 1, 10, 50, 257, 205, 430, 887, 1,640, 2,500 and 5,500]
the current beta
the situation
    2007
crashes every day

too many connections!!!

NFS volume of doom

Team?
monitoring
many improvements since then

➔   Efficient filesystem cache

➔   "Dogpile effect" AKA stampeding AKA ...

➔   Persistent db + memcached connections

➔   Soft counters

➔   Profiling, profiling, …
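
The "dogpile effect" bullet refers to many requests missing the same cache key at once and all recomputing the value together. Below is a minimal Python sketch of one common mitigation, assuming a per-key lock so that only one caller recomputes while the others wait. `StampedeSafeCache`, `get_or_compute` and the in-process dictionary standing in for memcached are illustrative names, not the My Opera code.

```python
import threading
import time


class StampedeSafeCache:
    """In-process stand-in for memcached, with one lock per key (assumption)."""

    def __init__(self):
        self._values = {}               # key -> (expires_at, value)
        self._locks = {}                # key -> threading.Lock
        self._guard = threading.Lock()  # protects the _locks dictionary

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._values.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    def get_or_compute(self, key, ttl, compute):
        entry = self._fresh(key)
        if entry is not None:
            return entry[1]
        # Cache miss: only one caller per key recomputes, the rest wait here.
        with self._lock_for(key):
            entry = self._fresh(key)    # re-check: someone may have filled it
            if entry is not None:
                return entry[1]
            value = compute()
            self._values[key] = (time.monotonic() + ttl, value)
            return value


def demo(n_threads=20):
    """Hit one cold key from many threads and count how often it is computed."""
    cache = StampedeSafeCache()
    calls = []
    results = []

    def slow_query():
        calls.append(1)
        time.sleep(0.05)                # pretend this is an expensive SQL query
        return "sidebar html"

    def worker():
        results.append(cache.get_or_compute("sidebar:42", 60, slow_query))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results
```

Calling `demo()` starts twenty threads against one cold key. With the lock in place the slow function should run once, and every thread receives the same cached value.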
code profiling
[DML] time=1237308152, user=,
url=/tinh_yeu_cua_anh_b88/blog/index.dml/tag/...,
name=XWA::User, variable=active, type=module,
elapsed=0.068473, host=my.opera.com

[DML] time=1237308152, user=, url=/community/,
name=XWA::User, variable=, type=module,
elapsed=0.015935, host=my.opera.com

[DML] ...
top time-intensive modules
XWA::User::Sidebar           2024.919s   (27.2%, 0.28 s/call)
XWA::User                    1778.445s   (23.9%, 0.09 s/call)
XWA::User::Journal           1121.224s   (15.1%, 0.24 s/call)
XWA::User::Album              321.522s   ( 4.3%, 0.17 s/call)
XWA::User::Journal::Search    223.477s   ( 3.0%, 20.32 s/call)
XWA::User::Comments           188.011s   ( 2.5%, 0.05 s/call)
XWA::Skins                    180.486s   ( 2.4%, 0.49 s/call)
XWA::User::JournalArchive     159.525s   ( 2.1%, 4.43 s/call)
XWA::User::Posts              146.644s   ( 2.0%, 0.45 s/call)
XWA::User::Picture            141.324s   ( 1.9%, 0.10 s/call)
XWA::Albums                    93.740s   ( 1.3%, 2.04 s/call)
XWA::Journals                  92.390s   ( 1.2%, 2.37 s/call)
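
A report like the one above can be produced by summing `elapsed` per `name` over the `[DML]` log records. This is a minimal Python sketch, assuming one record per line with `key=value` fields separated by a comma and a space (the slide wraps records for display). `parse_dml_line` and `top_modules` are hypothetical helper names.

```python
from collections import defaultdict

PREFIX = "[DML] "


def parse_dml_line(line):
    """Turn '[DML] k1=v1, k2=v2, ...' into a dict, or None for other lines."""
    if not line.startswith(PREFIX):
        return None
    fields = {}
    for part in line[len(PREFIX):].split(", "):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def top_modules(lines):
    """Return (name, total_seconds, percent_of_total, seconds_per_call) rows,
    sorted by total time, for records whose type is 'module'."""
    totals = defaultdict(lambda: [0.0, 0])   # name -> [total elapsed, calls]
    for line in lines:
        record = parse_dml_line(line.strip())
        if not record or record.get("type") != "module":
            continue
        try:
            elapsed = float(record["elapsed"])
        except (KeyError, ValueError):
            continue                          # skip truncated or malformed records
        entry = totals[record.get("name", "")]
        entry[0] += elapsed
        entry[1] += 1

    grand_total = sum(total for total, _ in totals.values())
    report = []
    for name, (total, calls) in totals.items():
        share = 100.0 * total / grand_total if grand_total else 0.0
        report.append((name, total, share, total / calls))
    report.sort(key=lambda row: row[1], reverse=True)
    return report


def format_report(report):
    """Render rows in roughly the layout used on the slide."""
    return "\n".join(
        f"{name:<28} {total:10.3f}s   ({share:4.1f}%, {per_call:.2f} s/call)"
        for name, total, share, per_call in report
    )
```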
many improvements since then

➔   YSlow?

➔   The Expires header is your friend!

➔   Hot MyISAM tables converted to InnoDB

➔   MySQL Master/Master setup

➔   Jet Profiler
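
The Expires bullet is about letting browsers and proxies keep static resources without asking the server again. This is a minimal Python sketch of building such headers, assuming a 30-day lifetime as the default. `static_cache_headers` is a hypothetical helper, and the lifetime is an illustrative value, not one taken from the slides.

```python
import time
from email.utils import formatdate


def static_cache_headers(max_age=30 * 24 * 3600, now=None):
    """Return far-future caching headers for a static resource.

    max_age is in seconds; now is a Unix timestamp (defaults to the current time).
    """
    if now is None:
        now = time.time()
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
        "Cache-Control": f"public, max-age={max_age}",
    }
```

A long lifetime like this only works safely when a changed resource gets a new URL, which fits the versioned static URLs introduced later in the talk.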
jet profiler
3
scalability
1. avatars
Avatars - 2007

            75%
        /<user-name>/avatar.pl

/<user-name>/avatar.pl?xscale=8192 (!)
Avatars               wtf!?


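# Connects to the master database on every avatar request
my $sql = DBConnect('master');

# Fetches the avatar image blob and filename from SQL
my %user = $sql->get(
  "SELECT a.blob, a.filename
   FROM avatars a, users u
   WHERE u.user=? AND u.id=a.user",
   $user);

# Sends the raw blob back to the client
$req->print( $user{'blob'} );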
Avatars - reloaded
 ➔   Export to balanced fs (5 formats)

 ➔   Zero SQL queries

 ➔   Storage subsystem

 ➔   static.myopera.com was born
resources
                  (user uploads, binary blobs, ...)




             Pools
      or single servers




                                 URLs
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_o.png
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_t.jpg
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_m.jpg
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_l.jpg
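
URLs of this shape can be computed without touching the database, which is what makes "zero SQL queries" possible. The slides do not say how the directory levels are derived, so the Python sketch below assumes an MD5 digest of the user id split into short prefixes. `avatar_url` is a hypothetical function, and only the four size suffixes shown above are included.

```python
import hashlib

STATIC_BASE = "http://static.myopera.com"   # host as shown on the slide

# Size suffix -> file extension, as they appear in the URLs above.
FORMATS = {"o": "png", "t": "jpg", "m": "jpg", "l": "jpg"}


def avatar_url(user_id, size, pool="pool1"):
    """Build a static avatar URL from the user id alone (no SQL lookup)."""
    if size not in FORMATS:
        raise ValueError(f"unknown avatar size: {size!r}")
    # Assumed scheme: hash the user id and use digest prefixes as directories,
    # so files spread evenly instead of piling up in one folder.
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return (
        f"{STATIC_BASE}/{pool}/avatars/"
        f"{digest[:2]}/{digest[2:5]}/{digest}/"
        f"{user_id}_{size}.{FORMATS[size]}"
    )
```

Because the path depends only on the user id, the size and the pool name, pages can embed the URL directly and the static servers never need application code.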
+                   x
➔   Load             ➔   HTTP::DAV

➔   Flexibility      ➔   Precomp URLs

➔   Static scales!
2. varnish
Varnish
Most popular RSS feeds

My Opera frontpage

Opera Mini approval

Datacenter emergencies
Varnish
Most popular RSS feeds

➔   /desktopteam/blog/

➔   Friends, Groups API

➔   No cookies (remove req.http.cookie)
Varnish
My Opera frontpage

➔   Danger, Will Robinson!

➔   Mangle cookies

➔   Accept-Language headers
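
Caching a localized front page means the cache must vary on language, and raw Accept-Language headers come in too many variants to cache well. This is a minimal Python sketch of collapsing the header to one value from a small supported set before using it in a cache key. `normalize_accept_language`, the supported language list and the default are assumptions for illustration, not the My Opera Varnish configuration.

```python
def normalize_accept_language(header, supported=("en", "no", "it"), default="en"):
    """Reduce an Accept-Language header to a single supported language code."""
    candidates = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0                      # a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0              # treat a malformed q as "not acceptable"
        primary = tag.split("-")[0]        # "it-IT" -> "it"
        candidates.append((-quality, position, primary))

    # Highest quality first; ties keep the order given in the header.
    for negative_quality, _, primary in sorted(candidates):
        if negative_quality < 0 and primary in supported:
            return primary
    return default
```

With three supported languages the cache holds at most three copies of the page, however many distinct header strings the clients send.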
Varnish
Opera Mini 5.0 approval

➔   Global coverage

➔   Traffic surge (5x peak, 2x over 24h)
IT NEEDS TO BE OUT TOMORROW!!!
THERE WILL BE A PRESS RELEASE!
Varnish
Opera Mini 5.0 approval

➔   Global coverage

➔   Traffic surge (5x peak, 2x over 24h)

➔   No problems!
Opera Mini “countup” traffic
   Submitted to Apple Store: March 23rd
   Approved: April 12th
Varnish
Datacenter emergencies
Datacenter emergencies
[Diagram: files.myopera.com, DC1, User Files Storage SAN]
Datacenter emergencies
[Diagram: files.myopera.com, DC2 (LVS + Varnish servers), DC1 (User Files Storage SAN)]
~ 1Gbit/s!   Varnish
+                 x
➔   Load              ➔   Chainsaw!

➔   Flexibility       ➔   Purging

➔   Instant scaling
3. geodns
geodns
+                     x
➔   Prototype 1 week   ➔ Accuracy


➔   Geo-scaling        ➔   No DC feedback

➔   Redundant          ➔   Monitoring
Next steps
➔   Search (Solr?)

➔   Batch activity feed

➔   Real connection pooling

➔   … and on ...
Remember!
➔   Team spirit is important

➔   Another level of indirection...

➔   Keep it simple

➔   Keep a log
the heroes
http://my.opera.com/devblog/about/
http://my.opera.com/devblog/
any questions?
                 ?
handout download:

  http://tinyurl.com/surge2010-cosimo



thanks!
