Erlang factory 2011 london

Paolo Negri
Paolo NegriCo-Founder & CTO at Contentful
Designing for Scale
    Knut Nesheim @knutin
   Paolo Negri @hungryblank
About this talk

2 developers and erlang
           vs.
  1 million daily users
Social Games
Flash client (game)   HTTP API
Social Games
Flash client


               • Game actions need to be
                 persisted and validated

               • 1 API call every 2 secs
Social Games
                            HTTP API



• @ 1 000 000 daily users
• 5000 HTTP reqs/sec
• more than 90% writes
The hard nut


http://www.flickr.com/photos/mukluk/315409445/
Users we expect
                                        DAU

                       1000000



                        750000
  “Monster World”
       daily users
july - december 2010    500000



                        250000



                             0
                                 July         December
Users we have
                                  DAU



   New game
   daily users
march - june 2011



                    50
                     0
                         march april    may   june
What to do?


1 Simulate users
Simulating users

• Must not be too synthetic (like
  apachebench)
• Must look like a meaningful game session
• Users must come online at a given rate and
  play
Tsung


         •    Multi protocol (HTTP, XMPP) benchmarking tool

         •    Able to test non trivial call sequences

         •    Can actually simulate a scripted gaming session




http://tsung.erlang-projects.org/
Tsung - configuration
       Fixed content                    Dynamic parameter
       <request subst="true">
       <http url="http://server.wooga.com/users/%
       %ts_user_server:get_unique_id%%/resources/column/5/
       row/14?%%_routing_key%%"
       method="POST" contents='{"parameter1":"value1"}'>
       </http>
       </request>



http://tsung.erlang-projects.org/
Tsung - configuration
         • Not something you fancy writing
         • We’re in development, calls change and we
              constantly add new calls
         • A session might contain hundreds of
              requests
         • All the calls must refer to a consistent game
              state

http://tsung.erlang-projects.org/
Tsung - configuration
         • From our ruby test code
         user.resources(:column => 5, :row => 14)

         • Same as
          <request subst="true">
          <http url="http://server.wooga.com/users/%
          %ts_user_server:get_unique_id%%/resources/column/5/
          row/14?%%_routing_key%%"
          method="POST" contents='{"parameter1":"value1"}'>
          </http>
          </request>
http://tsung.erlang-projects.org/
Tsung - configuration

         • Session                    A session is a
          • requests                group of requests

         • Arrival phase            Sessions arrive in
          • duration                  phases with a
                                     specific arrival
          • arrival rate                  rate


http://tsung.erlang-projects.org/
Tsung - setup
            Application                         Benchmarking
                                                   cluster
             app server                             tsung
                                    HTTP reqs      worker
                                                        ssh
             app server
                                                    tsung
                                                   master

             app server

http://tsung.erlang-projects.org/
Tsung

         • Generates ~ 2500 reqs/sec on AWS
              m1.large
         • Flexible but hard to extend
         • Code base rather obscure

http://tsung.erlang-projects.org/
What to do?


2 Collect metrics
Tsung-metrics

         • Tsung collects measures and provides
              reports
         • But these measure include tsung network/
              cpu congestion itself
         • Tsung machines aren’t a good point of view

http://tsung.erlang-projects.org/
HAproxy
Application                         Benchmarking
                                       cluster
app server                              tsung
                        HTTP reqs      worker
              haproxy                       ssh
app server
                                        tsung
                                       master

app server
HAproxy

  “The Reliable, High Performance TCP/
  HTTP Load Balancer”
• Placed in front of http servers
• Load balancing
• Fail over
HAproxy - syslog


• Easy to setup
• Efficient (UDP)
• Provides 5 timings per each request
HAproxy
  • Time to receive request from client
Application                        Benchmarking
                                      cluster
app server                                 tsung
                   haproxy                worker
                                               ssh
app server
                                           tsung
                                          master
HAproxy
    • Time spent in HAproxy queue
Application                     Benchmarking
                                   cluster
app server                           tsung
                 haproxy            worker
                                         ssh
app server
                                     tsung
                                    master
HAproxy
    • Time to connect to the server
Application                       Benchmarking
                                     cluster
app server                             tsung
                  haproxy             worker
                                           ssh
app server
                                       tsung
                                      master
HAproxy
• Time to receive response headers from server
Application                        Benchmarking
                                      cluster
 app server                             tsung
                   haproxy             worker
                                            ssh
 app server
                                        tsung
                                       master
HAproxy
• Total session duration time
Application                     Benchmarking
                                   cluster
 app server                         tsung
                    haproxy        worker
                                        ssh
 app server
                                    tsung
                                   master
HAproxy - syslog

• Application urls identify directly server call
• Application urls are easy to parse
• Processing haproxy syslog gives per call
  metric
What to do?


3 Understand metrics
Reading/aggregating
       metrics

• Python to parse/normalize syslog
• R language to analyze/visualize data
• R language console to interactively explore
  benchmarking results
R is a free software environment for
 statistical computing and graphics.
What you get

• Aggregate performance levels (throughput,
  latency)
• Detailed performance per call type
• Statistical analysis (outliers, trends,
  regression, correlation, frequency, standard
  deviation)
What you get
What to do?


4 go deeper
Digging into the data

• From HAproxy log analisys one call
  emerged as exceptionally slow
• Using eprof we were able to determine
  that most of the time was spent in a redis
  query fetching many keys (MGET)
Tracing erldis query
• More than 60% of runtime is spent
  manipulating the socket
• gen_tcp:recv/2 is the culprit
• But why is it called so many times?
Understanding the
     redis protocol
C: LRANGE mylist 0 2
                       <<"*2rn
s: *2                     $5rn
s: $5                     Hellorn
                          $5rn
s: Hello                  Worldrn">>
s: $5
s: World
Understanding erldis
• recv_value/2 is used in the protocol parser
  to get the next data to parse
A different approach
• Two ways to use gen_tcp: active or passive
• In passive, use gen_tcp:recv to explicitly ask
  for data, blocking
• In active, gen_tcp will send the controlling
  process a message when there is data
• Hybrid: active once
A different approach

• Is active sockets faster?
• Proof-of-concept proved active socket
  faster
• Change erldis or write a new driver?
A different approach

• Radical change => new driver
• Keep Erldis queuing approach
• Think about error handling from the start
• Use active sockets
A different approach
• Active socket, parse partial replies
Circuit breaker
• eredis has a simple circuit breaker for when
  Redis is down/unreachable
• eredis returns immediately to clients if
  connection is down
• Reconnecting is done outside request/
  response handling
• Robust handling of errors
Benchmarking eredis

• Redis driver critical for our application
• Must perform well
• Must be stable
• How do we test this?
Basho bench

• Basho produces the Riak KV store
• Basho build a tool to test KV servers
• Basho bench
• We used Basho bench to test eredis
Basho bench
• Create callback module
Basho bench
• Configuration term-file
Basho bench output
eredis is open source




https://github.com/wooga/eredis
What to do?


5 measure internals
Measure internals

HAproxy point of view is valid but how to
measure internals of our application, while
we are live, without the overhead of
tracing?
Think Basho bench

• Basho bench can benchmark a redis driver
• Redis is very fast, 100K ops/sec
• Basho bench overhead is acceptable
• The code is very simple
Cherry pick ideas from
    Basho Bench
• Creates a histogram of timings on the fly,
  reducing the number of data points
• Dumps to disk every N seconds
• Allows statistical tools to work on already
  aggregated data
• Near real-time, from event to stats in N+5
  seconds
Homegrown stats
• Measures latency from the edges of our
  system (excludes HTTP handling)
• And at interesting points inside the system
• Statistical analysis using R
• Correlate with HAproxy data
• Produces graphs and data specific to our
  application
Homegrown stats
Recap

  Measure:
• From an external point of view (HAproxy)
• At the edge of the system (excluding
  HTTP handling)
• Internals in the single process (eprof)
Recap
  Analyze:
• Aggregated measures
• Statistical properties of measures
 • standard deviation
 • distribution
 • trends
Thanks!

http://www.wooga.com/jobs

knut.nesheim@wooga.com       @knutin
paolo.negri@wooga.com    @hungryblank
1 de 58

Recomendados

Erlang factory SF 2011 "Erlang and the big switch in social games" por
Erlang factory SF 2011 "Erlang and the big switch in social games"Erlang factory SF 2011 "Erlang and the big switch in social games"
Erlang factory SF 2011 "Erlang and the big switch in social games"Paolo Negri
1.3K vistas64 diapositivas
Combining the strength of erlang and Ruby por
Combining the strength of erlang and RubyCombining the strength of erlang and Ruby
Combining the strength of erlang and RubyMartin Rehfeld
1.9K vistas17 diapositivas
FunctionalConf '16 Robert Virding Erlang Ecosystem por
FunctionalConf '16 Robert Virding Erlang EcosystemFunctionalConf '16 Robert Virding Erlang Ecosystem
FunctionalConf '16 Robert Virding Erlang EcosystemRobert Virding
672 vistas36 diapositivas
Erlang as a cloud citizen, a fractal approach to throughput por
Erlang as a cloud citizen, a fractal approach to throughputErlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughputPaolo Negri
1.4K vistas69 diapositivas
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent... por
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...Amazon Web Services
10.3K vistas55 diapositivas
Storm presentation por
Storm presentationStorm presentation
Storm presentationShyam Raj
659 vistas27 diapositivas

Más contenido relacionado

La actualidad más candente

Storm por
StormStorm
Stormnathanmarz
8.2K vistas84 diapositivas
Learning Stream Processing with Apache Storm por
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormEugene Dvorkin
1.9K vistas100 diapositivas
Analysis big data by use php with storm por
Analysis big data by use php with stormAnalysis big data by use php with storm
Analysis big data by use php with storm毅 吕
365 vistas25 diapositivas
Benchmarking at Parse por
Benchmarking at ParseBenchmarking at Parse
Benchmarking at ParseTravis Redman
1.2K vistas35 diapositivas
Introduction to Storm por
Introduction to StormIntroduction to Storm
Introduction to StormEugene Dvorkin
5.4K vistas82 diapositivas
Message:Passing - lpw 2012 por
Message:Passing - lpw 2012Message:Passing - lpw 2012
Message:Passing - lpw 2012Tomas Doran
1.5K vistas150 diapositivas

La actualidad más candente(17)

Learning Stream Processing with Apache Storm por Eugene Dvorkin
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
Eugene Dvorkin1.9K vistas
Analysis big data by use php with storm por 毅 吕
Analysis big data by use php with stormAnalysis big data by use php with storm
Analysis big data by use php with storm
毅 吕365 vistas
Benchmarking at Parse por Travis Redman
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
Travis Redman1.2K vistas
Message:Passing - lpw 2012 por Tomas Doran
Message:Passing - lpw 2012Message:Passing - lpw 2012
Message:Passing - lpw 2012
Tomas Doran1.5K vistas
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures por Lightbend
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Lightbend23.6K vistas
Akka-chan's Survival Guide for the Streaming World por Konrad Malawski
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming World
Konrad Malawski5.4K vistas
Apache Storm por masifqadri
Apache StormApache Storm
Apache Storm
masifqadri592 vistas
Atlanta Hadoop Users Meetup 09 21 2016 por Chris Fregly
Atlanta Hadoop Users Meetup 09 21 2016Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016
Chris Fregly624 vistas
Introduction to Apache Storm - Concept & Example por Dung Ngua
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & Example
Dung Ngua452 vistas
Python Raster Function - Esri Developer Conference - 2015 por akferoz07
Python Raster Function - Esri Developer Conference - 2015Python Raster Function - Esri Developer Conference - 2015
Python Raster Function - Esri Developer Conference - 2015
akferoz071.1K vistas
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05) por Tibo Beijen
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Tibo Beijen148 vistas
Introduction to the intermediate Python - v1.1 por Andrei KUCHARAVY
Introduction to the intermediate Python - v1.1Introduction to the intermediate Python - v1.1
Introduction to the intermediate Python - v1.1
Andrei KUCHARAVY847 vistas
spray: REST on Akka (Scala Days) por sirthias
spray: REST on Akka (Scala Days)spray: REST on Akka (Scala Days)
spray: REST on Akka (Scala Days)
sirthias15.1K vistas
The inherent complexity of stream processing por nathanmarz
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processing
nathanmarz3.3K vistas
Webinar: Queues with RabbitMQ - Lorna Mitchell por Codemotion
Webinar: Queues with RabbitMQ - Lorna MitchellWebinar: Queues with RabbitMQ - Lorna Mitchell
Webinar: Queues with RabbitMQ - Lorna Mitchell
Codemotion646 vistas

Similar a Erlang factory 2011 london

Distributed app development with nodejs and zeromq por
Distributed app development with nodejs and zeromqDistributed app development with nodejs and zeromq
Distributed app development with nodejs and zeromqRuben Tan
14.5K vistas33 diapositivas
Scalable Web Apps por
Scalable Web AppsScalable Web Apps
Scalable Web AppsPiotr Pelczar
6.6K vistas40 diapositivas
Api crash por
Api crashApi crash
Api crashTony Nguyen
117 vistas9 diapositivas
Api crash por
Api crashApi crash
Api crashFraboni Ec
93 vistas9 diapositivas
Api crash por
Api crashApi crash
Api crashHoang Nguyen
305 vistas9 diapositivas
Api crash por
Api crashApi crash
Api crashLuis Goldster
107 vistas9 diapositivas

Similar a Erlang factory 2011 london(20)

Distributed app development with nodejs and zeromq por Ruben Tan
Distributed app development with nodejs and zeromqDistributed app development with nodejs and zeromq
Distributed app development with nodejs and zeromq
Ruben Tan14.5K vistas
Introduction to Apache NiFi And Storm por Jungtaek Lim
Introduction to Apache NiFi And StormIntroduction to Apache NiFi And Storm
Introduction to Apache NiFi And Storm
Jungtaek Lim1.4K vistas
Deploying Apache Flume to enable low-latency analytics por DataWorks Summit
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
DataWorks Summit13.7K vistas
How to build a Neutron Plugin (stadium edition) por Salvatore Orlando
How to build a Neutron Plugin (stadium edition)How to build a Neutron Plugin (stadium edition)
How to build a Neutron Plugin (stadium edition)
Salvatore Orlando744 vistas
How to write a Neutron plugin (stadium edition) por salv_orlando
How to write a Neutron plugin (stadium edition)How to write a Neutron plugin (stadium edition)
How to write a Neutron plugin (stadium edition)
salv_orlando821 vistas
3.2 Streaming and Messaging por 振东 刘
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
振东 刘219 vistas
Debugging Microservices - key challenges and techniques - Microservices Odesa... por Lohika_Odessa_TechTalks
Debugging Microservices - key challenges and techniques - Microservices Odesa...Debugging Microservices - key challenges and techniques - Microservices Odesa...
Debugging Microservices - key challenges and techniques - Microservices Odesa...
Push jobs: an orchestration building block for private Chef por Chef Software, Inc.
Push jobs: an orchestration building block for private ChefPush jobs: an orchestration building block for private Chef
Push jobs: an orchestration building block for private Chef
Chef Software, Inc.8.3K vistas

Más de Paolo Negri

Turning the web stack upside down rethinking how data flows through systems por
Turning the web stack upside down  rethinking how data flows through systemsTurning the web stack upside down  rethinking how data flows through systems
Turning the web stack upside down rethinking how data flows through systemsPaolo Negri
231 vistas47 diapositivas
AWS Lambda in infrastructure por
AWS Lambda in infrastructureAWS Lambda in infrastructure
AWS Lambda in infrastructurePaolo Negri
2.2K vistas18 diapositivas
Erlang introduction geek2geek Berlin por
Erlang introduction geek2geek BerlinErlang introduction geek2geek Berlin
Erlang introduction geek2geek BerlinPaolo Negri
1.4K vistas22 diapositivas
Getting real with erlang por
Getting real with erlangGetting real with erlang
Getting real with erlangPaolo Negri
4.7K vistas58 diapositivas
Scaling Social Games por
Scaling Social GamesScaling Social Games
Scaling Social GamesPaolo Negri
1.6K vistas49 diapositivas
Mongrel2, a short introduction por
Mongrel2, a short introductionMongrel2, a short introduction
Mongrel2, a short introductionPaolo Negri
1.9K vistas43 diapositivas

Más de Paolo Negri(10)

Turning the web stack upside down rethinking how data flows through systems por Paolo Negri
Turning the web stack upside down  rethinking how data flows through systemsTurning the web stack upside down  rethinking how data flows through systems
Turning the web stack upside down rethinking how data flows through systems
Paolo Negri231 vistas
AWS Lambda in infrastructure por Paolo Negri
AWS Lambda in infrastructureAWS Lambda in infrastructure
AWS Lambda in infrastructure
Paolo Negri2.2K vistas
Erlang introduction geek2geek Berlin por Paolo Negri
Erlang introduction geek2geek BerlinErlang introduction geek2geek Berlin
Erlang introduction geek2geek Berlin
Paolo Negri1.4K vistas
Getting real with erlang por Paolo Negri
Getting real with erlangGetting real with erlang
Getting real with erlang
Paolo Negri4.7K vistas
Scaling Social Games por Paolo Negri
Scaling Social GamesScaling Social Games
Scaling Social Games
Paolo Negri1.6K vistas
Mongrel2, a short introduction por Paolo Negri
Mongrel2, a short introductionMongrel2, a short introduction
Mongrel2, a short introduction
Paolo Negri1.9K vistas
RabbitMQ with python and ruby RuPy 2009 por Paolo Negri
RabbitMQ with python and ruby RuPy 2009RabbitMQ with python and ruby RuPy 2009
RabbitMQ with python and ruby RuPy 2009
Paolo Negri5.9K vistas
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U... por Paolo Negri
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
Paolo Negri6.6K vistas
SimpleDb, an introduction por Paolo Negri
SimpleDb, an introductionSimpleDb, an introduction
SimpleDb, an introduction
Paolo Negri2.9K vistas
%w(map reduce).first - A Tale About Rabbits, Latency, and Slim Crontabs por Paolo Negri
%w(map reduce).first - A Tale About Rabbits, Latency, and Slim Crontabs%w(map reduce).first - A Tale About Rabbits, Latency, and Slim Crontabs
%w(map reduce).first - A Tale About Rabbits, Latency, and Slim Crontabs
Paolo Negri9.4K vistas

Último

virtual reality.pptx por
virtual reality.pptxvirtual reality.pptx
virtual reality.pptxG036GaikwadSnehal
14 vistas15 diapositivas
Special_edition_innovator_2023.pdf por
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdfWillDavies22
18 vistas6 diapositivas
Voice Logger - Telephony Integration Solution at Aegis por
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at AegisNirmal Sharma
39 vistas1 diapositiva
NET Conf 2023 Recap por
NET Conf 2023 RecapNET Conf 2023 Recap
NET Conf 2023 RecapLee Richardson
10 vistas71 diapositivas
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... por
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...Bernd Ruecker
40 vistas69 diapositivas
Future of AR - Facebook Presentation por
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentationssuserb54b561
15 vistas27 diapositivas

Último(20)

Special_edition_innovator_2023.pdf por WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2218 vistas
Voice Logger - Telephony Integration Solution at Aegis por Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma39 vistas
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... por Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker40 vistas
Future of AR - Facebook Presentation por ssuserb54b561
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
ssuserb54b56115 vistas
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院 por IttrainingIttraining
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
【USB韌體設計課程】精選講義節錄-USB的列舉過程_艾鍗學院
SAP Automation Using Bar Code and FIORI.pdf por Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
Virendra Rai, PMP23 vistas
STPI OctaNE CoE Brochure.pdf por madhurjyapb
STPI OctaNE CoE Brochure.pdfSTPI OctaNE CoE Brochure.pdf
STPI OctaNE CoE Brochure.pdf
madhurjyapb14 vistas
6g - REPORT.pdf por Liveplex
6g - REPORT.pdf6g - REPORT.pdf
6g - REPORT.pdf
Liveplex10 vistas
Serverless computing with Google Cloud (2023-24) por wesley chun
Serverless computing with Google Cloud (2023-24)Serverless computing with Google Cloud (2023-24)
Serverless computing with Google Cloud (2023-24)
wesley chun11 vistas
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ... por Jasper Oosterveld
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
Jasper Oosterveld19 vistas
PharoJS - Zürich Smalltalk Group Meetup November 2023 por Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi132 vistas
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive por Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive

Erlang factory 2011 london

  • 1. Designing for Scale Knut Nesheim @knutin Paolo Negri @hungryblank
  • 2. About this talk 2 developers and erlang vs. 1 million daily users
  • 3. Social Games Flash client (game) HTTP API
  • 4. Social Games Flash client • Game actions need to be persisted and validated • 1 API call every 2 secs
  • 5. Social Games HTTP API • @ 1 000 000 daily users • 5000 HTTP reqs/sec • more than 90% writes
  • 7. Users we expect DAU 1000000 750000 “Monster World” daily users july - december 2010 500000 250000 0 July December
  • 8. Users we have DAU New game daily users march - june 2011 50 0 march april may june
  • 9. What to do? 1 Simulate users
  • 10. Simulating users • Must not be too synthetic (like apachebench) • Must look like a meaningful game session • Users must come online at a given rate and play
  • 11. Tsung • Multi protocol (HTTP, XMPP) benchmarking tool • Able to test non trivial call sequences • Can actually simulate a scripted gaming session http://tsung.erlang-projects.org/
  • 12. Tsung - configuration Fixed content Dynamic parameter <request subst="true"> <http url="http://server.wooga.com/users/% %ts_user_server:get_unique_id%%/resources/column/5/ row/14?%%_routing_key%%" method="POST" contents='{"parameter1":"value1"}'> </http> </request> http://tsung.erlang-projects.org/
  • 13. Tsung - configuration • Not something you fancy writing • We’re in development, calls change and we constantly add new calls • A session might contain hundreds of requests • All the calls must refer to a consistent game state http://tsung.erlang-projects.org/
  • 14. Tsung - configuration • From our ruby test code user.resources(:column => 5, :row => 14) • Same as <request subst="true"> <http url="http://server.wooga.com/users/% %ts_user_server:get_unique_id%%/resources/column/5/ row/14?%%_routing_key%%" method="POST" contents='{"parameter1":"value1"}'> </http> </request> http://tsung.erlang-projects.org/
  • 15. Tsung - configuration • Session A session is a • requests group of requests • Arrival phase Sessions arrive in • duration phases with a specific arrival • arrival rate rate http://tsung.erlang-projects.org/
  • 16. Tsung - setup Application Benchmarking cluster app server tsung HTTP reqs worker ssh app server tsung master app server http://tsung.erlang-projects.org/
  • 17. Tsung • Generates ~ 2500 reqs/sec on AWS m1.large • Flexible but hard to extend • Code base rather obscure http://tsung.erlang-projects.org/
  • 18. What to do? 2 Collect metrics
  • 19. Tsung-metrics • Tsung collects measures and provides reports • But these measure include tsung network/ cpu congestion itself • Tsung machines aren’t a good point of view http://tsung.erlang-projects.org/
  • 20. HAproxy Application Benchmarking cluster app server tsung HTTP reqs worker haproxy ssh app server tsung master app server
  • 21. HAproxy “The Reliable, High Performance TCP/ HTTP Load Balancer” • Placed in front of http servers • Load balancing • Fail over
  • 22. HAproxy - syslog • Easy to setup • Efficient (UDP) • Provides 5 timings per each request
  • 23. HAproxy • Time to receive request from client Application Benchmarking cluster app server tsung haproxy worker ssh app server tsung master
  • 24. HAproxy • Time spent in HAproxy queue Application Benchmarking cluster app server tsung haproxy worker ssh app server tsung master
  • 25. HAproxy • Time to connect to the server Application Benchmarking cluster app server tsung haproxy worker ssh app server tsung master
  • 26. HAproxy • Time to receive response headers from server Application Benchmarking cluster app server tsung haproxy worker ssh app server tsung master
  • 27. HAproxy • Total session duration time Application Benchmarking cluster app server tsung haproxy worker ssh app server tsung master
  • 28. HAproxy - syslog • Application urls identify directly server call • Application urls are easy to parse • Processing haproxy syslog gives per call metric
  • 29. What to do? 3 Understand metrics
  • 30. Reading/aggregating metrics • Python to parse/normalize syslog • R language to analyze/visualize data • R language console to interactively explore benchmarking results
  • 31. R is a free software environment for statistical computing and graphics.
  • 32. What you get • Aggregate performance levels (throughput, latency) • Detailed performance per call type • Statistical analysis (outliers, trends, regression, correlation, frequency, standard deviation)
  • 34. What to do? 4 go deeper
  • 35. Digging into the data • From HAproxy log analisys one call emerged as exceptionally slow • Using eprof we were able to determine that most of the time was spent in a redis query fetching many keys (MGET)
  • 36. Tracing erldis query • More than 60% of runtime is spent manipulating the socket • gen_tcp:recv/2 is the culprit • But why is it called so many times?
  • 37. Understanding the redis protocol C: LRANGE mylist 0 2 <<"*2rn s: *2 $5rn s: $5 Hellorn $5rn s: Hello Worldrn">> s: $5 s: World
  • 38. Understanding erldis • recv_value/2 is used in the protocol parser to get the next data to parse
  • 39. A different approach • Two ways to use gen_tcp: active or passive • In passive, use gen_tcp:recv to explicitly ask for data, blocking • In active, gen_tcp will send the controlling process a message when there is data • Hybrid: active once
  • 40. A different approach • Is active sockets faster? • Proof-of-concept proved active socket faster • Change erldis or write a new driver?
  • 41. A different approach • Radical change => new driver • Keep Erldis queuing approach • Think about error handling from the start • Use active sockets
  • 42. A different approach • Active socket, parse partial replies
  • 43. Circuit breaker • eredis has a simple circuit breaker for when Redis is down/unreachable • eredis returns immediately to clients if connection is down • Reconnecting is done outside request/ response handling • Robust handling of errors
  • 44. Benchmarking eredis • Redis driver critical for our application • Must perform well • Must be stable • How do we test this?
  • 45. Basho bench • Basho produces the Riak KV store • Basho build a tool to test KV servers • Basho bench • We used Basho bench to test eredis
  • 46. Basho bench • Create callback module
  • 49. eredis is open source https://github.com/wooga/eredis
  • 50. What to do? 5 measure internals
  • 51. Measure internals HAproxy point of view is valid but how to measure internals of our application, while we are live, without the overhead of tracing?
  • 52. Think Basho bench • Basho bench can benchmark a redis driver • Redis is very fast, 100K ops/sec • Basho bench overhead is acceptable • The code is very simple
  • 53. Cherry pick ideas from Basho Bench • Creates a histogram of timings on the fly, reducing the number of data points • Dumps to disk every N seconds • Allows statistical tools to work on already aggregated data • Near real-time, from event to stats in N+5 seconds
  • 54. Homegrown stats • Measures latency from the edges of our system (excludes HTTP handling) • And at interesting points inside the system • Statistical analysis using R • Correlate with HAproxy data • Produces graphs and data specific to our application
  • 56. Recap Measure: • From an external point of view (HAproxy) • At the edge of the system (excluding HTTP handling) • Internals in the single process (eprof)
  • 57. Recap Analyze: • Aggregated measures • Statistical properties of measures • standard deviation • distribution • trends
  • 58. Thanks! http://www.wooga.com/jobs knut.nesheim@wooga.com @knutin paolo.negri@wooga.com @hungryblank