2013 - Dustin Whittle - Scaling PHP in Real Life

SCALING PHP IN THE REAL WORLD!

PHP is used by the likes of Facebook, Yahoo!, Zynga, Tumblr, Etsy, and Wikipedia. How do the
largest internet companies scale PHP to meet their demand? Join this session and find out how
to use the latest tools in PHP for developing high-performance applications. We’ll take a look
at common techniques for scaling PHP applications and best practices for profiling and
optimizing performance. After this session, you’ll leave prepared to tackle your next
enterprise PHP project.

• Why does performance matter?
• The problems with PHP
• Best practice designs
  • Doing work in the background with queues
  • Fronting with HTTP caching and a reverse proxy cache
  • Distributed data caches with Redis and Memcached
  • Using the right tool for the job
• Tools of the trade
  • Opcode caches
  • Varnish/Squid and reverse proxy caches
  • Xdebug + Valgrind + WebGrind
  • AppDynamics
• Architecture, not applications

Dustin Whittle
• dustinwhittle.com
• @dustinwhittle
• Technologist, Pilot, Skier, Diver, Sailor, Golfer

What I have worked on
• Developer Evangelist @
• Consultant & Trainer @
• Developer Evangelist @

Did you know Facebook, Yahoo, Zynga, Tumblr, Etsy, and Wikipedia were all built on PHP?

Why does performance matter?

http://www.stevesouders.com/blog/2013/05/09/how-fast-are-we-going-now/
http://www.appdynamics.com/blog/2013/07/04/tools-of-the-trade-for-performance-andload-testing/
The majority of Americans are said to wait in line (in a real shop) for no longer than 15 minutes.
1 out of 4 customers will abandon a webpage that takes more than 4 seconds to load.
http://www.scribd.com/doc/16877317/Shopzillas-Site-Redo-You-Get-What-You-Measure
http://velocityconf.com/velocity2010/public/schedule/detail/13019
http://www.aberdeen.com/Aberdeen-Library/5136/RA-performance-web-application.aspx
http://blog.mozilla.org/metrics/2010/04/05/firefox-page-load-speed-%E2%80%93-part-ii/

Microsoft found that Bing searches that were 2 seconds slower resulted in a 4.3% drop in revenue per user

http://velocityconf.com/velocity2009/public/schedule/detail/8523

When Mozilla shaved 2.2 seconds off their landing page, Firefox downloads increased 15.4%

https://blog.mozilla.org/metrics/2010/04/05/firefox-page-load-speed-%E2%80%93-part-ii/
(60 million more downloads)

Making Barack Obama’s website 60% faster increased donation conversions by 14%

http://kylerush.net/blog/meet-the-obama-campaigns-250-million-fundraising-platform/

Amazon and Walmart increased revenue 1% for every 100ms of improvement

http://www.globaldots.com/how-website-speed-affects-conversion-rates/

http://www.strangeloopnetworks.com/web-performance-infographics/

Performance directly impacts the bottom line

http://www.strangeloopnetworks.com/web-performance-infographics/

PHP is slower than Java, C++, Erlang, Scala, and Go!

...but there are ways to scale to handle high-traffic applications

http://phpsadness.com/

Notice how many issues are getting resolved as the PHP team iterates and releases.

What version of PHP do you run?

Upgrade your PHP environment to 2013!

One of the easiest ways to improve performance and stability is to upgrade your version of
PHP. PHP 5.3.x was released in 2009. If you haven’t migrated to PHP 5.4, now is the time! Not
only do you benefit from bug fixes and new features, but you will also see faster response
times immediately. See PHP.net to get started.
Installing the latest PHP on Linux - http://php.net/manual/en/install.unix.debian.php
Installing the latest PHP on OS X - http://php.net/manual/en/install.macosx.php
Installing the latest PHP on Windows - http://php.net/manual/en/install.windows.php
Once you’ve finished upgrading PHP, be sure to disable any unused extensions in production,
such as Xdebug or XHProf.
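Both pieces of advice are easy to check automatically, for example in a deploy script. The following is a minimal sketch: production_php_warnings() is a hypothetical helper, and the 5.4.0 threshold and the list of development-only extensions (xdebug, xhprof) are assumptions taken from the text above.

```php
<?php
// Sketch: flag an outdated PHP version and development-only extensions.
// The 5.4.0 threshold and the extension list are assumptions taken from the
// advice above; adjust both for your own environment.

/**
 * @param string   $version    a version string such as PHP_VERSION
 * @param string[] $extensions extension names as returned by get_loaded_extensions()
 * @return string[] one warning per problem found (empty when all is well)
 */
function production_php_warnings($version, array $extensions)
{
    $warnings = [];

    if (version_compare($version, '5.4.0', '<')) {
        $warnings[] = "PHP $version predates 5.4: upgrade for bug fixes and faster responses";
    }

    // Debuggers and profilers add overhead to every request.
    $loaded = array_map('strtolower', $extensions);
    foreach (['xdebug', 'xhprof'] as $devOnly) {
        if (in_array($devOnly, $loaded, true)) {
            $warnings[] = "Extension '$devOnly' is loaded: disable it in production";
        }
    }

    return $warnings;
}

// Usage: check the interpreter that is running this script.
foreach (production_php_warnings(PHP_VERSION, get_loaded_extensions()) as $warning) {
    echo $warning, PHP_EOL;
}
```

Passing the version and extension list in, rather than reading them inside the function, keeps the check testable against any environment description.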

Nginx + PHP-FPM

http://nginx.org/
http://php-fpm.org/

Use an opcode cache!

PHP is an interpreted language, which means that every time a PHP page is requested, the
server parses the PHP file and compiles it into opcodes that the Zend Engine can execute.
Opcode caches keep these compiled opcodes in shared memory, so each file only needs to be
compiled on the first request. If you aren’t using an opcode cache, you’re missing out on
a very easy performance gain. Pick your flavor: APC, Zend Optimizer+ (bundled as OPcache in
PHP 5.5), XCache, or eAccelerator. I highly recommend APC, whose maintainers include Rasmus
Lerdorf, the creator of PHP.
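To confirm which opcode cache a server actually has, you can ask PHP which extensions are loaded. This is a minimal sketch: detect_opcode_cache() is a hypothetical helper, and a loaded extension may still be switched off by its ini settings (for example apc.enabled or opcache.enable), so check those as well.

```php
<?php
// Sketch: report which opcode cache extension this PHP build has loaded.
// Loaded is not the same as enabled: also check settings such as apc.enabled
// or opcache.enable. $isLoaded is injectable only to make the function testable.

function detect_opcode_cache($isLoaded = 'extension_loaded')
{
    // Extension name => product name.
    $candidates = [
        'apc'          => 'APC',
        'Zend OPcache' => 'Zend Optimizer+ (OPcache)',
        'xcache'       => 'XCache',
        'eaccelerator' => 'eAccelerator',
    ];
    foreach ($candidates as $extension => $product) {
        if (call_user_func($isLoaded, $extension)) {
            return $product;
        }
    }
    return null;
}

echo detect_opcode_cache() ?: 'No opcode cache loaded', PHP_EOL;
```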

APC

http://php.net/manual/en/book.apc.php

Use autoloading and PSR-0

Many developers writing object-oriented applications create one PHP source file per class
definition. One of the biggest annoyances in writing PHP is having to write a long list of
needed includes at the beginning of each script (one for each class). PHP evaluates every one
of these require/include expressions each time a file containing them is loaded, even when
the request never uses the class. An autoloader lets you remove all of your require/include
statements: PHP calls it the first time an undefined class is referenced, so only the classes
a request actually uses get loaded. You can even cache the class map of your autoloader in
APC for a small additional performance improvement.
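The PSR-0 mapping itself is short: namespace separators become directories, underscores in the class name (but not in the namespace) also become directories, and ".php" is appended. A minimal sketch follows, assuming your classes live under a single base directory; psr0_class_to_path() and register_psr0_autoloader() are hypothetical helper names, and the src directory is a placeholder.

```php
<?php
// Sketch of a PSR-0 autoloader. $baseDir is an assumption: point it at the
// directory that contains your top-level vendor namespaces.

/**
 * Map a class name to its PSR-0 relative path. Forward slashes are used
 * because PHP accepts them on every platform, including Windows.
 */
function psr0_class_to_path($className)
{
    $className = ltrim($className, '\\');
    $path = '';
    $lastNsPos = strrpos($className, '\\');
    if ($lastNsPos !== false) {
        $namespace = substr($className, 0, $lastNsPos);
        $className = substr($className, $lastNsPos + 1);
        $path = str_replace('\\', '/', $namespace) . '/';
    }
    // Underscores are only significant in the class name, not the namespace.
    return $path . str_replace('_', '/', $className) . '.php';
}

function register_psr0_autoloader($baseDir)
{
    spl_autoload_register(function ($className) use ($baseDir) {
        $file = rtrim($baseDir, '/') . '/' . psr0_class_to_path($className);
        // Skip missing files so other registered autoloaders get a chance.
        if (is_file($file)) {
            require $file;
        }
    });
}

// One call replaces the long list of require_once statements.
register_psr0_autoloader(__DIR__ . '/src');
```

With this in place, the first reference to Symfony\Component\ClassLoader\ClassLoader would load src/Symfony/Component/ClassLoader/ClassLoader.php, and a class that is never referenced is never read from disk.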

Symfony2 ClassLoader Component with APC Caching

http://symfony.com/doc/master/components/class_loader.html

Scaling beyond a single server in PHP

Optimize your sessions!

While HTTP is stateless, most real-life web applications need a way to manage user data. In
PHP, application state is managed via sessions. The default configuration for PHP is to persist
session data to files on local disk. This is slow under load, because the file handler locks
each session file for the whole request, and it does not scale beyond a single server, because
the other servers cannot see those files. A better solution is to store your session data in a
database and front it with an LRU (Least Recently Used) cache such as Memcached or Redis. Best
of all, limit your session data to what fits in a cookie (browsers reliably store about 4096
bytes per cookie) and store all of it in a signed or encrypted cookie.

The default in PHP is to persist sessions to disk

http://www.php.net/manual/en/book.session.php

It is better to store sessions in a database

http://php.net/manual/en/function.session-set-save-handler.php
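Since PHP 5.4, session_set_save_handler() accepts an object implementing SessionHandlerInterface, which makes a database-backed store a single class. A minimal sketch follows: the sessions table is a hypothetical schema, SQLite in memory stands in for your real database so the example runs anywhere, and the return type declarations need PHP 7 or later (drop them on older versions).

```php
<?php
// Sketch: store sessions in a database through SessionHandlerInterface.
// The `sessions` table is hypothetical; swap the DSN for MySQL/PostgreSQL.

class DatabaseSessionHandler implements SessionHandlerInterface
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    public function open($savePath, $sessionName): bool
    {
        return true; // the PDO connection is already open
    }

    public function close(): bool
    {
        return true;
    }

    public function read($id): string
    {
        $stmt = $this->db->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        return $data === false ? '' : (string) $data; // '' means "no session yet"
    }

    public function write($id, $data): bool
    {
        // Delete + insert in one transaction is a portable upsert.
        $this->db->beginTransaction();
        $this->db->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
        $this->db->prepare('INSERT INTO sessions (id, data, updated_at) VALUES (?, ?, ?)')
            ->execute([$id, $data, time()]);
        return $this->db->commit();
    }

    public function destroy($id): bool
    {
        return $this->db->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
    }

    public function gc($maxLifetime): int
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $maxLifetime]);
        return $stmt->rowCount(); // number of expired sessions removed
    }
}

$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE sessions (id VARCHAR(128) PRIMARY KEY, data TEXT NOT NULL, updated_at INTEGER NOT NULL)');
$handler = new DatabaseSessionHandler($db);

if (PHP_SAPI !== 'cli') {
    session_set_save_handler($handler, true); // true: write and close at shutdown
    session_start();
}
```

Fronting this with Memcached or Redis, as the next slide suggests, means checking the cache in read() and updating it in write() before falling back to the database.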

Even better is to store in a database with a shared cache in front

http://php.net/manual/en/memcached.sessions.php

The best solution is to limit session size and store all data in a signed or encrypted cookie

http://www.hardened-php.net/suhosin/configuration.html#suhosin.session.encrypt
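A signed cookie carries the session data itself plus an HMAC, so any server can verify it without shared storage. This is a minimal sketch, not the Suhosin mechanism linked above: the secret, the 4096-byte budget, and the helper names are assumptions. Signing prevents tampering but does not hide the data, so keep secrets out of it or encrypt as well. hash_equals() requires PHP 5.6 or later.

```php
<?php
// Sketch: session data stored in a signed cookie (payload.signature).
// SESSION_SECRET is a placeholder; load a long random value from config.

const SESSION_SECRET = 'change-me-to-a-long-random-string';
const MAX_COOKIE_BYTES = 4096; // approximate: name and attributes count too

function encode_session_cookie(array $data, $secret)
{
    $payload = base64_encode(json_encode($data));
    $cookie = $payload . '.' . hash_hmac('sha256', $payload, $secret);
    if (strlen($cookie) > MAX_COOKIE_BYTES) {
        throw new LengthException('Session data too large for a cookie');
    }
    return $cookie;
}

// Returns the session array, or null if the cookie is malformed or tampered with.
function decode_session_cookie($cookie, $secret)
{
    $parts = explode('.', $cookie, 2);
    if (count($parts) !== 2) {
        return null;
    }
    list($payload, $signature) = $parts;
    // Constant-time comparison avoids leaking the signature through timing.
    if (!hash_equals(hash_hmac('sha256', $payload, $secret), $signature)) {
        return null;
    }
    $data = json_decode(base64_decode($payload, true), true);
    return is_array($data) ? $data : null;
}

$cookie = encode_session_cookie(['user_id' => 42], SESSION_SECRET);

if (PHP_SAPI !== 'cli') {
    // name, value, expires (0 = session cookie), path, domain, secure, httponly
    setcookie('session', $cookie, 0, '/', '', true, true);
}
```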

Leverage an in-memory data cache

Applications usually require data, and that data is usually structured and organized in a
database. Depending on the data set and how it is accessed, it can be expensive to query. An
easy solution is to cache the result of the first query in a data cache like Memcached or
Redis. If the data changes, you invalidate the cache entry, and the next request makes another
SQL query to get the updated result set from the database.
I highly recommend the Doctrine ORM for PHP, which has built-in caching support for
Memcached and Redis.
There are many use cases for a distributed data cache, from caching web service responses
and app configurations to entire rendered pages.
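The pattern described above is usually called cache-aside: read from the cache, fall back to the database on a miss, store the result, and delete the entry when the data changes. A minimal sketch follows. ArrayCache is a hypothetical in-process stand-in for a Memcached or Redis client, so the example runs without a cache server; remember() is likewise a hypothetical helper name.

```php
<?php
// Sketch of cache-aside. ArrayCache mimics a get/set/delete client with TTLs
// and treats null as "not cached".

class ArrayCache
{
    private $items = [];

    public function get($key)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= time()) {
            return null;
        }
        return $this->items[$key]['value'];
    }

    public function set($key, $value, $ttl)
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttl];
    }

    public function delete($key)
    {
        unset($this->items[$key]);
    }
}

// Return the cached value, or compute it once and cache it for $ttl seconds.
function remember(ArrayCache $cache, $key, $ttl, callable $compute)
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $compute();
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

$cache = new ArrayCache();
$queries = 0;
$loadUser = function () use (&$queries) {
    $queries++; // stands in for an expensive SQL query
    return ['id' => 42, 'name' => 'Alice'];
};

remember($cache, 'user:42', 300, $loadUser); // miss: runs the query
remember($cache, 'user:42', 300, $loadUser); // hit: no query
$cache->delete('user:42');                   // the row changed: invalidate
remember($cache, 'user:42', 300, $loadUser); // miss: fetches the updated row
```

Deleting on write, rather than updating the cached copy, keeps the database as the single source of truth; the TTL bounds how stale an entry can get if an invalidation is ever missed.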

Memcached.org

http://memcached.org/

• Any data that is expensive to generate/query and long-lived should be cached

• Web Service Responses
• HTTP Responses
• Database Result Sets
• Configuration Data

Guzzle HTTP client has built-in support for caching web service requests

Doctrine ORM for PHP has built-in caching support for Memcached and Redis

http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/caching.html

Do blocking work in background tasks via queues

Web applications often have to run tasks that can take a while to complete. In most cases
there is no good reason to force the end user to wait for the job to finish. The solution is
to queue blocking work to run in background jobs. Background jobs are executed outside the
main flow of your program and are usually handled by a queue or message system. Writing
long-running jobs to a queue and processing them separately improves both the end-user
experience and your ability to scale. I am a big fan of Resque for PHP, a simple toolkit for
running tasks from queues. There are a variety of tools that provide queuing or messaging
systems that work well with PHP:

• Resque
• Gearman
• RabbitMQ
• Kafka
• Beanstalkd
• ZeroMQ
• ActiveMQ
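All of these tools share the same split: the web request enqueues a description of the work and returns, and a separate worker process executes it later. The sketch below shows that shape in a single process. InMemoryQueue and run_worker() are hypothetical; the job class loosely follows php-resque's convention of a perform() method that reads $this->args, but this is not the Resque API, and a real deployment would push the JSON payloads to Redis (Resque), RabbitMQ, Beanstalkd, or similar.

```php
<?php
// Sketch of the producer/worker split behind background job queues.

class InMemoryQueue
{
    private $jobs = [];

    // Producer side: record what to do and return immediately.
    public function enqueue($jobClass, array $args)
    {
        $this->jobs[] = json_encode(['class' => $jobClass, 'args' => $args]);
    }

    // Worker side: take the oldest job, or null when the queue is empty.
    public function pop()
    {
        $payload = array_shift($this->jobs);
        return $payload === null ? null : json_decode($payload, true);
    }
}

class SendWelcomeEmailJob
{
    public $args = [];
    public static $sent = []; // records completed work, for demonstration

    public function perform()
    {
        // The slow part (SMTP, third-party APIs, ...) runs here, outside the HTTP request.
        self::$sent[] = $this->args['email'];
    }
}

// Drain the queue; a real worker would loop forever and block waiting for jobs.
function run_worker(InMemoryQueue $queue)
{
    $processed = 0;
    while (($job = $queue->pop()) !== null) {
        $class = $job['class'];
        $instance = new $class();
        $instance->args = $job['args'];
        $instance->perform();
        $processed++;
    }
    return $processed;
}

// In the web request: enqueue and respond without waiting for the emails.
$queue = new InMemoryQueue();
$queue->enqueue('SendWelcomeEmailJob', ['email' => 'alice@example.com']);
$queue->enqueue('SendWelcomeEmailJob', ['email' => 'bob@example.com']);

// Later, in a background worker process:
$processed = run_worker($queue);
```

Serializing jobs as a class name plus plain arguments is what lets the producer and the worker be separate processes, or separate machines.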

Resque

https://github.com/chrisboulton/php-resque
https://github.com/kamisama/php-resque-ex/
https://github.com/kamisama/ResqueBoard/

• Any process that is slow and not important for the HTTP response should be queued

• Sending notifications + posting to social accounts

• Analytics + Instrumentation
• Updating profiles and discovering friends from social accounts

• Consuming web services like Twitter Streaming API

http://kamisama.me/2012/10/09/background-jobs-with-php-and-resque-part-1introduction/

http://kamisama.me/2012/10/09/background-jobs-with-php-and-resque-part-2-queuesystem/
http://kamisama.me/2012/10/09/background-jobs-with-php-and-resque-part-3installation/

http://kamisama.me/2012/10/12/background-jobs-with-php-and-resque-part-4managing-worker/
http://kamisama.me/2012/10/13/background-jobs-with-php-and-resque-part-5-creatingjobs/

http://kamisama.me/2012/10/22/background-jobs-with-php-and-resque-part-7manage-workers-with-fresque/
http://kamisama.me/2012/10/22/background-jobs-with-php-and-resque-part-8-aglance-at-php-resque-ex/
http://resqueboard.kamisama.me/

Leverage HTTP caching

HTTP caching is one of the most misunderstood technologies on the Internet. Go read the
HTTP caching specification. Don’t worry, I’ll wait. Seriously, go do it! Its authors solved
these caching design problems over a decade ago. It boils down to expiration and validation,
and when used properly it can save your app servers a lot of load. Please read the excellent
HTTP caching guide from Mark Nottingham. I highly recommend using Varnish as a reverse proxy
cache to alleviate load on your app servers.
http://tomayko.com/writings/things-caches-do

There are different kinds of HTTP caches that are useful for different kinds of things. I want
to talk about gateway caches -- or, "reverse proxy caches" -- and consider their effects on
modern, dynamic web application design.
Draw an imaginary vertical line, situated between Alice and Cache, from the very top of the
diagram to the very bottom. That line is your public, internet facing interface. In other words,
everything from Cache back is "your site" as far as Alice is concerned.
Alice is actually Alice's web browser, or perhaps some other kind of HTTP user-agent. There's
also Bob and Carol. Gateway caches are primarily interesting when you consider their effects
across multiple clients.
Cache is an HTTP gateway cache, like Varnish, Squid in reverse proxy mode, Django's cache
framework, or my personal favorite: rack-cache. In theory, this could also be a CDN, like
Akamai.
And that brings us to Backend, a dynamic web application built with only the most modern
and sophisticated web framework. Interpreted language, convenient routing, an ORM, slick
template language, and various other crap -- all adding up to amazing developer

Expiration or Validation


Expiration

Most people understand the expiration model well enough. You specify how long a response
should be considered "fresh" by including either or both of the Cache-Control: max-age=N or
Expires headers. Caches that understand expiration will not make the same request until the
cached version reaches its expiration time and becomes "stale".
A gateway cache dramatically increases the benefits of providing expiration information in
dynamically generated responses. To illustrate, let's suppose Alice requests a welcome page:

Since the cache has no previous knowledge of the welcome page, it forwards the request to
the backend. The backend generates the response, including a Cache-Control header that
indicates the response should be considered fresh for ten minutes. The cache then shoots the
response back to Alice while storing a copy for itself.
Thirty seconds later, Bob comes along and requests the same welcome page:

The cache recognizes the request, pulls up the stored response, sees that it's still fresh, and
sends the cached response back to Bob, ignoring the backend entirely.
Note that we've experienced no significant bandwidth savings here -- the entire response was
delivered to both Alice and Bob. We see savings in CPU usage, database round trips, and the
various other resources required to generate the response at the backend.
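On the PHP side, opting into the expiration model is just a matter of sending the headers described above. A minimal sketch: expiration_headers() is a hypothetical helper, 600 seconds mirrors the ten-minute example, and "public" is added so that shared caches such as Varnish may store the response.

```php
<?php
// Sketch: expiration headers for a dynamically generated page.

function expiration_headers($maxAge, $now)
{
    return [
        'Cache-Control' => 'public, max-age=' . (int) $maxAge,
        // Expires is the HTTP/1.0 equivalent; HTTP dates are always in GMT.
        'Expires'       => gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
    ];
}

if (PHP_SAPI !== 'cli') {
    // Headers must be sent before any output.
    foreach (expiration_headers(600, time()) as $name => $value) {
        header("$name: $value");
    }
    echo 'Welcome!';
}
```

Building the headers as an array first keeps the freshness policy in one testable place; HTTP/1.1 caches give Cache-Control: max-age precedence over Expires when both are present.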

Validation

Expiration is ideal when you can get away with it. Unfortunately, there are many situations
where it doesn't make sense, and this is especially true for heavily dynamic web apps where
changes in resource state can occur frequently and unpredictably. The validation model is
designed to support these cases.
Again, we'll suppose Alice makes the initial request for the welcome page:

The Last-Modified and ETag header values are called "cache validators" because they can be
used by the cache on subsequent requests to validate the freshness of the stored response
without requiring the backend to generate or transmit the response body. You don't need
both validators -- either one will do, though both have pros and cons, the details of which are
outside the scope of this document.
So Bob comes along at some point after Alice and requests the welcome page:

The cache sees that it has a copy of the welcome page but can't be sure of its freshness so it
needs to pass the request to the backend. But, before doing so, the cache adds the
If-Modified-Since and If-None-Match headers to the request, setting them to the original
response's Last-Modified and ETag values, respectively. These headers make the request
conditional. Once the backend receives the request, it generates the current cache validators,
checks them against the values provided in the request, and immediately shoots back a 304
Not Modified response without generating the response body. The cache, having validated the
freshness of its copy, is now free to respond to Bob.
This requires a round-trip with the backend, but if the backend generates cache validators up
front and in an efficient manner, it can avoid generating the response body. This can be
extremely significant. A backend that takes advantage of validation need not generate the
same response twice.
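The key to cheap validation is ordering: compute the validators first, compare them with the conditional request headers, and render the body only when the client's copy is out of date. A minimal sketch of that backend logic follows. conditional_response() is a hypothetical helper; the ETag and timestamp inputs are placeholders for values you might derive from a row's version and updated_at columns; weak ETags (W/"...") are not handled.

```php
<?php
// Sketch: the backend half of validation. $render (the expensive part) only
// runs when the conditional request headers do not match.

function conditional_response(array $requestHeaders, $etag, $lastModified, callable $render)
{
    $validators = [
        'ETag'          => $etag,
        'Last-Modified' => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
    ];

    if (isset($requestHeaders['If-None-Match'])) {
        // If-None-Match wins when both headers are present.
        $tags = array_map('trim', explode(',', $requestHeaders['If-None-Match']));
        $notModified = in_array('*', $tags, true) || in_array($etag, $tags, true);
    } elseif (isset($requestHeaders['If-Modified-Since'])) {
        $since = strtotime($requestHeaders['If-Modified-Since']);
        $notModified = $since !== false && $lastModified <= $since;
    } else {
        $notModified = false;
    }

    if ($notModified) {
        return [304, $validators, '']; // no body is generated
    }
    return [200, $validators, $render()];
}

$renders = 0;
$render = function () use (&$renders) {
    $renders++; // stands in for templates, ORM queries, ...
    return '<h1>Welcome</h1>';
};
$etag = '"welcome-v7"';  // hypothetical content version
$updatedAt = 1378000000; // hypothetical last-modified timestamp

// Alice: no validators yet, so the page is rendered.
list($aliceStatus, $aliceHeaders) = conditional_response([], $etag, $updatedAt, $render);

// Bob's cache revalidates using the ETag it stored from Alice's response.
list($bobStatus) = conditional_response(
    ['If-None-Match' => $aliceHeaders['ETag']], $etag, $updatedAt, $render
);
```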

Expiration and Validation

The expiration and validation models form the basic foundation of HTTP caching. A response
may include expiration information, validation information, both, or neither. So far we've seen
what each looks like independently. It's also worth looking at how things work when they're
combined.
Suppose, again, that Alice makes the initial request:

The backend specifies that the response should be considered fresh for sixty seconds and
also includes the Last-Modified cache validator.
Bob comes along thirty seconds later. Since the response is still fresh, validation is not
required; he's served directly from cache:

But then Carol makes the same request, thirty seconds after Bob:

The cache relies on expiration if at all possible before falling back on validation. Note also
that the 304 Not Modified response includes updated expiration information, so the cache
knows that it has another sixty seconds before it needs to perform another validation request.
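The cache side of this combined model can be replayed in a few lines. The sketch below is a hypothetical, deliberately tiny gateway cache: one URL, an ETag validator only, no Vary, and no Cache-Control directives beyond a max-age value. Time is passed in explicitly so the Alice (t=0), Bob (t=30), and Carol (t=60) sequence is reproducible. It is not how Varnish or Squid are implemented.

```php
<?php
// Sketch: a gateway cache that prefers expiration and falls back on validation.

class GatewayCache
{
    private $backend;
    private $stored = null; // ['body' => ..., 'etag' => ..., 'expiresAt' => ...]
    public $backendCalls = 0;

    public function __construct(callable $backend)
    {
        $this->backend = $backend;
    }

    public function get($now)
    {
        // 1. Fresh copy: serve it without touching the backend (expiration).
        if ($this->stored !== null && $now < $this->stored['expiresAt']) {
            return $this->stored['body'];
        }

        // 2. Stale or missing: ask the backend, conditionally if we hold a validator.
        $this->backendCalls++;
        $etag = $this->stored !== null ? $this->stored['etag'] : null;
        $response = call_user_func($this->backend, $etag);

        if ($response['status'] === 304) {
            // Validated: keep the stored body, adopt the renewed expiration time.
            $this->stored['expiresAt'] = $now + $response['maxAge'];
        } else {
            $this->stored = [
                'body'      => $response['body'],
                'etag'      => $response['etag'],
                'expiresAt' => $now + $response['maxAge'],
            ];
        }
        return $this->stored['body'];
    }
}

// Backend: fresh for 60 seconds, validated with a fixed ETag.
$bodiesGenerated = 0;
$backend = function ($ifNoneMatch) use (&$bodiesGenerated) {
    $etag = '"welcome-v1"';
    if ($ifNoneMatch === $etag) {
        return ['status' => 304, 'maxAge' => 60];
    }
    $bodiesGenerated++;
    return ['status' => 200, 'etag' => $etag, 'maxAge' => 60, 'body' => 'Welcome!'];
};

$cache = new GatewayCache($backend);
$cache->get(0);  // Alice: miss, the backend renders the page
$cache->get(30); // Bob: still fresh, served from cache
$cache->get(60); // Carol: stale, revalidated with a 304
$cache->get(90); // within the renewed sixty seconds: served from cache
```

After these four requests the backend has been called twice but has rendered the body only once, which is exactly the saving the combined model promises.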

• Varnish
• Squid
• Nginx Proxy Cache
• Apache Proxy Cache

The basic mechanisms shown here form the conceptual foundation of caching in HTTP -- not
to mention the Cache architectural constraint as defined by REST. There's more to it, of
course: a cache's behavior can be further constrained with additional Cache-Control directives,
and the Vary header narrows a response's cache suitability based on headers of subsequent
requests. For a more thorough look at HTTP caching, I suggest Mark Nottingham's excellent
Caching Tutorial for Web Authors and Webmasters. Paul James's HTTP Caching is also quite
good and a bit shorter. And, of course, the relevant sections of RFC 2616 are highly
recommended.
http://www.mnot.net/cache_docs/

Use Varnish as a reverse proxy cache to alleviate load on your app servers.

Optimize your framework!

Deep diving into the specifics of optimizing each framework is outside the scope of this
post, but these principles apply to every framework:

• Stay up-to-date with the latest stable version of your favorite framework

• Disable features you are not using (I18N, Security, etc)

• Always use a data cache like Memcached/Redis

• Enable caching features for view and database result set caching

• Always use an HTTP cache like Varnish

Taking a large problem and making it into manageable smaller problems

In short, sharding means splitting the dataset, but a good sharding function makes sure that related information is located on the same
server (strong locality), since that drastically simplifies things. A bad sharding function (splitting by type) creates very weak locality, which
will drastically hurt system performance down the road.
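A sharding function with strong locality routes every row that belongs to a user (profile, orders, friends) by the same key, the user ID. A minimal sketch: the DSNs are hypothetical, shard_for_user() and dsn_for_user() are illustrative names, and a 64-bit PHP build is assumed so that crc32() never returns a negative value.

```php
<?php
// Sketch: route all of a user's data by user ID so it lives on one shard.

function shard_for_user($userId, $shardCount)
{
    // crc32() is stable across requests and servers, so routing is repeatable.
    return crc32((string) $userId) % $shardCount;
}

function dsn_for_user($userId, array $shards)
{
    return $shards[shard_for_user($userId, count($shards))];
}

$shards = [
    'mysql:host=db0.example.com;dbname=app',
    'mysql:host=db1.example.com;dbname=app',
    'mysql:host=db2.example.com;dbname=app',
    'mysql:host=db3.example.com;dbname=app',
];

// Good: user 42's profile and orders are both looked up by user ID, so they
// resolve to the same server and can still be joined there.
$profileDsn = dsn_for_user(42, $shards);
$ordersDsn  = dsn_for_user(42, $shards);

// Bad (weak locality): splitting by type puts one user's profile and orders on
// different servers, so "user 42 with their orders" needs two round trips.
$byType = ['profiles' => $shards[0], 'orders' => $shards[1]];
```

Plain modulo hashing remaps most users when a shard is added; consistent hashing or a lookup table from user to shard avoids that, at the cost of extra machinery.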

Service Oriented Architecture
Java/Scala/Erlang/Go/Node backend
PHP or pure JavaScript frontend

Companies of great scale move away from PHP or create their own variant

Facebook & HipHop

HipHop 1.0 - HPHPc - Transformed subset of PHP code to C++ for performance
HipHop 2.0 - HHVM - Virtual Machine, Runtime, and JIT for PHP
https://github.com/facebook/hiphop-php
https://github.com/facebook/hiphop-php/wiki

Learn how to profile code for PHP performance

Xdebug is a PHP extension for powerful debugging. It supports stack and function traces,
profiling information, memory allocation, and script execution analysis. It allows
developers to easily profile PHP code.
WebGrind is an Xdebug profiling web frontend in PHP5. It implements a subset of the features
of kcachegrind, installs in seconds, and works on all platforms. For quick-and-dirty
optimizations it does the job. Here’s a screenshot showing the output from profiling.
XHProf is a function-level hierarchical profiler for PHP with a reporting and UI layer. XHProf is
capable of reporting function-level inclusive and exclusive wall times, memory usage, CPU
times, and number of calls for each function. Additionally, it supports the ability to compare
two runs (hierarchical DIFF reports) or aggregate results from multiple runs.
XHProf
XHGui
AppDynamics is application performance management software designed to help dev and ops
troubleshoot problems in complex production apps.

http://xdebug.org/
https://github.com/jokkedk/webgrind
http://valgrind.org/info/tools.html#callgrind
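A quick way to start is to wrap a suspect code path. This minimal sketch uses XHProf's xhprof_enable() and xhprof_disable() when that extension is loaded, and otherwise falls back to a coarse wall-clock and peak-memory measurement. profile() is a hypothetical helper; the fallback is no substitute for a real function-level profiler such as XHProf or Xdebug.

```php
<?php
// Sketch: profile one callable with XHProf if available, else time it coarsely.

function profile(callable $fn)
{
    if (extension_loaded('xhprof')) {
        xhprof_enable(XHPROF_FLAGS_CPU + XHPROF_FLAGS_MEMORY);
        $result = $fn();
        // Raw per-function data (calls, wall time, CPU, memory) for XHGui or similar.
        return ['result' => $result, 'xhprof' => xhprof_disable()];
    }

    $start = microtime(true);
    $result = $fn();
    return [
        'result'      => $result,
        'wall_ms'     => (microtime(true) - $start) * 1000,
        'peak_memory' => memory_get_peak_usage(true), // bytes, whole script so far
    ];
}

$report = profile(function () {
    $sum = 0;
    for ($i = 0; $i < 100000; $i++) {
        $sum += $i;
    }
    return $sum;
});
```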

Don’t forget to optimize the client side

PHP application performance is only part of the battle
Now that you have optimized the server-side, you can spend time improving the client side!
In modern web applications most of the end-user experience time is spent waiting on the
client side to render. Google has dedicated many resources to helping developers improve
client side performance.

In modern web applications most of the latency comes from the client side


Google PageSpeed

https://developers.google.com/speed/docs/best-practices/rules_intro
Google PageSpeed Insights – PageSpeed Insights analyzes the content of a web page, then
generates suggestions to make that page faster. Reducing page load times can reduce bounce
rates and increase conversion rates.
Google ngx_pagespeed – ngx_pagespeed speeds up your site and reduces page load time.
This open-source nginx server module automatically applies web performance best practices
to pages and their associated assets (CSS, JavaScript, images) without requiring that you
modify your existing content or workflow.
Google mod_pagespeed – mod_pagespeed speeds up your site and reduces page load time.
This open-source Apache HTTP server module automatically applies web performance best
practices to pages and their associated assets (CSS, JavaScript, images) without requiring
that you modify your existing content or workflow.

Google PageSpeed Insights

https://developers.google.com/speed/pagespeed/


Google PageSpeed API

Available as a service
Extensions/Modules for Apache/Nginx to automatically improve end user experience times

curl "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?url=http://dustinwhittle.com/&key=xxx"

https://developers.google.com/speed/docs/insights/v1/getting_started#familiarize
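The same request can be made from PHP. In this minimal sketch, http_build_query() URL-encodes the target page, which matters once that page URL carries its own query string. The v1 endpoint is the one from the curl example, the key is a placeholder, and pagespeed_url() and run_pagespeed() are hypothetical helpers; the response is decoded generically rather than assuming particular fields.

```php
<?php
// Sketch: build and call the PageSpeed API URL from the curl example.

function pagespeed_url($pageUrl, $apiKey)
{
    return 'https://www.googleapis.com/pagespeedonline/v1/runPagespeed?'
        . http_build_query(['url' => $pageUrl, 'key' => $apiKey]);
}

// Performs a network request; returns the decoded JSON or null on failure.
function run_pagespeed($pageUrl, $apiKey)
{
    $json = @file_get_contents(pagespeed_url($pageUrl, $apiKey));
    return $json === false ? null : json_decode($json, true);
}

echo pagespeed_url('http://dustinwhittle.com/', 'xxx'), PHP_EOL;
```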

Scalability is about the entire architecture, not some minor code optimizations.

Use the right tool for the right job!

Find these slides on SpeakerDeck
https://speakerdeck.com/dustinwhittle
