Caching has been a 'hot' topic for a few years. But caching takes more than merely taking data and putting it in a cache : the right caching techniques can improve performance and reduce load significantly. But we'll also look at some major pitfalls, showing that caching the wrong way can bring down your site. If you're looking for a clear explanation about various caching techniques and tools like Memcached, Nginx and Varnish, as well as ways to deploy them in an efficient way, this talk is for you.
12. Who am I ?
Wim Godden (@wimgtr)
Founder of Cu.be Solutions (http://cu.be)
Open Source developer since 1997
Developer of OpenX, PHPCompatibility, Nginx SLIC, ...
Speaker at PHP and Open Source conferences
13. Who are you ?
Developers ?
System/network engineers ?
Managers ?
Caching experience ?
14. Goals of this talk
Everything about caching and tuning
A few techniques
How-to
How-NOT-to
Lots of ways, this is just one ;-)
→ Increase reliability, performance and scalability
5 visitors/day → 500.000 visitors/day
(Don't expect a miracle cure !)
16. Test page
3 DB-queries
select firstname, lastname, email from user where user_id = 5;
select title, createddate, body from article order by createddate desc limit 5;
select title, createddate, body from article order by score desc limit 5;
Page just outputs result
17. Our base benchmark
Apachebench = useful enough
Result ?
               Single webserver      Proxy
               Static    PHP         Static    PHP
Apache + PHP   3900      17.5        6700      17.5
Limit (static) : CPU, network or disk
Limit (PHP) : database
20. What is caching ?
x = 5, y = 2
n = 50 → Same result
CACHE
select * from article join user on article.user_id = user.id order by created desc limit 10
Doesn't change all the time
21. Caching goals
Source of information :
Reduce # of requests
Reduce the load
Latency :
Reduce for visitor
Reduce for webserver load
Network :
Send less data to visitor
Hey, that's frontend !
22. Theory of caching
GET /page → Page
Page → Cache : get('key')
Cache → Page : $data = false
if ($data == false)
Page → DB : select data from table
$data = returned result
Page → Cache : set('key', $data)
24. Caching techniques
#1 : Store entire pages
#2 : Store part of a page (block)
#3 : Store data retrieval (SQL ?)
#4 : Store complex processing result
#? : Your call !
When you have data, think :
Creating time ?
Modification frequency ?
Retrieval frequency ?
25. How to find cacheable data
New projects : start from 'cache everything'
Existing projects :
Check page loading times
Look at MySQL/PgSQL/Oracle/... slow query log
Make a complete query log (don't forget to turn it off !)
→ Use Percona Toolkit (pt-query-digest)
26. Caching storage - Disk
Data with few updates : good
Caching SQL queries : preferably not
DON'T use NFS
high latency
possible problem for sessions : locking issues !
27. Caching storage - Disk / ramdisk
Local
5 Webservers → 5 local caches
How will you keep them synchronized ?
→ Don't say NFS or rsync !
29. Caching storage - Memcache(d)
Facebook, Twitter, YouTube, … → need we say more ?
Distributed memory caching system
Multiple machines ↔ 1 big memory-based hash-table
Key-value storage system
Keys - max. 250bytes
Values - max. 1Mbyte
Extremely fast... non-blocking, UDP (!)
32. Memcache - installation & running it
Installation
Distribution package
PECL
Windows : binaries
Running
No config-files
memcached -d -m <mem> -l <ip> -p <port>
ex. : memcached -d -m 2048 -l 172.16.1.91 -p 11211
33. Caching storage - Memcache - some notes
Not fault-tolerant
It's a cache !
Lose session data
Lose shopping cart data
…
Firewall your Memcache port !
34. Memcache in code
<?php
$memcache = new Memcache();
$memcache->addServer('172.16.0.1', 11211);
$memcache->addServer('172.16.0.2', 11211);
$myData = $memcache->get('myKey');
if ($myData === false) {
    $myData = GetMyDataFromDB();
    // Put it in Memcache as 'myKey', without compression, with no expiration
    $memcache->set('myKey', $myData, false, 0);
}
echo $myData;
35. Memcached in code
<?php
$memcache = new Memcached();
$memcache->addServer('172.16.0.1', 11211);
$memcache->addServer('172.16.0.2', 11211);
$myData = $memcache->get('myKey');
if ($memcache->getResultCode() === Memcached::RES_NOTFOUND) {
    $myData = GetMyDataFromDB();
    // Put it in Memcached as 'myKey', with no expiration
    $memcache->set('myKey', $myData, 0);
}
echo $myData;
36. Where's the data ?
Memcache client decides (!)
2 hashing algorithms :
Traditional
Server failure → all data must be rehashed
Consistent
Server failure → 1/x of data must be rehashed (x = # of servers)
No replication !
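The difference between the two hashing strategies can be made concrete with a small experiment. The following is a minimal sketch, not the algorithm any particular Memcache client uses: the helper names (hashPoint, moduloServer, buildRing, ringServer), the md5-based hash, the 100 ring points per server and the two extra server addresses are all assumptions made for illustration. It counts how many of 1000 keys change server when one of four servers disappears.

```php
<?php
// Hash a string to an integer in [0, 2^32) using the first 8 hex digits of md5().
// Assumes 64-bit PHP so the value fits in an int.
function hashPoint(string $s): int
{
    return (int) hexdec(substr(md5($s), 0, 8));
}

// Traditional hashing: server index = hash modulo number of servers.
function moduloServer(string $key, array $servers): string
{
    return $servers[hashPoint($key) % count($servers)];
}

// Consistent hashing, step 1: place several points per server on a ring,
// sorted by position.
function buildRing(array $servers, int $replicas = 100): array
{
    $ring = [];
    foreach ($servers as $server) {
        for ($i = 0; $i < $replicas; $i++) {
            $ring[hashPoint("$server#$i")] = $server;
        }
    }
    ksort($ring);
    return $ring;
}

// Consistent hashing, step 2: a key belongs to the first ring point at or
// after its own hash, wrapping around to the first point if there is none.
function ringServer(string $key, array $ring): string
{
    $h = hashPoint($key);
    foreach ($ring as $point => $server) {
        if ($point >= $h) {
            return $server;
        }
    }
    return reset($ring);
}

// Hypothetical server list; the last one "fails".
$servers   = ['172.16.0.1', '172.16.0.2', '172.16.0.3', '172.16.0.4'];
$remaining = array_slice($servers, 0, 3);
$ring4 = buildRing($servers);
$ring3 = buildRing($remaining);

$movedModulo = 0;
$movedRing = 0;
for ($i = 0; $i < 1000; $i++) {
    $key = "article_$i";
    if (moduloServer($key, $servers) !== moduloServer($key, $remaining)) {
        $movedModulo++;
    }
    if (ringServer($key, $ring4) !== ringServer($key, $ring3)) {
        $movedRing++;
    }
}
echo "Keys moved with traditional hashing : $movedModulo\n";
echo "Keys moved with consistent hashing  : $movedRing\n";
```

With the ring, only keys that lived on the failed server get a new home. With modulo hashing most keys move, so most of the cache is effectively lost at once.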
37. Benchmark with Memcache
                    Single webserver      Proxy
                    Static    PHP         Static    PHP
Apache + PHP        3900      17.5        6700      17.5
Apache + PHP + MC   3900      55          6700      108
38. Memcache slabs
(or why Memcache says it's full when it's not)
Multiple slabs of different sizes :
Slab 1 : 40 bytes
Slab 2 : 50 bytes (40 * 1.25)
Slab 3 : 63 bytes (50 * 1.25, rounded up) (and so on...)
Multiplier (1.25 by default) can be configured
Store a lot of objects of different sizes
→ Certain slabs : full
→ Other slabs : Mostly empty
→ Eviction of data !
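The slab sizes above follow a simple growth rule, which can be sketched in a few lines. This is a simplified model that only reproduces the numbers on the slide: real Memcached chunk sizes also include item overhead and alignment, and the function names slabSizes and slabFor are made up for this example.

```php
<?php
// Compute the first $count slab sizes: each one is the previous size
// multiplied by the growth factor, rounded up to a whole byte.
function slabSizes(int $smallest, float $factor, int $count): array
{
    $sizes = [$smallest];
    while (count($sizes) < $count) {
        $sizes[] = (int) ceil(end($sizes) * $factor);
    }
    return $sizes;
}

// An item goes into the smallest slab that can hold it.
// Returns null when the item is larger than every slab in the list.
function slabFor(int $itemSize, array $sizes): ?int
{
    foreach ($sizes as $size) {
        if ($itemSize <= $size) {
            return $size;
        }
    }
    return null;
}

$sizes = slabSizes(40, 1.25, 4);
echo implode(', ', $sizes), "\n";   // 40, 50, 63, 79
echo slabFor(51, $sizes), "\n";     // 63
```

A 51-byte item lands in the 63-byte slab and wastes 12 bytes. If most of your objects fall into one or two size classes, those slabs fill up and start evicting while the others stay mostly empty.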
39. Memcache - Is it working ?
Connect to it using telnet
"stats" command →
Use Cacti or other monitoring tools
STAT pid 2941
STAT uptime 10878
STAT time 1296074240
STAT version 1.4.5
STAT pointer_size 64
STAT rusage_user 20.089945
STAT rusage_system 58.499106
STAT curr_connections 16
STAT total_connections 276950
STAT connection_structures 96
STAT cmd_get 276931
STAT cmd_set 584148
STAT cmd_flush 0
STAT get_hits 211106
STAT get_misses 65825
STAT delete_misses 101
STAT delete_hits 276829
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 613193860
STAT bytes_written 553991373
STAT limit_maxbytes 268435456
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT conn_yields 0
STAT bytes 20418140
STAT curr_items 65826
STAT total_items 553856
STAT evictions 0
STAT reclaimed 0
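Two numbers from this output are worth watching: the hit ratio (get_hits against get_misses) and evictions. The sketch below parses the text of a "stats" reply and computes the ratio. It works on a pasted string, so it makes no assumption about how you fetch the stats; parseStats and hitRatio are hypothetical helper names.

```php
<?php
// Turn "STAT name value" lines into an associative array.
function parseStats(string $raw): array
{
    $stats = [];
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        if (preg_match('/^STAT (\S+) (\S+)$/', trim($line), $m)) {
            $stats[$m[1]] = $m[2];
        }
    }
    return $stats;
}

// Fraction of get requests that were answered from the cache.
function hitRatio(array $stats): float
{
    $hits = (int) $stats['get_hits'];
    $misses = (int) $stats['get_misses'];
    $total = $hits + $misses;
    return $total > 0 ? $hits / $total : 0.0;
}

// Values copied from the stats output above.
$raw = <<<TXT
STAT get_hits 211106
STAT get_misses 65825
STAT evictions 0
TXT;

$stats = parseStats($raw);
printf("hit ratio: %.1f%%, evictions: %d\n", hitRatio($stats) * 100, $stats['evictions']);
```

For the figures above this gives a hit ratio of roughly 76% with no evictions. A dropping hit ratio or a rising eviction count is a sign that the cache is too small or badly used.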
41. Memcache - tip
Page with multiple blocks ?
→ use Memcached::getMulti()
getMulti($array) → hashing algorithm
But : what if you get some hits and some misses ?
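One way to deal with a mix of hits and misses: Memcached::getMulti() returns only the keys it found, so the missing ones can be worked out by comparing against the requested list. The sketch below shows that step in isolation, on plain arrays, so it runs without a Memcached server. completeBlocks and the $loaders map are hypothetical names for this example.

```php
<?php
// $keys    : all block keys the page needs
// $found   : what getMulti($keys) returned (hits only)
// $loaders : hypothetical map of key => callable that rebuilds that block
// Returns [all blocks, blocks that had to be rebuilt].
function completeBlocks(array $keys, array $found, array $loaders): array
{
    $missing = array_diff($keys, array_keys($found));
    $rebuilt = [];
    foreach ($missing as $key) {
        $rebuilt[$key] = $loaders[$key]();
    }
    return [$found + $rebuilt, $rebuilt];
}

$keys  = ['block_header', 'block_news', 'block_nav'];
$found = ['block_header' => '<header>', 'block_nav' => '<nav>'];  // simulated hits
$loaders = [
    'block_header' => function () { return '<header>'; },
    'block_news'   => function () { return '<news>'; },
    'block_nav'    => function () { return '<nav>'; },
];

list($blocks, $rebuilt) = completeBlocks($keys, $found, $loaders);
echo 'Rebuilt: ', implode(', ', array_keys($rebuilt)), "\n";
```

In real code the $rebuilt array would then be written back in one call, for example with Memcached::setMulti(), so the next request gets a full set of hits.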
42. Naming your keys
Key names must be unique
Prefix / namespace your keys !
Only letters, numbers and underscore
md5() is useful
→ BUT : harder to debug
Use clear names
Document your key names !
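These naming rules are easy to enforce in one helper. The following is a minimal sketch under a few assumptions: the function name cacheKey, the project prefix 'shop' and the '_md5_' marker are invented for illustration, and the 250-byte limit is the key size mentioned earlier.

```php
<?php
// Build a namespaced key that contains only letters, numbers and underscore.
// Keys that would exceed 250 bytes fall back to an md5() of the original name.
function cacheKey(string $project, string $name): string
{
    $clean = preg_replace('/[^A-Za-z0-9_]/', '_', $name);
    $key = $project . '_' . $clean;
    if (strlen($key) > 250) {
        // Harder to debug, so only used when the readable form is too long
        $key = $project . '_md5_' . md5($name);
    }
    return $key;
}

echo cacheKey('shop', 'article:42/latest'), "\n";   // shop_article_42_latest
```

Keeping the readable form whenever it fits means most keys stay easy to spot when debugging, and md5() is only a fallback.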
53. Cache warmup scripts
Used to fill your cache when it's empty
Run it before starting Webserver !
2 ways :
Visit all URLs
Error-prone
Hard to maintain
Call all cache-updating methods
Make sure you have a warmup script !
54. Cache stampeding - what about locking ?
Seems like a nice idea, but...
While lock in place
What if the process that created the lock fails ?
55. So...
DON'T DELETE FROM CACHE
&
DON'T EXPIRE FROM CACHE
(unless you know you'll never store it again)
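In code, "don't delete" means that whoever changes the data also pushes the fresh value into the cache, so readers never hit a miss. A minimal sketch, using plain arrays as stand-ins for the database and the cache so it runs on its own (updateArticle and the 'article_5' key are hypothetical):

```php
<?php
// In-memory stand-ins for the real database and cache (assumptions for this sketch).
$db    = ['article_5' => 'old title'];
$cache = ['article_5' => 'old title'];

// Update the source first, then overwrite the cached copy.
// There is no delete, so there is no window in which readers get a miss
// and all rush to the database at the same time.
function updateArticle(array &$db, array &$cache, string $key, string $title): void
{
    $db[$key] = $title;      // 1. write to the source of information
    $cache[$key] = $title;   // 2. push the fresh value into the cache
}

updateArticle($db, $cache, 'article_5', 'new title');
echo $cache['article_5'], "\n";
```

With a real cache the second step would be a set() call with the same key, replacing the old value in place.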
56. Quick-tip
Start small → disk or APC
Move to Memcached/Redis/... later
But : is your code ready ?
→ Use a component like Zend_Cache to switch easily !
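What "ready" means can be sketched with a tiny abstraction layer of your own. This is not the Zend_Cache API: the SimpleCache interface, the two backends and latestTitles() are hypothetical names, kept minimal on purpose. The point is that calling code depends only on the interface, so the backend can be swapped later.

```php
<?php
// Minimal cache contract: get() returns false on a miss.
// Limitation of this sketch: a stored value of false looks like a miss.
interface SimpleCache
{
    public function get(string $key);
    public function set(string $key, $value): void;
}

// Per-request, in-memory backend (useful for tests).
class ArrayCache implements SimpleCache
{
    private $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Disk backend: one serialized file per key.
class FileCache implements SimpleCache
{
    private $dir;

    public function __construct(string $dir)
    {
        $this->dir = $dir;
    }

    private function path(string $key): string
    {
        return $this->dir . '/' . md5($key) . '.cache';
    }

    public function get(string $key)
    {
        $path = $this->path($key);
        return is_file($path) ? unserialize(file_get_contents($path)) : false;
    }

    public function set(string $key, $value): void
    {
        file_put_contents($this->path($key), serialize($value), LOCK_EX);
    }
}

// Application code only knows about SimpleCache.
function latestTitles(SimpleCache $cache): array
{
    $titles = $cache->get('latest_titles');
    if ($titles === false) {
        $titles = ['First', 'Second'];   // placeholder for the real DB query
        $cache->set('latest_titles', $titles);
    }
    return $titles;
}

print_r(latestTitles(new FileCache(sys_get_temp_dir())));
```

Moving to Memcached later would then mean writing one more class that implements the same two methods, without touching latestTitles().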
72. Varnish - VCL
Varnish Configuration Language
DSL (Domain Specific Language)
→ compiled to C
Hooks into each request
Defines :
Backends (web servers)
ACLs
Load balancing strategy
Can be reloaded while running
73. Varnish - whatever you want
Real-time statistics (varnishtop, varnishhist, ...)
ESI
76. Website X with ESI
Top header
(TTL = 2h)
Latest news (TTL = 2m)
Article content page
Page content (TTL = 30m)
Navigation
(TTL = 1h)
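On the backend side, a page like this can be a skeleton that contains ESI include tags, with each block served from its own URL and its own TTL. The sketch below assumes hypothetical block URLs (/blocks/...) and helper names, assumes ESI processing has been enabled for the page in VCL, and assumes the TTL is passed to Varnish through a Cache-Control: s-maxage response header.

```php
<?php
// Build an ESI include tag for a block URL.
function esiInclude(string $src): string
{
    return '<esi:include src="' . htmlspecialchars($src, ENT_QUOTES) . '"/>';
}

// The page skeleton: blocks are included by reference, the content is inline.
function articlePage(string $content): string
{
    return esiInclude('/blocks/header') . "\n"
        . esiInclude('/blocks/latest-news') . "\n"
        . esiInclude('/blocks/navigation') . "\n"
        . '<div id="content">' . $content . '</div>' . "\n";
}

// Header line a block (or the page) would send to set its own TTL.
function cacheHeader(int $seconds): string
{
    return 'Cache-Control: s-maxage=' . $seconds;
}

// TTLs from the slide, in seconds (URLs are hypothetical).
$ttls = [
    '/blocks/header'      => 7200,  // 2h
    '/blocks/latest-news' => 120,   // 2m
    '/blocks/navigation'  => 3600,  // 1h
    '/article'            => 1800,  // page content, 30m
];

echo articlePage('Article text'), "\n";
foreach ($ttls as $url => $ttl) {
    echo $url, ' → ', cacheHeader($ttl), "\n";
}
```

Each block script would call header() with its own line, so the news block can be refreshed every 2 minutes without regenerating the header or the navigation.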
79. Varnish - what can/can't be cached ?
Can :
Static pages
Images, js, css
Pages or parts of pages that don't change often (ESI)
Can't :
POST requests
Very large files (it's not a file server !)
Requests with Set-Cookie
User-specific content
80. ESI → no caching on user-specific content ?
Logged in as : Wim Godden
5 messages
TTL = 0s ?
TTL=1h TTL = 5min
81. Coming soon...
Based on Nginx
Links Nginx directly with Memcached, Redis, …
Supports sessions !
Reduces number of GET requests (up to 100%)
Requires code changes !
Well-built project → few changes
Effect on webservers and database servers
83. Figures
Second customer (already using Nginx + Memcache) :
No. of web servers : 72 → 8
No. of db servers : 15 → 4
Total : 87 → 12 (86% reduction !)
Latest customer :
Total no. of servers : 1350 → 380
72% reduction → €1.5 million / year
vBulletin test project :
Load dropped by 98% on webservers and db-servers !
84. Availability
Old system :
Stable at 4 customers
Unavailable (copyright issue)
Total rebuild :
Under heavy development
Will become open source
Spare time project
Anyone feel like sponsoring ?
Beta : Oct 14
Final : Jan 15 (?) - on Github
86. Apache - tuning tips
Disable unused modules → fixes 10% of performance issues
Set AllowOverride to None
Disable SymLinksIfOwnerMatch
Site in /var/www/domain.com/subdomain/html
Check on /var, /var/www, /var/www/domain.com, etc.
MinSpareServers, MaxSpareServers, StartServers, MaxClients, MPM selection → a whole session of its own ;-)
Don't use mod_proxy → use Nginx or Varnish !
High load on an SSL-site ? → put SSL on a reverse proxy
87. PHP speed - some tips
Upgrade PHP - every minor release has 5-15% speed gain !
Use an opcode cache
Opcache (5.5 and above)
APC (5.4 and below)
Profile your code
XHProf
Xdebug
Zend Server Z-Ray
89. DB speed - some tips
Avoid dynamic functions
Example :
select col_x from table_y where date_column = CURDATE()
select col_x from table_y where date_column = "2014-10-03"
Use same types for joins
i.e. don't join decimal with int
Index, index, index !
→ But only on fields that are used in where, order by, group by !
RAND() is evil !
Select the right storage engine
Persistent connect is sort-of evil
91. Frontend tuning
1. You optimize backend
2. Frontend engineer messes up → havoc on backend
3. Don't forget : frontend sends requests to backend !
SO...
Care about frontend
Test frontend
Check what requests frontend sends to backend
97. Tuning frontend
Minimize requests
Combine CSS/JavaScript files
Use CSS Sprites (horizontally if possible)
Put CSS at top
Put JavaScript at bottom
Max. no. of connections
Especially if JavaScript does Ajax (advertising-scripts, …) !
Avoid iFrames
Again : max no. of connections
Don't scale images in HTML
Have a favicon.ico (don't 404 it !)
→ see my blog
98. What else can kill your site ?
Redirect loops
Multiple requests
More load on Webserver
More code to process
Additional latency for visitor
Try to avoid redirects anyway
Watch your logs, but equally important...
Watch the logging process →
Logging = disk I/O → can kill your server !
99. Above all else... be prepared !
Have a monitoring system
Use a cache abstraction layer (disk → Memcache)
Don't install for the worst → prepare for the worst
Have a test-setup
Have fallbacks
→ Turn off non-critical functionality
100. So...
Cache
But : never delete, always push !
Have a warmup script
Monitor your cache
Have an abstraction layer
Apache = fine, Nginx = better
Static pages ? Use Varnish
Tune your frontend → impact on backend !
103. Contact
Twitter @wimgtr
Web http://techblog.wimgodden.be
Slides http://www.slideshare.net/wimg
E-mail wim.godden@cu.be
Please provide feedback via :
http://joind.in/12120
Editor's notes
Caching serves 3 purposes :
- Firstly, to reduce the number of requests or the load at the source of information, which can be a database server, content repository, or anything else.
See slide
>> replication! <<
- Key names must be unique
- Prefix/namespace your keys !
→ might seem overkill at first, but it's usually necessary after a while, at least for large systems.
→ Oh, and don't share the same Memcache with multiple projects. Start separate instances for each !
- Be careful with characters. Use only letters, numbers and underscore !
- Sometimes MD5() is your friend
→ but : harder to debug
- Use clear names. Remember you can't make a list of data in the cache, so you'll need to document them. I know you don't like to write documentation, but you'll simply have to in this case.
OK, that sort of covers the basics of how we can use Memcache to cache data for your site. So purely in terms of caching in the code, we've done a lot.
→ There are still things that you can always add. If you're using Zend Framework or any other major framework, you can cache things like the initialization of the configuration file, creation of the route object (which is a very heavy process if you have a lot of routes).
→ Things like translation and locale can be cached in Zend Framework using 1 command, so do that !
→ But as I said before, the only limit is your imagination...
→ and your common sense !
→ Don't overdo it... make sure that the cache has enough space left for the things you really need to cache.
If you're starting a project where the number of hits to the site will be limited at first, but you have no idea how fast it will grow in the future :
- I would suggest starting with disk-based caching or APC variable caching
- You can always move to Memcache later when you deploy a second webserver
Keep in mind that your code needs to be ready for this. So you need to use some kind of cache abstraction layer like Zend_Cache
So, we're serving all those extensions directly from disk and forwarding all the rest to the Apache running on port 8080. We're also forwarding the Set-Cookie headers and adding a few headers so Apache can log the original IP if it wants to.
→ Something to keep in mind here : you will have 2 logfiles now : 1 from Nginx and 1 from Apache.
→ What you should notice once you start using this type of setup is that your performance from an end-user perspective will remain somewhat the same if your server was not overloaded yet. If it was having issues because of memory problems or too many Apache workers, ...
→ However, you will suddenly need a lot fewer Apache workers, which will save you quite a lot of memory. That memory can be used for... Memcache maybe ?
If one of the backend webservers goes down, you want all traffic to go to the other one, of course.
That's where health checks come in