Caching has been a ‘hot’ topic for a few years, but it takes more than merely putting data in a cache: the right caching techniques can improve performance and reduce load significantly. We’ll also look at some major pitfalls, showing that caching the wrong way can bring down your site. If you’re looking for a clear explanation of various caching techniques and tools like Memcached, Nginx and Varnish, as well as ways to deploy them efficiently, this talk is for you.
2. Who am I ?
Wim Godden (@wimgtr)
Founder of Cu.be Solutions (http://cu.be)
Open Source developer since 1997
Developer of OpenX, PHPCompatibility, ...
Speaker at PHP and Open Source conferences
3. Who are you ?
Developers ?
System/network engineers ?
Managers ?
Caching experience ?
4. Goals of this tutorial
Everything about caching and tuning
A few techniques
How-to
How-NOT-to
→ Increase reliability, performance and scalability
5 visitors/day → 5 million visitors/day
(Don't expect a miracle cure !)
7. Test page
3 DB-queries
select firstname, lastname, email from user where user_id = 5;
select title, createddate, body from article order by createddate desc limit 5;
select title, createddate, body from article order by score desc limit 5;
Page just outputs result
8. Our base benchmark
Apachebench = useful enough
Result ?
                 Single webserver      Proxy
                 Static     PHP        Static     PHP
Apache + PHP     3900       17.5       6700       17.5
Limit : CPU, network or disk
Limit : database
11. What is caching ?
x = 5, y = 2 → n = 50
Same result → CACHE
select * from article join user on article.user_id = user.id order by created desc limit 10
Doesn't change all the time → CACHE
12. Theory of caching
GET /page → Page
Page → Cache : get('key')
Cache → Page : $data = false
Page : if ($data == false)
Page → DB : select data from table
DB → Page : $data = returned result
Page → Cache : set('key', $data)
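The flow on this slide can be sketched in a few lines of PHP. This is a minimal illustration, not the speaker's code: `SketchCache`, `selectDataFromTable()` and `renderPage()` are hypothetical names, and the in-memory array only stands in for a real cache such as Memcached.

```php
<?php
// Hypothetical in-memory stand-in for a real cache (assumption, not a real API).
final class SketchCache
{
    private array $items = [];

    public function get(string $key)
    {
        // As on the slide, a miss is reported as false.
        return $this->items[$key] ?? false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Hypothetical stand-in for "select data from table"; counts how often it runs.
function selectDataFromTable(int &$queryCount): array
{
    $queryCount++;
    return ['title' => 'Hello', 'body' => 'World'];
}

// GET /page : ask the cache first, fall back to the DB, then fill the cache.
function renderPage(SketchCache $cache, int &$queryCount): array
{
    $data = $cache->get('key');
    if ($data === false) {
        $data = selectDataFromTable($queryCount);
        $cache->set('key', $data);
    }
    return $data;
}

$cache = new SketchCache();
$queryCount = 0;
$first = renderPage($cache, $queryCount);   // miss : hits the "database"
$second = renderPage($cache, $queryCount);  // hit : served from the cache
```

Only the first request runs the query; the second is served from the cache.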
14. Caching techniques
#1 : Store entire pages
#2 : Store part of a page (block)
#3 : Store data retrieval (SQL ?)
#4 : Store complex processing result
#? : Your call !
When you have data, think :
Creation time ?
Modification frequency ?
Retrieval frequency ?
15. How to find cacheable data
New projects : start from 'cache everything'
Existing projects :
Look at MySQL slow query log
Make a complete query log (don't forget to turn it off !)
→ Use Percona Toolkit (pt-query-digest)
Check page loading times
16. Caching storage - Disk
Data with few updates : good
Caching SQL queries : preferably not
DON'T use NFS or other network file systems
high latency
possible problem for sessions : locking issues !
17. Caching storage - Disk / ramdisk
Local
5 Webservers → 5 local caches
How will you keep them synchronized ?
→ Don't say NFS or rsync !
18. Caching storage - Memcache(d)
Facebook, Twitter, YouTube, … → need we say more ?
Distributed memory caching system
Multiple machines ↔ 1 big memory-based hash-table
Key-value storage system
Keys - max. 250 bytes
Values - max. 1 MB
Extremely fast... non-blocking, UDP (!)
22. Memcache - installation & running it
Installation
Distribution package
PECL
Windows : binaries
Running
No config-files
memcached -d -m <mem> -l <ip> -p <port>
ex. : memcached -d -m 2048 -l 172.16.1.91 -p 11211
23. Caching storage - Memcache - some notes
Not fault-tolerant
It's a cache !
Lose session data
Lose shopping cart data
...
Firewall your Memcache port !
25. Memcache in code
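<?php
$memcache = new Memcache();
// Register both cache servers; keys are distributed across them
$memcache->addServer('172.16.0.1', 11211);
$memcache->addServer('172.16.0.2', 11211);

// Try the cache first; get() returns false on a miss
$myData = $memcache->get('myKey');
if ($myData === false) {
    // Cache miss : fetch from the database (GetMyDataFromDB() is your own function)
    $myData = GetMyDataFromDB();
    // Put it in Memcache as 'myKey', without compression, with no expiration
    $memcache->set('myKey', $myData, 0, 0);
}
echo $myData;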
26. Benchmark with Memcache
                      Single webserver      Proxy
                      Static     PHP        Static     PHP
Apache + PHP          3900       17.5       6700       17.5
Apache + PHP + MC     3900       55         6700       108
27. Memcache slabs
(or why Memcache says it's full when it's not)
Multiple slabs of different sizes :
Slab 1 : 400 bytes
Slab 2 : 480 bytes (400 * 1.2)
Slab 3 : 576 bytes (480 * 1.2) (and so on...)
Multiplier (1.2 here) can be configured
Store a lot of very large objects
→ Large slabs : full
→ Rest : free
→ Eviction of data !
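The slab arithmetic above can be reproduced with a short sketch. It only follows the numbers on the slide (400 bytes, factor 1.2); `slabSizes()` and `slabFor()` are hypothetical helpers, and a real Memcached instance applies its own rounding and defaults.

```php
<?php
// Hypothetical helper: list slab sizes, each one $factor times the previous.
function slabSizes(int $smallest, float $factor, int $count): array
{
    $sizes = [];
    $size = $smallest;
    for ($i = 0; $i < $count; $i++) {
        $sizes[] = $size;
        $size = (int) round($size * $factor);
    }
    return $sizes;
}

// Hypothetical helper: an item lands in the smallest slab it fits into.
function slabFor(int $itemBytes, array $sizes): ?int
{
    foreach ($sizes as $size) {
        if ($itemBytes <= $size) {
            return $size;
        }
    }
    return null; // larger than the largest slab in this list
}

$sizes = slabSizes(400, 1.2, 5);
$chosen = slabFor(450, $sizes);
```

A 450-byte item does not fit the 400-byte slab, so it takes a 480-byte slot. If most of your objects fall into the same few slab sizes, those slabs fill up and evict data while the others sit empty.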
28. Memcache - Is it working ?
Connect to it using telnet
Use Cacti or other monitoring tools
"stats" command →
STAT pid 2941
STAT uptime 10878
STAT time 1296074240
STAT version 1.4.5
STAT pointer_size 64
STAT rusage_user 20.089945
STAT rusage_system 58.499106
STAT curr_connections 16
STAT total_connections 276950
STAT connection_structures 96
STAT cmd_get 276931
STAT cmd_set 584148
STAT cmd_flush 0
STAT get_hits 211106
STAT get_misses 65825
STAT delete_misses 101
STAT delete_hits 276829
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 613193860
STAT bytes_written 553991373
STAT limit_maxbytes 268435456
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT conn_yields 0
STAT bytes 20418140
STAT curr_items 65826
STAT total_items 553856
STAT evictions 0
STAT reclaimed 0
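One number worth deriving from this output is the hit ratio. The sketch below parses "STAT name value" lines and computes it from the `get_hits` and `get_misses` values shown above; `parseStats()` and `hitRatio()` are hypothetical helper names, and the raw text is pasted in rather than read from a live server.

```php
<?php
// Hypothetical helper: turn "STAT <name> <value>" lines into an array.
function parseStats(string $raw): array
{
    $stats = [];
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        if (preg_match('/^STAT (\S+) (.+)$/', trim($line), $m)) {
            $stats[$m[1]] = $m[2];
        }
    }
    return $stats;
}

// Hypothetical helper: share of get requests answered from the cache.
function hitRatio(array $stats): float
{
    $hits = (int) $stats['get_hits'];
    $misses = (int) $stats['get_misses'];
    $total = $hits + $misses;
    return $total === 0 ? 0.0 : $hits / $total;
}

// Values copied from the stats output above.
$raw = "STAT get_hits 211106\nSTAT get_misses 65825\nSTAT evictions 0";
$stats = parseStats($raw);
$ratio = hitRatio($stats);
printf("Hit ratio: %.1f%%\n", $ratio * 100);
```

A falling hit ratio or a rising `evictions` counter is usually the first sign that the cache is too small or that keys expire too quickly.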
30. Memcache - tip
Page with multiple blocks ?
→ use Memcached::getMulti()
getMulti($array) → hashing algorithm
But : what if you get some hits and some misses ?
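One way to handle a mix of hits and misses is to rebuild only the missing blocks and push them back. This is a sketch under assumptions: `MultiCache` is a hypothetical in-memory class whose `getMulti()` returns only the keys it found (which is how `Memcached::getMulti()` reports partial hits), and `buildBlock()` stands in for the real rendering or database work.

```php
<?php
// Hypothetical in-memory cache; getMulti() returns only the keys it found.
final class MultiCache
{
    private array $items = [];

    public function getMulti(array $keys): array
    {
        return array_intersect_key($this->items, array_flip($keys));
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Hypothetical block builder standing in for the real work.
function buildBlock(string $key): string
{
    return "<div>$key</div>";
}

// Fetch all blocks in one round trip, then fill in whatever was missing.
function getBlocks(MultiCache $cache, array $keys): array
{
    $found = $cache->getMulti($keys);
    $blocks = [];
    foreach ($keys as $key) {
        if (array_key_exists($key, $found)) {
            $blocks[$key] = $found[$key];          // hit
        } else {
            $blocks[$key] = buildBlock($key);      // miss : rebuild
            $cache->set($key, $blocks[$key]);      // and store for next time
        }
    }
    return $blocks;
}

$cache = new MultiCache();
$cache->set('header', '<div>cached header</div>');
$blocks = getBlocks($cache, ['header', 'news', 'nav']);
```

The page still costs one cache round trip, and only the blocks that were actually missing trigger extra work.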
41. Cache warmup scripts
Used to fill your cache when it's empty
Run it before starting Webserver !
2 ways :
Visit all URLs
Error-prone
Hard to maintain
Call all cache-updating methods
Make sure you have a warmup script !
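The second approach, calling all cache-updating methods, could look like the sketch below. Everything in it is hypothetical: the `ArticleCacheUpdater` class, its sample data, and the convention that cache-updating methods start with `update`.

```php
<?php
// Hypothetical class grouping the cache-updating methods of one module.
final class ArticleCacheUpdater
{
    public array $cache = [];   // stand-in for a real cache backend

    public function updateLatestArticles(): void
    {
        $this->cache['articles_latest'] = ['article 3', 'article 2', 'article 1'];
    }

    public function updateTopScoredArticles(): void
    {
        $this->cache['articles_top'] = ['article 2', 'article 1', 'article 3'];
    }
}

// Warmup : call every method whose name starts with "update" (assumed convention).
function warmUp(object $updater): array
{
    $called = [];
    foreach (get_class_methods($updater) as $method) {
        if (strpos($method, 'update') === 0) {
            $updater->$method();
            $called[] = $method;
        }
    }
    return $called;
}

$updater = new ArticleCacheUpdater();
$called = warmUp($updater);
```

Because the script discovers the methods by name, a new cache-updating method is picked up without touching the warmup script, which avoids the maintenance problem of a hand-kept URL list.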
42. Cache stampeding - what about locking ?
Seems like a nice idea, but...
While lock in place
What if the process that created the lock fails ?
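The failing-process problem is easy to see in code. The sketch below shows one common mitigation, a lock that expires by itself; it is an illustration and not necessarily what the talk recommends. `LockingCache` is a hypothetical in-memory class whose `add()` behaves like `Memcached::add()` (store only if the key is absent), and the explicit `$now` argument exists only to make the example deterministic.

```php
<?php
// Hypothetical in-memory cache with an add() that only succeeds
// when the key is absent or its previous entry has expired.
final class LockingCache
{
    private array $items = [];   // key => [value, expiresAt]

    public function add(string $key, $value, int $ttl, int $now): bool
    {
        if (isset($this->items[$key]) && $this->items[$key][1] > $now) {
            return false;        // someone else still holds the lock
        }
        $this->items[$key] = [$value, $now + $ttl];
        return true;
    }
}

$cache = new LockingCache();
$t = 1000;                       // arbitrary start time in seconds

// Process A takes the lock for 10 seconds and starts rebuilding.
$firstGotLock = $cache->add('lock:news', 1, 10, $t);

// Process B arrives one second later and is refused.
$secondGotLock = $cache->add('lock:news', 1, 10, $t + 1);

// Process A crashed and never released the lock; after the TTL
// has passed, another process can take over.
$afterExpiry = $cache->add('lock:news', 1, 10, $t + 11);
```

Without the expiry the lock would stay forever after a crash. With it, every waiting request is still stuck until the TTL runs out, which is why locking only seems like a nice idea.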
55. Varnish - VCL
Varnish Configuration Language
DSL (Domain Specific Language)
→ compiled to C
Hooks into each request
Defines :
Backends (web servers)
ACLs
Load balancing strategy
Can be reloaded while running
56. Varnish - whatever you want
Real-time statistics (varnishtop, varnishhist, ...)
ESI
57. Varnish - ESI
Perfect for caching pages
Article content page :
Header (TTL : 60 min) → /top
Latest news (TTL : 2 min) → /news
Navigation (TTL : 60 min) → /nav
Article content (TTL : 15 min) → /article/732
In your article page output :
<esi:include src="/top"/>
<esi:include src="/nav"/>
<esi:include src="/news"/>
<esi:include src="/article/732"/>
In your Varnish config :
sub vcl_fetch {
    if (req.url == "/news") {
        esi; /* Do ESI processing */
        set obj.ttl = 2m;
    } elseif (req.url == "/nav") {
        esi;
        set obj.ttl = 1m;
    } elseif ….
    ….
}
58. Varnish with ESI - hold on tight !
                         Single webserver      Proxy
                         Static     PHP        Static     PHP
Apache + PHP             3900       17.5       6700       17.5
Apache + PHP + MC        3900       55         6700       108
Nginx + PHP-FPM + MC     11700      57         11200      112
Varnish                  -          -          11200      4200
59. Varnish - what can/can't be cached ?
Can :
Static pages
Images, js, css
Pages or parts of pages that don't change often (ESI)
Can't :
POST requests
Very large files (it's not a file server !)
Requests with Set-Cookie
User-specific content
60. ESI → no caching on user-specific content ?
Logged in as : Wim Godden
TTL = 0s ?
5 messages
TTL=1h TTL = 5min
61. Coming soon...
Based on Nginx
Reduces load by 50 – 95%
Requires code changes !
Well-built project → few changes
Effect on webservers and database servers
64. Figures
Second customer (already using Nginx + Memcache) :
No. of web servers : 72 → 8
No. of db servers : 15 → 4
Total : 87 → 12 (86% reduction !)
Latest customer :
Total no. of servers : 1350 → 380
72% reduction → €1.5 million / year
vBulletin test project :
Load dropped by 98% on webservers and db-servers !
65. Availability
Old system : stable at 4 customers
Total rebuild : still under heavy development
Beta : Sep 2013
Final : End 2013
66. PHP speed - some tips
Upgrade PHP - every minor release has 5-15% speed gain !
Use an opcode cache (APC, eAccelerator, XCache)
67. DB speed - some tips
Use same types for joins
i.e. don't join decimal with int
RAND() is evil !
count(*) is evil in InnoDB without a where clause !
Persistent connections are sort-of evil
69. Frontend tuning
1. You optimize backend
2. Frontend engineer messes up → havoc on backend
3. Don't forget : frontend sends requests to backend !
SO...
Care about frontend
Test frontend
Check what requests frontend sends to backend
75. Tuning frontend
Minimize requests
Combine CSS/JavaScript files
Use CSS Sprites (horizontally if possible)
Put CSS at top
Put JavaScript at bottom
Max. no. of connections
Especially if JavaScript does Ajax (advertising-scripts, …) !
Avoid iFrames
Again : max no. of connections
Don't scale images in HTML
Have a favicon.ico (don't 404 it !)
→ see my blog
76. What else can kill your site ?
Redirect loops
Multiple requests
More load on Webserver
More PHP to process
Additional latency for visitor
Try to avoid redirects anyway
→ In ZF : use $this->_forward instead of $this->_redirect
Watch your logs, but equally important...
Watch the logging process →
Logging = disk I/O → can kill your server !
77. Above all else... be prepared !
Have a monitoring system
Use a cache abstraction layer (disk → Memcache)
Don't install for the worst → prepare for the worst
Have a test-setup
Have fallbacks
→ Turn off non-critical functionality
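A cache abstraction layer, as mentioned in the list above, can be as small as one interface. The sketch below is an assumption about what such a layer might look like: `CacheBackend`, `DiskBackend`, `MemoryBackend` and `latestTitles()` are all hypothetical names, and the memory backend only stands in for a Memcache-based one.

```php
<?php
// Hypothetical interface : calling code depends only on this.
interface CacheBackend
{
    public function get(string $key);
    public function set(string $key, $value): void;
}

// Disk-based backend, one file per key.
final class DiskBackend implements CacheBackend
{
    private string $dir;

    public function __construct(string $dir)
    {
        $this->dir = $dir;
    }

    private function path(string $key): string
    {
        return $this->dir . '/' . md5($key) . '.cache';
    }

    public function get(string $key)
    {
        $path = $this->path($key);
        return is_file($path) ? unserialize(file_get_contents($path)) : false;
    }

    public function set(string $key, $value): void
    {
        file_put_contents($this->path($key), serialize($value), LOCK_EX);
    }
}

// In-memory backend, standing in for a Memcache-based one.
final class MemoryBackend implements CacheBackend
{
    private array $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Application code : identical whichever backend is passed in.
function latestTitles(CacheBackend $cache): array
{
    $titles = $cache->get('latest_titles');
    if ($titles === false) {
        $titles = ['First', 'Second'];   // placeholder for the real query
        $cache->set('latest_titles', $titles);
    }
    return $titles;
}

$fromDisk = latestTitles(new DiskBackend(sys_get_temp_dir()));
$fromMemory = latestTitles(new MemoryBackend());
```

Moving from disk to Memcache then means writing one new backend class and changing the place where it is constructed, not every call site.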
78. So...
Cache
But : never delete, always push !
Have a warmup script
Monitor your cache
Have an abstraction layer
Apache = fine, Nginx = better
Static pages ? Use Varnish
Tune your frontend → impact on backend !
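The "never delete, always push" rule above can be read as: when data changes, write the new value into the cache instead of removing the key. The sketch below illustrates that reading; `ArticleStore` and its arrays are hypothetical stand-ins for a real database and cache.

```php
<?php
// Hypothetical store that keeps a database and a cache in step.
final class ArticleStore
{
    public array $db = [];      // stand-in for the database
    public array $cache = [];   // stand-in for the cache
    public int $dbReads = 0;    // how often a read fell through to the DB

    public function save(int $id, string $title): void
    {
        $this->db[$id] = $title;
        // Push the fresh value instead of deleting the key, so readers
        // never find the entry missing and never hit the DB all at once.
        $this->cache["article:$id"] = $title;
    }

    public function title(int $id): string
    {
        $key = "article:$id";
        if (!array_key_exists($key, $this->cache)) {
            $this->dbReads++;
            $this->cache[$key] = $this->db[$id];
        }
        return $this->cache[$key];
    }
}

$store = new ArticleStore();
$store->save(732, 'Old title');
$store->save(732, 'New title');   // update pushes, it does not delete
$title = $store->title(732);
```

Readers see the new title immediately and the database is never read, whereas a delete would send every concurrent request to the database until one of them refilled the key.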