Beyond php - it's not (just) about the code

Beyond PHP :
It's not (just) about the code

Wim Godden
Cu.be Solutions
@wimgtr

Who am I ?
Wim Godden (@wimgtr)
Founder of Cu.be Solutions (http://cu.be)
Open Source developer since 1997
Developer of OpenX, PHPCompatibility, Nginx SCL, ...
Speaker at PHP and Open Source conferences

Cu.be Solutions ?
Open source consultancy
PHP-centered
High-speed redundant network (BGP, OSPF, VRRP)
High scalability development
Nginx + extensions
MySQL Cluster

Projects :

mostly IT & Telecom companies
lots of public-facing apps/sites

Who are you ?
Developers ?
Anyone set up a MySQL master-slave ?
Anyone set up a site/app on separate web and database servers ?
→ How much traffic between them ?

The topic
Things we take for granted
Famous last words : "It should work just fine"
Works fine today
→ might fail tomorrow
Most common mistakes
PHP code ↔ PHP ecosystem
How-to & How-NOT-to

It starts with...
… code !

First up : database

Database queries – complexity
SELECT DISTINCT n.nid, n.uid, n.title, n.type, e.event_start, e.event_start AS
event_start_orig, e.event_end, e.event_end AS event_end_orig, e.timezone,
e.has_time, e.has_end_date, tz.offset AS offset, tz.offset_dst AS offset_dst,
tz.dst_region, tz.is_dst, e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst,
tz.offset) HOUR_SECOND AS event_start_utc, e.event_end - INTERVAL
IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND AS event_end_utc,
e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND +
INTERVAL 0 SECOND AS event_start_user, e.event_end - INTERVAL IF(tz.is_dst,
tz.offset_dst, tz.offset) HOUR_SECOND + INTERVAL 0 SECOND AS
event_end_user, e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset)
HOUR_SECOND + INTERVAL 0 SECOND AS event_start_site, e.event_end - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND + INTERVAL 0
SECOND AS event_end_site, tz.name as timezone_name FROM node n INNER
JOIN event e ON n.nid = e.nid INNER JOIN event_timezones tz ON tz.timezone =
e.timezone INNER JOIN node_access na ON na.nid = n.nid LEFT JOIN
domain_access da ON n.nid = da.nid LEFT JOIN node i18n ON n.tnid > 0 AND
n.tnid = i18n.tnid AND i18n.language = 'en' WHERE (na.grant_view >= 1 AND
((na.gid = 0 AND na.realm = 'all'))) AND ((da.realm = "domain_id" AND da.gid = 4)
OR (da.realm = "domain_site" AND da.gid = 0)) AND (n.language ='en' OR
n.language ='' OR n.language IS NULL OR n.language = 'is' AND i18n.nid IS NULL)
AND ( n.status = 1 AND ((e.event_start >= '2010-01-31 00:00:00' AND
e.event_start <= '2010-03-01 23:59:59') OR (e.event_end >= '2010-01-31 00:00:00'
AND e.event_end <= '2010-03-01 23:59:59') OR (e.event_start <= '2010-01-31
00:00:00' AND e.event_end >= '2010-03-01 23:59:59')) ) GROUP BY n.nid HAVING
(event_start >= '2010-02-01 00:00:00' AND event_start <= '2010-02-28 23:59:59')
OR (event_end >= '2010-02-01 00:00:00' AND event_end <= '2010-02-28 23:59:59')
OR (event_start <= '2010-02-01 00:00:00' AND event_end >= '2010-02-28
23:59:59') ORDER BY event_start ASC;

Database - indexing
'select id from stock where status = 2 order by qty'
→ aggregate index on (status, qty)

But if we use memory table :
'select id from stock where status > 2 order by qty'
→ aggregate index on (status, qty) ?
→ No : with a range condition on status, the index can no longer be used to sort by qty
→ separate index on status and qty (since recent versions)

Database - indexing
Indexes make database faster
→ Let's index everything !
→ DON'T :
Insert/update/delete → Index modification
Each query → evaluation of all indexes

"Relational schema design is based on data
but index design is based on queries"
(Bill Karwin, Percona)

Databases – detecting problematic queries
Slow query log
→ SET GLOBAL slow_query_log = ON;

Queries not using indexes
→ In my.cnf/my.ini : 'log_queries_not_using_indexes'

General query log

→ SET GLOBAL general_log = ON;
→ Turn it off quickly !

Percona Toolkit (Maatkit)
pt-query-digest

Databases - pt-query-digest

# Profile
# Rank Query ID           Response time    Calls R/Call  Apdx V/M   Item
# ==== ================== ================ ===== ======= ==== ===== ==========
#    1 0x543FB322AE4330FF 16526.2542 62.0%  1208 13.6806 1.00  0.00 SELECT output_option
#    2 0xE78FEA32E3AA3221     0.8312 10.3%  6412  0.0001 1.00  0.00 SELECT poller_output poller_item
#    3 0x211901BF2E1C351E     0.6811  8.4%  6416  0.0001 1.00  0.00 SELECT poller_time
#    4 0xA766EE8F7AB39063     0.2805  3.5%   149  0.0019 1.00  0.00 SELECT wp_terms wp_term_taxonomy wp_term_relationships
#    5 0xA3EEB63EFBA42E9B     0.1999  2.5%    51  0.0039 1.00  0.00 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary
#    6 0x94350EA2AB8AAC34     0.1956  2.4%    89  0.0022 1.00  0.01 UPDATE wp_options
# MISC 0xMISC                 0.8137 10.0%  3853  0.0002   NS   0.0 <147 ITEMS>

Databases - pt-query-digest
# Query 2: 0.26 QPS, 0.00x concurrency, ID 0x92F3B1B361FB0E5B at byte 14081299
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: |    _^  |
# Time range: 2011-12-28 18:42:47 to 19:03:10
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          1     312
# Exec time     50      4s     5ms    25ms    13ms    20ms     4ms    12ms
# Lock time      3    32ms    43us   163us   103us   131us    19us    98us
# Rows sent     59  62.41k     203     231  204.82  202.40    3.99  202.40
# Rows examine  13  73.63k     238     296  241.67  246.02   10.15  234.30
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read     59  62.41k     203     231  204.82  202.40    3.99  202.40
# Bytes sent    53  24.85M  46.52k  84.36k  81.56k  83.83k   7.31k  79.83k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0       0       0       0       0       0       0       0
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   0       0       0       0       0       0       0       0
# Query size     0  21.63k      71      71      71      71       0      71
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin  40  11.77k      34      44   38.62   38.53    1.87   38.53
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Full scan    100% yes,   0% no
# String:
# Databases    wp_blog_one (264/84%), wp_blog_tw… (36/11%)... 1 more
# Hosts
# InnoDB trxID 86B40B (1/0%), 86B430 (1/0%), 86B44A (1/0%)... 309 more
# Last errno   0
# Users        wp_blog_one (264/84%), wp_blog_two (36/11%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms  ################################################################
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `wp_blog_one` LIKE 'wp_options'\G
#    SHOW CREATE TABLE `wp_blog_one`.`wp_options`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'\G

Databases – next step : explain
explain <query>
"How will MySQL execute the query"

Type of lookup
'system', 'const' and 'ref' = good
'ALL' = bad

Extra info
Using index = good
Using filesort = usually bad
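
The two warning signs above can also be checked automatically. The following is a minimal sketch, not part of the original talk : the function name explainWarnings() is hypothetical, and it assumes each row is an associative array with the 'table', 'type' and 'Extra' columns that MySQL's EXPLAIN returns. The sample rows are hand-written for illustration.

```php
<?php
/**
 * Return human-readable warnings for the rows of an EXPLAIN result.
 * Assumption: each row is an associative array with the columns
 * 'table', 'type' and 'Extra', as returned by MySQL's EXPLAIN.
 */
function explainWarnings(array $explainRows): array
{
    $warnings = [];
    foreach ($explainRows as $row) {
        $table = $row['table'] ?? '?';
        // type = ALL means every row of the table is read
        if (($row['type'] ?? '') === 'ALL') {
            $warnings[] = "$table: full table scan (type = ALL)";
        }
        // "Using filesort" means an extra sorting pass is needed
        if (strpos((string) ($row['Extra'] ?? ''), 'Using filesort') !== false) {
            $warnings[] = "$table: Using filesort";
        }
    }
    return $warnings;
}

// Hand-written sample rows; in real code they would come from
// running 'EXPLAIN ' . $sql and fetching each row as an array.
$rows = [
    ['table' => 'wp_options', 'type' => 'ALL', 'Extra' => 'Using where'],
    ['table' => 'stock', 'type' => 'ref', 'Extra' => 'Using index'],
];
print_r(explainWarnings($rows));
```

Only the first sample row is flagged, since the second one uses a 'ref' lookup and is answered from the index.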

For / foreach

$customers = CustomerQuery::create()
    ->filterByState('MN')
    ->find();
foreach ($customers as $customer) {
    // one extra query per customer
    $contacts = ContactsQuery::create()
        ->filterByCustomerid($customer->getId())
        ->find();
    foreach ($contacts as $contact) {
        doSomeStuffWith($contact);
    }
}

Joins

// $db is a mysqli connection (the mysql_* functions were removed in PHP 7)
$contacts = $db->query("
    select
        contact.*
    from
        customer
        join contact
            on contact.customerid = customer.id
    where
        customer.state = 'MN'
");
while ($contact = $contacts->fetch_array()) {
    doSomeStuffWith($contact);
}

or the ORM equivalent

Better...
10001 → 1 query
Sadly : people still produce code with query loops
Usually :

Growth not anticipated
Internal app → Public app
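
To make the difference in query count concrete, here is a small self-contained sketch. It is an illustration under assumptions, not code from the talk : an in-memory SQLite database (through PDO) stands in for MySQL, the table layout is reduced to the columns needed, and the function names are hypothetical.

```php
<?php
// Assumption: an in-memory SQLite database (via PDO) stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('create table customer (id integer primary key, state text)');
$pdo->exec('create table contact (id integer primary key, customerid integer, name text)');
for ($i = 1; $i <= 100; $i++) {
    $pdo->exec("insert into customer (id, state) values ($i, 'MN')");
    $pdo->exec("insert into contact (customerid, name) values ($i, 'contact $i')");
}

// Query loop: one query for the customers, then one more per customer.
function contactsViaLoop(PDO $pdo, int &$queries): array
{
    $queries = 1;
    $customers = $pdo->query("select id from customer where state = 'MN'")
        ->fetchAll(PDO::FETCH_COLUMN);
    $perCustomer = $pdo->prepare('select name from contact where customerid = ?');
    $names = [];
    foreach ($customers as $customerId) {
        $perCustomer->execute([$customerId]);
        $queries++;
        foreach ($perCustomer->fetchAll(PDO::FETCH_COLUMN) as $name) {
            $names[] = $name;
        }
    }
    return $names;
}

// Join: the database does the matching in a single query.
function contactsViaJoin(PDO $pdo, int &$queries): array
{
    $queries = 1;
    return $pdo->query("
        select contact.name
        from customer
        join contact on contact.customerid = customer.id
        where customer.state = 'MN'
    ")->fetchAll(PDO::FETCH_COLUMN);
}

$loopQueries = 0;
$joinQueries = 0;
$fromLoop = contactsViaLoop($pdo, $loopQueries);
$fromJoin = contactsViaJoin($pdo, $joinQueries);
sort($fromLoop);
sort($fromJoin);
printf(
    "loop: %d queries, join: %d query, same result: %s\n",
    $loopQueries,
    $joinQueries,
    $fromLoop === $fromJoin ? 'yes' : 'no'
);
```

With 100 customers the loop issues 101 queries where the join needs 1. On a real setup every one of those queries also pays a network round trip to the database server.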

The origins of this talk
Customers :
Projects we built
Projects we didn't build, but got pulled into
Fixes
Changes
Infrastructure migration

15 years of 'how to cause mayhem with a few lines of code'

Client X
Jobs search site
Monitor job views :
Daily hits

Weekly hits
Monthly hits
Which user saw which job

Client X
Originally : when user viewed job details
Now : when job is in search result
Search for 'php' → 50 jobs = 50 jobs to be updated
→ 50 updates for shown_today
→ 50 updates for shown_week
→ 50 updates for shown_month
→ 50 inserts for shown_user
= 200 queries for 1 search !

Client X : the code
foreach ($jobs as $job) {
    $db->query("
        insert into shown_today(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    $db->query("
        insert into shown_week(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    $db->query("
        insert into shown_month(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    // `when` is a reserved word in MySQL, so it must be quoted
    $db->query("
        insert into shown_user(
            jobId,
            userId,
            `when`
        ) values (
            " . $job['id'] . ",
            " . $user['id'] . ",
            now()
        )
    ");
}

Client X : the numbers
600-1000 updates/sec (peaks up to 1600)
400-1000 updates/sec (peaks up to 2600)
16 core machine

Client X : panic !
Mail : "MySQL slave is more than 5 minutes behind master"
We set it up → who did they blame ?
Wait a second !

Client X : what's causing those peaks ?

Client X : possible cause ?
Code changes ?
→ According to developers : none

Action : turn on general log, analyze with pt-query-digest
→ 50+-fold increase in queries
→ Developers : 'Oops we did make a change'

After 3 days : 2.5 days behind
Every hour : 50 min extra lag

Client X : But why is the slave lagging ?

[Diagram : Master (file : master-bin-xxxx.log) → Binlog dump thread → Slave I/O thread → file on the slave (master-bin-xxxx.log) → Slave SQL thread]

Client X : fix ?
foreach ($jobs as $job) {
    $db->query("
        insert into shown_today(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    $db->query("
        insert into shown_week(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    $db->query("
        insert into shown_month(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    $db->query("
        insert into shown_user(
            jobId,
            userId,
            `when`
        ) values (
            " . $job['id'] . ",
            " . $user['id'] . ",
            now()
        )
    ");
}

Client X : the code change

insert into shown_today values (5, 1), (8, 1), (12, 1), (18, 1), … on duplicate key … ;
insert into shown_week values (5, 1), (8, 1), (12, 1), (18, 1), … on duplicate key … ;
insert into shown_month values (5, 1), (8, 1), (12, 1), (18, 1), … on duplicate key … ;
insert into shown_user values (5, 23, "2013-11-12 12:01:00"), (8, 23, "2013-11-12 12:01:00"), … ;

Client X : the code change
$todayQuery = "
    insert into shown_today(
        jobId,
        number
    ) values ";
foreach ($jobs as $job) {
    $todayQuery .= "(" . $job['id'] . ", 1),";
}
// strip the trailing comma
$todayQuery = rtrim($todayQuery, ',');
$todayQuery .= "
    on duplicate key
    update
        number = number + 1
";
$db->query($todayQuery);

Careful : max_allowed_packet !
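
One way to stay below max_allowed_packet is to split the rows into chunks, so no single statement grows without bound. The sketch below is an illustration, not the code used for Client X : the function name buildShownQueries() and the chunk size of 500 are assumptions, the table name must be a trusted hard-coded value, and the right chunk size depends on the server's actual max_allowed_packet setting.

```php
<?php
/**
 * Build multi-row "insert ... on duplicate key update" statements,
 * at most $chunkSize rows per statement.
 * Assumptions: $table is a trusted, hard-coded table name;
 * job ids are cast to int so they are safe to concatenate.
 */
function buildShownQueries(string $table, array $jobIds, int $chunkSize = 500): array
{
    $queries = [];
    foreach (array_chunk($jobIds, $chunkSize) as $chunk) {
        $rows = [];
        foreach ($chunk as $jobId) {
            $rows[] = '(' . (int) $jobId . ', 1)';
        }
        $queries[] = "insert into $table (jobId, number) values "
            . implode(', ', $rows)
            . ' on duplicate key update number = number + 1';
    }
    return $queries;
}

// 4 jobs with a chunk size of 3 give 2 statements.
$queries = buildShownQueries('shown_today', [5, 8, 12, 18], 3);
foreach ($queries as $query) {
    echo $query, PHP_EOL; // in real code: $db->query($query);
}
```

A search result of 50 jobs then costs one statement per table instead of 50.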

Client X : the chosen solution
// one transaction for the whole loop instead of one commit per query
$db->autocommit(false);
foreach ($jobs as $job) {
    $db->query("
        insert into shown_today(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    $db->query("
        insert into shown_week(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    $db->query("
        insert into shown_month(
            jobId,
            number
        ) values(
            " . $job['id'] . ",
            1
        )
        on duplicate key
        update
            number = number + 1
    ");
    $db->query("
        insert into shown_user(
            jobId,
            userId,
            `when`
        ) values (
            " . $job['id'] . ",
            " . $user['id'] . ",
            now()
        )
    ");
}
$db->commit();

Client X : conclusion
Queries inside for loops are bad (we already knew that)
Add master/slave and it gets much worse
Use transactions : they can provide a huge performance increase
Result : slave caught up 5 days later

Database → Network
Customer Y
Top 10 site in Belgium
Growing rapidly
At peak traffic :

Inexplicable latency on database
Load on webservers : minimal
Load on database servers : acceptable

Client Y : the network

[Diagram : traffic volumes between the servers : 60GB, 700GB, 700GB]

Client Y : network overload
Cause : Drupal hooks → retrieving data that was not needed
Only load data you actually need
Don't know at the start ? → Use lazy loading
Caching :
Same story
Memcached/Redis are fast
But : data still needs to cross the network
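
Lazy loading can be sketched in a few lines. This is a generic illustration, not Drupal's mechanism or the fix applied for Client Y : the class name LazyValue is hypothetical, and the closure stands in for whatever database or cache call would normally fetch the data.

```php
<?php
/**
 * Wraps a value that is only fetched the first time it is asked for.
 * Hypothetical helper for illustration.
 */
class LazyValue
{
    private $loader;
    private $loaded = false;
    private $value;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function get()
    {
        if (!$this->loaded) {
            // first access: do the expensive fetch and remember the result
            $this->value = call_user_func($this->loader);
            $this->loaded = true;
        }
        return $this->value;
    }
}

$loads = 0;
$comments = new LazyValue(function () use (&$loads) {
    $loads++; // stands in for a database or cache round trip
    return ['first comment', 'second comment'];
});

// Nothing has crossed the network yet; the fetch happens on the first get().
$first = $comments->get();
$again = $comments->get(); // served from memory, no second fetch
echo "fetches: $loads", PHP_EOL;
```

A page that never calls get() never triggers the fetch, so data that is not needed never crosses the network.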

Network trouble : more than just traffic
Customer Z
150,000 visits/day

News ticker :
XML feed from other site (owned by same customer)
Cached for 15 min

Customer Z – fetching the feed

if (filectime(APP_DIR . '/tmp/cacheFile.xml') < time() - 900) {
    unlink(APP_DIR . '/tmp/cacheFile.xml');
    file_put_contents(
        APP_DIR . '/tmp/cacheFile.xml',
        file_get_contents('http://www.scrambledsitename.be/xml/feed.xml')
    );
}
$xmlfeed = ParseXmlFeed(APP_DIR . '/tmp/cacheFile.xml');

What's wrong with this code ?

Customer Z – no feed without the source

Feed source

Customer Z : timeout
default_socket_timeout : 60 sec by default
Each visitor : 60 sec wait time
People keep hitting refresh → more load
More active connections → more load
Apache hits maximum connections → entire site down

Customer Z : timeout fix

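if (filectime(APP_DIR . '/tmp/cacheFile.xml') < time() - 900) {
    unlink(APP_DIR . '/tmp/cacheFile.xml');
    $context = stream_context_create(
        array(
            'http' => array(
                'timeout' => 5
            )
        )
    );
    file_put_contents(
        APP_DIR . '/tmp/cacheFile.xml',
        file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context)
    );
}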

Customer Z : don't delete from cache

Only overwrite the cache when the fetch succeeded :

if (filectime(APP_DIR . '/tmp/cacheFile.xml') < time() - 900) {
    $context = stream_context_create(
        array(
            'http' => array(
                'timeout' => 5
            )
        )
    );
    $feed = file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context);
    if ($feed !== false) {
        file_put_contents(
            APP_DIR . '/tmp/cacheFile.xml',
            $feed
        );
    }
}

Cache the parsed feed :

if (filectime(APP_DIR . '/tmp/cacheFile.xml') < time() - 900) {
    $context = stream_context_create(
        array(
            'http' => array(
                'timeout' => 5
            )
        )
    );
    $feed = file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context);
    if ($feed !== false) {
        file_put_contents(
            APP_DIR . '/tmp/cacheFile.xml',
            ParseXmlFeed($feed)
        );
    }
}
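
The same idea can be packaged as a reusable function. This is a sketch under assumptions, not the code deployed for Customer Z : the name refreshCache() is hypothetical, the fetcher is passed in as a callable so it can be swapped out, the cache is written through a temporary file and rename() so readers never see a half-written file, and on failure the old file is touched so that the next retry only happens after the cache period (a design choice that is not on the slides).

```php
<?php
/**
 * Refresh a cache file from a remote source without ever emptying the cache.
 * $fetch is any callable returning the body as a string, or false on failure.
 * Returns the cached content ('' if there is none yet).
 */
function refreshCache(string $cacheFile, int $maxAge, callable $fetch): string
{
    clearstatcache(true, $cacheFile);
    $isFresh = is_file($cacheFile) && filemtime($cacheFile) >= time() - $maxAge;
    if (!$isFresh) {
        $body = $fetch();
        if ($body !== false && $body !== '') {
            // write to a temp file first, then rename it over the old cache
            $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
            if (file_put_contents($tmp, $body) !== false) {
                rename($tmp, $cacheFile);
            }
        } elseif (is_file($cacheFile)) {
            // fetch failed: keep serving the stale copy and
            // do not retry on every single request
            touch($cacheFile);
        }
    }
    $cached = is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    return $cached === false ? '' : $cached;
}

// Real fetcher with a 5 second timeout (defined but not called in this sketch).
$fetchFeed = function () {
    $context = stream_context_create(['http' => ['timeout' => 5]]);
    return @file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context);
};

// Demo with a fetcher that always fails: the old cache content survives.
$cacheFile = sys_get_temp_dir() . '/cacheFile-example.xml'; // hypothetical path
file_put_contents($cacheFile, '<feed>old</feed>');
touch($cacheFile, time() - 1000); // pretend the cache is older than 15 minutes
echo refreshCache($cacheFile, 900, function () {
    return false;
}), PHP_EOL;
```

In production the call would be refreshCache(APP_DIR . '/tmp/cacheFile.xml', 900, $fetchFeed).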

Network resources
Use timeouts for all :
fopen
curl
SOAP
…
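
A minimal sketch of where those timeouts are set, with example values of 2 and 5 seconds (assumptions, to be tuned per application). The cURL part only runs when the cURL extension is loaded, no request is actually sent, and $wsdlUrl in the SOAP comment is a hypothetical variable.

```php
<?php
// 1. Streams (fopen, file_get_contents, ...): lower the global default of 60 seconds.
ini_set('default_socket_timeout', '5');

// 2. Or set the timeout per request through a stream context.
$context = stream_context_create(['http' => ['timeout' => 5]]);

// 3. cURL: separate limits for connecting and for the whole transfer.
if (function_exists('curl_init')) {
    $curl = curl_init('http://www.scrambledsitename.be/xml/feed.xml');
    curl_setopt_array($curl, [
        CURLOPT_CONNECTTIMEOUT => 2,
        CURLOPT_TIMEOUT        => 5,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    // $body = curl_exec($curl); // not executed in this sketch
}

// 4. SOAP: SoapClient accepts a 'connection_timeout' option (in seconds);
//    the wait for the response itself follows default_socket_timeout.
// $client = new SoapClient($wsdlUrl, ['connection_timeout' => 2]);
```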

Data source trusted ?

→ set up a webservice
→ let them push updates when their feed changes
→ less load on data source
→ no timeout issues

Add logging → early detection

Logging
Logging = good
Logging in PHP using fopen
→ bad idea : locking issues
→ Use file_put_contents($filename, $data, FILE_APPEND)
For Firefox : FirePHP (add-on for Firebug)
Debug logging = bad on production
Watch your logs !
Don't log on slow disks → I/O bottlenecks
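
A small helper along those lines, as a sketch : the function name logLine() and the log location are hypothetical, and the LOCK_EX flag is an addition that is not on the slide (it makes file_put_contents take an exclusive lock while it writes).

```php
<?php
/**
 * Append one line to a log file.
 * FILE_APPEND opens, appends and closes in a single call;
 * LOCK_EX (an addition, not on the slide) holds an exclusive lock while writing.
 */
function logLine(string $filename, string $message): bool
{
    $line = date('Y-m-d H:i:s') . ' ' . $message . PHP_EOL;
    return file_put_contents($filename, $line, FILE_APPEND | LOCK_EX) !== false;
}

$logFile = sys_get_temp_dir() . '/app-example.log'; // hypothetical location
logLine($logFile, 'feed refresh failed, serving cached copy');
```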

File system : I/O bottlenecks
Causes :
Excessive writes (database updates, logfiles, swapping, …)
Excessive reads (non-indexed database queries, swapping, small file system cache, …)

How to detect ?
top

Cpu(s):  0.2%us,  3.0%sy,  0.0%ni, 61.4%id, 35.5%wa,  0.0%hi,  0.0%si,  0.0%st

iostat

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    0.96   53.70    0.00   45.24

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             120.40         0.00    123289.60          0     616448
sdb               2.10         0.00      4378.10          0      18215
dm-0              4.20         0.00        36.80          0        184
dm-1              0.00         0.00         0.00          0          0

See iowait ? Stop worrying about php, fix the I/O problem !
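
For monitoring scripts, the iowait figure can be pulled out of top's CPU line. This is a rough sketch : the function name iowaitFromTop() is hypothetical, the regular expression assumes the field is labelled "wa" as in the output above, and the 20% threshold is an arbitrary example value.

```php
<?php
/**
 * Extract the iowait percentage from a "Cpu(s):" line as printed by top.
 * Returns null when the line has no "wa" field.
 */
function iowaitFromTop(string $cpuLine): ?float
{
    // a number, an optional % sign, then the "wa" label
    if (preg_match('/([0-9]+(?:\.[0-9]+)?)\s*%?\s*wa\b/', $cpuLine, $match)) {
        return (float) $match[1];
    }
    return null;
}

$line = 'Cpu(s):  0.2%us,  3.0%sy,  0.0%ni, 61.4%id, 35.5%wa,  0.0%hi,  0.0%si,  0.0%st';
$iowait = iowaitFromTop($line);
if ($iowait !== null && $iowait > 20.0) { // 20% is an arbitrary example threshold
    echo "iowait is {$iowait}% : look at the disks before touching the PHP code", PHP_EOL;
}
```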

File system
Worst of all : NFS
PHP files → lstat calls
Templates → same
Sessions
→ locking issues
→ corrupt data
→ store sessions in database, Memcached, Redis, ...

Much more than code

[Diagram : User, Network, Webserver, DB server, XML feed]

Look beyond PHP (or Perl, Ruby, Python, ...) !

Contact
Twitter : @wimgtr
Web : http://techblog.wimgodden.be
Slides : http://www.slideshare.net/wimg
E-mail : wim.godden@cu.be

Please...
Rate my talk : http://joind.in/9278

Step-by-step : most common issues
iowait on NFS server (lstat calls)
iowait on database server

I/O reads (use iostat) ? → missing/wrong indexes
I/O writes ?
→ no transactions ?
→ too many queries ?
→ too many indexes ?
→ bad DB engine settings

iowait on webserver (logs ? static files ?)
CPU on database server (missing/wrong indexes)
CPU on webserver (PHP)
