2. Hello everybody
Julien PAULI
SensioLabs Blackfire team
Programming with PHP since the early 2000s
Now: Unix system programmer (C)
PHP Internals programmer/reviewer
PHP 5.5 & 5.6 Release Manager
@julienpauli
Tech blog at jpauli.github.io
jpauli@php.net
3. What we'll cover together
Profiling a simple SF2 app
Under PHP 5
Under PHP 7
Compare profiles using Blackfire graph comparison
Analyze the numbers
Dive into PHP 7 performance
Structure optimizations
New variable model (zval)
New HashTable model
String management with zend_string
Other ideas...
4. Profiles
Done on my laptop (not on prod env)
LP64
Done on app_dev.php (debug mode)
Do not take the numbers as absolute values
Read them as relative measures
Performed with Blackfire
On the latest PHP 5.6
On PHP 7.0.0RC8
5. Blackfire
General profiler
Not only PHP, but works best for PHP
Free version exists
Collects many metrics
memory, CPU, I/O, network traffic, SQL...
Graphs useful info, discards useless info
Immediately spot your perf problems
Nice graph comparison view
6. Blackfire collector
Collector is a PHP extension
~5,000 lines of C
Available for PHP 5.3, 5.4, 5.5 and 5.6
In beta for PHP 7, but soon to be released
In beta for Windows platforms, but soon to be released
Collector has zero impact when not triggered
Collector can be used in production environments
It is highly optimized for performance
It is finely tuned for each PHP version
11. Which PHP?
PHP 7 is slower than PHP 5...
... when no OPCode cache is used!
This is a 15% performance difference
(Remember: these numbers only apply to this small SF2-based app)
(Profile screenshots: PHP 5 and PHP 7)
12. PHP 7 changes
PHP 7 now uses an AST-based compiler
The PHP 7 compiler is SLOWER than PHP 5's
But much better designed
Creating and compiling an AST is slow
The AST is hookable with PHP extensions
The AST is hookable in userland using nikic/php-ast
The compiler is more complex: it tries to optimize the runtime
... But as you use an OPCode cache
This is not a problem for you
The PHP 7 compiler generates better runtime OPCodes
Your runtime will be faster than with PHP 5
14. Comparing view, with OPCache
PHP 7 runs 23% faster on this app
PHP 7 uses 22% less CPU than PHP 5
PHP 7's memory footprint is 38% smaller than PHP 5's
~3.85 MB less in our case
15. Comparing view, with OPCache
Some components benefit more than others from PHP 7 performance optimizations
17. Optimizing CPU time
Latency Numbers Every Programmer Should Know
http://lwn.net/Articles/250967/
http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
2016 numbers (may vary by chip)
-----------------------------------------------------------------------
Operation                  Latency          Comparison
-----------------------------------------------------------------------
L1 cache reference         1 ns
Branch mispredict          3 ns
L2 cache reference         4 ns             4x L1 cache
L3 cache reference         12 ns            3x L2 cache, 12x L1 cache
Main memory reference      100 ns           25x L2 cache, 100x L1 cache
SSD random read            16,000 ns
HDD random read (seek)     200,000,000 ns
18. Optimizing CPU cache efficiency
If we can reduce payload size, the CPU will use its caches more often
CPU caches fetch data one "line" at a time
Improve data locality to improve cache efficiency
https://software.intel.com/en-us/articles/optimize-data-structures-and-memory-access-patterns-to-improve-data-locality
In C, that means:
Reduce the number of pointer indirections
Keep related data together (struct hacks, struct merges)
Use smaller data sizes
19. Optimizing CPU cache efficiency
PHP 5.6 (debug)
456,483483 task-clock # 0,974 CPUs utilized
1 405 context-switches # 0,003 M/sec
7 CPU-migrations # 0,000 M/sec
8 633 page-faults # 0,019 M/sec
1 163 771 607 cycles # 2,549 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1 247 617 395 instructions # 1,07 insns per cycle
181 700 375 branches # 398,044 M/sec
5 257 940 branch-misses # 2,89% of all branches
9 085 235 cache-references # 20,787 M/sec
1 108 044 cache-misses # 12,196 % of all cache refs
0,468451813 seconds time elapsed
20. Optimizing CPU cache efficiency
PHP 7.0.0RC8 (debug)
306,006739 task-clock # 0,916 CPUs utilized
1 446 context-switches # 0,005 M/sec
2 CPU-migrations # 0,000 M/sec
4 330 page-faults # 0,014 M/sec
787 684 146 cycles # 2,574 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
817 673 456 instructions # 1,04 insns per cycle
121 452 445 branches # 396,895 M/sec
3 356 650 branch-misses # 2,76% of all branches
5 741 559 cache-references # 18,464 M/sec
873 581 cache-misses # 15,215 % of all cache refs
0,334226815 seconds time elapsed
22. PHP 7 optimizations
Every variable in PHP is stored in a zval struct
This struct has been reorganized in PHP 7
Narrowed / shrunk
Separated
23. PHP 5 variables
$a -> zval * (8 bytes) -> zval (32 bytes) -> complex value (e.g. HashTable, XX bytes)
zval fields: value, refcount, is_ref, type, gc_info
value (zval_value union): lval, dval, str_val* + str_len, hashtable*, object*, ast*, ...
40 bytes + complex value size
2 indirections
25. PHP 5 vs PHP 7 variable design
The zval container no longer stores GC info
No more need to heap-allocate zvals (no more zval *)
GC info is stored in each complex type instead
Each complex type can now be shared on its own
In PHP 5, we had to share the zval containing it
PHP 7 variables are more CPU-cache efficient
26. Hashtables (arrays)
In PHP, HashTables are used to represent the PHP array type
But HashTables are also used internally, everywhere
HashTable optimizations in PHP 7 are strongly felt, as HashTables are heavily used internally
27. HashTables in PHP 5
Each element needs
4 pointer indirections
72 bytes for a bucket + 32 bytes for a zval
$a -> zval * -> zval -> HashTable * -> HashTable (64 bytes) -> bucket * -> bucket (72 bytes) -> zval * -> zval (32 bytes)
28. HashTables in PHP 7
Each element needs
2 pointer indirections
32 bytes for a bucket
$a -> zval -> HashTable * -> HashTable (56 bytes) -> bucket * -> bucket (32 bytes, with the zval embedded)
30. PHP 7 Hash
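Memory layout is as contiguous as possible
nIndex = hash("foo") | (-table_size), which gives a negative index
arData holds two parts in one allocation:
the hash part, at negative offsets (-1, -2, -3, -4, ...), where each slot stores a bucket index (idx)
the buckets, at offsets 0, 1, 2, ..., where each bucket holds hash, key and zval
Lookup: nIndex -> idx -> bucket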
32. String management
In PHP 5, strings don't have their own structure
String management is hard
Leads to a lot of string duplication
Many more memory accesses
In PHP 7, strings use a dedicated structure: zend_string
They are refcounted
Hashes are precomputed, often at compile time
The struct hack is used to compact memory
33. Strings in PHP
PHP 5: zval { char * str, int len, ..., refcount, is_ref, gc_infos }
PHP 7: zval { zend_string *, ... } -> zend_string { gc_infos, hash, size_t len, char str[1] }