1. (Do not be afraid of)
PHP Compiler Internals
Sebastian Bergmann
August 23rd 2009
2. Sebastian Bergmann
Co-Founder and
Principal Consultant
with thePHP.cc
Creator of PHPUnit
Involved in the PHP
project since 2000
3. Under PHP's Hood
Extensions
(date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)
PHP Core: Request Management, File and Network Operations
Zend Engine: Compilation and Execution, Memory and Resource Allocation
Server API (SAPI)
(mod_php, FastCGI, CLI, ...)
This slide contains material by Sara Golemon
4. How PHP executes code
Lexical Analysis
Scan the source for sequences of characters
and convert them to a sequence of tokens
5. How PHP executes code
Lexical Analysis
Syntax Analysis
Parse a sequence of tokens to determine
their grammatical structure
6. How PHP executes code
Lexical Analysis
Syntax Analysis
Bytecode Generation
Generate bytecode based on the information
gathered by analyzing the source
16. Lexical Analysis
Scanner Generators
You do not want to write a scanner by
hand
At least when the code for the scanner should
be efficient and maintainable
Tools such as flex or re2c generate the
code for a scanner from a set of rules
<ST_IN_SCRIPTING>"if" {
    return T_IF;
}
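To make the rule above concrete, here is a minimal hand-written sketch of what a scanner does with it: the lexeme "if" becomes T_IF, every other word becomes an identifier. Only T_IF is taken from the slide; token, next_token(), T_EOF, T_IDENT and T_CHAR are hypothetical names. The real scanner is generated from rules, tracks states such as ST_IN_SCRIPTING, and matches keywords case-insensitively, none of which this sketch attempts.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; only T_IF mirrors a name from the slide. */
typedef enum { T_EOF, T_IF, T_IDENT, T_CHAR } token_kind;

typedef struct {
    token_kind kind;
    const char *start; /* first character of the lexeme */
    size_t len;        /* length of the lexeme */
} token;

/* Scan one token starting at *cursor and advance the cursor past it. */
static token next_token(const char **cursor)
{
    const char *p = *cursor;
    token t;

    assert(p != NULL);
    while (isspace((unsigned char)*p)) {
        p++; /* whitespace separates tokens but is not a token here */
    }
    t.start = p;
    if (*p == '\0') {
        t.kind = T_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') {
            p++;
        }
        /* Keyword rule: the lexeme "if" is T_IF, any other word an identifier. */
        t.kind = (p - t.start == 2 && strncmp(t.start, "if", 2) == 0)
            ? T_IF : T_IDENT;
    } else {
        p++; /* single-character token such as '(' or '{' */
        t.kind = T_CHAR;
    }
    t.len = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

Calling next_token() repeatedly on "if (iffy)" yields T_IF, T_CHAR, T_IDENT, T_CHAR and then T_EOF: the longest match wins, so "iffy" is not mistaken for the keyword.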
20. Syntax Analysis
Parse a sequence of tokens
You do not want to write a parser by hand
At least when the code for the parser should
be efficient and maintainable
Tools such as bison or lemon generate
the code for a parser from a set of rules
T_IF '(' expr ')' { ... }
statement { ... }
elseif_list else_single { ... }
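The grammar rule above can be approximated by a small hand-written parser. This sketch uses recursive descent, which is a different technique from the table-driven parsers that bison or lemon generate, and it leaves out elseif_list, else_single and the semantic actions in braces. All names (TOK_*, parser, accept_token, parse_statement, parse_program) are hypothetical, and an expression is reduced to a single TOK_EXPR token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; literal characters stand for themselves. */
enum { TOK_END = 0, TOK_IF = 256, TOK_EXPR, TOK_STMT };

typedef struct {
    const int *tokens; /* token stream terminated by TOK_END */
    size_t pos;        /* index of the next unread token */
} parser;

/* Consume the next token if it is the expected one. */
static int accept_token(parser *p, int tok)
{
    if (p->tokens[p->pos] == tok) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* statement: T_IF '(' expr ')' statement | '{' statement* '}' | simple */
static int parse_statement(parser *p)
{
    if (accept_token(p, TOK_IF)) {
        return accept_token(p, '(') && accept_token(p, TOK_EXPR)
            && accept_token(p, ')') && parse_statement(p);
    }
    if (accept_token(p, '{')) {
        while (p->tokens[p->pos] != '}' && p->tokens[p->pos] != TOK_END) {
            if (!parse_statement(p)) {
                return 0;
            }
        }
        return accept_token(p, '}');
    }
    return accept_token(p, TOK_STMT);
}

/* Returns 1 if the whole token stream is exactly one valid statement. */
static int parse_program(const int *tokens)
{
    parser p;

    assert(tokens != NULL);
    p.tokens = tokens;
    p.pos = 0;
    return parse_statement(&p) && p.tokens[p.pos] == TOK_END;
}
```

A stream such as TOK_IF '(' TOK_EXPR ')' '{' TOK_STMT '}' TOK_END is accepted; dropping the ')' makes parse_program() return 0.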
21. PHP Bytecode
Using bytekit-cli to disassemble bytecode
1 <?php
2 if (TRUE) {
3 print '*';
4 }
5 ?>
sb@thinkpad ~ % bytekit if.php
bytekit-cli 1.0.0 by Sebastian Bergmann.
Filename: /home/sb/if.php
Function: main
Number of oplines: 8
line  #  opcode    result  operands
-----------------------------------------------------------------------------
2     0  EXT_STMT
      1  JMPZ              true, ->6
3     2  EXT_STMT
      3  PRINT     ~0      '*'
      4  FREE              ~0
4     5  JMP               ->6
6     6  EXT_STMT
      7  RETURN            1
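To see how the jump targets in this listing steer execution, the eight oplines can be fed to a toy interpreter. This is only a sketch under simplifying assumptions: the condition is stored as a plain int, PRINT appends to a buffer, and dispatch is a switch statement, whereas the real engine calls a handler per opline. The names opline, run and if_php are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { OP_EXT_STMT, OP_JMPZ, OP_JMP, OP_PRINT, OP_FREE, OP_RETURN } opcode;

typedef struct {
    opcode op;
    int cond;         /* JMPZ only: the (constant) condition */
    size_t target;    /* JMPZ and JMP: opline number to jump to */
    const char *text; /* PRINT only: the string to print */
} opline;

/* The eight oplines from the listing above, transcribed by hand. */
static const opline if_php[] = {
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_JMPZ,     1, 6, NULL }, /* JMPZ true, ->6 */
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_PRINT,    0, 0, "*"  },
    { OP_FREE,     0, 0, NULL },
    { OP_JMP,      0, 6, NULL }, /* JMP ->6 */
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_RETURN,   0, 0, NULL },
};

/* Execute until RETURN; collect PRINT output in out.
   Returns the number of oplines executed. */
static size_t run(const opline *ops, size_t count, char *out, size_t out_size)
{
    size_t pc = 0, executed = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    while (pc < count) {
        const opline *o = &ops[pc];
        executed++;
        switch (o->op) {
        case OP_JMPZ: /* jump only if the condition is zero */
            pc = o->cond ? pc + 1 : o->target;
            break;
        case OP_JMP:
            pc = o->target;
            break;
        case OP_PRINT:
            assert(o->text != NULL);
            strncat(out, o->text, out_size - strlen(out) - 1);
            pc++;
            break;
        case OP_RETURN:
            return executed;
        default: /* EXT_STMT and FREE have no visible effect here */
            pc++;
            break;
        }
    }
    return executed;
}
```

With the condition true, all eight oplines run and the buffer holds "*". With the condition set to 0, the JMPZ goes straight to opline 6, so only four oplines run and nothing is printed.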
22. PHP Bytecode
Using bytekit-cli to visualize bytecode
1 <?php
2 if (TRUE) {
3 print '*';
4 }
5 ?>
sb@thinkpad ~ % bytekit --graph /tmp --format svg if.php
23. How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
}
typedef struct _znode {
    int op_type;
    union {
        zval constant;
        zend_uint var;
        zend_uint opline_num;
        zend_op_array *op_array;
        zend_op *jmp_addr;
        struct {
            zend_uint var;
            zend_uint type;
        } EA;
    } u;
} znode;
zend_do_if_cond() is called when an if statement is compiled
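The znode shown above is a tagged union: op_type says which member of u is meaningful. A stripped-down sketch of that idea follows. IS_CONST, IS_TMP_VAR and IS_UNUSED are names borrowed from the Zend Engine, but their values here are arbitrary; mini_znode and format_operand are hypothetical, and a long stands in for the zval that the real structure holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Operand kinds; names borrowed from Zend, values arbitrary. */
typedef enum { IS_CONST, IS_TMP_VAR, IS_UNUSED } operand_type;

typedef struct {
    operand_type op_type; /* the tag: selects the valid union member */
    union {
        long constant;    /* stand-in for the zval of the real znode */
        unsigned int var; /* number of a temporary such as ~0 */
    } u;
} mini_znode;

/* Render an operand roughly the way a disassembler would print it.
   Returns the number of characters written. */
static int format_operand(const mini_znode *n, char *buf, size_t size)
{
    assert(n != NULL && buf != NULL && size > 0);
    switch (n->op_type) {
    case IS_CONST:
        return snprintf(buf, size, "%ld", n->u.constant);
    case IS_TMP_VAR:
        return snprintf(buf, size, "~%u", n->u.var);
    default: /* IS_UNUSED: nothing to show */
        buf[0] = '\0';
        return 0;
    }
}
```

Reading a member that does not match op_type would give meaningless data, which is why code that consumes operands checks the tag first.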
24. How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
}
struct _zend_op {
    opcode_handler_t handler;
    znode result;
    znode op1;
    znode op2;
    ulong extended_value;
    uint lineno;
    zend_uchar opcode;
};
Allocate a new opline in the current oparray
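Allocating an opline amounts to appending to a growable array. The sketch below shows one way to do that; small_op, small_op_array, next_op_number and next_op are hypothetical counterparts of the structures and helpers named on the slide, and the doubling growth strategy is an assumption, not a description of the engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char opcode;
    unsigned int lineno;
} small_op;

typedef struct {
    small_op *opcodes; /* heap-allocated array of oplines */
    size_t last;       /* number of oplines in use */
    size_t size;       /* number of oplines allocated */
} small_op_array;

/* Number the next opline will receive. */
static size_t next_op_number(const small_op_array *a)
{
    return a->last;
}

/* Append a zeroed opline and return a pointer to it. */
static small_op *next_op(small_op_array *a)
{
    if (a->last == a->size) {
        size_t new_size = a->size ? a->size * 2 : 4; /* assumed strategy */
        small_op *grown = realloc(a->opcodes, new_size * sizeof *grown);
        assert(grown != NULL); /* a real compiler would report the error */
        a->opcodes = grown;
        a->size = new_size;
    }
    memset(&a->opcodes[a->last], 0, sizeof a->opcodes[0]);
    return &a->opcodes[a->last++];
}
```

Because growing the array may move it, a pointer returned by next_op() is only safe until the next allocation. An opline number stays valid, which is a plausible reason to record if_cond_op_number alongside the pointer.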
25. How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
    opline->opcode = ZEND_JMPZ;
}
Set the opcode of the new opline to JMPZ (jump if zero)
26. How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
    opline->opcode = ZEND_JMPZ;
    opline->op1 = *cond;
}
Set the first operand of the new opline to the if condition
27. How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
    opline->opcode = ZEND_JMPZ;
    opline->op1 = *cond;
    closing_bracket_token->u.opline_num = if_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}
Perform bookkeeping tasks such as marking the second operand of the
new opline as unused and incrementing the backpatching counter for the
current oparray
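The opline number stored in closing_bracket_token matters because the JMPZ is emitted before its target is known: the jump is filled in later, once the body has been compiled. The sketch below illustrates this backpatching in isolation. It is loosely modeled on the if handling in Zend/zend_compile.c, but patch_op, patch_op_array, emit, if_cond, if_after_statement and if_end are hypothetical simplifications with a fixed-size array and no else branches.

```c
#include <assert.h>
#include <stddef.h>

enum { PATCH_JMPZ = 1, PATCH_JMP, PATCH_PRINT, PATCH_RETURN };

typedef struct {
    int opcode;
    size_t target; /* jump target; 0 until it has been patched */
} patch_op;

typedef struct {
    patch_op ops[16];
    size_t last; /* number of oplines emitted so far */
} patch_op_array;

/* Append an opline and return its number. */
static size_t emit(patch_op_array *a, int opcode)
{
    assert(a->last < sizeof a->ops / sizeof a->ops[0]);
    a->ops[a->last].opcode = opcode;
    a->ops[a->last].target = 0;
    return a->last++;
}

/* Condition seen: emit JMPZ, target still unknown; remember its number. */
static size_t if_cond(patch_op_array *a)
{
    return emit(a, PATCH_JMPZ);
}

/* Body compiled: emit the closing JMP, then point the JMPZ just past it. */
static size_t if_after_statement(patch_op_array *a, size_t if_cond_op_number)
{
    size_t jmp_op_number = emit(a, PATCH_JMP);
    a->ops[if_cond_op_number].target = a->last;
    return jmp_op_number;
}

/* Whole construct compiled: point the pending JMP at the next opline. */
static void if_end(patch_op_array *a, size_t jmp_op_number)
{
    a->ops[jmp_op_number].target = a->last;
}
```

Compiling a one-statement body this way gives JMPZ, PRINT, JMP, RETURN with both jumps targeting opline 3, the same shape as the two ->6 targets in the bytekit listing.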
31. Extending the PHP Compiler
Add token for unless to the scanner
Add rule for unless to the parser
Implement bytecode generation for
unless in the compiler
Add token for unless to ext/tokenizer
35. Add unless to the compiler
Zend/zend_compile.c
void zend_do_unless_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int unless_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
    opline->opcode = ZEND_JMPNZ;
    opline->op1 = *cond;
    closing_bracket_token->u.opline_num = unless_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}
All we have to do to generate code for the unless statement,
compared to generating code for the if statement, is to emit
JMPNZ (jump if not zero) instead of JMPZ (jump if zero)
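The effect of swapping the opcode can be checked with a truth table. In this sketch the conditional jump skips the body when it is taken, so the body runs exactly when the jump is not taken. The names cond_jump, jump_taken and body_runs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { COND_JMPZ, COND_JMPNZ } cond_jump;

/* True when the conditional jump is taken, i.e. the body is skipped. */
static bool jump_taken(cond_jump op, bool cond)
{
    assert(op == COND_JMPZ || op == COND_JMPNZ);
    return op == COND_JMPZ ? !cond : cond;
}

/* The body of if (JMPZ) or unless (JMPNZ) runs when the jump is not taken. */
static bool body_runs(cond_jump op, bool cond)
{
    return !jump_taken(op, cond);
}
```

For every condition, body_runs(COND_JMPNZ, cond) equals body_runs(COND_JMPZ, !cond): unless (expr) behaves like if (!expr) without an extra negation opline.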
36. Add unless to the compiler
The generated bytecode
1 <?php
2 unless (FALSE) {
3 print '*';
4 }
5 ?>
sb@thinkpad ~ % bytekit unless.php
bytekit-cli 1.0.0 by Sebastian Bergmann.
Filename: /home/sb/unless.php
Function: main
Number of oplines: 8
line  #  opcode    result  operands
-----------------------------------------------------------------------------
2     0  EXT_STMT
      1  JMPNZ             false, ->6
3     2  EXT_STMT
      3  PRINT     ~0      '*'
      4  FREE              ~0
4     5  JMP               ->6
6     6  EXT_STMT
      7  RETURN            1
37. Running the test
sb@thinkpad php-5.3-unless % make test TESTS=Zend/tests/unless.phpt
Build complete.
Don't forget to run 'make test'.
=====================================================================
PHP : /usr/local/src/php/php-5.3-unless/sapi/cli/php
PHP_SAPI : cli
PHP_VERSION : 5.3.1-dev
ZEND_VERSION: 2.3.0
PHP_OS : Linux 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 i686 GNU/Linux
INI actual : /usr/local/src/php/php-5.3-unless/tmp-php.ini
More .INIs :
CWD : /usr/local/src/php/php-5.3-unless
Extra dirs :
VALGRIND : Not used
=====================================================================
Running selected tests.
PASS unless statement [Zend/tests/unless.phpt]
=====================================================================
Number of tests : 1 1
Tests skipped : 0 ( 0.0%) --------
Tests warned : 0 ( 0.0%) ( 0.0%)
Tests failed : 0 ( 0.0%) ( 0.0%)
Expected fail : 0 ( 0.0%) ( 0.0%)
Tests passed : 1 (100.0%) (100.0%)
---------------------------------------------------------------------
Time taken : 0 seconds
=====================================================================
39. The End
Thank you for your interest!
These slides will be posted on
http://slideshare.net/sebastian_bergmann
40. Acknowledgements
Thomas Lee, whose Python Language
Internals presentation at OSDC 2008
inspired this presentation
Stefan Esser for creating the Bytekit
extension that provides PHP bytecode
access and analysis features
Derick Rethans, David Soria Parra, and
Scott MacVicar for reviewing these slides
41. References
http://www.php.net/manual/en/tokens.php
http://www.zapt.info/opcodes.html
“Extending and Embedding PHP”,
Sara Golemon
http://bytekit.org/
http://github.com/sebastianbergmann/bytekit-cli/
42. License
This presentation material is published under the Creative Commons Attribution-Share Alike
3.0 Unported license.
You are free:
✔ to Share – to copy, distribute and transmit the work.
✔ to Remix – to adapt the work.
Under the following conditions:
● Attribution. You must attribute the work in the manner specified by the author or
licensor (but not in any way that suggests that they endorse you or your use of the
work).
● Share Alike. If you alter, transform, or build upon this work, you may distribute the
resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this
work.
Any of the above conditions can be waived if you get permission from the copyright
holder.
Nothing in this license impairs or restricts the author's moral rights.