1. HipHop Virtual Machine
2. Agenda
Introduction
What is HipHop VM?
History and why it exists
Architecture and Features
General Architecture
Code cache
JIT
Garbage Collector
AdminServer
FastCGI
Extensions
HHVM-friendly PHP code
Parity
3. What is HipHop VM?
High-Level Stack-Based virtual machine that executes
PHP code
Created by Facebook in a (successful) attempt to reduce
load on their servers
New versions are released every 8 weeks on Thursday. 10
days before a release, the branch is cut and heavily tested.
4. History of HHVM (I)
Summer 2007: Facebook started developing
HPHPc, a PHP-to-C++ translator.
It worked by:
Building an AST based on the PHP code
Based on that AST, equivalent C++ code was generated
The C++ code was compiled to binary using g++
The binary was uploaded to the webservers where it
was executed
This resulted in significant performance
improvements, up to 500% in some cases compared
to PHP 5.2
5. History of HHVM (II)
HPHPc was so successful that the engineers decided
to give it a developer-friendly brother: HPHPi
HPHPi was just like HPHPc but it ran in interpreted mode only
(a.k.a. much slower)
However, it provided a lot of utilities for developers:
Debugger (known as HPHPd)
Setting watches, breakpoints
Static code analysis
Performance profiling
It also didn’t require the compilation step to run the code
HPHPc ran over 90% of FB production code by the end of 2009
HPHPc was open-sourced in February 2010
6. History of HHVM (III)
But good performance came at a cost:
Static compilation was very cumbersome
The binary was 1 GB in size, which was a problem since production code had
to be pushed to the servers DAILY
Maintaining compatibility between HPHPc and HPHPi was getting
more and more difficult (they used different formats for their ASTs)
So, at the beginning of 2010, FB started developing HHVM, which
was a better, longer-term solution
At first, HHVM replaced only HPHPi, while HPHPc remained in
production
But now, all of FB’s production servers run HHVM
FB claims a 3x to 10x speed boost and 0.5x – 5x memory reduction
compared to PHP + APC. This, of course, is on their own
code; most applications will see a more modest improvement
7. General Architecture (I)
General architecture is made up of:
2 webservers
A translator
A JIT compiler
A Garbage Collector
HHVM doesn’t support every OS:
It supports most flavours of Linux
It has some support for Mac OS X (only runs with JIT turned off)
There is no Windows support
The OS must have a 64-bit architecture in order for HHVM to
work
8. General Architecture (II)
HHVM follows these steps to execute a PHP
script:
Based on PHP code, build an AST (implementation for this was
reused from HPHPc)
Based on the AST, build Hip Hop Bytecode (HHBC), similar to
Java’s or CLR’s bytecode
Cache the HHBC
At runtime, pass the HHBC through the JIT compiler (if
enabled), which will transform it to machine code
Execute the machine code or, if JIT is disabled, execute the
HHBC in interpreted mode (not as fast, but still faster than Zend
PHP)
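To make "stack-based bytecode executed in interpreted mode" more concrete, here is a toy sketch of a stack machine. The opcode names (PushInt, Add, Mul) and the runBytecode() function are invented for illustration and are not real HHBC or HHVM APIs.

```php
<?php
// Toy stack machine. Opcode names are hypothetical, not real HHBC.
function runBytecode(array $program): array
{
    $stack = [];
    foreach ($program as $instr) {
        $op = $instr[0];
        switch ($op) {
            case 'PushInt':
                // Push an immediate integer operand onto the stack.
                $stack[] = $instr[1];
                break;
            case 'Add':
                // Pop two operands, push their sum.
                $right = array_pop($stack);
                $left = array_pop($stack);
                $stack[] = $left + $right;
                break;
            case 'Mul':
                // Pop two operands, push their product.
                $right = array_pop($stack);
                $left = array_pop($stack);
                $stack[] = $left * $right;
                break;
            default:
                throw new InvalidArgumentException("Unknown opcode: $op");
        }
    }
    return $stack;
}

// Bytecode a compiler might emit for the PHP expression (2 + 3) * 4.
$program = [['PushInt', 2], ['PushInt', 3], ['Add'], ['PushInt', 4], ['Mul']];
$result = runBytecode($program);
```

An interpreter walks the instructions one by one like this; a JIT would instead translate a frequently executed sequence into machine code once and reuse it.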
9. Code Cache (I)
When a request comes in, HHVM determines which file to
serve up, then checks if the file’s HHBC is in the SQLite-based
cache
If yes, it’s executed
If no, HHVM compiles it, optimizes it and stores it in cache
This is very similar to APC
There’s a warm-up period when a new server is
created, because the cache is empty
However, HHVM’s cache lives on disk, so it survives server
restarts and there will be no more warm-up periods for that
file
10. Code Cache (II)
But the warm-up period can be bypassed by doing pre-analysis
Pre-analysis means the cache can be generated before
HHVM starts up
Pre-analyser will actually work a little harder and will do a
better job at optimizing code
11. Code Cache (III)
By default, HHVM checks on each request whether the PHP file has
changed, in order to know if the cache must be updated
In RepoAuthoritative mode, this check is no longer
performed.
But be careful: if the file is not in the cache, you’ll get an
HTTP 404 error, even though the PHP file is right there
RepoAuthoritative is recommended for production because
it avoids a lot of disk IO and files change rarely anyway
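The cache behaviour described in the Code Cache slides can be sketched roughly as follows. This is a simplified in-memory model, not HHVM's implementation: the BytecodeCache class, its methods and the fake "bytecode" string are all hypothetical, and the real cache is stored in SQLite on disk.

```php
<?php
// Illustrative sketch only: class and method names are hypothetical, not HHVM APIs.
final class BytecodeCache
{
    private $entries = [];      // path => ['hash' => ..., 'bytecode' => ...]
    private $authoritative;
    public $compilations = 0;   // how many times "compilation" ran, for demonstration

    public function __construct(bool $authoritative = false)
    {
        $this->authoritative = $authoritative;
    }

    // Pre-analysis: fill the cache before any request arrives.
    public function precompile(string $path, string $source)
    {
        $this->store($path, $source);
    }

    // Returns the cached "bytecode", or null where a real server would answer 404.
    public function fetch(string $path, string $source)
    {
        if ($this->authoritative) {
            // No change check and no compilation: the cache is the only source of truth.
            return isset($this->entries[$path]) ? $this->entries[$path]['bytecode'] : null;
        }
        $hash = sha1($source);
        if (!isset($this->entries[$path]) || $this->entries[$path]['hash'] !== $hash) {
            $this->store($path, $source);   // miss or stale entry: compile and cache
        }
        return $this->entries[$path]['bytecode'];
    }

    private function store(string $path, string $source)
    {
        $this->compilations++;
        $this->entries[$path] = [
            'hash' => sha1($source),
            'bytecode' => 'HHBC(' . strlen($source) . ' bytes)', // stand-in for real compilation
        ];
    }
}

$cache = new BytecodeCache();
$cache->fetch('index.php', '<?php echo 1;');   // miss: compiles
$cache->fetch('index.php', '<?php echo 1;');   // hit: reuses the entry
$cache->fetch('index.php', '<?php echo 2;');   // file changed: recompiles

$prod = new BytecodeCache(true);
$prod->precompile('index.php', '<?php echo 1;');
$missing = $prod->fetch('other.php', '<?php echo 3;'); // null: file exists but was never cached
```

The authoritative branch never looks at the source at all, which is where the saved disk IO comes from, and also why a file missing from the cache cannot be served.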
12. JIT Compiler
Just-in-Time compilation is done during execution, not
before
It translates an intermediate form of code (in this case
HHBC) to machine code
A JIT compiler will constantly check to see which paths of
code are executed more frequently and try to optimize
those as best as possible
Since a JIT compiler will compile to machine code at
runtime, the resulting machine code will be optimized for
that platform or CPU, which will sometimes make it faster
than even static compilation
13. JIT Compiler (II)
HHVM uses so-called tracelets as the basic unit of JIT compilation
A tracelet is usually a loop because most programs spend
most of their time in some “hot loops” and subsequent
iterations of those loops take similar paths
A tracelet has 3 parts:
Type guard(s): prevents execution for incompatible types
Body
Link to subsequent tracelet(s)
Each tracelet has great freedom, but it is required to restore
the VM to a consistent state any time execution escapes
Tracelets have only ONE execution path, which means no
control flow, which makes them easy to optimize
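The three parts of a tracelet can be modelled in a few lines of PHP. This is only a conceptual sketch: makeTracelet() and the 'jit' / 'interpreter' labels are hypothetical, and a real tracelet is generated machine code rather than a closure.

```php
<?php
// Hypothetical model of a tracelet: a type guard, a specialised body, and a fallback.
function makeTracelet(callable $guard, callable $body, callable $fallback): callable
{
    return function (...$args) use ($guard, $body, $fallback) {
        // Type guard: only enter the specialised body when the observed types match.
        if ($guard(...$args)) {
            return $body(...$args);
        }
        // Guard failed: hand control back to a generic path (the interpreter in a real VM).
        return $fallback(...$args);
    };
}

$addInts = makeTracelet(
    function ($a, $b) { return is_int($a) && is_int($b); },   // guard: both operands are ints
    function ($a, $b) { return ['jit', $a + $b]; },           // body: straight-line, no branches
    function ($a, $b) { return ['interpreter', $a + $b]; }    // generic path for any other types
);

$fast = $addInts(2, 3);     // guard passes
$slow = $addInts(2, 1.5);   // guard fails because 1.5 is a float
```

Because the body only ever runs after the guard has passed, it can assume integer operands and skip every type check.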
14. Garbage Collector
Most modern languages have automatic memory
management
In the case of VMs, this is handled by a Garbage Collector
There are 2 major types of GCs:
Refcounting: for each object, there is a count that constantly
keeps track of how many references point to it
Tracing: periodically, during execution, the GC scans each
object and determines if it’s reachable. If not, it deletes it
Tracing is easier to implement and more efficient, but PHP
requires refcounting, so HHVM uses refcounting
FB engineers want to move to a tracing approach and they
might get it done someday
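One way to see why PHP code can depend on refcounting: in the standard PHP engine, an object's destructor runs as soon as its last reference disappears, so the moment of destruction is observable from user code. The Tracked class below is a hypothetical helper written for this demonstration.

```php
<?php
// Hypothetical helper that records when each instance is destroyed.
final class Tracked
{
    public static $log = [];
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function __destruct()
    {
        self::$log[] = "freed {$this->name}";
    }
}

$a = new Tracked('x');                 // one reference to the object
$b = $a;                               // two references (a second handle to the same object)
unset($a);                             // one reference left: the object survives
$stillAlive = (Tracked::$log === []);  // nothing has been freed yet
unset($b);                             // zero references: the destructor runs right here
```

With a tracing collector the destructor would run at some later collection point instead, which could break code that relies on this timing.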
15. AdminServer
HHVM will actually start 2 webservers:
Regular one on port 80
AdminServer on the port you specify
It can be accessed at a URI like
http://localhost:9191/check-health?auth=mypasshaha
The AdminServer can turn JIT on/off, show statistics about
traffic, queries, memcache, CPU load, number of active
threads and many more
16. FastCGI
HHVM supports FastCGI starting with version 2.3.0
(released in December 2013)
FastCGI is a communication protocol used by webservers to
communicate with other applications
The support for FastCGI means we don’t have to use
HHVM’s poor webserver, but instead use something like
Apache or nginx and let HHVM do what it does best:
execute PHP code at lightning speed
Supporting FastCGI will make HHVM enter even more
production systems and increase its popularity
17. Extensions
HHVM supports extensions just like PHP does
They can be written in PHP, C++ or a combination of the 2
Extensions are loaded automatically for every request, so you don’t have to
load an extension explicitly throughout your application
To use a custom extension, you add it to HHVM’s extensions and
then recompile HHVM. The resulting binary will contain
your extension and you can then use it
By default, HHVM already contains the most popular
extensions, like MySQL, PDO, DOM, cURL, PHAR, SimpleXML, JSON,
memcache and many others
It doesn’t include MySQLi at this time, though
18. HHVM-friendly Code (I)
Write code that HHVM can understand without
running, code that contains as much static detail as
possible
Avoid things like:
Dynamic function call: $function_name()
Dynamic variable name: $a = $$x + 1;
Functions like compact(), get_defined_vars(), extract() etc
Don't access dynamic properties of an object. If you want to
access it, declare it. Accessing dynamic properties must use
hashtable lookups, which are much slower.
Where possible, provide:
Type hinting in function parameters
Return type of functions should be as obvious as possible:
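A small contrast between the two styles, as a sketch. The function and class names are made up for illustration, and the scalar parameter types and the `: int` return declaration assume PHP 7+ (or Hack); on PHP 5 the same intent would be expressed with class/array hints and docblocks.

```php
<?php
// Harder to analyse statically: the callee is only known at run time.
function area_rect($w, $h)
{
    return $w * $h;
}

function areaDynamic($shape, $w, $h)
{
    $fn = 'area_' . $shape;   // dynamic function call: the target depends on a string
    return $fn($w, $h);
}

// Easier to analyse: declared properties, typed parameters, one obvious return type.
final class Rect
{
    public $width;            // declared up front, not created dynamically
    public $height;

    public function __construct(int $width, int $height)
    {
        $this->width = $width;
        $this->height = $height;
    }

    public function area(): int
    {
        return $this->width * $this->height;
    }
}

$dynamic = areaDynamic('rect', 2, 3);
$static = (new Rect(2, 3))->area();
```

Both versions compute the same value, but only in the second can the types and the call target be determined without running the code.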
19. HHVM-friendly Code (II)
Code that runs in global scope is never JIT-ed.
Any code anywhere can mutate the variables in the global scope.
So, since PHP is weakly typed, the JIT compiler cannot
predict the type of a variable in the global scope
Example:
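class B {
    public function __toString() {
        // Side effect: echoing $b turns the global $a from an int into a string
        $GLOBALS['a'] = 'Hello, world !';
        return '';  // __toString() must return a string
    }
}
$a = 5;
$b = new B;
echo $b;  // after this line, $a is no longer an int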
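A common way to act on this advice is to keep the global scope down to a single call and move the real work into a function, where variables are local and cannot be changed from elsewhere. A minimal sketch, with main() being just a conventional name chosen here:

```php
<?php
function main(): int
{
    $total = 0;   // local variable: code outside main() cannot change its type
    for ($i = 1; $i <= 100; $i++) {
        $total += $i;
    }
    return $total;
}

$sum = main();    // the only statement left in global scope
```

The loop now lives inside a function body, so it is eligible for JIT compilation, while the global scope contains nothing performance-sensitive.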
20. Parity (I)
All this is great, but can HHVM actually run real-world code? Well, in
December 2013, it looked like this (taken from the HHVM blog):
21. Parity (II)
The HHVM engineers’ main goal is to be able to run all PHP
frameworks by Q4 2014 or Q1 2015.