This document discusses techniques for optimizing JavaScript execution through just-in-time compilation (JIT). It describes how a basic JIT works by generating optimized machine code for common cases like integer math. It also covers challenges like handling different object shapes through inline caching of property accesses and generating polymorphic code stubs. The goal of these techniques is to make JavaScript execution faster while avoiding noticeable compilation pauses.
17. Displaying a Webpage: Layout engine applies CSS to DOM. Computes geometry of elements. JavaScript engine executes code. This can change the DOM, which triggers “reflow”. Finished layout is sent to graphics layer.
22. Think LISP or Self, not Java. Untyped: no type declarations. Multi-paradigm: objects, closures, first-class functions. Highly dynamic: objects are dictionaries.
27. Properties may be deleted or added at any time! var point = { x: 5, y: 10 }; delete point.x; point.z = 12;
28. Prototypes: Every object can have a prototype object. If a property is not found on an object, its prototype is searched instead … and the prototype’s prototype, etc.
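The lookup rule on this slide can be written out by hand. The following is a minimal sketch, not how any engine implements it: the objects `base` and `point` and the helper `lookup` are hypothetical names chosen for illustration, and the helper ignores getters and other accessor properties.

```javascript
// Hypothetical objects for illustration; these names are not from the talk.
const base = {
  kind: "point",
  describe() { return this.kind + " at " + this.x; },
};

// Object.create makes a new object whose prototype is `base`.
const point = Object.create(base);
point.x = 5;

// Hand-written version of the lookup the slide describes:
// check the object itself, then walk up the prototype chain.
function lookup(obj, name) {
  for (let o = obj; o !== null; o = Object.getPrototypeOf(o)) {
    if (Object.prototype.hasOwnProperty.call(o, name)) {
      return o[name];
    }
  }
  return undefined; // not found anywhere on the chain
}

console.log(lookup(point, "x"));    // own property
console.log(lookup(point, "kind")); // found one level up, on `base`
console.log(lookup(point, "nope")); // missing everywhere
```

Each step up the chain is another dictionary lookup, which is part of why property access is a hot spot for the optimizations discussed later.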
56. Garbage Collection: Need to reclaim memory without pausing user workflow, animations, etc. Need very fast object allocation. Consider lots of small objects like points, vectors. Reading and writing to the heap must be fast.
57. Mark and Sweep: All live objects are found via a recursive traversal of the root value set. All dead objects are added to a free list. Very slow to traverse entire heap. Building free lists can bring “dead” memory into cache.
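The two phases on the Mark and Sweep slide can be sketched on a toy heap. This is an illustration under stated assumptions, not a real collector: cells are plain JavaScript objects with a `marked` bit and a `refs` list, and the helper names (`makeCell`, `mark`, `sweep`) are made up for this example.

```javascript
// Toy heap cell: a mark bit plus a list of outgoing references.
function makeCell(name) {
  return { name, refs: [], marked: false };
}

// Mark phase: visit everything reachable from the roots.
// An explicit stack replaces recursion to avoid deep call stacks.
function mark(roots) {
  const stack = [...roots];
  while (stack.length > 0) {
    const cell = stack.pop();
    if (cell.marked) continue; // already visited (handles cycles)
    cell.marked = true;
    for (const ref of cell.refs) stack.push(ref);
  }
}

// Sweep phase: walk the ENTIRE heap. Unmarked cells go on the free list;
// marked cells survive and get their mark bit cleared for the next cycle.
function sweep(heap, freeList) {
  const survivors = [];
  for (const cell of heap) {
    if (cell.marked) {
      cell.marked = false;
      survivors.push(cell);
    } else {
      cell.refs = [];
      freeList.push(cell);
    }
  }
  return survivors;
}

const a = makeCell("a"), b = makeCell("b"), c = makeCell("c"), d = makeCell("d");
a.refs.push(b);
b.refs.push(a); // a cycle between live cells
c.refs.push(d); // c and d are unreachable from the roots
let heap = [a, b, c, d];
const freeList = [];

mark([a]);
heap = sweep(heap, freeList);
console.log(heap.map((x) => x.name));     // surviving cells
console.log(freeList.map((x) => x.name)); // reclaimed cells
```

Note that `sweep` touches every cell, live or dead, which is the cost the slide calls out: the work is proportional to the whole heap, and it pulls dead memory into cache.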
61. After Minor GC [Diagram labels: Newborn Space (Active 8MB, Inactive 8MB), Tenured Space]
62. Barriers: Incremental marking and generational GC need read or write barriers. Read barriers are much slower. Azul Systems has a hardware read barrier. Write barrier seems to be well-predicted?
63. Unknowns: How much do cache effects matter? Memory GC has to touch. Locality of objects. What do we need to consider to make GC fast?
69. Interpreter
    case OP_ADD: {
        Value rhs = POP();  // the right operand was pushed last, so it is popped first
        Value lhs = POP();
        Value result;
        if (lhs.isInt32() && rhs.isInt32()) {
            int left = lhs.toInt32();
            int right = rhs.toInt32();
            if (!AddOverflows(left, right, left + right))
                result.setInt32(left + right);
            else
                result.setNumber(double(left) + double(right));
        } else if (lhs.isString() || rhs.isString()) {
            String *left = ValueToString(lhs);
            String *right = ValueToString(rhs);
            String *r = Concatenate(left, right);
            result.setString(r);
        } else {
            double left = ValueToNumber(lhs);
            double right = ValueToNumber(rhs);
            result.setDouble(left + right);
        }
        PUSH(result);
        break;
    }
70. Interpreter: Very slow! Lots of opcode and type dispatch. Lots of interpreter stack traffic. Just-in-time compilation solves both.
71. Just In Time: Compilation must be very fast. Can’t introduce noticeable pauses. People care about memory use. Can’t JIT everything. Can’t create bloated code. May have to discard code at any time.
72. Basic JIT: JägerMonkey in Firefox 4. Every opcode has a hand-coded template of assembly (registers left blank). Method at a time: single pass through bytecode stream! Compiler uses assembly templates corresponding to each opcode.
73. Bytecode
    GETARG 0 ; fetch x
    GETARG 1 ; fetch y
    ADD
    RETURN

    function Add(x, y) { return x + y; }
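To make the opcode dispatch and stack traffic concrete, here is a minimal sketch of a stack interpreter for the four-instruction bytecode on this slide. The instruction encoding (arrays of `[opcode, operand]`) and the function name `interpret` are assumptions made for this example; a real engine uses a compact binary encoding and tagged values.

```javascript
// Hypothetical encoding: each instruction is [opcode, optional operand].
const program = [
  ["GETARG", 0], // fetch x
  ["GETARG", 1], // fetch y
  ["ADD"],
  ["RETURN"],
];

function interpret(code, args) {
  const stack = []; // the interpreter's operand stack
  for (let pc = 0; pc < code.length; pc++) {
    const [op, operand] = code[pc];
    switch (op) { // opcode dispatch: one branch per instruction executed
      case "GETARG":
        stack.push(args[operand]);
        break;
      case "ADD": {
        const rhs = stack.pop(); // pushed last, popped first
        const lhs = stack.pop();
        // The host language's `+` performs the type dispatch here; a real
        // interpreter must test the operand types explicitly.
        stack.push(lhs + rhs);
        break;
      }
      case "RETURN":
        return stack.pop();
      default:
        throw new Error("unknown opcode: " + op);
    }
  }
  return undefined; // fell off the end without RETURN
}

console.log(interpret(program, [2, 3]));
console.log(interpret(program, ["a", 1]));
```

Even for `x + y`, the loop performs four dispatches, three pushes, and three pops. A method JIT removes the dispatch by emitting the code for each opcode back to back.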
86.
    Inline:
        if (arg0.type != INT32) goto slow_add;
        if (arg1.type != INT32) goto slow_add;
        R0 = arg0.data;
        R1 = arg1.data;
        R2 = R0 + R1;
        if (OVERFLOWED) goto slow_add;
      rejoin:
    Out-of-line:
      slow_add:
        Value result = Interpreter::Add(arg0, arg1);
        R2 = result.data;
        goto rejoin;
87. Rejoining
    Inline:
        if (arg0.type != INT32) goto slow_add;
        if (arg1.type != INT32) goto slow_add;
        R0 = arg0.data;
        R1 = arg1.data;
        R2 = R0 + R1;
        if (OVERFLOWED) goto slow_add;
        R3 = TYPE_INT32;
      rejoin:
    Out-of-line:
      slow_add:
        Value result = runtime::Add(arg0, arg1);
        R2 = result.data;
        R3 = result.type;
        goto rejoin;
88. Final Code
    Inline:
        if (arg0.type != INT32) goto slow_add;
        if (arg1.type != INT32) goto slow_add;
        R0 = arg0.data;
        R1 = arg1.data;
        R2 = R0 + R1;
        if (OVERFLOWED) goto slow_add;
        R3 = TYPE_INT32;
      rejoin:
        return Value(R3, R2);
    Out-of-line:
      slow_add:
        Value result = Interpreter::Add(arg0, arg1);
        R2 = result.data;
        R3 = result.type;
        goto rejoin;
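The inline/out-of-line split can be mimicked in ordinary JavaScript. This is a sketch of the control flow only, not of generated machine code: values are modeled as hypothetical `{ type, data }` pairs, and `box`, `slowAdd`, and `add` are names invented for this example.

```javascript
// Illustrative type tags, standing in for the INT32 tag on the slides.
const INT32 = "int32", DOUBLE = "double", STRING = "string";

// Box a JavaScript value into a { type, data } pair.
function box(v) {
  if (typeof v === "string") return { type: STRING, data: v };
  const fitsInt32 = Number.isInteger(v) && v >= -2147483648 && v <= 2147483647;
  if (fitsInt32 && !Object.is(v, -0)) return { type: INT32, data: v };
  return { type: DOUBLE, data: v };
}

let slowCalls = 0; // counts how often the out-of-line path runs

// Out-of-line path: handles every case, like the interpreter's OP_ADD.
function slowAdd(a, b) {
  slowCalls++;
  return box(a.data + b.data);
}

// Inline path: two type guards, one add, one overflow check.
function add(a, b) {
  if (a.type === INT32 && b.type === INT32) {
    const sum = a.data + b.data; // exact: doubles represent all int32 sums
    if ((sum | 0) === sum) {     // still fits in int32, so no overflow
      return { type: INT32, data: sum };
    }
  }
  return slowAdd(a, b); // any guard failed: take the slow path
}

console.log(add(box(2), box(3)));          // stays on the fast path
console.log(add(box(2147483647), box(1))); // overflows, falls back
console.log(add(box("a"), box(1)));        // wrong type, falls back
```

As on the slides, the result's type tag has to be produced on both paths, because nothing downstream may assume the sum is an integer.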
89. Observations: Even though we have fast paths, no type information can flow in between opcodes. Cannot assume result of ADD is integer. Means more branches. Types increase register pressure.
93. We want it to work for >1 objects: function f(x) { return x.y; }
94. Object Access Observation: Usually, objects flowing through an access site look the same. If we knew an object’s layout, we could bypass the dictionary lookup.
99. Familiar Problems: We don’t know an object’s shape during JIT compilation ... or even that a property access receives an object! Solution: leave code “blank,” lazily generate it later.
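The "leave it blank, fill it in later" idea can be sketched as a per-site cache. This is a simplified model under stated assumptions: objects carry an explicit `shape` (a map from property name to slot index) and a `slots` array, and all names here (`makeShape`, `makeObject`, `makeGetPropSite`) are hypothetical. A real JIT patches machine code rather than updating fields, and this sketch is monomorphic: it remembers only the most recent shape.

```javascript
// A shape maps property names to slot indices; objects with the same
// layout share one shape object.
function makeShape(names) {
  const offsets = new Map();
  names.forEach((n, i) => offsets.set(n, i));
  return { offsets };
}

function makeObject(shape, values) {
  return { shape, slots: values.slice() };
}

// One cache per access site (e.g. the `x.y` in `function f(x)`).
// It starts blank and is filled in on the first miss.
function makeGetPropSite(name) {
  const site = { cachedShape: null, cachedOffset: -1, hits: 0, misses: 0 };
  site.get = function (obj) {
    if (obj.shape === site.cachedShape) {
      site.hits++;
      return obj.slots[site.cachedOffset]; // fast path: one compare, one load
    }
    site.misses++;
    const offset = obj.shape.offsets.get(name); // slow path: dictionary lookup
    if (offset === undefined) return undefined;
    site.cachedShape = obj.shape; // "patch" the site for next time
    site.cachedOffset = offset;
    return obj.slots[offset];
  };
  return site;
}

const pointShape = makeShape(["x", "y"]);
const site = makeGetPropSite("y");
const p1 = makeObject(pointShape, [5, 10]);
const p2 = makeObject(pointShape, [7, 14]);

console.log(site.get(p1), site.get(p2)); // first call misses, second hits
console.log(site.hits, site.misses);
```

A site that sees several shapes would keep thrashing this single-entry cache; handling that case is what the polymorphic stubs mentioned in the summary are for.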
138. Build SSA
    GETARG 0 ; fetch x
    GETARG 1 ; fetch y
    ADD
    GETARG 0 ; fetch x
    ADD
    RETURN

    v0 = arg0
    v1 = arg1
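For straight-line bytecode like this, SSA construction amounts to simulating the operand stack with value names instead of values. The sketch below is an assumption-laden illustration: the `[opcode, operand]` encoding, the textual instruction format, and the choice to define each argument once and reuse it are all made up for this example, and control flow (which needs phi nodes) is not handled.

```javascript
// Translate straight-line stack bytecode into SSA-style instructions.
function buildSSA(code) {
  const instructions = [];      // emitted SSA instructions, in order
  const stack = [];             // abstract stack holding value NAMES
  const argValues = new Map();  // argument index -> SSA name (defined once)
  let next = 0;

  // Emit a new definition and return its fresh name.
  const define = (text) => {
    const name = "v" + next++;
    instructions.push(name + " = " + text);
    return name;
  };

  for (const [op, operand] of code) {
    switch (op) {
      case "GETARG":
        if (!argValues.has(operand)) {
          argValues.set(operand, define("arg" + operand));
        }
        stack.push(argValues.get(operand));
        break;
      case "ADD": {
        const rhs = stack.pop();
        const lhs = stack.pop();
        stack.push(define("add " + lhs + ", " + rhs));
        break;
      }
      case "RETURN":
        instructions.push("return " + stack.pop());
        break;
      default:
        throw new Error("unknown opcode: " + op);
    }
  }
  return instructions;
}

const bytecode = [
  ["GETARG", 0], ["GETARG", 1], ["ADD"],
  ["GETARG", 0], ["ADD"], ["RETURN"],
];
console.log(buildSSA(bytecode).join("\n"));
```

The operand stack disappears entirely in the output: every intermediate result has a name, which is what lets later passes attach types and assign registers.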
140. Type Oracle: Need a mechanism to inform compiler about likely types. Type Oracle: given program counter, returns types for inputs and outputs. May use any data source: runtime profiling, type inference, etc.
141. Type Oracle: For this example, assume the type oracle returns “integer” for the output and both inputs to all ADD operations.
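One possible data source is runtime profiling. The sketch below assumes a hypothetical `TypeOracle` class that records the types observed at each program counter and answers queries with a single likely type, or "unknown" when observations disagree. The class, its method names, and the type tags are illustrative, not an actual engine API.

```javascript
// Classify a runtime value into a coarse, illustrative type tag.
// For brevity this treats -0 as an integer.
function typeTag(v) {
  if (typeof v === "number") return (v | 0) === v ? "integer" : "double";
  if (typeof v === "string") return "string";
  return "other";
}

class TypeOracle {
  constructor() {
    this.profiles = new Map(); // pc -> { inputs: [Set, ...], output: Set }
  }

  // Called by the profiler each time the instruction at `pc` runs.
  record(pc, inputs, output) {
    let p = this.profiles.get(pc);
    if (!p) {
      p = { inputs: inputs.map(() => new Set()), output: new Set() };
      this.profiles.set(pc, p);
    }
    inputs.forEach((v, i) => p.inputs[i].add(typeTag(v)));
    p.output.add(typeTag(output));
  }

  // Called by the compiler: returns likely types, or null if `pc`
  // has never been profiled.
  query(pc) {
    const p = this.profiles.get(pc);
    if (!p) return null;
    const likely = (set) => (set.size === 1 ? [...set][0] : "unknown");
    return { inputs: p.inputs.map(likely), output: likely(p.output) };
  }
}

const oracle = new TypeOracle();
oracle.record(2, [1, 2], 3);    // the ADD at pc 2 saw integers...
oracle.record(2, [10, 20], 30); // ...twice
console.log(oracle.query(2));
```

The compiler still has to guard against the oracle being wrong, since a profile only describes the past.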
153. Register Allocation: Linear Scan Register Allocation, based on Christian Wimmer’s work: “Linear Scan Register Allocation for the Java HotSpot™ Client Compiler,” Master's thesis, Institute for System Software, Johannes Kepler University Linz, 2004; “Linear Scan Register Allocation on SSA Form,” in Proceedings of the International Symposium on Code Generation and Optimization, pages 170–179, ACM Press, 2010. Same algorithm used in the HotSpot JVM.
I’m going to start off by showing my favorite demo, which I think really captures what we’re enabling by making the browser as fast as possible.
So this is Doom, translated - actually compiled - from C++ to JavaScript, being played in Firefox. In addition to JavaScript it’s using the new 2D drawing element called Canvas. This demo is completely native to the browser. There’s no Flash, it’s all open web technologies. And it’s managing to get a pretty good framerate, 30fps, despite being mostly unoptimized. This is a really cool demo for me because, three years ago, this kind of demo on the web was just impossible without proprietary extensions. Some of the technology just wasn’t there, and the rest of it was anywhere from ten to a hundred times slower. So we’ve come a long way. Unfortunately, all I have of the demo is a recording; it was playable online for about a day, but id Software requested that we take it down.
To step back for a minute, I’m going to briefly overview how the browser and the major web technologies interact, and then jump into JavaScript which is the heart of this talk.
This is a much less gory example, a major company’s website in a popular web browser. The process of getting from the intel.com URL to this display is actually pretty complicated.
The browser then sends intel.com this short plaintext message which says, “give me the webpage corresponding to the top of your website’s hierarchy.”
How does the browser get from all of this to a fully rendered website?
Now that we’re familiar with the language’s main features, let’s talk about implementing a JavaScript engine. A JavaScript implementation has four main pieces.
One of the challenges of a JavaScript engine is dealing with the lack of type declarations. In order to dispatch the correct computation, the runtime must be able to query a variable or field’s current type.
We’re done with data representation and garbage collection, so it’s time for the really fun stuff: running JavaScript code, and making it run fast. One way to run code, but not make it run fast, is to use an interpreter.