7. Two memory accesses before the actual jump
• Memory accesses are serialized – the second load depends on the first
• CPU pipeline is blocked
Memory access timings
• L1 cache ~0.5 ns
• L2 cache ~7 ns
• RAM ~100 ns
Cost of virtual call
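The two dependent loads can be seen at an ordinary interface call site. A minimal Java sketch (the `Shape`/`Circle`/`Square` names are illustrative, not from the slides), with the per-call machine-level steps spelled out in comments:

```java
// Illustrative sketch of what a virtual call site costs at the CPU level.
interface Shape { double area(); }

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

public class VirtualCall {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape sh : shapes) {
            // sh.area() is a virtual call. Without JIT help it means:
            //   1st memory read: load the class metadata pointer from the object header
            //   2nd memory read: load the target address from the class's vtable slot
            //   then an indirect jump -- the CPU cannot know the target until both
            //   dependent loads complete, so the pipeline stalls
            sum += sh.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1), new Square(2) };
        System.out.println(total(shapes)); // Math.PI + 4
    }
}
```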
8. Fields are stored in a hash table
Access to a field
• Arithmetic operation
• Memory read
• Condition check
• Memory read
Cost of dynamic class metadata
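The four steps above can be simulated in Java by modeling a dynamic-language object as a `HashMap` (an illustrative stand-in, not how any particular VM is implemented):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of reading obj.x when fields live in a hash table
// instead of at fixed offsets.
public class DynamicField {
    static Object getField(Map<String, Object> obj, String name) {
        // Inside the hash lookup the runtime must do, at minimum:
        //   arithmetic:      compute hash(name) and the bucket index
        //   memory read:     load the bucket entry
        //   condition check: compare the stored key with `name`
        //   memory read:     load the value
        // A statically typed field is a single load at a known offset instead.
        return obj.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> point = new HashMap<>();
        point.put("x", 3);
        point.put("y", 4);
        System.out.println(getField(point, "x")); // 3
    }
}
```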
9. Are interpreters really that slow?
switch(byteCode) {
case STORE: ...
case LOAD: ...
case ASTORE: ...
case ALOAD: ...
...
}
?
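The switch-dispatch loop above can be made concrete with a toy stack machine (assumed opcodes, not real JVM bytecode):

```java
// Minimal switch-dispatch interpreter for a toy stack machine.
public class ToyInterpreter {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            int op = code[pc++];
            switch (op) {              // every instruction pays for this dispatch
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case MUL:  stack[sp - 2] *= stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];
                default: throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] code = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(code)); // 20
    }
}
```

Every instruction pays for the fetch, the switch dispatch, and an unpredictable indirect jump, which is where the naive-interpreter overhead comes from.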
10. Fast interpreter in HotSpot JVM
Byte code interpreter in HotSpot JVM
• Each byte code instruction has a routine written
in assembly language
• Dispatch – jump to the corresponding routine
• Each routine ends with jump back to dispatch
No stack frame is produced per instruction
Dispatch table and code are well cached
CPU pipeline is kept busy
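Only the table structure can be sketched in Java; HotSpot generates the routines in assembly and jumps between them with no call frames, whereas here each handler is an ordinary method call. The names and opcodes below are illustrative:

```java
// Illustrative dispatch-table structure: one routine per opcode, stored in a
// table indexed by the opcode; the dispatch loop fetches, indexes the table,
// runs the routine, and loops straight back to dispatch.
public class TableDispatch {
    static final int PUSH = 0, ADD = 1, HALT = 2;

    interface Routine { void run(TableDispatch vm); }

    final int[] code;
    final int[] stack = new int[16];
    int pc = 0, sp = 0;
    boolean halted = false;

    TableDispatch(int[] code) { this.code = code; }

    // Dispatch table: opcode -> routine (HotSpot keeps the analogous
    // template table and generated code hot in cache).
    static final Routine[] TABLE = {
        vm -> vm.stack[vm.sp++] = vm.code[vm.pc++],                     // PUSH
        vm -> { vm.stack[vm.sp - 2] += vm.stack[vm.sp - 1]; vm.sp--; }, // ADD
        vm -> vm.halted = true                                          // HALT
    };

    int run() {
        while (!halted) {
            TABLE[code[pc++]].run(this);  // dispatch: index table, enter routine
        }
        return stack[sp - 1];
    }

    public static void main(String[] args) {
        System.out.println(new TableDispatch(
            new int[]{ PUSH, 4, PUSH, 6, ADD, HALT }).run()); // 10
    }
}
```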
11. JIT compilation approaches
Classic
Method based compilation
+ runtime profiling
+ profiling driven optimization
Tracing JIT
Recording whole execution paths (trace)
+ fallback to interpreter if execution deviates from the path
+ maintain a tree of compiled traces
13. Tracing JIT
Interpretation mode
• Record actions and branch conditions (recording a trace)
Profiling
• Detect “hot” traces
Trace compilation
• Non-branching code is generated
• Guards instead of branching
• Whole-trace optimization
• Guard violation – fallback to interpreter
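The guard idea can be sketched in plain Java (illustrative names, with the fallback to the interpreter simulated by throwing):

```java
// Illustrative sketch of trace compilation: the hot path through a branchy
// loop becomes straight-line code, with the branch turned into a guard.
public class TraceSketch {

    // Original branchy loop, as the interpreter sees and records it.
    static int sumPositives(int[] a) {
        int s = 0;
        for (int x : a) {
            if (x > 0) s += x;   // branch condition recorded while tracing
        }
        return s;
    }

    // What a compiled trace for the "x > 0" path conceptually looks like:
    // no branch on the hot path, only a guard that bails out to the
    // interpreter (simulated here by an exception) when the assumption breaks.
    static int compiledTraceStep(int s, int x) {
        if (!(x > 0)) throw new IllegalStateException("guard failed: back to interpreter");
        return s + x;            // straight-line code, optimizable as a whole
    }

    public static void main(String[] args) {
        System.out.println(sumPositives(new int[]{1, -2, 3})); // 4
        System.out.println(compiledTraceStep(0, 5));           // 5
    }
}
```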
14. Tracing JIT
Strong
• Devirtualization and inlining
• Hash lookups are also “deconditioned”
• Efficient “hot loops” optimization
Weak
• Tracing SLOWS down interpretation
• Long “warm up” time
15. Dynamic types problem
V8 – hidden (shadow) classes
• Hidden classes are strongly typed
TraceMonkey – shape inference/property cache
• Inline caching in compiled code
LuaJIT – hash table access optimized trace
HREFK: if (hash[17].key != key) goto exit
HLOAD: x = hash[17].value
-or-
HSTORE: hash[17].value = x
18. HotSpot JVM JIT
• Fast interpreter
• Two JIT compilers (C1 / C2)
• Runtime profiling
• “Deoptimization” of code on the fly
• On Stack Replacement (OSR)
19. Devirtualization
Call site profiling
• Monomorphic
– a single destination for the majority of calls
• Bimorphic
– there are two most frequent destinations
• Polymorphic
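What devirtualization of a monomorphic site amounts to can be sketched as a cheap class check guarding an inlined body (illustrative `Greeter`/`English` names, not from the slides):

```java
// Illustrative sketch of guarded devirtualization for a monomorphic call site.
interface Greeter { String greet(); }

final class English implements Greeter {
    public String greet() { return "hello"; }
}

public class Devirt {
    // Conceptually what the JIT emits for a site profiled as monomorphic
    // on English:
    static String greetDevirtualized(Greeter g) {
        if (g.getClass() == English.class) {
            return "hello";      // inlined body of English.greet(), no vtable lookup
        }
        return g.greet();        // uncommon path: full virtual dispatch
    }

    public static void main(String[] args) {
        System.out.println(greetDevirtualized(new English())); // hello
    }
}
```

A bimorphic site gets two such checks; past that, the JIT typically keeps the plain virtual call.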
21. Incremental compilation
Collections.indexedBinarySearch()
MyPojo
…
int mid = (low + high) >>> 1;
Comparable<? super T> midVal = list.get(mid);   // polymorphic call site
int cmp = midVal.compareTo(key);                // polymorphic call site
…
List<String> keys = new ArrayList<String>();
List<String> vals = new ArrayList<String>();
public String get(String key) {
    int n = Collections.binarySearch(keys, key);
    return n < 0 ? null : vals.get(n);
}
22. Incremental compilation
MyPojo.get() is compiled by JIT
– Collections.binarySearch() got inlined
Calls in Collections.binarySearch() become
monomorphic
The JIT continues profiling at runtime
Calls to get() and compareTo() will be
inlined once MyPojo.get() is recompiled
23. On Stack Replacement
The JIT can recompile main() and patch the
return address on the stack while execution
is inside some method called from the loop
public static void main(String[] args) {
    long s = System.nanoTime();
    for(int i = 0; i != N; ++i) {
        /* a lot of code */
        ...
    }
    long avg = (System.nanoTime() - s) / N;
}
24. Escape analysis
Heritage of old days – the dreaded synchronized
buf is not used outside of the method
all methods of buf are inlined
synchronization code can be removed
public String toString() {
StringBuffer buf = new StringBuffer();
buf.append("X=").append(x);
buf.append(",Y=").append(y);
return buf.toString();
}
25. Scalar replacement
After inlining of distance() in length()
JIT will replace Point objects by a few scalar variables
public double length() {
return distance(
new Point(ax, ay),
new Point(bx, by));
}
public double distance(Point a, Point b) {
double w = a.x - b.x;
double h = a.y - b.y;
return Math.sqrt(w*w + h*h);
}
26. Garbage collection and JIT
JIT can inline final static fields
• Memory address is placed in compiled code
• GC treats compiled code much like a data structure
Compiled methods act as GC roots
GC will fix the address inside compiled code if the object is relocated
public class Singleton {
public static final
Singleton INSTANCE = new Singleton();
}