7. Two memory accesses before the actual jump
• Memory accesses are serialized – the second load depends on the first
• CPU pipeline is blocked
Memory access timings
• L1 cache ~0.5 ns
• L2 cache ~7 ns
• RAM ~100 ns
Cost of virtual call
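The two dependent loads can be seen at an ordinary interface call site. A minimal Java sketch (the `Shape`/`Circle`/`Square` names are illustrative, not from the slides), with the per-call machine-level steps spelled out in comments:

```java
// Illustrative sketch of what a virtual call site costs at the CPU level.
interface Shape { double area(); }

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

public class VirtualCall {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape sh : shapes) {
            // sh.area() is a virtual call. Without JIT help it means:
            //   1st memory read: load the class metadata pointer from the object header
            //   2nd memory read: load the target address from the class's vtable slot
            //   then an indirect jump -- the CPU cannot know the target until both
            //   dependent loads complete, so the pipeline stalls
            sum += sh.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1), new Square(2) };
        System.out.println(total(shapes)); // Math.PI + 4
    }
}
```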
8. Fields are stored in a hash table
Access to a field
• Arithmetic operation
• Memory read
• Condition check
• Memory read
Cost of dynamic class metadata
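The four steps above can be simulated in Java by modeling a dynamic-language object as a `HashMap` (an illustrative stand-in, not how any particular VM is implemented):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of reading obj.x when fields live in a hash table
// instead of at fixed offsets.
public class DynamicField {
    static Object getField(Map<String, Object> obj, String name) {
        // Inside the hash lookup the runtime must do, at minimum:
        //   arithmetic:      compute hash(name) and the bucket index
        //   memory read:     load the bucket entry
        //   condition check: compare the stored key with `name`
        //   memory read:     load the value
        // A statically typed field is a single load at a known offset instead.
        return obj.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> point = new HashMap<>();
        point.put("x", 3);
        point.put("y", 4);
        System.out.println(getField(point, "x")); // 3
    }
}
```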
9. Are interpreters really that slow?
switch(byteCode) {
case STORE: ...
case LOAD: ...
case ASTORE: ...
case ALOAD: ...
...
}
?
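The switch-dispatch loop above can be made concrete with a toy stack machine (assumed opcodes, not real JVM bytecode):

```java
// Minimal switch-dispatch interpreter for a toy stack machine.
public class ToyInterpreter {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            int op = code[pc++];
            switch (op) {              // every instruction pays for this dispatch
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case MUL:  stack[sp - 2] *= stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];
                default: throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] code = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(code)); // 20
    }
}
```

Every instruction pays for the fetch, the switch dispatch, and an unpredictable indirect jump, which is where the naive-interpreter overhead comes from.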
10. Fast interpreter in HotSpot JVM
Byte code interpreter in HotSpot JVM
• Each byte code instruction has a routine written
in assembly language
• Dispatch – jump to the corresponding routine
• Each routine ends with jump back to dispatch
No stack frame is produced per instruction
Dispatch table and code are well cached
CPU pipeline is kept busy
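Only the table structure can be sketched in Java; HotSpot generates the routines in assembly and jumps between them with no call frames, whereas here each handler is an ordinary method call. The names and opcodes below are illustrative:

```java
// Illustrative dispatch-table structure: one routine per opcode, stored in a
// table indexed by the opcode; the dispatch loop fetches, indexes the table,
// runs the routine, and loops straight back to dispatch.
public class TableDispatch {
    static final int PUSH = 0, ADD = 1, HALT = 2;

    interface Routine { void run(TableDispatch vm); }

    final int[] code;
    final int[] stack = new int[16];
    int pc = 0, sp = 0;
    boolean halted = false;

    TableDispatch(int[] code) { this.code = code; }

    // Dispatch table: opcode -> routine (HotSpot keeps the analogous
    // template table and generated code hot in cache).
    static final Routine[] TABLE = {
        vm -> vm.stack[vm.sp++] = vm.code[vm.pc++],                     // PUSH
        vm -> { vm.stack[vm.sp - 2] += vm.stack[vm.sp - 1]; vm.sp--; }, // ADD
        vm -> vm.halted = true                                          // HALT
    };

    int run() {
        while (!halted) {
            TABLE[code[pc++]].run(this);  // dispatch: index table, enter routine
        }
        return stack[sp - 1];
    }

    public static void main(String[] args) {
        System.out.println(new TableDispatch(
            new int[]{ PUSH, 4, PUSH, 6, ADD, HALT }).run()); // 10
    }
}
```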
11. JIT compilation approaches
Classic
Method based compilation
+ runtime profiling
+ profiling driven optimization
Tracing JIT
Recording whole execution paths (trace)
+ fallback to interpreter if execution deviates from the path
+ maintain a tree of compiled traces
13. Tracing JIT
Interpretation mode
• Record actions and branch conditions (recording a trace)
Profiling
• Detect “hot” traces
Trace compilation
• Non-branching code is generated
• Guards instead of branching
• Whole-trace optimization
• Guard violation – fallback to interpreter
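The guard idea can be sketched in plain Java (illustrative names, with the fallback to the interpreter simulated by throwing):

```java
// Illustrative sketch of trace compilation: the hot path through a branchy
// loop becomes straight-line code, with the branch turned into a guard.
public class TraceSketch {

    // Original branchy loop, as the interpreter sees and records it.
    static int sumPositives(int[] a) {
        int s = 0;
        for (int x : a) {
            if (x > 0) s += x;   // branch condition recorded while tracing
        }
        return s;
    }

    // What a compiled trace for the "x > 0" path conceptually looks like:
    // no branch on the hot path, only a guard that bails out to the
    // interpreter (simulated here by an exception) when the assumption breaks.
    static int compiledTraceStep(int s, int x) {
        if (!(x > 0)) throw new IllegalStateException("guard failed: back to interpreter");
        return s + x;            // straight-line code, optimizable as a whole
    }

    public static void main(String[] args) {
        System.out.println(sumPositives(new int[]{1, -2, 3})); // 4
        System.out.println(compiledTraceStep(0, 5));           // 5
    }
}
```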
14. Tracing JIT
Strong
• Devirtualization and inlining
• Hash lookups are also “deconditioned”
• Efficient “hot loops” optimization
Weak
• Tracing SLOWS down interpretation
• Long “warm up” time
15. Dynamic types problem
V8 – hidden (shadow) classes
• Hidden classes are strongly typed
TraceMonkey – shape inference/property cache
• Inline caching in compiled code
LuaJIT – hash table access optimized trace
HREFK: if (hash[17].key != key) goto exit
HLOAD: x = hash[17].value
-or-
HSTORE: hash[17].value = x
18. HotSpot JVM JIT
• Fast interpreter
• Two JIT compilers (C1 / C2)
• Runtime profiling
• “Deoptimization” of code on the fly
• On Stack Replacement (OSR)
19. Devirtualization
Call site profiling
• Monomorphic
– a single destination for the majority of calls
• Bimorphic
– there are two most frequent destinations
• Polymorphic
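What devirtualization of a monomorphic site amounts to can be sketched as a cheap class check guarding an inlined body (illustrative `Greeter`/`English` names, not from the slides):

```java
// Illustrative sketch of guarded devirtualization for a monomorphic call site.
interface Greeter { String greet(); }

final class English implements Greeter {
    public String greet() { return "hello"; }
}

public class Devirt {
    // Conceptually what the JIT emits for a site profiled as monomorphic
    // on English:
    static String greetDevirtualized(Greeter g) {
        if (g.getClass() == English.class) {
            return "hello";      // inlined body of English.greet(), no vtable lookup
        }
        return g.greet();        // uncommon path: full virtual dispatch
    }

    public static void main(String[] args) {
        System.out.println(greetDevirtualized(new English())); // hello
    }
}
```

A bimorphic site gets two such checks; past that, the JIT typically keeps the plain virtual call.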
21. Incremental compilation
Collections.indexedBinarySearch()
MyPojo
…
int mid = (low + high) >>> 1;
Comparable<? super T> midVal = list.get(mid);   // polymorphic call site
int cmp = midVal.compareTo(key);                // polymorphic call site
…
List<String> keys = new ArrayList<String>();
List<String> vals = new ArrayList<String>();
public String get(String key) {
    int n = Collections.binarySearch(keys, key);
    return n < 0 ? null : vals.get(n);
}
22. Incremental compilation
MyPojo.get() is compiled by JIT
– Collections.binarySearch() got inlined
Calls in Collections.binarySearch() become
monomorphic
The JIT continues profiling at runtime
Calls to get() and compareTo() will be
inlined once MyPojo.get() is recompiled
23. On Stack Replacement
The JIT can recompile main() and patch the
return address on the stack while execution
is inside some method called from the loop
public static void main(String[] args) {
    long s = System.nanoTime();
    for(int i = 0; i != N; ++i) {
        /* a lot of code */
        ...
    }
    long avg = (System.nanoTime() - s) / N;
}
24. Escape analysis
Heritage of old days – the dreaded synchronized
buf is not used outside of the method
all methods of buf are inlined
synchronization code can be removed
public String toString() {
StringBuffer buf = new StringBuffer();
buf.append("X=").append(x);
buf.append(",Y=").append(y);
return buf.toString();
}
25. Scalar replacement
After inlining of distance() in length()
JIT will replace Point objects by a few scalar variables
public double length() {
return distance(
new Point(ax, ay),
new Point(bx, by));
}
public double distance(Point a, Point b) {
double w = a.x - b.x;
double h = a.y - b.y;
return Math.sqrt(w*w + h*h);
}
26. Garbage collection and JIT
JIT can inline final static fields
• Memory address is placed in compiled code
• GC treats compiled code much like a data structure
Compiled methods act as GC roots
GC will fix the address inside compiled code if the object is relocated
public class Singleton {
public static final
Singleton INSTANCE = new Singleton();
}