4. How to manage memory?
Garbage – a data structure (object) in memory that is
unreachable by the program.
How to find garbage?
Reference counting
Object graph traversal
Do not collect garbage at all
5. Reference counting
+ Simple
+ No Stop-the-World pauses required
– Cannot collect cyclic graphs
– 15-30% CPU overhead
– Pretty bad for multi core systems
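The cycle problem can be shown with a toy model. This is a minimal sketch, not how any JVM works (HotSpot does not use reference counting); the class `RefCountSketch` and its `addRef` / `release` helpers are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference counting model (hypothetical, for illustration only).
class RefCountSketch {
    static final class Node {
        final String name;
        int refCount;                       // number of incoming references
        boolean freed;
        final List<Node> fields = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Store a reference from 'from' to 'to'; from == null models a root variable.
    static void addRef(Node from, Node to) {
        if (from != null) from.fields.add(to);
        to.refCount++;
    }

    // Drop one reference; the object is freed as soon as its counter hits zero.
    static void release(Node target) {
        if (--target.refCount == 0) {
            target.freed = true;
            for (Node child : target.fields) release(child);
            target.fields.clear();
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b");
        addRef(null, a);   // root -> a
        addRef(a, b);      // a -> b
        addRef(b, a);      // b -> a closes the cycle
        release(a);        // the root reference goes away
        // Both objects are unreachable, but each still holds a count of 1.
        System.out.println(a.freed + " " + b.freed); // prints "false false"
    }
}
```

Without the `b -> a` reference the same sequence frees both objects immediately, which is why no Stop-the-World pause is needed.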
6. Object graph traversal
• Roots
Static fields
Local variables (stack frames)
• Reachable objects - alive
• Unreachable objects - garbage
In general, the graph should not be mutated during
traversal. As a consequence, the application should be
frozen for the period while the runtime is collecting
garbage.
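Reachability from roots can be sketched as a plain graph walk. This is a simplified model under stated assumptions: `ReachabilitySketch`, `Obj` and `reachable` are hypothetical names, objects are ordinary Java instances standing in for heap cells, and nothing mutates the graph during the walk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy reachability analysis (hypothetical model, not JVM code).
class ReachabilitySketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();   // outgoing references
        Obj(String name) { this.name = name; }
    }

    // Everything reachable from the roots is alive; the rest is garbage.
    static Set<Obj> reachable(Collection<Obj> roots) {
        Set<Obj> alive = new HashSet<>();           // Obj keeps identity equals/hashCode
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (alive.add(o)) {                     // first visit only
                pending.addAll(o.refs);
            }
        }
        return alive;
    }
}
```

Unlike reference counting, an unreferenced cycle is simply never visited, so it is treated as garbage.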
7. Garbage collection
Algorithms
Copy collection
Traverse object graph and copy reachable objects to the other
space
Mark old space as free
Mark / Sweep
Traverse object graph and mark reachable objects
Scan (sweep) whole memory and “free” unmarked objects
Mark / Sweep / Compact
… mark … sweep ….
Relocate live objects to defragment free space
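A copy collection can be sketched in a few lines. This is an illustrative model only: `CopyCollectorSketch` and its members are hypothetical, a Java list stands in for the target space, and a forwarding map stands in for the forwarding pointers a real collector stores in object headers.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy copy collector (hypothetical model, not JVM code).
class CopyCollectorSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Copies everything reachable from 'roots' into a new space and returns it.
    // The roots list is updated in place to point at the copies.
    static List<Obj> collect(List<Obj> roots) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), forwarding, toSpace));
        }
        // The target space doubles as the work queue: scan copies in order
        // and redirect their references to copies as well.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            for (int i = 0; i < o.refs.size(); i++) {
                o.refs.set(i, copy(o.refs.get(i), forwarding, toSpace));
            }
        }
        return toSpace;   // the old space can now be treated as free
    }

    private static Obj copy(Obj from, Map<Obj, Obj> forwarding, List<Obj> toSpace) {
        Obj to = forwarding.get(from);
        if (to == null) {
            to = new Obj(from.name);
            to.refs.addAll(from.refs);   // still old-space references; fixed during scan
            forwarding.put(from, to);
            toSpace.add(to);
        }
        return to;
    }
}
```

Only live objects are touched; garbage is never visited, which is why the cost depends on the live set rather than on heap size.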
8. Garbage collection
Economics
S – whole heap size
L – size of live objects
S − L – total amount of garbage
Copy collection
Throughput ≈ c ⋅ (S − L) / L
Mark / Sweep
Throughput ≈ c1 ⋅ (S − L) / L + c2 ⋅ (S − L) / S
For all algorithms based on reference reachability,
GC efficiency is inversely proportional to the amount of
live objects.
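A quick numeric check of these estimates. This is a sketch only: the constants c, c1 and c2 are unknown and set to 1 here purely as an assumption, and `GcEconomicsSketch` is a hypothetical helper class.

```java
// Evaluates the throughput estimates with all constants assumed to be 1.
class GcEconomicsSketch {
    // Copy collection: throughput ≈ c * (S - L) / L
    static double copyThroughput(double c, double heap, double live) {
        return c * (heap - live) / live;
    }

    // Mark / Sweep: throughput ≈ c1 * (S - L) / L + c2 * (S - L) / S
    static double markSweepThroughput(double c1, double c2, double heap, double live) {
        return c1 * (heap - live) / live + c2 * (heap - live) / heap;
    }

    public static void main(String[] args) {
        double heap = 8.0;   // S, in GiB (arbitrary example value)
        for (double live : new double[] {1.0, 2.0, 4.0, 6.0}) {
            System.out.printf("L=%.0f copy=%.2f mark/sweep=%.2f%n",
                    live,
                    copyThroughput(1.0, heap, live),
                    markSweepThroughput(1.0, 1.0, heap, live));
        }
    }
}
```

With S fixed at 8, doubling L from 2 to 4 drops the copy estimate from 3 to 1: the more of the heap is live, the less garbage each collection reclaims per unit of work.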
11. Garbage collection
Terms dictionary
Stop-the-world (STW) pause
– a pause of all application threads required by the collector
Compacting algorithms
– can move objects in memory to defragment free space
Parallel collection
– using multiple cores to reduce STW time
Concurrent collection
– collection algorithms working in parallel with application
threads
12. Garbage collection
Throughput vs low latency
Throughput algorithms
– minimize total time of program execution
– economically efficient CPU utilization
Low pause algorithms
– minimize time of individual STW pause
– may use background (concurrent) collection
– may use incremental collection
13. Oracle HotSpot JVM
Throughput algorithms
Parallel GC (-XX:+UseParallelOldGC)
Young: Copy collector, Old: Parallel Mark Sweep Compact
Low pause algorithms
Concurrent Mark Sweep (-XX:+UseConcMarkSweepGC)
Young: Copy collector, Old: Mark Sweep
– not compacting (prone to fragmentation)
– most work is in background
– young collections are STW
14. Oracle HotSpot JVM
Low pause algorithms
Garbage First – G1 (-XX:+UseG1GC)
Young: Copy collector, Old: Incremental copy collector
– incremental – more STW pauses, but shorter ones
– collect regions with more garbage first
– compacting, but had problems with large objects
G1 – algorithm of future, hopefully not forever
– bad throughput
– pauses are normally twice as long as with CMS
15. Garbage collection
Generational approach
Young space collection
High throughput
Low memory utilization
Promotion
Eden (nursery) -> Survivor (keep) space -> Old space
Old space collection
Better memory utilization
Orders of magnitude lower throughput
Memory barrier
JVM “tracks” references from old to young space
16. Oracle’s HotSpot JVM
Default (serial) collector
Young: Serial copy collector, Old: serial MSC
Parallel scavenge / Parallel old GC
Young: Parallel copy collector, Old: serial MSC or parallel
MSC
Concurrent mark sweep (CMS)
Young: Serial or parallel copy collector, Old: concurrent mark
sweep
G1 (garbage first)
Young: Copy collector (region based), Old: Incremental MSC
http://blog.ragozin.info/2011/07/hotspot-jvm-garbage-collection-options.html
17. Oracle’s HotSpot JVM
Young collector | Old collector | JVM option
Serial (DefNew) | Serial Mark-Sweep-Compact | -XX:+UseSerialGC
Parallel scavenge (PSYoungGen) | Serial Mark-Sweep-Compact (PSOldGen) | -XX:+UseParallelGC
Parallel scavenge (PSYoungGen) | Parallel Mark-Sweep-Compact (ParOldGen) | -XX:+UseParallelOldGC
Serial (DefNew) | Concurrent Mark Sweep | -XX:+UseConcMarkSweepGC -XX:-UseParNewGC
Parallel (ParNew) | Concurrent Mark Sweep | -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
G1 | G1 | -XX:+UseG1GC
http://blog.ragozin.info/2011/09/hotspot-jvm-garbage-collection-options.html
18. Oracle’s Jrockit JVM
-Xgc: option | Generational | Mark | Sweep/Compact
genconcon or gencon | Yes | concurrent | incremental
singleconcon or singlecon | No | concurrent | incremental
genconpar | Yes | concurrent | parallel
singleconpar | No | concurrent | parallel
genparpar or genpar | Yes | parallel | parallel
singleparpar or singlepar | No | parallel | parallel
genparcon | Yes | parallel | incremental
singleparcon | No | parallel | incremental
http://blog.ragozin.info/2011/07/jrockit-gc-in-action.html
19. Azul Zing JVM
• Generational GC
• Young – Concurrent mark sweep compact (MSC)
• Old – Concurrent mark sweep compact (MSC)
Azul Zing can relocate objects in memory
without STW pause.
Secret – read barrier.
Requires special Linux kernel modules to run
21. Concurrent Mark Sweep
Initial mark - Stop-The-World
Collect root references (thread stacks) and mark them gray
Concurrent mark - concurrent
Do three color marking until gray objects are exhausted
Mark all black objects on dirty regions as gray (by card table)
Repeat
Remark - Stop-The-World
Final remark
Sweep - concurrent
Scan heap and reclaim white objects
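The phases above can be sketched with explicit colors. This is a single-threaded toy, so it only illustrates the state transitions, not real concurrency; `TriColorSketch` and its methods are hypothetical names, and `regray` stands in for the card-table step that turns modified black objects gray again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Toy three color marking (hypothetical model, not HotSpot code).
class TriColorSketch {
    enum Color { WHITE, GRAY, BLACK }

    static final class Obj {
        Color color = Color.WHITE;
        final List<Obj> refs = new ArrayList<>();
    }

    // Initial mark: root objects become gray.
    static Deque<Obj> initialMark(Collection<Obj> roots) {
        Deque<Obj> gray = new ArrayDeque<>();
        for (Obj root : roots) shade(root, gray);
        return gray;
    }

    private static void shade(Obj o, Deque<Obj> gray) {
        if (o.color == Color.WHITE) {
            o.color = Color.GRAY;
            gray.push(o);
        }
    }

    // Marking: scan gray objects until none are left; scanned objects turn black.
    static void drain(Deque<Obj> gray) {
        while (!gray.isEmpty()) {
            Obj o = gray.pop();
            for (Obj child : o.refs) shade(child, gray);
            o.color = Color.BLACK;
        }
    }

    // A black object modified after it was scanned must be scanned again.
    static void regray(Obj modified, Deque<Obj> gray) {
        if (modified.color == Color.BLACK) {
            modified.color = Color.GRAY;
            gray.push(modified);
        }
    }

    // Sweep: whatever is still white is garbage.
    static List<Obj> sweep(Collection<Obj> heap) {
        List<Obj> garbage = new ArrayList<>();
        for (Obj o : heap) {
            if (o.color == Color.WHITE) garbage.add(o);
        }
        return garbage;
    }
}
```

If the application stores a reference to a white object into an already black object and nothing re-grays it, the white object would be swept while still reachable; that is the case the dirty-card rescan and the final remark exist to catch.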
24. Direct memory buffers
java.nio.ByteBuffer.allocateDirect()
Pro
• Memory is allocated out of heap
• Memory is deallocated when ByteBuffer is collected
• Cross platform, native java
Con
• Fragmentation of non-heap memory
• Memory is deallocated when ByteBuffer is collected
• Complicated multi thread programming
• -XX:MaxDirectMemorySize=<value>
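A minimal usage sketch of a direct buffer. The class name `DirectBufferSketch` and the buffer size are arbitrary choices for illustration; only standard `java.nio.ByteBuffer` calls are used.

```java
import java.nio.ByteBuffer;

// Stores and reads a value in off-heap memory through a direct ByteBuffer.
class DirectBufferSketch {
    static long roundTrip(long value) {
        // The 64 bytes live outside the Java heap; only the small
        // ByteBuffer object itself is a regular heap object.
        ByteBuffer buffer = ByteBuffer.allocateDirect(64);
        buffer.putLong(0, value);     // absolute write at offset 0
        return buffer.getLong(0);     // absolute read at offset 0
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64);
        System.out.println(buffer.isDirect());   // prints "true"
        System.out.println(roundTrip(42L));      // prints "42"
    }
}
```

There is no explicit free call in this API: the native memory is released only after the owning ByteBuffer object is collected, which is why the same point appears under both Pro and Con.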
25. RTSJ
Scoped memory
• Objects can be allocated in chosen memory
areas
• Scoped and immortal areas are not garbage
collected
• Scoped areas are released as a whole area
• Cross references between areas are limited
and this limitation is enforced in run time
29. Memory spaces in HotSpot JVM
Memory geometry
• Young space: -XX:NewSize=<n> -XX:MaxNewSize=<n>
• Survivor space: young space / (<n> + 2), where -XX:SurvivorRatio=<n>
• Young + old space: -Xms<n> -Xmx<n>
• Permanent space: -XX:PermSize=<n> -XX:MaxPermSize=<n>
* G1 has the same set of spaces, but they are dynamic sets of regions rather than contiguous address ranges
30. How young collection works?
Collect root references
Stack frame references
References from other spaces (tenured + permanent)
does it mean scanning old space?
Traverse object graph
Visit only live objects
Copy live objects to other region of young space or to old space
Consider whole Eden and old survivor space to be free memory
Write barrier is required to efficiently collect
references from old to young space.
31. How young collector works
Collecting root references
Card marking barrier
Each 512 bytes of heap is associated
with a flag (card).
Once a reference is written to memory,
the associated card is marked dirty.
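The card marking barrier can be sketched as a byte array indexed by address. This is a toy model under stated assumptions: `CardTableSketch` is a hypothetical class, addresses are plain offsets from the start of the heap, and one byte per card is used as the flag.

```java
import java.util.Arrays;

// Toy card table with 512-byte cards (hypothetical model, not HotSpot code).
class CardTableSketch {
    static final int CARD_SHIFT = 9;            // 2^9 = 512 bytes per card
    final byte[] cards;                         // 0 = clean, 1 = dirty

    CardTableSketch(long heapBytes) {
        // Round up so a partially filled last card still gets a flag.
        cards = new byte[(int) ((heapBytes + 511) >>> CARD_SHIFT)];
    }

    // Write barrier: runs on every reference store into the heap.
    void onReferenceStore(long fieldAddress) {
        cards[(int) (fieldAddress >>> CARD_SHIFT)] = 1;
    }

    boolean isDirty(long address) {
        return cards[(int) (address >>> CARD_SHIFT)] == 1;
    }

    // The collector only needs to scan heap ranges whose card is dirty.
    int dirtyCount() {
        int count = 0;
        for (byte card : cards) count += card;
        return count;
    }

    void reset() {
        Arrays.fill(cards, (byte) 0);
    }
}
```

The barrier does not check where the stored reference points; it marks the card unconditionally, which keeps the per-write cost to a shift and a byte store.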
32. How young collection works?
Copying live objects
Card table is reset just before copy
collector starts to move objects.
33. How young collection works?
Collection finished
Since every object in young space has
been relocated, a clean card means that
there are no references to young space in
that particular 512 bytes of heap.
34. Thread local allocation blocks
TLAB in HotSpot JVM
• Each thread preallocates a block in Eden
• Thread allocates new objects in its TLAB
• When TLAB is full, a new TLAB is allocated
• If object does not fit TLAB
  – Allocate in Eden space
• If it does not fit Eden (or exceeds -XX:PretenureSizeThreshold)
  – Allocate in old space
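The allocation path can be sketched with two bump pointers. This is a simplified model, not HotSpot's implementation: `TlabSketch`, `Eden` and `Tlab` are hypothetical names, addresses are plain offsets, the TLAB size is fixed, and the fallback to old space is left out (a return value of -1 marks the point where it would start).

```java
// Toy TLAB allocation (hypothetical model, not HotSpot code).
class TlabSketch {
    // Shared Eden space with a single bump pointer.
    static final class Eden {
        final long size;
        long top;                               // next free offset
        Eden(long size) { this.size = size; }

        // 'synchronized' stands in for the atomic update a real VM would use.
        synchronized long allocate(long bytes) {
            if (top + bytes > size) return -1;  // Eden exhausted
            long address = top;
            top += bytes;
            return address;
        }
    }

    // Per-thread block carved out of Eden; allocation inside it needs no locking.
    static final class Tlab {
        final Eden eden;
        final long tlabSize;
        long top, end;                          // empty until the first refill

        Tlab(Eden eden, long tlabSize) {
            this.eden = eden;
            this.tlabSize = tlabSize;
        }

        long allocate(long bytes) {
            if (bytes > tlabSize) {
                return eden.allocate(bytes);    // does not fit a TLAB: allocate in Eden
            }
            if (top + bytes > end) {            // TLAB is full: take a new one
                long start = eden.allocate(tlabSize);
                if (start < 0) return -1;
                top = start;
                end = start + tlabSize;
            }
            long address = top;
            top += bytes;
            return address;
        }
    }
}
```

In this model the common case is one comparison and one addition on thread-private fields; the shared Eden pointer is touched only on refill or for oversized objects.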
35. Young collection stop-the-world
Total STW time
Collect roots
Scan thread stacks
Scan dirty cards
Read card table ~ S_heap
Scan pages marked as dirty ~ C − 1 / S_heap
Copy live objects
Process special references
* You can use -XX:+PrintGCTaskTimeStamps to analyze time of individual phases
* You can use -XX:+PrintReferenceGC to analyze reference processing times
37. HotSpot: Old space collection
Stop-the-World Mark-Sweep-Compact
Single threaded
Multithreaded
Concurrent Mark Sweep (CMS)
Background collection of old space
G1 (Garbage First)
Incremental Stop-the-World collection
38. HotSpot: Old space collection
Concurrent Mark Sweep
HotSpot’s CMS (Concurrent Mark Sweep)
• Does not compact
• Prone to fragmentation
• Uses separate free lists for each object size
• Uses statistics to manage fragmentation
• Introduces 2 short STW phases
39. HotSpot: Old space collection
Incremental collection
HotSpot’s G1
• Space is divided into regions
• Regions can be collected individually
• Write barrier tracks references between regions
• Subset of regions collected during STW pause
Live objects are “evacuated” to other regions
• Young collections – all Eden regions collected
• Partial collection – few old regions collected
• Global marking is used to estimate live population
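The "garbage first" choice of old regions can be sketched as a sort. This is a deliberate simplification: `GarbageFirstSketch`, `Region` and `chooseCollectionSet` are hypothetical names, the live sizes are assumed to come from global marking, and only the amount of garbage per region is considered.

```java
import java.util.ArrayList;
import java.util.List;

// Toy collection set selection (hypothetical model, not HotSpot code).
class GarbageFirstSketch {
    static final class Region {
        final int id;
        final long capacity;
        final long liveBytes;       // estimate taken from global marking
        Region(int id, long capacity, long liveBytes) {
            this.id = id;
            this.capacity = capacity;
            this.liveBytes = liveBytes;
        }
        long garbage() { return capacity - liveBytes; }
    }

    // Picks up to maxRegions old regions, the ones with the most garbage first.
    static List<Region> chooseCollectionSet(List<Region> oldRegions, int maxRegions) {
        List<Region> sorted = new ArrayList<>(oldRegions);
        sorted.sort((x, y) -> Long.compare(y.garbage(), x.garbage()));
        return new ArrayList<>(sorted.subList(0, Math.min(maxRegions, sorted.size())));
    }
}
```

Regions with little live data are cheap to evacuate and free the most memory, so collecting them first gives the best return for a limited pause.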
53. Patching OpenJDK
Serial collector gain
http://aragozin.blogspot.com/2011/07/openjdk-patch-cutting-down-gc-pause.html
54. Patching OpenJDK
CMS collector gain
http://aragozin.blogspot.com/2011/07/openjdk-patch-cutting-down-gc-pause.html
55. Concurrent Mark Sweep
Full GC
Concurrent mode failure
If background collection cannot free memory fast enough, CMS
will perform a Stop-The-World single-threaded Mark-Sweep-Compact.
Promotion failure
Due to fragmentation, old space may not have a contiguous block
of memory to accommodate a promoted object even if enough free
space is available.
CMS will perform a Stop-The-World single-threaded
Mark-Sweep-Compact to defragment memory.
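Promotion failure comes down to the difference between total free space and the largest free block. A minimal sketch, assuming a hypothetical free list given as an array of block sizes (`FragmentationSketch` is not part of any JVM API):

```java
// Toy model of a fragmented old space (hypothetical, for illustration only).
class FragmentationSketch {
    static long totalFree(long[] freeBlocks) {
        long total = 0;
        for (long block : freeBlocks) total += block;
        return total;
    }

    // Without compaction an object must fit into a single free block.
    static boolean canPromote(long[] freeBlocks, long objectSize) {
        for (long block : freeBlocks) {
            if (block >= objectSize) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] freeBlocks = {64, 64, 64};
        System.out.println(totalFree(freeBlocks));          // prints "192"
        System.out.println(canPromote(freeBlocks, 128));    // prints "false"
    }
}
```

Here 192 bytes are free in total, yet a 128-byte object cannot be placed; compaction would merge the three blocks into one.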
57. Common reasons for long STW
[Times: user=0.53 sys=0.06, real=0.15 secs]
• Full GC
• OS Swapping
• Too many survivors in young space
• Long reference processing
• JNI delays
• Long CMS initial mark / remark
58. CMS Check list
• jdk6u22 - jdk6u26 – broken free lists logic
• -XX:CMSWaitDuration=…
• -XX:+CMSScavengeBeforeRemark
• -XX:-CMSConcurrentMTEnabled
• Consider CMS for permanent space
• Size your heap -Xmn / -Xms / -Xmx
Expected data + young space + CMS overhead
CMS overhead ~30% of expected data
59. Tuning young collection
Eden size
too small – frequent YGC, objects promoted to old space early
too large – more long lived objects need to be copied
Survivor space size
too small – overflow, objects prematurely promoted
too large – memory wasted
Tenuring threshold
higher – objects are kept in young space for longer
higher – more objects in young space, more copy time
60. Tuning young collection
Eden size
-XX:MaxNewSize=<n> -XX:NewSize=<n>
Eden size = new size – 2 * survivor space size
Survivor space size
-XX:SurvivorRatio=<n>
Survivor space size = new size / (survivor ratio + 2)
Tenuring threshold
-XX:MaxTenuringThreshold=<n>
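A worked example of the young space geometry. This sketch assumes HotSpot's documented meaning of -XX:SurvivorRatio as the ratio of Eden to one survivor space, which gives survivor size = new size / (ratio + 2); the helper class `YoungGeometrySketch` is hypothetical, and the real JVM additionally rounds sizes for alignment.

```java
// Approximate young space layout from NewSize and SurvivorRatio.
class YoungGeometrySketch {
    // Young space = Eden + 2 survivor spaces, and Eden = ratio * survivor.
    static long survivorSize(long newSize, int survivorRatio) {
        return newSize / (survivorRatio + 2);
    }

    static long edenSize(long newSize, int survivorRatio) {
        return newSize - 2 * survivorSize(newSize, survivorRatio);
    }

    public static void main(String[] args) {
        long newSizeMb = 1000;   // e.g. -XX:NewSize=1000m -XX:MaxNewSize=1000m
        int ratio = 8;           // e.g. -XX:SurvivorRatio=8
        System.out.println(survivorSize(newSizeMb, ratio));  // prints "100"
        System.out.println(edenSize(newSizeMb, ratio));      // prints "800"
    }
}
```

Lowering the ratio enlarges both survivor spaces at Eden's expense, which is the trade-off between premature promotion and wasted memory.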
61. Tuning young collection
Small heap sizes
Balance tenuring threshold / survivor space to keep objects in
limited young space for longer
Large heap sizes (4Gb and greater)
Limit tenuring threshold to avoid increase in copy time
Limit survivor space to avoid accidental long young collections
Increase Eden size instead of increasing tenuring threshold
62. Tuning young collection
GC tuning is based on application allocation
pattern
If the application's allocation pattern changes –
you are in trouble
In practice, applications always have different
“modes of operation”
GC tuning – choosing the lesser evil
64. Surviving with huge heap
• CMS is very good in terms of pauses
You can reliably keep pauses under 50–150 ms
on 30–50 GiB heaps
• Fragmentation threat
Not a big deal for server type of applications
XML processing is GC disaster
• Very narrow GC comfort zone
If you tune for “long run” you are likely to have
pauses during initial loads / bulk refreshes