7. Twitter on Scala
“Because Ruby’s garbage collector is not quite as good
as Java’s, each process uses up a lot of memory. We can’t
really run very many Ruby daemon processes on a single
machine without consuming large amounts of memory.
Whereas with running things on the JVM we can run
many threads in the same heap, and let that one process
take all the machine’s memory for its playground.”
—Robey Pointer
8. IRON.IO
“After we rolled out our Go version, we reduced our
server count [from 30] to two and we really only had
two for redundancy. They were barely utilized, it was
almost as if nothing was running on them. Our CPU
utilization was less than 5% and the entire process
started up with only a few hundred KB's of memory (on
startup) vs our Rails apps which were ~50MB (on
startup).”
http://blog.iron.io/2013/03/how-we-went-from-30-servers-to-2-go.html
11. RVALUE
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1)) /* 23 bytes */
struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};
Other structs in the RVALUE union: RBasic, RObject, RClass, RFloat, RArray,
RRegexp, RHash, RData, RTypedData, RStruct, RBignum, RFile, RNode, RMatch,
RRational, RComplex.
Since 1.9, small strings are embedded in RString rather than allocated externally.
http://rxr.whitequark.org/mri/source/include/ruby/ruby.h?v=2.0.0-p353#841
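A rough way to observe the embedded/heap split from Ruby is to compare ObjectSpace.memsize_of for a short and a long string. This is only a sketch: the exact numbers and the embedding threshold depend on the Ruby version and platform, so it prints the sizes rather than assuming them.

```ruby
require 'objspace'

short = "a" * 10     # short enough to be embedded in the RString slot
long  = "a" * 1_000  # assumed too long to embed: the buffer is allocated separately

short_size = ObjectSpace.memsize_of(short)
long_size  = ObjectSpace.memsize_of(long)

puts "short: #{short_size} bytes, long: #{long_size} bytes"
```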
12. Mark & Sweep
Mark & Sweep, the first GC algorithm, was developed for
the original version of Lisp in 1960.
John McCarthy
September 4, 1927 – October 24, 2011
13. Pro et contra
(+) Reclaims cyclic data structures: Mark & Sweep traces
out the set of objects reachable from the roots.
(-) Stop the world: while Mark & Sweep runs, execution of
the program is temporarily suspended.
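To make the (+) point concrete, here is a toy Mark & Sweep over a hand-built object graph. Node, mark and mark_and_sweep are illustrative names, not MRI internals; the sketch only shows that a cycle with no path from the roots is never marked and is therefore swept.

```ruby
# Toy heap: every object is a Node holding references to other nodes.
Node = Struct.new(:name, :refs, :marked)

def mark(node)
  return if node.marked      # already visited: this is what makes cycles safe
  node.marked = true
  node.refs.each { |child| mark(child) }
end

def mark_and_sweep(heap, roots)
  heap.each { |n| n.marked = false }
  roots.each { |r| mark(r) }   # mark phase: trace everything reachable from the roots
  heap.select(&:marked)        # sweep phase: keep only the marked nodes
end

a = Node.new("a", [], false)
b = Node.new("b", [], false)
c = Node.new("c", [], false)
d = Node.new("d", [], false)
a.refs << b   # a -> b, reachable from the root
c.refs << d   # c <-> d form a cycle that no root points to
d.refs << c

survivors = mark_and_sweep([a, b, c, d], [a])
puts survivors.map(&:name).inspect
```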
22. Why do we need bitmap marking
fork() uses the Copy-On-Write (COW) optimization.
Marking all objects (including the AST nodes of your program) writes
to their memory pages and so breaks COW.
The pre-forking model is widely used with RoR: Passenger, Unicorn,
Resque.
23. Bitmap requirements
We need to locate the flag in the bitmap for an object on the heap
(and vice versa) in constant time.
This can be done by converting one address to the other
with just bit operations, provided we can allocate aligned
memory.
Ruby doesn’t have its own memory management and relies on the OS
malloc, and Ruby runs on many different platforms.
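The address conversion can be sketched with plain integer arithmetic. The 16KB alignment and 40-byte slot size are assumptions for illustration, the address is made up, and the sketch ignores the page header that a real heap page would carry.

```ruby
PAGE_SIZE = 16 * 1024      # assumed page alignment (16KB)
SLOT_SIZE = 40             # assumed slot size in bytes
PAGE_MASK = PAGE_SIZE - 1

# Base address of the aligned page containing addr: clear the low bits.
def page_base(addr)
  addr & ~PAGE_MASK
end

# Index of the object's flag within its page's bitmap.
def bitmap_index(addr)
  (addr & PAGE_MASK) / SLOT_SIZE
end

# The reverse direction: slot address for a given page and bitmap index.
def slot_address(base, index)
  base + index * SLOT_SIZE
end

addr = 0x7f00_0000_4000 + 3 * SLOT_SIZE   # hypothetical address of the 4th slot in a page

puts format("page base: 0x%x, bitmap index: %d", page_base(addr), bitmap_index(addr))
```

Both directions are a mask plus a multiply or divide, which is why aligned allocation is the key requirement.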
26. Tuning variables
• RUBY_GC_HEAP_INIT_SLOTS (10000): initial number of slots on the heap.
If your app boots with 500k long-lived objects, increase it to 500k; there is no
reason to run GC at boot.
• RUBY_GC_HEAP_FREE_SLOTS (4096): minimum number of free slots reserved for sweep re-use.
If every request allocates N objects, setting it to N*8 gives you
~8 requests between mark phases.
• RUBY_GC_HEAP_GROWTH_FACTOR (1.8): factor by which the heap grows.
If you increased RUBY_GC_HEAP_INIT_SLOTS and RUBY_GC_HEAP_FREE_SLOTS, then
your heap is already big, so you may decrease this one.
• RUBY_GC_HEAP_GROWTH_MAX_SLOTS (no limit): maximum number of new slots to add.
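One way to estimate N for the N*8 rule is to diff the allocation counter around a representative piece of work. The sketch below assumes the GC.stat key :total_allocated_objects (Ruby 2.2 and later; 2.1 spelled it :total_allocated_object), and the block is only a stand-in for a real request.

```ruby
# Count objects allocated while the block runs.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Stand-in for handling one request (hypothetical workload).
per_request = allocations_during { 1_000.times.map { |i| "item #{i}" } }

requests_between_marks = 8
puts "objects per request: #{per_request}"
puts "RUBY_GC_HEAP_FREE_SLOTS=#{per_request * requests_between_marks}"
```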
27. New heap layout
• The heap consists of pages (instead of slots), and each page consists of
slots (RVALUEs).
• A page is 16KB, so one page can hold about 408 object slots
(HEAP_OBJ_LIMIT).
• Heap pages are divided into two sub-heaps (lists of pages):
eden_heap and tomb_heap.
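The figure of about 408 slots can be reproduced with simple arithmetic. The 16KB page comes from the slide; the 40-byte slot, the 40 bytes reserved for malloc bookkeeping and the 8-byte page header are assumptions for a 64-bit build.

```ruby
PAGE_ALIGN     = 16 * 1024  # 16KB page, from the slide
MALLOC_RESERVE = 5 * 8      # assumed: bytes left free for malloc bookkeeping
PAGE_HEADER    = 8          # assumed: one pointer-sized page header
RVALUE_SIZE    = 40         # assumed: slot size on a 64-bit build

slots_per_page = (PAGE_ALIGN - MALLOC_RESERVE - PAGE_HEADER) / RVALUE_SIZE
puts slots_per_page  # => 408
```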
28. Eden & Tomb
eden_heap contains pages with one or more live objects.
tomb_heap contains pages with no objects or with zombies or ghosts.
(Zombies are deferred T_DATA, T_FILE, objects with finalizers, …)
When a new object is created (newobj_of) and there are no free pages in
eden_heap, a page is resurrected from the tomb.
Filling all slots in eden pages first reduces memory fragmentation.
29. Lazy memory management
• When the eden needs more pages, they are resurrected or allocated
only one at a time.
• Lazy sweeping: sweep pages only until the first freed page is found.
30. Malloc limits
• GC runs when memory allocation exceeds the malloc limit threshold.
• @ko1 claims that RUBY_GC_MALLOC_LIMIT was 8MB because “Matz used
10MB machine at 20 years old”.
• The default malloc limit is now 16MB.
• Adaptive malloc limits are controlled by RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR
(1.4) and RUBY_GC_MALLOC_LIMIT_MAX (32MB).
• Similarly, the memory growth associated with oldgen is tracked
separately.
31. Adaptive malloc limits
If the malloc increase exceeds malloc_limit, then malloc_limit is raised by a growth
factor:
if (inc > malloc_limit) {
    malloc_limit = (size_t)(inc * gc_params.malloc_limit_growth_factor);
    if (gc_params.malloc_limit_max > 0 && /* ignore max-check if 0 */
        malloc_limit > gc_params.malloc_limit_max) {
        malloc_limit = inc;
    }
}
http://rxr.whitequark.org/mri/source/gc.c?v=2.1.0-p0#2880
If the malloc increase doesn’t exceed malloc_limit, then malloc_limit is decreased:
else {
    malloc_limit = (size_t)(malloc_limit * 0.98); /* magic number */
    if (malloc_limit < gc_params.malloc_limit_min) {
        malloc_limit = gc_params.malloc_limit_min;
    }
}
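To see how the limit moves over a few GC cycles, the C logic above can be transcribed into Ruby. This is a sketch, not the VM's code: next_malloc_limit is a made-up name, and the defaults (1.4, 32MB, 16MB as the minimum) are taken from the slides.

```ruby
MB = 1024 * 1024

# Ruby transcription of the C logic above; defaults follow the slides.
def next_malloc_limit(inc, limit, growth: 1.4, max: 32 * MB, min: 16 * MB)
  if inc > limit
    grown = (inc * growth).to_i
    # When the grown limit would exceed the cap, fall back to the increase itself.
    grown = inc if max > 0 && grown > max
    grown
  else
    [(limit * 0.98).to_i, min].max   # decay by 2%, but never below the minimum
  end
end

limit = 16 * MB
[20 * MB, 30 * MB, 1 * MB].each do |inc|
  limit = next_malloc_limit(inc, limit)
  puts "inc=#{inc / MB}MB -> malloc_limit=#{(limit / MB.to_f).round(2)}MB"
end
```

Note that when the grown value passes the cap, the limit is set to the observed increase, not to the cap itself.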
32. Python GC
• Reference-counting GC
• Reference counting alone can’t reclaim cyclic data structures
• A generational collector reclaims cyclic data structures
• Python implements three generations: Generation-Zero, Generation-One,
Generation-Two
33. Weak generational hypothesis
1. Most objects die young.
2. Older objects are likely to remain alive (active) for a long time
(e.g. in Ruby, T_CLASS and T_MODULE objects).
34. RGenGC
• Two generations: young and old objects.
• Two GCs:
• Minor: GC on young space; Mark & Sweep
• Major: GC on all (both young and old) spaces; Mark & Sweep
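On Ruby 2.1 and later both kinds of GC can be requested by hand and observed through GC.stat. A small sketch; note that the VM may still upgrade a requested minor GC to a major one, so the printed counts can vary between runs.

```ruby
before = GC.stat

GC.start(full_mark: false)   # request a minor GC (young objects only)
GC.start                     # default full_mark: true requests a major GC

after = GC.stat
puts "minor GCs: +#{after[:minor_gc_count] - before[:minor_gc_count]}"
puts "major GCs: +#{after[:major_gc_count] - before[:major_gc_count]}"
```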
35. Minor GC
Mark phase:
• Mark reachable objects and promote them to the old generation.
• Stop traversing at old objects.
Sweep phase:
• Sweep objects that are neither marked nor old.
• Some unreachable objects (old ones) will not
be collected.
36. Major GC
Mark phase:
• Mark reachable objects from roots.
• Promote newly marked objects to old-gen.
Sweep phase:
• Sweep all unmarked objects.
37. Marking leak
A new object attached to an old object is
unreachable for Minor GC, thus it can’t be
marked and promoted.
Objects that are neither marked nor old are
swept, thus a new object attached only to an
old object will be swept by Minor GC.
38. Marking leak fix
1. Add a Write Barrier (WB) to detect the creation of an “old object” to
“new object” reference.
2. Add old objects that reference new objects to the Remember
set (Rset).
3. In the mark phase, treat objects in the Remember set as root
objects.
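The three steps can be tried out on a toy heap. ToyHeap, Obj and the barrier: switch are invented for this sketch and only mimic the rules from the slides: minor GC stops at old objects, sweeps what is neither marked nor old, and treats the Remember set as extra roots.

```ruby
Obj = Struct.new(:name, :refs, :old, :marked)

class ToyHeap
  attr_reader :objects, :roots, :remember_set

  def initialize
    @objects, @roots, @remember_set = [], [], []
  end

  def alloc(name, old: false)
    obj = Obj.new(name, [], old, false)
    @objects << obj
    obj
  end

  # Reference store; with the barrier on, old -> new stores are remembered.
  def write(from, to, barrier: true)
    from.refs << to
    @remember_set << from if barrier && from.old && !to.old
  end

  def minor_gc
    roots.each { |o| mark(o) }
    # Remembered old objects act as roots: trace their children.
    remember_set.each { |o| o.refs.each { |child| mark(child) } }
    remember_set.clear
    objects.select! { |o| o.marked || o.old }   # sweep: neither marked nor old
    objects.each { |o| o.old = true if o.marked; o.marked = false }  # promote survivors
  end

  private

  def mark(obj)
    return if obj.old || obj.marked   # stop traversing at old objects
    obj.marked = true
    obj.refs.each { |child| mark(child) }
  end
end

def survivors(barrier:)
  heap = ToyHeap.new
  parent = heap.alloc("parent", old: true)
  heap.roots << parent
  heap.write(parent, heap.alloc("child"), barrier: barrier)
  heap.minor_gc
  heap.objects.map(&:name)
end

puts survivors(barrier: false).inspect  # the child is lost: the marking leak
puts survivors(barrier: true).inspect   # the child is marked via the Remember set
```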
39. Shady objects
• The write barrier in Ruby is complicated.
• The write barrier must work with 3rd-party C extensions.
• Create objects of two types: Normal objects and Shady objects.
• Never promote shady objects to old-gen, and mark shady
objects on every Minor GC.
40. Minor GC triggers…
• Malloc limit exceeded.
• A new object is being added, lazy sweep is completed, there is no free
page in eden, and we have reached the limit for attaching/allocating new pages.
41. Major GC triggers…
• Malloc limit exceeded.
• [old] + [remembered] > [all objects count] / 2
• [old] > [old count at last major gc] * 2
• [shady] > [shady count at last major gc] * 2
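Written as a predicate, the conditions above look like this. The method and its keyword names are hypothetical, chosen to match the bracketed counters on the slide; this is not an MRI API.

```ruby
# Counter names are hypothetical; the conditions mirror the list above.
def major_gc_needed?(old:, remembered:, total:, old_at_last_major:,
                     shady:, shady_at_last_major:, malloc_limit_exceeded: false)
  malloc_limit_exceeded ||
    old + remembered > total / 2 ||
    old > old_at_last_major * 2 ||
    shady > shady_at_last_major * 2
end

# Old plus remembered objects make up more than half of the heap: major GC.
puts major_gc_needed?(old: 600, remembered: 10, total: 1_000,
                      old_at_last_major: 500, shady: 5, shady_at_last_major: 5)

# None of the thresholds is crossed: a minor GC is enough.
puts major_gc_needed?(old: 300, remembered: 10, total: 1_000,
                      old_at_last_major: 200, shady: 5, shady_at_last_major: 5)
```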
43. Symbol GC
Collects dynamic symbols, such as those created by #to_sym, #intern, …
obj = Object.new
100_000.times do |i|
  obj.respond_to?("sym#{i}".to_sym)
end
GC.start
puts "symbol : #{Symbol.all_symbols.size}"
% time ruby -v symbol_gc.rb
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
symbol : 102399
ruby -v symbol_gc.rb 0,29s user 0,05s system 89% cpu 0,386 total
% time ruby -v symbol_gc.rb
ruby 2.2.0dev (2014-05-15 trunk 45954) [x86_64-darwin13]
symbol : 2406
ruby -v symbol_gc.rb 0,34s user 0,05s system 90% cpu 0,426 total
44. 3-gen RGenGC
Promote infant object to old after it survives two GC cycles:
infant → young → old
3-gen RGenGC might be more suitable for web apps: if a 2-gen
GC runs during a request, most of the objects created in the
request will be marked as old, but they become unreachable
immediately after the request finishes.