Living with garbage

LIVING WITH GARBAGE
Gregg Donovan
Senior Software Engineer

3.5 years Solr & Lucene at Etsy.com
3 years Solr & Lucene at TheLadders.com

8+ billion pageviews per month

Understanding GC
Monitoring GC
Debugging Memory Leaks
Design for Partial Availability

public class BuzzwordDetector {
    static String[] prefixes = { "synergy", "win-win" };
    static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

    public static void main(String[] args) {
        args = myArgs;
        int buzzwords = 0;
        for (int i = 0; i < args.length; i++) {
            // toLowerCase() allocates a new String for every argument
            String lc = args[i].toLowerCase();
            for (int j = 0; j < prefixes.length; j++) {
                if (lc.contains(prefixes[j])) {
                    buzzwords++;
                }
            }
        }
        // String concatenation allocates as well
        System.out.println("Found " + buzzwords + " buzzwords");
    }
}

New():
    ref <- allocate()
    if ref = null /* Heap is full */
        collect()
        ref <- allocate()
        if ref = null /* Heap is still full */
            error "Out of memory"
    return ref
atomic collect():
    markFromRoots()
    sweep(HeapStart, HeapEnd)
From Garbage Collection Handbook

markFromRoots():
    initialise(worklist)
    for each fld in Roots
        ref <- *fld
        if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
            mark()
initialise(worklist):
    worklist <- empty
mark():
    while not isEmpty(worklist)
        ref <- remove(worklist) /* ref is marked */
        for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
                setMarked(child)
                add(worklist, child)
From Garbage Collection Handbook
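
The handbook pseudocode above can be tried out on a toy heap. The following is a minimal sketch in Java, not how HotSpot implements collection: the names `ToyMarkSweep`, `Obj`, `heap` and `roots` are made up for illustration, the "heap" is just a list of objects, and all roots are queued before tracing starts (a simplification of the pseudocode, which traces after each root).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy mark-and-sweep over an explicit object graph (illustration only). */
public class ToyMarkSweep {
    /** A heap cell: a mark bit plus outgoing pointers. */
    static final class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();   // every allocated object
    final List<Obj> roots = new ArrayList<>();  // stand-in for stacks and statics

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    /** Marks everything reachable from the roots using a worklist. */
    void markFromRoots() {
        Deque<Obj> worklist = new ArrayDeque<>();
        for (Obj root : roots) {
            if (root != null && !root.marked) {
                root.marked = true;
                worklist.add(root);
            }
        }
        while (!worklist.isEmpty()) {
            Obj ref = worklist.remove();            // ref is already marked
            for (Obj child : ref.fields) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    /** Frees unmarked objects and clears the mark on survivors. Returns the number freed. */
    int sweep() {
        int freed = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext();) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    int collect() {
        markFromRoots();
        return sweep();
    }

    public static void main(String[] args) {
        ToyMarkSweep gc = new ToyMarkSweep();
        Obj a = gc.allocate("a");
        Obj b = gc.allocate("b");
        gc.allocate("c");                 // never referenced: garbage
        a.fields.add(b);
        b.fields.add(a);                  // a cycle is still traced only once
        gc.roots.add(a);
        System.out.println("freed " + gc.collect() + ", live " + gc.heap.size());
    }
}
```

The mark bit is what keeps the cycle between `a` and `b` from looping forever: an object is added to the worklist only the first time it is seen.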

Trivia: Who invented the first
GC and mark-and-sweep?

Where do objects in a common Solr
application live?
AtomicReaderContext?
SolrIndexSearcher?
SolrRequest?

GC Terminology:
Concurrent vs Parallel

Trivia: How does System.identityHashCode() work?

Continuously Concurrent Compacting Collector (C4)

...
import java.lang.management.*;
...
public static long getCollectionTime() {
    long collectionTime = 0;
    for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
        collectionTime += mbean.getCollectionTime();
    }
    return collectionTime;
}
Available via JMX
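
A cumulative counter like the one above becomes useful once it is sampled twice and compared against wall-clock time. Here is a minimal, self-contained sketch of that idea; the class name `GcTimeSampler` and the helper `gcFraction` are hypothetical, and `getCollectionTime()` is only an approximate figure whose meaning varies by collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Samples accumulated GC time and relates it to elapsed wall-clock time. */
public class GcTimeSampler {
    /** Approximate accumulated collection time in ms, summed over all collectors. */
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime(); // -1 when undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of a wall-clock interval attributed to GC between two samples. */
    static double gcFraction(long gcBefore, long gcAfter, long wallMillis) {
        if (wallMillis <= 0) {
            return 0.0;
        }
        return (double) (gcAfter - gcBefore) / wallMillis;
    }

    public static void main(String[] args) {
        long gcBefore = totalCollectionTimeMillis();
        long start = System.currentTimeMillis();
        long allocated = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] garbage = new byte[1024]; // short-lived allocation
            allocated += garbage.length;
        }
        long wall = Math.max(1, System.currentTimeMillis() - start);
        long gcAfter = totalCollectionTimeMillis();
        System.out.printf("allocated %d bytes, GC took %d ms of %d ms (%.1f%%)%n",
                allocated, gcAfter - gcBefore, wall,
                100.0 * gcFraction(gcBefore, gcAfter, wall));
    }
}
```

Reporting the difference between samples, rather than the raw counter, is what makes the value graphable per interval.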

export GC_DEBUG="-verbose:gc
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintAdaptiveSizePolicy
-XX:AdaptiveSizePolicyOutputInterval=1
-XX:+PrintTenuringDistribution
-XX:+PrintGCDetails
-XX:+PrintCommandLineFlags
-XX:+PrintSafepointStatistics
-Xloggc:/var/log/search/gc.log"

2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
AdaptiveSizeStop: collection: 213
[PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
Heap after GC invocations=213 (full 210):
PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
}
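
The numbers that matter most in an entry like the one above are the before and after occupancy of each generation and the wall-clock pause. The sketch below pulls them out with regular expressions. The class and method names (`GcLogLine`, `reclaimedKb`, `pauseSeconds`) are illustrative, and the patterns assume the ParallelGC `-XX:+PrintGCDetails` layout shown here; other collectors and JVM versions log differently.

```java
import java.util.OptionalDouble;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for one ParallelGC -XX:+PrintGCDetails summary line. */
public class GcLogLine {
    // Matches e.g. "[ParOldGen: 21670588K->21670588K(21670592K)]".
    private static final Pattern GENERATION =
            Pattern.compile("\\[(\\w+): (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");
    // Matches the wall-clock pause, e.g. "real=8.77 secs".
    private static final Pattern REAL_TIME = Pattern.compile("real=([0-9.]+) secs");

    /** KB freed in the named generation (before minus after), if the line mentions it. */
    static OptionalLong reclaimedKb(String line, String generation) {
        Matcher m = GENERATION.matcher(line);
        while (m.find()) {
            if (m.group(1).equals(generation)) {
                long before = Long.parseLong(m.group(2));
                long after = Long.parseLong(m.group(3));
                return OptionalLong.of(before - after);
            }
        }
        return OptionalLong.empty();
    }

    /** Wall-clock pause in seconds, if the line carries a "real=" figure. */
    static OptionalDouble pauseSeconds(String line) {
        Matcher m = REAL_TIME.matcher(line);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group(1)))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // The summary line from the Full GC entry above.
        String line = "[PSYoungGen: 8126528K->7599356K(9480896K)] "
                + "[ParOldGen: 21670588K->21670588K(21670592K)] "
                + "29797116K->29269944K(31151488K) "
                + "[PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] "
                + "[Times: user=137.36 sys=0.03, real=8.77 secs]";
        System.out.println("old gen reclaimed KB: " + reclaimedKb(line, "ParOldGen"));
        System.out.println("pause seconds: " + pauseSeconds(line));
    }
}
```

For the entry above, ParOldGen goes from 21670588K to 21670588K: an 8.77 second pause that reclaimed nothing from the old generation, which suggests the heap is full of live (or leaked) objects rather than garbage.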

GC Log Analyzers?
GCHisto
GCViewer
garbagecat

Graphing with Logster
github.com/etsy/logster

GC Dashboard
github.com/etsy/dashboard

Designing for Partial Availability

How can a client ignore GC-ing hosts?

Server lies to clients about availability
TCP socket receive buffer
TCP write buffer

“Banner” protocol
1. Connect via TCP
2. Wait ~1-10ms
3. Either receive magic four-byte header or try another host
4. Only send query after receiving header from server
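
The four steps above could look roughly like this on the client side. This is a sketch under assumptions, not Etsy's implementation: the class name `BannerClient` and the `MAGIC` bytes are invented (the slides do not give the header value), and the demo uses a listening socket that never calls `accept()` to stand in for a paused JVM, since the kernel still completes the TCP handshake for it.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

/** Client side of a "banner" handshake: only trust hosts that speak first. */
public class BannerClient {
    /** Hypothetical four-byte magic header; the real value is not in the slides. */
    static final byte[] MAGIC = { 'S', 'O', 'L', 'R' };

    /** Returns a socket to the first host that sent the banner in time, or null if none did. */
    static Socket connectToResponsiveHost(List<InetSocketAddress> hosts, int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket socket = new Socket();
            try {
                socket.connect(host, bannerTimeoutMs);   // 1. connect via TCP
                socket.setSoTimeout(bannerTimeoutMs);    // 2. bound each read while waiting
                byte[] header = new byte[MAGIC.length];
                new DataInputStream(socket.getInputStream()).readFully(header);
                if (Arrays.equals(header, MAGIC)) {      // 3. banner received
                    socket.setSoTimeout(0);
                    return socket;                       // 4. caller may now send the query
                }
            } catch (IOException e) {
                // Timeout, refusal or EOF: fall through and try the next host.
            }
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket stalled = new ServerSocket(0);   // accepts nothing: "GC-ing" host
             ServerSocket healthy = new ServerSocket(0)) {
            Thread server = new Thread(() -> {
                try (Socket s = healthy.accept()) {
                    s.getOutputStream().write(MAGIC);
                    s.getOutputStream().flush();
                    s.getInputStream().read();             // hold open until the client closes
                } catch (IOException ignored) {
                    // Demo server: ignore errors.
                }
            });
            server.setDaemon(true);
            server.start();
            List<InetSocketAddress> hosts = List.of(
                    new InetSocketAddress(loopback, stalled.getLocalPort()),
                    new InetSocketAddress(loopback, healthy.getLocalPort()));
            // The slides suggest ~1-10ms; the demo uses a looser 200 ms.
            try (Socket s = connectToResponsiveHost(hosts, 200)) {
                System.out.println(s == null ? "no host answered" : "picked port " + s.getPort());
            }
        }
    }
}
```

A successful `connect()` alone proves little, because the operating system answers on behalf of a stalled process; the banner is evidence that application code on the server is actually running.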

What if GC happens
mid-request?

Jeff Dean: Achieving Rapid
Response Time in Large
Online Services
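
One technique covered in Dean's talk is the backup request: if the first replica has not answered after a short delay, send the same query to a second replica and take whichever response arrives first. The following is a minimal sketch of that idea with `CompletableFuture`; the class and method names (`BackupRequest`, `withBackup`) are hypothetical, the suppliers stand in for real network calls, and error handling and cancellation of the losing request are left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Sketch of a backup (hedged) request: ask a second replica if the first is slow. */
public class BackupRequest {
    /**
     * Starts the primary call immediately. If it has not finished after
     * backupDelayMs, starts the backup call too. Completes with whichever
     * result arrives first.
     */
    static <T> CompletableFuture<T> withBackup(Supplier<T> primary,
                                               Supplier<T> backup,
                                               long backupDelayMs) {
        CompletableFuture<T> first = CompletableFuture.supplyAsync(primary);
        Executor afterDelay =
                CompletableFuture.delayedExecutor(backupDelayMs, TimeUnit.MILLISECONDS);
        // Only pay for the backup call if the primary is still outstanding.
        CompletableFuture<T> second = CompletableFuture.supplyAsync(
                () -> first.isDone() ? first.join() : backup.get(), afterDelay);
        return first.applyToEither(second, result -> result);
    }

    public static void main(String[] args) {
        Supplier<String> pausedReplica = () -> {
            try {
                Thread.sleep(500); // stand-in for a replica stuck in a GC pause
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "answer from paused replica";
        };
        Supplier<String> healthyReplica = () -> "answer from healthy replica";
        System.out.println(withBackup(pausedReplica, healthyReplica, 20).join());
    }
}
```

The delay is the tuning knob: set too low, every query is sent twice; set too high, the backup no longer hides the pause.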

Solr sharding?
Right now, only as fast as the slowest shard.

“Make a reliable whole
out of unreliable parts.”

Solr API hooks for custom code
QParserPlugin
SearchComponent
SolrRequestHandler
SolrEventListener
SolrCache
ValueSourceParser
FieldType
etc.

PSA: Are you sure you
need custom code?

CoreContainer#getCore()
RefCounted<SolrIndexSearcher>
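
Both APIs named above hand out reference-counted resources: a core obtained from `CoreContainer#getCore()` must be closed, and a `RefCounted<SolrIndexSearcher>` must be released with `decref()`, or the old searcher can never be freed. The sketch below shows the acquire/release discipline with a stand-in class; `ToyRefCounted` is modeled only loosely on Solr's `RefCounted<T>` and is not the real class, and `useResource` is a hypothetical caller.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Demonstrates the incref / try / finally decref pattern with a stand-in class. */
public class RefCountSketch {
    /** Stand-in for a reference-counted holder; NOT Solr's RefCounted. */
    static abstract class ToyRefCounted<T> {
        private final T resource;
        private final AtomicInteger refcount = new AtomicInteger();

        ToyRefCounted(T resource) { this.resource = resource; }

        ToyRefCounted<T> incref() {
            refcount.incrementAndGet();
            return this;
        }

        T get() { return resource; }

        /** Releases one reference; the resource is closed when the count reaches zero. */
        void decref() {
            if (refcount.decrementAndGet() == 0) {
                close();
            }
        }

        int count() { return refcount.get(); }

        protected abstract void close();
    }

    /** Borrows the resource and always gives the reference back, even on exceptions. */
    static int useResource(ToyRefCounted<String> holder) {
        ToyRefCounted<String> ref = holder.incref();
        try {
            return ref.get().length();
        } finally {
            ref.decref(); // forgetting this line is the leak
        }
    }

    public static void main(String[] args) {
        ToyRefCounted<String> holder = new ToyRefCounted<String>("searcher-generation-1") {
            @Override
            protected void close() {
                System.out.println("resource closed");
            }
        };
        holder.incref();                       // the owner's own reference
        useResource(holder);
        System.out.println("references after use: " + holder.count());
        holder.decref();                       // owner lets go; close() runs now
    }
}
```

If any caller skips the `decref()`, the count never reaches zero and the resource stays reachable, which is the kind of leak that shows up as several searcher generations alive at once.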

SolrIndexSearcher generation marking with
YourKit triggers

#!/usr/bin/env bash
# This script is designed to be run every minute by cron.
host=$(hostname -s)
psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
min_flt=$(echo $psout | awk '{print $1}') # minor page faults
maj_flt=$(echo $psout | awk '{print $2}') # major page faults
epoch_s=$(date +%s)
echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003

Solution 1: Buy more RAM
Ideally enough RAM to:
Keep index in OS file buffers
AND ensure no paging of VM memory
AND whatever else happens on the box
~$5-10/GB

echo "0" > /proc/sys/vm/swappiness

echo "-17" > /proc/$PID/oom_adj
Mercy from the OOM Killer

Many small VMs instead of one large VM
microsharding

In-memory Lucene codecs
I.e. custom DirectPostingsFormat

Off-heap memory with sun.misc.Unsafe?

bit.ly/mmgcb
Mark Miller’s GC Bootcamp

bit.ly/giltene
Gil Tene: Understanding Java
Garbage Collection

bit.ly/cpumemory
Ulrich Drepper: What Every Programmer Should
Know About Memory

github.com/pingtimeout/jvm-options

Read the JVM Source
(Not as scary as it sounds.)
hg.openjdk.java.net/jdk7/jdk7

Mechanical Sympathy Google Group
bit.ly/mechsym

CONTACT
Gregg Donovan
gregg@etsy.com
