Presented by Gregg Donovan, Senior Software Engineer, Etsy.com, Inc.
Understanding the impact of garbage collection, both at a single node and a cluster level, is key to developing high-performance, high-availability Solr and Lucene applications. After a brief overview of garbage collection theory, we will review the design and use of the various collectors in the JVM.
At the single-node level, we will explore GC monitoring: how to understand GC logs, how to monitor what percentage of your Solr request time is spent on GC, how to use VisualGC, YourKit, and other tools, and what to log and monitor. We will review GC tuning and how to measure success.
At the cluster level, we will review how to design for partial availability: how to avoid sending requests to a node that is paused in GC and how to be resilient to mid-request GC pauses. For application development, we will review common memory leak scenarios in custom Solr and Lucene application code and how to detect them.
18.
public class BuzzwordDetector {
    static String[] prefixes = { "synergy", "win-win" };
    static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

    public static void main(String[] args) {
        args = myArgs; // use the built-in sample inputs instead of the command line
        int buzzwords = 0;
        for (int i = 0; i < args.length; i++) {
            // toLowerCase() returns a new String only if the case changes; such a copy is garbage once the iteration ends
            String lc = args[i].toLowerCase();
            for (int j = 0; j < prefixes.length; j++) {
                if (lc.contains(prefixes[j])) {
                    buzzwords++;
                }
            }
        }
        System.out.println("Found " + buzzwords + " buzzwords"); // prints "Found 2 buzzwords"
    }
}
19.
New():
    ref <- allocate()
    if ref = null                /* Heap is full */
        collect()
        ref <- allocate()
        if ref = null            /* Heap is still full */
            error "Out of memory"
    return ref

atomic collect():
    markFromRoots()
    sweep(HeapStart, HeapEnd)
From The Garbage Collection Handbook
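To make the allocate, collect, retry flow of `New()` concrete, here is a minimal Java sketch of a toy mark-sweep heap. It is an illustration under stated assumptions, not how the JVM works: the heap is a hypothetical fixed array of slots, `roots` is a plain list standing in for stacks and static fields, and marking is recursive for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyMarkSweepHeap {
    static final class Obj {
        final List<Obj> fields = new ArrayList<>();   // outgoing references
        boolean marked;
    }

    private final Obj[] slots;                        // hypothetical fixed-size heap; null = free slot
    final List<Obj> roots = new ArrayList<>();        // stands in for stacks, statics, registers

    ToyMarkSweepHeap(int capacity) { slots = new Obj[capacity]; }

    // allocate(): place a new object in the first free slot, or return null if full
    private Obj allocate() {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                slots[i] = new Obj();
                return slots[i];
            }
        }
        return null;
    }

    // New(): allocate, and on failure collect once and retry before giving up
    Obj newObject() {
        Obj ref = allocate();
        if (ref == null) {                            // heap is full
            collect();
            ref = allocate();
            if (ref == null) {                        // heap is still full
                throw new OutOfMemoryError("toy heap exhausted");
            }
        }
        return ref;
    }

    // collect(): single-threaded here, so trivially "atomic" (stop-the-world)
    void collect() {
        for (Obj root : roots) mark(root);
        sweep();
    }

    // Recursive marking for brevity; the Handbook's markFromRoots uses an explicit worklist.
    private void mark(Obj o) {
        if (o == null || o.marked) return;
        o.marked = true;
        for (Obj child : o.fields) mark(child);
    }

    // sweep(HeapStart, HeapEnd): free unmarked slots and clear marks on survivors
    private void sweep() {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) continue;
            if (slots[i].marked) slots[i].marked = false;
            else slots[i] = null;
        }
    }

    int usedSlots() {
        int n = 0;
        for (Obj o : slots) if (o != null) n++;
        return n;
    }

    public static void main(String[] args) {
        ToyMarkSweepHeap heap = new ToyMarkSweepHeap(3);
        Obj a = heap.newObject();
        heap.roots.add(a);                            // a is reachable from a root
        a.fields.add(heap.newObject());               // b is reachable through a
        heap.newObject();                             // c is unreachable immediately
        Obj d = heap.newObject();                     // heap full: collect() frees c, retry succeeds
        heap.roots.add(d);                            // now all three slots are live
        System.out.println("used slots: " + heap.usedSlots()); // prints "used slots: 3"
        try {
            heap.newObject();                         // collect() frees nothing this time
        } catch (OutOfMemoryError e) {
            System.out.println("New() failed: " + e.getMessage());
        }
    }
}
```

The fourth allocation succeeds only because `collect()` reclaims the unreachable object; once every slot is reachable from a root, the retry in `New()` fails just as the pseudocode's `error "Out of memory"` branch does.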
20.
markFromRoots():
    initialise(worklist)
    for each fld in Roots
        ref <- *fld
        if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
            mark()

initialise(worklist):
    worklist <- empty

mark():
    while not isEmpty(worklist)
        ref <- remove(worklist)          /* ref is marked */
        for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
                setMarked(child)
                add(worklist, child)
From The Garbage Collection Handbook
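The worklist marking above translates fairly directly into Java. This is a sketch under assumptions: objects are a hypothetical `Obj` class with an explicit `pointers` list, roots are a plain `List`, and an `ArrayDeque` serves as the worklist. Marking an object before pushing it is what keeps cycles from being scanned twice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WorklistMarker {
    static final class Obj {
        final String name;
        final List<Obj> pointers = new ArrayList<>(); // Pointers(ref): outgoing references
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // markFromRoots(): mark each non-null, unmarked root, push it, then drain the worklist
    static void markFromRoots(List<Obj> roots) {
        Deque<Obj> worklist = new ArrayDeque<>();      // initialise(worklist)
        for (Obj ref : roots) {
            if (ref != null && !ref.marked) {
                ref.marked = true;                     // setMarked(ref)
                worklist.push(ref);                    // add(worklist, ref)
                mark(worklist);
            }
        }
    }

    // mark(): every object on the worklist is already marked; scan its children
    static void mark(Deque<Obj> worklist) {
        while (!worklist.isEmpty()) {
            Obj ref = worklist.pop();                  // remove(worklist)
            for (Obj child : ref.pointers) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.push(child);
                }
            }
        }
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.pointers.add(b);
        b.pointers.add(a);   // cycle between a and b
        b.pointers.add(c);
        d.pointers.add(c);   // d points at live data, but nothing points at d
        markFromRoots(List.of(a));
        for (Obj o : List.of(a, b, c, d)) {
            System.out.println(o.name + " marked=" + o.marked);
        }
    }
}
```

Running it prints `marked=true` for `a`, `b`, and `c` and `marked=false` for `d`: reachability is followed only from roots, so an object that merely points into the live graph is still garbage.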
37. ...
import java.lang.management.*;
...
public static long getCollectionTime() {
    long collectionTime = 0;
    for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
        // Accumulated GC time in ms; -1 means this collector does not report it
        long t = mbean.getCollectionTime();
        if (t > 0) {
            collectionTime += t;
        }
    }
    return collectionTime;
}
Available via JMX
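Because `getCollectionTime()` is cumulative, one way to estimate what percentage of time goes to GC is to sample it twice and divide the difference by the elapsed wall-clock time. The sketch below assumes JVM uptime from `RuntimeMXBean.getUptime()` as the clock and uses a throwaway allocation loop as a stand-in workload; sampling around a single request would work the same way, with the caveat that it attributes to that request any GC that happened while it was in flight.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcTimeRatio {
    // Sum of accumulated collection time (ms) across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;  // -1 means "undefined for this collector"
        }
        return total;
    }

    // Percentage of elapsed wall-clock time spent in GC between two samples.
    static double gcPercent(long gcStartMs, long gcEndMs, long wallStartMs, long wallEndMs) {
        long wall = wallEndMs - wallStartMs;
        return wall <= 0 ? 0.0 : 100.0 * (gcEndMs - gcStartMs) / wall;
    }

    public static void main(String[] args) {
        long gc0 = totalGcMillis();
        long wall0 = ManagementFactory.getRuntimeMXBean().getUptime();
        // Stand-in workload: short-lived garbage so the collectors have work to do.
        List<byte[]> sink = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            sink.add(new byte[1024]);
            if (sink.size() > 1000) sink.clear();
        }
        long gc1 = totalGcMillis();
        long wall1 = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms (%.1f%%)%n",
            gc1 - gc0, wall1 - wall0, gcPercent(gc0, gc1, wall0, wall1));
    }
}
```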
51. Server lies to clients about availability
During a GC pause the kernel still completes the TCP handshake and buffers request bytes:
TCP socket receive buffer
TCP write buffer
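A small experiment shows why a client cannot tell a paused server from a healthy one at the TCP level. In this sketch, a `ServerSocket` whose code never calls `accept()` stands in for a JVM frozen in GC (an assumption for illustration; a real pause stops all application threads). The kernel's listen backlog still completes the handshake, and the client's write still lands in socket buffers, so both calls succeed.

```java
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ListenBacklogDemo {
    // Returns true if a client can connect and write a request to a listener whose
    // application code never calls accept(), mimicking a JVM stuck in a GC pause.
    static boolean requestAppearsDelivered() throws Exception {
        try (ServerSocket frozen = new ServerSocket(0);   // bound and listening, never accepted
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress("127.0.0.1", frozen.getLocalPort()), 1000);
            OutputStream out = client.getOutputStream();
            // Hypothetical request bytes; they sit in kernel buffers, unread by the server.
            out.write("q=synergy\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            return client.isConnected();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("connect and write succeeded: " + requestAppearsDelivered());
    }
}
```

Nothing in this exchange tells the client that no application thread will ever read the request, which is the gap the banner protocol on the next slide is meant to close.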
52. “Banner” protocol
1. Connect via TCP
2. Wait ~1-10ms
3. Either receive magic four-byte header or try another host
4. Only send query after receiving header from server
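The four steps above can be sketched as a client that connects, waits briefly for a banner, and moves on to the next host if none arrives. Several details are assumptions for illustration: the magic bytes (`SRCH`), the timeouts, and the demo servers are hypothetical, not Etsy's actual values. The demo uses a 50 ms banner wait instead of ~1-10ms so it stays reliable on a busy test machine.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

public class BannerClient {
    // Hypothetical 4-byte banner; the real header bytes are not given on the slide.
    static final byte[] MAGIC = {'S', 'R', 'C', 'H'};

    // Returns a socket to the first host whose banner arrives within bannerTimeoutMs,
    // or null if no host responds in time.
    static Socket connectWithBanner(List<InetSocketAddress> hosts,
                                    int connectTimeoutMs, int bannerTimeoutMs) {
        for (InetSocketAddress host : hosts) {
            Socket s = new Socket();
            try {
                s.connect(host, connectTimeoutMs);       // 1. connect via TCP
                s.setSoTimeout(bannerTimeoutMs);         // 2. bound the wait for the banner
                byte[] header = new byte[MAGIC.length];
                new DataInputStream(s.getInputStream()).readFully(header);
                if (Arrays.equals(header, MAGIC)) {      // 3. got the magic header
                    s.setSoTimeout(0);                   // clear the short read timeout
                    return s;                            // 4. now safe to send the query
                }
            } catch (IOException e) {
                // Timed out (e.g. server paused in GC) or failed: try the next host.
            }
            try { s.close(); } catch (IOException ignored) { }
        }
        return null;
    }

    // Demo: one listener that never answers (the kernel still completes the handshake)
    // and one that sends the banner; the client should end up on the second.
    static boolean demoPicksHealthyHost() throws Exception {
        try (ServerSocket paused = new ServerSocket(0);
             ServerSocket healthy = new ServerSocket(0)) {
            Thread server = new Thread(() -> {
                try (Socket c = healthy.accept()) {
                    OutputStream out = c.getOutputStream();
                    out.write(MAGIC);
                    out.flush();
                    Thread.sleep(500);                   // keep the connection open briefly
                } catch (Exception ignored) { }
            });
            server.setDaemon(true);
            server.start();
            List<InetSocketAddress> hosts = Arrays.asList(
                new InetSocketAddress("127.0.0.1", paused.getLocalPort()),
                new InetSocketAddress("127.0.0.1", healthy.getLocalPort()));
            try (Socket s = connectWithBanner(hosts, 1000, 50)) {
                return s != null && s.getPort() == healthy.getLocalPort();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("picked healthy host: " + demoPicksHealthyHost());
    }
}
```

The key design point is that the banner is written by application code, so it only arrives when the server's JVM is actually running; a successful TCP connect alone proves nothing.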
71.
#!/usr/bin/env bash
# This script is designed to be run every minute by cron.
# It sends the search process's cumulative minor and major page fault counts to Graphite.
host=$(hostname -s)
psout=$(ps h -p "$(cat /var/run/etsy-search.pid)" -o min_flt,maj_flt 2>/dev/null)
min_flt=$(echo $psout | awk '{print $1}') # minor page faults
maj_flt=$(echo $psout | awk '{print $2}') # major page faults
epoch_s=$(date +%s)
echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
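The script reads the counters through `ps`; on Linux the same numbers are exposed directly in `/proc/<pid>/stat`, where `minflt` is field 10 and `majflt` is field 12. The sketch below parses that line for the current JVM (`/proc/self/stat`) and prints Graphite plaintext lines rather than sending them. The metric prefix `search_memstats.example` is hypothetical, and the code assumes a Linux `/proc` layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PageFaultStats {
    // Parse minflt (field 10) and majflt (field 12) from a /proc/<pid>/stat line.
    // Field 2 (comm) may contain spaces, so split only what follows the last ')'.
    static long[] parseFaults(String statLine) {
        String rest = statLine.substring(statLine.lastIndexOf(')') + 1).trim();
        String[] f = rest.split("\\s+"); // f[0] is field 3 (state), so field n is f[n - 3]
        return new long[] { Long.parseLong(f[7]), Long.parseLong(f[9]) };
    }

    // Graphite plaintext protocol: "<metric path> <value> <epoch seconds>\n"
    static String graphiteLine(String metric, long value, long epochSeconds) {
        return metric + " " + value + " " + epochSeconds + "\n";
    }

    public static void main(String[] args) throws IOException {
        Path stat = Paths.get("/proc/self/stat");  // Linux-only
        if (!Files.exists(stat)) {
            System.out.println("/proc not available on this platform");
            return;
        }
        long[] faults = parseFaults(Files.readAllLines(stat).get(0));
        long now = System.currentTimeMillis() / 1000;
        System.out.print(graphiteLine("search_memstats.example.min_flt", faults[0], now));
        System.out.print(graphiteLine("search_memstats.example.maj_flt", faults[1], now));
    }
}
```

Like the `ps` output, these are cumulative counts, so a steadily rising `maj_flt` series (page faults that required disk I/O) is the signal worth alerting on.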
72. Solution 1: Buy more RAM
Ideally enough RAM to:
Keep index in OS file buffers
AND ensure no paging of VM memory
AND whatever else happens on the box
~$5-10/GB
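The sizing rule above is simple addition, but writing it out makes the budget explicit. In this sketch the index, JVM, and "everything else" sizes are hypothetical placeholders to replace with your own measurements; the JVM figure should be the whole process footprint (heap plus non-heap), not just `-Xmx`. The price range is the slide's ~$5-10/GB.

```java
public class RamSizing {
    // Minimum RAM (GB) so the whole index fits in the OS page cache while the
    // JVM's memory and everything else on the box stay resident (no paging).
    static double requiredRamGb(double indexGb, double jvmGb, double otherGb) {
        return indexGb + jvmGb + otherGb;
    }

    public static void main(String[] args) {
        // Hypothetical example sizes; substitute your own measurements.
        double indexGb = 40, jvmGb = 12, otherGb = 4;
        double ram = requiredRamGb(indexGb, jvmGb, otherGb);
        // Price range from the slide (~$5-10/GB).
        System.out.printf("Need >= %.0f GB; at $5-10/GB that is roughly $%.0f-$%.0f%n",
            ram, ram * 5, ram * 10);
    }
}
```

With these placeholder numbers it prints `Need >= 56 GB; at $5-10/GB that is roughly $280-$560`.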