Why GC is eating all my CPU? 
Aprof - Java Memory Allocation Profiler 
Roman Elizarov, Devexperts 
Joker Conference, St. Petersburg, 2014
Java Memory Allocation Profiler 
Why is it needed? 
When to use it? 
How does it work? 
How to use it? 
DISCLAIMER Herein lots of: 
omissions, 
simplifications, 
half-truths, 
intentionally misleading information. [Almost] all the truth is at the end of presentation!
Java 
•Does not have stack-allocation, does not have structs, does not have tuples… 
•Promotes the “Object Oriented” style of programming with lots of object allocations 
–Most of which are only temporary → garbage
Garbage collection
Enjoy Java and GC 
Until it becomes a performance bottleneck
http://odetocode.com/Blogs/scott/archive/2008/07/15/optimizing-linq-queries.aspx
So, measure GC! 
http://clearlypresentable.files.wordpress.com/2010/10/manage-measure.jpg
Use simple tools…
… or use advanced tools …
…or use the logs and APIs 
•Always use the following settings in prod: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
– So you can always work out the % of time spent in GC from your stdout, as a last resort. 
•To figure it out programmatically, see java.lang.management.GarbageCollectorMXBean 
Shameless commercial plug: That is how our MARS monitoring product does it
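A minimal sketch of the programmatic route, assuming only the standard java.lang.management API. The class name GcTimeShare and its two helper methods are made up for this illustration; this is not how MARS is implemented. Note that getCollectionTime() is the collector's accumulated elapsed time, so the result is an approximation of the GC share, not an exact pause-time figure.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeShare {
    /** Sums the accumulated collection time (ms) over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Approximate percentage of JVM uptime spent in GC so far. */
    static double gcPercent() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime(); // ms since JVM start
        return uptime <= 0 ? 0.0 : 100.0 * totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms (%.2f%% of uptime)%n", totalGcMillis(), gcPercent());
    }
}
```

Sampling these two values periodically and taking differences gives the GC share per interval, which is what the 10% and 30% thresholds on the next slide refer to.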
How much is much? 
•10%+ time in GC – you start worrying 
•30%+ time in GC – you do something about it 
http://upload.wikimedia.org/wikipedia/commons/4/4f/Scheibenbremse-magura.jpg
C/C++ 
Java 
Use CPU Profiler 
Find hot spots 
Fix 
Use CPU Profiler 
Find hot spots 
Wait! Allocation is FAST! (does not consume CPU) 
Troubleshooting
Object Allocation 
Garbage Collection 
Causes 
C/C++ 
Java 
Object Allocation 
CPU consumption 
Causes 
malloc/free cost
Memory allocation profiling 
•“-Xaprof” option in old JVM (<= 1.6) 
–Prints something like this on process termination: 
Allocation profile (sizes in bytes, cutoff = 0 bytes): 
___________Size__Instances__Average__Class________________ 
555807584 34737974 16 java.lang.Integer 
321112 5844 55 [I 
106104 644 165 [C 
37144 63 590 [B 
13744 325 42 [Ljava.lang.Object; 
… <the rest> 
Top alloc’d 
That is where the aprof project got its name
Where is memory allocated?
Where is memory allocated? (1) 
•Java 
–new CName(…) 
–new <prim>[…] 
–new CName[...] 
–new CName[…][…]…[…] 
•Bytecode 
–new 
–newarray 
–anewarray 
–multianewarray 
Fundamental ways to allocate memory 
Quiz for audience: What else? 
Allocate memory 
Since Java 1.0
Where is memory allocated? (2) 
–Integer i = 1234; 
–Integer i = Integer.valueOf(1234); 
Boxing – syntactic sugar to allocate memory 
Quiz for audience: What else? 
Since Java 5 
“Enhanced for loop” also desugars to method calls
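Both kinds of sugar can be spelled out in source form. The sketch below is illustrative only; the class and method names are made up, and the desugared loop is an approximation of what javac emits, not a copy of it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class HiddenAllocations {
    /** Boxing: the compiler rewrites the assignment to Integer.valueOf(value). */
    static Integer box(int value) {
        Integer boxed = value; // may allocate when value is outside the small-value cache
        return boxed;
    }

    /** Enhanced for loop over a collection, as written in source. */
    static int sumSugared(List<Integer> list) {
        int sum = 0;
        for (int x : list) {
            sum += x;
        }
        return sum;
    }

    /** Roughly the same loop after desugaring: iterator(), hasNext(), next(), unboxing. */
    static int sumDesugared(List<Integer> list) {
        int sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            sum += it.next().intValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list.add(box(1234));
        list.add(box(5678));
        System.out.println(sumSugared(list) == sumDesugared(list));
    }
}
```

Neither method contains a `new` in its source, yet both box() and the iterator() call can allocate.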
Where is memory allocated? (3) 
•java.lang.Object.clone 
–Yet another true way of object allocation 
•Reflection & deserialization 
–Gets compiled into bytecode after a few invokes 
but Array.newInstance needs separate care 
•sun.misc.Unsafe.allocateInstance 
–Could be tracked, but is not in current aprof 
•JNI 
–Can be tracked via native JVMTI 
aprof is a pure Java tool now, does not track it 
Quiz for audience: What else?
Where is memory allocated? (4) 
Captured 
In new lambda 
Since Java 8 
Capturing Java 8 Lambda Closure
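The slide's code is not part of this transcript, so here is a hypothetical illustration of the difference, with made-up names. On HotSpot a non-capturing lambda is typically evaluated to one reused instance, while a capturing lambda needs an object to hold the captured value each time the expression is evaluated (unless the JIT eliminates it).

```java
import java.util.function.IntSupplier;

public class LambdaCapture {
    /** Non-capturing lambda: uses nothing from the enclosing scope. */
    static IntSupplier constant() {
        return () -> 42;
    }

    /** Capturing lambda: the returned object has to carry 'value'. */
    static IntSupplier capture(int value) {
        return () -> value;
    }

    public static void main(String[] args) {
        System.out.println(constant().getAsInt() + capture(7).getAsInt());
    }
}
```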
Let’s instrument bytecodes 
java -javaagent:aprof.jar … 
JVM API to write bytecode instrumenting agents in Java
Define Premain-Class 
Premain-Class: com.devexperts.aprof.AProfAgent 
Boot-Class-Path: aprof.jar 
Can-Redefine-Classes: true 
aprof.jar!META-INF/MANIFEST.MF
Install transformer 
That is what we need! 
Ah… The secret sauce!
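The code on this slide did not survive in the transcript. As a rough sketch of the mechanism, using only the standard java.lang.instrument API: the class names SketchAgent and NoOpTransformer are made up, and this is not the AProfAgent source.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class SketchAgent {
    /** A transformer that changes nothing: returning null keeps the original class bytes. */
    static final class NoOpTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // A real profiler would rewrite classfileBuffer here (for example with ASM)
            // and return the modified bytes instead of null.
            return null;
        }
    }

    /** Invoked by the JVM before main() when the jar named in -javaagent declares this Premain-Class. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new NoOpTransformer());
    }
}
```

Every class loaded after premain() returns is offered to the transformer, which is the hook an allocation profiler needs to insert its calls.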
Manipulate bytecode with ASM 
•ObjectWeb ASM is an open source lib to help 
–Easy to use for bytecode manipulation 
–Extremely fast (suited to on-the-fly manipulation) 
ClassReader 
ClassVisitor 
ClassWriter 
MethodVisitor
Instrument around bytecodes
Generate calls to profiling methods 
New array bytecode 
Needs length to compute object size 
Index of location 
invoke static profiling method
Generate calls to profiling methods 
Regular new object bytecode 
Needs class to compute object size
Example transformation 
public int[] newArray(int); 
Code: 
0: iload_1 
1: newarray int 
3: areturn 
public int[] newArray(int); 
Code: 
0: iload_1 
1: dup 
2: sipush 4137 
5: invokestatic #31 
8: newarray int 
10: areturn 
java -javaagent:aprof.jar=dump.classes.dir=dump … 
Method com/devexperts/aprof/AProfOps.intAllocateArraySize 
Assigned index 
return new int[n];
We cannot measure the system without affecting it 
Need to minimize measurement effect 
Measure garbage without producing garbage (do not allocate memory)
Garbage-free code? 
[ ] Class-transformation 
Only during load (don’t care) 
[ ] Object-class size computation 
Only once per class (don’t care) 
[x] Count individual allocations 
It has to be garbage-free 
and it is (once all allocation locations were visited)
Count all allocations 
Uses array access, because index enumerates (location, type) pairs consecutively
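The counting code itself is not in the transcript. A minimal sketch of the idea, assuming a fixed upper bound on the number of (location, type) indices; AllocationCounters and its methods are hypothetical names, not aprof classes.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class AllocationCounters {
    private final AtomicLongArray counts; // number of allocations per index
    private final AtomicLongArray bytes;  // total allocated bytes per index

    AllocationCounters(int maxLocations) {
        counts = new AtomicLongArray(maxLocations);
        bytes = new AtomicLongArray(maxLocations);
    }

    /** Hot path: two atomic adds on preallocated arrays, no object is created. */
    void record(int index, long size) {
        counts.incrementAndGet(index);
        bytes.addAndGet(index, size);
    }

    long count(int index) {
        return counts.get(index);
    }

    long totalBytes(int index) {
        return bytes.get(index);
    }

    public static void main(String[] args) {
        AllocationCounters counters = new AllocationCounters(8192);
        counters.record(4137, 32); // 4137 is the index from the example transformation
        System.out.println(counters.count(4137) + " allocations, " + counters.totalBytes(4137) + " bytes");
    }
}
```

Because the index is a small consecutive integer, lookup is a plain array access with no hashing and no boxing on the hot path.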
Compute object size 
Array size 
… very fast computation
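The formula on the slide is missing from the transcript. The sketch below shows the usual shape of such a computation under two loudly stated assumptions: a 16-byte array header (64-bit HotSpot with compressed oops) and 8-byte object alignment. Both constants are assumptions for illustration; aprof determines the real values itself.

```java
public class ArraySizeSketch {
    static final int ARRAY_HEADER = 16; // assumption: 64-bit HotSpot with compressed oops
    static final int ALIGNMENT = 8;     // assumption: default object alignment

    /** Header plus payload, rounded up to the next multiple of ALIGNMENT. */
    static long arraySize(int length, int elementSize) {
        long raw = ARRAY_HEADER + (long) length * elementSize;
        return (raw + ALIGNMENT - 1) & -ALIGNMENT;
    }

    public static void main(String[] args) {
        System.out.println(arraySize(3, 4)); // int[3] under the assumptions above
    }
}
```

The whole computation is one multiply, one add and one mask, which is why it can run on every array allocation.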
Compute object size 
Regular object size 
… and cache the size for class 
Precise and actual size (!)
Let’s try it 
http://3.bp.blogspot.com/-vUgjzK5P-kQ/ToYOTLjqBTI/AAAAAAAAFu0/2ZGrGV9y0c4/s640/tumblr_komtamvGna1qzvjtno1_500.jpg
Big business app 
Top allocated data types with locations 
------------------------------------------------------------- 
char[]: 330,657,880 bytes in 5,072,613 objects 
java.util.Arrays.copyOfRange: 131,299,568 bytes in … 
aprof.txt 
Oops! That is not informative 
We already knew that it would produce a lot of char[] garbage in strings! 
aprof agent
Need allocation context 
Call stack: 
•MyDataClass.getData(int) 
•java.lang.StringBuilder.toString() 
•java.lang.String.String(char[], int, int) 
•java.util.Arrays.copyOfRange(char[], int, int) 
•new char[] 
But it is allocated here 
We want this location
Aprof Tracked Methods 
java -jar aprof.jar export details.config 
… 
java.lang.StringBuilder 
<init> 
append 
appendCodePoint 
ensureCapacity 
insert 
setLength 
subSequence 
substring 
trimToSize 
toString 
… 
details.config 
Java RT methods that might allocate memory
How is it done? 
•Tracked via thread-local instance of class com.devexperts.aprof.LocationStack 
•Local variable is injected into 
–methods that allocate memory 
–methods that invoke tracked methods 
–tracked methods themselves 
•Local variable is initialized on first use
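In source form, the injected local variable could look roughly like the sketch below. The nested LocationStack here is a hypothetical stand-in, not the real com.devexperts.aprof.LocationStack, and the profiling call that would use the stack is only indicated by a comment.

```java
public class LocationStackSketch {
    /** Hypothetical stand-in for the per-thread context object. */
    static final class LocationStack {
        private static final ThreadLocal<LocationStack> CURRENT =
                ThreadLocal.withInitial(LocationStack::new);

        static LocationStack get() {
            return CURRENT.get();
        }
    }

    /** What an instrumented allocating method might look like if written by hand. */
    static int[] newArray(int n) {
        LocationStack stack = null;      // injected local variable
        if (stack == null) {             // initialized on first use
            stack = LocationStack.get(); // at most one ThreadLocal lookup per invocation
        }
        // A profiling call taking 'stack' and the location index would go here.
        return new int[n];
    }

    public static void main(String[] args) {
        System.out.println(newArray(5).length);
    }
}
```

After the first call on a thread, LocationStack.get() returns the same object every time, so steady-state execution does not allocate.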
public int[] newArray(int); 
Code: 
0: aconst_null 
1: astore_2 
2: iload_1 
3: dup 
4: aload_2 
5: dup 
6: ifnonnull 15 
9: pop 
10: invokestatic #25 
13: dup 
14: astore_2 
15: sipush 4137 
18: invokestatic #31 
21: newarray int 
23: areturn 
Example transformation (actual) 
LocationStack stack = null; // local var 
if (stack == null) 
    stack = LocationStack.get(); 
return new int[n];
Looks scary 
… But fast in practice 
–Long-running methods retrieve LocationStack from ThreadLocal only once 
–HotSpot optimizes this ugly code quite well 
–It only affects code that does memory allocations or uses tracked (memory allocating!) methods 
–Does not allocate memory during stable operation 
It has no performance impact on dense garbage-free computation code, various getters, etc
Big business app 
Top allocated data types with reverse location traces 
----------------------------------------------------- 
char[]: 330,657,880 bytes in 5,072,613 objects 
java.util.Arrays.copyOfRange: 131,299,568 bytes in … 
java.lang.StringBuilder.toString: … 
MyBusinessMethod: … 
… (more!) 
aprof.txt 
aprof agent 
Got you!
Actual call stack: … 
•MyBusinessMethod 
•java.lang.StringBuilder.toString() 
•java.lang.String.String(char[], int, int) 
•java.util.Arrays.copyOfRange(char[], int, int) 
•new char[] 
Aprof dump vs actual call stack 
Top allocated data types with reverse location traces 
----------------------------------------------------- 
char[]: 330,657,880 bytes in 5,072,613 objects 
java.util.Arrays.copyOfRange: 131,299,568 bytes in … 
java.lang.StringBuilder.toString: … 
MyBusinessMethod: … 
… (more!) 
At most 4 items are displayed in aprof dump 
(1) Type that was allocated 
(2) Where it was allocated 
(3) Outermost tracked method 
(4) Caller of the outermost tracked method
But what if 
… caught MyFrameworkMethod allocating memory instead? 
Add it to tracked methods list 
a) java -javaagent:aprof.jar=track=MyFrameworkMethod … 
b) java -javaagent:aprof.jar=track.file=details.config … 
Repeat
Poor man’s catch-all 
java -javaagent:aprof.jar=+unknown … 
Injects the profiling code right into java.lang.Object constructor 
•Does some mental accounting to exclude allocations that were already counted, and reports the difference 
•The difference can appear from JNI object allocations. But location is unknown anyway. 
•Does not help with tracking JNI array allocations at all (no constructor invocation)
Top allocated data types with reverse location traces 
----------------------------------------------------- 
java.util.ArrayList$Itr: 32,000,006,272 bytes … 
java.util.ArrayList.iterator: 32,000,006,272 bytes … 
IterateALot.sumList: 32,000,000,000 bytes … 
Iterator is allocated here
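The program behind this dump is not in the transcript. A hypothetical reconstruction of the shape of IterateALot.sumList, with an assumed list size and loop count, might be:

```java
import java.util.ArrayList;
import java.util.List;

public class IterateALot {
    /** The enhanced for loop calls list.iterator(), which creates an ArrayList$Itr. */
    static long sumList(List<Integer> list) {
        long sum = 0;
        for (int x : list) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        long total = 0;
        // At the bytecode level, every call creates one short-lived iterator.
        for (int i = 0; i < 1_000_000; i++) {
            total += sumList(list);
        }
        System.out.println(total);
    }
}
```

The iterator never leaves sumList, which makes it a natural candidate for the JIT optimizations discussed on the following slides.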
java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps … 
Shouldn’t GC work like hell? 
[PSYoungGen: 512K->400K(1024K)] 512K->408K(126464K) 
[PSYoungGen: 912K->288K(1024K)] 920K->296K(126464K) 
[PSYoungGen: 800K->352K(1024K)] 808K->360K(126464K) 
[PSYoungGen: 864K->336K(1536K)] 872K->344K(126976K) 
[PSYoungGen: 1360K->352K(1536K)] 1368K->360K(126976K) 
[PSYoungGen: 1376K->352K(2560K)] 1384K->360K(128000K) 
[PSYoungGen: 2400K->0K(2560K)] 2408K->284K(128000K) 
[PSYoungGen: 2048K->0K(4608K)] 2332K->284K(130048K) 
And that is it! No more GC! 
What is going on here?
HotSpot allocation elimination 
Deep inlining 
Escape analysis 
Allocation elimination
Aprof allocation elimination checking 
java -javaagent:aprof.jar=+check.eliminate.allocation -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation … 
Top allocated data types with reverse location traces 
----------------------------------------------------- 
java.util.ArrayList$Itr: 32,000,006,272 bytes … 
java.util.ArrayList.iterator: 32,000,006,272 bytes … 
IterateALot.sumList: … ; possibly eliminated 
1. HotSpot writes hs_xxx.log files 
2. Aprof parses them to learn which allocation locations were eliminated
Some additional options/features 
•histogram – track array allocation separately for different size brackets 
•file – configure dump file, use #### in file name to auto-number files 
•file.append – append each dump to the file instead of overwriting it 
•time – time period between dumps, defaults to a minute 
java -jar aprof.jar 
To get help
Advanced topics (1) 
•Transforming classes that were already loaded 
–… before aprof Pre-Main had even got control 
•Profiling the profiler 
–aprof records and reports its own allocations 
•Caching class meta-info for each class-loader 
–for fast class transformation 
•Aggregating classes with dynamic names like “sun.reflect.GeneratedConstructorAccessorX” 
–to avoid memory leaks (out of memory)
Advanced topics (2) 
•Handling HotSpot intrinsics 
–Arrays.copyOf and Arrays.copyOfRange do not actually run their bytecode in HotSpot (intrinsic) 
aprof has to treat them as first-class allocations 
•Avoiding class-name clashes with target app 
–asm and aprof transformer are actually loaded from a separate inner class loader 
•Accessing index of counters via fast hash 
–http://elizarov.livejournal.com/tag/hash 
•Writing big reports without tons of garbage
The source 
https://github.com/devexperts/aprof 
GPL 3.0 
Your contributions are welcome
Known issues 
•Lambda capture memory allocations are neither tracked nor reported in any way 
–Need to track metafactory calls and unnamed classes it generates 
•Java 8 library methods (collections, streams, etc) are not included in the default list of tracked methods 
–Need to work through them and include them 
•JNI allocations are not properly tracked 
–Need to write native JVMTI agent to track them 
Where you can help 
Mostly done in aprof v30 
To get actual code location
Questions? 
Feedback? 
aprof@devexperts.com 
elizarov@devexperts.com

Why GC is eating all my CPU?

  • 1. Why GC is eating all my CPU? Aprof - Java Memory Allocation Profiler Roman Elizarov, Devexperts Joker Conference, St. Petersburg, 2014
  • 2. Java Memory Allocation Profiler Why it is needed? When to use it? How it works? How to use it? DISCLAIMER Herein lots of: omissions, simplifications, half-truths, intentionally misleading information. [Almost] all the truth is at the end of presentation!
  • 3. Java •Does not have stack-allocation, does not have structs, does not have tuples… •Promotes the “Object Oriented” style of programming with lots of object allocations –Most of which are only temporary → garbage
  • 5. Enjoy Java and GC, until it becomes a performance bottleneck
  • 7. So, measure GC! http://clearlypresentable.files.wordpress.com/2010/10/manage-measure.jpg
  • 9. … or use advanced tools …
  • 10. …or use the logs and APIs •Always use the following settings in prod: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps – So you can always figure out the % of time spent in GC by looking at your stdout as a last resort. •To figure it out programmatically, see java.lang.management.GarbageCollectorMXBean Shameless commercial plug: That is how our MARS monitoring product does it
  • 11. How much is much? •10%+ time in GC – you start worrying •30%+ time in GC – you do something about it http://upload.wikimedia.org/wikipedia/commons/4/4f/Scheibenbremse-magura.jpg
  • 13. Troubleshooting
      C/C++: Use CPU Profiler → Find hot spots → Fix
      Java: Use CPU Profiler → Find hot spots → Wait! Allocation is FAST! (does not consume CPU)
  • 14. Causes
      C/C++: Object Allocation → CPU consumption (malloc/free cost)
      Java: Object Allocation → Garbage Collection
  • 15. Memory allocation profiling •“-Xaprof” option in old JVM (<= 1.6) –Prints something like this on process termination:
      Allocation profile (sizes in bytes, cutoff = 0 bytes):
      ___________Size__Instances__Average__Class________________
        555807584   34737974       16  java.lang.Integer    (top alloc’d)
           321112       5844       55  [I
           106104        644      165  [C
            37144         63      590  [B
            13744        325       42  [Ljava.lang.Object;
      … <the rest>
      That is where the aprof project got its name from
  • 16. Where memory is allocated?
  • 17. Where memory is allocated? (1) Fundamental ways to allocate memory, since Java 1.0:
      Java                      Bytecode
      new CName(…)              new
      new <prim>[…]             newarray
      new CName[...]            anewarray
      new CName[…][…]…[…]       multianewarray
      Quiz for audience: What else?
  • 18. Where memory is allocated? (2) –Integer i = 1234; –Integer i = Integer.valueOf(1234); Boxing – syntactic sugar to allocate memory Quiz for audience: What else? Since Java 5 “Enhanced for loop” also desugars to method calls
  • 19. Where memory is allocated? (3) •java.lang.Object.clone –Yet another true way of object allocation •Reflection & deserialization –Gets compiled into bytecode after a few invokes but Array.newInstance needs separate care •sun.misc.Unsafe.allocateInstance –Could be tracked, but is not in current aprof •JNI –Can be tracked via native JVMTI aprof is a pure Java tool now, does not track it Quiz for audience: What else?
  • 20. Where memory is allocated? (4) Captured In new lambda Since Java 8 Capturing Java 8 Lambda Closure
  • 21. Let’s instrument bytecodes: java -javaagent:aprof.jar … (JVM API to write bytecode instrumenting agents in Java)
  • 22. Define Premain-Class in aprof.jar!META-INF/MANIFEST.MF:
      Premain-Class: com.devexperts.aprof.AProfAgent
      Boot-Class-Path: aprof.jar
      Can-Redefine-Classes: true
  • 23. Install transformer That is what we need! Ah… The secret sauce!
  • 24. Manipulate bytecode with ASM •ObjectWeb ASM is an open source lib to help –Easy to use for bytecode manipulation –Extremely fast (suited to on-the-fly manipulation) ClassReader ClassVisitor ClassWriter MethodVisitor
  • 26. Generate calls to profiling methods New array bytecode Needs length to compute object size Index of location invoke static profiling method
  • 27. Generate calls to profiling methods Regular new object bytecode Needs class to compute object size
  • 28. Example transformation
      Source: return new int[n];
      Before:
        public int[] newArray(int);
          Code:
             0: iload_1
             1: newarray int
             3: areturn
      After (java -javaagent:aprof.jar=dump.classes.dir=dump …):
        public int[] newArray(int);
          Code:
             0: iload_1
             1: dup
             2: sipush 4137         (assigned index)
             5: invokestatic #31    (method com/devexperts/aprof/AProfOps.intAllocateArraySize)
             8: newarray int
            10: areturn
  • 29. We cannot measure the system without affecting it Need to minimize measurement effect Measure garbage without producing garbage (do not allocate memory)
  • 30. Garbage-free code? [ ] Class-transformation Only during load (don’t care) [ ] Object-class size computation Only once per class (don’t care) [x] Count individual allocations It has to be garbage-free and it is (once all allocation locations were visited)
  • 31. Count all allocations Uses array access, because index enumerates (location, type) pairs consecutively
  • 32. Compute object size Array size … very fast computation
  • 33. Compute object size Regular object size … and cache the size for class Precise and actual size (!)
  • 34. Let’s try it http://3.bp.blogspot.com/-vUgjzK5P-kQ/ToYOTLjqBTI/AAAAAAAAFu0/2ZGrGV9y0c4/s640/tumblr_komtamvGna1qzvjtno1_500.jpg
  • 35. Big business app, run with the aprof agent. aprof.txt:
      Top allocated data types with locations
      -------------------------------------------------------------
      char[]: 330,657,880 bytes in 5,072,613 objects
        java.util.Arrays.copyOfRange: 131,299,568 bytes in …
      Oops! That is not informative. We knew that it would produce a lot of char[] garbage in strings!
  • 36. Need allocation context
      Call stack:
        •MyDataClass.getData(int)                          (we want this location)
        •java.lang.StringBuilder.toString()
        •java.lang.String.String(char[], int, int)
        •java.util.Arrays.copyOfRange(char[], int, int)
        •new char[]                                        (but it is allocated here)
  • 37. Aprof Tracked Methods: Java RT methods that might allocate memory
      java -jar aprof.jar export details.config
      details.config:
        …
        java.lang.StringBuilder
          <init> append appendCodePoint ensureCapacity insert setLength subSequence substring trimToSize toString
        …
  • 38. How it is done? •Tracked via thread-local instance of class com.devexperts.aprof.LocationStack •Local variable is injected into –methods that allocate memory –methods that invoke tracked methods –tracked methods themselves •Local variable is initialized on first use
  • 39. Example transformation (actual)
      Equivalent source:
        LocationStack stack = null; // local var
        if (stack == null) stack = LocationStack.get();
        return new int[n];
      Bytecode:
        public int[] newArray(int);
          Code:
             0: aconst_null
             1: astore_2
             2: iload_1
             3: dup
             4: aload_2
             5: dup
             6: ifnonnull 15
             9: pop
            10: invokestatic #25
            13: dup
            14: astore_2
            15: sipush 4137
            18: invokestatic #31
            21: newarray int
            23: areturn
  • 40. Looks scary … But fast in practice –Long-running methods retrieve LocationStack from ThreadLocal only once –HotSpot optimizes this ugly code quite well –It only affects code that does memory allocations or uses tracked (memory allocating!) methods –Does not allocate memory during stable operation It has no performance impact on dense garbage-free computation code, various getters, etc
  • 41. Big business app, run with the aprof agent. aprof.txt:
      Top allocated data types with reverse location traces
      -----------------------------------------------------
      char[]: 330,657,880 bytes in 5,072,613 objects
        java.util.Arrays.copyOfRange: 131,299,568 bytes in …
          java.lang.StringBuilder.toString: …
            MyBusinessMethod: …
        … (more!)
      Got you!
  • 42. Aprof dump vs actual call stack
      Actual call stack:
        …
        •MyBusinessMethod
        •java.lang.StringBuilder.toString()
        •java.lang.String.String(char[], int, int)
        •java.util.Arrays.copyOfRange(char[], int, int)
        •new char[]
      Aprof dump:
        Top allocated data types with reverse location traces
        -----------------------------------------------------
        char[]: 330,657,880 bytes in 5,072,613 objects             (1)
          java.util.Arrays.copyOfRange: 131,299,568 bytes in …     (2)
            java.lang.StringBuilder.toString: …                    (3)
              MyBusinessMethod: …                                  (4)
          … (more!)
      At most 4 items are displayed in aprof dump:
        (1) Type that was allocated
        (2) Where it was allocated
        (3) Outermost tracked method
        (4) Caller of the outermost tracked method
  • 43. But what if … caught MyFrameworkMethod allocating memory instead? Add it to the tracked methods list, then repeat:
      a) java -javaagent:aprof.jar=track=MyFrameworkMethod …
      b) java -javaagent:aprof.jar=track.file=details.config …
  • 44. Poor man’s catch-all: java -javaagent:aprof.jar=+unknown … Injects the profiling code right into the java.lang.Object constructor •Does some mental accounting to exclude allocations that were already counted, reports the difference •The difference can appear from JNI object allocations. But location is unknown anyway. •Does not help with tracking JNI array allocations at all (no constructor invocation)
  • 46. Top allocated data types with reverse location traces
      -----------------------------------------------------
      java.util.ArrayList$Itr: 32,000,006,272 bytes …
        java.util.ArrayList.iterator: 32,000,006,272 bytes …
          IterateALot.sumList: 32,000,000,000 bytes …
      Iterator is allocated here
  • 47. Shouldn’t GC work like hell?
      java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps …
      [PSYoungGen: 512K->400K(1024K)] 512K->408K(126464K)
      [PSYoungGen: 912K->288K(1024K)] 920K->296K(126464K)
      [PSYoungGen: 800K->352K(1024K)] 808K->360K(126464K)
      [PSYoungGen: 864K->336K(1536K)] 872K->344K(126976K)
      [PSYoungGen: 1360K->352K(1536K)] 1368K->360K(126976K)
      [PSYoungGen: 1376K->352K(2560K)] 1384K->360K(128000K)
      [PSYoungGen: 2400K->0K(2560K)] 2408K->284K(128000K)
      [PSYoungGen: 2048K->0K(4608K)] 2332K->284K(130048K)
      And that is it! No more GC! What is going on here?
  • 48. HotSpot allocation elimination Deep inlining Escape analysis Allocation elimination
  • 49. Aprof allocation elimination checking
      java -javaagent:aprof.jar=+check.eliminate.allocation -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation …
      Top allocated data types with reverse location traces
      -----------------------------------------------------
      java.util.ArrayList$Itr: 32,000,006,272 bytes …
        java.util.ArrayList.iterator: 32,000,006,272 bytes …
          IterateALot.sumList: … ; possibly eliminated
      1. HotSpot writes hs_xxx.log files
      2. Aprof parses them to learn eliminated allocation locations
  • 50. Some additional options/features •histogram – track array allocation separately for different size brackets •file – configure dump file, use #### in file name to auto-number files •file.append – append each dump to the file instead of overwriting it •time – time period to write dumps, defaults to a minute. To get help: java -jar aprof.jar
  • 51. Advanced topics (1) •Transforming classes that were already loaded –… before aprof Pre-Main had even got control •Profiling the profiler –aprof records and reports its own allocations •Caching class meta-info for each class-loader –for fast class transformation •Aggregating classes with dynamic names like “sun.reflect.GeneratedConstructorAccessorX” –to avoid memory leaks (out of memory)
  • 52. Advanced topics (2) •Handling HotSpot intrinsics –Arrays.copyOf and Arrays.copyOfRange do not actually run their bytecode in HotSpot (intrinsic) aprof has to treat them as first-class allocations •Avoiding class-name clashes with target app –asm and aprof transformer are actually loaded from a separate inner class loader •Accessing index of counters via fast hash –http://elizarov.livejournal.com/tag/hash •Writing big reports without tons of garbage
  • 53. The source https://github.com/devexperts/aprof GPL 3.0 Your contributions are welcome
  • 54. Known issues •Lambda capture memory allocations are not tracked nor reported in any way –Need to track metafactory calls and unnamed classes it generates •Java 8 library methods (collections, streams, etc) are not included into default list of tracked methods –Need to work through them and include them •JNI allocations are not properly tracked –Need to write native JVMTI agent to track them Where you can help Mostly done in aprof v30 To get actual code location
  • 55. Questions? Feedback? aprof@devexperts.com elizarov@devexperts.com
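
Sketch for slide 10: the slide points to java.lang.management.GarbageCollectorMXBean for measuring GC time programmatically. A minimal sketch of that idea follows. The class name GcTimeShare and both method names are made up for illustration, and getCollectionTime() is only an approximate accumulated time, so treat the result as a rough indicator rather than an exact pause-time figure.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeShare {
    /** Approximate milliseconds spent in GC since JVM start, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time as a percentage of uptime; returns 0 for a non-positive uptime. */
    static double gcPercent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC share of uptime: %.2f%%%n", gcPercent(totalGcMillis(), uptime));
    }
}
```

The result can be compared against the rule of thumb on slide 11 (10%+ to start worrying, 30%+ to act).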
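
Sketch for slide 17: the four source forms and the bytecode instruction each one compiles to can be seen side by side in a tiny class. The class and method names here are illustrative only; running javap -c on the compiled class should show the instruction named in each comment.

```java
public class AllocationForms {
    // new CName(...)  ->  "new" (followed by a constructor call)
    static Object newObject() {
        return new StringBuilder();
    }

    // new <prim>[...]  ->  "newarray"
    static int[] newPrimitiveArray() {
        return new int[4];
    }

    // new CName[...]  ->  "anewarray"
    static String[] newReferenceArray() {
        return new String[4];
    }

    // new CName[...][...]  ->  "multianewarray"
    static int[][] newMultiArray() {
        return new int[2][3];
    }
}
```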
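
Sketch for slide 18: autoboxing is compiled into a call to Integer.valueOf, which may allocate. The illustrative class below (names are hypothetical) makes that call visible. The Java language guarantees that values from -128 to 127 are served from a cache; whether a value such as 1234 produces a fresh object depends on the JVM and its settings, so the sketch does not assume either way.

```java
public class BoxingAllocation {
    /** Implicit boxing: the compiler emits Integer.valueOf(v) here. */
    static Integer boxImplicit(int v) {
        Integer boxed = v;
        return boxed;
    }

    /** The same operation written out explicitly. */
    static Integer boxExplicit(int v) {
        return Integer.valueOf(v);
    }

    public static void main(String[] args) {
        // Small values come from the Integer cache, so both calls return the same object.
        System.out.println(boxImplicit(100) == boxExplicit(100));
        // Larger values are typically allocated on each call; equals() still holds.
        System.out.println(boxImplicit(1234).equals(boxExplicit(1234)));
    }
}
```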
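
Sketch for slides 21 to 23: the code on the "Install transformer" slide did not survive in this transcript, so the following is not aprof's agent. It is a generic minimal java.lang.instrument agent under assumed names (SketchAgent, NoopTransformer): premain registers a ClassFileTransformer, and the transformer returns null to leave each class unchanged. A real profiler would rewrite the class bytes at the marked spot.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class SketchAgent {
    /** Sees every class as it is loaded but does not modify it. */
    static class NoopTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // A profiler would parse classfileBuffer here and return rewritten bytes.
            return null; // null tells the JVM that no transformation was performed
        }
    }

    /** Called by the JVM before main() when started with -javaagent:<jar>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new NoopTransformer());
    }
}
```

To run it as an agent, the jar manifest needs a Premain-Class entry naming this class, in the same way slide 22 names com.devexperts.aprof.AProfAgent.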
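
Sketch for slides 30 and 31: counting must itself be garbage-free, and the slide says counters are reached by array access because each (location, type) pair has a consecutive index. The transcript does not show aprof's code, so the class below is a hypothetical illustration of that idea using AtomicLongArray: recording an allocation touches only preallocated arrays and creates no objects.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class AllocationCounters {
    private final AtomicLongArray counts; // number of allocations per index
    private final AtomicLongArray bytes;  // total bytes allocated per index

    /** Preallocates one slot per (location, type) index. */
    AllocationCounters(int indexCount) {
        counts = new AtomicLongArray(indexCount);
        bytes = new AtomicLongArray(indexCount);
    }

    /** Records one allocation of the given size; allocates nothing itself. */
    void record(int index, long size) {
        counts.incrementAndGet(index);
        bytes.addAndGet(index, size);
    }

    long count(int index) {
        return counts.get(index);
    }

    long totalBytes(int index) {
        return bytes.get(index);
    }
}
```

In the bytecode on slide 28, the constant pushed by sipush 4137 plays the role of the index argument.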
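
Sketch for slide 32: the slide only says that array size is a very fast computation from the length. One way such a computation can look is header plus length times element size, rounded up to the object alignment. The constants below (16-byte array header, 8-byte alignment) are assumptions typical of 64-bit HotSpot with compressed references, not values taken from aprof, and they differ on other JVM configurations.

```java
public class ArraySizeSketch {
    static final int ARRAY_HEADER_BYTES = 16; // assumption, not taken from aprof
    static final int ALIGNMENT = 8;           // assumption, not taken from aprof

    /** Estimated heap size in bytes of an array with the given length and element width. */
    static long arraySize(int length, int elementBytes) {
        long raw = ARRAY_HEADER_BYTES + (long) length * elementBytes;
        // Round up to the next multiple of ALIGNMENT (which must be a power of two).
        return (raw + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    public static void main(String[] args) {
        System.out.println(arraySize(3, 4)); // int[3] under the assumptions above
    }
}
```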
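
Sketch for slides 46 to 48: the source of IterateALot is not in this transcript, so the class below is a guess at its shape rather than the original. An enhanced for loop over an ArrayList calls iterator(), which creates one ArrayList$Itr per call at the bytecode level. Once HotSpot inlines iterator() into the loop and escape analysis shows the iterator never leaves the method, the allocation can be eliminated, which would explain a profile reporting gigabytes of iterators while the GC log goes quiet.

```java
import java.util.ArrayList;
import java.util.List;

public class IterateALotSketch {
    /** Sums the list; the enhanced for loop obtains a new iterator on every call. */
    static long sumList(List<Integer> list) {
        long sum = 0;
        for (int x : list) { // desugars to list.iterator(), hasNext(), next(), intValue()
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        long total = 0;
        // Repetition count is arbitrary; it only needs to be large enough for JIT compilation.
        for (int rep = 0; rep < 1_000_000; rep++) {
            total += sumList(list);
        }
        System.out.println(total);
    }
}
```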