English:
Sometimes Java applications are not as fast as we expect. What if our system consists of hundreds of JVMs, databases and other components? We try a profiler, but cannot find the bottleneck.
In this talk we discuss:
1. How to profile a single JVM
- what a profiler is and how it works
- writing a simple profiler using a Java agent and byte-code modification
2. How to profile a distributed system
- how engineers at Google do this
- a look at commercial and open-source solutions: Dynatrace and Zipkin
- connecting to a demo system and watching live demos
Russian (translated):
Sometimes our Java programs run slower than we would like. Sometimes we even attach a profiler to see where the slowdown is.
But what if our system consists of tens or hundreds of JVMs, databases and other components?
In this tech talk we will discuss:
1. How to profile a single JVM
- what a profiler is and how it works under the hood
- we will write a simple profiler using a Java agent and byte-code modification
2. How to profile a complex distributed system
- we will work out how engineers at Google do it
- we will look at ready-made commercial and open-source solutions: Dynatrace and Zipkin
- we will connect to a demo system and see everything with our own eyes
https://github.com/kslisenko/java-performance
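13.
CPU VS WALL-CLOCK TIME
public void main() throws InterruptedException {
    a();               // 100 ms on the CPU
    Thread.sleep(200); // 200 ms waiting, no CPU used
    b();               // 100 ms on the CPU
    // GC is running: 50 ms pause
    c();               // 100 ms on the CPU
}
Wall-clock time
Elapsed time from start to finish, including waits and pauses
100 + 200 + 100 + 50 + 100 = 550 ms
CPU time
Time the thread actually spent running on a CPU
100 + 100 + 100 = 300 ms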
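The difference between the two clocks can be observed from inside the JVM. The following is a minimal sketch, assuming the JVM supports per-thread CPU time measurement; the class name `WallVsCpuTime` and the `busyWork` helper are hypothetical stand-ins for a(), b() and c().

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class WallVsCpuTime {

    /** Hypothetical workload: spins on the CPU for roughly the given time. */
    static void busyWork(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        while (System.nanoTime() < end) {
            // burn CPU cycles
        }
    }

    /** Returns {wallClockMillis, cpuMillis} for busy work plus a sleep. */
    static long[] measure() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime(); // nanoseconds

        busyWork(100);              // counts toward wall-clock and CPU time
        try {
            Thread.sleep(200);      // counts toward wall-clock time only
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        busyWork(100);              // counts toward wall-clock and CPU time

        long cpuMillis = (threads.getCurrentThreadCpuTime() - cpuStart) / 1_000_000L;
        long wallMillis = (System.nanoTime() - wallStart) / 1_000_000L;
        return new long[] {wallMillis, cpuMillis};
    }

    public static void main(String[] args) {
        long[] t = measure();
        System.out.println("wall-clock: " + t[0] + " ms, CPU: " + t[1] + " ms");
    }
}
```

The wall-clock figure should come out at roughly 400 ms and the CPU figure at roughly 200 ms; exact values depend on the machine and its load.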
23.
Call stack shown for both approaches: main() -> a() -> b() -> c()
SAMPLING
Thread dumps at regular intervals
Overhead depends on the sampling interval
+ relatively small overhead
+ can be used for unknown code
- limited accuracy (probability-based approach)
- triggers JVM safe-points
INSTRUMENTATION
Injection of measurement code
Overhead depends on the speed of the measurement code
+ accurate (every execution is measured)
+ the code can also be modified
- relatively big overhead
- we must know the code we are instrumenting
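To make the sampling idea concrete, here is a minimal sketch of a sampler built on `Thread.getStackTrace()`. It is an illustration only, not how any particular profiler is implemented; the class name, the `hotLoop` workload and the 10 ms interval are assumptions. Each sample records every method on the stack once, so the counts approximate inclusive time.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class SamplingProfiler {
    private static volatile boolean running;

    /** Takes stack snapshots of the target thread and counts in how many samples each method appears. */
    static Map<String, Integer> sample(Thread target, long intervalMillis, int samples) {
        Map<String, Integer> hits = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            Set<String> methodsInSample = new HashSet<>();
            for (StackTraceElement frame : target.getStackTrace()) {
                methodsInSample.add(frame.getClassName() + "." + frame.getMethodName());
            }
            for (String method : methodsInSample) {
                hits.merge(method, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis); // a shorter interval means more overhead
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    /** Hypothetical workload that keeps one thread busy until stopped. */
    static void hotLoop() {
        long x = 0;
        while (running) {
            x += System.nanoTime() % 7;
        }
    }

    /** Runs the workload on a worker thread and samples it 30 times, 10 ms apart. */
    static Map<String, Integer> demo() {
        running = true;
        Thread worker = new Thread(SamplingProfiler::hotLoop, "worker");
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> hits = sample(worker, 10, 30);
        running = false;
        return hits;
    }

    public static void main(String[] args) {
        demo().forEach((method, count) -> System.out.println(count + "\t" + method));
    }
}
```

Because the result is a count of samples rather than a measurement of every call, short-lived methods can be missed entirely, which is the accuracy trade-off listed above.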
26.
SAMPLING
How to capture a thread dump
1. jstack -l JAVA_PID
2. ManagementFactory.getThreadMXBean().dumpAllThreads(true, true);
Options 1 and 2: the JVM goes to a safe-point
• Application threads are paused
• Code in which no safe-point is ever reached never shows up in the dump
3. AsyncGetCallTrace (unofficial HotSpot API, used from a native JVMTI agent)
Doesn't trigger safe-points
Example: github.com/jvm-profiling-tools/honest-profiler
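Option 2 can be tried directly from Java code. The sketch below is a minimal example that formats the result of `dumpAllThreads(true, true)` as text; the class name `ThreadDump` and the output layout are assumptions, loosely modelled on what jstack prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDump {

    /** Returns a textual dump of all live threads with their stack traces. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true = also collect locked monitors and locked ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling `dump()` at a fixed interval and aggregating the frames is the basis of a safe-point-bound sampling profiler.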
27.
Safe-points
> jstack -l JAVA_PID
JVM flags that log safe-point pauses:
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
Sample output:
Total time for which application threads were stopped: 0.0132329 seconds, Stopping threads took: 0.0007617 seconds
Total time for which application threads were stopped: 0.0002887 seconds, Stopping threads took: 0.0000385 seconds
36.
JAVA AGENTS
C++ (native agent)
> java -agentlib:agent -jar app.jar (loads agent.dll on Windows)
Use for a deep dive into the JVM
• Has access to the JVM state, can receive JVMTI events
• Independent from the JVM (not interrupted by GC, can collect debug information between safe-points, etc.)
API
• JVMTI (native C/C++ interface of the JVM)
Java (Java agent)
> java -javaagent:agent.jar -jar app.jar
Use for byte-code modification
• Can transform byte-code before it is loaded by the ClassLoader
• Follows the JVM lifecycle (suspended by GC, etc.)
API
• java.lang.instrument, java.lang.management
48.
Diagram: how a Java agent transforms a class
Components: JVM, ClassLoader (Class A, Class B, Class C), Agent, ClassFileTransformer, byte-code manipulation library
1. premain
2. addTransformer
3. load class (Class A)
4. transform
5. modify byte code
6. redefine class (Class A becomes Class A*)
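The flow above can be sketched with the java.lang.instrument API. This is a minimal skeleton, not the profiler from the talk: it registers a transformer in `premain` and records which classes pass through it, leaving the byte code unchanged. The class name `ClassLoadLoggingAgent`, the prefix argument and the `SEEN` queue are hypothetical; a real agent would hand `classfileBuffer` to a byte-code library at the marked spot and would normally be a public class named in the jar manifest's `Premain-Class` entry.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ClassLoadLoggingAgent {

    /** Names of the classes that matched the prefix (internal form, e.g. com/example/Foo). */
    static final Queue<String> SEEN = new ConcurrentLinkedQueue<>();

    static final class LoggingTransformer implements ClassFileTransformer {
        private final String prefix;

        LoggingTransformer(String prefix) {
            this.prefix = prefix;
        }

        // Step 4: the JVM calls this for every class before it is defined.
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null && className.startsWith(prefix)) {
                SEEN.add(className);
                // Step 5 would go here: rewrite classfileBuffer with a
                // byte-code manipulation library and return the new bytes.
            }
            return null; // null tells the JVM to keep the original byte code
        }
    }

    // Step 1: the JVM calls premain before the application's main method.
    public static void premain(String agentArgs, Instrumentation inst) {
        String prefix = agentArgs == null ? "" : agentArgs;
        inst.addTransformer(new LoggingTransformer(prefix)); // step 2
    }
}
```

Packaged into agent.jar, it could be started with something like `java -javaagent:agent.jar=com/example/ -jar app.jar`, where the text after `=` arrives in `agentArgs`.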
58.
IDENTIFYING PERFORMANCE PROBLEM
Diagram: a frustrated user calls Server 1 over HTTP; Server 1 calls Server 2 and Server 3 over HTTP, and each of those two has its own DB.
Responses seen by the user: HTTP 200 in 150 ms, HTTP 200 in 270 ms, HTTP 500 (timeout)
Trace propagation: every HTTP call carries the header req-id: 1
Log entries written for the same request id:
Req-1 12:45:31.000 150 ms
Req-1 12:45:31.010 130 ms
Req-1 12:45:31.020 120 ms
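A minimal sketch of the propagation idea: each server reuses the incoming req-id, or creates one if it is the first hop, copies it into every outgoing call, and tags its log lines with it. The class and method names are hypothetical and plain maps stand in for real HTTP headers; only the header name and the log line layout come from the slide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class RequestIdPropagation {
    static final String HEADER = "req-id";

    /** Reuses the caller's request id, or creates a new one at the first hop. */
    static String resolveRequestId(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.get(HEADER);
        return (id == null || id.isEmpty()) ? UUID.randomUUID().toString() : id;
    }

    /** Headers to attach to every downstream HTTP call made for this request. */
    static Map<String, String> outgoingHeaders(String requestId) {
        Map<String, String> headers = new HashMap<>();
        headers.put(HEADER, requestId);
        return headers;
    }

    /** Log line in the layout shown on the slide. */
    static String logLine(String requestId, String time, long millis) {
        return "Req-" + requestId + " " + time + " " + millis + " ms";
    }

    public static void main(String[] args) {
        Map<String, String> incoming = new HashMap<>();
        incoming.put(HEADER, "1");
        String id = resolveRequestId(incoming);
        System.out.println(outgoingHeaders(id));
        System.out.println(logLine(id, "12:45:31.000", 150));
    }
}
```

With the same id in every server's log, the entries for one user request can be collected and lined up across machines.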
59.
TRACE EXAMPLE 1
Trace for the response HTTP 200, 150 ms:
http://server1/service 150 ms
  http://server2/service 120 ms
    server2 to DB 80 ms
    business logic 30 ms
  http://server3/service 130 ms
    server3 to DB 100 ms
    business logic 20 ms
Trace for the response HTTP 200, 270 ms:
http://server1/service 270 ms
  http://server2/service 120 ms
    server2 to DB 80 ms
    business logic 30 ms
  http://server3/service 130 ms
    server3 to DB 100 ms
    business logic 20 ms
60.
TRACE EXAMPLE 2
Trace for the response HTTP 200, 150 ms:
http://server1/service 150 ms
  http://server2/service 120 ms
    server2 to DB 80 ms
    business logic 30 ms
  http://server3/service 130 ms
    server3 to DB 100 ms
    business logic 20 ms
Trace for the response HTTP 500, timeout:
http://server1/service 500 ms (timeout)
  http://server2/service 120 ms
    server2 to DB 80 ms
    business logic 30 ms
  http://server3/service 370 ms
    server3 to DB 350 ms
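Once spans are collected into a tree, finding the culprit can be automated. The sketch below models the timed-out trace from this slide and follows the slowest child at each level; the `Span` class and `slowestPath` method are hypothetical and only illustrate the idea, not how Dapper or Zipkin analyse traces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TraceAnalysis {

    /** One timed operation in a trace, with the operations it called. */
    static final class Span {
        final String name;
        final long millis;
        final List<Span> children;

        Span(String name, long millis, Span... children) {
            this.name = name;
            this.millis = millis;
            this.children = Arrays.asList(children);
        }
    }

    /** Walks from the root down, always following the slowest child span. */
    static List<String> slowestPath(Span root) {
        List<String> path = new ArrayList<>();
        Span current = root;
        while (current != null) {
            path.add(current.name);
            Span slowest = null;
            for (Span child : current.children) {
                if (slowest == null || child.millis > slowest.millis) {
                    slowest = child;
                }
            }
            current = slowest;
        }
        return path;
    }

    /** The timed-out trace from the slide. */
    static Span timeoutTrace() {
        return new Span("http://server1/service", 500,
                new Span("http://server2/service", 120,
                        new Span("server2 to DB", 80),
                        new Span("business logic", 30)),
                new Span("http://server3/service", 370,
                        new Span("server3 to DB", 350)));
    }

    public static void main(String[] args) {
        System.out.println(slowestPath(timeoutTrace()));
    }
}
```

For this trace the program prints `[http://server1/service, http://server3/service, server3 to DB]`, pointing at the database call behind server3.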
61.
“When systems involve not just dozens of subsystems but dozens of engineering teams, even our best and most experienced engineers routinely guess wrong about the root cause of poor end-to-end performance.”
Google Dapper
https://research.google.com/pubs/pub36356.html
62.
GOOGLE DAPPER
Use cases
1. Identify performance problems across multiple teams and services
2. Build a dynamic environment map
Requirements
1. Low overhead
- no impact on running services
2. Application-level transparency*
- programmers should not need to be aware of the tracing system
3. Scalability
*They instrumented Google Search almost without modifications
65.
GOOGLE DAPPER: TECHNICAL DETAILS
Technical facts
1. Adaptive sampling
2. 1 TB/day written to BigTable
3. API + MapReduce
4. Instrumentation of common Google libraries
Issues and limitations
1. Request buffering
2. Batch jobs
3. Queued requests
4. Relative latency
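To illustrate what adaptive sampling means, here is a minimal sketch that aims for a fixed number of sampled traces per time window and lowers the sampling probability as traffic grows. This is an assumption-laden illustration, not Dapper's actual algorithm; the class name, the window mechanism and the target value are all hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

class AdaptiveSampler {
    private final int targetPerWindow;
    private double probability = 1.0; // sample everything until traffic is known
    private int seenInWindow = 0;

    AdaptiveSampler(int targetPerWindow) {
        this.targetPerWindow = targetPerWindow;
    }

    /** Call once per request; random must be uniform in [0, 1). */
    synchronized boolean shouldSample(double random) {
        seenInWindow++;
        return random < probability;
    }

    /** Convenience overload that draws its own random number. */
    boolean shouldSample() {
        return shouldSample(ThreadLocalRandom.current().nextDouble());
    }

    /** Call at the end of each time window to adapt to the observed traffic. */
    synchronized void endWindow() {
        probability = seenInWindow == 0
                ? 1.0
                : Math.min(1.0, (double) targetPerWindow / seenInWindow);
        seenInWindow = 0;
    }

    synchronized double probability() {
        return probability;
    }

    public static void main(String[] args) {
        AdaptiveSampler sampler = new AdaptiveSampler(10);
        for (int i = 0; i < 1000; i++) {
            sampler.shouldSample();
        }
        sampler.endWindow();
        System.out.println("probability after a busy window: " + sampler.probability());
    }
}
```

A busy service ends up tracing a small fraction of its requests while a quiet one traces nearly all of them, which keeps overhead low without losing visibility into low-traffic services.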
69.
ZIPKIN (SPRING CLOUD SLEUTH)
Architecture diagram (http://zipkin.io/pages/architecture.html):
- Server 1 and Server 2 communicate over HTTP and pass the trace id along with each call
- Instrumented libraries in each server send traces and spans to Zipkin
- Zipkin components: transport, storage, API, user interface
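For orientation, this is roughly what one reported span looks like on the wire. The sketch builds a span in Zipkin's v2 JSON format by hand; in practice the instrumented libraries (for example Spring Cloud Sleuth) create and send spans for you. The class name and all sample values are hypothetical, and the builder assumes its inputs need no JSON escaping.

```java
class ZipkinSpanJson {

    /** Builds one span in Zipkin v2 JSON; timestamp and duration are in microseconds. */
    static String span(String traceId, String spanId, String parentId, String name,
                       String serviceName, long timestampMicros, long durationMicros) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"traceId\":\"").append(traceId).append("\",");
        json.append("\"id\":\"").append(spanId).append("\",");
        if (parentId != null) { // the root span of a trace has no parent
            json.append("\"parentId\":\"").append(parentId).append("\",");
        }
        json.append("\"name\":\"").append(name).append("\",");
        json.append("\"timestamp\":").append(timestampMicros).append(",");
        json.append("\"duration\":").append(durationMicros).append(",");
        json.append("\"localEndpoint\":{\"serviceName\":\"").append(serviceName).append("\"}");
        json.append("}");
        return json.toString();
    }

    public static void main(String[] args) {
        // Hypothetical ids: a root span on server1 and a child span on server2
        // that share the same trace id.
        String root = span("463ac35c9f6413ad", "a2fb4a1d1a96d312", null,
                "get /service", "server1", 1_000_000L, 150_000L);
        String child = span("463ac35c9f6413ad", "b7ad6b7169203331", "a2fb4a1d1a96d312",
                "get /service", "server2", 1_010_000L, 120_000L);
        System.out.println("[" + root + "," + child + "]");
    }
}
```

A JSON array like the one printed here is what a reporter would POST to a Zipkin server's `/api/v2/spans` endpoint; the shared `traceId` is what lets the user interface stitch the spans from both servers into one trace.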