SlideShare a Scribd company logo
1 of 31
Download to read offline
OS-caused Long JVM Pauses
- Deep Dive and Solutions
Zhenyun Zhuang
LinkedIn Corp., Mountain View, California, USA
https://www.linkedin.com/in/zhenyun
Zhenyun@gmail.com
Outline
 Introduction
 Background
 Scenario 1: startup state
 Scenario 2: steady state with memory pressure
 Scenario 3: steady state with heavy IO
 Lessons learned
2
Introduction
 Java + Linux
 Java is popular in production deployments
 Linux features interact with JVM operations
 Unique challenges caused by concurrent applications
 Long JVM pauses caused by Linux OS
 Production issues, in three scenarios
 Root causes
 Solutions
 References
 Ensuring High-performance of Mission-critical Java Applications in Multi-
tenant Cloud Platforms, IEEE Cloud 2014
 Eliminating Large JVM GC Pauses Caused by Background IO Traffic,
LinkedIn Engineering Blog, 2016 (Too many tweets bringing down a twitter server! :)
3
Background
 JVM and Heap
 Oracle HotSpot JVM
 Garbage collection
 Generations
 Garbage collectors
 Linux OS
 Paging (Regular page, Huge page)
 Swapping (Anonymous memory)
 Page cache writeback (Batched, Periodic)
4
Scenarios
 Three scenarios
 Startup state
 Steady state with memory pressure
 Steady state with heavy IO
 Workload
 Java application keeps allocating/de-allocating objects
 Background applications taking memories or issuing disk IO
 Performance metrics
 Application throughput (K allocations/sec)
 Java GC pauses
5
Scenario 1: Startup State (App. Symptoms)
 When Java applications start
 Life is good in the beginning
 Then Java throughput drops sharply
 Java GC pauses spike during the same period
6
Scenario 1: Startup State (Investigations)
 Java heap is gradually allocated
 Without enough memory, direct
page scanning can happen
 Heap is swapped out and in
 It causes large GC
7
Solutions
 Pre-allocating JVM heap spaces
 JVM “-XX:AlwaysPreTouch”
 Protecting JVM heap spaces from being
swapped out
 Swappoff command
 Swappiness
• =0 for kernel version before 2.6.32-303
• =1 for kernel version from 2.6.32-303
 Cgroup
8
Evaluations (Pre-allocating Heap)
9
Evaluations (Protecting Heap)
18 24
10
Scenario 2: Steady State (App. Symptoms)
 During steady state of a Java application, system
memory stresses due to other applications
 Java throughput drops sharply and performs badly
 Java GC pauses spike
11
Scenario 2: Steady State (Level-1 Investigations)
 During GC pauses, swapping activities persist
 Swapping in JVM pages causes GC pauses
 However, swapping is not enough
 Excessive GC pauses (i.e., 55 seconds)
 High sys-cpu usage (swapping is not sys-cpu intensive)
12
[Times: user=0.12 sys=54.67,
real=54.83 secs]
Scenario 2: Steady State (Level-2 Investigations)
 THP (Transparent Huge Pages)
 Improved TLB cache-hits
 Bi-directional operations
 THPs are allocated first, but split during memory pressure
 Regular pages are collapsed to make THPs
 CPU heavy, and thrashing!
4KB
Regular Pages
4KB
4KB
4KB
4KB
4KB
……
……
2MB
Transparent Huge
Pages (THP)
Splitting
Collapsing
13
Solutions
 Dynamically adjusting THP
 Enable THP when no memory pressure
 Disable THP during memory pressure period
 Fine tuning of THP parameters
14
Evaluations (Dynamic THP)
 Without memory pressure
 Dynamic THP delivers similar performance as THP is on
Mechanism THP Off THP On Dynamic THP
Throughput
(K allocations/sec)
12 15 15
Mechanism THP Off THP On Dynamic THP
Throughput
(K allocations/sec)
13 11 12
 With memory pressure
 Dynamic THP has some performance overhead
 Performance is less than THP-off
 But better than THP-on
15
Scenario 3: Steady State (Heavy IO)
 Production issue
 Online products
 Applications have light workload
 Both CMS and G1 garbage collectors
 Preliminary investigations
 Examined many layers/metrics
 The only suspect: disk IO occasionally is heavy
 But all application IO are asynchronous
16
Reproducing the problem
 Workload
 Simplified to avoid complex business logic
 https://github.com/zhenyun/JavaGCworkload
 Background IO
 Saturating HDD
17
Case I: Without background IO
18
No single longer-than-200ms pause
Case II: With background IO
Huge pause!
19
Investigations
20
Time lines
 At time 35.04 (line 2), a young GC starts and takes 0.12 seconds to complete.
 The young GC finishes at time 35.16 and JVM tries to output the young GC statistics
to gc log file by issuing a write() system call (line 4).
 The write() call finishes at time 36.64 after being blocked for 1.47 seconds (line 5)
 When write() call returns to JVM, JVM records at time 36.64 this STW pause of 1.59
seconds (i.e., 0.12 + 1.47) (line 3).
21
Interaction between JVM and OS
22
Non-blocking IO can be blocked
 Stable page write
 For file-backed writing, OS writes to page cache first
 OS has write-back mechanism to persist dirty pages
 If a page is under write-back, the page is locked
 Journal committing
 Journals are generated for journaling file system
 When appending GC log files needs new blocks,
journals need to be committed
 Commitment might need to wait
23
Background IO activities
 OS activity such as swapping
 Data writing to underlying disks
 Administration and housekeeping software
 System-level software such as CFEngine also perform
disk IO
 Other co-located applications
 Co-located applications that share the disk drives,
then other applications contend on IO
 IO of the same JVM instance
 The particular JVM instance may use disk IO in ways
other than GC logging
24
Solutions
 Enhancing JVM
 Another thread
 Exposing JVM flags
 Reducing IO activities
 OS, other apps, same app
 Latency sensitive applications
 Separate disk
 High performing disks such as SSD
 Tmpfs
25
Evaluation
 SSD as the disk
26
The good, the bad, and the ugly
 The good: low real time
 Low user time and low sys time
 [user=0.18 sys=0.01, real=0.04 secs]
 The bad: non-low (but not high) real time
 High user time and low sys time
 [user=8.00 sys=0.02, real=0.50 secs]
 The ugly: high real time
 High sys time [user=0.02 sys=1.20, real=1.20 secs]
 Low sys time, low user time [Example? ]
27
Lessons Learned (I)
 Be cautious about Linux’s (and other OS) new
features
 Constantly incorporating new features to optimize
performance
 Some features incur performance tradeoff
 They may backfire in certain scenarios
28
Lessons Learned (II)
29
 Root causes can come from seemingly
insignificant information
 Linux emits significant amount of performance
information
 Most of us most of the time mostly only examine a
small subset of them
 Don’t ignore others – understand the interactions
of sub-components
Lessons Learned (III)
30
 Pay attention to multi-layer interaction
 Application protocol, JVM, OS, storage/networking
 Most people are familiar with a few layers
 Optimizations done at one layer may adversely
affect other layers
 Many performance problems are caused by the
cross-layer interactions
Zhenyun@gmail.com

More Related Content

What's hot

Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeDataWorks Summit
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...HostedbyConfluent
 
카프카(kafka) 성능 테스트 환경 구축 (JMeter, ELK)
카프카(kafka) 성능 테스트 환경 구축 (JMeter, ELK)카프카(kafka) 성능 테스트 환경 구축 (JMeter, ELK)
카프카(kafka) 성능 테스트 환경 구축 (JMeter, ELK)Hyunmin Lee
 
Thousands of Threads and Blocking I/O
Thousands of Threads and Blocking I/OThousands of Threads and Blocking I/O
Thousands of Threads and Blocking I/OGeorge Cao
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...HostedbyConfluent
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Henning Jacobs
 
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache Flink
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache FlinkFlink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache Flink
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache FlinkFlink Forward
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...Jelastic Multi-Cloud PaaS
 
Scaling ScyllaDB Storage Engine with State-of-Art Compaction
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScaling ScyllaDB Storage Engine with State-of-Art Compaction
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScyllaDB
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaSpark Summit
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 
Troubleshooting Apache® Ignite™
Troubleshooting Apache® Ignite™Troubleshooting Apache® Ignite™
Troubleshooting Apache® Ignite™Tom Diederich
 
Optimizing {Java} Application Performance on Kubernetes
Optimizing {Java} Application Performance on KubernetesOptimizing {Java} Application Performance on Kubernetes
Optimizing {Java} Application Performance on KubernetesDinakar Guniguntala
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
 

What's hot (20)

Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K...
 
카프카(kafka) 성능 테스트 환경 구축 (JMeter, ELK)
카프카(kafka) 성능 테스트 환경 구축 (JMeter, ELK)카프카(kafka) 성능 테스트 환경 구축 (JMeter, ELK)
카프카(kafka) 성능 테스트 환경 구축 (JMeter, ELK)
 
Thousands of Threads and Blocking I/O
Thousands of Threads and Blocking I/OThousands of Threads and Blocking I/O
Thousands of Threads and Blocking I/O
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...
 
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache Flink
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache FlinkFlink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache Flink
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache Flink
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Scaling ScyllaDB Storage Engine with State-of-Art Compaction
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScaling ScyllaDB Storage Engine with State-of-Art Compaction
Scaling ScyllaDB Storage Engine with State-of-Art Compaction
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
Troubleshooting Apache® Ignite™
Troubleshooting Apache® Ignite™Troubleshooting Apache® Ignite™
Troubleshooting Apache® Ignite™
 
Optimizing {Java} Application Performance on Kubernetes
Optimizing {Java} Application Performance on KubernetesOptimizing {Java} Application Performance on Kubernetes
Optimizing {Java} Application Performance on Kubernetes
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 

Similar to OS caused Large JVM pauses: Deep dive and solutions

Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...Zhenyun Zhuang
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time JavaDeniz Oguz
 
Best Practices for performance evaluation and diagnosis of Java Applications ...
Best Practices for performance evaluation and diagnosis of Java Applications ...Best Practices for performance evaluation and diagnosis of Java Applications ...
Best Practices for performance evaluation and diagnosis of Java Applications ...IndicThreads
 
JITServerTalk Nebraska 2023.pdf
JITServerTalk Nebraska 2023.pdfJITServerTalk Nebraska 2023.pdf
JITServerTalk Nebraska 2023.pdfRichHagarty
 
GlassFish Update and Directions - Karim Mazouni - November 2007
GlassFish Update and Directions - Karim Mazouni - November 2007GlassFish Update and Directions - Karim Mazouni - November 2007
GlassFish Update and Directions - Karim Mazouni - November 2007JUG Lausanne
 
JITServerTalk JCON World 2023.pdf
JITServerTalk JCON World 2023.pdfJITServerTalk JCON World 2023.pdf
JITServerTalk JCON World 2023.pdfRichHagarty
 
Java-light-speed NebraskaCode.pdf
Java-light-speed NebraskaCode.pdfJava-light-speed NebraskaCode.pdf
Java-light-speed NebraskaCode.pdfRichHagarty
 
Javaland_JITServerTalk.pptx
Javaland_JITServerTalk.pptxJavaland_JITServerTalk.pptx
Javaland_JITServerTalk.pptxGrace Jansen
 
It's always sunny with OpenJ9
It's always sunny with OpenJ9It's always sunny with OpenJ9
It's always sunny with OpenJ9DanHeidinga
 
Java performance monitoring
Java performance monitoringJava performance monitoring
Java performance monitoringSimon Ritter
 
Inside The Java Virtual Machine
Inside The Java Virtual MachineInside The Java Virtual Machine
Inside The Java Virtual Machineelliando dias
 
DevNexus 2024: Just-In-Time Compilation as a Service for cloud-native Java mi...
DevNexus 2024: Just-In-Time Compilation as a Service for cloud-native Java mi...DevNexus 2024: Just-In-Time Compilation as a Service for cloud-native Java mi...
DevNexus 2024: Just-In-Time Compilation as a Service for cloud-native Java mi...RichHagarty
 
USAA Mono-to-Serverless.pdf
USAA Mono-to-Serverless.pdfUSAA Mono-to-Serverless.pdf
USAA Mono-to-Serverless.pdfRichHagarty
 
JPrime_JITServer.pptx
JPrime_JITServer.pptxJPrime_JITServer.pptx
JPrime_JITServer.pptxGrace Jansen
 
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...Zhenyun Zhuang
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Software Profiling: Understanding Java Performance and how to profile in Java
Software Profiling: Understanding Java Performance and how to profile in JavaSoftware Profiling: Understanding Java Performance and how to profile in Java
Software Profiling: Understanding Java Performance and how to profile in JavaIsuru Perera
 
JITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfJITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfRichHagarty
 

Similar to OS caused Large JVM pauses: Deep dive and solutions (20)

Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
Eliminating OS-caused Large JVM Pauses for Latency-sensitive Java-based Cloud...
 
Introduction to Real Time Java
Introduction to Real Time JavaIntroduction to Real Time Java
Introduction to Real Time Java
 
Best Practices for performance evaluation and diagnosis of Java Applications ...
Best Practices for performance evaluation and diagnosis of Java Applications ...Best Practices for performance evaluation and diagnosis of Java Applications ...
Best Practices for performance evaluation and diagnosis of Java Applications ...
 
JITServerTalk Nebraska 2023.pdf
JITServerTalk Nebraska 2023.pdfJITServerTalk Nebraska 2023.pdf
JITServerTalk Nebraska 2023.pdf
 
GlassFish Update and Directions - Karim Mazouni - November 2007
GlassFish Update and Directions - Karim Mazouni - November 2007GlassFish Update and Directions - Karim Mazouni - November 2007
GlassFish Update and Directions - Karim Mazouni - November 2007
 
JITServerTalk JCON World 2023.pdf
JITServerTalk JCON World 2023.pdfJITServerTalk JCON World 2023.pdf
JITServerTalk JCON World 2023.pdf
 
Java-light-speed NebraskaCode.pdf
Java-light-speed NebraskaCode.pdfJava-light-speed NebraskaCode.pdf
Java-light-speed NebraskaCode.pdf
 
Javaland_JITServerTalk.pptx
Javaland_JITServerTalk.pptxJavaland_JITServerTalk.pptx
Javaland_JITServerTalk.pptx
 
It's always sunny with OpenJ9
It's always sunny with OpenJ9It's always sunny with OpenJ9
It's always sunny with OpenJ9
 
Java performance monitoring
Java performance monitoringJava performance monitoring
Java performance monitoring
 
Inside The Java Virtual Machine
Inside The Java Virtual MachineInside The Java Virtual Machine
Inside The Java Virtual Machine
 
DevNexus 2024: Just-In-Time Compilation as a Service for cloud-native Java mi...
DevNexus 2024: Just-In-Time Compilation as a Service for cloud-native Java mi...DevNexus 2024: Just-In-Time Compilation as a Service for cloud-native Java mi...
DevNexus 2024: Just-In-Time Compilation as a Service for cloud-native Java mi...
 
USAA Mono-to-Serverless.pdf
USAA Mono-to-Serverless.pdfUSAA Mono-to-Serverless.pdf
USAA Mono-to-Serverless.pdf
 
JPrime_JITServer.pptx
JPrime_JITServer.pptxJPrime_JITServer.pptx
JPrime_JITServer.pptx
 
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
Ensuring High-performance of Mission-critical Java Applications in Multi-tena...
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Java one 2015 - v1
Java one   2015 - v1Java one   2015 - v1
Java one 2015 - v1
 
Software Profiling: Understanding Java Performance and how to profile in Java
Software Profiling: Understanding Java Performance and how to profile in JavaSoftware Profiling: Understanding Java Performance and how to profile in Java
Software Profiling: Understanding Java Performance and how to profile in Java
 
JITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfJITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdf
 
Inside the JVM
Inside the JVMInside the JVM
Inside the JVM
 

More from Zhenyun Zhuang

Designing SSD-friendly Applications for Better Application Performance and Hi...
Designing SSD-friendly Applications for Better Application Performance and Hi...Designing SSD-friendly Applications for Better Application Performance and Hi...
Designing SSD-friendly Applications for Better Application Performance and Hi...Zhenyun Zhuang
 
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...Zhenyun Zhuang
 
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...Zhenyun Zhuang
 
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...Zhenyun Zhuang
 
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...Zhenyun Zhuang
 
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified WormsPAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified WormsZhenyun Zhuang
 
On the Impact of Mobile Hosts in Peer-to-Peer Data Networks
On the Impact of Mobile Hosts in Peer-to-Peer Data NetworksOn the Impact of Mobile Hosts in Peer-to-Peer Data Networks
On the Impact of Mobile Hosts in Peer-to-Peer Data NetworksZhenyun Zhuang
 
WebAccel: Accelerating Web access for low-bandwidth hosts
WebAccel: Accelerating Web access for low-bandwidth hostsWebAccel: Accelerating Web access for low-bandwidth hosts
WebAccel: Accelerating Web access for low-bandwidth hostsZhenyun Zhuang
 
Client-side web acceleration for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hostsClient-side web acceleration for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hostsZhenyun Zhuang
 
A3: application-aware acceleration for wireless data networks
A3: application-aware acceleration for wireless data networksA3: application-aware acceleration for wireless data networks
A3: application-aware acceleration for wireless data networksZhenyun Zhuang
 
Mutual Exclusion in Wireless Sensor and Actor Networks
Mutual Exclusion in Wireless Sensor and Actor NetworksMutual Exclusion in Wireless Sensor and Actor Networks
Mutual Exclusion in Wireless Sensor and Actor NetworksZhenyun Zhuang
 
Hazard avoidance in wireless sensor and actor networks
Hazard avoidance in wireless sensor and actor networksHazard avoidance in wireless sensor and actor networks
Hazard avoidance in wireless sensor and actor networksZhenyun Zhuang
 
Dynamic Layer Management in Super-Peer Architectures
Dynamic Layer Management in Super-Peer ArchitecturesDynamic Layer Management in Super-Peer Architectures
Dynamic Layer Management in Super-Peer ArchitecturesZhenyun Zhuang
 
A Distributed Approach to Solving Overlay Mismatching Problem
A Distributed Approach to Solving Overlay Mismatching ProblemA Distributed Approach to Solving Overlay Mismatching Problem
A Distributed Approach to Solving Overlay Mismatching ProblemZhenyun Zhuang
 
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Hybrid Periodical Flooding in Unstructured Peer-to-Peer NetworksHybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Hybrid Periodical Flooding in Unstructured Peer-to-Peer NetworksZhenyun Zhuang
 
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systemsAOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systemsZhenyun Zhuang
 
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...Zhenyun Zhuang
 
Enhancing Intrusion Detection System with Proximity Information
Enhancing Intrusion Detection System with Proximity InformationEnhancing Intrusion Detection System with Proximity Information
Enhancing Intrusion Detection System with Proximity InformationZhenyun Zhuang
 
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...Zhenyun Zhuang
 
SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
SLA-aware Dynamic CPU Scaling in Business Cloud Computing EnvironmentsSLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
SLA-aware Dynamic CPU Scaling in Business Cloud Computing EnvironmentsZhenyun Zhuang
 

More from Zhenyun Zhuang (20)

Designing SSD-friendly Applications for Better Application Performance and Hi...
Designing SSD-friendly Applications for Better Application Performance and Hi...Designing SSD-friendly Applications for Better Application Performance and Hi...
Designing SSD-friendly Applications for Better Application Performance and Hi...
 
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
 
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
OCPA: An Algorithm for Fast and Effective Virtual Machine Placement and Assig...
 
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
 
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
 
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified WormsPAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
PAIDS: A Proximity-Assisted Intrusion Detection System for Unidentified Worms
 
On the Impact of Mobile Hosts in Peer-to-Peer Data Networks
On the Impact of Mobile Hosts in Peer-to-Peer Data NetworksOn the Impact of Mobile Hosts in Peer-to-Peer Data Networks
On the Impact of Mobile Hosts in Peer-to-Peer Data Networks
 
WebAccel: Accelerating Web access for low-bandwidth hosts
WebAccel: Accelerating Web access for low-bandwidth hostsWebAccel: Accelerating Web access for low-bandwidth hosts
WebAccel: Accelerating Web access for low-bandwidth hosts
 
Client-side web acceleration for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hostsClient-side web acceleration for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hosts
 
A3: application-aware acceleration for wireless data networks
A3: application-aware acceleration for wireless data networksA3: application-aware acceleration for wireless data networks
A3: application-aware acceleration for wireless data networks
 
Mutual Exclusion in Wireless Sensor and Actor Networks
Mutual Exclusion in Wireless Sensor and Actor NetworksMutual Exclusion in Wireless Sensor and Actor Networks
Mutual Exclusion in Wireless Sensor and Actor Networks
 
Hazard avoidance in wireless sensor and actor networks
Hazard avoidance in wireless sensor and actor networksHazard avoidance in wireless sensor and actor networks
Hazard avoidance in wireless sensor and actor networks
 
Dynamic Layer Management in Super-Peer Architectures
Dynamic Layer Management in Super-Peer ArchitecturesDynamic Layer Management in Super-Peer Architectures
Dynamic Layer Management in Super-Peer Architectures
 
A Distributed Approach to Solving Overlay Mismatching Problem
A Distributed Approach to Solving Overlay Mismatching ProblemA Distributed Approach to Solving Overlay Mismatching Problem
A Distributed Approach to Solving Overlay Mismatching Problem
 
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Hybrid Periodical Flooding in Unstructured Peer-to-Peer NetworksHybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
 
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systemsAOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
 
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
Mobile Hosts Participating in Peer-to-Peer Data Networks: Challenges and Solu...
 
Enhancing Intrusion Detection System with Proximity Information
Enhancing Intrusion Detection System with Proximity InformationEnhancing Intrusion Detection System with Proximity Information
Enhancing Intrusion Detection System with Proximity Information
 
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
Guarding Fast Data Delivery in Cloud: an Effective Approach to Isolating Perf...
 
SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
SLA-aware Dynamic CPU Scaling in Business Cloud Computing EnvironmentsSLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
 

Recently uploaded

Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxchumtiyababu
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 

Recently uploaded (20)

Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 

OS caused Large JVM pauses: Deep dive and solutions

  • 1. OS-caused Long JVM Pauses - Deep Dive and Solutions Zhenyun Zhuang LinkedIn Corp., Mountain View, California, USA https://www.linkedin.com/in/zhenyun Zhenyun@gmail.com
  • 2. Outline  Introduction  Background  Scenario 1: startup state  Scenario 2: steady state with memory pressure  Scenario 3: steady state with heavy IO  Lessons learned 2
  • 3. Introduction  Java + Linux  Java is popular in production deployments  Linux features interact with JVM operations  Unique challenges caused by concurrent applications  Long JVM pauses caused by Linux OS  Production issues, in three scenarios  Root causes  Solutions  References  Ensuring High-performance of Mission-critical Java Applications in Multi- tenant Cloud Platforms, IEEE Cloud 2014  Eliminating Large JVM GC Pauses Caused by Background IO Traffic, LinkedIn Engineering Blog, 2016 (Too many tweets bringing down a twitter server! :) 3
  • 4. Background  JVM and Heap  Oracle HotSpot JVM  Garbage collection  Generations  Garbage collectors  Linux OS  Paging (Regular page, Huge page)  Swapping (Anonymous memory)  Page cache writeback (Batched, Periodic) 4
  • 5. Scenarios  Three scenarios  Startup state  Steady state with memory pressure  Steady state with heavy IO  Workload  Java application keeps allocating/de-allocating objects  Background applications taking memories or issuing disk IO  Performance metrics  Application throughput (K allocations/sec)  Java GC pauses 5
  • 6. Scenario 1: Startup State (App. Symptoms)  When Java applications start  Life is good in the beginning  Then Java throughput drops sharply  Java GC pauses spike during the same period 6
  • 7. Scenario 1: Startup State (Investigations)  Java heap is gradually allocated  Without enough memory, direct page scanning can happen  Heap is swapped out and in  It causes large GC 7
  • 8. Solutions  Pre-allocating JVM heap spaces  JVM “-XX:AlwaysPreTouch”  Protecting JVM heap spaces from being swapped out  Swappoff command  Swappiness • =0 for kernel version before 2.6.32-303 • =1 for kernel version from 2.6.32-303  Cgroup 8
  • 11. Scenario 2: Steady State (App. Symptoms)  During steady state of a Java application, system memory stresses due to other applications  Java throughput drops sharply and performs badly  Java GC pauses spike 11
  • 12. Scenario 2: Steady State (Level-1 Investigations)  During GC pauses, swapping activities persist  Swapping in JVM pages causes GC pauses  However, swapping is not enough  Excessive GC pauses (i.e., 55 seconds)  High sys-cpu usage (swapping is not sys-cpu intensive) 12 [Times: user=0.12 sys=54.67, real=54.83 secs]
  • 13. Scenario 2: Steady State (Level-2 Investigations)  THP (Transparent Huge Pages)  Improved TLB cache-hits  Bi-directional operations  THPs are allocated first, but split during memory pressure  Regular pages are collapsed to make THPs  CPU heavy, and thrashing! 4KB Regular Pages 4KB 4KB 4KB 4KB 4KB …… …… 2MB Transparent Huge Pages (THP) Splitting Collapsing 13
  • 14. Solutions  Dynamically adjusting THP  Enable THP when no memory pressure  Disable THP during memory pressure period  Fine tuning of THP parameters 14
  • 15. Evaluations (Dynamic THP)  Without memory pressure  Dynamic THP delivers similar performance as THP is on Mechanism THP Off THP On Dynamic THP Throughput (K allocations/sec) 12 15 15 Mechanism THP Off THP On Dynamic THP Throughput (K allocations/sec) 13 11 12  With memory pressure  Dynamic THP has some performance overhead  Performance is less than THP-off  But better than THP-on 15
  • 16. Scenario 3: Steady State (Heavy IO)  Production issue  Online products  Applications have light workload  Both CMS and G1 garbage collectors  Preliminary investigations  Examined many layers/metrics  The only suspect: disk IO occasionally is heavy  But all application IO are asynchronous 16
  • 17. Reproducing the problem  Workload  Simplified to avoid complex business logic  https://github.com/zhenyun/JavaGCworkload  Background IO  Saturating HDD 17
  • 18. Case I: Without background IO 18 No single longer-than-200ms pause
  • 19. Case II: With background IO Huge pause! 19
  • 21. Time lines  At time 35.04 (line 2), a young GC starts and takes 0.12 seconds to complete.  The young GC finishes at time 35.16 and JVM tries to output the young GC statistics to gc log file by issuing a write() system call (line 4).  The write() call finishes at time 36.64 after being blocked for 1.47 seconds (line 5)  When write() call returns to JVM, JVM records at time 36.64 this STW pause of 1.59 seconds (i.e., 0.12 + 1.47) (line 3). 21
  • 23. Non-blocking IO can be blocked  Stable page write  For file-backed writing, OS writes to page cache first  OS has write-back mechanism to persist dirty pages  If a page is under write-back, the page is locked  Journal committing  Journals are generated for journaling file system  When appending GC log files needs new blocks, journals need to be committed  Commitment might need to wait 23
  • 24. Background IO activities  OS activity such as swapping  Data writing to underlying disks  Administration and housekeeping software  System-level software such as CFEngine also perform disk IO  Other co-located applications  Co-located applications that share the disk drives, then other applications contend on IO  IO of the same JVM instance  The particular JVM instance may use disk IO in ways other than GC logging 24
  • 25. Solutions  Enhancing JVM  Another thread  Exposing JVM flags  Reducing IO activities  OS, other apps, same app  Latency sensitive applications  Separate disk  High performing disks such as SSD  Tmpfs 25
  • 26. Evaluation  SSD as the disk 26
  • 27. The good, the bad, and the ugly  The good: low real time  Low user time and low sys time  [user=0.18 sys=0.01, real=0.04 secs]  The bad: non-low (but not high) real time  High user time and low sys time  [user=8.00 sys=0.02, real=0.50 secs]  The ugly: high real time  High sys time [user=0.02 sys=1.20, real=1.20 secs]  Low sys time, low user time [Example? ] 27
  • 28. Lessons Learned (I)  Be cautious about Linux’s (and other OS) new features  Constantly incorporating new features to optimize performance  Some features incur performance tradeoff  They may backfire in certain scenarios 28
  • 29. Lessons Learned (II) 29  Root causes can come from seemingly insignificant information  Linux emits significant amount of performance information  Most of us most of the time mostly only examine a small subset of them  Don’t ignore others – understand the interactions of sub-components
  • 30. Lessons Learned (III) 30  Pay attention to multi-layer interaction  Application protocol, JVM, OS, storage/networking  Most people are familiar with a few layers  Optimizations done at one layer may adversely affect other layers  Many performance problems are caused by the cross-layer interactions