Kris Mok, Software Engineer, Taobao
@rednaxelafx
莫枢 /“撒迦”
JVM @ Taobao
Agenda
• JVM @ Taobao
  – Customization
  – Tuning
  – Open Source
  – Training
INTRODUCTION
Java Strengths
•   Good abstraction
•   Good performance
•   Good tooling (IDE, profiler, etc.)
•   Easy to recruit good programmers
Java Weaknesses
• Tension between “abstraction leak” and
  performance
  – Abstraction and performance don’t always
    come together
• More control/info over GC and object
  overhead wanted sometimes
Our Team
• Domain-Specific Computing Team
  – performance- and efficiency-oriented
  – specific solutions to specific problems
  – do the low-level plumbing for specific
    applications targeting specific hardware
  – we’re hiring!
    • software and hardware hackers
Our Team (cont.)
• Current Focus
  – JVM-level customization/tuning
    • long term project
    • based on HotSpot Express 20 from OpenJDK
    • serving:
       – 10,000+ JVM instances serving online
       – 1,000+ Java developers
  – Hadoop tuning
  – Dedicated accelerator card adoption
JVM CUSTOMIZATION
@ TAOBAO
Themes
• Performance
• Monitoring/Diagnostics
• Stability
Tradeoffs
• Would like to make as little impact on
  existing Java application code as possible
• But if the performance/efficiency gains are
  significant enough, we’re willing to make
  extensions to the VM/core libs
JVM Customizations
•   GC Invisible Heap (GCIH)
•   JNI Wrapper improvement
•   New instructions
•   PrintGCReason / CMS bug fix
•   ArrayAllocationWarningSize
•   Change VM argument defaults
•   etc.
Case 1: in-memory cache
• Certain data is computed offline and then
  fed to online systems in a read-only,
  “cache” fashion
in-memory cache
• Fastest way to access them is to
  – put them in-process, in-memory,
  – access as normal Java objects,
  – no serialization/JNI involved per access
in-memory cache
• Large, static, long-lived data in the GC heap
  – may lead to long GC pauses at full GC,
  – or long overall concurrent GC cycle
• What if we take them out of the GC heap?
  – but without having to serialize them?
GC Invisible Heap
• “GC Invisible Heap” (GCIH)
  – an extension to HotSpot VM
  – an in-process, in-memory heap space
  – not managed by the GC
  – stores normal Java objects
• Currently works with ParNew+CMS
GCIH interface
• “moveIn(Object root)”
  – given the root of an object graph, move the
    whole graph out of GC heap and into GCIH
• “moveOut()”
  – GCIH space reset to a clean state
  – abandon all data in current GCIH space
  – (earlier version) move the object graph back
    into GC heap
GCIH interface (cont.)
• Current restrictions
  – data in GCIH should be read-only
  – objects in GCIH may not be used as monitors
  – no outgoing references allowed
• Restrictions may be relaxed in the future
GCIH interface (cont.)
• To update data
  – moveOut – (update) - moveIn
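From application code, the update cycle above might look roughly like the following. This is a minimal sketch under loud assumptions: the `GcInvisibleHeap` interface, its generic return type, and the `FakeGcih` stand-in are hypothetical, since the slides give only the names `moveIn(Object root)` and `moveOut()`. The fake keeps everything on the normal heap and only mimics the call order.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class GcihCacheSketch {
    /** Hypothetical view of the GCIH API; only the two method names come from the slides. */
    interface GcInvisibleHeap {
        /** Assumed to return the root of the relocated, read-only graph. */
        <T> T moveIn(T root);

        /** Resets the GCIH space; everything moved in earlier is abandoned. */
        void moveOut();
    }

    /** Stand-in so the sketch runs on a stock JVM: it relocates nothing. */
    static final class FakeGcih implements GcInvisibleHeap {
        private Object current; // root currently "inside" the fake GCIH
        int resets;             // number of moveOut() calls seen

        @Override
        public <T> T moveIn(T root) {
            if (current != null) {
                throw new IllegalStateException("moveOut() must come before the next moveIn()");
            }
            current = root;
            return root;
        }

        @Override
        public void moveOut() {
            current = null;
            resets++;
        }
    }

    /** Read-only cache that follows the moveOut, update, moveIn cycle. */
    static final class Cache {
        private final GcInvisibleHeap gcih;
        private volatile Map<String, String> data = Collections.emptyMap();

        Cache(GcInvisibleHeap gcih) {
            this.gcih = gcih;
        }

        void reload(Map<String, String> fresh) {
            data = Collections.emptyMap(); // stop handing out references into the old space
            gcih.moveOut();                // abandon the old graph
            // Build the new graph on the normal heap, then hand over its root.
            Map<String, String> snapshot = Collections.unmodifiableMap(new HashMap<>(fresh));
            data = gcih.moveIn(snapshot);
        }

        String get(String key) {
            return data.get(key);
        }
    }

    public static void main(String[] args) {
        Cache cache = new Cache(new FakeGcih());
        cache.reload(Collections.singletonMap("item-1", "v1"));
        cache.reload(Collections.singletonMap("item-1", "v2"));
        System.out.println(cache.get("item-1")); // prints v2
    }
}
```

Readers see an empty cache for the duration of a reload in this sketch; whether that is acceptable depends on the application.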
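Heap layout: original
• Perm (-XX:PermSize, -XX:MaxPermSize)
• Young (-Xmn) and Old, together sized by -Xms/-Xmx
  – cache data sits in the Old generation
• Everything is in the GC managed heap

Heap layout: using GCIH
• Perm (-XX:PermSize, -XX:MaxPermSize)
• Young (-Xmn) and Old, together sized by -Xms/-Xmx
  – these remain the GC managed heap
• Cache data moves into the GCIH, sized by -XX:GCIHSize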
Actual performance
• Reduces stop-the-world full GC pause
  time
• Reduces concurrent-mark and concurrent-
  sweep time
  – but the two stop-the-world phases of CMS
    aren’t necessarily significantly faster
Total time of CMS GC phases (sec)

Phase                 Original    w/GCIH
1 initial-mark        0.0072      0.0043
2 concurrent-mark     1.7943      0.5400
3 preclean            0.0373      0.0159
4 remark              0.0118      0.0035
5 concurrent-sweep    1.5717      0.6266
6 reset               0.0263      0.0240
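To put the chart numbers in relative terms, here is a small worked calculation. The figures are copied from the chart; the mapping of chart positions 1 to 6 onto phase names follows the usual CMS phase order and is an assumption about how the chart was labelled.

```java
public class CmsPhaseSavings {
    static final String[] PHASES = {
        "initial-mark", "concurrent-mark", "preclean", "remark", "concurrent-sweep", "reset"
    };
    // Seconds per phase, as read from the chart.
    static final double[] ORIGINAL = {0.0072, 1.7943, 0.0373, 0.0118, 1.5717, 0.0263};
    static final double[] WITH_GCIH = {0.0043, 0.5400, 0.0159, 0.0035, 0.6266, 0.0240};

    /** Relative time saved, in percent of the original time. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < PHASES.length; i++) {
            System.out.printf("%-17s %.4f -> %.4f  (%.1f%% less)%n",
                PHASES[i], ORIGINAL[i], WITH_GCIH[i],
                reductionPercent(ORIGINAL[i], WITH_GCIH[i]));
        }
        System.out.printf("whole cycle       %.4f -> %.4f%n", sum(ORIGINAL), sum(WITH_GCIH));
    }
}
```

The two concurrent phases account for almost all of the absolute saving; the two stop-the-world phases (initial-mark and remark) were only milliseconds long to begin with.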
Alternatives

                                   GCIH                  BigMemory
runs on standard JVM               × (JVM extension)     √
in-process, in-memory              √                     √
not under GC control               √                     √
direct access of Java objects      √                     × (serialize/deserialize)
no JNI overhead on access          √                     ×
object graph in better locality    √                     N/A
GCIH future
• still in early stage of development now
• may try to make the API surface more like
  RTSJ
Experimental: object data sharing
• Sharing of GCIH between JVMs on the
  same box
• Real-world application:
  – A special kind of Map/Reduce job uses a big
    piece of precomputed cache data
  – Multiple homogeneous jobs run on the same
    machine, using the same cache data
  – can save memory to run more jobs on a
    machine, when CPU isn’t the bottleneck
Before sharing
• Each of JVM1 … JVMn holds its own copy of the sharable objects
• Each JVM also holds its own other objects
After sharing
• JVM1 … JVMn all refer to a single copy of the sharable objects
• Each JVM still holds its own other objects
Case 2: JNI overhead
• JNI carries a lot of overhead at invocation
  boundaries
• JNI invocations involve calling JNI native
  wrappers in the VM
JNI wrapper
• Wrappers are in hand-written assembler
• But not necessarily always well-tuned
• Look for opportunities to optimize for
  common cases
Wrapper example
...
0x00002aaaab19be92:   cmpl     $0x0,0x30(%r15) // check the suspend flag
0x00002aaaab19be9a:   je       0x2aaaab19bec6
0x00002aaaab19bea0:   mov      %rax,-0x8(%rbp)
0x00002aaaab19bea4:   mov      %r15,%rdi
0x00002aaaab19bea7:   mov      %rsp,%r12
0x00002aaaab19beaa:   sub      $0x0,%rsp
0x00002aaaab19beae:   and      $0xfffffffffffffff0,%rsp
0x00002aaaab19beb2:   mov      $0x2b7d73bcbda0,%r10
0x00002aaaab19bebc:   rex.WB   callq *%r10
0x00002aaaab19bebf:   mov      %r12,%rsp
0x00002aaaab19bec2:   mov      -0x8(%rbp),%rax
0x00002aaaab19bec6:   movl     $0x8,0x238(%r15) // change thread state to thread in java
... //continue
Wrapper example (cont.)
• The common case
  – Threads are unlikely to be suspended
    when running through this wrapper
• Optimize for the common case
  – move the logic that handles suspended state
    out-of-line
Modified wrapper example
...
0x00002aaaab19be3a:   cmpl     $0x0,0x30(%r15) // check the suspend flag
0x00002aaaab19be42:   jne      0x2aaaab19bf52
0x00002aaaab19be48:   movl     $0x8,0x238(%r15) // change thread state to thread in java

... //continue

0x00002aaaab19bf52:   mov      %rax,-0x8(%rbp)
0x00002aaaab19bf56:   mov      %r15,%rdi
0x00002aaaab19bf59:   mov      %rsp,%r12
0x00002aaaab19bf5c:   sub      $0x0,%rsp
0x00002aaaab19bf60:   and      $0xfffffffffffffff0,%rsp
0x00002aaaab19bf64:   mov      $0x2ae3772aae70,%r10
0x00002aaaab19bf6e:   rex.WB   callq *%r10
0x00002aaaab19bf71:   mov      %r12,%rsp
0x00002aaaab19bf74:   mov      -0x8(%rbp),%rax
0x00002aaaab19bf78:   jmpq     0x2aaaab19be48
...
Performance
• 5%-10% improvement of raw JNI
  invocation performance on various
  microarchitectures
Case 3: new instructions
• SSE 4.2 brings new instructions
  – e.g. CRC32c
• We’re using Westmere now
• Should take advantage of SSE 4.2
CRC32 / CRC32C
• CRC32
 – well known, commonly used checksum
 – used in HDFS
 – JDK’s impl uses zlib, through JNI
• CRC32c
 – a variant of CRC32
 – hardware support by SSE 4.2
Intrinsify CRC32c
• Add new intrinsic methods to directly
  support CRC32c instruction in HotSpot VM
• Hardware accelerated
• To be used in modified HDFS
• Completely avoids JNI overhead
  – HADOOP-7446 still carries JNI overhead
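For reference, this is what the checksum itself computes. The sketch below is a bit-at-a-time software version of CRC32c (Castagnoli polynomial, reflected form 0x82F63B78), the same polynomial the SSE 4.2 `crc32` instruction uses. It is not the intrinsic from the slides and makes no attempt at speed; it only shows that CRC32c and the zlib-backed `java.util.zip.CRC32` give different results for the same input.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32cSketch {
    // Castagnoli polynomial in reflected (LSB-first) form.
    private static final int CASTAGNOLI_REFLECTED = 0x82F63B78;

    /** Plain software CRC32c, one bit at a time. */
    static int crc32c(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int bit = 0; bit < 8; bit++) {
                int mask = -(crc & 1); // all ones if the low bit is set, else zero
                crc = (crc >>> 1) ^ (CASTAGNOLI_REFLECTED & mask);
            }
        }
        return ~crc;
    }

    /** The JDK's classic CRC32, for comparison. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32c: %08X%n", crc32c(check)); // prints CRC32c: E3069283
        System.out.printf("CRC32:  %08X%n", crc32(check));  // prints CRC32:  CBF43926
    }
}
```

Because the two checksums differ, data written with one cannot be verified with the other, which is why the slides speak of a modified HDFS.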




Other intrinsics
• May intrinsify other operations in the future
  – AES-NI
  – others on applications’ demand
Case 4: frequent CMS GC
• An app experienced back-to-back CMS
  GC cycles after running for a few days
• The Java heaps were far from full
• What’s going on?
The GC Log
2011-06-30T19:40:03.487+0800: 26.958: [GC 26.958: [ParNew: 1747712K->40832K(1922432K), 0.0887510 secs] 1747712K->40832K(4019584K), 0.0888740 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
2011-06-30T19:41:20.301+0800: 103.771: [GC 103.771: [ParNew: 1788544K->109881K(1922432K), 0.0910540 secs] 1788544K->109881K(4019584K), 0.0911960 secs] [Times: user=0.24 sys=0.07, real=0.09 secs]
2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: 0K(2097152K)] 998393K(4019584K), 0.4745760 secs] [Times: user=0.47 sys=0.00, real=0.46 secs]
2011-06-30T19:42:05.416+0800: 148.886: [CMS-concurrent-mark-start]
GC log visualized




     The tool used here is GCHisto from Tony Printezis
Need more info
• -XX:+PrintGCReason to the rescue
  – added this new flag to the VM
  – print the direct cause of a GC cycle
The GC Log
2011-06-30T19:40:03.487+0800: 26.958: [GC 26.958: [ParNew: 1747712K->40832K(1922432K), 0.0887510 secs] 1747712K->40832K(4019584K), 0.0888740 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
2011-06-30T19:41:20.301+0800: 103.771: [GC 103.771: [ParNew: 1788544K->109881K(1922432K), 0.0910540 secs] 1788544K->109881K(4019584K), 0.0911960 secs] [Times: user=0.24 sys=0.07, real=0.09 secs]
 CMS Perm: collect because of occupancy 0.920845 / 0.920000
CMS perm gen initiated
2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: 0K(2097152K)] 998393K(4019584K), 0.4745760 secs] [Times: user=0.47 sys=0.00, real=0.46 secs]
2011-06-30T19:42:05.416+0800: 148.886: [CMS-concurrent-mark-start]
• Relevant VM arguments
  – -XX:PermSize=96m -XX:MaxPermSize=256m
• The problem was caused by bad
  interaction between CMS GC triggering
  and PermGen expansion
  – Thanks, Ramki!
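A rough calculation shows why the heap could be far from full while CMS kept firing. It rests on two assumptions the slides do not state: that the logged occupancy is used space divided by the currently committed perm gen size, and that the perm gen was still at its initial 96m when the line was printed.

```java
public class PermOccupancySketch {
    static final double INITIATING_FRACTION = 0.92; // threshold printed in the log line
    static final double PERM_SIZE_MB = 96.0;        // -XX:PermSize=96m
    static final double MAX_PERM_SIZE_MB = 256.0;   // -XX:MaxPermSize=256m

    /** Used space implied by an occupancy ratio (assumption: ratio is used / committed). */
    static double usedMb(double occupancy, double committedMb) {
        return occupancy * committedMb;
    }

    /** The trigger condition as it reads in the log: occupancy above the threshold. */
    static boolean startsCmsCycle(double occupancy) {
        return occupancy > INITIATING_FRACTION;
    }

    static double shareOfMax(double usedMb) {
        return usedMb / MAX_PERM_SIZE_MB;
    }

    public static void main(String[] args) {
        double occupancy = 0.920845; // from "collect because of occupancy 0.920845 / 0.920000"
        double used = usedMb(occupancy, PERM_SIZE_MB);
        System.out.printf("perm gen used:       about %.1f MB%n", used);
        System.out.printf("triggers CMS cycle:  %b%n", startsCmsCycle(occupancy));
        System.out.printf("share of MaxPermSize: %.1f%%%n", shareOfMax(used) * 100.0);
    }
}
```

Under those assumptions roughly 88 MB of perm gen, about a third of what it is allowed to grow to, is enough to start a CMS cycle.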
• The (partial) fix
// Support for concurrent collection policy decisions.
bool CompactibleFreeListSpace::should_concurrent_collect() const {
  // In the future we might want to add in fragmentation stats --
  // including erosion of the "mountain" into this decision as well.
  // Before the fix:
  //   return !adaptive_freelists() && linearAllocationWouldFail();
  return false;
}
After the change
Case 5: huge objects
• An app bug allocated a huge
  object, causing unexpected OOM
• Where did it come from?
huge objects and arrays
• Most Java objects are small
• Huge objects usually happen to be arrays
• A lot of collection objects use arrays as
  backing storage
  – ArrayLists, HashMaps, etc.
• Tracking huge array allocation can help
  locate huge allocation problems
product(intx, ArrayAllocationWarningSize, 512*M,
        "array allocation with size larger than "
        "this (bytes) will be given a warning "
        "into the GC log")
Demo
import java.util.ArrayList;

public class Demo {
  private static void foo() {
    new ArrayList<Object>(128 * 1024 * 1024);
  }

  public static void main(String[] args) {
    foo();
  }
}
Demo

$ java Demo
==WARNNING== allocating large array: thread_id[0x0000000059374800], thread_name[main], array_size[536870928 bytes], array_length[134217728 elememts]
        at java.util.ArrayList.<init>(ArrayList.java:112)
        at Demo.foo(Demo.java:5)
        at Demo.main(Demo.java:9)
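The byte count in the warning can be reproduced by hand. The sketch below assumes 4 bytes per reference (compressed oops) and a 16-byte array header; neither is stated in the slides, but together they match the logged 536870928 bytes exactly.

```java
public class ArrayWarningArithmetic {
    static final long M = 1024L * 1024L;
    static final long WARNING_SIZE = 512 * M; // default of ArrayAllocationWarningSize

    /** Size of an array in bytes for a given element width and header size. */
    static long arrayBytes(long length, int bytesPerElement, int headerBytes) {
        return headerBytes + length * bytesPerElement;
    }

    static boolean warns(long bytes) {
        return bytes > WARNING_SIZE; // "larger than this (bytes)"
    }

    public static void main(String[] args) {
        long length = 128L * 1024 * 1024;          // capacity passed to ArrayList in Demo
        long bytes = arrayBytes(length, 4, 16);    // assumed: 4-byte refs, 16-byte header
        System.out.println("array_length = " + length);       // prints array_length = 134217728
        System.out.println("array_size   = " + bytes);        // prints array_size   = 536870928
        System.out.println("warning      = " + warns(bytes)); // prints warning      = true
    }
}
```

The backing array is only 16 bytes over the 512 MB default, so the demo sits just past the threshold.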
Case 6: bad optimizations?
• Some loop optimization bugs were found
  before launch of Oracle JDK 7
• Actually, they exist in recent JDK 6, too
  – some of the fixes weren’t in until JDK6u29
  – can’t wait until an official update with the fixes
  – roll our own workaround
Workarounds
• Explicitly set -XX:-UseLoopPredicate
  when using recent JDK 6
• Or …
Workarounds (cont.)
• Change the defaults of the opt flags to turn
  them off

product(bool, UseLoopPredicate, false,  /* default changed from true */
  "Generate a predicate to select fast/slow loop versions")
A Case Study

JVM TUNING
@ TAOBAO
JVM Tuning
• Most JVM tuning efforts are spent on
  memory related issues
  – we do too
  – lots of reading material available
• Let’s look at something else
  – use JVM internal knowledge to guide tuning
Case: Velocity template compilation
• An internal project seeks to compile
  Velocity templates into Java bytecodes
Compilation process
• Parse *.vm source into AST
  – reuse original parser and AST from Velocity
• Traverse the AST and generate Java
  source code as target
  – works like macro expansion
• Use Java Compiler API to generate
  bytecodes
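The last step, turning generated source into bytecodes, can be sketched with the standard `javax.tools` Java Compiler API. Everything around it is a stand-in: `generateSource` hard-codes the output for one tiny template instead of walking a Velocity AST, the context is a plain `Map`, and the names `GeneratedTemplate` and `render` are made up for this example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class TemplateCompileSketch {
    /** Source file that lives in a String instead of on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Stand-in for the AST walk: Java source for the fixed template "Check $dev out!". */
    static String generateSource(String className) {
        return "public class " + className + " {\n"
             + "  public static void render(java.io.Writer _writer,\n"
             + "      java.util.Map<String, Object> _context) throws java.io.IOException {\n"
             + "    _writer.write(\"Check \");\n"
             + "    _writer.write(String.valueOf(_context.get(\"dev\")));\n"
             + "    _writer.write(\" out!\");\n"
             + "  }\n"
             + "}\n";
    }

    /** Compiles the generated source into a temp directory, loads it, and runs it. */
    static String compileAndRender(Map<String, Object> context) {
        String className = "GeneratedTemplate";
        try {
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("no system Java compiler available");
            }
            Path outDir = Files.createTempDirectory("templates");
            List<String> options = Arrays.asList("-d", outDir.toString());
            List<StringSource> units =
                Collections.singletonList(new StringSource(className, generateSource(className)));
            boolean ok = compiler.getTask(null, null, null, options, null, units).call();
            if (!ok) {
                throw new IllegalStateException("generated source failed to compile");
            }
            try (URLClassLoader loader = new URLClassLoader(new URL[] {outDir.toUri().toURL()})) {
                Class<?> generated = loader.loadClass(className);
                StringWriter out = new StringWriter();
                generated.getMethod("render", Writer.class, Map.class).invoke(null, out, context);
                return out.toString();
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> context = Collections.<String, Object>singletonMap("dev", "Kris");
        System.out.println(compileAndRender(context)); // prints Check Kris out!
    }
}
```

Emitting one method per template, as this sketch does, keeps the generator simple but makes method size grow with template size.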
Example
Velocity template source:

  Check $dev.Name out!

Generated Java source:

  _writer.write("Check ");
  _writer.write(
    _context.get(_context.get("dev"),
    "Name", Integer.valueOf(26795951)));
  _writer.write(" out!");
Performance: interpreted vs. compiled
[Chart: execution time (ms/10K times, axis 0 to 4500) against template complexity (1 to 16); series: compiled, interpreted]
Problem
• In the compiled version
  – 1 “complexity” ≈ 800 bytes of bytecode
  – So 11 “complexities” > 8000 bytes of bytecode
Compiled templates of complexity “11” and above are not JIT’d!
 develop(intx, HugeMethodLimit,  8000,              
    "don't compile methods larger than"             
    "this if +DontCompileHugeMethods")              
  product(bool, DontCompileHugeMethods, true,       
    "don't compile methods > HugeMethodLimit")      


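The threshold arithmetic from the flag definitions above can be written out as follows. It uses the slides' rough figure of 800 bytes of bytecode per unit of complexity and assumes the whole template ends up in a single method that the VM compares against HugeMethodLimit with a strict "greater than".

```java
public class HugeMethodEstimate {
    static final int HUGE_METHOD_LIMIT = 8000;   // default from the flag definition
    static final int BYTES_PER_COMPLEXITY = 800; // approximate figure from the slides

    static int estimatedBytecodeSize(int complexity) {
        return complexity * BYTES_PER_COMPLEXITY;
    }

    /** True if the method would be left to the interpreter under +DontCompileHugeMethods. */
    static boolean skippedByJit(int complexity) {
        return estimatedBytecodeSize(complexity) > HUGE_METHOD_LIMIT;
    }

    /** Smallest complexity whose estimated size exceeds the limit. */
    static int firstSkippedComplexity() {
        int complexity = 1;
        while (!skippedByJit(complexity)) {
            complexity++;
        }
        return complexity;
    }

    public static void main(String[] args) {
        for (int c = 9; c <= 12; c++) {
            System.out.println(c + " -> " + estimatedBytecodeSize(c)
                + " bytes, skipped by JIT: " + skippedByJit(c));
        }
        System.out.println("first skipped: " + firstSkippedComplexity()); // prints first skipped: 11
    }
}
```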
-XX:-DontCompileHugeMethods
[Chart: execution time (ms/10K times, axis 0 to 4500) against template complexity (1 to 16); series: compiled, interpreted]
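To confirm which setting a running JVM actually uses, the flag can be read through the JDK's `com.sun.management.HotSpotDiagnosticMXBean`. This is a small sketch; the class name `HugeMethodFlagCheck` and the "unavailable" fallback are choices made for this example, and the bean is HotSpot-specific.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HugeMethodFlagCheck {
    /** Returns the current value of a VM flag, or "unavailable" if it cannot be read. */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag name, or a JVM that does not expose this bean.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("DontCompileHugeMethods = " + flagValue("DontCompileHugeMethods"));
    }
}
```

Running it once with and once without `-XX:-DontCompileHugeMethods` on the command line shows whether the option reached the VM.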
JVM OPEN SOURCE
@ TAOBAO
Open Source
• Participate in OpenJDK
  – Already submitted 4 patches into the HotSpot
    VM and its Serviceability Agent
  – Active on OpenJDK mailing-lists
• Sign the OCA
  – Work in progress, almost there
  – Submit more patches after OCA is accepted
• Future open sourcing of custom
  modifications
Open Source (cont.)
• The submitted patches
  – 7050685: jsdbproc64.sh has a typo in the
    package name
  – 7058036: FieldsAllocationStyle=2 does not work
    in 32-bit VM
  – 7060619: C1 should respect inline and dontinline
    directives from CompilerOracle
  – 7072527: CMS: JMM GC counters overcount in
    some cases
• Due to restrictions in contribution
  process, more significant patches cannot be
  submitted until our OCA is accepted
JVM TRAINING
@ TAOBAO
JVM Training
• Regular internal courses on
  – JVM internals
  – JVM tuning
  – JVM troubleshooting
• Discussion group for people interested in
  JVM internals
QUESTIONS?

CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Último (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Kris Mok's Presentation on JVM Customization at Taobao

  • 1. Kris Mok, Software Engineer, Taobao @rednaxelafx 莫枢 /“撒迦”
  • 3. Agenda Customization Tuning JVM @ Taobao Open Source Training
  • 5. Java Strengths • Good abstraction • Good performance • Good tooling (IDE, profiler, etc.) • Easy to recruit good programmers
  • 6. Java Weaknesses • Tension between “abstraction leak” and performance – Abstraction and performance don’t always come together • More control/info over GC and object overhead wanted sometimes
  • 7. Our Team • Domain-Specific Computing Team – performance- and efficiency-oriented – specific solutions to specific problems – do the low-level plumbing for specific applications targeting specific hardware – we’re hiring! • software and hardware hackers
  • 8. Our Team (cont.) • Current Focus – JVM-level customization/tuning • long term project • based on HotSpot Express 20 from OpenJDK • serving: – 10,000+ JVM instances serving online – 1,000+ Java developers – Hadoop tuning – Dedicated accelerator card adoption
  • 11. Tradeoffs • Would like to make as little impact on existing Java application code as possible • But if the performance/efficiency gains are significant enough, we’re willing to make extensions to the VM/core libs
  • 12. JVM Customizations • GC Invisible Heap (GCIH) • JNI Wrapper improvement • New instructions • PrintGCReason / CMS bug fix • ArrayAllocationWarningSize • Change VM argument defaults • etc.
  • 13. Case 1: in-memory cache • Certain data is computed offline and then fed to online systems in a read-only, “cache” fashion
  • 14. in-memory cache • Fastest way to access them is to – put them in-process, in-memory, – access as normal Java objects, – no serialization/JNI involved per access
  • 15. in-memory cache • Large, static, long-lived data in the GC heap – may lead to long GC pauses at full GC, – or long overall concurrent GC cycle • What if we take them out of the GC heap? – but without having to serialize them?
  • 16. GC Invisible Heap • “GC Invisible Heap” (GCIH) – an extension to HotSpot VM – an in-process, in-memory heap space – not managed by the GC – stores normal Java objects • Currently works with ParNew+CMS
  • 17. GCIH interface • “moveIn(Object root)” – given the root of an object graph, move the whole graph out of GC heap and into GCIH • “moveOut()” – GCIH space reset to a clean state – abandon all data in current GCIH space – (earlier version) move the object graph back into GC heap
  • 18. GCIH interface (cont.) • Current restrictions – data in GCIH should be read-only – objects in GCIH may not be used as monitors – no outgoing references allowed • Restrictions may be relaxed in the future
  • 19. GCIH interface (cont.) • To update data – moveOut – (update) – moveIn
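The moveOut, update, moveIn cycle described above can be sketched in plain Java. This is only an illustration: the `Gcih` interface, the `FakeGcih` class and the `refresh` helper are hypothetical names, the return types are assumed, and the fake merely records a reference instead of moving anything out of the GC heap.

```java
import java.util.HashMap;
import java.util.Map;

public class GcihUpdateSketch {
    /** Hypothetical stand-in for the two entry points named on the slides. */
    interface Gcih {
        void moveIn(Object root);
        void moveOut();
    }

    /** Plain-Java fake: it only remembers the root; nothing leaves the GC heap. */
    static final class FakeGcih implements Gcih {
        private Object root;

        @Override
        public void moveIn(Object root) {
            this.root = root;
        }

        @Override
        public void moveOut() {
            // Mirrors "abandon all data in current GCIH space".
            this.root = null;
        }

        Object root() {
            return root;
        }
    }

    /** moveOut, rebuild the data on the normal heap, then moveIn the new graph. */
    static Map<String, String> refresh(FakeGcih gcih, Map<String, String> freshData) {
        gcih.moveOut();
        Map<String, String> rebuilt = new HashMap<>(freshData);
        gcih.moveIn(rebuilt);
        return rebuilt;
    }

    public static void main(String[] args) {
        FakeGcih gcih = new FakeGcih();

        Map<String, String> v1 = new HashMap<>();
        v1.put("item:1", "v1");
        gcih.moveIn(v1);

        Map<String, String> v2 = new HashMap<>();
        v2.put("item:1", "v2");
        Map<String, String> current = refresh(gcih, v2);

        System.out.println(current.get("item:1"));
    }
}
```

Because the slides say data in GCIH should be read-only and moveOut abandons it, callers in a real system would presumably have to stop using old references before the cycle starts; the sketch does not model that.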
  • 20. Heap layout • Original: GC Managed Heap = Perm (-XX:PermSize, -XX:MaxPermSize) + Young (-Xmn) + Old (-Xms/-Xmx), with the cache data inside the GC managed heap • Using GCIH: the same GC Managed Heap, plus a separate GCIH region (-XX:GCIHSize) that holds the cache data
  • 21. Actual performance • Reduces stop-the-world full GC pause time • Reduces concurrent-mark and concurrent-sweep time – but the two stop-the-world phases of CMS aren’t necessarily significantly faster
  • 22. Total time of CMS GC phases, time (sec), Original vs. w/GCIH • 1 initial-mark: 0.0072 vs. 0.0043 • 2 concurrent-mark: 1.7943 vs. 0.5400 • 3 preclean: 0.0373 vs. 0.0159 • 4 remark: 0.0118 vs. 0.0035 • 5 concurrent-sweep: 1.5717 vs. 0.6266 • 6 reset: 0.0263 vs. 0.0240
  • 23. Alternatives • GCIH: × extension to the JVM; √ in-process, in-memory; √ not under GC control; √ direct access of Java objects; √ no JNI overhead on access; √ object graph is in better locality • BigMemory: √ runs on standard JVM; √ in-process, in-memory; √ not under GC control; × serialize/deserialize Java objects; × JNI overhead on access; × N/A (locality)
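To put the chart's numbers in perspective, the relative reduction per phase can be computed directly. The sketch below uses the twelve values from the slide; mapping columns 1 to 6 onto the usual CMS phase order is an assumption, and the class and method names are illustrative.

```java
public class CmsPhaseSpeedup {
    // Assumed mapping of chart columns 1..6 to CMS phases, in their usual order.
    static final String[] PHASES = {
        "initial-mark", "concurrent-mark", "preclean",
        "remark", "concurrent-sweep", "reset"
    };
    // Times in seconds, copied from the slide.
    static final double[] ORIGINAL  = {0.0072, 1.7943, 0.0373, 0.0118, 1.5717, 0.0263};
    static final double[] WITH_GCIH = {0.0043, 0.5400, 0.0159, 0.0035, 0.6266, 0.0240};

    /** Percentage by which a phase got shorter. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < PHASES.length; i++) {
            System.out.printf("%-17s %.4fs -> %.4fs (%.1f%% shorter)%n",
                    PHASES[i], ORIGINAL[i], WITH_GCIH[i],
                    reductionPercent(ORIGINAL[i], WITH_GCIH[i]));
        }
    }
}
```

Under that mapping, concurrent-mark and concurrent-sweep shrink by roughly 70% and 60%; the stop-the-world phases also drop, but from a base of only a few milliseconds.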
  • 24. GCIH future • still in early stage of development now • may try to make the API surface more like RTSJ
  • 25. Experimental: object data sharing • Sharing of GCIH between JVMs on the same box • Real-world application: – A special kind of Map/Reduce job uses a big piece of precomputed cache data – Multiple homogeneous jobs run on the same machine, using the same cache data – can save memory to run more jobs on a machine, when CPU isn’t the bottleneck
  • 26. Before sharing • JVM1, JVM2, JVM3, …, JVMn each hold their own copy of the sharable objects alongside their other objects
  • 27. After sharing • JVM1, JVM2, JVM3, …, JVMn share a single copy of the sharable objects; each JVM keeps only its own other objects
  • 28. Case 2: JNI overhead • JNI carries a lot of overhead at invocation boundaries • JNI invocations involve calling JNI native wrappers in the VM
  • 29. JNI wrapper • Wrappers are in hand-written assembler • But not necessarily always well-tuned • Look for opportunities to optimize for common cases
  • 30. Wrapper example
      ...
      0x00002aaaab19be92: cmpl $0x0,0x30(%r15)          // check the suspend flag
      0x00002aaaab19be9a: je 0x2aaaab19bec6
      0x00002aaaab19bea0: mov %rax,-0x8(%rbp)
      0x00002aaaab19bea4: mov %r15,%rdi
      0x00002aaaab19bea7: mov %rsp,%r12
      0x00002aaaab19beaa: sub $0x0,%rsp
      0x00002aaaab19beae: and $0xfffffffffffffff0,%rsp
      0x00002aaaab19beb2: mov $0x2b7d73bcbda0,%r10
      0x00002aaaab19bebc: rex.WB callq *%r10
      0x00002aaaab19bebf: mov %r12,%rsp
      0x00002aaaab19bec2: mov -0x8(%rbp),%rax
      0x00002aaaab19bec6: movl $0x8,0x238(%r15)         // change thread state to thread in java
      ...                                               // continue
  • 31. Wrapper example (cont.) • The common case – Threads are unlikely to be suspended when running through this wrapper • Optimize for the common case – move the logic that handles suspended state out-of-line
  • 32. Modified wrapper example
      ...
      0x00002aaaab19be3a: cmpl $0x0,0x30(%r15)          // check the suspend flag
      0x00002aaaab19be42: jne 0x2aaaab19bf52
      0x00002aaaab19be48: movl $0x8,0x238(%r15)         // change thread state to thread in java
      ...                                               // continue
      0x00002aaaab19bf52: mov %rax,-0x8(%rbp)
      0x00002aaaab19bf56: mov %r15,%rdi
      0x00002aaaab19bf59: mov %rsp,%r12
      0x00002aaaab19bf5c: sub $0x0,%rsp
      0x00002aaaab19bf60: and $0xfffffffffffffff0,%rsp
      0x00002aaaab19bf64: mov $0x2ae3772aae70,%r10
      0x00002aaaab19bf6e: rex.WB callq *%r10
      0x00002aaaab19bf71: mov %r12,%rsp
      0x00002aaaab19bf74: mov -0x8(%rbp),%rax
      0x00002aaaab19bf78: jmpq 0x2aaaab19be48
      ...
  • 33. Performance • 5%-10% improvement of raw JNI invocation performance on various microarchitectures
  • 34. Case 3: new instructions • SSE 4.2 brings new instructions – e.g. CRC32c • We’re using Westmere now • Should take advantage of SSE 4.2
  • 35. CRC32 / CRC32C • CRC32 – well known, commonly used checksum – used in HDFS – JDK’s impl uses zlib, through JNI • CRC32c – a variant of CRC32 – hardware support by SSE 4.2
  • 36. Intrinsify CRC32c • Add new intrinsic methods to directly support CRC32c instruction in HotSpot VM • Hardware accelerated • To be used in modified HDFS • Completely avoids JNI overhead – HADOOP-7446 still carries JNI overhead (blog post)
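For reference, CRC32c differs from CRC32 only in its polynomial (Castagnoli). The sketch below is a slow, bit-at-a-time software version, not the intrinsic described above; the class and method names are illustrative. It is compared with the JDK's zlib-based `java.util.zip.CRC32` on the standard check string "123456789".

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32cSketch {
    // Castagnoli polynomial in reflected (LSB-first) form.
    private static final int POLY_REFLECTED = 0x82F63B78;

    /** Bitwise CRC32c: initial value all ones, reflected processing, final inversion. */
    static long crc32c(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                int mask = -(crc & 1);               // all ones if the low bit is set
                crc = (crc >>> 1) ^ (POLY_REFLECTED & mask);
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    /** Plain CRC32 through the JDK's existing implementation. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08X%n", crc32(check));
        System.out.printf("CRC32c = %08X%n", crc32c(check));
    }
}
```

The SSE 4.2 instruction computes the same CRC32c function over 8 to 64 bits per instruction, which is what makes an intrinsic attractive compared with a loop like this one. Later JDKs (9 and up) also ship `java.util.zip.CRC32C`.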
  • 37. Other intrinsics • May intrinsify other operations in the future – AES-NI – others on applications’ demand
  • 38. Case 4: frequent CMS GC • An app experienced back-to-back CMS GC cycles after running for a few days • The Java heaps were far from full • What’s going on?
  • 39. The GC Log
      2011-06-30T19:40:03.487+0800: 26.958: [GC 26.958: [ParNew: 1747712K->40832K(1922432K), 0.0887510 secs] 1747712K->40832K(4019584K), 0.0888740 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
      2011-06-30T19:41:20.301+0800: 103.771: [GC 103.771: [ParNew: 1788544K->109881K(1922432K), 0.0910540 secs] 1788544K->109881K(4019584K), 0.0911960 secs] [Times: user=0.24 sys=0.07, real=0.09 secs]
      2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: 0K(2097152K)] 998393K(4019584K), 0.4745760 secs] [Times: user=0.47 sys=0.00, real=0.46 secs]
      2011-06-30T19:42:05.416+0800: 148.886: [CMS-concurrent-mark-start]
  • 40. GC log visualized • The tool used here is GCHisto, from Tony Printezis
  • 41. Need more info • -XX:+PrintGCReason to the rescue – added this new flag to the VM – print the direct cause of a GC cycle
  • 42. The GC Log
      2011-06-30T19:40:03.487+0800: 26.958: [GC 26.958: [ParNew: 1747712K->40832K(1922432K), 0.0887510 secs] 1747712K->40832K(4019584K), 0.0888740 secs] [Times: user=0.19 sys=0.00, real=0.09 secs]
      2011-06-30T19:41:20.301+0800: 103.771: [GC 103.771: [ParNew: 1788544K->109881K(1922432K), 0.0910540 secs] 1788544K->109881K(4019584K), 0.0911960 secs] [Times: user=0.24 sys=0.07, real=0.09 secs]
      CMS Perm: collect because of occupancy 0.920845 / 0.920000
      CMS perm gen initiated
      2011-06-30T19:42:04.940+0800: 148.410: [GC [1 CMS-initial-mark: 0K(2097152K)] 998393K(4019584K), 0.4745760 secs] [Times: user=0.47 sys=0.00, real=0.46 secs]
      2011-06-30T19:42:05.416+0800: 148.886: [CMS-concurrent-mark-start]
  • 43. • Relevant VM arguments – -XX:PermSize=96m -XX:MaxPermSize=256m
  • 44. • The problem was caused by bad interaction between CMS GC triggering and PermGen expansion – Thanks, Ramki!
  • 45. • The (partial) fix
      // Support for concurrent collection policy decisions.
      bool CompactibleFreeListSpace::should_concurrent_collect() const {
        // In the future we might want to add in fragmentation stats --
        // including erosion of the "mountain" into this decision as well.
        // removed: return !adaptive_freelists() && linearAllocationWouldFail();
        return false;
      }
  • 47. Case 5: huge objects • An app bug allocated a huge object, causing unexpected OOM • Where did it come from?
  • 48. huge objects and arrays • Most Java objects are small • Huge objects usually happen to be arrays • A lot of collection objects use arrays as backing storage – ArrayLists, HashMaps, etc. • Tracking huge array allocation can help locate huge allocation problems
  • 49. product(intx, ArrayAllocationWarningSize, 512*M, "array allocation with size larger than" "this (bytes) will be given a warning" "into the GC log")
  • 50. Demo
      import java.util.ArrayList;
      public class Demo {
        private static void foo() {
          new ArrayList<Object>(128 * 1024 * 1024);
        }
        public static void main(String[] args) {
          foo();
        }
      }
  • 51. Demo
      $ java Demo
      ==WARNNING== allocating large array: thread_id[0x0000000059374800], thread_name[main], array_size[ 536870928 bytes], array_length[134217728 elememts]
          at java.util.ArrayList.<init>(ArrayList.java:112)
          at Demo.foo(Demo.java:5)
          at Demo.main(Demo.java:9)
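The reported size can be reproduced with simple arithmetic. The sketch below assumes a 16-byte array header and 4-byte (compressed) references, which is consistent with the 536870928 bytes in the warning but is not stated on the slides; the class and method names are illustrative.

```java
public class ArraySizeEstimate {
    static final long M = 1024L * 1024L;
    // Default of the ArrayAllocationWarningSize flag shown on the slide: 512*M bytes.
    static final long WARNING_SIZE = 512 * M;

    /** Header plus payload; alignment padding is ignored in this estimate. */
    static long estimatedArrayBytes(long length, int bytesPerElement, int headerBytes) {
        return headerBytes + length * bytesPerElement;
    }

    public static void main(String[] args) {
        long length = 128L * 1024 * 1024;   // capacity passed to new ArrayList<Object>(...)
        // Assumed layout: 16-byte header, 4-byte compressed references.
        long bytes = estimatedArrayBytes(length, 4, 16);
        System.out.println("array_length = " + length);
        System.out.println("array_size   = " + bytes + " bytes");
        System.out.println("above warning size: " + (bytes > WARNING_SIZE));
    }
}
```

The backing `Object[]` alone is 512 MB of references, so the 16-byte header is what pushes the allocation just past the default threshold.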
  • 52. Case 6: bad optimizations? • Some loop optimization bugs were found before launch of Oracle JDK 7 • Actually, they exist in recent JDK 6, too – some of the fixes weren’t in until JDK6u29 – can’t wait until an official update with the fixes – roll our own workaround
  • 53. Workarounds • Explicitly set -XX:-UseLoopPredicate when using recent JDK 6 • Or …
  • 54. Workarounds (cont.) • Change the defaults of the opt flags to turn them off
      product(bool, UseLoopPredicate, false, /* default was true */ "Generate a predicate to select fast/slow loop versions")
  • 55. A Case Study JVM TUNING @ TAOBAO
  • 56. JVM Tuning • Most JVM tuning efforts are spent on memory related issues – we do too – lots of reading material available • Let’s look at something else – use JVM internal knowledge to guide tuning
  • 57. Case: Velocity template compilation • An internal project seeks to compile Velocity templates into Java bytecodes
  • 58. Compilation process • Parse *.vm source into AST – reuse original parser and AST from Velocity • Traverse the AST and generate Java source code as target – works like macro expansion • Use Java Compiler API to generate bytecodes
  • 59. Example • Velocity template source:
      Check $dev.Name out!
    • Generated Java source:
      _writer.write("Check ");
      _writer.write(
        _context.get(_context.get("dev"), "Name", Integer.valueOf(26795951)));
      _writer.write(" out!");
  • 60. Performance: interpreted vs. compiled [chart: execution time (ms/10K times) against template complexity 1 to 16; series: compiled, interpreted]
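The macro-expansion idea can be illustrated with a toy translator. The project described above reuses Velocity's own parser and AST; the sketch below swaps that for a single regular expression that only understands `$name.Property`, and it emits a two-argument `_context.get(...)` call rather than the three-argument form on the slide. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateExpansionSketch {
    // Matches only the simple $name.Property form; real Velocity syntax is far richer.
    private static final Pattern REF =
            Pattern.compile("\\$([A-Za-z_]\\w*)\\.([A-Za-z_]\\w*)");

    /** Turns a template into a list of generated Java statements. */
    static List<String> expand(String template) {
        List<String> out = new ArrayList<>();
        Matcher m = REF.matcher(template);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                out.add(literal(template.substring(last, m.start())));
            }
            out.add("_writer.write(_context.get(_context.get(\"" + m.group(1)
                    + "\"), \"" + m.group(2) + "\"));");
            last = m.end();
        }
        if (last < template.length()) {
            out.add(literal(template.substring(last)));
        }
        return out;
    }

    /** Emits a write of literal text; only backslashes and quotes are escaped here. */
    private static String literal(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "_writer.write(\"" + escaped + "\");";
    }

    public static void main(String[] args) {
        for (String line : expand("Check $dev.Name out!")) {
            System.out.println(line);
        }
    }
}
```

Each literal run and each reference becomes one statement, so the generated method grows roughly linearly with template size.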
  • 61. Problem • In the compiled version – 1 “complexity” ≈ 800 bytes of bytecode – So 11 “complexities” > 8000 bytes of bytecode • Compiled templates larger than “11” are not JIT’d!
      develop(intx, HugeMethodLimit, 8000, "don't compile methods larger than" "this if +DontCompileHugeMethods")
      product(bool, DontCompileHugeMethods, true, "don't compile methods > HugeMethodLimit")
      Case Study Summary
  • 62. -XX:-DontCompileHugeMethods [chart: execution time (ms/10K times) against template complexity 1 to 16; series: compiled, interpreted]
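The threshold arithmetic can be written out as a small check. The 800-bytes-per-complexity figure is the slide's approximation, the strict "larger than" comparison follows the flag descriptions quoted above, and the class and method names are illustrative.

```java
public class HugeMethodEstimate {
    // Flag default quoted on the slide.
    static final int HUGE_METHOD_LIMIT = 8000;
    // Approximate bytecode size per unit of template "complexity" (from the slide).
    static final int BYTES_PER_COMPLEXITY = 800;

    static int estimatedBytecodeSize(int complexity) {
        return complexity * BYTES_PER_COMPLEXITY;
    }

    /** True if the method would be left to the interpreter under the given setting. */
    static boolean skippedByJit(int bytecodeSize, boolean dontCompileHugeMethods) {
        return dontCompileHugeMethods && bytecodeSize > HUGE_METHOD_LIMIT;
    }

    public static void main(String[] args) {
        for (int complexity = 9; complexity <= 12; complexity++) {
            int size = estimatedBytecodeSize(complexity);
            System.out.println("complexity " + complexity + ": ~" + size
                    + " bytes, skipped with default flags: " + skippedByJit(size, true)
                    + ", skipped with -XX:-DontCompileHugeMethods: " + skippedByJit(size, false));
        }
    }
}
```

With these estimates the cliff appears at complexity 11, which lines up with where the compiled version stops being JIT-compiled under default flags.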
  • 64. Open Source • Participate in OpenJDK – Already submitted 4 patches into the HotSpot VM and its Serviceability Agent – Active on OpenJDK mailing-lists • Sign the OCA – Work in progress, almost there – Submit more patches after OCA is accepted • Future open sourcing of custom modifications
  • 65. Open Source (cont.) • The submitted patches – 7050685: jsdbproc64.sh has a typo in the package name – 7058036: FieldsAllocationStyle=2 does not work in 32-bit VM – 7060619: C1 should respect inline and dontinline directives from CompilerOracle – 7072527: CMS: JMM GC counters overcount in some cases • Due to restrictions in contribution process, more significant patches cannot be submitted until our OCA is accepted
  • 67. JVM Training • Regular internal courses on – JVM internals – JVM tuning – JVM troubleshooting • Discussion group for people interested in JVM internals
  • 69. Kris Mok, Software Engineer, Taobao @rednaxelafx 莫枢 /“撒迦”