Nikita Salnikov-Tarnovski
            TECHNICAL OBSTACLES
            WHEN BUILDING PLUMBR




Monday, April 1, 13
AGENDA


              Who we were and who we are

              Object lifecycle with little overhead

              Graph analysis in low memory

              The problem of quitting




OUR BACKGROUND

              2 developers

                      Nikita Salnikov-Tarnovski, @iNikem

                      Vladimir Šor, @vovencij

              10+ years at Nortal, a custom software house

              Mostly Java EE development

              Web sites, backend systems, batch processes



NEW PROBLEM

              Memory leaks

              130,000 monthly searches for OutOfMemoryError in
              Google

              20,000 monthly unique visitors on our site

              http://plumbr.eu

              400 monthly downloads

              1700+ leaks discovered



PLUMBR

              Automated performance consultant

              Giving you the exact location of the leak with enough information
              to fix it

              The foundation is based on machine learning

              trained on 500,000 memory snapshots

              From 3,000 different applications

              Finding 88% of the existing leaks.

              Quality only going up with the additional data gathered each day.



PLUMBR AGENT...


              JVM TI agents

              Both Java and native, OS-specific

              welcome malloc and free!

              JNI code for communication between them




... WATCHES YOU



              We monitor object creation and disposal

              On-the-fly bytecode instrumentation

              Hooks into GC events




OBJECT MONITORING I



              Java agent registers
              java.lang.instrument.ClassFileTransformer

              Modifies bytecode as classes are loaded

              Using ASM library

              To capture all newly created objects
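The registration step above can be sketched roughly as follows. This is a minimal illustration, not Plumbr's code: the class names `CountingTransformer` and `SketchAgent` are hypothetical, and the transformer only counts classes instead of rewriting them with ASM.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pass-through transformer: it only counts the classes it is shown. */
final class CountingTransformer implements ClassFileTransformer {
    final AtomicLong seen = new AtomicLong();

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        seen.incrementAndGet();
        // A real agent would rewrite classfileBuffer here (the slides mention ASM)
        // so that every object allocation is reported to the agent.
        return null; // null tells the JVM to keep the original bytes
    }
}

/**
 * Hypothetical agent entry point. In a real agent JAR this class would be
 * public, live in its own file, and be named in the Premain-Class manifest entry.
 */
final class SketchAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        // From now on the JVM passes the bytes of each class it loads to the transformer.
        inst.addTransformer(new CountingTransformer());
    }
}
```

Such an agent would be started with `java -javaagent:agent.jar ...`, where `agent.jar` is a placeholder name.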




PROBLEMS


              Different compilers produce slightly different bytecode

              Some classes are too fragile or broken already

              new and the chain of <init> calls

              Clone, deserialization, reflection




OBJECT MONITORING II


              We keep some data about each live object

              Creating and associating that data takes time

              On every object creation!




OBJECT MONITORING II


              If you cannot do it in-process, do it off-process




PROBLEMS

              BlockingQueues are slow

              Locks are slow

              Atomic* are slow!

              No existing library

              Even Disruptor doesn’t suit us

              We’ve written a no-guarantee, lock-free,
              many-producers-one-consumer buffer

              Concurrent programming IS hard
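One way to picture a "no-guarantee" buffer is the sketch below. It is an assumption about the general idea, not the buffer Plumbr wrote: `LossyBuffer` is a hypothetical name, and the code deliberately uses no locks and no Atomic* classes. Under contention, events can be overwritten or lost, and the Java Memory Model does not promise when the consumer sees a write.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical best-effort buffer: many producers, one consumer, no locks, no atomics. */
final class LossyBuffer<T> {
    private final Object[] slots;
    private final int mask;
    private int writeIndex; // deliberately racy: concurrent producers may pick the same slot

    LossyBuffer(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Called by any producer thread; never blocks, may overwrite an unread event. */
    void offer(T event) {
        int i = writeIndex++ & mask;
        slots[i] = event;
    }

    /** Called by the single consumer thread; returns the events it can currently see. */
    List<T> drain() {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < slots.length; i++) {
            @SuppressWarnings("unchecked")
            T event = (T) slots[i];
            if (event != null) {
                slots[i] = null; // a concurrent write to this slot may be wiped here
                out.add(event);
            }
        }
        return out;
    }
}
```

The trade-off is that producers pay only an array store per event, which suits monitoring data where a small loss is acceptable and stalling the application is not.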



MORE PROBLEMS

              We have to store all that object-related data somewhere

              Java Collections are too fat

              No lock-free thread-safe reading

              We use Trove to save memory

              Hand-written clone with dirty check

              Testing persistent immutable data structures
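To illustrate why primitive collections save memory, here is a small sketch of the idea behind them. It is not Trove's API: `LongIntMap` is a hypothetical fixed-capacity map that stores keys and values in plain arrays, so there is no boxing and no per-entry object as in `HashMap<Long, Integer>`.

```java
/** Hypothetical fixed-capacity long-to-int map with linear probing; no boxing, no entry objects. */
final class LongIntMap {
    private final long[] keys;
    private final int[] values;
    private final boolean[] used;
    private final int mask;

    LongIntMap(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        keys = new long[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        used = new boolean[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    private int firstSlot(long key) {
        int h = Long.hashCode(key) * 0x9E3779B9; // cheap multiplicative mix
        return (h ^ (h >>> 16)) & mask;
    }

    /** Inserts or updates a mapping; throws if every slot is taken by other keys. */
    void put(long key, int value) {
        int i = firstSlot(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (!used[i]) {
                used[i] = true;
                keys[i] = key;
                values[i] = value;
                return;
            }
            if (keys[i] == key) {
                values[i] = value;
                return;
            }
            i = (i + 1) & mask;
        }
        throw new IllegalStateException("map is full");
    }

    /** Returns the stored value, or defaultValue if the key is absent. */
    int get(long key, int defaultValue) {
        int i = firstSlot(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (!used[i]) return defaultValue;
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return defaultValue;
    }
}
```

The sketch omits resizing and removal to stay short; a production library handles both.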



LEAK HUNTING



              When leaks are detected, we need to find out who is
              holding them

              Paths to GC roots

              While the application is still running




PROBLEMS

              Java objects keep no record of incoming references

              You can walk the heap in C code

              But that stops the world

              A standard heap dump loses information

              So we make a custom heap dump

              And traverse the reference graph on it



STILL PROBLEMS

              We’ve tried many graph traversal libraries

              And NoSQL solutions

              All of them somewhat work

              If you give them gigabytes of memory

              But we have to do this on-site, while the application is still
              running

              We needed a memory-sensitive solution



ONE MORE BICYCLE



              We’ve written our own specialized version of Dijkstra’s
              shortest-path search

              Again, we had to replace many Java Collections with more
              memory-efficient implementations
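A memory-frugal path search to a GC root might look like the sketch below. It is not Plumbr's algorithm. It assumes every reference has the same cost, in which case Dijkstra's algorithm reduces to breadth-first search, and it assumes a hypothetical layout where objects are numbered 0..n-1 and their outgoing references sit in two flat `int` arrays.

```java
import java.util.Arrays;

/** Hypothetical compact reference graph: object i references targets[offsets[i] .. offsets[i+1]). */
final class RootPathFinder {
    /**
     * Returns the shortest chain root -> ... -> target as object indices,
     * or an empty array if the target is unreachable from every root.
     */
    static int[] shortestPathFromRoots(int[] offsets, int[] targets, int[] roots, int target) {
        int n = offsets.length - 1;
        int[] parent = new int[n];
        Arrays.fill(parent, -2);            // -2 = not visited yet
        int[] queue = new int[n];           // each object is enqueued at most once
        int head = 0, tail = 0;
        for (int r : roots) {
            if (parent[r] == -2) {
                parent[r] = -1;             // -1 = this object is a GC root
                queue[tail++] = r;
            }
        }
        while (head < tail) {
            int node = queue[head++];
            if (node == target) break;
            for (int e = offsets[node]; e < offsets[node + 1]; e++) {
                int next = targets[e];
                if (parent[next] == -2) {
                    parent[next] = node;
                    queue[tail++] = next;
                }
            }
        }
        if (parent[target] == -2) return new int[0];
        int length = 0;
        for (int v = target; v != -1; v = parent[v]) length++;
        int[] path = new int[length];
        for (int v = target, i = length - 1; v != -1; v = parent[v]) path[i--] = v;
        return path;
    }
}
```

The working set is two `int` arrays of length n in addition to the graph itself, with no per-node objects or boxed collections.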




TIME TO DIE



              Plumbr runs inside the JVM alongside the application

              It isn’t the main actor, just a supporter

              So Plumbr must be ready to quit whenever the main
              application wishes




WHEN JVM QUITS


              It turns out the JVM is quite hard to kill

              No shutdown notification or anything similar

              It just quits when there are no more non-daemon threads

              And some threads live for far too long




PROBLEMS


              Plumbr’s own threads

              Threads from libraries that Plumbr uses



              ExecutorService with daemon thread factory
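A daemon thread factory for an ExecutorService can be sketched as follows. The names `DaemonExecutors` and `agent-worker` are hypothetical; the point is only that threads marked as daemon never keep the JVM alive on their own.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class DaemonExecutors {
    /** Returns a factory whose threads never prevent the JVM from exiting. */
    static ThreadFactory daemonFactory(String namePrefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, namePrefix + "-" + counter.incrementAndGet());
            t.setDaemon(true); // must be set before the thread is started
            return t;
        };
    }

    /** Fixed-size pool backed by daemon threads. */
    static ExecutorService newDaemonPool(int threads) {
        return Executors.newFixedThreadPool(threads, daemonFactory("agent-worker"));
    }
}
```

This covers only the threads the agent creates itself; threads started inside third-party libraries still need their own handling.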




PROBLEMS

              RMI Reaper Thread

              Keeps the JVM alive as long as some JMX resources are in use

              We must clean up after ourselves: MBeans, JMX
              connections, JMX servers

              But when???

              We implemented our own monitor thread with some
              heuristics
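The slides do not say which heuristics the monitor thread uses. As one possible guess, the sketch below treats the application as finished once every live non-daemon thread is one the caller has chosen to ignore, then runs a cleanup callback. `ShutdownWatcher`, the `ignore` predicate and the polling interval are all hypothetical.

```java
import java.util.function.Predicate;

final class ShutdownWatcher {
    /** True when every live non-daemon thread is one the caller chose to ignore. */
    static boolean applicationLooksFinished(Predicate<Thread> ignore) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && !ignore.test(t)) {
                return false;
            }
        }
        return true;
    }

    /** Starts a daemon thread that polls the heuristic and runs cleanup once it holds. */
    static Thread start(Predicate<Thread> ignore, Runnable cleanup, long pollMillis) {
        Thread watcher = new Thread(() -> {
            try {
                while (!applicationLooksFinished(ignore)) {
                    Thread.sleep(pollMillis);
                }
                cleanup.run(); // e.g. unregister MBeans, close JMX connectors
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "shutdown-watcher");
        watcher.setDaemon(true); // the watcher itself must not keep the JVM alive
        watcher.start();
        return watcher;
    }
}
```

A caller would typically pass a predicate that ignores threads the agent itself keeps alive, such as the RMI Reaper, along with the JVM's own `DestroyJavaVM` thread.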



PROBLEMS


              Earlier versions used some Swing components, e.g. a
              system tray icon

              And the JVM will not quit while there are displayable
              Swing components

              We should dispose of them before quitting

              Again, when???




CONCLUSION



                      Don’t spend all your time writing web components or
                      web-services or Swing

                      There is more to Java than that

                      There are many Java libraries, but not enough




