A Taste of Monitoring and Post Mortem Debugging with Node
1. § Michael Dawson/Michael Tunnicliffe
§ September 2015
A Taste of Monitoring and Post Mortem
Debugging with Node
A Workshop
2. 2
To talk about: Application reliability and performance
●
Common problem scenarios
●
Monitoring
●
Problem diagnosis
●
Tailoring tooling to your application
Try it for yourself with prepared hands-on activities
To share and learn from each others experiences
Why are we here ?
3. 3
• Meet and Greet
• Runtime Monitoring
• Concepts and Tools
• Hands on - Monitor an App
• GC Visualisation
• Concepts and Tools
• Hands on – visualize gc activity
• Post Mortem Debugging
• Concepts and Tools
• Hands on with a heap dump
• Hands on with a core dump
• Experience/Gaps Brainstorm
• Monitoring – Instrument a Module
Heap usage/Object layout
Initial Problem Determination
Dig into GC behaviour (if appropriate)
Execution State/Native Memory
Retrospective
Improve your tools
Modelled on typical workflow
Agenda
4. 4
About Michael Dawson
Loves the web and building software
Senior Software Developer @ IBM
22 years development experience
10 years in runtime development
Contact me:
Michael_dawson@ca.ibm.com
https://www.linkedin.com/pub/michael-dawson/2/128/605
Twitter: @mhdawson1
5. 5
About Mike Tunnicliffe
Bug detective and low level systems programmer
Software Developer @ IBM
15 years development experience
12 years in runtime development
Contact me:
mike.tunnicliffe@uk.ibm.com
@fierydrake
6. 6
• Interest in Monitoring/Post mortem debugging ?
• What you hope to get out of the workshop today ?
About You (workshop participants)
Used Node Before ?
Role (Dev/QA/Ops) ?
Apps in Production ?
Production Monitoring ?
Background
Memory Usage ?
GC activity
Throughput
Reponse Time
Something else ?
Metrics – what do you track
High Memory
Memory Exhaustion
Memory Leaks
Footprint
Performance
Crash
Unhandled Exceptions
Problems – what have you seen
7. 7
• USB drive provided for easy installation
• IBM is not a distributor
• While freely available, licences apply to some components
●
Licences are included on USB drive and in install packages
●
By copying from USB drive you are accepting licence terms in USB drive
as licensee
• Printed copy also available
• Alternatively follow full download/install instructions.
• Install Instructions
●
Pre-Install/Instructions.txt
●
Self-Install/Instructions.txt
Workshop materials
8. 8
• Runtime Monitoring
• GC Visualization
• Post Mortem Debugging
• Experience/Gaps Brainstorm
• Monitoring – Instrument a Module
Heap usage/Object layout
Initial Problem Determination
Dig into GC behaviour
Execution State/Native Memory
Retrospective
Improve your tools
Modelled on typical workflow
Runtime Monitoring
9. 9
To understand the “real” characteristics of the application
• In production – health check under real load
• During development – test under artificial load
Why do runtime monitoring?
10. 10
Data about the application and the environment on which it
runs, typically:
• Machine and operating system
• Node.js and dependencies (eg V8)
• Application and dependencies (eg http, mongodb)
What will it give you?
11. 11
Typically production monitoring consists of:
• Data sources
• Data store aggregator
• Web-based visualizer
How it works
Browser
Monitoring
Data
HTT
P LoadBalancer
13. 13
Setup
• Run startelk
• Run startapps
• Open Kibana in a browser
http://localhost:5601/
• Open the dashboard
Try to identify any potential problems with the monitored
applications
Monitoring – Hands on – 20 minutes
15. 15
Node Application Metrics ('appmetrics')
• https://www.npmjs.com/package/appmetrics
• https://github.com/RuntimeTools/appmetrics
• Open Source under Apache 2.0 License*
• Provides:
●
producer and consumer APIs
●
monkey-patching framework and “dropins”
●
collection of connectors to existing monitoring stacks
*has a (currently) proprietary dependency
16. 16
appmetrics Data Collection
Data Type Event Name Data Provided
Application Data
HTTP request/response 'http' Time, Method, URL, Duration
MySQL database queries 'mysql' Time, SQL, Duration
MongoDB database queries 'mongo' Time, Query, Duration
Request tracking 'request' Time, Event Tree, Duration
JavaScript Runtime
V8 Garbage Collection 'gc' Time, Type, Used, Size, Duration
V8 Profiler 'profiling' Time, Function Array, Samples
System Resources
CPU Usage 'cpu' Time, Process, System
Memory Usage 'memory' Time, Process (Physical, Reserved, Virtual),
System (Physical, Total)
19. 19
Appmetrics and ElasticSearch/Kibana (ELK)
var appmetrics = require('appmetrics-elk').monitor();
var http = require('http');
var server = http.createServer(function handler(req, res) {
…
}).listen(process.env.PORT || 3000);
console.log('App listening on port 3000');
});
ES ES ES ES
client.bulk()
visualize
20. 20
'appmetrics' and Health Center for Eclipse
• Free from Eclipse Marketplace
• http://marketplace.eclipse.org/content/ibm-monitoring-and-diagnostic-tools-health-
• Supports Node and Java
• Uses MQTT for the connection
Pub/Sub Broker
Topic namespace:
/ibm/healthcenter
/<namespace>/<agent_id>/<datatype>
Node.js
Application
Liberty
Java
Application
MQTT
MQTT
MQTT
MQTT
MQTT Broker, eg. mosca
(npm install -g mosca)
21. 21
Health Center
Environment
Hardware and Operating System Configuration
Process environment and configuration
Highlights incorrect or non-standard configurations
Memory Utilization
Detect native memory leaks in application
Determine if external forces are using more memory
View components using the most native memory
CPU Utilization
Visualizes process CPU usage over time
Visualizes system CPU usage over time
22. 22
Health Center
Garbage Collection
Visualizes heap usage and GC pause times
Identifies memory leaks
Suggests command-line and tuning parameters
Function Profiling
Function profiling shows application activity
Identifies the hottest methods in an application
Full call stacks to identify where functions are being
called from and what functions they call
23. 23
Monitoring – Hands on – 20 minutes
• Run sample app (gcload.bat/gcload.sh) and connect with Health Center
• Look at environment information
• Look through other available panels
• Look at gc activity so see periodic allocation load
• Run sample app with heavy cpu usage (cpuload.bat/cpuload.sh)
• Enable profiling
• Look at profile, map back to application source
• Run custom program that consumes data.
• Edit gcload.js to un-comment line that create monitor instance
• Re-run and see output on console.
24. 24
• Runtime Monitoring
• GC Visualisation
• Post Mortem Debugging
• Experience/Gaps Brainstorm
• Monitoring – Instrument a Module
Heap usage/Object layout
Initial Problem Determination
Dig into GC behaviour
Execution State/Native Memory
Retrospective
Improve your tools
Modelled on typical workflow
Runtime Monitoring
25. 25
GC Visualization
• Visualisation of gc data long known to be important
• There are many existing gc visualizers (mostly Java)
•
•
•
•
•
• Javascript
●
https://github.com/adobe-research/GCview
●
http://marketplace.eclipse.org/content/ibm-monitoring-and-diagnostic-tools-garbage
We’ll use GCMV since we are most familiar with it
26. 26
GCMV
• Free from Eclipse Marketplace
• http://marketplace.eclipse.org/content/ibm-monitoring-and-diagnostic-tools-
garbage-collection-and-memory-visualizer-gcmv
• Supports both Node and Java
27. 27
GC Visualization – Hands on (15mins)
• Run sample app with memory leak with gc options enabled (gcloadWithVerbose)
• Observe there is a lot of information
• Run again sending to file
• Open up resulting file in GCMV
28. 28
• Runtime Monitoring
• GC Visualisation
• Post Mortem Debugging
• Experience/Gaps Brainstorm
• Monitoring – Instrument a Module
Heap usage/Object layout
Initial Problem Determination
Dig into GC behaviour
Execution State/Native Memory
Retrospective
Improve your tools
Post Mortem Debugging
29. 29
• 2 Common artifacts
• Heap Dumps
• Core Dumps
• No standards yet but workgroup:
• https://github.com/nodejs/post-mortem
• Heap dumps provide insight into Heap memory usage
• Core Dumps can provide insight into:
• Heap memory usage
• Runtime State
• What was running at the time of a crash – native and JS stacks
• Values in native memory
• Native memory usage
Post Mortem Debugging
30. 30
• Run Sample OOM program
• Generate Heapdump on OOM
• Load OOM into Chrome developer tools
• Show how to find what is keeps memory alive
• Participants explore
Chrome Developer Tools – Hands on – 10 minutes
31. 31
• Free from Eclipse Marketplace
• http://marketplace.eclipse.org/content/ibm-monitoring-and-diagnostic-tools-interac
• Multi-runtime → Supports Node and Java
• Currently requires dumps from IBM SDK for Node.js
• http://www.ibm.com/developerworks/web/nodesdk/
Interactive Diagnostic Data Explorer (IDDE)
32. 32
• Open pre-provided core file
• Use IDDE commands to show where crash occurred
• Help to list all commands
• Explore
IDDE – Hands On – 10 minutes
33. 33
• Runtime Monitoring
• GC Visualisation
• Post Mortem Debugging
• Experience/Gaps Brainstorm
• Monitoring – Instrument a Module
Heap usage/Object layout
Initial Problem Determination
Dig into GC behaviour
Execution State/Native Memory
Retrospective
Improve your tools
Retrospective
34. 34
• Experience from Participants (Open discussion)
– Other tools
– Key issues
– Workflow
• Gaps Brainstorm
– Key gaps
– Post mortem debugging – useful commands ?
Other Experiences/Gaps Brainstorm (10-15 mins)
35. 35
• Runtime Monitoring
• GC Visualisation
• Post Mortem Debugging
• Experience/Gaps Brainstorm
• Monitoring – Instrument a Module
Heap usage/Object layout
Initial Problem Determination
Dig into GC behaviour
Execution State/Native Memory
Retrospective
Improve your tools
Improve
36. 36
You may collect timing and event data from your application,
or modules that you use.
Wouldn't it be great if you could capture and view that in the
same way as the other monitoring data?
Custom monitoring
37. 37
The appmetrics module provides:
• A producer API for emitting custom data
• A consumer API for receiving all data
Using appmetrics APIs
38. 38
Using the emit() and Event APIs
var appmetrics = require('appmetrics');
var data = {
time: Date.now(),
mydata: 7
};
appmetrics.emit('myevent', data);
var appmetrics = require('appmetrics');
var monitor = appmetrics.monitor();
monitor.on('myevent', function(data) {
console.log('[myevent] ' + data.mydata);
});
• emit() API is on the appmetrics object
• Convention to have a “time” field
– Allows it to be used in ElasticSearch
• Events are on the monitored instance object
– Allows for future remote monitoring
39. 39
Instrument a simple piece of test code:
• Emit a custom 'iteration' event with the following data:
count
time taken to generate the array
time taken to serialize
• Add a handler that logs the data
Improve – Instrument an application (10 mins)
// Crunch some numbers
setInterval(function() {
var object = [];
var count = Math.random()*10000000;
for (var i=0; i<count; i++) {
object.push(i)
}
JSON.stringify(object);
}, 1000);
40. 40
You may not be able to modify the code you want to
instrument
• For example: a third-party or core module
• You could monkey patch it
Instrumenting modules
41. 41
A simple monkey patching example
var appmetrics = require('appmetrics');
var fs = require('fs');
var originalReadFileSync = fs.readFileSync;
// Instrument readFileSync
fs.readFileSync = function() {
var startTime = Date.now();
originalReadFileSync.apply(fs, arguments);
var endTime = Date.now();
appmetrics.emit('loadfile', { 'duration': endTime - startTime });
};
// Set up monitoring listeners
appmetrics.monitor().on('loadfile', function(e) {
console.log('File load duration=' + e['duration'] + 'ms');
});
// Application
for (var i=0; i<100; i++) {
fs.readFileSync('bigfile.dat');
}
Store function to
patch
Patch function
by replacing
Custom
“visualizer”
42. 42
Asynchronous monkey patching example
var appmetrics = require('appmetrics');
var fs = require('fs');
var originalReadFile = fs.readFile;
// Instrument readFileSync
fs.readFile = function() {
var startTime = Date.now();
var callbackIndex = (typeof(arguments[1]) === 'function') ? 1 :
(typeof(arguments[2]) === 'function') ? 2 : -1;
if (callbackIndex == -1) return;
var originalCallback = arguments[callbackIndex];
var wrap = function() {
appmetrics.emit('loadfile', { 'duration': Date.now() - startTime });
originalCallback.call(arguments);
};
arguments[callbackIndex] = wrap;
originalReadFile.apply(fs, arguments);
};
// Set up monitoring listeners
appmetrics.monitor().on('loadfile', console.log);
// Application
for (var i=0; i<100; i++) {
fs.readFile('bigfile.dat', function(err, data) {
console.log('readFile completed');
});
}
Store function to
patch
Patch function
by replacing
Custom
“visualizer”
43. 43
Deferred monkey patching example
var module = require('module');
var moduleLoad = module._load;
module._load = function(mod, parent, isMain) {
var loadedModule = moduleLoad.apply(this, arguments);
if (mod === 'http') {
var addListener = loadedModule.Server.prototype.addListener;
loadedModule.Server.prototype.addListener = function(type, listener) {
if (type === 'request' && typeof listener === 'function'){
var oldListener = listener;
listener = function(req, res){
if (typeof res.startTime === 'undefined') {
res.startTime = new Date().getTime();
res.once('finish', function() {
res.endTime = new Date().getTime();
var dur = res.endTime - res.startTime;
console.log('['+ res.startTime + '] ' + req.method + ' : ' + req.url + ' ('+dur+')');
});
};
oldListener.call(this, req, res);
};
return addListener.call(this, type, listener);
} else {
return addListener.apply(this, arguments);
}
};
};
return loadedModule;
};
44. 44
The appmetrics module provides:
• Drop-in “probes”
Self-contained facility to deploy patches in appmetrics
• An aspect framework for use by probes
Simplify complex method replacement scenarios
Making patching easier
45. 45
Using the Probes framework
Probe scripts are automatically loaded from:
node_modules/appmetrics/probes
Template probe:
var Probe = require('../lib/probe.js');
var aspect = require('../lib/aspect.js');
var util = require('util');
function MyExampleProbe() {
Probe.call(this, 'name_of_module_to_patch');
}
util.inherits(MyExampleProbe, Probe);
MyExampleProbe.prototype.attach = function(name, target, am) {
if (target.__probeAttached__) return target;
target.__probeAttached__ = true;
// Code to patch and emit event (can use aspect)
return target;
};
module.exports = MyExampleProbe;
46. 46
Aspect API
aspect.before(obj, methods, func);
aspect.after(obj, methods, func);
aspect.around(obj, methods, func1, func2);
aspect.aroundCallback(obj, methods, func1, func2);
• Provides APIs to locate and patch functions
– before, after and around supported
HTTPProbe.prototype.attach = function(name, target, am) {
if (target.__probeAttached__) return target;
target.__probeAttached__ = true;
aspect.before(target.Server.prototype, ['on', 'addListener'], function(obj, args) {
if (args[0] !== 'request') return;
aspect.aroundCallback(args, function(obj, args) {
var request = args[0], response = args[1]; // args for callback are [request, response]
response.startTime = Date.now();
aspect.after(response, 'finish', function(obj, args, ret) {
var dur = Date.now() - response.startTime;
console.log('['+ response.startTime + '] ' + request.method + ' : ' + request.url + ' ('+dur+')');
});
});
});
return target;
};
• Reduces the complex monkey patch example to...
47. 47
Using the Aspects framework
Aspects framework makes it easy to patch specific functions in the module
Simple use:
The “around” function is basically “before” and “after” combined
MyExampleProbe.prototype.attach = function(name, target, am) {
if (target.__probeAttached__) return target;
target.__probeAttached__ = true;
var start;
aspect.around(target, 'readFileSync',
function(obj, args) {
start = Date.now();
},
function(obj, args, returnObject) {
am.emit('file', {time: start, method: 'readFileSync', file: args[0], duration: Date.now() - start});
return returnObject;
}
);
return target;
};
48. 48
Using the Aspects framework
This, more complex, scenario can happen when you want to patch async functions
that accept callback arguments
The “aroundCallback” function will search a given list of objects and patch the
first function it finds (good for patching async handlers)
MyExampleProbe.prototype.attach = function(name, target, am) {
if (target.__probeAttached__) return target;
target.__probeAttached__ = true;
// 1. Run some code after moduleToPatch.functionToPatch() to...
// 2. - run some code before a function on its return value (returnObject.functionOnReturnObjectToPatch()) to...
// 3. - search that function's arguments for a callback and run some code before that callback
aspect.after(target, 'functionToPatch', function(target, args, returnObject) {
aspect.before(returnObject, 'functionOnReturnObjectToPatch',
function(target, methodArgs) {
aspect.aroundCallback(methodArgs, function(target, args){
// code
});
}
);
return returnObject;
});
return target;
};
1
2
3
49. 49
Instrument fs.readFile() by creating a probe
Start with the template code →
●
Set the name to 'fs'
●
Replace the comment with
a call into the aspect framework
to patch readFile() and associated
callback
●
Drop into the probes directory
inside appmetrics
Use this simple script to test →
Improve – Instrument a module (10 mins)
var fs = require('fs');
setInterval(function() {
fs.readFile('bigfile.dat', function(err, data) {
console.log(data.length);
});
}, 1000);
var Probe = require('../lib/probe.js');
var aspect = require('../lib/aspect.js');
var util = require('util');
function MyExampleProbe() {
Probe.call(this, 'name_of_module_to_patch');
}
util.inherits(MyExampleProbe, Probe);
MyExampleProbe.prototype.attach = function(name, target, am) {
if (target.__probeAttached__) return target;
target.__probeAttached__ = true;
// Code to patch and emit event (can use aspect)
return target;
};
module.exports = MyExampleProbe;
50. 50
• Runtime Monitoring
• GC Visualisation
• Post Mortem Debugging
• Experience/Gaps Brainstorm
• Monitoring – Instrument a Module
Heap usage/Object layout
Initial Problem Determination
Dig into GC behaviour
Execution State/Native Memory
Retrospective
Improve your tools
Summary and Wrap up