System Methodology
Holistic Performance Analysis on Modern Systems
Brendan Gregg
Senior Performance Architect
Jun 2016, ACM Applicative 2016
Apollo LMGC performance analysis
(diagram: ERASABLE MEMORY, CORE SET AREA, VAC SETS, FIXED MEMORY)
Background

History
•  System Performance Analysis up to the '90s:
–  Closed source UNIXes and applications
–  Vendor-created metrics and performance tools
–  Users interpret given metrics
•  Problems
–  Vendors may not provide the best metrics
–  Often had to infer, rather than measure
–  Given metrics, what do we do with them?
# ps alx
F S UID PID PPID CPU PRI NICE ADDR SZ WCHAN TTY TIME CMD
3 S 0 0 0 0 0 20 2253 2 4412 ? 186:14 swapper
1 S 0 1 0 0 30 20 2423 8 46520 ? 0:00 /etc/init
1 S 0 16 1 0 30 20 2273 11 46554 co 0:00 –sh
[…]
Today
1.  Open source
–  Operating systems: Linux, BSDs, illumos, etc.
–  Applications: source online (GitHub)
2.  Custom metrics
–  Can patch the open source, or,
–  Use dynamic tracing (open source helps)
3.  Methodologies
–  Start with the questions, then make metrics to answer them
–  Methodologies can pose the questions
Biggest problem with dynamic tracing has been what to do with it.
Methodologies guide your usage.
Crystal Ball Thinking

Anti-Methodologies
Street Light Anti-Method
1.  Pick observability tools that are
–  Familiar
–  Found on the Internet
–  Found at random
2.  Run tools
3.  Look for obvious issues
Drunk Man Anti-Method
•  Tune things at random until the problem goes away
Blame Someone Else Anti-Method
1.  Find a system or environment component you are not
responsible for
2.  Hypothesize that the issue is with that component
3.  Redirect the issue to the responsible team
4.  When proven wrong, go to 1
Traffic Light Anti-Method
1.  Turn all metrics into traffic lights
2.  Open dashboard
3.  Everything green? No worries, mate.
•  Type I errors: red instead of green
–  team wastes time
•  Type II errors: green instead of red
–  performance issues undiagnosed
–  team wastes more time looking elsewhere
Traffic lights are suitable for objective metrics (eg, errors),
not subjective metrics (eg, IOPS, latency).
Methodologies

Performance Methodologies
System Methodologies:
–  Problem statement method
–  Functional diagram method
–  Workload analysis
–  Workload characterization
–  Resource analysis
–  USE method
–  Thread State Analysis
–  On-CPU analysis
–  CPU flame graph analysis
–  Off-CPU analysis
–  Latency correlations
–  Checklists
–  Static performance tuning
–  Tools-based methods
…
•  For system engineers:
–  ways to analyze unfamiliar
systems and applications
•  For app developers:
–  guidance for metric and
dashboard design
Collect your
own toolbox of
methodologies
Problem Statement Method
1.  What makes you think there is a performance problem?
2.  Has this system ever performed well?
3.  What has changed recently?
–  software? hardware? load?
4.  Can the problem be described in terms of latency?
–  or run time, not IOPS or throughput
5.  Does the problem affect other people or applications?
6.  What is the environment?
–  software, hardware, instance types?
versions? config?
Functional Diagram Method
1.  Draw the functional diagram
2.  Trace all components in the data path
3.  For each component, check performance
Breaks up a bigger problem into
smaller, relevant parts
Eg, imagine throughput
between the UCSB 360 and
the UTAH PDP10 was slow…
ARPA Network 1969
Workload Analysis
•  Begin with application metrics & context
•  A drill-down methodology
•  Pros:
–  Proportional,
accurate metrics
–  App context
•  Cons:
–  App specific
–  Difficult to dig from
app to resource
(diagram: Application → System Libraries → System Calls → Kernel → Hardware; workload analysis drills down from the application)
Workload Characterization
•  Check the workload: who, why, what, how
–  not resulting performance
•  Eg, for CPUs:
1.  Who: which PIDs, programs, users
2.  Why: code paths, context
3.  What: CPU instructions, cycles
4.  How: changing over time
(diagram: target workload)
Workload Characterization: CPUs
(quadrants: Who → top; Why → CPU sample flame graphs; What → PMCs; How → monitoring)
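A minimal Linux sketch of answering who/why/what/how for CPUs (tool choices and durations are examples, not the only options):
top -b -n 1 | head -20                 # Who: which PIDs, programs, users are on-CPU
perf record -F 99 -a -g -- sleep 30    # Why: sample code paths (render as a flame graph)
perf stat -a -- sleep 10               # What: instructions, cycles, IPC
sar -u 1 60                            # How: CPU usage changing over time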
  
Resource Analysis
•  Typical approach for system performance analysis:
begin with system tools & metrics
•  Pros:
–  Generic
–  Aids resource
perf tuning
•  Cons:
–  Uneven coverage
–  False positives
(diagram: Application → System Libraries → System Calls → Kernel → Hardware; resource analysis begins with the system, beneath the application)
The USE Method
•  For every resource, check:
1.  Utilization: busy time
2.  Saturation: queue length or time
3.  Errors: easy to interpret (objective)
Starts with the questions, then finds the tools
Eg, for hardware, check every resource incl. busses:
http://www.brendangregg.com/USEmethod/use-rosetta.html
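As a minimal sketch on Linux with sysstat installed (column names vary by version; the rosetta page above maps many more resources and OSes):
mpstat -P ALL 1      # CPU utilization, per CPU
vmstat 1             # CPU saturation: run-queue length ("r" greater than CPU count)
sar -r 1             # memory utilization
sar -B 1             # memory saturation: page scanning activity
iostat -xz 1         # disk utilization (%util) and saturation (queue length)
sar -n DEV 1         # network interface throughput vs line rate
dmesg | tail         # errors for many resources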
Apollo Guidance Computer
(diagram: ERASABLE MEMORY, CORE SET AREA, VAC SETS, FIXED MEMORY)
USE Method: Software
•  USE method can also work for software resources
–  kernel or app internals, cloud environments
–  small scale (eg, locks) to large scale (apps). Eg:
•  Mutex locks:
–  utilization → lock hold time
–  saturation → lock contention
–  errors → any errors
•  Entire application:
–  utilization → percentage of worker threads busy
–  saturation → length of queued work
–  errors → request errors
(diagram: resource utilization (%))
RED Method
•  For every service, check that:
1.  Request rate
2.  Error rate
3.  Duration (distribution)
are within SLO/A
Another exercise in posing questions
from functional diagrams
By Tom Wilkie: http://www.slideshare.net/weaveworks/monitoring-microservices
(diagram: Load Balancer → Web Proxy → Web Server → User Database, Payments Server, Asset Server, Metrics Database)
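As one way to pose the RED questions without a metrics system, a minimal awk sketch over a web access log; it assumes the HTTP status is field 9 and a request duration in seconds is the last field (adjust for your log format), with the SLO comparison left to the reader:
awk '{
  reqs++                                 # request count (divide by interval for rate)
  if ($9 >= 500) errs++                  # error rate numerator
  sum += $NF; if ($NF > max) max = $NF   # duration: mean and max as a crude distribution
} END {
  printf "requests: %d  errors: %.2f%%  mean: %.3fs  max: %.3fs\n",
         reqs, 100*errs/reqs, sum/reqs, max
}' access.log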
  
Thread State Analysis
Identify & quantify
time in states
Narrows further
analysis to state
Thread states are
applicable to all apps
State transition diagram
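As a minimal sketch on Linux, current thread states can be sampled with ps (these are instantaneous states; quantifying time in each state needs tracing or /proc accounting):
ps -eLo pid,tid,stat,wchan:20,comm | head -15
# stat: R = runnable/on-CPU, S = sleeping, D = uninterruptible (often disk), T = stopped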
TSA: eg, Solaris
TSA: eg, RSTS/E
RSTS: DEC OS
from the 1970's
TENEX (1969-72)
also had Control-T
for job states
TSA: eg, OS X
Instruments: Thread States
On-CPU Analysis
1.  Split into user/kernel states
–  /proc, vmstat(1)
2.  Check CPU balance
–  mpstat(1), CPU utilization heat map
3.  Profile software
–  User & kernel stack sampling (as a CPU flame graph)
4.  Profile cycles, caches, busses
–  PMCs, CPI flame graph
(visualization: CPU utilization heat map)
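A minimal Linux sketch of the four steps above (tools and durations are examples):
vmstat 1 5                             # 1. user vs system vs idle CPU time (us/sy/id)
mpstat -P ALL 1 5                      # 2. per-CPU balance: hot or idle CPUs
perf record -F 99 -a -g -- sleep 30    # 3. sample user & kernel stacks at 99 Hz
perf report --stdio | head -40         #    (or render the profile as a CPU flame graph)
perf stat -a -d -- sleep 10            # 4. cycles, instructions, cache misses via PMCs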
  
CPU Flame Graph Analysis
1.  Take a CPU profile
2.  Render it as a flame graph
3.  Understand all software that is in >1% of samples
Discovers issues by their CPU usage
-  Directly: CPU consumers
-  Indirectly: initialization
of I/O, locks, times, ...
Narrows target of study
to only running code
-  See: "The Flame Graph",
CACM, June 2016
(visualization: flame graph)
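A minimal sketch of steps 1 and 2 on Linux, assuming the FlameGraph scripts (github.com/brendangregg/FlameGraph) are cloned into ./FlameGraph:
perf record -F 99 -a -g -- sleep 30                       # 1. take a CPU profile
perf script > out.perf
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded  # 2. fold the stacks
./FlameGraph/flamegraph.pl out.folded > cpu.svg           #    render the flame graph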
  
Java Mixed-Mode CPU Flame Graph
(flame graph regions: Java, JVM, GC, Kernel)
•  eg, Linux perf_events, with:
•  Java -XX:+PreserveFramePointer
•  Java perf-map-agent
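A minimal sketch of that recipe; the jar name is a stand-in, and perf-map-agent's install path and helper script name may differ by version:
java -XX:+PreserveFramePointer -jar app.jar &                    # keep frame pointers for perf
perf record -F 99 -a -g -- sleep 30                              # sample Java, JVM, and kernel stacks
~/perf-map-agent/bin/create-java-perf-map.sh $(pgrep -n java)    # write /tmp/perf-<pid>.map for JIT symbols
perf script > out.perf                                           # then fold and render as a flame graph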
CPI Flame Graph
•  Profile cycle stack traces and instructions or stalls separately
•  Generate CPU flame graph (cycles) and color using other profile
•  eg, FreeBSD: pmcstat
(coloring: red == instructions, blue == stalls)
Off-CPU Analysis
Analyze off-CPU time
via blocking code path:
Off-CPU flame graph
Often need wakeup
code paths as well…
Off-CPU Time Flame Graph
Trace blocking events with kernel stacks & time blocked (eg, using Linux BPF)
(flame graph annotations: file read from disk, directory read from disk, pipe write, path read from disk, fstat from disk; x-axis: off-CPU time, y-axis: stack depth)
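A minimal sketch using the bcc offcputime tool (its install path varies by distro; assumes the FlameGraph scripts in ./FlameGraph):
/usr/share/bcc/tools/offcputime -df 30 > out.offcpu       # folded stacks with microseconds blocked
./FlameGraph/flamegraph.pl --color=io --countname=us \
    --title="Off-CPU Time Flame Graph" < out.offcpu > offcpu.svg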
Wakeup Time Flame Graph
… can also associate wake-up stacks with off-CPU stacks
(eg, Linux 4.6: samples/bpf/offwaketime*)
Who did the wakeup:
Chain Graphs
Associate more than one waker: the full chain of wakeups
With enough stacks, all paths lead to metal
An approach for analyzing all off-CPU issues
Latency Correlations
1.  Measure latency
histograms at different
stack layers
2.  Compare histograms
to find latency origin
Even better, use latency
heat maps
•  Match outliers based on
both latency and time
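A minimal sketch of comparing two layers with bcc tools (paths vary by distro); if the outliers appear in the lower layer's histogram too, the latency originates at or below it:
/usr/share/bcc/tools/funclatency -u vfs_read     # latency histogram at the VFS layer (us)
/usr/share/bcc/tools/biolatency -m 10 1          # latency histogram at the block I/O layer (ms)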
Checklists: eg, Linux Perf Analysis in 60s
1.  uptime                → load averages
2.  dmesg | tail          → kernel errors
3.  vmstat 1              → overall stats by time
4.  mpstat -P ALL 1       → CPU balance
5.  pidstat 1             → process usage
6.  iostat -xz 1          → disk I/O
7.  free -m               → memory usage
8.  sar -n DEV 1          → network I/O
9.  sar -n TCP,ETCP 1     → TCP stats
10. top                   → check overview
http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
Checklists: eg, Netflix perfvitals Dashboard
(dashboard panels: 1. RPS & CPU, 2. Volume, 3. Instances, 4. Scaling, 5. CPU/RPS, 6. Load Avg, 7. Java Heap, 8. ParNew, 9. Latency, 10. 99th percentile)
Static Performance Tuning: eg, Linux
Tools-Based Method
1.  Try all the tools! May be an anti-pattern. Eg, OS X:
Other Methodologies
•  Scientific method
•  5 Why's
•  Process of elimination
•  Intel's Top-Down Methodology
•  Method R
What You Can Do

What you can do
1.  Know what's now possible on modern systems
–  Dynamic tracing: efficiently instrument any software
–  CPU facilities: PMCs, MSRs (model specific registers)
–  Visualizations: flame graphs, latency heat maps, …
2.  Ask questions first: use methodologies to ask them
3.  Then find/build the metrics
4.  Build or buy dashboards to support methodologies
Dynamic Tracing: Efficient Metrics
Old way: packet capture
(diagram: send/receive → kernel → tcpdump reads and dumps packets via a file system buffer to disk; an analyzer then reads, processes, and prints them)
New way: dynamic tracing
(diagram: a tracer configures a probe on tcp_retransmit_skb() in the kernel, then reads just those events)
Eg, tracing TCP retransmits
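A minimal sketch of the new way, using the bcc tcpretrans tool, which instruments tcp_retransmit_skb() (install path varies by distro):
/usr/share/bcc/tools/tcpretrans
# prints one line per retransmit: time, PID, local/remote address:port, TCP state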
  
Dynamic Tracing: Measure Anything
Those are Solaris/DTrace tools. Now becoming possible on all OSes:
FreeBSD & OS X DTrace, Linux BPF, Windows ETW
Performance Monitoring Counters
Eg, FreeBSD PMC groups for Intel Sandy Bridge:
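On Linux, a minimal sketch of reading a few PMCs with perf; these are generic event aliases, and the raw counters they map to differ per CPU model:
perf stat -e cycles,instructions,branch-misses,cache-misses -a -- sleep 10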
Visualizations
Eg, Disk I/O latency as a heat map, quantized in kernel:
USE Method: eg, Netflix Vector
(dashboard: utilization and saturation charts per resource: CPU, Network, Memory, Disk, plus load where applicable)
USE Method: To Do
Showing what is and is not commonly measured
(matrix: a U / S / E cell for each resource)
CPU Workload Characterization: To Do
(quadrants: Who → top, htop; Why → perf record -g, flame graphs; What → perf stat -a -d; How → monitoring)
Showing what is and is not commonly measured
Summary
•  It is the crystal ball age of performance observability
•  What matters is the questions you want answered
•  Methodologies are a great way to pose questions
References & Resources
•  USE Method
–  http://queue.acm.org/detail.cfm?id=2413037
–  http://www.brendangregg.com/usemethod.html
•  TSA Method
–  http://www.brendangregg.com/tsamethod.html
•  Off-CPU Analysis
–  http://www.brendangregg.com/offcpuanalysis.html
–  http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
–  http://www.brendangregg.com/blog/2016-02-05/ebpf-chaingraph-prototype.html
•  Static Performance Tuning, Richard Elling, Sun blueprint, May 2000
•  RED Method: http://www.slideshare.net/weaveworks/monitoring-microservices
•  Other system methodologies
–  Systems Performance: Enterprise and the Cloud, Prentice Hall 2013
–  http://www.brendangregg.com/methodology.html
–  The Art of Computer Systems Performance Analysis, Jain, R., 1991
•  Flame Graphs
–  http://queue.acm.org/detail.cfm?id=2927301
–  http://www.brendangregg.com/flamegraphs.html
–  http://techblog.netflix.com/2015/07/java-in-flames.html
•  Latency Heat Maps
–  http://queue.acm.org/detail.cfm?id=1809426
–  http://www.brendangregg.com/HeatMaps/latency.html
•  ARPA Network: http://www.computerhistory.org/internethistory/1960s
•  RSTS/E System User's Guide, 1985, page 4-5
•  DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD, Prentice Hall 2011
•  Apollo: http://www.hq.nasa.gov/office/pao/History/alsj/a11 http://www.hq.nasa.gov/alsj/alsj-LMdocs.html
•  Questions?
•  http://slideshare.net/brendangregg
•  http://www.brendangregg.com
•  bgregg@netflix.com
•  @brendangregg
Jun 2016, ACM Applicative 2016
