SlideShare una empresa de Scribd logo
1 de 67
Devel::NYTProf
Perl Source Code Profiler


   Tim Bunce - October 2009
Devel::DProf Is Broken
$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 };
 s$_({});n" for 1..1000' > x.pl
Devel::DProf Is Broken
$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 };
 s$_({});n" for 1..1000' > x.pl

$ perl -d:DProf x.pl
Devel::DProf Is Broken
$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 };
 s$_({});n" for 1..1000' > x.pl

$ perl -d:DProf x.pl

$ dprofpp -r
   Total Elapsed Time =    0.108 Seconds
            Real Time =    0.108 Seconds
   Exclusive Times
   %Time ExclSec CumulS #Calls sec/call Csec/c   Name
    9.26   0.010 0.010       1   0.0100 0.0100   main::s76
    9.26   0.010 0.010       1   0.0100 0.0100   main::s323
    9.26   0.010 0.010       1   0.0100 0.0100   main::s626
    9.26   0.010 0.010       1   0.0100 0.0100   main::s936
    0.00       - -0.000      1        -      -   main::s77
    0.00       - -0.000      1        -      -   main::s82
What To Measure?

              CPU Time   Real Time


Subroutines
                 ?          ?
Statements
                 ?          ?
CPU Time vs Real Time
• CPU time
 - Not (much) affected by other load on system
 - Doesn’t include time spent waiting for i/o etc.


• Real time
 - Is affected by other load on system
 - Includes time spent waiting
Sub vs Line
• Subroutine Profiling
 - Measures time between subroutine entry and exit
 - That’s the Inclusive time. Exclusive by subtraction.
 - Reasonably fast, reasonably small data files


• Problems
 - Can be confused by funky control flow
 - No insight into where time spent within large subs
 - Doesn’t measure code outside of a sub
Sub vs Line
• Line/Statement profiling
 - Measure time from start of one statement to next
 - Exclusive time (except includes built-ins & xsubs)
 - Fine grained detail
• Problems
 - Very expensive in CPU & I/O
 - Assigns too much time to some statements
 - Too much detail for large subs (want time per sub)
 - Hard to get overall subroutine times
Devel::NYTProf
v1 Innovations

• Fork by Adam Kaplan of Devel::FastProf
  - working at the New York Times
• HTML report borrowed from Devel::Cover
• More accurate: Discounts profiler overhead
  including cost of writing to the file
• Test suite!
v2 Innovations


• Profiles time per block!
  - Statement times can be aggregated
    to enclosing block
    and enclosing sub
v2 Innovations

• Dual Profilers!
 - Is a statement profiler
 - and a subroutine profiler
 - At the same time
v2 Innovations
• Subroutine profiler
 -   tracks time per calling location
 -   even for xsubs
 -   calculates exclusive time on-the-fly
 -   discounts overhead of statement profiler
 -   immune from funky control flow
 -   in memory, writes to file at end
 -   extremely fast
v2 Other Features
•   Profiles compile-time activity
•   Profiling can be enabled & disabled on the fly
•   Handles forks with no overhead
•   Correct timing for mod_perl
•   Sub-microsecond resolution
•   Multiple clocks, including high-res CPU time
•   Can snapshot source code & evals into profile
•   Built-in zip compression
Profiling Performance
              Time     Size
Perl           x1       -
DProf         x 4.9    60,736KB
SmallProf     x 22.0    -
FastProf      x 6.3    42,927KB
NYTProf       x 3.9    11,174KB
 + blocks=0   x 3.5     9,628KB
 + stmts=0    x 2.5*        205KB
v3 Features

•   Profiles slow opcodes: system calls, regexps, ...
•   Subroutine caller name noted, for call-graph
•   Handles goto ⊂ e.g. AUTOLOAD
•   HTML report includes interactive TreeMaps
•   Outputs call-graph in Graphviz dot format
Reporting: KCachegrind
• KCachegrind call graph - new and cool
   - contributed by C. L. Kao.
   - requires KCachegrind

  $ nytprofcg   # generates nytprof.callgraph
  $ kcachegrind # load the file via the gui
KCachegrind
Reporting: HTML
• HTML report
   - page per source file, annotated with times and links
   - subroutine index table with sortable columns
   - interactive Treemaps of subroutine times
   - generates Graphviz dot file of call graph

$ nytprofhtml # writes HTML report in ./nytprof/...
$ nytprofhtml --file=/tmp/nytprof.out.793 --open
Summary




                             Links to annotated
                                source code




Link to sortable table
      of all subs
                          Timings for perl builtins
Overall time spent in and below this sub

                                            (in + below)




       Color coding based on
     Median Average Deviation
     relative to rest of this file          Timings for each location calling into,
                                                 or out of, the subroutine
Treemap showing relative
                 proportions of exclusive time




                                  Boxes represent subroutines
                                   Colors only used to show
                                packages (and aren’t pretty yet)




Hover over box to see details
                                          Click to drill-down one level
                                              in package hierarchy
Calls between packages
Calls within a package
Let’s take a look...
Optimizing
  Hints & Tips
Phase 0
Before you start
DONʼT
DO IT!
“The First Rule of Program Optimization:
Don't do it.

The Second Rule of Program Optimization
(for experts only!): Don't do it yet.”

- Michael A. Jackson
Why not?
“More computing sins are committed in the
name of efficiency (without necessarily
achieving it) than for any other single
reason - including blind stupidity.”

- W.A. Wulf
“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.
Yet we should not pass up our
opportunities in that critical 3%.”

- Donald Knuth
“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.
Yet we should not pass up our
opportunities in that critical 3%.”
- Donald Knuth
How?
“Bottlenecks occur in surprising places, so
don't try to second guess and put in a speed
hack until you have proven that's where the
bottleneck is.”

- Rob Pike
“Measure twice, cut once.”

- Old Proverb
Phase 1
Low Hanging Fruit
Low Hanging Fruit
1.   Profile code running representative workload.
2.   Look at Exclusive Time of subroutines.
3.   Do they look reasonable?
4.   Examine worst offenders.
5.   Fix only simple local problems.
6.   Profile again.
7.   Fast enough? Then STOP!
8.   Rinse and repeat once or twice, then move on.
“Simple Local Fixes”


 Changes unlikely to introduce bugs
Move invariant
 expressions
 out of loops
Avoid->repeated->chains
  ->of->accessors(...);
Avoid->repeated->chains
  ->of->accessors(...);

    Use a temporary variable
Use faster accessors


   Class::Accessor
   -> Class::Accessor::Fast
   --> Class::Accessor::Faster
   ---> Class::XSAccessor
Avoid calling subs that
 don’t do anything!
  my $unsed_variable = $self->get_foo;


  my $is_logging = $log->info(...);
  while (...) {
      $log->info(...) if $is_logging;
      ...
  }
Exit subs and loops early
  Delay initializations
  return if not ...a cheap test...;
  return if not ...a more expensive test...;
  my $foo = ...initializations...;
  ...body of subroutine...
Fix silly code

-   return exists $nav_type{$country}{$key}
-               ? $nav_type{$country}{$key}
-               : undef;
+   return $nav_type{$country}{$key};
Beware pathological
regular expressions

      NYTPROF=slowops=2
Avoid unpacking args
  in very hot subs
   sub foo { shift->delegate(@_) }

   sub bar {
       return shift->{bar} unless @_;
       return $_[0]->{bar} = $_[1];
   }
Retest.

Fast enough?

        STOP!
Put the profiler down and walk away
Phase 2
Deeper Changes
Profile with a
    known workload


E.g., 1000 identical requests
Check Inclusive Times
 (especially top-level subs)


Reasonable percentage
  for the workload?
Check subroutine
   call counts

    Reasonable
for the workload?
Add caching
    if appropriate
   to reduce calls


Remember invalidation
Walk up call chain
 to find good spots
     for caching


Remember invalidation
Creating many objects
 that don’t get used?


 Lightweight proxies
 e.g. DateTimeX::Lite
Retest.

Fast enough?

        STOP!
Put the profiler down and walk away
Phase 3
Structural Changes
Push loops down


-   $object->walk($_) for @dogs;

+   $object->walk_these(@dogs);
Change the data
   structure

hashes <–> arrays
Change the algorithm

What’s the “Big O”?
O(n 2) or O(logn) or ...
Rewrite hot-spots in C

     Inline::C
It all adds up!

“I achieved my fast times by
multitudes of 1% reductions”

       - Bill Raymond
Questions?

Tim.Bunce@pobox.com
 @timbunce on twitter
http://blog.timbunce.org

Más contenido relacionado

La actualidad más candente

Lab3 advanced port scanning 30 oct 21
Lab3 advanced port scanning 30 oct 21Lab3 advanced port scanning 30 oct 21
Lab3 advanced port scanning 30 oct 21Hussain111321
 
Improving the ZFS Userland-Kernel API with Channel Programs - BSDCAN 2017 - M...
Improving the ZFS Userland-Kernel API with Channel Programs - BSDCAN 2017 - M...Improving the ZFS Userland-Kernel API with Channel Programs - BSDCAN 2017 - M...
Improving the ZFS Userland-Kernel API with Channel Programs - BSDCAN 2017 - M...Matthew Ahrens
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time ComputationSonal Raj
 
Oaktable World 2014 Toon Koppelaars: database constraints polite excuse
Oaktable World 2014 Toon Koppelaars: database constraints polite excuseOaktable World 2014 Toon Koppelaars: database constraints polite excuse
Oaktable World 2014 Toon Koppelaars: database constraints polite excuseKyle Hailey
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsJavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsBrendan Gregg
 
What to do when detect deadlock
What to do when detect deadlockWhat to do when detect deadlock
What to do when detect deadlockSyed Zaid Irshad
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesMichael Klishin
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to RootsBrendan Gregg
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 
Debugging node in prod
Debugging node in prodDebugging node in prod
Debugging node in prodYunong Xiao
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaBrendan Gregg
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015lpgauth
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFBrendan Gregg
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Developing High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & GoDeveloping High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & GoChris Stivers
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernellcplcp1
 

La actualidad más candente (20)

Lab3 advanced port scanning 30 oct 21
Lab3 advanced port scanning 30 oct 21Lab3 advanced port scanning 30 oct 21
Lab3 advanced port scanning 30 oct 21
 
Improving the ZFS Userland-Kernel API with Channel Programs - BSDCAN 2017 - M...
Improving the ZFS Userland-Kernel API with Channel Programs - BSDCAN 2017 - M...Improving the ZFS Userland-Kernel API with Channel Programs - BSDCAN 2017 - M...
Improving the ZFS Userland-Kernel API with Channel Programs - BSDCAN 2017 - M...
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
Oaktable World 2014 Toon Koppelaars: database constraints polite excuse
Oaktable World 2014 Toon Koppelaars: database constraints polite excuseOaktable World 2014 Toon Koppelaars: database constraints polite excuse
Oaktable World 2014 Toon Koppelaars: database constraints polite excuse
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame GraphsJavaOne 2015 Java Mixed-Mode Flame Graphs
JavaOne 2015 Java Mixed-Mode Flame Graphs
 
What to do when detect deadlock
What to do when detect deadlockWhat to do when detect deadlock
What to do when detect deadlock
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issues
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
Debugging node in prod
Debugging node in prodDebugging node in prod
Debugging node in prod
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versa
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015Debugging Complex Systems - Erlang Factory SF 2015
Debugging Complex Systems - Erlang Factory SF 2015
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
 
Storm
StormStorm
Storm
 
Storm Anatomy
Storm AnatomyStorm Anatomy
Storm Anatomy
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Developing High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & GoDeveloping High Performance Application with Aerospike & Go
Developing High Performance Application with Aerospike & Go
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernel
 

Destacado

Dash Profiler 200910
Dash Profiler 200910Dash Profiler 200910
Dash Profiler 200910HighLoad2009
 
м.токовинин компромиссная производительность
м.токовинин   компромиссная производительностьм.токовинин   компромиссная производительность
м.токовинин компромиссная производительностьHighLoad2009
 
Smirnov Twisted Python
Smirnov Twisted PythonSmirnov Twisted Python
Smirnov Twisted PythonHighLoad2009
 
Dz Java Hi Load 0.4
Dz Java Hi Load 0.4Dz Java Hi Load 0.4
Dz Java Hi Load 0.4HighLoad2009
 
Eremkin Cboss Smsc Hl2009
Eremkin Cboss Smsc Hl2009Eremkin Cboss Smsc Hl2009
Eremkin Cboss Smsc Hl2009HighLoad2009
 

Destacado (9)

Dash Profiler 200910
Dash Profiler 200910Dash Profiler 200910
Dash Profiler 200910
 
м.токовинин компромиссная производительность
м.токовинин   компромиссная производительностьм.токовинин   компромиссная производительность
м.токовинин компромиссная производительность
 
Highload 2009
Highload 2009Highload 2009
Highload 2009
 
бегун
бегунбегун
бегун
 
Smirnov Twisted Python
Smirnov Twisted PythonSmirnov Twisted Python
Smirnov Twisted Python
 
Dz Java Hi Load 0.4
Dz Java Hi Load 0.4Dz Java Hi Load 0.4
Dz Java Hi Load 0.4
 
Php Daemon
Php DaemonPhp Daemon
Php Daemon
 
Silverspoon2
Silverspoon2Silverspoon2
Silverspoon2
 
Eremkin Cboss Smsc Hl2009
Eremkin Cboss Smsc Hl2009Eremkin Cboss Smsc Hl2009
Eremkin Cboss Smsc Hl2009
 

Similar a Nyt Prof 200910

Devel::NYTProf v5 at YAPC::NA 201406
Devel::NYTProf v5 at YAPC::NA 201406Devel::NYTProf v5 at YAPC::NA 201406
Devel::NYTProf v5 at YAPC::NA 201406Tim Bunce
 
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)Tim Bunce
 
Perl at SkyCon'12
Perl at SkyCon'12Perl at SkyCon'12
Perl at SkyCon'12Tim Bunce
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseAltinity Ltd
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachAlexandre Rafalovitch
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Lucidworks
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationMongoDB
 
Testing Wi-Fi with OSS Tools
Testing Wi-Fi with OSS ToolsTesting Wi-Fi with OSS Tools
Testing Wi-Fi with OSS ToolsAll Things Open
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...confluent
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13Dave Gardner
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Spark Summit
 
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...srisatish ambati
 
Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudRevolution Analytics
 
Groovy concurrency
Groovy concurrencyGroovy concurrency
Groovy concurrencyAlex Miller
 
Building source code level profiler for C++.pdf
Building source code level profiler for C++.pdfBuilding source code level profiler for C++.pdf
Building source code level profiler for C++.pdfssuser28de9e
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 

Similar a Nyt Prof 200910 (20)

Devel::NYTProf v5 at YAPC::NA 201406
Devel::NYTProf v5 at YAPC::NA 201406Devel::NYTProf v5 at YAPC::NA 201406
Devel::NYTProf v5 at YAPC::NA 201406
 
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)
 
Perl at SkyCon'12
Perl at SkyCon'12Perl at SkyCon'12
Perl at SkyCon'12
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
Webinar: Performance Tuning + Optimization
Webinar: Performance Tuning + OptimizationWebinar: Performance Tuning + Optimization
Webinar: Performance Tuning + Optimization
 
Testing Wi-Fi with OSS Tools
Testing Wi-Fi with OSS ToolsTesting Wi-Fi with OSS Tools
Testing Wi-Fi with OSS Tools
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
 
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
 
Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Groovy concurrency
Groovy concurrencyGroovy concurrency
Groovy concurrency
 
Building source code level profiler for C++.pdf
Building source code level profiler for C++.pdfBuilding source code level profiler for C++.pdf
Building source code level profiler for C++.pdf
 
04 performance
04 performance04 performance
04 performance
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Ansible - A 'crowd' introduction
Ansible - A 'crowd' introductionAnsible - A 'crowd' introduction
Ansible - A 'crowd' introduction
 

Más de HighLoad2009

Hl++2009 Ayakovlev Pochta
Hl++2009 Ayakovlev PochtaHl++2009 Ayakovlev Pochta
Hl++2009 Ayakovlev PochtaHighLoad2009
 
архитектура новой почты рамблера
архитектура новой почты рамблераархитектура новой почты рамблера
архитектура новой почты рамблераHighLoad2009
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf TuningHighLoad2009
 
особенности использования Times Ten In Memory Database в высоконагруженной среде
особенности использования Times Ten In Memory Database в высоконагруженной средеособенности использования Times Ten In Memory Database в высоконагруженной среде
особенности использования Times Ten In Memory Database в высоконагруженной средеHighLoad2009
 
High Load 2009 Dimaa Rus Ready
High Load 2009 Dimaa Rus ReadyHigh Load 2009 Dimaa Rus Ready
High Load 2009 Dimaa Rus ReadyHighLoad2009
 
High Load 2009 Dimaa Rus Ready 16 9
High Load 2009 Dimaa Rus Ready 16 9High Load 2009 Dimaa Rus Ready 16 9
High Load 2009 Dimaa Rus Ready 16 9HighLoad2009
 
температура мира
температура миратемпература мира
температура мираHighLoad2009
 

Más de HighLoad2009 (20)

Krizhanovsky Vm
Krizhanovsky VmKrizhanovsky Vm
Krizhanovsky Vm
 
Ddos
DdosDdos
Ddos
 
Kosmodemiansky
KosmodemianskyKosmodemiansky
Kosmodemiansky
 
Scalaxy
ScalaxyScalaxy
Scalaxy
 
Hl++2009 Ayakovlev Pochta
Hl++2009 Ayakovlev PochtaHl++2009 Ayakovlev Pochta
Hl++2009 Ayakovlev Pochta
 
Why02
Why02Why02
Why02
 
архитектура новой почты рамблера
архитектура новой почты рамблераархитектура новой почты рамблера
архитектура новой почты рамблера
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
 
Take2
Take2Take2
Take2
 
Hl2009 1c Bitrix
Hl2009 1c BitrixHl2009 1c Bitrix
Hl2009 1c Bitrix
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf Tuning
 
Hl2009 Pr V2
Hl2009 Pr V2Hl2009 Pr V2
Hl2009 Pr V2
 
Highload2009
Highload2009Highload2009
Highload2009
 
особенности использования Times Ten In Memory Database в высоконагруженной среде
особенности использования Times Ten In Memory Database в высоконагруженной средеособенности использования Times Ten In Memory Database в высоконагруженной среде
особенности использования Times Ten In Memory Database в высоконагруженной среде
 
Hl Nekoval
Hl NekovalHl Nekoval
Hl Nekoval
 
High Load 2009 Dimaa Rus Ready
High Load 2009 Dimaa Rus ReadyHigh Load 2009 Dimaa Rus Ready
High Load 2009 Dimaa Rus Ready
 
High Load 2009 Dimaa Rus Ready 16 9
High Load 2009 Dimaa Rus Ready 16 9High Load 2009 Dimaa Rus Ready 16 9
High Load 2009 Dimaa Rus Ready 16 9
 
Pl High Load V1.1
Pl High Load V1.1Pl High Load V1.1
Pl High Load V1.1
 
температура мира
температура миратемпература мира
температура мира
 
бегун
бегунбегун
бегун
 

Nyt Prof 200910

  • 1. Devel::NYTProf Perl Source Code Profiler Tim Bunce - October 2009
  • 2. Devel::DProf Is Broken $ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});n" for 1..1000' > x.pl
  • 3. Devel::DProf Is Broken $ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});n" for 1..1000' > x.pl $ perl -d:DProf x.pl
  • 4. Devel::DProf Is Broken $ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});n" for 1..1000' > x.pl $ perl -d:DProf x.pl $ dprofpp -r Total Elapsed Time = 0.108 Seconds Real Time = 0.108 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 9.26 0.010 0.010 1 0.0100 0.0100 main::s76 9.26 0.010 0.010 1 0.0100 0.0100 main::s323 9.26 0.010 0.010 1 0.0100 0.0100 main::s626 9.26 0.010 0.010 1 0.0100 0.0100 main::s936 0.00 - -0.000 1 - - main::s77 0.00 - -0.000 1 - - main::s82
  • 5. What To Measure? CPU Time Real Time Subroutines ? ? Statements ? ?
  • 6. CPU Time vs Real Time • CPU time - Not (much) affected by other load on system - Doesn’t include time spent waiting for i/o etc. • Real time - Is affected by other load on system - Includes time spent waiting
  • 7. Sub vs Line • Subroutine Profiling - Measures time between subroutine entry and exit - That’s the Inclusive time. Exclusive by subtraction. - Reasonably fast, reasonably small data files • Problems - Can be confused by funky control flow - No insight into where time spent within large subs - Doesn’t measure code outside of a sub
  • 8. Sub vs Line • Line/Statement profiling - Measure time from start of one statement to next - Exclusive time (except includes built-ins & xsubs) - Fine grained detail • Problems - Very expensive in CPU & I/O - Assigns too much time to some statements - Too much detail for large subs (want time per sub) - Hard to get overall subroutine times
  • 10. v1 Innovations • Fork by Adam Kaplan of Devel::FastProf - working at the New York Times • HTML report borrowed from Devel::Cover • More accurate: Discounts profiler overhead including cost of writing to the file • Test suite!
  • 11. v2 Innovations • Profiles time per block! - Statement times can be aggregated to enclosing block and enclosing sub
  • 12. v2 Innovations • Dual Profilers! - Is a statement profiler - and a subroutine profiler - At the same time
  • 13. v2 Innovations • Subroutine profiler - tracks time per calling location - even for xsubs - calculates exclusive time on-the-fly - discounts overhead of statement profiler - immune from funky control flow - in memory, writes to file at end - extremely fast
  • 14. v2 Other Features • Profiles compile-time activity • Profiling can be enabled & disabled on the fly • Handles forks with no overhead • Correct timing for mod_perl • Sub-microsecond resolution • Multiple clocks, including high-res CPU time • Can snapshot source code & evals into profile • Built-in zip compression
  • 15. Profiling Performance Time Size Perl x1 - DProf x 4.9 60,736KB SmallProf x 22.0 - FastProf x 6.3 42,927KB NYTProf x 3.9 11,174KB + blocks=0 x 3.5 9,628KB + stmts=0 x 2.5* 205KB
  • 16. v3 Features • Profiles slow opcodes: system calls, regexps, ... • Subroutine caller name noted, for call-graph • Handles goto &sub; e.g. AUTOLOAD • HTML report includes interactive TreeMaps • Outputs call-graph in Graphviz dot format
  • 17. Reporting: KCachegrind • KCachegrind call graph - new and cool - contributed by C. L. Kao. - requires KCachegrind $ nytprofcg # generates nytprof.callgraph $ kcachegrind # load the file via the gui
  • 19. Reporting: HTML • HTML report - page per source file, annotated with times and links - subroutine index table with sortable columns - interactive Treemaps of subroutine times - generates Graphviz dot file of call graph $ nytprofhtml # writes HTML report in ./nytprof/... $ nytprofhtml --file=/tmp/nytprof.out.793 --open
  • 20.
  • 21. Summary Links to annotated source code Link to sortable table of all subs Timings for perl builtins
  • 22.
  • 23. Overall time spent in and below this sub (in + below) Color coding based on Median Average Deviation relative to rest of this file Timings for each location calling into, or out of, the subroutine
  • 24.
  • 25. Treemap showing relative proportions of exclusive time Boxes represent subroutines Colors only used to show packages (and aren’t pretty yet) Hover over box to see details Click to drill-down one level in package hierarchy
  • 27. Calls within a package
  • 28. Let’s take a look...
  • 31.
  • 33. “The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson
  • 35. “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.” - W.A. Wulf
  • 36. “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth
  • 37. “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth
  • 38. How?
  • 39. “Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.” - Rob Pike
  • 40. “Measure twice, cut once.” - Old Proverb
  • 42. Low Hanging Fruit 1. Profile code running representative workload. 2. Look at Exclusive Time of subroutines. 3. Do they look reasonable? 4. Examine worst offenders. 5. Fix only simple local problems. 6. Profile again. 7. Fast enough? Then STOP! 8. Rinse and repeat once or twice, then move on.
  • 43. “Simple Local Fixes” Changes unlikely to introduce bugs
  • 45. Avoid->repeated->chains ->of->accessors(...); Avoid->repeated->chains ->of->accessors(...); Use a temporary variable
  • 46. Use faster accessors Class::Accessor -> Class::Accessor::Fast --> Class::Accessor::Faster ---> Class::XSAccessor
  • 47. Avoid calling subs that don’t do anything! my $unsed_variable = $self->get_foo; my $is_logging = $log->info(...); while (...) { $log->info(...) if $is_logging; ... }
  • 48. Exit subs and loops early Delay initializations return if not ...a cheap test...; return if not ...a more expensive test...; my $foo = ...initializations...; ...body of subroutine...
  • 49. Fix silly code - return exists $nav_type{$country}{$key} - ? $nav_type{$country}{$key} - : undef; + return $nav_type{$country}{$key};
  • 51. Avoid unpacking args in very hot subs sub foo { shift->delegate(@_) } sub bar { return shift->{bar} unless @_; return $_[0]->{bar} = $_[1]; }
  • 52. Retest. Fast enough? STOP! Put the profiler down and walk away
  • 54. Profile with a known workload E.g., 1000 identical requests
  • 55. Check Inclusive Times (especially top-level subs) Reasonable percentage for the workload?
  • 56. Check subroutine call counts Reasonable for the workload?
  • 57. Add caching if appropriate to reduce calls Remember invalidation
  • 58. Walk up call chain to find good spots for caching Remember invalidation
  • 59. Creating many objects that don’t get used? Lightweight proxies e.g. DateTimeX::Lite
  • 60. Retest. Fast enough? STOP! Put the profiler down and walk away
  • 62. Push loops down - $object->walk($_) for @dogs; + $object->walk_these(@dogs);
  • 63. Change the data structure hashes <–> arrays
  • 64. Change the algorithm What’s the “Big O”? O(n 2) or O(logn) or ...
  • 65. Rewrite hot-spots in C Inline::C
  • 66. It all adds up! “I achieved my fast times by multitudes of 1% reductions” - Bill Raymond
  • 67. Questions? Tim.Bunce@pobox.com @timbunce on twitter http://blog.timbunce.org

Notas del editor

  1. CPU: only useful if high resolution Real: Most useful most of the time. Real users wait in real time! You want to know about slow I/O, slow db queries, delays due to thread contention, locks etc. etc
  2. Funky: goto &amp;sub, next/redo/last out of a sub, even exceptions
  3. Slightly dependent on perl version.
  4. As of 2008 NYTProf 2.0x perlcritic lib/Perl/Critic/Policy perl 5.6.8 perlcritic 1.088
  5. Generates GraphViz files that can be used to produce these diagrams
  6. Don&amp;#x2019;t optimize until you have a real need to Concentrate on good design &amp; implementation
  7. Don&amp;#x2019;t optimize until you have a real need to Concentrate on good design &amp; implementation
  8. Don&amp;#x2019;t optimize until you have a real need to Concentrate on good design &amp; implementation
  9. Don&amp;#x2019;t optimize until you have a real need to Concentrate on good design &amp; implementation
  10. Don&amp;#x2019;t optimize until you have a real need to Concentrate on good design &amp; implementation
  11. programmer time $ &gt; cpu time $. likely to introduce bugs
  12. Measure Twice, cut once
  13. Bottlenecks move when changes are made.
  14. use a temporary variable
  15. NYTProf can now time regular expressions as if they&amp;#x2019;re subroutines
  16. So you can tell what&amp;#x2019;s &amp;#x2018;reasonable&amp;#x2019;.
  17. Proportion of phases: Read, modify, write. Extract, translate, load.
  18. Often very effective. Especially if walk() has to do any initialisation.