SlideShare una empresa de Scribd logo
1 de 72
Descargar para leer sin conexión
Devel::NYTProf
Perl Source Code Profiler


   Tim Bunce - October 2009
Devel::DProf
• Oldest Perl Profiler —1995

• Design flaws make it practically useless
  on modern systems

• Limited to 0.01 second resolution
  even for realtime measurements!
Devel::DProf Is Broken
$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 };
 s$_({});n" for 1..1000' > x.pl

$ perl -d:DProf x.pl

$ dprofpp -r
   Total Elapsed Time =    0.108 Seconds
            Real Time =    0.108 Seconds
   Exclusive Times
   %Time ExclSec CumulS #Calls sec/call Csec/c   Name
    9.26   0.010 0.010       1   0.0100 0.0100   main::s76
    9.26   0.010 0.010       1   0.0100 0.0100   main::s323
    9.26   0.010 0.010       1   0.0100 0.0100   main::s626
    9.26   0.010 0.010       1   0.0100 0.0100   main::s936
    0.00       - -0.000      1        -      -   main::s77
    0.00       - -0.000      1        -      -   main::s82
Lots of Perl Profilers
• Take your pick...
   Devel::DProf          |   1995   |   Subroutine
   Devel::SmallProf      |   1997   |   Line
   Devel::AutoProfiler   |   2002   |   Subroutine
   Devel::Profiler       |   2002   |   Subroutine
   Devel::Profile        |   2003   |   Subroutine
   Devel::FastProf       |   2005   |   Line
   Devel::DProfLB        |   2006   |   Subroutine
   Devel::WxProf         |   2008   |   Subroutine
   Devel::Profit         |   2008   |   Line
   Devel::NYTProf        |   2008   |   Line & Subroutine
Evolution

Devel::DProf        | 1995 | Subroutine
Devel::SmallProf     | 1997 | Line
Devel::AutoProfiler | 2002 | Subroutine
Devel::Profiler     | 2002 | Subroutine
Devel::Profile      | 2003 | Subroutine
Devel::FastProf      | 2005 | Line
Devel::DProfLB      | 2006 | Subroutine
Devel::WxProf       | 2008 | Subroutine
Devel::Profit       | 2008 | Line
Devel::NYTProf v1    | 2008 | Line
Devel::NYTProf v2    | 2008 | Line & Subroutine
 ...plus lots of innovations!
What To Measure?

              CPU Time   Real Time


Subroutines
                 ?          ?
Statements
                 ?          ?
CPU Time vs Real Time
            • CPU time
               - Measures time CPU sent executing your code
               - Not (much) affected by other load on system
               - Doesn’t include time spent waiting for i/o etc.


            • Real time
               - Measures the elapsed time-of-day
               - Your time is affected by other load on system
               - Includes time spent waiting for i/o etc.


CPU: only useful if high resolution

Real: Most useful most of the time. Real users wait in real time!

You want to know about slow I/O, slow db queries, delays due to thread contention, locks etc. etc
Sub vs Line
           • Subroutine Profiling
              - Measures time between subroutine entry and exit
              - That’s the Inclusive time. Exclusive by subtraction.
              - Reasonably fast, reasonably small data files


           • Problems
              - Can be confused by funky control flow
              - No insight into where time spent within large subs
              - Doesn’t measure code outside of a sub


Funky: goto &sub, next/redo/last out of a sub, even exceptions
Sub vs Line
• Line/Statement profiling
 - Measure time from start of one statement to next
 - Exclusive time (except includes built-ins & xsubs)
 - Fine grained detail


• Problems
 -   Very expensive in CPU & I/O
 -   Assigns too much time to some statements
 -   Too much detail for large subs (want time per sub)
 -   Hard to get overall subroutine times
Devel::NYTProf
v1 Innovations

• Fork by Adam Kaplan of Devel::FastProf
  - working at the New York Times
• HTML report borrowed from Devel::Cover
• More accurate: Discounts profiler overhead
  including cost of writing to the file
• Test suite!
v2 Innovations


• Profiles time per block!
  - Statement times can be aggregated
    to enclosing block
    and enclosing sub
v2 Innovations

• Dual Profilers!
 - Is a statement profiler
 - and a subroutine profiler
 - At the same time
v2 Innovations
• Subroutine profiler
 -   tracks time per calling location
 -   even for xsubs
 -   calculates exclusive time on-the-fly
 -   discounts cost of statement profiler
 -   immune from funky control flow
 -   in memory, writes to file at end
 -   extremely fast
v2 Innovations

           • Statement profiler gives correct timing
                after leave ops
               - unlike previous statement profilers...
               - last statement in loops doesn’t accumulate
                 time spent evaluating the condition
               - last statement in subs doesn’t accumulate time
                 spent in remainder of calling statement



Slightly dependent on perl version.
v2 Other Features
•   Profiles compile-time activity
•   Profiling can be enabled & disabled on the fly
•   Handles forks with no overhead
•   Correct timing for mod_perl
•   Sub-microsecond resolution
•   Multiple clocks, including high-res CPU time
•   Can snapshot source code & evals into profile
•   Built-in zip compression
Profiling Performance
                                 Time                  Size
      Perl                          x1                   -
      DProf                       x 4.9                60,736KB
      SmallProf                 x 22.0                   -
      FastProf                    x 6.3                42,927KB
      NYTProf                   x 3.9                  11,174KB
       + blocks=0                 x 3.5                 9,628KB
       + stmts=0                  x 2.5                      205KB

NYTProf v2.0 running perl 5.6.8 perlcritic 1.088 on lib/Perl/Critic/Policy
v3 Features

•   Profiles slow opcodes: system calls, regexps, ...
•   Subroutine caller name noted, for call-graph
•   Handles goto ⊂ e.g. AUTOLOAD
•   HTML report includes interactive TreeMaps
•   Outputs call-graph in Graphviz dot format
Running NYTProf

perl -d:NYTProf ...


perl -MDevel::NYTProf ...


PERL5OPT=-d:NYTProf


NYTPROF=file=/tmp/nytprof.out:addpid=1:slowops=1
Reporting: CSV
• CSV - old, limited, dull
  $ nytprofcsv


  # Format: time,calls,time/call,code
  0,0,0,sub foo {
  0.000002,2,0.00001,print "in sub foon";
  0.000004,2,0.00002,bar();
  0,0,0,}
  0,0,0,
Reporting: KCachegrind
• KCachegrind call graph - new and cool
   - contributed by C. L. Kao.
   - requires KCachegrind

  $ nytprofcg   # generates nytprof.callgraph
  $ kcachegrind # load the file via the gui
KCachegrind
Reporting: HTML
• HTML report
   - page per source file, annotated with times and links
   - subroutine index table with sortable columns
   - interactive Treemaps of subroutine times
   - generates Graphviz dot file of call graph

$ nytprofhtml # writes HTML report in ./nytprof/...
$ nytprofhtml --file=/tmp/nytprof.out.793 --open
Summary




                             Links to annotated
                                source code




Link to sortable table
      of all subs
                          Timings for perl builtins
Exclusive vs. Inclusive
• Exclusive Time = Bottom up
 - Detail of time spent “just here”
 - Where the time actually gets spent
 - Useful for localized (peephole) optimisation


• Inclusive Time = Top down
 - Overview of time spent “in and below”
 - Useful to prioritize structural optimizations
Overall time spent in and below this sub

                                            (in + below)




       Color coding based on
     Median Average Deviation
     relative to rest of this file          Timings for each location calling into,
                                                 or out of, the subroutine
Treemap showing relative
                 proportions of exclusive time




                                  Boxes represent subroutines
                                   Colors only used to show
                                packages (and aren’t pretty yet)




Hover over box to see details
                                          Click to drill-down one level
                                              in package hierarchy
Calls between packages




Generates GraphViz files that can be used to produce these diagrams
Calls to/from/within package
Let’s take a look...
Optimizing
  Hints & Tips
Phase 0
Before you start
DONʼT
            DO IT!
Don’t optimize until you have a real need to
Concentrate on good design & implementation
“The First Rule of Program Optimization:
Don't do it.

The Second Rule of Program Optimization
(for experts only!): Don't do it yet.”

- Michael A. Jackson
Why not?
“More computing sins are committed in the
   name of efficiency (without necessarily
   achieving it) than for any other single
   reason - including blind stupidity.”

   - W.A. Wulf




• programmer time $ > cpu time $.
• likely to introduce bugs
“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.
Yet we should not pass up our
opportunities in that critical 3%.”

- Donald Knuth
“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.
Yet we should not pass up our
opportunities in that critical 3%.”
- Donald Knuth
How?
“Bottlenecks occur in surprising places, so
don't try to second guess and put in a
speed hack until you have proven that's
where the bottleneck is.”

- Rob Pike
“Measure twice, cut once.”

        - Old Proverb




Measure Twice, cut once
Phase 1
Low Hanging Fruit
Low Hanging Fruit
          1.   Profile code running representative workload.
          2.   Look at Exclusive Time of subroutines.
          3.   Do they look reasonable?
          4.   Examine worst offenders.
          5.   Fix only simple local problems.
          6.   Profile again.
          7.   Fast enough? Then STOP!
          8.   Rinse and repeat once or twice, then move on.


Bottlenecks move when changes are made.
“Simple Local Fixes”


 Changes unlikely to introduce bugs
Move invariant
 expressions
 out of loops
Avoid->repeated->chains
                 ->of->accessors(...);
               Avoid->repeated->chains
                 ->of->accessors(...);

                           Use a temporary variable

use a temporary variable
Use faster accessors


   Class::Accessor
   -> Class::Accessor::Fast
   --> Class::Accessor::Faster
   ---> Class::Accessor::Fast::XS
Avoid calling subs that
 don’t do anything!
  my $unused_variable = $self->get_foo;


  my $is_logging = $log->info(...);
  while (...) {
      $log->info(...) if $is_logging;
      ...
  }
Exit subs and loops early
  Delay initializations
  return if not ...a cheap test...;
  return if not ...a more expensive test...;
  my $foo = ...initializations...;
  ...body of subroutine...
Fix silly code

-   return exists $nav_type{$country}{$key}
-               ? $nav_type{$country}{$key}
-               : undef;
+   return $nav_type{$country}{$key};
Beware pathological
               regular expressions

                                      NYTPROF=slowops=2




NYTProf can now time regular expressions as if they’re subroutines
Avoid unpacking args
  in very hot subs
   sub foo { shift->delegate(@_) }

   sub bar {
       return shift->{bar} unless @_;
       return $_[0]->{bar} = $_[1];
   }
Retest.

Fast enough?

        STOP!
Put the profiler down and walk away
Phase 2
Deeper Changes
Profile with a
                     known workload


         E.g., 1000 identical requests

So you can tell what’s ‘reasonable’.
Check Inclusive Times
                   (especially top-level subs)


            Reasonable percentage
              for the workload?

Proportion of phases: Read, modify, write. Extract, translate, load.
Check subroutine
   call counts

    Reasonable
for the workload?
Add caching
    if appropriate
   to reduce calls


Remember invalidation
Walk up call chain
 to find good spots
     for caching


Remember invalidation
Creating many objects
 that don’t get used?


 Lightweight proxies
 e.g. DateTimeX::Lite
Retest.

Fast enough?

        STOP!
Put the profiler down and walk away
Phase 3
Structural Changes
Push loops down


                          -       $object->walk($_) for @dogs;

                          +       $object->walk_these(@dogs);




Often very effective. Especially if walk() has to do any initialisation.
Change the data
   structure

hashes <–> arrays
Change the algorithm

What’s the “Big O”?
O(n 2) or O(logn) or ...
Rewrite hot-spots in C

     Inline::C
It all adds up!

“I achieved my fast times by
multitudes of 1% reductions”

       - Bill Raymond
Questions?

Tim.Bunce@pobox.com
 @timbunce on twitter
http://blog.timbunce.org

Más contenido relacionado

La actualidad más candente

Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Junichi Ishida
 
Taming the resource tiger
Taming the resource tigerTaming the resource tiger
Taming the resource tigerElizabeth Smith
 
Asynchronous I/O in Python 3
Asynchronous I/O in Python 3Asynchronous I/O in Python 3
Asynchronous I/O in Python 3Feihong Hsu
 
Packaging perl (LPW2010)
Packaging perl (LPW2010)Packaging perl (LPW2010)
Packaging perl (LPW2010)p3castro
 
The Integration of Laravel with Swoole
The Integration of Laravel with SwooleThe Integration of Laravel with Swoole
The Integration of Laravel with SwooleAlbert Chen
 
2021.laravelconf.tw.slides1
2021.laravelconf.tw.slides12021.laravelconf.tw.slides1
2021.laravelconf.tw.slides1LiviaLiaoFontech
 
Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Clark Everetts
 
Vulnerability desing patterns
Vulnerability desing patternsVulnerability desing patterns
Vulnerability desing patternsPeter Hlavaty
 
Obfuscating The Empire
Obfuscating The EmpireObfuscating The Empire
Obfuscating The EmpireRyan Cobb
 
Javascript TDD with Jasmine, Karma, and Gulp
Javascript TDD with Jasmine, Karma, and GulpJavascript TDD with Jasmine, Karma, and Gulp
Javascript TDD with Jasmine, Karma, and GulpAll Things Open
 
Effizientere WordPress-Plugin-Entwicklung mit Softwaretests
Effizientere WordPress-Plugin-Entwicklung mit SoftwaretestsEffizientere WordPress-Plugin-Entwicklung mit Softwaretests
Effizientere WordPress-Plugin-Entwicklung mit SoftwaretestsDECK36
 
Php internal architecture
Php internal architecturePhp internal architecture
Php internal architectureElizabeth Smith
 
[네이버오픈소스세미나] What’s new in Zipkin - Adrian Cole
[네이버오픈소스세미나] What’s new in Zipkin - Adrian Cole[네이버오픈소스세미나] What’s new in Zipkin - Adrian Cole
[네이버오픈소스세미나] What’s new in Zipkin - Adrian ColeNAVER Engineering
 
Auto Deploy Deep Dive – vBrownBag Style
Auto Deploy Deep Dive – vBrownBag StyleAuto Deploy Deep Dive – vBrownBag Style
Auto Deploy Deep Dive – vBrownBag StyleRobert Nelson
 
Developer-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneSylvain Zimmer
 
30 Minutes To CPAN
30 Minutes To CPAN30 Minutes To CPAN
30 Minutes To CPANdaoswald
 

La actualidad más candente (20)

Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.
 
Taming the resource tiger
Taming the resource tigerTaming the resource tiger
Taming the resource tiger
 
Asynchronous I/O in Python 3
Asynchronous I/O in Python 3Asynchronous I/O in Python 3
Asynchronous I/O in Python 3
 
Packaging perl (LPW2010)
Packaging perl (LPW2010)Packaging perl (LPW2010)
Packaging perl (LPW2010)
 
The Integration of Laravel with Swoole
The Integration of Laravel with SwooleThe Integration of Laravel with Swoole
The Integration of Laravel with Swoole
 
2021.laravelconf.tw.slides1
2021.laravelconf.tw.slides12021.laravelconf.tw.slides1
2021.laravelconf.tw.slides1
 
Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016Php Dependency Management with Composer ZendCon 2016
Php Dependency Management with Composer ZendCon 2016
 
Racing with Droids
Racing with DroidsRacing with Droids
Racing with Droids
 
Vulnerability desing patterns
Vulnerability desing patternsVulnerability desing patterns
Vulnerability desing patterns
 
Obfuscating The Empire
Obfuscating The EmpireObfuscating The Empire
Obfuscating The Empire
 
Javascript TDD with Jasmine, Karma, and Gulp
Javascript TDD with Jasmine, Karma, and GulpJavascript TDD with Jasmine, Karma, and Gulp
Javascript TDD with Jasmine, Karma, and Gulp
 
Effizientere WordPress-Plugin-Entwicklung mit Softwaretests
Effizientere WordPress-Plugin-Entwicklung mit SoftwaretestsEffizientere WordPress-Plugin-Entwicklung mit Softwaretests
Effizientere WordPress-Plugin-Entwicklung mit Softwaretests
 
Php internal architecture
Php internal architecturePhp internal architecture
Php internal architecture
 
Tp install anything
Tp install anythingTp install anything
Tp install anything
 
[네이버오픈소스세미나] What’s new in Zipkin - Adrian Cole
[네이버오픈소스세미나] What’s new in Zipkin - Adrian Cole[네이버오픈소스세미나] What’s new in Zipkin - Adrian Cole
[네이버오픈소스세미나] What’s new in Zipkin - Adrian Cole
 
Auto Deploy Deep Dive – vBrownBag Style
Auto Deploy Deep Dive – vBrownBag StyleAuto Deploy Deep Dive – vBrownBag Style
Auto Deploy Deep Dive – vBrownBag Style
 
Developer-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing oneDeveloper-friendly taskqueues: What you should ask yourself before choosing one
Developer-friendly taskqueues: What you should ask yourself before choosing one
 
The problem with Perl
The problem with PerlThe problem with Perl
The problem with Perl
 
30 Minutes To CPAN
30 Minutes To CPAN30 Minutes To CPAN
30 Minutes To CPAN
 
Php extensions
Php extensionsPhp extensions
Php extensions
 

Destacado

Graphic visualization
Graphic visualizationGraphic visualization
Graphic visualizationuwevoelker
 
Perl6 DBDI YAPC::EU 201008
Perl6 DBDI YAPC::EU 201008Perl6 DBDI YAPC::EU 201008
Perl6 DBDI YAPC::EU 201008Tim Bunce
 
PL/Perl - New Features in PostgreSQL 9.0 201012
PL/Perl - New Features in PostgreSQL 9.0 201012PL/Perl - New Features in PostgreSQL 9.0 201012
PL/Perl - New Features in PostgreSQL 9.0 201012Tim Bunce
 
Perl Tidy Perl Critic
Perl Tidy Perl CriticPerl Tidy Perl Critic
Perl Tidy Perl Criticolegmmiller
 
Application Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyApplication Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyTim Bunce
 

Destacado (6)

Graphic visualization
Graphic visualizationGraphic visualization
Graphic visualization
 
Perl6 DBDI YAPC::EU 201008
Perl6 DBDI YAPC::EU 201008Perl6 DBDI YAPC::EU 201008
Perl6 DBDI YAPC::EU 201008
 
PL/Perl - New Features in PostgreSQL 9.0 201012
PL/Perl - New Features in PostgreSQL 9.0 201012PL/Perl - New Features in PostgreSQL 9.0 201012
PL/Perl - New Features in PostgreSQL 9.0 201012
 
Perl Tidy Perl Critic
Perl Tidy Perl CriticPerl Tidy Perl Critic
Perl Tidy Perl Critic
 
Workflow Yapceu2010
Workflow Yapceu2010Workflow Yapceu2010
Workflow Yapceu2010
 
Application Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.keyApplication Logging in the 21st century - 2014.key
Application Logging in the 21st century - 2014.key
 

Similar a Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)

Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)Tim Bunce
 
Devel::NYTProf v5 at YAPC::NA 201406
Devel::NYTProf v5 at YAPC::NA 201406Devel::NYTProf v5 at YAPC::NA 201406
Devel::NYTProf v5 at YAPC::NA 201406Tim Bunce
 
Profiling with Devel::NYTProf
Profiling with Devel::NYTProfProfiling with Devel::NYTProf
Profiling with Devel::NYTProfbobcatfish
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.Vlad Fedosov
 
Make static instrumentation great again, High performance fuzzing for Windows...
Make static instrumentation great again, High performance fuzzing for Windows...Make static instrumentation great again, High performance fuzzing for Windows...
Make static instrumentation great again, High performance fuzzing for Windows...Lucas Leong
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedTim Callaghan
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance DrupalJeff Geerling
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusJakob Karalus
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Spark Summit
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Issac Buenrostro
 
Peddle the Pedal to the Metal
Peddle the Pedal to the MetalPeddle the Pedal to the Metal
Peddle the Pedal to the MetalC4Media
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
 
Gearman - Northeast PHP 2012
Gearman - Northeast PHP 2012Gearman - Northeast PHP 2012
Gearman - Northeast PHP 2012Mike Willbanks
 

Similar a Devel::NYTProf v3 - 200908 (OUTDATED, see 201008) (20)

Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)Devel::NYTProf 2009-07 (OUTDATED, see 201008)
Devel::NYTProf 2009-07 (OUTDATED, see 201008)
 
Nyt Prof 200910
Nyt Prof 200910Nyt Prof 200910
Nyt Prof 200910
 
Devel::NYTProf v5 at YAPC::NA 201406
Devel::NYTProf v5 at YAPC::NA 201406Devel::NYTProf v5 at YAPC::NA 201406
Devel::NYTProf v5 at YAPC::NA 201406
 
Ansible - A 'crowd' introduction
Ansible - A 'crowd' introductionAnsible - A 'crowd' introduction
Ansible - A 'crowd' introduction
 
Profiling with Devel::NYTProf
Profiling with Devel::NYTProfProfiling with Devel::NYTProf
Profiling with Devel::NYTProf
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.
 
Make static instrumentation great again, High performance fuzzing for Windows...
Make static instrumentation great again, High performance fuzzing for Windows...Make static instrumentation great again, High performance fuzzing for Windows...
Make static instrumentation great again, High performance fuzzing for Windows...
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
 
High Performance Drupal
High Performance DrupalHigh Performance Drupal
High Performance Drupal
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
Fault Tolerance in Spark: Lessons Learned from Production: Spark Summit East ...
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
 
13 risc
13 risc13 risc
13 risc
 
Peddle the Pedal to the Metal
Peddle the Pedal to the MetalPeddle the Pedal to the Metal
Peddle the Pedal to the Metal
 
Lecture1
Lecture1Lecture1
Lecture1
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
Scaling tappsi
Scaling tappsiScaling tappsi
Scaling tappsi
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
Gearman - Northeast PHP 2012
Gearman - Northeast PHP 2012Gearman - Northeast PHP 2012
Gearman - Northeast PHP 2012
 

Más de Tim Bunce

Perl Memory Use - LPW2013
Perl Memory Use - LPW2013Perl Memory Use - LPW2013
Perl Memory Use - LPW2013Tim Bunce
 
Perl at SkyCon'12
Perl at SkyCon'12Perl at SkyCon'12
Perl at SkyCon'12Tim Bunce
 
Perl Memory Use 201209
Perl Memory Use 201209Perl Memory Use 201209
Perl Memory Use 201209Tim Bunce
 
Perl Memory Use 201207 (OUTDATED, see 201209 )
Perl Memory Use 201207 (OUTDATED, see 201209 )Perl Memory Use 201207 (OUTDATED, see 201209 )
Perl Memory Use 201207 (OUTDATED, see 201209 )Tim Bunce
 
Perl 6 DBDI 201007 (OUTDATED, see 201008)
Perl 6 DBDI 201007 (OUTDATED, see 201008)Perl 6 DBDI 201007 (OUTDATED, see 201008)
Perl 6 DBDI 201007 (OUTDATED, see 201008)Tim Bunce
 
PL/Perl - New Features in PostgreSQL 9.0
PL/Perl - New Features in PostgreSQL 9.0PL/Perl - New Features in PostgreSQL 9.0
PL/Perl - New Features in PostgreSQL 9.0Tim Bunce
 
DBI Advanced Tutorial 2007
DBI Advanced Tutorial 2007DBI Advanced Tutorial 2007
DBI Advanced Tutorial 2007Tim Bunce
 
Perl Myths 200909
Perl Myths 200909Perl Myths 200909
Perl Myths 200909Tim Bunce
 
DashProfiler 200807
DashProfiler 200807DashProfiler 200807
DashProfiler 200807Tim Bunce
 
DBI for Parrot and Perl 6 Lightning Talk 2007
DBI for Parrot and Perl 6 Lightning Talk 2007DBI for Parrot and Perl 6 Lightning Talk 2007
DBI for Parrot and Perl 6 Lightning Talk 2007Tim Bunce
 
DBD::Gofer 200809
DBD::Gofer 200809DBD::Gofer 200809
DBD::Gofer 200809Tim Bunce
 
Perl Myths 200802 with notes (OUTDATED, see 200909)
Perl Myths 200802 with notes (OUTDATED, see 200909)Perl Myths 200802 with notes (OUTDATED, see 200909)
Perl Myths 200802 with notes (OUTDATED, see 200909)Tim Bunce
 

Más de Tim Bunce (12)

Perl Memory Use - LPW2013
Perl Memory Use - LPW2013Perl Memory Use - LPW2013
Perl Memory Use - LPW2013
 
Perl at SkyCon'12
Perl at SkyCon'12Perl at SkyCon'12
Perl at SkyCon'12
 
Perl Memory Use 201209
Perl Memory Use 201209Perl Memory Use 201209
Perl Memory Use 201209
 
Perl Memory Use 201207 (OUTDATED, see 201209 )
Perl Memory Use 201207 (OUTDATED, see 201209 )Perl Memory Use 201207 (OUTDATED, see 201209 )
Perl Memory Use 201207 (OUTDATED, see 201209 )
 
Perl 6 DBDI 201007 (OUTDATED, see 201008)
Perl 6 DBDI 201007 (OUTDATED, see 201008)Perl 6 DBDI 201007 (OUTDATED, see 201008)
Perl 6 DBDI 201007 (OUTDATED, see 201008)
 
PL/Perl - New Features in PostgreSQL 9.0
PL/Perl - New Features in PostgreSQL 9.0PL/Perl - New Features in PostgreSQL 9.0
PL/Perl - New Features in PostgreSQL 9.0
 
DBI Advanced Tutorial 2007
DBI Advanced Tutorial 2007DBI Advanced Tutorial 2007
DBI Advanced Tutorial 2007
 
Perl Myths 200909
Perl Myths 200909Perl Myths 200909
Perl Myths 200909
 
DashProfiler 200807
DashProfiler 200807DashProfiler 200807
DashProfiler 200807
 
DBI for Parrot and Perl 6 Lightning Talk 2007
DBI for Parrot and Perl 6 Lightning Talk 2007DBI for Parrot and Perl 6 Lightning Talk 2007
DBI for Parrot and Perl 6 Lightning Talk 2007
 
DBD::Gofer 200809
DBD::Gofer 200809DBD::Gofer 200809
DBD::Gofer 200809
 
Perl Myths 200802 with notes (OUTDATED, see 200909)
Perl Myths 200802 with notes (OUTDATED, see 200909)Perl Myths 200802 with notes (OUTDATED, see 200909)
Perl Myths 200802 with notes (OUTDATED, see 200909)
 

Último

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Último (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)

  • 1. Devel::NYTProf Perl Source Code Profiler Tim Bunce - October 2009
  • 2. Devel::DProf • Oldest Perl Profiler —1995 • Design flaws make it practically useless on modern systems • Limited to 0.01 second resolution even for realtime measurements!
  • 3. Devel::DProf Is Broken $ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});n" for 1..1000' > x.pl $ perl -d:DProf x.pl $ dprofpp -r Total Elapsed Time = 0.108 Seconds Real Time = 0.108 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 9.26 0.010 0.010 1 0.0100 0.0100 main::s76 9.26 0.010 0.010 1 0.0100 0.0100 main::s323 9.26 0.010 0.010 1 0.0100 0.0100 main::s626 9.26 0.010 0.010 1 0.0100 0.0100 main::s936 0.00 - -0.000 1 - - main::s77 0.00 - -0.000 1 - - main::s82
  • 4. Lots of Perl Profilers • Take your pick... Devel::DProf | 1995 | Subroutine Devel::SmallProf | 1997 | Line Devel::AutoProfiler | 2002 | Subroutine Devel::Profiler | 2002 | Subroutine Devel::Profile | 2003 | Subroutine Devel::FastProf | 2005 | Line Devel::DProfLB | 2006 | Subroutine Devel::WxProf | 2008 | Subroutine Devel::Profit | 2008 | Line Devel::NYTProf | 2008 | Line & Subroutine
  • 5. Evolution Devel::DProf | 1995 | Subroutine Devel::SmallProf | 1997 | Line Devel::AutoProfiler | 2002 | Subroutine Devel::Profiler | 2002 | Subroutine Devel::Profile | 2003 | Subroutine Devel::FastProf | 2005 | Line Devel::DProfLB | 2006 | Subroutine Devel::WxProf | 2008 | Subroutine Devel::Profit | 2008 | Line Devel::NYTProf v1 | 2008 | Line Devel::NYTProf v2 | 2008 | Line & Subroutine ...plus lots of innovations!
  • 6. What To Measure? CPU Time Real Time Subroutines ? ? Statements ? ?
  • 7. CPU Time vs Real Time • CPU time - Measures time CPU sent executing your code - Not (much) affected by other load on system - Doesn’t include time spent waiting for i/o etc. • Real time - Measures the elapsed time-of-day - Your time is affected by other load on system - Includes time spent waiting for i/o etc. CPU: only useful if high resolution Real: Most useful most of the time. Real users wait in real time! You want to know about slow I/O, slow db queries, delays due to thread contention, locks etc. etc
  • 8. Sub vs Line • Subroutine Profiling - Measures time between subroutine entry and exit - That’s the Inclusive time. Exclusive by subtraction. - Reasonably fast, reasonably small data files • Problems - Can be confused by funky control flow - No insight into where time spent within large subs - Doesn’t measure code outside of a sub Funky: goto &sub, next/redo/last out of a sub, even exceptions
  • 9. Sub vs Line • Line/Statement profiling - Measure time from start of one statement to next - Exclusive time (except includes built-ins & xsubs) - Fine grained detail • Problems - Very expensive in CPU & I/O - Assigns too much time to some statements - Too much detail for large subs (want time per sub) - Hard to get overall subroutine times
  • 11. v1 Innovations • Fork by Adam Kaplan of Devel::FastProf - working at the New York Times • HTML report borrowed from Devel::Cover • More accurate: Discounts profiler overhead including cost of writing to the file • Test suite!
  • 12. v2 Innovations • Profiles time per block! - Statement times can be aggregated to enclosing block and enclosing sub
  • 13. v2 Innovations • Dual Profilers! - Is a statement profiler - and a subroutine profiler - At the same time
  • 14. v2 Innovations • Subroutine profiler - tracks time per calling location - even for xsubs - calculates exclusive time on-the-fly - discounts cost of statement profiler - immune from funky control flow - in memory, writes to file at end - extremely fast
  • 15. v2 Innovations • Statement profiler gives correct timing after leave ops - unlike previous statement profilers... - last statement in loops doesn’t accumulate time spent evaluating the condition - last statement in subs doesn’t accumulate time spent in remainder of calling statement Slightly dependent on perl version.
  • 16. v2 Other Features • Profiles compile-time activity • Profiling can be enabled & disabled on the fly • Handles forks with no overhead • Correct timing for mod_perl • Sub-microsecond resolution • Multiple clocks, including high-res CPU time • Can snapshot source code & evals into profile • Built-in zip compression
  • 17.
  • 18. Profiling Performance Time Size Perl x1 - DProf x 4.9 60,736KB SmallProf x 22.0 - FastProf x 6.3 42,927KB NYTProf x 3.9 11,174KB + blocks=0 x 3.5 9,628KB + stmts=0 x 2.5 205KB NYTProf v2.0 running perl 5.6.8 perlcritic 1.088 on lib/Perl/Critic/Policy
  • 19. v3 Features • Profiles slow opcodes: system calls, regexps, ... • Subroutine caller name noted, for call-graph • Handles goto &sub; e.g. AUTOLOAD • HTML report includes interactive TreeMaps • Outputs call-graph in Graphviz dot format
  • 20. Running NYTProf perl -d:NYTProf ... perl -MDevel::NYTProf ... PERL5OPT=-d:NYTProf NYTPROF=file=/tmp/nytprof.out:addpid=1:slowops=1
  • 21. Reporting: CSV • CSV - old, limited, dull $ nytprofcsv # Format: time,calls,time/call,code 0,0,0,sub foo { 0.000002,2,0.00001,print "in sub foon"; 0.000004,2,0.00002,bar(); 0,0,0,} 0,0,0,
  • 22. Reporting: KCachegrind • KCachegrind call graph - new and cool - contributed by C. L. Kao. - requires KCachegrind $ nytprofcg # generates nytprof.callgraph $ kcachegrind # load the file via the gui
  • 24. Reporting: HTML • HTML report - page per source file, annotated with times and links - subroutine index table with sortable columns - interactive Treemaps of subroutine times - generates Graphviz dot file of call graph $ nytprofhtml # writes HTML report in ./nytprof/... $ nytprofhtml --file=/tmp/nytprof.out.793 --open
  • 25.
  • 26. Summary Links to annotated source code Link to sortable table of all subs Timings for perl builtins
  • 27. Exclusive vs. Inclusive • Exclusive Time = Bottom up - Detail of time spent “just here” - Where the time actually gets spent - Useful for localized (peephole) optimisation • Inclusive Time = Top down - Overview of time spent “in and below” - Useful to prioritize structural optimizations
  • 28.
  • 29. Overall time spent in and below this sub (in + below) Color coding based on Median Average Deviation relative to rest of this file Timings for each location calling into, or out of, the subroutine
  • 30.
  • 31. Treemap showing relative proportions of exclusive time Boxes represent subroutines Colors only used to show packages (and aren’t pretty yet) Hover over box to see details Click to drill-down one level in package hierarchy
  • 32. Calls between packages Generates GraphViz files that can be used to produce these diagrams
  • 34. Let’s take a look...
  • 37. DONʼT DO IT! Don’t optimize until you have a real need to Concentrate on good design & implementation
  • 38. “The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson
  • 40. “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.” - W.A. Wulf • programmer time $ > cpu time $. • likely to introduce bugs
  • 41. “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth
  • 42. “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth
  • 43. How?
  • 44. “Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.” - Rob Pike
  • 45. “Measure twice, cut once.” - Old Proverb Measure Twice, cut once
  • 47. Low Hanging Fruit 1. Profile code running representative workload. 2. Look at Exclusive Time of subroutines. 3. Do they look reasonable? 4. Examine worst offenders. 5. Fix only simple local problems. 6. Profile again. 7. Fast enough? Then STOP! 8. Rinse and repeat once or twice, then move on. Bottlenecks move when changes are made.
  • 48. “Simple Local Fixes” Changes unlikely to introduce bugs
  • 50. Avoid->repeated->chains ->of->accessors(...); Avoid->repeated->chains ->of->accessors(...); Use a temporary variable use a temporary variable
  • 51. Use faster accessors Class::Accessor -> Class::Accessor::Fast --> Class::Accessor::Faster ---> Class::Accessor::Fast::XS
  • 52. Avoid calling subs that don’t do anything! my $unused_variable = $self->get_foo; my $is_logging = $log->info(...); while (...) { $log->info(...) if $is_logging; ... }
  • 53. Exit subs and loops early Delay initializations return if not ...a cheap test...; return if not ...a more expensive test...; my $foo = ...initializations...; ...body of subroutine...
  • 54. Fix silly code - return exists $nav_type{$country}{$key} - ? $nav_type{$country}{$key} - : undef; + return $nav_type{$country}{$key};
  • 55. Beware pathological regular expressions NYTPROF=slowops=2 NYTProf can now time regular expressions as if they’re subroutines
  • 56. Avoid unpacking args in very hot subs sub foo { shift->delegate(@_) } sub bar { return shift->{bar} unless @_; return $_[0]->{bar} = $_[1]; }
  • 57. Retest. Fast enough? STOP! Put the profiler down and walk away
  • 59. Profile with a known workload E.g., 1000 identical requests So you can tell what’s ‘reasonable’.
  • 60. Check Inclusive Times (especially top-level subs) Reasonable percentage for the workload? Proportion of phases: Read, modify, write. Extract, translate, load.
  • 61. Check subroutine call counts Reasonable for the workload?
  • 62. Add caching if appropriate to reduce calls Remember invalidation
  • 63. Walk up call chain to find good spots for caching Remember invalidation
  • 64. Creating many objects that don’t get used? Lightweight proxies e.g. DateTimeX::Lite
  • 65. Retest. Fast enough? STOP! Put the profiler down and walk away
  • 67. Push loops down - $object->walk($_) for @dogs; + $object->walk_these(@dogs); Often very effective. Especially if walk() has to do any initialisation.
  • 68. Change the data structure hashes <–> arrays
  • 69. Change the algorithm What’s the “Big O”? O(n 2) or O(logn) or ...
  • 70. Rewrite hot-spots in C Inline::C
  • 71. It all adds up! “I achieved my fast times by multitudes of 1% reductions” - Bill Raymond
  • 72. Questions? Tim.Bunce@pobox.com @timbunce on twitter http://blog.timbunce.org