Our Friends the Utils:
A highway traveled by wheels
we didn't re-invent.


          Steven Lembark
        Workhorse Computing
      lembark@wrkhors.com
Meet the Utils
●   Scalar::Util & List::Util were first written by the
    ancient Prophet of Barr (c. 1997).
●   The modules provide often-requested features that
    were not worth modifying Perl itself to offer.
●   Later, List::MoreUtils added features that List::Util
    does not include.
●   If the Sound of Perl is an un-bloodied wall, the
    Utils are a superhighway traveled by truly lazy
    wheels.
Mixing old and new
●   Several features in v5.10+ overlap Util features.
    –   Smart matches are the most obvious, and are usually
        compared with List::Util::first.
    –   New features are not replacements, but work well with
        the modules.
    –   Examples here show how to use the modules with smart
        matching and switches.
●   What's important to notice is that these modules
    remain relevant.
Scalar::Util
    Provides introspection for scalars:
    –   Is a filehandle [still] open?
    –   The address, type, and class of a variable.
    –   Is a value “numeric” according to Perl?
    –   Does the variable contain readonly or tainted data?
    –   Tools for managing weak references or modifying
        prototypes.
●   Handling these in Pure Perl is messy, slow, or
    error-prone.
Dealing with ref's & objects
●   Collectively these replace “ref” or stringified
    references with a simpler, cleaner interface.
●   The problem with ref and stringified objects is that
    they return different data for objects or “plain” refs.
    –   Stringified refs are “Foobar=ARRAY(0x29eba90)”,
        unless overloading gets in the way.
    –   ref returns the base type, unless the reference is
        blessed, in which case it returns the class.
●   blessed, refaddr, & reftype are consistent.
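A minimal sketch of the difference, using a made-up class name (Foobar): ref changes its answer once the array is blessed, while blessed, reftype, and refaddr each answer one question the same way every time.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype refaddr );

my $plain  = [];                    # unblessed array ref
my $object = bless [], 'Foobar';    # 'Foobar' is a hypothetical class

# ref changes its answer depending on blessing:
my @ref_says = ( ref( $plain ), ref( $object ) );

# reftype always reports the underlying data type:
my @reftype_says = ( reftype( $plain ), reftype( $object ) );

# blessed reports the class, or undef for a plain ref:
my $plain_class  = blessed( $plain );
my $object_class = blessed( $object );

# refaddr is a plain integer with no class or type attached:
my $address = refaddr( $object );

print "ref:     @ref_says\n";        # ARRAY Foobar
print "reftype: @reftype_says\n";    # ARRAY ARRAY
print "address: $address\n";
```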
Blessed is the Object
●   blessed returns a class or undef.
●   This simplifies sanity checks:
      blessed $_[0] or die 'Non-object...';
●   Construction with objects for types:
      bless $x, blessed $proto || $proto;
    avoids classes like “ARRAY(0xab1234)”.
●   Check for blessed before “can” to avoid errors:
    blessed $x && $x->can( $method ) or die ...
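The two idioms above can be sketched together. Counter and its bump method are hypothetical names used only for illustration; the point is that the constructor works whether it is called on a class or on an object, and that checking blessed first keeps can away from plain strings and unblessed refs.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

package Counter;    # hypothetical class, for illustration only

sub new
{
    my $proto = shift;

    # an object yields its class, a class name is used as-is:
    my $class = Scalar::Util::blessed( $proto ) || $proto;

    return bless { count => 0 }, $class;
}

sub bump { ++$_[0]{ count } }

package main;

my $first  = Counter->new;    # called on the class
my $second = $first->new;     # called on an object

# A class-name string would pass a bare can() check and an unblessed
# ref would die inside it, so test blessed first:
my @usable
= map { blessed( $_ ) && $_->can( 'bump' ) ? 1 : 0 }
( $first, $second, 'Counter', [] );

print "usable: @usable\n";    # 1 1 0 0
```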
Blessed Structures
●   ref does not return the base type of a blessed ref.
●   reftype returns the data type, regardless of blessing.
●   Works nicely with switches:
     given( reftype $thing ) # blessed or not, same reftype
     {
         when( undef )      { die "Not a reference: '$thing'" }

         when( 'ARRAY' )    { ... }
         when( 'HASH' )     { ... }
         when( 'SCALAR' )   { ... }

         die "Un-usable data type: '$_'";
     }
Blessed Matches
●   Smart-matching an object requires an overload.
●   Developers would like to QA their modules to
    validate the overload is available.
●   A generic test is simple: blessed scalars that
    overload '~~' are usable.
●   Writing this test with only ref is a pain.
●   With Scalar::Util it is blessedly simple; overload::Method
    finds the operator, which can( '~~' ) does not:
    blessed $var && overload::Method( $var, '~~' )
    or die ...
The guts of “inside out” classes
●   Virtual addresses are unique during execution.
●   They make useful keys for associating external data.
●   Problem is that stringified refs include too much data:
    –   Plain:      ARRAY(0xeaa750)
    –   Blessed:    Foo=ARRAY(0xeaa750)
    –   Re-blessed: Bletch=ARRAY(0xeaa750)
●   The extra data makes them unusable as keys.
●   Parsing the ref's to extract the address is too slow.
The key to your guts: refaddr
●   refaddr returns only the address portion of a ref, as
    a plain integer:
    –   Previous values all give 0xeaa750 (15378256).
●   Note the lack of package or type.
●   This is not affected by [re]blessing the variable.
●   This leaves $data{ refaddr $ref } a stable key over
    the life cycle of a ref or object.
use NEXT;
use Scalar::Util qw( refaddr );

my %obj2data = (); # private cache for object data.

sub set
{
    my ( $obj, $data ) = @_;
    $obj2data{ refaddr $obj } = $data;
    return
}

sub get
{
    $obj2data{ refaddr $_[0] }
}

# have to manually clear out the cache.

DESTROY
{
    my $obj = shift;
    delete $obj2data{ refaddr $obj };
    $obj->NEXT::DESTROY;
}
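A short sketch of why refaddr works as the cache key: the stringified form of a reference changes when it is re-blessed, the address does not. The class names Foo and Bletch are placeholders.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

my %obj2data = ();

my $obj = bless [], 'Foo';           # class names are made up
$obj2data{ refaddr( $obj ) } = 'private state';

my $before = "$obj";                 # Foo=ARRAY(0x...)
bless $obj, 'Bletch';                # re-bless the same array
my $after  = "$obj";                 # Bletch=ARRAY(0x...)

# the stringified forms differ, the refaddr key still finds the data:
my $found = $obj2data{ refaddr( $obj ) };

print "$before -> $after: $found\n";
```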
Circular references are not garbage
●   In fact, with Perl's reference counting they are
    normally memory leaks.
●   These are any case where a variable keeps alive
    some extra reference to itself:
    –   Self reference: $a = \$a
    –   Linked list:    $a->[0] = [ [], $a, @data ]
●   The first is probably a mistake, the second is a
    properly formed doubly-linked list.
●   Both of them prevent $a from ever being released.
Fix: Weak References
●   Weak ref's do not increment the var's reference
    count.
●   In this case the back-link does not prevent cleaning $a:
      @$a = ( [], $a, @data );
      weaken $a->[1];
●   Weaken the stored slot: a copy of a weak ref is strong.
●   A weak ref becomes undef once its referent is released.
●   isweak returns true for weak ref's.
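A minimal sketch of the effect, assuming a hypothetical Node class whose DESTROY only counts cleanups. The first node weakens its back-link and is released at the end of its scope; the second keeps a strong back-link and is not.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $destroyed = 0;

# Node is a hypothetical class that only counts its own cleanup.
sub Node::DESTROY { ++$destroyed }

my $is_weak;
{
    my $node = bless [], 'Node';
    @$node = ( [], $node, 'payload' );   # slot 1 points back at the node
    weaken( $node->[1] );                # the back-link stops counting
    $is_weak = isweak( $node->[1] ) ? 1 : 0;
}
my $after_weak = $destroyed;             # 1: released at end of scope

{
    my $leak = bless [], 'Node';
    @$leak = ( [], $leak, 'payload' );   # strong back-link keeps it alive
}
my $after_strong = $destroyed;           # still 1: the second node leaked

print "weak: $is_weak, cleaned: $after_weak, then: $after_strong\n";
```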
Aside: Accidentally getting strong
●   Copies are strong references unless they are
    explicitly weakened.
●   This can leave you accidentally keeping items alive
    with things like:
      my @a = grep { defined } @a;
    this leaves @a with strong references that have to
    be explicitly weakened again.
●   See Scalar::Util's POD for dealing with this.
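The copy problem is easy to see with isweak. This sketch uses an arbitrary hash as the target; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $target = { name => 'shared' };

my @links = ( $target );
weaken( $links[0] );                      # element 0 is now weak

my @copy = grep { defined } @links;       # copying makes strong refs

my $was_weak  = isweak( $links[0] ) ? 1 : 0;    # 1
my $copy_weak = isweak( $copy[0]  ) ? 1 : 0;    # 0

weaken( $_ ) for @copy;                   # weaken each element again
my $fixed = isweak( $copy[0] ) ? 1 : 0;   # 1

print "original: $was_weak, copy: $copy_weak, re-weakened: $fixed\n";
```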
Knowing Your Numbers
●   We've all seen code that checks for numeric values
    with a regex like /^\d+$/.
●   Aside from being slow, this simply does not work.
    Exercise: Come up with a working regex that
    gracefully handles all of Perl's numeric types
    including int, float, exponents, hex, and octal along
    with optional whitespace.
●   Better yet, let Perl figure it out for you:
    if( looks_like_number $x ) { … }
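A quick sketch of what looks_like_number accepts; the sample strings are arbitrary. One thing worth noticing: it follows Perl's string-to-number conversion, so a hex string such as '0x1F' is not treated as numeric even though 0x1F is a valid literal in source code.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

my @inputs = ( '42', '-1.5', '1e5', ' 7', 'abc', '', '0x1F' );

# map each input string to 1 (numeric) or 0 (not numeric):
my %numeric
= map { ( $_ => looks_like_number( $_ ) ? 1 : 0 ) } @inputs;

printf "%-6s %d\n", "'$_'", $numeric{ $_ } for @inputs;
```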
Switching on numerics
●   Switches with looks_like_number help parsing and
    make the logic more readable:
    if( looks_like_number $_ )
    {
        ...
    }
    elsif( /$regex/ )
    {
        # deal with text
        ...
    }
Sorting and Sanity Checks
use List::Util   qw( min minstr );
use Scalar::Util qw( looks_like_number );

sub generic_minimum
{
    looks_like_number( $_[0] )
    ? min    @_
    : minstr @_
}

sub numeric_input
{
    my $numstr = get_user_input();

    looks_like_number $numstr
    or die "Not a number: '$numstr'";

    $numstr
}
Anonymous Prototyping
●   set_prototype adjusts the prototype on a subref.
    –   Including anonymous subroutines.
    –   Allows installation of subs that handle block inputs or
        multiple arrays – think of import subs.
●   Another use is removing or modifying misguided
    prototypes in wrappers that call them.
    –   Example is a prototype of “$$” that prevents calling a
        wrapped sub with “@_”.
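A small sketch of both uses, assuming Scalar::Util's argument order of code reference first, prototype second. The sub add_pair is hypothetical; the core prototype function is used to read the result back.

```perl
use strict;
use warnings;
use Scalar::Util qw( set_prototype );

# hypothetical sub whose '$$' prototype gets in a wrapper's way:
sub add_pair($$) { $_[0] + $_[1] }

my $before = prototype( \&add_pair );      # '$$'

set_prototype( \&add_pair, undef );        # undef removes the prototype

my $after = prototype( \&add_pair );       # undef

# '&@' on an anonymous sub, the shape used by block-taking subs:
my $block_sub = set_prototype( sub { scalar @_ }, '&@' );
my $installed = prototype( $block_sub );   # '&@'

print "before: $before, installed: $installed\n";
```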
Bi-polar Variables
●   dualvar is a fast handler for dealing with multimode
    string+numeric data.
●   Returns stringy or numeric portion depending on
    context:
    $a = dualvar ( 90, '/var/tmp' );
    print $a if $a > 80; # prints "/var/tmp"
    or
    sort { $a <=> $b or $a cmp $b } @list;
●   dualvars are faster than blessed refs with overloads
    and offer better encapsulation.
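A runnable sketch of the same idea; the paths and percentages are invented sample data. Each value compares as a number and prints as a string, so one sort block can order by usage and break ties by name.

```perl
use strict;
use warnings;
use Scalar::Util qw( dualvar );

my $usage = dualvar( 90, '/var/tmp' );

my $number = $usage + 0;     # numeric side: 90
my $string = "$usage";       # string side:  /var/tmp

# invented sample data: [ percent used, mount point ]
my @list
= map { dualvar( $_->[0], $_->[1] ) }
( [ 90, '/var/tmp' ], [ 10, '/home' ], [ 90, '/opt' ] );

# highest usage first, ties broken by name:
my @sorted = sort { $b <=> $a or $a cmp $b } @list;

print "@sorted\n";           # /opt /var/tmp /home
```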
But wait, there's more!!!
●   Obvious sanity checks:
●   openhandle returns true for an open filehandle.
    –   validate stdin for interactive sessions.
    –   check for [still] live sockets.
●   isvstring returns true for vstrings (e.g.,
    “v5.16.0”).
●   tainted returns true for tainted values.
●   readonly checks for readonly values or variables.
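A brief sketch of three of these checks, using an in-memory filehandle so that it runs anywhere; tainted is left out because it only means something under taint mode. The helper is_readonly is a hypothetical wrapper, and the function Scalar::Util exports is named readonly.

```perl
use strict;
use warnings;
use Scalar::Util qw( openhandle readonly isvstring );

my $buffer = "in-memory data";
open my $fh, '<', \$buffer
    or die "open: $!";

my $open_before = defined( openhandle( $fh ) ) ? 1 : 0;    # 1
close $fh;
my $open_after  = defined( openhandle( $fh ) ) ? 1 : 0;    # 0

# hypothetical helper: @_ aliases the caller's argument, so a
# literal shows up as readonly and a variable does not.
sub is_readonly { readonly( $_[0] ) ? 1 : 0 }

my $variable = 1;
my @ro = ( is_readonly( $variable ), is_readonly( 1 ) );   # 0 1

my $version    = v5.16.0;
my $is_vstring = isvstring( $version ) ? 1 : 0;            # 1

print "open: $open_before/$open_after ro: @ro vstring: $is_vstring\n";
```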
Managing lists
●   List::Util provides mostly-obvious functions: sum,
    max, min, maxstr, minstr, shuffle, first, and reduce.
●   max and min compare numbers, maxstr and minstr
    handle strings.
●   shuffle randomizes the order of a list – useful for
    security or simulations.
●   first & reduce take a bit more explanation...
First Thing: Why Bother?
●   These can all be written in Pure Perl.
●   Why bother with Yet Another Module and XS?
    –   Most people think of speed, which is true.
    –   These all have simple, clean interfaces that Just Work.
    –   XS encapsulates the in-work data.
    –   Module provides them in one place, once, with POD.
●   So, speed is not the only issue – but it doesn't hurt
    that these are fast.
Second Thing's first()
●   first looks a lot like grep, with a block and list.
●   Unlike grep, first stops after finding the first match.
●   It returns the first scalar that leaves the block true – not
    the block's output!
●   Lists don't have to be data: they can be anything.
    my $odd   = first { $_ % 2 } @itemz;
    my $valid = first { /$rx/ } @regexen;
    my $found = first { foo $_ } @inputz;
    my $obj   = first { $_->valid($data) } @objz
    or die "Invalid data...";
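A small sketch of the two differences from grep, with invented sample numbers and a counter added to the block to show where the scan stops.

```perl
use strict;
use warnings;
use List::Util qw( first );

my @itemz  = ( 2, 4, 7, 8, 9 );
my $checks = 0;

# returns the element (7), not the block's value (1):
my $odd = first { ++$checks; $_ % 2 } @itemz;

# $checks is 3 here: 8 and 9 were never examined.

my @hits = grep { $_ % 2 } @itemz;       # grep scans everything: 7, 9

my $none = first { $_ > 100 } @itemz;    # no match gives undef

print "first: $odd after $checks checks, grep: @hits\n";
```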
first with ~~ for validation
●   Ever get sick of running through if-blocks for
    mutually exclusive switches?
●   first with smart matching is declarative:
    my @bogus = ( [ qw( fork debug ) ], … );
        ...
    if( my $botched = first { $_ ~~ %argz } @bogus )
    {
      local $" = ' ';
      die "Mutually exclusive: @$botched";
    }


●   Hash-slicing the arguments array allows comparing
    invalid values with the same structure.
Working smarter
●   First saves overhead by stopping early.
●   Returning a scalar simplifies the syntax for
    assigning a result.
●   Depending on your data, first on an array may be
    faster than exists on a hash key.
●   Useful for more than iterating data:
    –   Use a list of regexes to determine what type of data is
        being processed.
    –   Lists of objects can be iterated to find the correct parser
        for general input.
Smart Match ~~ first
●   Unlike most Perly boolean operators, smart matching returns
    true or false, not the argument value that left it true.
●   first returns the value that matched:
    my $found = first { $record ~~ $_ } @filterz;
●   $found is the first entry from @filterz that matches the
    record.
●   Filters can be regexen, arrays, hashes, or objects with
    overloaded ~~ matching valid or unusable data.
    –   Use to check edge-cases in testing data handlers.
Inside-out data for a regex
●   Use an inside-out structure to associate arbitrary
    data or state with the regex.
●   Smart matching handles blessed regexen properly:
    works equally well with std regex or object.
    my $regex1 = qr{ ... };
    my $regex2 = qr{ ... };

    $inside{ refaddr $regex1 } = [];

    my @filtrz = ( $regex1, $regex2 );
    my $found  = first { $input ~~ $_ } @filtrz;

    push @{ $inside{ refaddr $found } }, $input;
Use first to pick handlers
●   Say you have records with a variety of fields.
●   A set of arrays with the required fields for handlers
    makes it easy to pick the right one:
    my @keyz = ( [ qw( ... ) ], [ qw( ... ) ] );

    my $found = first { $record ~~ $_ } @keyz
    or die 'Record fails minimum key test';

●   Add a bit of inside-out data and you can dispatch
    the record and its handler in a few lines of code.
Reducing your workload
●   All of the min, max, and sum functions are canned
    versions of reduce.
●   reduce looks like sort, with $a and $b.
●   Empty returns undef, singletons return themselves.
●   Otherwise:
    –   $a, $b are aliased to the first two list values.
    –   The block's result is assigned to $a.
    –   $b is cycled through the remaining list values.
Example: min, max, sum, prod
my @list = ( 1 .. 100 );

my $min = reduce { $a < $b ? $a : $b } @list;
my $max = reduce { $a > $b ? $a : $b } @list;

# sum, product roll the value forward:

my $sum = reduce { $a += $b } @list;
my $prd = reduce { $a *= $b } @list;

# sum of x-squared uses a placeholder:

my $sumx2 = reduce { $a += $b**2 } ( 0, @list );
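The edge cases and the way $a rolls forward can be traced with a short sketch; the @steps array and the word list are illustrative additions. The last line shows that reduce is not limited to arithmetic.

```perl
use strict;
use warnings;
use List::Util qw( reduce );

my $empty  = reduce { $a + $b } ();        # empty list: undef
my $single = reduce { $a + $b } ( 42 );    # singleton: 42, block never runs

# record each call to show $a carrying the running result:
my @steps;
my $sum = reduce { push @steps, "$a+$b"; $a + $b } ( 1 .. 4 );

# keep the longer of each pair; the earlier one wins a tie:
my $longest
= reduce { length( $a ) >= length( $b ) ? $a : $b } qw( ox horse cat );

print "steps: @steps = $sum, longest: $longest\n";
```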
But wait, there's more more!!!
●   List::Util lacks a number of operations that are
    easy to implement in Pure Perl:
    –   unique
    –   interleave, every nth record, groups of N records.
●   Using XS does have advantages, not the least
    having none of us re-write the same Pure Perl.
●   So... we have List::MoreUtils, written by Tassilo
    von Parseval, later maintained by Adam Kennedy.
Taking laziness to XS
●   This module is a kitchen sink of things you've done
    at least once:

    any all none notall true false firstidx
    first_index lastidx last_index
    insert_after
    insert_after_string apply indexes after
    after_incl before before_incl firstval
    first_value lastval last_value each_array
    each_arrayref pairwise natatime mesh zip
    uniq
    distinct minmax part
Indexes and last items
●   first is nice, but to find the last item you need to
    reverse a list, which is expensive.
●   Looking up using indexes with first requires
    $ary[$_], which also gets expensive.
●   last_value, last_index, first_index do what you'd
    expect [novel idea, what?].
●   before and after are more compact versions of
    slices using the results of first_index.
If first is false, use any
●   first returns a list value, which might be false.
●   any() returns true the first time its block is true.
●   Solves tests using first failing on a false list value:
    # $x is 0, $y is 1
    @list = ( 0, 1, 2 );
    $x = first { defined $_ } @list;
    $y = any   { defined $_ } @list;
Unique lists
●   MoreUtils' uniq returns a list in its original order
    (list) or the number of unique values (scalar):
    # 1 2 3 5 4
    my @x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
    # 5
    my $x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
●   Using hash keys gives a random order.
●   Keeping the order in Pure Perl takes a %seen hash
    plus grep, or lots of index operations.
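A sketch of both contexts. To stay within core modules it swaps in the uniq that List::Util exports from version 1.45 on, rather than the List::MoreUtils one the slide describes; that substitution is an assumption of this example. A hand-written %seen version is included for comparison.

```perl
use strict;
use warnings;
use List::Util 1.45 qw( uniq );    # assumption: List::Util new enough to export uniq

my @input = ( 1, 1, 2, 2, 3, 5, 3, 4 );

my @unique = uniq @input;    # list context: order of first appearance kept
my $count  = uniq @input;    # scalar context: how many values survive

# Pure Perl equivalent using a %seen hash:
my %seen;
my @by_hand = grep { ! $seen{ $_ }++ } @input;

print "unique: @unique ($count values)\n";    # 1 2 3 5 4 (5 values)
```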
Relative locations
●   insert_after places an item after the first item for
    which its block passes.
●   insert_after_string uses a string compare, avoiding
    the need for a block.
●   Example: post-insert sentinel values into processed
    lists.
apply: map Without Side-effects
●   One downside to map, sort, & grep is that they
    alias their block variables.
    –   Updating $_ or $a/$b will alter the inputs.
●   apply works like map: extracting the result of a
    block applied to each element in a list.
    –   The difference is that $_ is copied, not aliased.
    –   The inputs are safe from modification.
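apply itself comes from List::MoreUtils; this sketch uses only core Perl to show the aliasing hazard in map and the hand-written copy that apply saves you from writing. The sample strings are invented.

```perl
use strict;
use warnings;

my @raw = ( ' alice ', ' bob ' );

# map aliases $_ to each element, so s/// edits the input array too:
my @aliased = @raw;
my @trimmed = map { s/^\s+|\s+$//g; $_ } @aliased;

# copying $_ first leaves the input untouched:
my @safe_in  = @raw;
my @safe_out = map { ( my $copy = $_ ) =~ s/^\s+|\s+$//g; $copy } @safe_in;

print "aliased input is now: '$aliased[0]'\n";    # 'alice'
print "copied input is still: '$safe_in[0]'\n";   # ' alice '
```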
Merging Lists
●   Pairwise processing of lists uses prototypes to keep
    the syntax saner:
    @sum_xy = pairwise { $a + $b } @x, @y;
    @x = pairwise { $a->($b) } @subz, @valz;
●   Nice for merging key/value pairs, which is what
    mesh does without a block:
    %y = pairwise { ($a,$b) } @keyz, @valz;
    %y = mesh @keyz, @valz;
●   Prototypes require arrays; arrayrefs have to use
    “@$arrayref” syntax.
Iterating Separate Lists
●   each_array generates an iterator that cycles
    through successive values in multiple lists:
    my $each = each_array @a, @b, @c;
    while( my( $a, $b, $c ) = $each->() )
    { … }
●   This avoids having to destroy the lists with shift or
    the overhead of many index accesses.
●   each_arrayref takes arrayref (vs. array) args.
●   Limitation of prototypes: can't mix arrays & refs.
Breaking up is easy to do
●   Partitioning a list is quite doable in Pure Perl but
    gets messy when handling arbitrary lists.
●   part uses a block to select index entries, returning
    an array[ref] segregated by the block output:
    # [ 1, 3, 5, 7 ], [ 2, 4, 6, 8 ]
    my $i = 0;
    my @partz = part { $i ++ % 2 } ( 1 .. 8 );
●   using %3 generates three lists.
●   Block can use regexen (including parsing results),
    looks_like_number, error levels, whatever.
POD is your friend
●   Actually, the module authors are: All of these
    modules are well documented, with good
    examples.
●   Especially for MoreUtils: Take the time to run the
    POD code in a debugger to see what it does.
CPAN & the Power of Perl
●   Code on CPAN isn't mouldy just because it's old.
    –   The modules are kept up to date.
    –   The guts of Perl have remained stable enough to keep
        the XS working.
●   This is due to a lot of effort from module owners
    and Perl hackers.
Summary
●   Smart matches did not obviate “first”, they work
    together.
●   Utils work with newer features like smart
    matching and switches.
●   Any time you find yourself hacking indexes, it's
    probably time to think about these modules.
●   POD is your friend – check the modules for
    examples (and good examples of writing XS).
●   Truly lazy wheels are not re-invented.

Our Friends the Utils: A highway traveled by wheels we didn't re-invent.

  • 1. Our Friends the Utils: A highway traveled by wheels we didn't re-invent. Steven Lembark Workhorse Computing lembark@wrkhors.com
  • 2. Meet the Utils ● Scalar::Util & List::Util were first written in by the ancient Prophet of Barr (c. 1997). ● The modules provide often-requested features that were not worth modifying Perl itself to offer. ● Later, List::MoreUtils added features that List::Util does not include. ● If the Sound of Perl is an un-bloodied wall, the Utils are a superhighway traveled by truly lazy wheels.
  • 3. Mixing old and new ● Several features in v5.10+ overlap Util features. – Smart matches are the most obvious, and are usually compared with List::Util::first. – New features are not replacements, but work well with the modules. – Examples here show how to use the modules with smart matching, switches. ● What's important to notice is that these modules remain relevant.
  • 4. Scalar::Util Provides introspection for scalars: – Is a filehandle [still] open? – The address, type, and class of a variable. – Is a value “numeric” according to Perl? – Does the variable contain readonly or tainted data? – Tools for managing weak references or modifying prototypes. ● Handling these in Pure Perl is messy, slow, or error-prone.
  • 5. Dealing with ref's & objects ● Collectively these replace “ref” or stringified references with a simpler, cleaner interface. ● The problem with ref and stringified objects is that they return different data for objects or “plain” refs. – Stringified refs are “ARRAY(0x29eba90)” when plain and “Foobar=ARRAY(0x29eba90)” when blessed, unless overloading gets in the way. – ref returns the base type, unless the reference is blessed, in which case it returns the class. ● blessed, refaddr, & reftype are consistent.
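A minimal sketch of that inconsistency and of the three consistent replacements. The class names Foobar and Bletch are made up for illustration; only Scalar::Util's documented blessed, reftype, and refaddr are used.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype refaddr );

my $plain  = [ 1, 2, 3 ];
my $object = bless [ 1, 2, 3 ], 'Foobar';   # hypothetical class name

# ref answers a different question depending on blessing.
my $ref_plain  = ref $plain;                # base type for a plain ref
my $ref_object = ref $object;               # class for a blessed ref

# Each Scalar::Util function answers one question, blessed or not.
my $class_plain  = blessed( $plain );       # undef: not an object
my $class_object = blessed( $object );      # the class
my $type_plain   = reftype( $plain );       # underlying data type
my $type_object  = reftype( $object );      # same underlying data type

# The address survives a re-bless; the stringified form does not.
my $addr_before = refaddr( $object );
my $text_before = "$object";
bless $object, 'Bletch';                    # hypothetical second class
my $addr_after  = refaddr( $object );
my $text_after  = "$object";

printf "%s became %s, address still 0x%x\n",
    $text_before, $text_after, $addr_after;
```

The stringified value changes with the class while refaddr does not, which is why the later slides use it for hash keys.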
  • 6. Blessed is the Object ● blessed returns a class or undef. ● This simplifies sanity checks: blessed $_[0] or die 'Non-object...'; ● Construction with objects for types: bless $x, blessed $proto || $proto; avoids classes like “ARRAY(0xab1234)”. ● Check for blessed before “can” to avoid errors: blessed $x && $x->can( $method ) or die ...
  • 7. Blessed Structures ● ref does not return the base type of a blessed ref. ● reftype returns the data type, regardless of blessing. ● Works nicely with switches:
        given( reftype $thing ) # blessed or not, same reftype
        {
            when( undef    ) { die "Not a reference: '$thing'" }
            when( 'ARRAY'  ) { ... }
            when( 'HASH'   ) { ... }
            when( 'SCALAR' ) { ... }
            die "Un-usable data type: '$_'";
        }
  • 8. Blessed Matches ● Smart-matching an object requires an overloading. ● Developers would like to QA their modules to validate the overload is available. ● A generic test is simple: blessed scalars with a '~~' overload are usable. Overloads are not ordinary methods, so ask overload::Method rather than can( '~~' ). ● Writing this test with only ref is a pain. ● With Scalar::Util it is blessedly simple: blessed $var && overload::Method( $var, '~~' ) or die ...
  • 9. The guts of “inside out” classes ● Virtual addresses are unique during execution. ● Make useful keys for associating external data. ● Problem is that stringified refs include too much data: – Plain : ARRAY(0XEAA750) – Blessed: Foo=ARRAY(0XEAA750) – Re-blessed: Bletch=ARRAY(0XEAA750) ● The extra data makes them unusable as keys. ● Parsing the ref's to extract the address is too slow.
  • 10. The key to your guts: refaddr ● refaddr returns only the address portion of a ref, as a plain integer: – Previous values all reduce to the same address: 0XEAA750 ● Note the lack of package or type. ● This is not affected by [re]blessing the variable. ● This leaves $data{ refaddr $ref } a stable key over the life cycle of a ref or object.
  • 11. use Scalar::Util qw( refaddr );
        use NEXT; # provides $obj->NEXT::DESTROY
        my %obj2data = (); # private cache for object data.
        sub set { my ( $obj, $data ) = @_; $obj2data{ refaddr $obj } = $data; return }
        sub get { $obj2data{ refaddr $_[0] } }
        # have to manually clear out the cache.
        DESTROY { my $obj = shift; delete $obj2data{ refaddr $obj }; $obj->NEXT::DESTROY; }
  • 12. Circular references are not garbage ● In fact, with Perl's reference counting they are normally memory leaks. ● These are any case where a variable keeps alive some extra reference to itself: – Self reference: $a = \$a – Linked list: $a->[0] = [ [], $a, @data ] ● The first is probably a mistake, the second is a properly formed doubly-linked list. ● Both of them prevent $a from ever being released.
  • 13. Fix: Weak References ● Weak ref's do not increment the var's reference count. ● In this case the back link does not prevent cleaning $a: @$a = ( [], $a, @data ); weaken $a->[1]; ● Weaken the stored element itself: copying an already-weakened ref into the list would store a strong copy. ● A weak ref becomes undef once its referent is released. ● isweak returns true for weak ref's.
  • 14. Aside: Accidentally getting strong ● Copies are strong references unless they are explicitly weakened. ● This can leave you accidentally keeping items alive with things like: my @a = grep { defined } @a; this leaves @a with strong references that have to be explicitly weakened again. ● See Scalar::Util's POD for dealing with this.
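A small sketch of the two behaviours described in the last two slides: a weakened reference stops counting, and a copy of it counts again. Variable names are illustrative only.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $node = { name => 'child' };
my $link = $node;                 # second reference to the same hash
weaken( $link );                  # $link no longer holds the hash alive

my $link_is_weak = isweak( $link ) ? 1 : 0;

my $copy = $link;                 # a copy of a weak ref is strong again
my $copy_is_weak = isweak( $copy ) ? 1 : 0;

undef $copy;                      # drop the accidental strong copy
undef $node;                      # last strong reference gone: hash is released

my $link_survived = defined $link ? 1 : 0;

print "link weak: $link_is_weak, copy weak: $copy_is_weak, ",
      "link defined after release: $link_survived\n";
```

Had $copy been left alone, the hash would have stayed alive after undef $node, which is the accidental leak the aside warns about.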
  • 15. Knowing Your Numbers ● We've all seen code that checks for numeric values with a regex like /^\d+$/. ● Aside from being slow, this simply does not work. Exercise: Come up with a working regex that gracefully handles all of Perl's numeric types including int, float, exponents, hex, and octal along with optional whitespace. ● Better yet, let Perl figure it out for you: if( looks_like_number $x ) { … }
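A quick sketch of what looks_like_number accepts, using made-up sample strings. It follows Perl's own string-to-number rules, so strings that only look numeric as source-code literals (hex notation, underscore separators) are reported as non-numeric and would still need hex() or oct().

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Illustrative inputs; none of these come from the slides.
my @samples = ( '42', '-1.5e3', '0x1F', '1_000', 'abc' );

my %numeric;
for my $str ( @samples )
{
    $numeric{ $str } = looks_like_number( $str ) ? 1 : 0;
}

printf "%-8s %s\n", "'$_'", $numeric{ $_ } ? 'number' : 'not a number'
    for @samples;
```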
  • 16. Switching on numerics ● Switches with looks_like_number help parsing and make the logic more readable:
        if( looks_like_number $_ ) { … }
        elsif( /$regex/ ) { ... } # deal with text
  • 17. Sorting and Sanity Checks
        use List::Util qw( min minstr );
        use Scalar::Util qw( looks_like_number );
        sub generic_minimum { looks_like_number $_[0] ? min @_ : minstr @_ }
        sub numeric_input
        {
            my $numstr = get_user_input;
            looks_like_number $numstr or die "Not a number: '$numstr'";
            $numstr
        }
  • 18. Anonymous Prototyping ● set_prototype adjusts the prototype on a subref. – Including anonymous subroutines. – Allows installation of subs that handle block inputs or multiple arrays – think of import subs. ● Another use is removing or modifying mis-guided prototypes in wrappers that call them. – Example is a prototype of “$$” that prevents calling a wrapped sub with “@_”.
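A minimal sketch of both uses, assuming a hypothetical add2 sub that carries an unwanted “$$” prototype and an anonymous helper that should accept a block. Only set_prototype and the built-in prototype function are used.

```perl
use strict;
use warnings;
use Scalar::Util qw( set_prototype );

# Hypothetical sub with a "$$" prototype we want to get rid of.
sub add2($$) { $_[0] + $_[1] }

my $proto_before = prototype( \&add2 );

# Passing undef removes the prototype entirely.
set_prototype( \&add2, undef );
my $proto_after = prototype( \&add2 );

# Give an anonymous sub a block-plus-list prototype;
# set_prototype returns the same code reference.
my $apply_each = set_prototype(
    sub { my $code = shift; map { $code->( $_ ) } @_ },
    '&@'
);
my $proto_anon = prototype( $apply_each );

my @doubled = $apply_each->( sub { $_[0] * 2 }, 1, 2, 3 );

print "before: $proto_before, anon: $proto_anon, doubled: @doubled\n";
```

Prototypes take effect at compile time, so a changed prototype only influences calls compiled after the change; calls through a code reference, as above, ignore prototypes altogether.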
  • 19. Bi-polar Variables ● dualvar is a fast handler for dealing with multimode string+numeric data. ● Returns stringy or numeric portion depending on context: $a = dualvar ( 90, '/var/tmp' ); print $a if $a > 80; # prints “/var/tmp” or sort { $a <=> $b or $a cmp $b } @list; ● dualvar's are faster than blessed ref's with overloads and offer better encapsulation.
  • 20. But wait, there's more!!! ● Obvious sanity checks: ● openhandle returns true for an open filehandle. – validate stdin for interactive sessions. – check for [still] live sockets. ● isvstring returns true for vstrings (e.g., “v5.16.0”). ● tainted returns true for tainted values. ● readonly checks for readonly values or variables.
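A short sketch of two of these checks, assuming an in-memory filehandle as a stand-in for a real file or socket. openhandle hands back the handle itself when it is open and undef otherwise, so the result is tested with defined.

```perl
use strict;
use warnings;
use Scalar::Util qw( openhandle isvstring );

# In-memory handle used as a stand-in for a file or socket.
my $text = "line one\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my $open_before = defined( openhandle( $fh ) ) ? 1 : 0;
close $fh;
my $open_after  = defined( openhandle( $fh ) ) ? 1 : 0;

# A vstring literal keeps its "vstring-ness"; a quoted string never had it.
my $vstring = v5.16.0;
my $string  = '5.16.0';
my $v_check = isvstring( $vstring ) ? 1 : 0;
my $s_check = isvstring( $string  ) ? 1 : 0;

# Pick a format based on the check; %vd prints a vstring as dotted integers.
my $shown = sprintf( ( $v_check ? '%vd' : '%s' ), $vstring );

print "open: $open_before then $open_after; vstring shown as $shown\n";
```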
  • 21. Managing lists ● List::Util provides mostly-obvious functions: sum, max, min, maxstr, minstr, shuffle, first, and reduce. ● max and min compare numbers, maxstr and minstr handle strings. ● shuffle randomizes the order of a list – useful for security or simulations. ● first & reduce take a bit more explanation...
  • 22. First Thing: Why Bother? ● These can all be written in Pure Perl. ● Why bother with Yet Another Module and XS? – Most people think of speed, which is true. – These all have simple, clean interfaces that Just Work. – XS encapsulates the in-work data. – Module provides them in one place, once, with POD. ● So, speed is not the only issue –but it doesn't hurt that these are fast.
  • 23. Second Thing's first() ● first looks a lot like grep, with a block and list. ● Unlike grep, first stops after finding the first match. ● It returns the first scalar that leaves the block true – not the block's output! ● Lists don't have to be data: they can be anything.
        my $odd   = first { $_ % 2 } @itemz;
        my $valid = first { /$rx/ } @regexen;
        my $found = first { foo $_ } @inputz;
        my $obj   = first { $_->valid($data) } @objz or die "Invalid data...";
  • 24. first with ~~ for validation ● Ever get sick of running through if-blocks for mutually exclusive switches? ● first with smart matching is declarative:
        my @bogus = ( [ qw( fork debug ) ], … );
        ...
        if( my $botched = first { $_ ~~ %argz } @bogus )
        {
            local $" = ' ';
            die "Mutually exclusive: @$botched";
        }
    ● Hash-slicing the arguments array allows comparing invalid values with the same structure.
  • 25. Working smarter ● First saves overhead by stopping early. ● Returning a scalar simplifies the syntax for assigning a result. ● Depending on your data, first on an array may be faster than exists on a hash key. ● Useful for more than iterating data: – Use a list of regexes to determine what type of data is being processed. – Lists of objects can be iterated to find the correct parser for general input.
  • 26. Smart Match ~~ first ● Unlike most Perly boolean operators, smart matching returns true or false, not the argument value that left it true. ● first returns the value that matched: my $found = first { $record ~~ $_ } @filterz; ● $found is the first entry from @filterz that matches the record. ● Filters can be regexen, arrays, hashes, or objects with overloaded ~~ matching valid or unusable data. – Use to check edge-cases in testing data handlers.
  • 27. Inside-out data for a regex ● Use an inside-out structure to associate arbitrary data or state with the regex. ● Smart matching handles blessed regexen properly: works equally well with std regex or object.
        my $regex1 = qr{ ... };
        my $regex2 = qr{ ... };
        $inside{ refaddr $regex1 } = [];
        my @filtrz = ( $regex1, $regex2 );
        my $found = first { $input ~~ $_ } @filtrz;
        push @{ $inside{ refaddr $found } }, $input;
  • 28. Use first to pick handlers ● Say you have records with a variety of fields. ● A set of arrays with the required fields for handlers makes it easy to pick the right one: my @keyz = ( [ qw( ... ) ], [ qw( ... ) ] ); my $found = first { $record ~~ $_ } @keyz or die 'Record fails minimum key test'; ● Add a bit of inside-out data and you can dispatch the record and its handler in a few lines of code.
  • 29. Reducing your workload ● All of the min, max, and sum functions are canned versions of reduce. ● reduce looks like sort, with $a and $b. ● Empty returns undef, singletons return themselves. ● Otherwise: – $a, $b are aliased to the first two list values. – The block's result is assigned to $a. – $b is cycled through the remaining list values.
  • 30. Example: min, max, sum, prod
        my @list = ( 1 .. 100 );
        my $min = reduce { $a < $b ? $a : $b } @list;
        my $max = reduce { $a > $b ? $a : $b } @list;
        # sum, product roll the value forward:
        my $sum = reduce { $a += $b } @list;
        my $prd = reduce { $a *= $b } @list;
        # sum of x-squared uses a placeholder:
        my $sumx2 = reduce { $a += $b**2 } ( 0, @list );
  • 31. But wait, there's more more!!! ● List::Util lacks a number of operations that are easy to implement in Pure Perl: – unique – interleave, every nth record, groups of N records. ● Using XS does have advantages, not the least having none of us re-write the same Pure Perl. ● So... we have List::MoreUtils, written by Tassilo von Parseval, later maintained by Adam Kennedy.
  • 32. Taking laziness to XS ● This module is a kitchen sink of things you've done at least once: any all none notall true false firstidx first_index lastidx last_index insert_after insert_after_string apply indexes after after_incl before before_incl firstval first_value lastval last_value each_array each_arrayref pairwise natatime mesh zip uniq distinct minmax part
  • 33. Indexes and last items ● first is nice, but to find the last item you need to reverse a list, which is expensive. ● Looking up using indexes with first requires $ary[$_], which also gets expensive. ● last_value, last_index, first_index do what you'd expect [novel idea, what?]. ● before and after are more compact versions of slices using the results of first_index.
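A brief sketch of these lookups on a made-up status list, assuming List::MoreUtils is installed from CPAN (it is not a core module). The block sees each element in $_, as with first.

```perl
use strict;
use warnings;
use List::MoreUtils qw( first_index last_index last_value before after );

# Illustrative data only.
my @log = qw( ok ok warn ok fail ok );

# Positions of the first and last entries that are not 'ok'.
my $first_bad = first_index { $_ ne 'ok' } @log;
my $last_bad  = last_index  { $_ ne 'ok' } @log;

# The last matching value itself, without reversing the list.
my $last_bad_value = last_value { $_ ne 'ok' } @log;

# Everything before the first match, and everything after it.
my @clean_prefix = before { $_ ne 'ok' } @log;
my @after_fail   = after  { $_ eq 'fail' } @log;

print "first bad at $first_bad, last bad at $last_bad ($last_bad_value)\n";
print "prefix: @clean_prefix; after fail: @after_fail\n";
```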
  • 34. If first is false, use any ● first returns a list value, which might be false. ● any() returns true the first time its block is true. ● Solves tests using first failing on a false list value:
        # $x is 0, $y is 1
        @list = ( 0, 1, 2 );
        $x = first { defined $_ } @list;
        $y = any { defined $_ } @list;
  • 35. Unique lists ● MoreUtils' uniq returns a list in its original order (list) or the number of unique values (scalar):
        # 1 2 3 5 4
        my @x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
        # 5
        my $x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
    ● Using hash keys gives a random order. ● Any Pure Perl approach requires sort or lots of index operations.
  • 36. Relative locations ● insert_after places an item after the first item for which its block passes. ● insert_after_string uses a string compare, avoiding the need for a block. ● Example: post-insert sentinel values into processed lists.
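A minimal sketch of both functions on a made-up list of processing steps, assuming List::MoreUtils is installed. Both modify the array in place, so the last argument has to be a real array rather than a list.

```perl
use strict;
use warnings;
use List::MoreUtils qw( insert_after insert_after_string );

# Illustrative step names only.
my @steps = qw( parse validate load );

# Block form: insert after the first element for which the block is true.
insert_after { $_ eq 'validate' } 'checkpoint' => @steps;

# String form: same idea, with a plain string comparison instead of a block.
insert_after_string 'parse', 'log' => @steps;

print "@steps\n";
```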
  • 37. apply: map Without Side-effects ● One downside to map, sort, & grep is that they alias their block variables. – Updating $_ or $a/$b will alter the inputs. ● apply works like map: it runs a block against each element in a list and returns the resulting values of $_. – The difference is that $_ is copied, not aliased. – The inputs are safe from modification.
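A side-by-side sketch of the aliasing difference, using made-up strings and assuming List::MoreUtils is installed. Note that apply returns the modified copies of $_ rather than the block's return value.

```perl
use strict;
use warnings;
use List::MoreUtils qw( apply );

# Illustrative inputs with padding to strip.
my @raw_for_apply = ( ' foo ', ' bar ' );
my @raw_for_map   = ( ' foo ', ' bar ' );

# apply works on copies: the inputs keep their padding.
my @trimmed = apply { s/^\s+|\s+$//g } @raw_for_apply;

# map aliases $_: the substitution edits the inputs in place, and the
# block's value is the substitution count, not the trimmed string.
my @counts = map { s/^\s+|\s+$//g } @raw_for_map;

print "apply gave [@trimmed], inputs still [@raw_for_apply]\n";
print "map gave [@counts], inputs now [@raw_for_map]\n";
```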
  • 38. Merging Lists ● Pairwise processing of lists uses prototypes to keep the syntax saner: @sum_xy = pairwise { $a + $b } @x, @y; @x = pairwise { $a->($b) } @subz, @valz; ● Nice for merging key/value pairs, which is what mesh does without a block: %y = pairwise { ($a,$b) } @keyz, @valz; %y = mesh @keyz, @valz; ● Prototypes require arrays; arrayrefs have to use “@$arrayref” syntax.
  • 39. Iterating Separate Lists ● each_array generates an iterator that cycles through successive values in multiple lists: my $each = each_array @a, @b, @c; while( my( $a, $b, $c ) = $each->() ) { … } ● This avoids having to destroy the lists with shift or the overhead of many index accesses. ● each_arrayref takes arrayref (vs. array) args. ● Limitation of prototypes: can't mix arrays & refs.
  • 40. Breaking up is easy to do ● Partitioning a list is quite doable in Pure Perl but gets messy when handling arbitrary lists. ● part uses a block to select index entries, returning an array[ref] segregated by the block output:
        # [ 1, 3, 5, 7 ], [ 2, 4, 6, 8 ]
        my @partz = part { $i ++ % 2 } ( 1 .. 8 );
    ● using %3 generates three lists. ● Block can use regexen (including parsing results), looks_like_number, error levels, whatever.
  • 41. POD is your friend ● Actually, the module authors are: All of these modules are well documented, with good examples. ● Especially for MoreUtils: Take the time to run the POD code in a debugger to see what it does.
  • 42. CPAN & the Power of Perl ● Code on CPAN isn't mouldy just because it's old. – The modules are kept up to date. – The guts of Perl have remained stable enough to keep the XS working. ● This is due to a lot of effort from module owners and Perl hackers.
  • 43. Summary ● Smart matches did not obviate “first”, they work together. ● Utils work with newer features like smart matching and switches. ● Any time you find yourself hacking indexes, it's probably time to think about these modules. ● POD is your friend – check the modules for examples (and good examples of writing XS). ● Truly lazy wheels are not re-invented.