Our Friends the Utils: A highway traveled by wheels we didn't re-invent.

Steven Lembark
Workhorse Computing
lembark@wrkhors.com

Meet the Utils
● Scalar::Util & List::Util were first written by the
ancient Prophet of Barr (c. 1997).
● The modules provide often-requested features that
were not worth modifying Perl itself to offer.
● Later, List::MoreUtils added features that List::Util
does not include.
● If the Sound of Perl is an un-bloodied wall, the
Utils are a superhighway traveled by truly lazy
wheels.

Mixing old and new
● Several features in v5.10+ overlap Util features.
– Smart matches are the most obvious, and are usually
compared with List::Util::first.
– New features are not replacements, but work well with
the modules.
– Examples here show how to use the modules with smart
matching and switches.
● What's important to notice is that these modules
remain relevant.

Scalar::Util
● Provides introspection for scalars:
– Is a filehandle [still] open?
– The address, type, and class of a variable.
– Is a value “numeric” according to Perl?
– Does the variable contain readonly or tainted data?
– Tools for managing weak references or modifying
prototypes.
● Handling these in Pure Perl is messy, slow, or
error-prone.

Dealing with ref's & objects
● Collectively these replace “ref” or stringified
references with a simpler, cleaner interface.
● The problem with ref and stringified objects is that
they return different data for objects or “plain” refs.
– Stringified refs are “ARRAY(0x29eba90)” when plain and
“Foobar=ARRAY(0x29eba90)” once blessed, unless
overloading gets in the way.
– ref returns the base type (e.g., “ARRAY”), unless the
reference is blessed, in which case it returns the class.
● blessed, refaddr, & reftype are consistent.
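A minimal sketch contrasting ref with the three Scalar::Util functions on the same data. The class names Foobar and Bletch are throwaway examples borrowed from the slides.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype refaddr );

my $plain = [ 1, 2, 3 ];
my $obj   = bless [ 1, 2, 3 ], 'Foobar';   # throwaway class name

# ref answers a different question depending on blessing.
my $ref_plain = ref $plain;                # 'ARRAY'
my $ref_obj   = ref $obj;                  # 'Foobar'

# blessed, reftype, and refaddr each answer exactly one question.
my $class_plain = blessed $plain;          # undef: not an object
my $class_obj   = blessed $obj;            # 'Foobar'
my $type_obj    = reftype $obj;            # 'ARRAY', blessed or not

# refaddr ignores the package, so re-blessing leaves it unchanged.
my $addr_before = refaddr $obj;
bless $obj, 'Bletch';
my $addr_after  = refaddr $obj;            # same number as $addr_before

printf "%s %s %s %s\n", $ref_plain, $ref_obj, $type_obj,
    $addr_before == $addr_after ? 'same address' : 'moved';
```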

Blessed is the Object
● blessed returns a class or undef.
● This simplifies sanity checks:
blessed $_[0] or die 'Non-object...';
● Constructors that accept either an object or a class name:
bless $x, blessed $proto || $proto;
avoid classes like “ARRAY(0xab1234)”.
● Check for blessed before “can” to avoid errors on
unblessed references:
blessed $x && $x->can( $method ) or die ...
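A sketch of both idioms together, assuming a hypothetical Counter class. new accepts either the class name or an existing object, and the final map shows blessed guarding can, so an unblessed array ref is skipped instead of dying with "Can't call method on unblessed reference".

```perl
use strict;
use warnings;

package Counter;    # hypothetical class for illustration
use Scalar::Util qw( blessed );

sub new
{
    my ( $proto, %args ) = @_;

    # blessed returns the class of an object, or undef for a class name.
    my $class = blessed $proto || $proto;

    return bless { count => $args{ count } // 0 }, $class;
}

sub increment { $_[0]{ count }++ }

package main;
use Scalar::Util qw( blessed );

my $first  = Counter->new( count => 5 );
my $second = $first->new;    # same class as $first, fresh data

# Only real objects that can increment pass; a class-name string and
# an unblessed array ref both fail the blessed test first.
my @checked = map { ( blessed $_ && $_->can( 'increment' ) ) ? 1 : 0 }
    ( $second, 'Counter', [] );
```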

Blessed Structures
● ref does not return the base type of a blessed ref.
● reftype returns the data type, regardless of blessing.
● Works nicely with switches:
use feature 'switch';
use Scalar::Util qw( reftype );

given( reftype $thing )   # blessed or not, same reftype
{
    when( undef )    { die "Not a reference: '$thing'" }

    when( 'ARRAY' )  { ... }
    when( 'HASH' )   { ... }
    when( 'SCALAR' ) { ... }

    die "Un-usable data type: '$_'";
}
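given/when has been experimental since Perl 5.18, so where it is unavailable or unwelcome, a dispatch table keyed on reftype gives the same shape. This is a swapped-in technique, not the slide's; the handlers and the describe sub are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype );

# Hypothetical handlers: each summarizes a reference of one base type.
my %handler_for = (
    ARRAY  => sub { 'array of ' . scalar( @{ $_[0] } ) },
    HASH   => sub { 'hash with ' . scalar( keys %{ $_[0] } ) . ' keys' },
    SCALAR => sub { 'scalar holding ' . ${ $_[0] } },
);

sub describe
{
    my ( $thing ) = @_;

    my $type = reftype $thing;    # blessed or not, same base type
    defined $type or die "Not a reference: '$thing'\n";

    my $handler = $handler_for{ $type }
        or die "Un-usable data type: '$type'\n";

    return $handler->( $thing );
}

my @summaries = (
    describe( [ 1, 2, 3 ] ),                 # array of 3
    describe( bless( { a => 1 }, 'Foo' ) ),  # hash with 1 keys
    describe( \'text' ),                     # scalar holding text
);
```

Adding a new type is one more hash entry rather than another when clause.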

Blessed Matches
● Smart-matching an object requires overloading ~~ in
its class.
● Developers would like to QA their modules to
validate the overload is available.
● A generic test is simple: blessed scalars for which
overload::Method( $var, '~~' ) returns a method are usable.
● Writing this test with only ref is a pain.
● With Scalar::Util it is blessedly simple:
blessed $var && overload::Method( $var, '~~' )
or die ...

The guts of “inside out” classes
● Addresses are unique among live references.
● They make useful keys for associating external data.
● Problem is that stringified refs include too much data:
– Plain : ARRAY(0XEAA750)
– Blessed: Foo=ARRAY(0XEAA750)
– Re-blessed: Bletch=ARRAY(0XEAA750)
● The extra data makes them unusable as keys.
● Parsing the ref's to extract the address is too slow.

The key to your guts: refaddr
● refaddr returns only the address portion of a ref,
as a plain integer:
– Previous values all give 15378256 (i.e., 0xEAA750).
● Note the lack of package or type.
● This is not affected by [re]blessing the variable.
● This leaves $data{ refaddr $ref } a stable key over
the life cycle of a ref or object.

use strict;
use warnings;
use NEXT;
use Scalar::Util qw( refaddr );

my %obj2data = ();  # private cache for object data.

sub set
{
    my ( $obj, $data ) = @_;
    $obj2data{ refaddr $obj } = $data;
    return
}

sub get
{
    $obj2data{ refaddr $_[0] }
}

# have to manually clear out the cache.

sub DESTROY
{
    my $obj = shift;
    delete $obj2data{ refaddr $obj };
    $obj->NEXT::DESTROY;
}
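A self-contained sketch of a complete inside-out class, assuming a hypothetical Tagged class plus a Tagged::Sub subclass used only for re-blessing. It checks that the data survives re-blessing and that DESTROY empties the cache.

```perl
use strict;
use warnings;

package Tagged;             # hypothetical inside-out class
use Scalar::Util qw( refaddr );

my %label_of;               # private: refaddr => label

sub new
{
    my ( $class, $label ) = @_;
    my $self = bless \do { my $anon }, $class;   # object holds no data itself
    $label_of{ refaddr $self } = $label;
    return $self;
}

sub label      { $label_of{ refaddr $_[0] } }
sub live_count { scalar keys %label_of }

# Without this the cache leaks one entry per object.
sub DESTROY    { delete $label_of{ refaddr $_[0] } }

package Tagged::Sub;        # hypothetical subclass, only for re-blessing
our @ISA = ( 'Tagged' );

package main;

my ( $label_after_rebless, $live_inside );
{
    my $obj = Tagged->new( 'first' );
    bless $obj, 'Tagged::Sub';            # the stringified ref changes...
    $label_after_rebless = $obj->label;   # ...but refaddr, and the data, do not
    $live_inside         = Tagged->live_count;
}
my $live_after = Tagged->live_count;      # DESTROY removed the entry

print "$label_after_rebless $live_inside $live_after\n";   # first 1 0
```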

Circular references are not garbage
● In fact, with Perl's reference counting they are
normally memory leaks.
● A cycle is any structure that keeps alive an extra
reference to itself:
– Self reference: $a = \$a
– Linked list: $a->[0] = [ [], $a, @data ]
● The first is probably a mistake, the second is a
properly formed doubly-linked list.
● Both of them prevent $a from ever being released.

Fix: Weak References
● Weak ref's do not increment the var's reference
count.
● In this case the back-link in $a->[1] does not prevent
cleaning $a:
@$a = ( [], $a, @data );
weaken( $a->[1] );
● Weaken the array element itself; a copy of a weak ref
is strong.
● A weak ref becomes undef once its target is freed.
● isweak returns true for weak ref's.
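A sketch of a weakened back-link in a two-node structure, using a hypothetical Node class whose DESTROY counts releases so the cleanup is visible. Removing the weaken line leaves the counter at 0 until global destruction.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

package Node;    # hypothetical class: counts how many nodes were freed
our $destroyed = 0;
sub new     { my ( $class, %args ) = @_; bless { %args }, $class }
sub DESTROY { $destroyed++ }

package main;

my $backlink_was_weak;
{
    my $parent = Node->new( name => 'parent' );
    my $child  = Node->new( name => 'child', parent => $parent );
    $parent->{child} = $child;

    # Without this the parent <-> child cycle keeps both alive.
    weaken( $child->{parent} );
    $backlink_was_weak = isweak( $child->{parent} );
}
# Both nodes were released when the block exited.
print "destroyed: $Node::destroyed\n";   # destroyed: 2
```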

Aside: Accidentally getting strong
● Copies are strong references unless they are
explicitly weakened.
● This can leave you accidentally keeping items alive
with things like:
@a = grep { defined } @a;
This leaves @a with strong references that have to
be explicitly weakened again.
● See Scalar::Util's POD for dealing with this.
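A short demonstration of the trap the Scalar::Util POD warns about: grep hands back elements that list assignment copies, the copies are strong, and weaken has to be applied again. The data is arbitrary.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $target = { name => 'kept alive elsewhere' };
my @refs   = ( $target, $target );
weaken( $_ ) for @refs;                 # weakens each element in place

my $weak_before = isweak( $refs[0] );   # true

# Copying the list out and back makes every element strong again.
@refs = grep { defined } @refs;
my $weak_after = isweak( $refs[0] );    # false

# Re-weaken explicitly after the copy.
weaken( $_ ) for @refs;
my $weak_again = isweak( $refs[0] );    # true
```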

Knowing Your Numbers
● We've all seen code that checks for numeric values
with a regex like /^\d+$/.
● Aside from being slow, this simply does not work:
it rejects “-1”, “3.14”, and “1e5”.
Exercise: Come up with a working regex that
gracefully handles all of Perl's numeric strings
including integers, floats, and exponents, along
with optional leading whitespace.
● Better yet, let Perl figure it out for you:
if( looks_like_number $x ) { ... }
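A quick check of several arbitrary sample strings. '0x10' and '1_000' are rejected because hex prefixes and underscores are understood only in numeric literals in source code, not when Perl converts a string to a number.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Several of these defeat /^\d+$/ even though Perl treats them as numbers.
my @samples = ( '42', '-3.5', '1e5', ' 42', '.5', '0x10', '1_000', 'abc', '' );

my %numeric = map { ( $_ => looks_like_number( $_ ) ? 1 : 0 ) } @samples;

printf "%-8s %s\n", "'$_'", $numeric{ $_ } ? 'number' : 'not a number'
    for @samples;
```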

Switching on numerics
● Switches with looks_like_number help parsing and
make the logic more readable:
if( looks_like_number $_ )
{
    ...
}
elsif( /$regex/ )
{
    # deal with text
    ...
}

Sorting and Sanity Checks
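use List::Util   qw( min minstr );
use Scalar::Util qw( looks_like_number );

sub generic_minimum
{
    looks_like_number $_[0] ? min @_ : minstr @_
}

sub numeric_input
{
    my $numstr = get_user_input();   # hypothetical input routine

    looks_like_number $numstr
    or die "Not a number: '$numstr'";

    $numstr
}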

Anonymous Prototyping
● set_prototype adjusts the prototype on a subref.
– Including anonymous subroutines.
– Allows installation of subs that handle block inputs or
multiple arrays – think of import subs.
● Another use is removing or modifying misguided
prototypes in wrappers that call them.
– Example: a prototype of “$$” prevents calling a
wrapped sub with “@_”.

Bi-polar Variables
● dualvar is a fast handler for dealing with multimode
string+numeric data.
● Returns the string or numeric portion depending on
context:
$a = dualvar( 90, '/var/tmp' );
print $a if $a > 80;   # prints "/var/tmp"
or
sort { $a <=> $b or $a cmp $b } @list;
● dualvars are faster than blessed refs with overloads
and offer better encapsulation.
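A sketch using dualvar as a lightweight record. The priorities and labels are hypothetical; the sort uses the number first and falls back to the string for ties.

```perl
use strict;
use warnings;
use Scalar::Util qw( dualvar );

# Hypothetical jobs: numeric priority plus a display label.
my @jobs = (
    dualvar( 20, 'backup'  ),
    dualvar( 90, 'alert'   ),
    dualvar( 20, 'archive' ),
);

# Highest priority first, then alphabetical by label for ties.
my @ordered = sort { $b <=> $a or $a cmp $b } @jobs;

print "$_\n" for @ordered;   # alert, archive, backup
```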

But wait, there's more!!!
● Obvious sanity checks:
● openhandle returns true for an open filehandle.
– validate STDIN before prompting for input.
– check for [still] live sockets.
● isvstring returns true for vstrings (e.g.,
v5.16.0).
● tainted returns true for tainted values.
● readonly checks for readonly values or variables.
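A sketch of openhandle, isvstring, and readonly. An in-memory filehandle stands in for a real file or socket, and arg_is_readonly is a hypothetical helper mirroring the readonly example in the Scalar::Util POD.

```perl
use strict;
use warnings;
use Scalar::Util qw( openhandle isvstring readonly );

# openhandle returns the handle if it is open, undef otherwise.
my $text = "line one\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $open_before = defined openhandle( $fh ) ? 1 : 0;   # 1
close $fh;
my $open_after  = defined openhandle( $fh ) ? 1 : 0;   # 0

# isvstring is true only for v-strings, not look-alike plain strings.
my $version  = v5.16.0;
my $vstr     = isvstring( $version )  ? 1 : 0;         # 1
my $not_vstr = isvstring( '5.16.0' )  ? 1 : 0;         # 0

# readonly: a literal passed in is read-only, a variable is not.
sub arg_is_readonly { readonly( $_[0] ) ? 1 : 0 }     # hypothetical helper
my $var      = 0;
my $ro_const = arg_is_readonly( 0 );                   # 1
my $ro_var   = arg_is_readonly( $var );                # 0
```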

Managing lists
● List::Util provides mostly-obvious functions: sum,
max, min, maxstr, minstr, shuffle, first, and reduce.
● max and min compare numbers, maxstr and minstr
handle strings.
● shuffle randomizes the order of a list: useful for
simulations or randomized test order (it uses Perl's
own random number generator, so it is not a
cryptographic shuffle).
● first & reduce take a bit more explanation...
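A quick tour of the simple functions with arbitrary sample data. The maxstr call on 9 and 10 shows why the string versions exist: as strings, "9" sorts after "10".

```perl
use strict;
use warnings;
use List::Util qw( sum max min maxstr minstr shuffle );

my @nums  = ( 3, 10, 2, 7 );
my @words = qw( pear apple fig );

my $total   = sum @nums;         # 22
my $largest = max @nums;         # 10
my $least   = min @nums;         # 2

my $last_word  = maxstr @words;  # 'pear'
my $first_word = minstr @words;  # 'apple'

# Numeric vs. string comparison of the same values.
my $numeric_max = max 9, 10;     # 10
my $string_max  = maxstr 9, 10;  # 9, because '9' gt '10'

# shuffle returns the same elements in a random order.
my @deck     = shuffle 1 .. 10;
my $same_set = join( ',', sort { $a <=> $b } @deck ) eq join( ',', 1 .. 10 );
```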

First Thing: Why Bother?
● These can all be written in Pure Perl.
● Why bother with Yet Another Module and XS?
– Most people think of speed, which is true.
– These all have simple, clean interfaces that Just Work.
– XS encapsulates the in-work data.
– Module provides them in one place, once, with POD.
● So, speed is not the only issue, but it doesn't hurt
that these are fast.

Second Thing's first()
● first looks a lot like grep, with a block and list.
● Unlike grep, first stops after finding the first match.
● It returns the first element that leaves the block true, not
the block's output!
● Lists don't have to be data: they can be anything.
use List::Util qw( first );

my $odd   = first { $_ % 2 } @itemz;
my $valid = first { $input =~ $_ } @regexen;
my $found = first { foo $_ } @inputz;
my $obj   = first { $_->valid( $data ) } @objz
or die "Invalid data...";
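A runnable version of the first examples, with made-up data. Note that when nothing matches, first returns undef, which is why the slide pairs it with "or die".

```perl
use strict;
use warnings;
use List::Util qw( first );

my @itemz   = ( 4, 8, 15, 16, 23, 42 );
my @regexen = ( qr/^\d+$/, qr/^[a-z]+$/i, qr/^\s*$/ );

# first returns the element that made the block true.
my $odd = first { $_ % 2 } @itemz;               # 15

# $_ is each pattern in turn; the matching pattern itself comes back.
my $input   = 'Hello';
my $matcher = first { $input =~ $_ } @regexen;   # qr/^[a-z]+$/i

# No match returns undef.
my $none = first { $_ > 100 } @itemz;            # undef
```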

first with ~~ for validation
● Ever get sick of running through if-blocks for
mutually exclusive switches?
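● first with smart matching is declarative:
my @bogus = ( [ qw( fork debug ) ], ... );
...
# A group is botched when every option in it was given;
# "$_ ~~ %argz" tests for that key in %argz.
if( my $botched = first
    {
        my $group = $_;
        @$group == grep { $_ ~~ %argz } @$group
    } @bogus )
{
    local $" = ', ';
    die "Mutually exclusive: @$botched";
}

● Each entry in @bogus lists options that may not be
combined; adding a rule means adding one array ref,
not another if-block.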

Working smarter
● first saves overhead by stopping early.
● Returning a scalar simplifies the syntax for
assigning a result.
● Depending on your data, first on an array may be
faster than exists on a hash key.
● Useful for more than iterating data:
– Use a list of regexes to determine what type of data is
being processed.
– Lists of objects can be iterated to find the correct parser
for general input.
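A sketch of picking a parser with first. The parsers here are hypothetical plain hashes holding an accepts regex and a parse sub; real code might use objects with methods instead.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Hypothetical parsers, checked in order; the first match wins.
my @parserz = (
    { name => 'csv',  accepts => qr/,/,     parse => sub { [ split /,/, $_[0] ] } },
    { name => 'kv',   accepts => qr/^\w+=/, parse => sub { +{ split /=/, $_[0], 2 } } },
    { name => 'word', accepts => qr/^\w+$/, parse => sub { [ $_[0] ] } },
);

sub parse_line
{
    my ( $line ) = @_;

    # first stops at the first parser whose pattern matches.
    my $parser = first { $line =~ $_->{ accepts } } @parserz
        or die "No parser for '$line'\n";

    return ( $parser->{ name }, $parser->{ parse }->( $line ) );
}
```

Ordering the list from most to least specific keeps the selection logic in the data rather than in nested ifs.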

Smart Match ~~ first
● Unlike most Perly boolean operators, ~~ returns true
or false, not the argument value that left it true.
● first returns the value that matched:
my $found = first { $record ~~ $_ } @filterz;
● $found is the first entry from @filterz that matches the
record.
● Filters can be regexen, arrays, hashes, or objects with
overloaded ~~ matching valid or unusable data.
– Use to check edge-cases in testing data handlers.

Inside-out data for a regex
● Use an inside-out structure to associate arbitrary
data or state with the regex.
● Smart matching handles blessed regexen properly:
it works equally well with a standard regex or an object.
use Scalar::Util qw( refaddr );
use List::Util   qw( first );

my %inside;
my $regex1 = qr{ ... };
my $regex2 = qr{ ... };

$inside{ refaddr $regex1 } = [];

my @filtrz = ( $regex1, $regex2 );
my $found  = first { $input ~~ $_ } @filtrz;

push @{ $inside{ refaddr $found } }, $input;

Use first to pick handlers
● Say you have records with a variety of fields.
● A set of arrays with the required fields for handlers
makes it easy to pick the right one:
my @keyz = ( [ qw( ... ) ], [ qw( ... ) ] );

# first key set whose fields all exist in $record
my $found = first
{
    my $keys = $_;
    ! grep { ! exists $record->{ $_ } } @$keys
} @keyz
or die 'Record fails minimum key test';

● Add a bit of inside-out data and you can dispatch
the record and its handler in a few lines of code.

Reducing your workload
● All of the min, max, and sum functions are canned
versions of reduce.
● reduce looks like sort, with $a and $b.
● An empty list returns undef; a single value is
returned as-is.
● Otherwise:
– $a and $b start as the first two list values.
– The block's result is assigned to $a.
– $b is cycled through the remaining list values.

Example: min, max, sum, prod
use List::Util qw( reduce );

my @list = ( 1 .. 100 );

my $min = reduce { $a < $b ? $a : $b } @list;
my $max = reduce { $a > $b ? $a : $b } @list;

# sum, product roll the value forward in $a:

my $sum = reduce { $a + $b } @list;
my $prd = reduce { $a * $b } @list;

# sum of x-squared seeds the list with 0 so the
# first element gets squared too:

my $sumx2
= reduce { $a + $b**2 } ( 0, @list );

But wait, there's more more!!!
● List::Util lacks a number of operations that are
easy to implement in Pure Perl:
– unique
– interleave, every nth record, groups of N records.
● Using XS does have advantages, not the least
being that none of us has to re-write the same Pure Perl.
● So... we have List::MoreUtils, written by Tassilo von
Parseval and later maintained by Adam Kennedy.

Taking laziness to XS
● This module is a kitchen sink of things you've done
at least once:

any all none notall true false firstidx
first_index lastidx last_index
insert_after
insert_after_string apply indexes after
after_incl before before_incl firstval
first_value lastval last_value each_array
each_arrayref pairwise natatime mesh zip
uniq
distinct minmax part

Indexes and last items
● first is nice, but to find the last item you need to
reverse a list, which is expensive.
● Looking up using indexes with first requires
$ary[$_], which also gets expensive.
● lastval, last_index, first_index do what you'd
expect [novel idea, what?].
● before and after are more compact versions of
slices using the results of first_index.
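A sketch of the index and slice functions on made-up temperature readings. It assumes List::MoreUtils is installed; it is on CPAN, not in the Perl core.

```perl
use strict;
use warnings;
use List::MoreUtils qw( firstidx lastidx lastval before after );

my @temps = ( 12, 18, 25, 31, 22, 35, 19 );

my $first_hot = firstidx { $_ > 30 } @temps;   # 3
my $last_hot  = lastidx  { $_ > 30 } @temps;   # 5
my $last_cool = lastval  { $_ < 20 } @temps;   # 19, without reversing

# before/after split around the first match, no index arithmetic.
my @before_hot = before { $_ > 30 } @temps;    # 12 18 25
my @after_hot  = after  { $_ > 30 } @temps;    # 22 35 19
```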

If first is false, use any
● first returns a list value, which might be false.
● any() returns true as soon as its block is true.
● Solves tests using first failing on a false list value:
my @list = ( 0, 1, 2 );
my $x = first { defined $_ } @list;   # 0: found, but false
my $y = any   { defined $_ } @list;   # 1: true

Unique lists
● List::MoreUtils' uniq returns the unique values in their
original order (list) or how many there are (scalar):
# 1 2 3 5 4
my @x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
# 5
my $x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
● Using hash keys gives a random order.
● The order-preserving Pure Perl idiom needs a scratch
hash every time: grep { ! $seen{ $_ }++ } @list.
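A sketch comparing uniq in both contexts with the hand-rolled idiom, using the slide's own list. It assumes List::MoreUtils is installed.

```perl
use strict;
use warnings;
use List::MoreUtils qw( uniq );

my @input = ( 1, 1, 2, 2, 3, 5, 3, 4 );

my @kept  = uniq @input;    # 1 2 3 5 4: first-seen order
my $count = uniq @input;    # 5: the number of unique values

# The hand-rolled equivalent needs a scratch hash each time.
my %seen;
my @by_hand = grep { ! $seen{ $_ }++ } @input;   # 1 2 3 5 4
```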

Relative locations
● insert_after places an item after the first item for
which its block returns true.
● insert_after_string uses a string compare, avoiding
the need for a block.
● Example: post-insert sentinel values into processed
lists.
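A sketch of both insertion functions on a hypothetical list of processing steps. It assumes List::MoreUtils is installed; both functions modify the array in place.

```perl
use strict;
use warnings;
use List::MoreUtils qw( insert_after insert_after_string );

my @steps = qw( fetch parse store );

# Block form: insert after the first element for which the block is true.
insert_after { $_ eq 'parse' } 'validate' => @steps;
# @steps: fetch parse validate store

# String form: same idea, compared with eq, no block needed.
insert_after_string 'store', 'DONE' => @steps;
# @steps: fetch parse validate store DONE
```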

apply: map Without Side-effects
● One downside to map, sort, & grep is that they
alias their block variables.
– Updating $_ or $a/$b will alter the inputs.
● apply works like map: extracting the result of a
block applied to each element in a list.
– The difference is that $_ is copied, not aliased.
– The inputs are safe from modification.
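A sketch of the aliasing difference, with made-up names and file names. It assumes List::MoreUtils is installed.

```perl
use strict;
use warnings;
use List::MoreUtils qw( apply );

my @names = qw( alice bob );

# map aliases $_, so s/// inside it rewrites the source array too.
my @upper_map = map { s/^(\w)/\u$1/; $_ } @names;
# @names is now ( 'Alice', 'Bob' ) as well.

my @files = qw( a.txt b.txt );

# apply copies $_ first, so the inputs survive.
my @stems = apply { s/\.txt$// } @files;
# @stems: a b    @files: a.txt b.txt
```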

Merging Lists
● Pairwise processing of lists uses prototypes to keep
the syntax saner:
@sum_xy = pairwise { $a + $b } @x, @y;
@x = pairwise { $a->($b) } @subz, @valz;
● Nice for merging key/value pairs, which is what
mesh does without a block:
%y = pairwise { ( $a, $b ) } @keyz, @valz;
%y = mesh @keyz, @valz;
● Prototypes require arrays; arrayrefs have to use
“@$arrayref” syntax.
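A runnable version of the pairwise and mesh examples, with made-up data. It assumes List::MoreUtils is installed.

```perl
use strict;
use warnings;
use List::MoreUtils qw( pairwise mesh );

my @x = ( 1, 2, 3 );
my @y = ( 10, 20, 30 );

my @sum_xy = pairwise { $a + $b } @x, @y;   # 11 22 33

my @keyz = qw( host port );
my @valz = ( 'localhost', 8080 );

my %by_pairwise = pairwise { ( $a, $b ) } @keyz, @valz;
my %by_mesh     = mesh @keyz, @valz;        # same pairs, no block

# Prototypes need real arrays, so dereference array refs explicitly.
my $more     = [ 100, 200, 300 ];
my @sum_more = pairwise { $a + $b } @x, @$more;   # 101 202 303
```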

Iterating Separate Lists
● each_array generates an iterator that cycles
through successive values in multiple lists:
my $each = each_array @a, @b, @c;
while( my ( $x, $y, $z ) = $each->() )
{ ... }
● This avoids having to destroy the lists with shift or
the overhead of many index accesses.
● each_arrayref takes arrayref (vs. array) args.
● Limitation of prototypes: can't mix arrays & refs.
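A runnable version of the each_array loop over hypothetical columns of data. It assumes List::MoreUtils is installed; the iterator returns an empty list when every array is exhausted, which ends the while loop.

```perl
use strict;
use warnings;
use List::MoreUtils qw( each_array );

my @name  = qw( alpha beta gamma );
my @size  = ( 10, 20, 30 );
my @owner = qw( ann bob cat );

my @rows;
my $each = each_array @name, @size, @owner;
while ( my ( $n, $s, $o ) = $each->() )
{
    push @rows, "$n:$s:$o";
}
# @rows: alpha:10:ann beta:20:bob gamma:30:cat; the arrays are untouched.
```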

Breaking up is easy to do
● Partitioning a list is quite doable in Pure Perl but
gets messy when handling arbitrary lists.
● part uses the block's output as the index of the
partition each element lands in, returning a list of
array refs:
# [ 1, 3, 5, 7 ], [ 2, 4, 6, 8 ]
my $i = 0;
my @partz = part { $i++ % 2 } ( 1 .. 8 );
● Using % 3 generates three lists.
● Block can use regexen (including parsing results),
looks_like_number, error levels, whatever.
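A sketch of part with a looks_like_number block, plus the % 3 variant mentioned above. It assumes List::MoreUtils is installed; the tokens are made up.

```perl
use strict;
use warnings;
use List::MoreUtils qw( part );
use Scalar::Util qw( looks_like_number );

my @tokens = qw( 42 foo 3.5 bar -7 );

# Index 0 collects numbers, index 1 everything else.
my ( $numbers, $words ) = part { looks_like_number( $_ ) ? 0 : 1 } @tokens;
# $numbers: [ 42, 3.5, -7 ]    $words: [ 'foo', 'bar' ]

my $i = 0;
my @thirds = part { $i++ % 3 } 1 .. 9;
# ( [ 1, 4, 7 ], [ 2, 5, 8 ], [ 3, 6, 9 ] )
```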

POD is your friend
● Actually, the module authors are: All of these
modules are well documented, with good
examples.
● Especially for MoreUtils: Take the time to run the
POD code in a debugger to see what it does.

CPAN & the Power of Perl
● Code on CPAN isn't mouldy just because it's old.
– The modules are kept up to date.
– The guts of Perl have remained stable enough to keep
the XS working.
● This is due to a lot of effort from module owners
and Perl hackers.

Summary
● Smart matches did not obviate “first”, they work
together.
● Utils work with newer features like smart
matching and switches.
● Any time you find yourself hacking indexes, it's
probably time to think about these modules.
● POD is your friend – check the modules for
examples (and good examples of writing XS).
● Truly lazy wheels are not re-invented.
