A short description of Perly grammar processors leading up to Regexp::Grammars. Develops two R::G modules, one for single-line logfile entries, another for larger FASTA format entries in the NCBI "nr.gz" file. The second example shows how to derive one grammar from another by overriding tags in the base grammar.
2. Grammars are the guts of compilers
● Compilers convert text from one form to another.
– C compilers convert C source to CPU-specific assembly.
– Databases compile SQL into RDBMS op's.
● Grammars define structure, precedence, valid inputs.
– Realistic ones are often recursive or context-sensitive.
– The complexity of defining grammars led to a variety of tools for managing them.
– The standard format for a long time has been “BNF”, which is the input to
YACC.
● They are wasted on 'flat text'.
– If “split /\t/” does the job, skip grammars entirely.
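A minimal sketch of the flat-text case: when every record is one line of tab-separated fields, a single split is the whole "parser" (the sample line here is invented for illustration).

```perl
use strict;
use warnings;

# Flat, tab-separated text needs no grammar: split does the job.
my $line  = join "\t", qw( 1367874132 emerge talk );
my @field = split /\t/, $line;    # one regex, no parser
```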
3. The first Yet Another: YACC
● Yet Another Compiler Compiler
– YACC takes in a standard-format grammar structure.
– It processes tokens and their values, organizing the
results according to the grammar into a structure.
● Between the source and YACC is a tokenizer.
– This parses the inputs into individual tokens defined by
the grammar.
– It doesn't know about structure, only breaking the text
stream up into tokens.
4. Parsing is a pain in the lex
● The real pain is gluing the parser and tokenizer
together.
– Tokenizers deal in the language of patterns.
– Grammars are defined in terms of structure.
● Passing data between them makes for most of the
difficulty.
– One issue is the global yylex call, which makes having
multiple parsers difficult.
– Context-sensitive grammars with multiple sub-grammars are painful.
5. The perly way
● Regexen, logic, glue... hmm... been there before.
– The first approach most of us try is lexing with regexen.
– Then add captures and if-blocks, or execute (?{ code }) blocks inside each regex.
● The problem is that the grammar is embedded in
your code structure.
– You have to modify the code structure to change the
grammar or its tokens.
– Hubris, maybe, but Truly Lazy it ain't.
– Avoiding that was the whole reason for developing standard grammars & their handlers in the first place.
6. Early Perl Grammar Modules
● These take in a YACC grammar and spit out
compiler code.
● Intentionally looked like YACC:
– Able to re-cycle existing YACC grammar files.
– Benefit from using Perl as a built-in lexer.
– Perl-byacc & Parse::Yapp.
● Good: Recycles knowledge for YACC users.
● Bad: Still not lazy: the grammars are difficult to maintain, and you still have to plug in post-processing code to deal with the results.
8. The Swiss Army Chainsaw
● Parse::RecDescent extended the original BNF
syntax, combining the tokens & handlers.
● Grammars are largely declarative, using OO Perl to
do the heavy lifting.
– OO interface allows multiple, context-sensitive parsers.
– Rules with Perl blocks allow the code to do anything.
– Results can be acquired from a hash, an array, or $1.
– Left, right, associative tags simplify messy situations.
9. Example P::RD
● This is part
of an infix
formula
compiler I
wrote.
● It compiles
equations to
a sequence
of closures.
add_op : '+' | '-' | '%' { $item[ 1 ] }
mult_op : '*' | '/' | '^' { $item[ 1 ] }
add : <leftop: mult add_op mult>
{
compile_binop @{ $item[1] }
}
mult : <leftop: factor mult_op factor>
{
compile_binop @{ $item[1] }
}
10. Just enough rope to shoot yourself...
● The biggest problem: P::RD is sloooooooow.
● Learning curve is perl-ish: shallow and long.
– Unless you really know what all of it does you may not
be able to figure out the pieces.
– Lots of really good docs that most people never read.
● Perly blocks also made it look too much like a job-dispatcher.
– People used it for a lot of things that are not compilers.
– Good & Bad thing: it really is a compiler.
11. R.I.P. P::RD
● Supposed to be replaced with Parse::FastDescent.
– Damian dropped work on P::FD for Perl6.
– His goal was to replace the shortcomings of P::RD with something more complete, and quite a bit faster.
● The result is Perl6 Grammars.
– Declarative syntax extends matching with rules.
– Built into Perl6 as a structure, not an add-on.
– Much faster.
– Not available in Perl5.
12. Regexp::Grammars
● Perl5 implementation derived from Perl6.
– Back-porting an idea, not the Perl6 syntax.
– Much better performance than P::RD.
● Extends the v5.10 recursive matching syntax,
leveraging the regex engine.
– Most of the speed issues are with regex design, not the
parser itself.
– Simplifies mixing code and matching.
– Single place to get the final results.
– Cleaner syntax with automatic whitespace handling.
13. Extending regexen
● “use Regexp::Grammars” turns on the added syntax.
– block-scoped (avoids collisions with existing code).
● You will probably want to add “xm” or “xs”:
– extended syntax (x) avoids whitespace issues.
– multi-line mode (m) simplifies line anchors for line-oriented parsing.
– single-line mode (s) makes ignoring line-wrap whitespace largely automatic.
– I use “xm” with explicit “\n” or “\s” matches to span lines where necessary.
14. What you get
● The parser is simply a regex-ref.
– You can bless it or have multiple parsers for context-sensitive grammars.
● Grammars can reference one another.
– Extending grammars via objects or modules is
straightforward.
● Comfortable for incremental development or
refactoring.
– Largely declarative syntax helps.
– OOP provides inheritance with overrides for rules.
15. my $compiler
= do
{
use Regexp::Grammars;
qr
{
<data>
<rule: data > <[text]>+
<rule: text > .+
}xm
};
Example: Creating a compiler
● Context can be a do-block, subroutine, or branch logic.
● “data” is the entry rule.
● All this does is read lines into an array with automatic ws handling.
16. Results: %/
● The results of parsing are in a tree-hash named %/.
– Keys are the rule names that produced the results.
– Empty keys ('') hold input text (for errors or
debugging).
– Easy to handle with Data::Dumper.
● The hash has at least one key for the entry rule, one
empty key for input data if context is being saved.
● For example, feeding two lines of a Gentoo emerge
log through the line grammar gives:
17. {
'' => '1367874132: Started emerge on: May 06, 2013
21:02:12
1367874132: *** emerge --jobs --autounmask-write --keep-
going --load-average=4.0 --complete-graph --with-bdeps=y
--deep talk',
data =>
{
'' => '1367874132: Started emerge on: May 06, 2013
21:02:12
1367874132: *** emerge --jobs --autounmask-write --keep-
going --load-average=4.0 --complete-graph --with-bdeps=y
--deep talk',
text =>
[
'1367874132: Started emerge on: May 06, 2013
21:02:12',
'
1367874132: *** emerge --jobs --autounmask-write --keep-
going --load-average=4.0 --complete-graph --with-bdeps=y
--deep talk'
]
Parsing a few lines of logfile
18. Getting rid of context
● The empty-keyed values are useful for
development or explicit error messages.
● They also get in the way and can cost a lot of
memory on large inputs.
● You can turn them on and off with <context:> and
<nocontext:> in the rules.
19. qr
{
<nocontext:> # turn off globally
<data>
<rule: data > <text>+ # oops, left off the []!
<rule: text > .+
}xm;
warn | Repeated subrule <text>+ will only capture its
final match
| (Did you mean <[text]>+ instead?)
|
{
data => {
text => '
1367874132: *** emerge --jobs --autounmask-write --keep-
going --load-average=4.0 --complete-graph --with-bdeps=y
--deep talk'
}
}
You usually want [] with +
20. {
data =>
{
text => # the [text] parses to an array of text
[
'1367874132: Started emerge on: May 06, 2013 21:02:12',
'
1367874132: *** emerge --jobs --autounmask-write --...
],
...
qr
{
<nocontext:> # turn off globally
<data>
<rule: data > <[text]>+
<rule: text > (.+)
}xm;
An array[ref] of text
21. Breaking up lines
● Each log entry is prefixed with an entry id.
● Parsing the ref_id off the front adds:
<data>
<rule: data > <[line]>+
<rule: line > <ref_id> <[text]>
<token: ref_id > ^(\d+)
<rule: text > .+
line =>
[
{
ref_id => '1367874132',
text => ': Started emerge on: May 06, 2013 21:02:12'
},
…
]
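The same split can be shown without R::G at all: plain v5.10 named captures peel the ref_id off the front of a line, which is exactly what the line rule above does (the sample line comes from the dump above).

```perl
use strict;
use warnings;
use 5.010;

# Core-regex analogue of the line rule: ref_id up front, text after.
my $line = '1367874132: Started emerge on: May 06, 2013 21:02:12';
my ( $ref_id, $text );

if ( $line =~ m{^ (?<ref_id> \d+ ) (?<text> .+ ) $}x )
{
    ( $ref_id, $text ) = @+{ qw( ref_id text ) };
}
```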
22. Removing cruft: “ws”
● Be nice to remove the leading “: “ from text lines.
● In this case the “whitespace” needs to include a
colon along with the spaces.
● Whitespace is defined by <ws: … >
<rule: line> <ws:[\s:]+> <ref_id> <text>
{
ref_id => '1367874132',
text => '*** emerge --jobs --autounmask-wr...
}
23. The '***' prefix means something
● Be nice to know what type of line was being
processed.
● <prefix= regex > assigns the regex's capture to the “prefix” tag:
<rule: line > <ws:[\s:]*> <ref_id> <entry>
<rule: entry >
<prefix=([*][*][*])> <text>
|
<prefix=([>][>][>])> <text>
|
<prefix=([=][=][=])> <text>
|
<prefix=([:][:][:])> <text>
|
<text>
24. {
entry => {
text => 'Started emerge on: May 06, 2013 21:02:12'
},
ref_id => '1367874132'
},
{
entry => {
prefix => '***',
text => 'emerge --jobs --autounmask-write...
},
ref_id => '1367874132'
},
{
entry => {
prefix => '>>>',
text => 'emerge (1 of 2) sys-apps/...
},
ref_id => '1367874256'
}
“entry” now contains optional prefix
25. Aliases can also assign tag results
● Aliases assign a key to rule results.
● The match from “text” is aliased to a named type of log entry.
<rule: entry>
<prefix=([*][*][*])> <command=text>
|
<prefix=([>][>][>])> <stage=text>
|
<prefix=([=][=][=])> <status=text>
|
<prefix=([:][:][:])> <final=text>
|
<message=text>
27. Parsing without capturing
● At this point we don't really need the prefix strings
since the entries are labeled.
● A leading '.' tells R::G to parse but not store the
results in %/:
<rule: entry >
<.prefix=([*][*][*])> <command=text>
|
<.prefix=([>][>][>])> <stage=text>
|
<.prefix=([=][=][=])> <status=text>
|
<.prefix=([:][:][:])> <final=text>
|
<message=text>
29. The “entry” nesting gets in the way
● The named subrule is not hard to get rid of: just
move its syntax up one level:
<ws:[\s:]*> <ref_id>
(
<.prefix=([*][*][*])> <command=text>
|
<.prefix=([>][>][>])> <stage=text>
|
<.prefix=([=][=][=])> <status=text>
|
<.prefix=([:][:][:])> <final=text>
|
<message=text>
)
30. data => {
line => [
{
message => 'Started emerge on: May 06, 2013 21:02:12',
ref_id => '1367874132'
},
{
command => 'emerge --jobs --autounmask-write --keep-
going --load-average=4.0 --complete-graph --with-bdeps=y --deep
talk',
ref_id => '1367874132'
},
{
command => 'terminating.',
ref_id => '1367874133'
},
{
message => 'Started emerge on: May 06, 2013 21:02:17',
ref_id => '1367874137'
},
Result: array of “line” with ref_id & type
31. Funny names for things
● Maybe “command” and “status” aren't the best way
to distinguish the text.
● You can store an optional token followed by text:
<rule: entry > <ws:[\s:]*> <ref_id> <type>? <text>
<token: type>
(
[*][*][*]
|
[>][>][>]
|
[=][=][=]
|
[:][:][:]
)
32. Entries now have “text” and “type”
entry => [
{
ref_id => '1367874132',
text => 'Started emerge on: May 06, 2013 21:02:12'
},
{
ref_id => '1367874133',
text => 'terminating.',
type => '***'
},
{
ref_id => '1367874137',
text => 'Started emerge on: May 06, 2013 21:02:17'
},
{
ref_id => '1367874137',
text => 'emerge --jobs --autounmask-write --...
type => '***'
},
33. Prefix alternations look ugly.
● Using a count works:
[*]{3} | [>]{3} | [:]{3} | [=]{3}
but isn't all that much more readable.
● Given the way these are used, use a character class:
[*>:=]{3}
34. qr
{
<nocontext:>
<data>
<rule: data > <[entry]>+
<rule: entry >
<ws:[\s:]*>
<ref_id> <prefix>? <text>
<token: ref_id > ^(\d+)
<token: prefix > [*>=:]{3}
<token: text > .+
}xm;
This is the skeleton parser:
● Doesn't take much:
– Declarative syntax.
– No Perl code at all!
● Easy to modify by extending the definition of “text” for specific types of messages.
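The skeleton's behavior can be sketched with nothing but core named captures, which also makes the structure easy to test: ref_id, an optional three-character prefix, then the text (the sample lines are from the earlier dumps).

```perl
use strict;
use warnings;
use 5.010;

# Core-regex analogue of the skeleton parser: ref_id, optional
# three-char type prefix, then the message text.
my $entry_re = qr{ ^ (?<ref_id> \d+ ) : \s*
                     (?: (?<type> [*>=:]{3} ) \s* )?
                     (?<text> .+ ) $ }x;

my @lines =
(
    '1367874132: Started emerge on: May 06, 2013 21:02:12',
    '1367874133:  *** terminating.',
);

my @entry;
for ( @lines )
{
    m{$entry_re} or next;
    push @entry, { %+ };    # copy only the captures that matched
}
```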
35. Finishing the parser
● Given the different line types it will be useful to
extract commands, switches, outcomes from
appropriate lines.
– Sub-rules can be defined for the different line types.
<rule: command> 'emerge'
<.ws> <[switch]>+
<token: switch> ([-][-]\S+)
● This is what makes the grammars useful: nested,
context-sensitive content.
36. Inheriting & Extending Grammars
● <grammar: name> and <extends: name> allow a
building-block approach.
● Code can assemble the contents of a qr{} without having to eval or deal with messy quoted strings.
● This makes modular or context-sensitive grammars
relatively simple to compose.
– References can cross package or module boundaries.
– Easy to define a basic grammar in one place and reference
or extend it from multiple other parsers.
37. The Non-Redundant File
● NCBI's “nr.gz” file is a list of sequences and all of the places they are known to appear.
● It is moderately large: 140+GB uncompressed.
● The file consists of a simple FASTA format with headings separated by ctrl-A (“\cA”) characters:
>Heading 1
[amino-acid sequence characters...]
>Heading 2
...
38. Example: A short nr.gz FASTA entry
● Headings are grouped by species, separated by ctrl-A (“\cA”) characters.
– Each species has a set of sources & identifier pairs
followed by a single description.
– Within-species separator is a pipe (“|”) with optional
whitespace.
– Species counts in some headers run into the thousands.
>gi|66816243|ref|XP_642131.1| hypothetical protein DDB_G0277827
[Dictyostelium discoideum AX4]gi|1705556|sp|P54670.1|CAF1_DICDI
RecName: Full=Calfumirin-1; Short=CAF-1gi|793761|dbj|BAA06266.1|
calfumirin-1 [Dictyostelium discoideum]gi|60470106|gb|EAL68086.1|
hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4]
MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQ...
KEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEK...
VQKLLNPDQ
39. First step: Parse FASTA
qr
{
<grammar: Parse::Fasta>
<nocontext:>
<rule: fasta > <.start> <head> <.ws> <[body]>+
<rule: head > .+ <.ws>
<rule: body > ( <[seq]> | <.comment> ) <.ws>
<token: start > ^ [>]
<token: comment > ^ [;] .+
<token: seq > ^ [\n\w\-]+
}xm;
● Instead of defining an entry rule, this just defines a
name “Parse::Fasta”.
– This cannot be used to generate results by itself.
– Accessible anywhere via Regexp::Grammars.
40. The output needs help, however.
● The “<seq>” token captures newlines that need to be
stripped out to get a single string.
● Munging these requires adding code to the parser using
Perl's regex code-block syntax: (?{...})
– Allows inserting almost-arbitrary code into the regex.
– “almost” because the code cannot include regexen.
seq =>
[ 'MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYD
KEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDP
VQKLLNPDQ
'
]
41. Munging results: $MATCH
● $MATCH and %MATCH can be assigned to alter the results from the current or lower levels of the parse.
● In this case I take the “seq” match contents out of %/,
join them with nothing, and use “tr” to strip the
newlines.
– join + split won't work because split uses a regex.
<rule: body > ( <[seq]> | <.comment> ) <.ws>
(?{
$MATCH = join '' => @{ delete $MATCH{ seq } };
$MATCH =~ tr/\n//d;
})
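The join-then-tr step is plain Perl and works the same outside a regex; a small standalone version (with invented sequence fragments) shows exactly what the code block does to the seq pieces:

```perl
use strict;
use warnings;

# What the (?{ ... }) block does: glue the seq pieces together
# and strip the embedded newlines with tr.
my @seq  = ( "MASTQNIVEE\n", "VQKLLNPDQ\n" );
my $body = join '' => @seq;
$body =~ tr/\n//d;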
42. One more step: Remove the arrayref
● Now the body is a single string.
● No need for an arrayref to contain one string.
● Since the body has one entry, assign offset zero:
body =>
[
'MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDK
DNDGKITIKELAGDIDFDKALKEYKEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDT
KDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQKVQKLLNPDQ'
],
<rule: fasta> <.start> <head> <.ws> <[body]>+
(?{
$MATCH{ body } = $MATCH{ body }[0];
})
43. Result: a generic FASTA parser.
{
fasta => [
{
body =>
'MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDK
DNDGKITIKELAGDIDFDKALKEYKEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDIT
KDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQKVQKLLNPDQ',
head => 'gi|66816243|ref|XP_642131.1| hypothetical p
rotein DDB_G0277827 [Dictyostelium discoideum AX4]gi|1705556
|sp|P54670.1|CAF1_DICDI RecName: Full=Calfumirin-1; Short=C
AF-1gi|793761|dbj|BAA06266.1| calfumirin-1 [Dictyostelium
discoideum]gi|60470106|gb|EAL68086.1| hypothetical protein
DDB_G0277827 [Dictyostelium discoideum AX4]
'
}
]
}
● The head and body are easily accessible.
● Next: parse the nr-specific header.
44. Deriving a grammar
● Existing grammars are “extended”.
● The derived grammars are capable of producing
results.
● In this case the derived grammar references Parse::Fasta and extracts a list of fasta entries:
<extends: Parse::Fasta>
<[fasta]>+
45. Splitting the head into identifiers
● Overriding fasta's “head” rule allows splitting out identifiers for individual species.
● Catch: cA is separator, not a terminator.
– The tail item on the list doesn't have a \cA to anchor on.
– Using “.+ [\cA\n]” walks off the header onto the sequence.
– This is a common problem with separators & tokenizers.
– This can be handled with special tokens in the grammar,
but R::G provides a cleaner way.
46. First pass: Literal “tail” item
● This works but is ugly:
– Have two rules for the main list and tail.
– Alias the tail to get them all in one place.
<rule: head> <[ident]>+ <[ident=final]>
(?{
# remove the matched anchors
tr/\cA\n//d for @{ $MATCH{ ident } };
})
<token: ident > .+? \cA
<token: final > .+ \n
47. Breaking up the header
● The last header item is aliased to “ident”.
● Breaks up all of the entries:
head => {
ident => [
'gi|66816243|ref|XP_642131.1| hypothetical protein
DDB_G0277827 [Dictyostelium discoideum AX4]',
'gi|1705556|sp|P54670.1|CAF1_DICDI RecName:
Full=Calfumirin-1; Short=CAF-1',
'gi|793761|dbj|BAA06266.1| calfumirin-1
[Dictyostelium discoideum]',
'gi|60470106|gb|EAL68086.1| hypothetical protein
DDB_G0277827 [Dictyostelium discoideum AX4]'
]
}
48. Dealing with separators: '%' <sep>
● Separators happen often enough:
– 1, 2, 3 , 4 ,13, 91 # numbers by commas, spaces
– g-c-a-g-t-t-a-c-a # characters by dashes
– /usr/local/bin # basenames by dir markers
– /usr:/usr/local:bin # dir's separated by colons
that R::G has special syntax for dealing with them.
● Combine the item with '%' and a separator:
<rule: list> <[item]>+ % <separator> # one-or-more
<rule: list_zom> <[item]>* % <separator> # zero-or-more
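The separated-item idea is familiar from split; a core-Perl version of the first example above (numbers separated, not terminated, by commas with stray whitespace) shows the behavior the '%' syntax gives you declaratively:

```perl
use strict;
use warnings;

# split-based analogue of <[item]>+ % <separator>: commas with
# optional whitespace separate, rather than terminate, the items.
my $list = '1, 2, 3 , 4 ,13, 91';
my @item = split /\s*,\s*/, $list;
```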
49. Cleaner nr.gz header rule
● Separator syntax cleans things up:
– No more tail rule with an alias.
– No code block required to strip the separators and trailing
newline.
– Non-greedy match “.+?” avoids capturing separators.
qr
{
<nocontext:>
<extends: Parse::Fasta>
<[fasta]>+
<rule: head > <[ident]>+ % [\cA]
<token: ident > .+?
}xm
50. Nested “ident” tag is extraneous
● Simpler to replace the “head” with a list of
identifiers.
● Replace $MATCH from the “head” rule with the
nested identifier contents:
qr
{
<nocontext:>
<extends: Parse::Fasta>
<[fasta]>+
<rule: head > <[ident]>+ % [\cA]
(?{
$MATCH = delete $MATCH{ ident };
})
<token: ident > .+?
}xm
51. Result:
{
fasta => [
{
body => 'MASTQNIVEEVQKMLDT...NPDQ',
head => [
'gi|66816243|ref|XP_6...rt=CAF-1',
'gi|793761|dbj|BAA0626...oideum]',
'gi|60470106|gb|EAL68086...m discoideum AX4]'
]
}
]
}
● The fasta content is broken into the usual “body” plus a “head” broken down on \cA boundaries.
● Not bad for a dozen lines of grammar with a few lines of code.
52. One more level of structure: idents.
● Species have <source> | <identifier> pairs followed by a description.
● Add a separator clause “ % (?: \s* [|] \s* )”
– This can be parsed into a hash something like:
gi|66816243|ref|XP_642131.1|hypothetical ...
Becomes:
{
gi => '66816243',
ref => 'XP_642131.1',
desc => 'hypothetical...'
}
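The pairing trick in the munge code (pop the description, treat the rest as key/value pairs) is plain Perl; a hand-rolled version on one ident string from the slides shows the transformation:

```perl
use strict;
use warnings;

# Hand-rolled version of the ident munging: split on '|', treat
# the tail item as the description, pair up the rest into a hash.
my $ident = 'gi|66816243|ref|XP_642131.1|hypothetical protein';
my @taxa  = split /\s*\|\s*/, $ident;
my $desc  = pop @taxa;
my %head  = ( @taxa, desc => $desc );
```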
53. Munging the separated input
<fasta>
(?{
my $identz = delete $MATCH{ fasta }{ head }{ ident };
for( @$identz )
{
my $pairz = $_->{ taxa };
my $desc = pop @$pairz;
$_ = { @$pairz, desc => $desc }
}
$MATCH{ fasta }{ head } = $identz;
})
<rule: head > <[ident]>+ % [\cA]
<token: ident > <[taxa]>+ % (?: \s* [|] \s* )
<token: taxa > .+?
54. Result: head with sources, “desc”
{
fasta => {
body => 'MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKR...EDQN',
head => [
{
desc => '30S ribosomal protein S18 [Lactococ...
gi => '15674171',
ref => 'NP_268346.1'
},
{
desc => '30S ribosomal protein S18 [Lactoco...
gi => '116513137',
ref => 'YP_812044.1'
},
...
55. Balancing R::G with calling code
● The regex engine could process all of nr.gz.
– Catch: <[fasta]>+ returns about 250_000 keys and literally millions of total identifiers in the heads.
– Better approach: match <fasta> on single entries, but chunking the input on '>' removes it as a leading character.
– Making it optional with <.start>? fixes the problem:
local $/ = '>';
while( my $chunk = readline )
{
chomp;
length $chunk or do { --$.; next };
$chunk =~ $nr_gz;
# process single fasta record in %/
}
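The chunking loop above can be exercised without nr.gz by reading from an in-memory filehandle (the two-record FASTA string here is invented); the point is how local $/ = '>' and chomp carve the stream into per-record chunks:

```perl
use strict;
use warnings;

# The chunking loop, run against an in-memory filehandle.
my $fasta = ">head one\nMAST\n>head two\nVQKL\n";
open my $fh, '<', \$fasta or die "open: $!";

my @chunk;
{
    local $/ = '>';
    while ( my $c = readline $fh )
    {
        chomp $c;               # strip the trailing '>'
        length $c or next;      # discard the leading empty chunk
        push @chunk, $c;
    }
}
```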
56. Fasta base grammar: 3 lines of code
qr
{
<grammar: Parse::Fasta>
<nocontext:>
<rule: fasta > <.start>? <head> <.ws> <[body]>+
(?{
$MATCH{ body } = $MATCH{ body }[0];
})
<rule: head > .+ <.ws>
<rule: body > ( <[seq]> | <.comment> ) <.ws>
(?{
$MATCH = join '' => @{ delete $MATCH{ seq } };
$MATCH =~ tr/\n//d;
})
<token: start > ^ [>]
<token: comment > ^ [;] .+
<token: seq > ^ ( [\n\w\-]+ )
}xm;
57. Extension to Fasta: 6 lines of code.
qr
{
<nocontext:>
<extends: Parse::Fasta>
<fasta>
(?{
my $identz = delete $MATCH{ fasta }{ head }{ ident };
for( @$identz )
{
my $pairz = $_->{ taxa };
my $desc = pop @$pairz;
$_ = { @$pairz, desc => $desc };
}
$MATCH{ fasta }{ head } = $identz;
})
<rule: head > <[ident]>+ % [\cA]
<rule: ident > <[taxa]>+ % (?: \s* [|] \s* )
<token: taxa > .+?
}xm
58. Result: Use grammars
● Most of the “real” work is done under the hood.
– Regexp::Grammars does the lexing, basic compilation.
– Code only needed for cleanups or re-arranging structs.
● Code can simplify your grammar.
– Too much code makes them hard to maintain.
– Trick is keeping the balance between simplicity in the
grammar and cleanup in the code.
● Either way, the result is going to be more
maintainable than hardwiring the grammar into code.
59. Aside: KwikFix for Perl v5.18
● v5.17 changed how the regex engine handles inline
code.
● Code that used to be eval-ed in the regex is now
compiled up front.
– This requires “use re 'eval'” and “no strict 'vars'”.
– One for the Perl code, the other for $MATCH and friends.
● The immediate fix for this is in the last few lines of R::G::import, which push the pragmas into the caller:
require re; re->import( 'eval' );
require strict; strict->unimport( 'vars' );
● Look up $^H in perlvars to see how it works.
60. Use Regexp::Grammars
● Unless you have old YACC BNF grammars to
convert, the newer facility for defining the
grammars is cleaner.
– Frankly, even if you do have old grammars...
● Regexp::Grammars avoids the performance pitfalls
of P::RD.
– It is worth taking time to learn how to optimize non-deterministic regexen, however.
● Or, better yet, use Perl6 grammars, available today
at your local copy of Rakudo Perl6.
61. More info on Regexp::Grammars
● The POD is thorough and quite descriptive
[comfortable chair, enjoyable beverage suggested].
● The ./demo directory has a number of working – if
un-annotated – examples.
● “perldoc perlre” shows how recursive matching works in v5.10+.
● PerlMonks has plenty of good postings.
● Perl Review article by brian d foy on recursive
matching in Perl 5.10.