Regular Expressions
Part 2: Advanced Concepts
How do repetition tokens match a test string?
Repetition tokens are greedy.
They consume as much of the test string as they can and give characters back only when the rest of the pattern needs them.
Let’s check with a valid HTML snippet: http://rubular.com/r/nVoDVeAafp
How do we solve this greediness?
How to fix greediness?
A quick fix is to make the quantifier lazy by adding a ? after the +.
So <.+?> matches only the HTML tags. Check
http://rubular.com/r/yoEJztaClW
A better alternative is a negated character class: <[^>]+>.
It needs far less backtracking and so usually returns results faster.
Check http://rubular.com/r/WHjIrJW3v7
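The three patterns can be compared side by side in Ruby (the engine behind rubular). This is a minimal sketch; the test string is a made-up sample, not the one stored behind the links above.

```ruby
# Hypothetical test string, chosen only for illustration.
html = "<em>Hello</em> world"

greedy  = html.scan(/<.+>/)     # .+ runs on to the last ">" it can reach
lazy    = html.scan(/<.+?>/)    # .+? stops at the first ">"
negated = html.scan(/<[^>]+>/)  # [^>]+ can never step over a ">"

p greedy   # => ["<em>Hello</em>"]
p lazy     # => ["<em>", "</em>"]
p negated  # => ["<em>", "</em>"]
```

The lazy and negated versions return the same tags here; the difference is in how much backtracking the engine does to get there.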
Possessive Quantifiers
Greedy tokens match as many repeats as possible. Lazy tokens match as few repeats as possible. Both then try other permutations by backtracking until the test string matches.
Possessive quantifiers, on the other hand, hold on to whatever they matched and forget the backtracking positions. So the regex engine gives up as soon as the rest of the pattern fails, without backtracking.
/\D*+g/ does not match /string/
Why??
Because \D*+ matches all of string and, unlike lazy/greedy tokens, possessive quantifiers can’t backtrack. Therefore the permutation in which the repeat token matches strin and g matches as a literal character is never tried.
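A quick way to see this in Ruby is to run the greedy and the possessive form against the same test string. A small sketch, with variable names picked for illustration:

```ruby
test = "string"

greedy     = test[/\D*g/]   # \D* takes "string", then gives back the final "g"
possessive = test[/\D*+g/]  # \D*+ takes "string" and never gives anything back

p greedy      # => "string"
p possessive  # => nil
```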
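Repetition Quantifiers
Property            Token                   Backtracks
Greedy (default)    *, +, ?, {m,n}          Yes
Lazy                *?, +?, ??, {m,n}?      Yes
Possessive          *+, ++, ?+, {m,n}+      No
Note: {m,n}+ is possessive in Java, Perl and PCRE; Ruby’s engine reads it as a greedy {m,n} repeated by +.
Rest are possessive! :)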
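The table can be checked with one test string and three variants of the same pattern. This is only a sketch; the string "1000" is an arbitrary choice.

```ruby
digits = "1000"

greedy     = digits[/\d+0/]   # takes "1000", then gives back one "0"
lazy       = digits[/\d+?0/]  # takes "1", and the literal 0 matches right away
possessive = digits[/\d++0/]  # takes every digit and keeps them, so the literal 0 fails

p greedy      # => "1000"
p lazy        # => "10"
p possessive  # => nil
```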
Recap
^\d*\D+.?\S{1,10}[^0-9]+$
Tip: repetition tokens are greedy, but regular expression engines are eager.
Challenge 1: Construct a regex to match an IPv4 address.
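One possible answer, sketched in Ruby. It assumes an octet is 0 to 255 with no leading zeros and that the whole string must be the address; both are choices made for this example, and the variable names are hypothetical.

```ruby
# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
octet = /(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/

# Four octets separated by dots, anchored to the whole string.
ipv4 = /\A#{octet}(?:\.#{octet}){3}\z/

p ipv4.match?("192.168.0.1")  # => true
p ipv4.match?("256.1.1.1")    # => false
p ipv4.match?("1.2.3")        # => false
```

The order of the alternatives matters because the engine is eager: the longer, more specific ranges are listed before the short ones.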
Alternation
Lowest precedence among all regex operators.
Matches exactly one of several alternative regexes.
What’s your pet?
cat|dog|fish
/I have a cat but no dog./
/I have three clown fish as pet./
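Both points can be tried out in Ruby. The sketch below runs the pet regex on the two test strings, then shows what "lowest precedence" means in practice; the anchored patterns and the word "catalog" are additions for illustration.

```ruby
pet    = /cat|dog|fish/
first  = "I have a cat but no dog.".scan(pet)
second = "I have three clown fish as pet.".scan(pet)

p first   # => ["cat", "dog"]
p second  # => ["fish"]

# Lowest precedence: each anchor belongs to one alternative only.
loose  = /^cat|dog$/      # "starts with cat" OR "ends with dog"
strict = /^(?:cat|dog)$/  # the whole line is "cat" or "dog"

p loose.match?("catalog")   # => true
p strict.match?("catalog")  # => false
```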
Word Boundaries: Zero-Length Assertions
There are three different positions that qualify as word boundaries:
● Before the first character in the string, if the first character is a word character.
● After the last character in the string, if the last character is a word character.
● Between two characters in the string, where one is a word character and the other is not a word character.
\b matches at a word boundary; \B matches at every position that is not one.
/\b\w+\b/ /bat=cat/
/\B\w+\B/ /bat=cat/
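Running the two patterns in Ruby makes the boundary positions visible. A small sketch; the gsub line is an extra added here to mark where \b matches.

```ruby
test = "bat=cat"

words  = test.scan(/\b\w+\b/)  # whole words, boundary on both sides
inner  = test.scan(/\B\w+\B/)  # only characters with no boundary on either side
marked = test.gsub(/\b/, "|")  # put a bar at every word boundary

p words   # => ["bat", "cat"]
p inner   # => ["a", "a"]
p marked  # => "|bat|=|cat|"
```

Since \b consumes no characters, the gsub call inserts bars without removing anything from the string.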
Groups/Backreference
Token         Property                                Regex Example          Test String
(group)       Group characters together as one unit   /work(shop)?/          I work at a computer workshop.
\1            Default numeric reference for a group   /(\w+)=\1/             Is cat=bat or rat=rat?
(?<n>group)   Named groups                            /(?<word>\w+)/         $!, cat eats rat.
\k<n>         Named reference for a group             /(?<a1>\w+)=\k<a1>/    Is cat=bat or rat=rat?
(?:group)     Non-capturing groups                    /work(?:shop)?/        I work at a computer workshop.
(?>group)     Atomic groups                           /a(?>bc|b)c/           abbc, abc
Note: \k<n> is the Ruby spelling; Perl and PCRE also accept \k{n}.
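The rows of the table can be exercised in Ruby as follows. This is a sketch: the named backreference is written \k<a1>, which is the form Ruby accepts, and the string "abcc" is an extra test case added here to show an atomic group that does match.

```ruby
sentence = "Is cat=bat or rat=rat?"

numbered = sentence[/(\w+)=\1/]           # \1 must repeat what group 1 captured
named    = sentence[/(?<a1>\w+)=\k<a1>/]  # same idea with a named group
word     = "$!, cat eats rat."[/(?<word>\w+)/, "word"]

p numbered  # => "rat=rat"
p named     # => "rat=rat"
p word      # => "cat"

# Atomic group: after (?>bc|b) has matched "bc", the engine may not
# return into the group to try the alternative "b".
atomic     = /a(?>bc|b)c/
non_atomic = /a(?:bc|b)c/

p non_atomic.match?("abc")  # => true  (backtracks from "bc" to "b")
p atomic.match?("abc")      # => false (stuck with "bc", no "c" left)
p atomic.match?("abcc")     # => true
```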
Alternation/Word Boundary/Groups
What’s your language?
c|c++|java|javascript
/I use java for android development and javascript for everything else./
Challenge 2: Will this regex ever match c++ and javascript? Fix it to be “inclusive”.
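A Ruby sketch of what goes wrong, followed by one possible fix. The fixed pattern and the second test sentence are assumptions made for this example, not the answer from the talk.

```ruby
sentence = "I use java for android development and javascript for everything else."

# The engine is eager: it takes the first alternative that works at each position.
# Also, the unescaped c++ means "c, possessively repeated", not the text "c++".
original = /c|c++|java|javascript/
found    = sentence.scan(original)
p found  # => ["java", "java", "c"]

# Possible fix: longest alternatives first, the plus signs escaped, and
# lookarounds instead of \b because "+" is not a word character.
fixed     = /(?<![\w+])(?:javascript|java|c\+\+|c)(?![\w+])/
inclusive = sentence.scan(fixed)
others    = "I know c and c++.".scan(fixed)

p inclusive  # => ["java", "javascript"]
p others     # => ["c", "c++"]
```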
Revisiting HTML tags
Regex for matching an email address
/^(.*)?@\d+.\d{2,3}$/
Challenge 3: Fix the REGEX!
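To see why the regex needs fixing, it helps to feed it a real address and a nonsense string. The sketch below takes the pattern as /^(.*)?@\d+.\d{2,3}$/ and then tries a deliberately simplified replacement; the replacement is an assumption for illustration and is far from a complete email validator.

```ruby
original = /^(.*)?@\d+.\d{2,3}$/

p original.match?("user@example.com")  # => false (\d+ wants digits after the @)
p original.match?("@1x23")             # => true  (empty name, and . matches any character)

# Simplified sketch: a non-empty local part, dot-separated host labels,
# and an alphabetic top-level domain of at least two letters.
simplified = /\A[^@\s]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}\z/

p simplified.match?("user@example.com")  # => true
p simplified.match?("@1x23")             # => false
```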
Lookaround Token - Lookahead & Lookbehind
Token        Property              Regex Example   Test String
(?=text)     Positive lookahead    q(?=u)\D+       question, Iraq
(?!text)     Negative lookahead    q(?!u)          qatar, Iraq, question
(?<=text)    Positive lookbehind   (?<=a)b         cab, bed, debt
(?<!text)    Negative lookbehind   (?<!a)b         cab, bed, debt
Let’s try at http://rubular.com/r/cMuagzut6g
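The four rows can also be checked locally in Ruby. A minimal sketch that filters each list of test strings by the matching pattern; the variable names are hypothetical.

```ruby
ahead      = %w[question Iraq].map { |w| w[/q(?=u)\D+/] }
not_ahead  = %w[qatar Iraq question].select { |w| w.match?(/q(?!u)/) }
behind     = %w[cab bed debt].select { |w| w.match?(/(?<=a)b/) }
not_behind = %w[cab bed debt].select { |w| w.match?(/(?<!a)b/) }

p ahead       # => ["question", nil]
p not_ahead   # => ["qatar", "Iraq"]
p behind      # => ["cab"]
p not_behind  # => ["bed", "debt"]
```

Lookarounds only test what is next to the current position; the u after q is checked by (?=u) and then consumed by \D+, not by the lookahead itself.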
Unicode
Encoding                    Sample character                    Regex   Unicode Regex
Encoded as 2 code points    à = U+0061 (a) U+0300 (`)           ^..$    \P{M}\p{M}*+ or (?>\P{M}\p{M}*)
Encoded as one code point   à = U+00E0                          ^.$     \u00E0
Any Unicode character       Punctuation marks, numerals etc.    .|..    \X
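In Ruby the two encodings of the same letter behave as the table describes. This sketch assumes UTF-8 strings (the default for Ruby string literals with \u escapes); the variable names are made up for the example.

```ruby
decomposed  = "a\u0300"  # "a" followed by a combining grave accent
precomposed = "\u00E0"   # one code point for the same letter

p decomposed.length   # => 2
p precomposed.length  # => 1

two_dots  = decomposed.match?(/\A..\z/)            # needs two dots
one_dot   = precomposed.match?(/\A.\z/)            # needs one dot
base_mark = decomposed.match?(/\A\P{M}\p{M}*+\z/)  # base character plus its marks
grapheme  = [decomposed, precomposed].all? { |s| s.match?(/\A\X\z/) }

p two_dots   # => true
p one_dot    # => true
p base_mark  # => true
p grapheme   # => true
```

\X treats both forms as a single unit, which is why it is the safer choice when the input may be in either encoding.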
How does a regex engine work?
Mathematics Behind Regex
● Originated in 1956, when mathematician Stephen Cole Kleene described regular languages using his mathematical notation called regular sets.
● Entered popular use from 1968 in two areas: pattern matching in a text editor and lexical analysis in a compiler.
● Among the first uses: Ken Thompson built the first regex engine into the QED editor and later into the UNIX editor ed. That led to `grep`. Guess what grep stands for: g/re/p (global / regular expression / print).
Applications
Wait, there’s more: recursion and subroutines.
You can even match palindrome strings in Ruby and Perl using regex!
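As a sketch of how that can look in Ruby: the pattern below uses a subexpression call, \g<p>, to recurse into the group named p, and a level-aware backreference, \k<c+0>, to require the same character on the way out. The group names p and c are arbitrary choices for this example.

```ruby
# A palindrome is: nothing, or one character, or
# a character + a palindrome + that same character again.
palindrome = /\A(?<p>|.|(?:(?<c>.)\g<p>\k<c+0>))\z/

p palindrome.match?("racecar")  # => true
p palindrome.match?("reer")     # => true
p palindrome.match?("regex")    # => false
```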
https://engineering.linkedin.com/puzzle
Let’s take this away as homework.
Questions?
