Secrets of Regexp
      Hiro Asari
     Red Hat, Inc.
Let's Talk About Regular Expressions
• There is no "regular" expression
• A good approximation as a name
Let's Talk About Regexp
Some people, when confronted with a problem, think, "I know, I'll use regular expressions." Now they have two problems.

    Jamie Zawinski, 12 Aug, 1997

http://regex.info/blog/2006-09-15/247
http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html

The point is not so much the evils of regular expressions as the evils of overusing them.
Formal Language Theory

• A language L
• over an alphabet Σ
• Alphabet Σ={a, b, c, d, e, …, z, λ} (example)
• Words over Σ: "a", "b", "ab", "aequafdhfad"
• Σ*: the set of all words over Σ
Formal Language over Σ

• A subset L of Σ* (with various properties)
• L can be finite, in which case its well-formed words can simply be enumerated, but it is often infinite
Example

• Language L over Σ = {a,b}
• 'a' is a word
• a word may be obtained by appending 'ab'
  to an existing word
• only words thus formed are legal
Well-formed words
a
aab
aabab
Ill-formed words
b
aaaab
abb
Succinctly…


• a(ab)*
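The example language can be checked directly with a Ruby Regexp. This is a minimal sketch: the constant `L_PATTERN` and the helper `well_formed?` are hypothetical names, and the pattern is anchored with `\A` and `\z` (an assumption added here) so that the whole input, not just a substring, must be a word of L. The group is written non-capturing, `(?:ab)`, because nothing needs to be extracted.

```ruby
# The example language L = a(ab)*, anchored so the entire input must be a word of L.
L_PATTERN = /\Aa(?:ab)*\z/

def well_formed?(word)
  L_PATTERN.match?(word)
end

%w[a aab aabab].each { |w| puts "#{w}: #{well_formed?(w)}" }  # the well-formed words
%w[b aaaab abb].each { |w| puts "#{w}: #{well_formed?(w)}" }   # the ill-formed words
```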
Expression

• A textual representation of a formal language, against which an input is tested to decide whether it is a well-formed word in that language
Regular Languages
• ∅ (empty language) is regular
• For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language.
• If A and B are regular languages, then A ∪ B (union), A•B (concatenation), and A* (Kleene star) are regular languages
• No other languages over Σ are regular.
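The three closure operations map naturally onto Ruby Regexp construction. A sketch, assuming the singleton languages {a} and {b} are represented by `/a/` and `/b/`: `Regexp.union` builds the union, interpolation builds concatenation, and a starred non-capturing group builds the Kleene star. The helper `word_in?` is a hypothetical name; it anchors each language with `\A...\z` so that whole words are tested.

```ruby
# Singleton languages {a} and {b}.
a = /a/
b = /b/

union         = Regexp.union(a, b)  # A ∪ B
concatenation = /#{a}#{b}/          # A•B
kleene_star   = /(?:#{a})*/         # A*

# Test membership of a whole word (not a substring) in the language.
def word_in?(language, word)
  /\A(?:#{language})\z/.match?(word)
end

p word_in?(union, 'b')           # the single word b is in {a} ∪ {b}
p word_in?(concatenation, 'ab')  # ab is the only word of {a}•{b}
p word_in?(kleene_star, '')      # the empty word belongs to every A*
p word_in?(kleene_star, 'ab')    # b is not in {a}*
```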
Regular Expressions

• Expressions of regular languages… Not!
Regular? Expressions

• It turns out that some expressions are more powerful and express non-regular languages
• Language of 'squares' (words of the form ww): ^(.*)\1$
 • e.g. aa, aaaa, WikiWiki (but not a)
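The backreference `\1` is what takes these patterns beyond regular languages: it must repeat exactly what `(.*)` captured, which no finite automaton can check for words of unbounded length. A minimal sketch; the names `SQUARE` and `square?` are hypothetical, and `\A`/`\z` are used so that the whole string must be a square.

```ruby
# Words of the form ww: whatever (.*) captured must appear again, verbatim, via \1.
SQUARE = /\A(.*)\1\z/

def square?(word)
  SQUARE.match?(word)
end

%w[aa aaaa WikiWiki a aaa Wiki].each { |w| puts "#{w}: #{square?(w)}" }
```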
How does Regexp work?

• Build a finite state automaton representing a given regular expression
• Feed the input string to the automaton and see whether the match succeeds
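For the earlier example language a(ab)*, the automaton is small enough to write out by hand. This is a sketch of the idea only: the transition table and the state names `:start`, `:accept` and `:need_b` are hypothetical, and Ruby's actual engines compile patterns into a backtracking program rather than a table like this one. A missing transition means the input is rejected.

```ruby
# Hand-built DFA for a(ab)*.
# :start  - nothing read yet
# :accept - a complete word of the language has been read
# :need_b - read the 'a' of a trailing "ab", waiting for its 'b'
TRANSITIONS = {
  [:start,  'a'] => :accept,
  [:accept, 'a'] => :need_b,
  [:need_b, 'b'] => :accept
}.freeze

def dfa_accepts?(input)
  state = :start
  input.each_char do |ch|
    state = TRANSITIONS[[state, ch]]
    return false if state.nil? # no transition defined: reject immediately
  end
  state == :accept
end

# Feed each word through the automaton and compare with Ruby's own engine.
%w[a aab aabab b aaaab abb].each do |word|
  puts "#{word}: dfa=#{dfa_accepts?(word)} regexp=#{/\Aa(?:ab)*\z/.match?(word)}"
end
```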
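[Figure: state diagrams for the patterns a, ab*, .*, a$, a?, a|b, (ab|c) and (ab+|c); edges are labeled with the character consumed, and ε marks a transition that consumes nothing]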
Match is attempted at every character, left to right
/a$/
         zyxwvutsrqponmlkjihgfedcba
         ^
         zyxwvutsrqponmlkjihgfedcba
          ^
         zyxwvutsrqponmlkjihgfedcba
           ^
         zyxwvutsrqponmlkjihgfedcba
            ^
         ⋮
         zyxwvutsrqponmlkjihgfedcba
                                  ^

Regexp does not think, "'a$' can only match at the end of the line, so we should fast-forward to the end of the line."
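The left-to-right scan can be made concrete with a tiny backtracking matcher. This sketch is a Ruby port of the well-known small C matcher by Rob Pike from Kernighan and Pike's "The Practice of Programming"; it supports only literal characters, `.`, `c*`, a leading `^` and a trailing `$` (meaning end of string here), and is far simpler than the engines Ruby actually ships. The method names are hypothetical, and `search` additionally reports how many start positions it tried.

```ruby
# Does re match at the beginning of text?
def match_here(re, text)
  return true if re.empty?
  return match_star(re[0], re[2..], text) if re[1] == '*'
  return text.empty? if re == '$'
  return match_here(re[1..], text[1..]) if !text.empty? && (re[0] == '.' || re[0] == text[0])
  false
end

# Match c* followed by re: try zero copies of c first, then one more at a time.
def match_star(c, re, text)
  loop do
    return true if match_here(re, text)
    return false if text.empty? || !(c == '.' || c == text[0])
    text = text[1..]
  end
end

# Try every start position from left to right.
# Returns [start index of the first match or nil, number of start positions tried].
def search(re, text)
  return [match_here(re[1..], text) ? 0 : nil, 1] if re.start_with?('^')
  tried = 0
  (0..text.length).each do |start|
    tried += 1
    return [start, tried] if match_here(re, text[start..])
  end
  [nil, tried]
end

p search('a$', 'zyxwvutsrqponmlkjihgfedcba') # => [25, 26]
```

Nothing in `search` knows that `a$` can only succeed near the end, so it tries all 26 positions, just as in the slide.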
^\s*(.*)\s*$
         'abc d a dfadg   '

# (.*) captures 'abc d a dfadg   ': the greedy (.*) runs to the end of the line,
# leaving nothing for the trailing \s* to match
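A quick way to see the greedy capture, and two common alternatives, is shown below. The input's two leading spaces are an assumption (the slide does not show them clearly). The lazy `(.*?)` stops as soon as `\s*$` can take over, and `String#strip` avoids the regexp entirely.

```ruby
line = "  abc d a dfadg   "

greedy = line[/^\s*(.*)\s*$/, 1]  # (.*) swallows the trailing spaces
lazy   = line[/^\s*(.*?)\s*$/, 1] # (.*?) gives them back to \s*

p greedy     # => "abc d a dfadg   "
p lazy       # => "abc d a dfadg"
p line.strip # => "abc d a dfadg"
```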
a?a?a?…a?aaa…a
def pathological(n = 5)
  Regexp.new('a?' * n + 'a' * n)
end


1.upto(40) do |n|
  print n, ": "
  print Time.now, "\n" if 'a' * n =~ pathological(n)
end
a?a?a?aaa
aaa
^
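Instead of timing the pattern, the work can be counted. The sketch below models a naive backtracking engine on the same pattern: each `a?` first tries to consume an `a` and only then skips it, as a greedy engine does, so the one successful path (every `a?` empty) is explored last. This is a simplified model, not Ruby's actual algorithm, and `tail_attempts` is a hypothetical name. Newer engines may behave differently: Ruby 3.2, for example, added memoization-based optimizations against this kind of blow-up.

```ruby
# Model of naive backtracking on n copies of a? followed by n copies of a, against 'a' * n.
# Returns [matched?, number of times the mandatory tail was attempted].
def tail_attempts(n)
  text = 'a' * n
  attempts = 0
  try = lambda do |k, pos| # k: a? tokens decided so far, pos: index into text
    if k == n
      attempts += 1
      # the mandatory a{n} tail needs n more a's starting at pos
      return text[pos, n] == 'a' * n
    end
    # greedy: consume an 'a' first, fall back to skipping it
    (text[pos] == 'a' && try.call(k + 1, pos + 1)) || try.call(k + 1, pos)
  end
  [try.call(0, 0), attempts]
end

(1..16).each { |n| puts "#{n}: #{tail_attempts(n).inspect}" }
```

The attempt count doubles with every increment of n (2 to the power n), which is why the timing loop slows down so sharply.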
Regexp tips
Use /x
UP_TO_256 = /\b(?:25[0-5]   #   250-255
|2[0-4][0-9]                #   200-249
|1[0-9][0-9]                #   100-199
|[1-9][0-9]                 #   2-digit numbers (10-99)
|[0-9])                     #   single-digit numbers
\b/x

IPV4_ADDRESS = /#{UP_TO_256}(?:\.#{UP_TO_256}){3}/
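A sketch of the /x pattern in use. `IPV4_ONLY` is a hypothetical addition that wraps `IPV4_ADDRESS` in `\A...\z`, so that a string passes only if it consists of nothing but an address; the unanchored `IPV4_ADDRESS` alone would also find an address inside a longer string. Interpolating a /x regexp embeds it as a `(?x-mi:...)` group, so its comments stay valid inside the outer pattern.

```ruby
UP_TO_256 = /\b(?:25[0-5]    # 250-255
              |2[0-4][0-9]   # 200-249
              |1[0-9][0-9]   # 100-199
              |[1-9][0-9]    # 10-99
              |[0-9])        # 0-9
             \b/x

IPV4_ADDRESS = /#{UP_TO_256}(?:\.#{UP_TO_256}){3}/
IPV4_ONLY    = /\A#{IPV4_ADDRESS}\z/ # hypothetical: whole string must be an address

%w[192.168.0.1 10.0.0.255 256.1.1.1 01.2.3.4 1.2.3].each do |candidate|
  puts "#{candidate}: #{IPV4_ONLY.match?(candidate)}"
end
```

The trailing `\b` also rejects leading zeros such as `01`, because the single digit `0` cannot be followed by another digit.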
\A, \z for strings
       ^, $ for lines
• \A: the beginning of the string
• \z: the end of the string
• ^: after \n (or at the beginning of the string)
• $: before \n (or at the end of the string)
• In Ruby, ^ and $ always behave this way
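A short sketch of the difference on a two-line string; `=~` returns the index where the match starts, or `nil` when there is none.

```ruby
text = "abc\ndef"

p(text =~ /^d/)  # => 4   ^ matches right after the \n
p(text =~ /\Ad/) # => nil \A matches only at the very start of the string
p(text =~ /c$/)  # => 2   $ matches right before the \n
p(text =~ /c\z/) # => nil \z matches only at the very end of the string
```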
What's the problem?
         #! /usr/bin/env perl
         $a = "abc\ndef";
         if ($a =~ /^d/) {
           print "yes\n";
         }
         if ($a =~ /^d/m) {
           print "yes now\n";
         }
         # prints 'yes now'

         #! /usr/bin/env ruby

         a = "abc\ndef"
         if a =~ /^d/
           p "yes"
         end
         # prints "yes": in Ruby, ^ matches after any \n without a flag

Also note the difference in what /m means: in Perl, /m makes ^ and $ match at line boundaries; in Ruby, /m makes . match a newline.
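In Ruby, then, /m does the job of Perl's /s (dot matches newline), while ^ and $ are line anchors with or without it. A minimal illustration:

```ruby
text = "abc\ndef"

p(text =~ /c.d/)  # => nil . does not match "\n" by default
p(text =~ /c.d/m) # => 2   with /m, . also matches "\n"
p(text =~ /^d/)   # => 4   no flag needed for ^ to match after "\n"
```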




http://guides.rubyonrails.org/security.html#regular-expressions
Security Implications
         class File < ActiveRecord::Base
           validates :name, :format => /^[\w\.\-\+]+$/
         end




file.txt%0A<script>alert('hello')</script>

which URL-decodes to

file.txt\n<script>alert('hello')</script>

             /^[\w\.\-\+]+$/

            Match succeeds
    ActiveRecord validation succeeds

file.txt\n<script>alert('hello')</script>

             /\A[\w\.\-\+]+\z/

              Match fails
       ActiveRecord validation fails
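The same check can be reproduced in plain Ruby, without Rails, as a sketch. The constant names are hypothetical; the patterns are the two from the slides. Newer Rails versions (4.0 onward) raise an error for ^ or $ in a format validation unless `multiline: true` is passed, which guards against exactly this mistake.

```ruby
# The attack string after URL-decoding %0A into a newline.
name = "file.txt\n<script>alert('hello')</script>"

LINE_ANCHORED   = /^[\w\.\-\+]+$/   # ^ and $ match at any line boundary
STRING_ANCHORED = /\A[\w\.\-\+]+\z/ # \A and \z match only at the ends of the string

p(name =~ LINE_ANCHORED)         # => 0   the first line, "file.txt", is enough
p(name =~ STRING_ANCHORED)       # => nil every character must be allowed
p('file.txt' =~ STRING_ANCHORED) # => 0   a legitimate name still passes
```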
Prefer Character Classes
     to Alternations
require 'benchmark'

# simple benchmark comparing an alternation with a character class

n = 5_000

str = 'cafebabedeadbeef' * n

Benchmark.bmbm do |x|
  x.report('alternation') do
    str =~ /^(a|b|c|d|e|f)+$/
  end
  x.report('character class') do
    str =~ /^[a-f]+$/
  end
end
Benchmarks (times in seconds)
Ruby 1.8.7
                      user     system      total        real
alternation       0.030000   0.010000   0.040000 (  0.036702)
character class   0.000000   0.000000   0.000000 (  0.004704)

Ruby 2.0.0
                      user     system      total        real
alternation       0.020000   0.010000   0.030000 (  0.023139)
character class   0.000000   0.000000   0.000000 (  0.009641)

JRuby 1.7.4.dev
                      user     system      total        real
alternation       0.030000   0.000000   0.030000 (  0.021000)
character class   0.010000   0.000000   0.010000 (  0.007000)
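The benchmark assumes the two patterns accept the same strings. Before making such a swap in real code, a quick check on some sample inputs can help; this sketch uses hypothetical sample strings and adds `\A`/`\z` anchors so the whole string is tested. Note that the alternation's group also captures (it keeps the last letter matched), which a character class does not.

```ruby
ALTERNATION = /\A(a|b|c|d|e|f)+\z/
CHAR_CLASS  = /\A[a-f]+\z/

SAMPLES = ['cafebabedeadbeef', 'abcdef', 'abcdefg', '', 'CAFE', "cafe\nbabe"].freeze

SAMPLES.each do |s|
  alt = !(s =~ ALTERNATION).nil?
  cls = !(s =~ CHAR_CLASS).nil?
  puts "#{s.inspect.ljust(20)} alternation=#{alt} class=#{cls}"
end
```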
Beware of Character Classes
         # case-insensitively match any non-word character…

         # one is unlike the others
         'r' =~ /(?i:[\W])/
         's' =~ /(?i:[\W])/     # matches, even though 's' is a word character
         't' =~ /(?i:[\W])/

https://bugs.ruby-lang.org/issues/4044
/^1?$|^(11+?)\1+$/
• ^1?$ matches '1' or ''
• (11+?) non-greedily matches 2 or more 1's
• \1+ repeats that group 1 or more additional times
• so ^(11+?)\1+$ matches a composite number of 1's
• Together: matches a string of 1's if and only if the number of 1's is not prime
Integer#prime?
          class Integer
            def prime?
              "1" * self !~ /^1?$|^(11+?)\1+$/
            end
          end

No performance guarantee

Attributed to Perl hacker Abigail
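A sketch that checks the trick without reopening `Integer`: `regex_prime?` and `trial_division_prime?` are hypothetical names, and the regexp is cross-checked against plain trial division for small n. The regexp tries every candidate divisor length by backtracking, so it is only sensible for small numbers.

```ruby
PRIME_TEST = /^1?$|^(11+?)\1+$/ # matches unary strings of length 0, 1, or composite

def regex_prime?(n)
  '1' * n !~ PRIME_TEST
end

def trial_division_prime?(n)
  return false if n < 2
  (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end

(0..200).each do |n|
  raise "mismatch at #{n}" unless regex_prime?(n) == trial_division_prime?(n)
end

primes = (1..30).select { |n| regex_prime?(n) }
p primes # => [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```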
• @hiro_asari
• Github: BanzaiMan
