Introduction
   HTML parser choice
HTML5::Sanitizer internals
 HTML5::Sanitizer usage
             Conclusion




         HTML5::Sanitizer
  Sanitizing HTML 5 with Perl 5


                 Uwe Voelker

                     XING AG


            August 16th 2011




            Uwe Voelker    HTML5::Sanitizer

1   Introduction

2   HTML parser choice

3   HTML5::Sanitizer internals

4   HTML5::Sanitizer usage

5   Conclusion




Task: WYSIWYG editor



     integrate a WYSIWYG editor into XING
     frontend architect researched open source solutions
     none were suitable, mostly for security reasons
     decision was made to build it in-house
     goals: secure; share profiles (allowed tags) between frontend
     and backend




Team




     Christopher Blum: JavaScript
     Ingo Chao: QA (HTML5/CSS)
     Uwe Voelker: Perl


Live example




HTML parsers on CPAN



     HTML::Parser
     HTML::TreeBuilder
     HTML::TreeBuilder::LibXML
     XML::LibXML
     HTML::HTML5::Parser
     Marpa::HTML
     ...




started with HTML::HTML5::Parser (HH5P)
because it understands the semantics of HTML 5 tags
but it also treated "&copy" inside a URL as the entity for ©:
    input:  http://example.com/?section=2&copy=3&lang=en
    output: http://example.com/?section=2©=3&lang=en
final choice: XML::LibXML
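
The mangled URL comes from decoding a named character reference that has no terminating semicolon. The following is a minimal sketch of that effect in plain Perl, not the code of any parser named here. The one-entry table and the helpers `decode_lenient` and `decode_strict` are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical one-entry entity table; real parsers know many more names.
my %entity = ( copy => "\x{A9}" );    # U+00A9 COPYRIGHT SIGN

# Lenient: accepts "&name" with or without the trailing semicolon.
sub decode_lenient {
    my ($text) = @_;
    $text =~ s/&(\w+)(;?)/exists $entity{$1} ? $entity{$1} : "&$1$2"/ge;
    return $text;
}

# Strict: only "&name;" counts as a character reference.
sub decode_strict {
    my ($text) = @_;
    $text =~ s/&(\w+);/exists $entity{$1} ? $entity{$1} : "&$1;"/ge;
    return $text;
}

my $url = 'http://example.com/?section=2&copy=3&lang=en';

my $mangled = decode_lenient($url);    # "&copy" is replaced by the copyright sign
my $intact  = decode_strict($url);     # URL is returned unchanged
```

Writing the separator as `&amp;` in the HTML source avoids the ambiguity regardless of how lenient the parser is.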




Processing phases




      preprocessing (e.g. migration)
      parsing (HTML → DOM tree)
      converting (rebuild tree according to profile)
      writing (DOM tree → HTML)




Parsing HTML with XML::LibXML

  use XML::LibXML;

  my $parser = XML::LibXML->new(
      encoding          => 'UTF-8',
      recover           => 2,
      keep_blanks       => 1,
      no_cdata          => 1,
      expand_entities   => 1,
      no_network        => 1,
      suppress_errors   => 1,
      suppress_warnings => 1,
  );

  my $doc = $parser->parse_html_string(
      $html,
      {
          no_cdata          => 1,
          suppress_errors   => 1,
          suppress_warnings => 1,
      },
  );




Converting - rebuilding DOM tree



     loop through every node (only ELEMENT and TEXT nodes)
     drop unwanted elements completely (e.g. <script>)
     change unknown elements to <span>
     change the tag name if the profile asks for it
     transform (or copy) attributes
     proceed recursively with child nodes
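
The steps above can be sketched on a toy tree of plain hashes. This is an illustration under stated assumptions, not the module's code: the node layout, the `%profile` table and the `convert` function are hypothetical, the rule keys are taken from the profile slides, and source attributes are simply not carried over (the real module checks or transforms them per profile).

```perl
use strict;
use warnings;

# Hypothetical profile: tag name => rule.
my %profile = (
    script => { remove => 1 },
    p      => {},    # assumed: an empty rule keeps the tag as it is
    big    => { rename_tag => 'span', set_class => 'big' },
);

# A node is either { text => '...' } or
# { tag => '...', attrs => {...}, children => [...] }.
sub convert {
    my ($node) = @_;

    # Text nodes are copied unchanged.
    return { text => $node->{text} } if exists $node->{text};

    my $rule = $profile{ $node->{tag} };

    # Unwanted elements are dropped together with their subtree.
    return if $rule && $rule->{remove};

    # Unknown elements become <span>; known ones may be renamed.
    my $tag = $rule ? ( $rule->{rename_tag} // $node->{tag} ) : 'span';

    # Only attributes named by the rule end up on the new node.
    my %attrs;
    if ($rule) {
        %attrs = %{ $rule->{set_attributes} || {} };
        $attrs{class} = $rule->{set_class} if defined $rule->{set_class};
    }

    # Recurse; removed children return an empty list and vanish here.
    my @children = map { convert($_) } @{ $node->{children} || [] };

    return { tag => $tag, attrs => \%attrs, children => \@children };
}

my $tree = {
    tag      => 'p',
    attrs    => {},
    children => [
        { text => 'Hello ' },
        { tag => 'big',    attrs => {}, children => [ { text => 'world' } ] },
        { tag => 'script', attrs => {}, children => [ { text => 'alert(1)' } ] },
        { tag => 'blink',  attrs => {}, children => [ { text => '!' } ] },
    ],
};

my $clean = convert($tree);
```

Building a new tree instead of editing the parsed one in place means nothing reaches the output unless a rule produced it.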




Writing HTML

     own writer, mainly for additional escapes
     could not find a nice way to integrate this in XML::LibXML

  $text =~ s/&/&amp;/g;
  $text =~ s/'/&#39;/g;    # '
  $text =~ s/"/&quot;/g;   # "
  $text =~ s/</&lt;/g;
  $text =~ s/>/&gt;/g;
  $text =~ s/`/&#96;/g;
  $text =~ s/\{/&#123;/g;
  $text =~ s/\}/&#125;/g;
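
The substitutions above depend on their order: `&` has to be escaped first, or the ampersands of the entities produced later would be escaped again. A possible variant is a single pass driven by a lookup table. This is only a sketch; the `%escape` table and the `escape_text` helper are hypothetical names, not part of HTML5::Sanitizer.

```perl
use strict;
use warnings;

# Same eight characters as in the substitutions above.
my %escape = (
    '&' => '&amp;',
    "'" => '&#39;',
    '"' => '&quot;',
    '<' => '&lt;',
    '>' => '&gt;',
    '`' => '&#96;',
    '{' => '&#123;',
    '}' => '&#125;',
);

# Each character is looked at exactly once, so no entity is escaped twice.
sub escape_text {
    my ($text) = @_;
    $text =~ s/([&'"<>`{}])/$escape{$1}/g;
    return $text;
}

print escape_text('a & b'), "\n";    # a &amp; b
```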


Usage



 # construct object
 my $sanitizer = HTML5::Sanitizer->new(
     profile => 'My::Profile',
 );

 # call process()
 my $clean = $sanitizer->process($html);




Profile


     you have to build your own
     class with just one method: element($tag)
     return undef or a hashref with:
         remove            remove complete subtree (boolean)
         rename_tag        rename tag (string)
         set_attributes    set these attributes (hashref)
         check_attributes  check/transform these attributes (hashref)
         set_class         set class (string)
         add_class         add class from other attributes (hashref)
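
A profile along these lines might look as follows. This is a sketch under assumptions: the `%rules` table is hypothetical, the empty rule for `p` is assumed to mean "keep the tag as it is", and whether `element` is called on the class or on an instance is not stated on the slides, so the first argument is just treated as the invocant.

```perl
use strict;
use warnings;

package My::Profile;

# Hypothetical rule table; the keys follow the list above.
my %rules = (
    script => { remove => 1 },
    p      => {},    # assumed: an empty rule keeps the tag as it is
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
);

# Called with the tag name; returns a rule hashref, or undef for tags
# this profile does not know.
sub element {
    my ($invocant, $tag) = @_;
    return $rules{ lc $tag };
}

package main;

my $rule = My::Profile->element('big');
print "$rule->{rename_tag} class=$rule->{set_class}\n";    # span class=big
```

Keeping the rules in one data structure fits the stated goal of sharing profiles between frontend and backend, since a plain table is easier to mirror in JavaScript than code.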



Examples - script



      completely remove <script> (including all children)

  {
      remove => 1,
  }

      otherwise it would be converted to <span>
      and all its children would be processed recursively




Examples - big



      <big> → <span class="big">

  {
      rename_tag => 'span',
      set_class  => 'big',
  }




Examples - a



      add rel="nofollow" and target="_blank" to every link

  {
      set_attributes => {
          rel    => 'nofollow',
          target => '_blank',
      },
  }




Examples - font
  rename_tag => 'span',
  add_class  => { size => 'size_font' },

  sub class_size_font {
      my ($self, $val) = @_;
      return unless $val;
      return 'size-xx-large' if $val eq '7';
      # ...
      return 'size-xx-small' if $val eq '1';

      # relative sizes such as "+1" or "-2"
      return 'size-larger'  if $val =~ /^\+/;
      return 'size-smaller' if $val =~ /^-/;
      return;
  }
Debugging

        if the result is not as expected, you can access intermediate
        results:

  my $res = $sanitizer->process($html, { return_result

  # see HTML5::Sanitizer::Result
  say $res->input;
  say $res->preprocessed;
  say $res->parsed_doc->toString;
  say $res->converted_doc->toString;
  say $res->output;

  print $res->debug_output;

Repositories



      HTML5::Sanitizer (backend)
      http://github.com/xing/html5-sanitizer
      wysihtml5 (JavaScript frontend)
      http://github.com/xing/wysihtml5
      Feedback? uwe@uwevoelker.de




Questions?




The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Último (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Sanitizing HTML 5 with Perl 5

  • 1. Introduction HTML parser choice HTML5::Sanitizer interna HTML5::Sanitizer usage Conclusion HTML5::Sanitizer Sanitizing HTML 5 with Perl 5 Uwe Voelker XING AG August 16th 2011 Uwe Voelker HTML5::Sanitizer
  • 2. Outline: 1 Introduction; 2 HTML parser choice; 3 HTML5::Sanitizer internals; 4 HTML5::Sanitizer usage; 5 Conclusion
  • 3. Section 1, Introduction: Task: WYSIWYG editor; Team; Live example
  • 4-6. Task: WYSIWYG editor: integrate a WYSIWYG editor into XING; the frontend architect researched open-source solutions; none was suitable, mostly for security reasons; the decision was made to build it in-house; goals: secure, and share profiles (allowed tags) between frontend and backend
  • 7. Team: Christopher Blum (JavaScript); Ingo Chao (QA, HTML5/CSS); Uwe Voelker (Perl)
  • 8. Live example
  • 9. Section 2, HTML parser choice: CPAN modules; Evaluation; Final decision
  • 10. HTML parsers on CPAN: HTML::Parser; HTML::TreeBuilder; HTML::TreeBuilder::LibXML; XML::LibXML; HTML::HTML5::Parser; Marpa::HTML; ...
  • 12-15. Evaluation and final decision: started with HTML::HTML5::Parser (HH5P) because it understands the semantics of HTML 5 tags; but it also did this: http://example.com/?section=2&copy=3&lang=en → http://example.com/?section=2&copy;=3&lang=en (the query parameter &copy was treated as the entity &copy;); final choice: XML::LibXML
  • 16. Section 3, HTML5::Sanitizer internals: Processing phases; Parsing; Converting; Writing
  • 17-20. Processing phases: (1) preprocessing (e.g. migration); (2) parsing (HTML → DOM tree); (3) converting (rebuild the tree according to the profile); (4) writing (DOM tree → HTML)
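The four phases can be pictured as an ordered chain in which each phase consumes the previous phase's result. The following is a minimal sketch of that idea only, not the module's code: the phase bodies are toy placeholders, and the names `@phases` and `run_phases` are assumptions made for illustration. Keeping every intermediate result mirrors the debugging slide later in the talk.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the four phases. Each entry is
# [ result name, code ref ]; the bodies are toys, not the real phases.
my @phases = (
    # preprocessing: here just a line-ending normalisation
    [ preprocessed => sub { my ($html) = @_; $html =~ s/\r\n/\n/g; return $html } ],
    # parsing: a real parser would build a DOM tree; this wraps the text
    [ parsed       => sub { my ($html) = @_; return +{ type => 'text', text => $html } } ],
    # converting: a real converter would rebuild the tree per profile
    [ converted    => sub { my ($tree) = @_; return +{ %$tree } } ],
    # writing: serialise the tree and escape markup characters
    [ output       => sub {
        my ($tree) = @_;
        my $text = $tree->{text};
        $text =~ s/&/&amp;/g;
        $text =~ s/</&lt;/g;
        $text =~ s/>/&gt;/g;
        return $text;
    } ],
);

# Run all phases in order and keep every intermediate result.
sub run_phases {
    my ($input) = @_;
    my %result = ( input => $input );
    my $value  = $input;
    for my $phase (@phases) {
        my ( $name, $code ) = @$phase;
        $value = $code->($value);
        $result{$name} = $value;
    }
    return \%result;
}

my $result = run_phases("a\r\n<b>");
print $result->{output}, "\n";
```

Storing each phase's result under its own name makes it cheap to inspect where an unexpected output first appeared.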
  • 21. Parsing HTML with XML::LibXML:
        use XML::LibXML;

        my $parser = XML::LibXML->new(
            encoding          => 'UTF-8',
            recover           => 2,    # recover from errors silently
            keep_blanks       => 1,
            no_cdata          => 1,
            expand_entities   => 1,
            no_network        => 1,
            suppress_errors   => 1,
            suppress_warnings => 1,
        );
  • 22. Parsing HTML with XML::LibXML:
        my $doc = $parser->parse_html_string(
            $html,
            {
                no_cdata          => 1,
                suppress_errors   => 1,
                suppress_warnings => 1,
            },
        );
  • 23-26. Converting (rebuilding the DOM tree): loop through every node (only ELEMENT and TEXT nodes); drop unwanted elements completely (e.g. <script>); change unknown elements to <span>; possibly change the tag name (according to the profile); transform (or copy) attributes; proceed recursively with the child nodes
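The conversion steps above can be sketched as a recursive function. This is an illustration under stated assumptions, not the module's implementation: nodes are plain hashrefs instead of XML::LibXML nodes, the `%profile` lookup table and the `convert` function are hypothetical, and only the rule keys `remove`, `rename_tag` and `set_attributes` (named on the profile slides) are handled.

```perl
use strict;
use warnings;

# Hypothetical profile: tag name => rule (an assumption for this sketch).
my %profile = (
    script => { remove => 1 },
    p      => {},
    a      => { set_attributes => { rel => 'nofollow' } },
    big    => { rename_tag => 'span' },
);

# Rebuild a toy tree. Returns a new node, or nothing if the node is dropped.
sub convert {
    my ($node) = @_;

    # Text nodes are copied as they are.
    return { type => 'text', text => $node->{text} }
        if $node->{type} eq 'text';

    # Anything that is neither ELEMENT nor TEXT (e.g. a comment) is skipped.
    return unless $node->{type} eq 'element';

    # Unknown elements become <span>; their children are still processed.
    my $rule = $profile{ $node->{tag} } || { rename_tag => 'span' };

    # Unwanted elements are dropped together with their whole subtree.
    return if $rule->{remove};

    # Attribute checking is omitted here; only fixed attributes are set.
    my %attrs = %{ $rule->{set_attributes} || {} };

    return {
        type     => 'element',
        tag      => $rule->{rename_tag} || $node->{tag},
        attrs    => \%attrs,
        # A dropped child returns an empty list and vanishes from the map.
        children => [ map { convert($_) } @{ $node->{children} || [] } ],
    };
}

my $tree = {
    type     => 'element',
    tag      => 'p',
    children => [
        { type => 'text', text => 'hello' },
        { type => 'element', tag => 'script',
          children => [ { type => 'text', text => 'alert(1)' } ] },
        { type => 'element', tag => 'blink',
          children => [ { type => 'text', text => 'world' } ] },
    ],
};

my $clean = convert($tree);
print join( ', ', map { $_->{tag} || 'text' } @{ $clean->{children} } ), "\n";
```

Building a fresh tree, rather than deleting nodes from the parsed one, means that only content explicitly allowed by the profile can reach the output.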
  • 27-28. Writing HTML: a custom writer is used, mainly for additional escapes; could not find a nice way to integrate this into XML::LibXML:
        $text =~ s/&/&amp;/g;     # must run first
        $text =~ s/'/&#39;/g;
        $text =~ s/"/&quot;/g;
        $text =~ s/</&lt;/g;
        $text =~ s/>/&gt;/g;
        $text =~ s/`/&#96;/g;
        $text =~ s/\{/&#123;/g;
        $text =~ s/\}/&#125;/g;
  • 29. Section 4, HTML5::Sanitizer usage: Usage; Profile; Examples; Debugging
  • 30. Usage:
        # construct object
        my $sanitizer = HTML5::Sanitizer->new(
            profile => 'My::Profile',
        );

        # call process()
        my $clean = $sanitizer->process($html);
  • 31-33. Profile: you have to build your own; it is a class with just one method, element($tag), which returns undef or a hashref with these keys:
        remove            remove the complete subtree (boolean)
        rename_tag        rename the tag (string)
        set_attributes    set these attributes (hashref)
        check_attributes  check/transform these attributes (hashref)
        set_class         set the class (string)
        add_class         add a class derived from other attributes (hashref)
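A profile along these lines might look like the sketch below. Only the method name `element` and the rule keys come from the slides. The `%rules` table, the choice of tags, the empty hashref for `p`, and calling `element` as a class method are assumptions made for illustration.

```perl
package My::Profile;
use strict;
use warnings;

# Hypothetical rule table: tag name => rule hashref.
my %rules = (
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
    p      => {},    # assumed to mean: keep the element unchanged
);

# The single method a profile has to provide.
# Returns undef for tags without a rule, or the rule hashref.
sub element {
    my ( $class, $tag ) = @_;
    return $rules{$tag};
}

package main;

my $rule = My::Profile->element('big');
print "$rule->{rename_tag}.$rule->{set_class}\n";    # prints "span.big"
```

Because the profile is a plain lookup from tag name to rule, the same table could in principle be shared with the JavaScript frontend, which matches the stated goal of sharing profiles between frontend and backend.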
  • 34-36. Examples (script): completely remove <script>, including all children; otherwise it would be converted to <span> and all children processed recursively:
        {
            remove => 1,
        }
  • 37-38. Examples (big): <big> → <span class="big">:
        {
            rename_tag => 'span',
            set_class  => 'big',
        }
  • 39-40. Examples (a): add rel="nofollow" and target="_blank" to every link:
        {
            set_attributes => {
                rel    => 'nofollow',
                target => '_blank',
            },
        }
  • 41-42. Examples (font):
        rename_tag => 'span',
        add_class  => { size => 'size_font' },

        # maps the value of the size attribute to a class name
        sub class_size_font {
            my ($self, $val) = @_;
            return unless $val;
            return 'size-xx-large' if $val eq '7';
            # ...
            return 'size-xx-small' if $val eq '1';
            return 'size-larger'   if $val =~ /^\+/;
            return 'size-smaller'  if $val =~ /^-/;
            return;
        }
  • 43. Debugging: if the result is not as expected, you can access intermediate results:
        # the slide text is cut off after "return_result"; "=> 1" is assumed
        my $res = $sanitizer->process($html, { return_result => 1 });

        # see HTML5::Sanitizer::Result
        say $res->input;
        say $res->preprocessed;
        say $res->parsed_doc->toString;
        say $res->converted_doc->toString;
        say $res->output;
        print $res->debug_output;
  • 44-46. Repositories: HTML5::Sanitizer (backend): http://github.com/xing/html5-sanitizer; wysihtml5 (JavaScript frontend): http://github.com/xing/wysihtml5; Feedback? uwe@uwevoelker.de
  • 47. Questions?