SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
Web Scraping
                               @alvaro_aguirre




Saturday, November 5, 2011
In search of our cosmic origins...




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Data Scraping
                                  vs
                             Web Scraping


Saturday, November 5, 2011
Data Scraping
                    <html>
                             <header></header>
                             <body>
                             .....
                             </body>
                    </html>




Saturday, November 5, 2011
Web Scraping




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Deliverance
                               XDV
                             Diazo



Saturday, November 5, 2011
Diazo




Saturday, November 5, 2011
Saturday, November 5, 2011
<replace css:content=”h1” css:theme=”#main” />




Saturday, November 5, 2011
<drop css:content=”h1” />

                             <drop css:theme=”breadcrumbs” />




Saturday, November 5, 2011
<replace css:theme=”#header” content=”#header-
                       element” if-content=”” />




Saturday, November 5, 2011
<drop css:theme="#info-box" if-path="/news"/>




Saturday, November 5, 2011
<theme/>
                         <notheme/>
                         <replace/>
                         <before/>
                         <after/>
                         <drop/>
                         <strip/>
                         <merge/>
                         <copy/>




Saturday, November 5, 2011
<replace css:theme="#details">
          <dl id="details">
            <xsl:for-each css:select="table#details > tr">
              <dt><xsl:copy-of select="td[1]/text()" /></dt>
              <dd><xsl:copy-of select="td[2]/node()"/></dd>
            </xsl:for-each>
          </dl>
       </replace>/></dt>




      <table id="details">
                                              <dl id="details">
          <tr>
                                                  <dt>One</dt>
               <td>One</td>
                                                  <dd>1</dd>
               <td>1</td>
                                                  <dt>Two</dt>
          </tr>
                                                  <dd>2</dd>
          <tr>
                                              </dl>
               <td>Two</td>
               <td>2</td>
          </tr>
      </table>




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Tools



Saturday, November 5, 2011
External Content




Saturday, November 5, 2011
Saturday, November 5, 2011
• development of web & mobile interfaces
                    • legacy apps integrations
                    • prototypes
                    • low coupling


Saturday, November 5, 2011
from diazo.compiler import compile_theme
                             from lxml import etree
                             from diazo.compiler import compile_theme

                             absolute_prefix = "/static"

                             rules = "rules.xml"
                             theme = "theme.html"

                             compiled_theme = compile_theme(rules, theme,
                                                            absolute_prefix=absolute_prefix)

                             transform = etree.XSLT(compiled_theme)
                             content = etree.parse(some_content)
                             transformed = transform(content)

                             output = etree.tostring(transformed)




Saturday, November 5, 2011
github/aaguirre




Saturday, November 5, 2011
diazo.org




Saturday, November 5, 2011
plone.org




Saturday, November 5, 2011
gracias!




Saturday, November 5, 2011

Más contenido relacionado

Similar a Web Scraping using Diazo!

What's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceWhat's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceSencha
 
Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0James Thomas
 
international PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPinternational PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPsmueller_sandsmedia
 
JS Tooling in Rails 3.1
JS Tooling in Rails 3.1JS Tooling in Rails 3.1
JS Tooling in Rails 3.1Duda Dornelles
 
Javascript - How to avoid the bad parts
Javascript - How to avoid the bad partsJavascript - How to avoid the bad parts
Javascript - How to avoid the bad partsMikko Ohtamaa
 
잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기형우 안
 
永不止步的“重构”
永不止步的“重构”永不止步的“重构”
永不止步的“重构”Kejun Zhang
 
A Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemA Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemLeonard Axelsson
 
让开发也懂前端
让开发也懂前端让开发也懂前端
让开发也懂前端lifesinger
 
Parser combinators
Parser combinatorsParser combinators
Parser combinatorslifecoder
 
Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Leonardo Borges
 
Ext GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesExt GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesSencha
 
HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)Mediacurrent
 
Rails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyRails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyBlazing Cloud
 
The Solar Framework for PHP
The Solar Framework for PHPThe Solar Framework for PHP
The Solar Framework for PHPConFoo
 
The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011Mikko Ohtamaa
 

Similar a Web Scraping using Diazo! (20)

Semantic chop
Semantic chopSemantic chop
Semantic chop
 
What's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceWhat's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James Pearce
 
Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0
 
international PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPinternational PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHP
 
JS Tooling in Rails 3.1
JS Tooling in Rails 3.1JS Tooling in Rails 3.1
JS Tooling in Rails 3.1
 
Javascript - How to avoid the bad parts
Javascript - How to avoid the bad partsJavascript - How to avoid the bad parts
Javascript - How to avoid the bad parts
 
잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기
 
Sequel @ madrid-rb
Sequel @  madrid-rbSequel @  madrid-rb
Sequel @ madrid-rb
 
永不止步的“重构”
永不止步的“重构”永不止步的“重构”
永不止步的“重构”
 
A Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemA Tour Through the Groovy Ecosystem
A Tour Through the Groovy Ecosystem
 
让开发也懂前端
让开发也懂前端让开发也懂前端
让开发也懂前端
 
Parser combinators
Parser combinatorsParser combinators
Parser combinators
 
Xml
XmlXml
Xml
 
Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011)
 
Ext GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesExt GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced Templates
 
HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)
 
Rails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyRails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_many
 
The Solar Framework for PHP
The Solar Framework for PHPThe Solar Framework for PHP
The Solar Framework for PHP
 
The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011
 
Xml presentation
Xml presentationXml presentation
Xml presentation
 

Último

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Último (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Web Scraping using Diazo!