SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
Web Scraping
                               @alvaro_aguirre




Saturday, November 5, 2011
In search of our cosmic origins...




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Data Scraping
                                  vs
                             Web Scraping


Saturday, November 5, 2011
Data Scraping
                    <html>
                             <header></header>
                             <body>
                             .....
                             </body>
                    </html>




Saturday, November 5, 2011
Web Scraping




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Deliverance
                               XDV
                             Diazo



Saturday, November 5, 2011
Diazo




Saturday, November 5, 2011
Saturday, November 5, 2011
<replace css:content=”h1” css:theme=”#main” />




Saturday, November 5, 2011
<drop css:content=”h1” />

                             <drop css:theme=”breadcrumbs” />




Saturday, November 5, 2011
<replace css:theme=”#header” content=”#header-
                       element” if-content=”” />




Saturday, November 5, 2011
<drop css:theme="#info-box" if-path="/news"/>




Saturday, November 5, 2011
<theme/>
                         <notheme/>
                         <replace/>
                         <before/>
                         <after/>
                         <drop/>
                         <strip/>
                         <merge/>
                         <copy/>




Saturday, November 5, 2011
<replace css:theme="#details">
          <dl id="details">
            <xsl:for-each css:select="table#details > tr">
              <dt><xsl:copy-of select="td[1]/text()" /></dt>
              <dd><xsl:copy-of select="td[2]/node()"/></dd>
            </xsl:for-each>
          </dl>
       </replace>/></dt>




      <table id="details">
                                              <dl id="details">
          <tr>
                                                  <dt>One</dt>
               <td>One</td>
                                                  <dd>1</dd>
               <td>1</td>
                                                  <dt>Two</dt>
          </tr>
                                                  <dd>2</dd>
          <tr>
                                              </dl>
               <td>Two</td>
               <td>2</td>
          </tr>
      </table>




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Tools



Saturday, November 5, 2011
External Content




Saturday, November 5, 2011
Saturday, November 5, 2011
• development of web & mobile interfaces
                    • legacy apps integrations
                    • prototypes
                    • low coupling


Saturday, November 5, 2011
from diazo.compiler import compile_theme
                             from lxml import etree
                             from diazo.compiler import compile_theme

                             absolute_prefix = "/static"

                             rules = "rules.xml"
                             theme = "theme.html"

                             compiled_theme = compile_theme(rules, theme,
                                                            absolute_prefix=absolute_prefix)

                             transform = etree.XSLT(compiled_theme)
                             content = etree.parse(some_content)
                             transformed = transform(content)

                             output = etree.tostring(transformed)




Saturday, November 5, 2011
github/aaguirre




Saturday, November 5, 2011
diazo.org




Saturday, November 5, 2011
plone.org




Saturday, November 5, 2011
gracias!




Saturday, November 5, 2011

Más contenido relacionado

Similar a Web Scraping using Diazo!

What's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceWhat's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceSencha
 
Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0James Thomas
 
international PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPinternational PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPsmueller_sandsmedia
 
JS Tooling in Rails 3.1
JS Tooling in Rails 3.1JS Tooling in Rails 3.1
JS Tooling in Rails 3.1Duda Dornelles
 
Javascript - How to avoid the bad parts
Javascript - How to avoid the bad partsJavascript - How to avoid the bad parts
Javascript - How to avoid the bad partsMikko Ohtamaa
 
잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기형우 안
 
永不止步的“重构”
永不止步的“重构”永不止步的“重构”
永不止步的“重构”Kejun Zhang
 
A Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemA Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemLeonard Axelsson
 
让开发也懂前端
让开发也懂前端让开发也懂前端
让开发也懂前端lifesinger
 
Parser combinators
Parser combinatorsParser combinators
Parser combinatorslifecoder
 
Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Leonardo Borges
 
Ext GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesExt GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesSencha
 
HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)Mediacurrent
 
Rails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyRails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyBlazing Cloud
 
The Solar Framework for PHP
The Solar Framework for PHPThe Solar Framework for PHP
The Solar Framework for PHPConFoo
 
The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011Mikko Ohtamaa
 

Similar a Web Scraping using Diazo! (20)

Semantic chop
Semantic chopSemantic chop
Semantic chop
 
What's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceWhat's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James Pearce
 
Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0
 
international PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPinternational PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHP
 
JS Tooling in Rails 3.1
JS Tooling in Rails 3.1JS Tooling in Rails 3.1
JS Tooling in Rails 3.1
 
Javascript - How to avoid the bad parts
Javascript - How to avoid the bad partsJavascript - How to avoid the bad parts
Javascript - How to avoid the bad parts
 
잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기
 
Sequel @ madrid-rb
Sequel @  madrid-rbSequel @  madrid-rb
Sequel @ madrid-rb
 
永不止步的“重构”
永不止步的“重构”永不止步的“重构”
永不止步的“重构”
 
A Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemA Tour Through the Groovy Ecosystem
A Tour Through the Groovy Ecosystem
 
让开发也懂前端
让开发也懂前端让开发也懂前端
让开发也懂前端
 
Parser combinators
Parser combinatorsParser combinators
Parser combinators
 
Xml
XmlXml
Xml
 
Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011)
 
Ext GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesExt GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced Templates
 
HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)
 
Rails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyRails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_many
 
The Solar Framework for PHP
The Solar Framework for PHPThe Solar Framework for PHP
The Solar Framework for PHP
 
The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011
 
Xml presentation
Xml presentationXml presentation
Xml presentation
 

Último

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Último (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Web Scraping using Diazo!