SlideShare una empresa de Scribd logo
1 de 11
Descargar para leer sin conexión
Creating Zotero Translators Using Framework


               Sebastian Karcher



         Boston College, November 2012
Why Learn About Translators?



      One-click import from the web is perhaps the key features
      that distinguishes Zotero
      In this session we will write a “screen scraper” type translator
      for Zotero
      Best Case: This will allow you to write translators for sites
      you or your “clients” need
      Minimal Case: This will give you an undertstanding on how
      translators work and what may be possible, even if you’re not
      going to do it yourself
Some Notes on Zotero Translators



      Each Zotero translator is an individual file, written in
      javascript
      There are four types of translators: Web, Import, Search,
      Export
      Some web translators, like those for many libraries, call on an
      import translators (e.g. MARC) - we won’t learn about those.
      Other web translators “scrape” data from the page - that is
      what we will do now
(I know last week was Halloween but:) This isn’t going to
be scary!



      You cannot break Zotero by fiddling with translators - you can
      always “reset” from the advanced panel of the preferences
      About 2 years ago, Eric Hetzner of the UC libraries developed
      a “framework” for wrting translators –> now you don’t need
      any javascript. Just Xpaths and regular expressions. And
      those are easy!
Xpaths are “directions” on a website

      xpaths are basically “directions” used to point to a part of a
      webpage
      A webpage is built up from a number of nested nodes
      This is what the most simple webpage looks like

   <html>
       <head>
           <title>A Basic Webpage</title>
       </head>
       <body>
           <div id="title">The Title of the webpage</div>
           <div id="content" class="text">The Content of the w
       </body>
   </html>
The most basic Xpath




      Give directions: at every corner/node, tell Zotero where to go:
      Let’s say we want to go go to “The Content of the webpage”
      “Take the HTLM road, take a left at”body“, then take
      the”div” street, or in HTML:
      /html/body/div
Making Xpaths more precise



      But we’re still “lost” - which of the two “div” streets do we
      go down?
      Option 1: Take the second <div>
      /html/body/div[2]
      Option 2: Take the <div> that has “content” as an id
      /html/body/div[@id="content"]
Making Xpaths more efficient


      In an actual webpage, an xpath can be very long, so we’d like
      to make them shorter. we can use // to start anywhere in the
      html tree, e.g “the <div> with”content” as an “id”
      anywhere on the site:
      //div[@id="content"]
      Sometimes we don’t want the precise content of an attribute
      like id - in those case we can use contains() as in
      //div[contains(@id, "cont")]
      We can combine conditions with “and” or “or” (in lowercase!)
      //div[@id="content" and @class="text"]
A sample Framework Translator - Single item

   /** Articles */
   FW.Scraper({
   itemType : ’blogPost’,
   detect : FW.Xpath(’//h1[@class="article-title"]’),
   title : FW.Xpath(’//h1[@class="article-title"]’).text().tri
   attachments : {
      url : FW.Url(),title : "voxEU snapshot",type : "text/html
   },
   creators : FW.Xpath(’//div[@class="author"]
   //span[@class="field-content"]/a’).text().cleanAuthor("auth
   abstractNote : FW.Xpath(’//div[@class="article-teaser"]’).t
   date : FW.Xpath(’//h1[@class="article-title"]/following-sib
   publicationTitle : "VoxEU.org"
   });
A sample Framework Translator - Multiples


   /** Search results */
   FW.MultiScraper({
   itemType : "multiple",
   detect : FW.Xpath(’//ul[contains(@class, "search-results")]
   choices : {
   titles : FW.Xpath(’//ul[contains(@class, "search-results")]
               /li/h2/a’).text(),
   urls : FW.Xpath(’//ul[contains(@class, "search-results")]
               /li/h2/a’).key(’href’).text()
   }
   });
Our Tools




      Scaffold - a Firefox extension to write and test the translator
      Firefox “Inspect Element” - to help us understand the
      structure of a webpage (there are alternatives like “Firebug”)

Más contenido relacionado

Similar a Zotero Framework Translators

Intro to mobile web application development
Intro to mobile web application developmentIntro to mobile web application development
Intro to mobile web application developmentzonathen
 
Html css workshop, lesson 0, how browsers work
Html css workshop, lesson 0, how browsers workHtml css workshop, lesson 0, how browsers work
Html css workshop, lesson 0, how browsers workAlbino Tonnina
 
Essential_Javascript_--_A_Javascript_Tutorial
Essential_Javascript_--_A_Javascript_TutorialEssential_Javascript_--_A_Javascript_Tutorial
Essential_Javascript_--_A_Javascript_Tutorialtutorialsruby
 
Essential Javascript -- A Javascript &lt;b>Tutorial&lt;/b>
Essential Javascript -- A Javascript &lt;b>Tutorial&lt;/b>Essential Javascript -- A Javascript &lt;b>Tutorial&lt;/b>
Essential Javascript -- A Javascript &lt;b>Tutorial&lt;/b>tutorialsruby
 
Essential_Javascript_--_A_Javascript_Tutorial
Essential_Javascript_--_A_Javascript_TutorialEssential_Javascript_--_A_Javascript_Tutorial
Essential_Javascript_--_A_Javascript_Tutorialtutorialsruby
 
&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />tutorialsruby
 
Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2GDSCUniversitasMatan
 
Girl Develop It Cincinnati: Intro to HTML/CSS Class 1
Girl Develop It Cincinnati: Intro to HTML/CSS Class 1Girl Develop It Cincinnati: Intro to HTML/CSS Class 1
Girl Develop It Cincinnati: Intro to HTML/CSS Class 1Erin M. Kidwell
 
Basic JavaScript Tutorial
Basic JavaScript TutorialBasic JavaScript Tutorial
Basic JavaScript TutorialDHTMLExtreme
 
Html5, a gentle introduction
Html5, a gentle introduction Html5, a gentle introduction
Html5, a gentle introduction Diego Scataglini
 
2022 HTML5: The future is now
2022 HTML5: The future is now2022 HTML5: The future is now
2022 HTML5: The future is nowGonzalo Cordero
 
Html 5, a gentle introduction
Html 5, a gentle introductionHtml 5, a gentle introduction
Html 5, a gentle introductionDiego Scataglini
 

Similar a Zotero Framework Translators (20)

Intro to mobile web application development
Intro to mobile web application developmentIntro to mobile web application development
Intro to mobile web application development
 
Html css workshop, lesson 0, how browsers work
Html css workshop, lesson 0, how browsers workHtml css workshop, lesson 0, how browsers work
Html css workshop, lesson 0, how browsers work
 
#1 HTML [know-how]
#1 HTML [know-how]#1 HTML [know-how]
#1 HTML [know-how]
 
Essential_Javascript_--_A_Javascript_Tutorial
Essential_Javascript_--_A_Javascript_TutorialEssential_Javascript_--_A_Javascript_Tutorial
Essential_Javascript_--_A_Javascript_Tutorial
 
Essential Javascript -- A Javascript &lt;b>Tutorial&lt;/b>
Essential Javascript -- A Javascript &lt;b>Tutorial&lt;/b>Essential Javascript -- A Javascript &lt;b>Tutorial&lt;/b>
Essential Javascript -- A Javascript &lt;b>Tutorial&lt;/b>
 
Essential_Javascript_--_A_Javascript_Tutorial
Essential_Javascript_--_A_Javascript_TutorialEssential_Javascript_--_A_Javascript_Tutorial
Essential_Javascript_--_A_Javascript_Tutorial
 
&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />
 
Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2Bootcamp - Web Development Session 2
Bootcamp - Web Development Session 2
 
jQuery
jQueryjQuery
jQuery
 
Girl Develop It Cincinnati: Intro to HTML/CSS Class 1
Girl Develop It Cincinnati: Intro to HTML/CSS Class 1Girl Develop It Cincinnati: Intro to HTML/CSS Class 1
Girl Develop It Cincinnati: Intro to HTML/CSS Class 1
 
JavaScript - Part-1
JavaScript - Part-1JavaScript - Part-1
JavaScript - Part-1
 
Please dont touch-3.6-jsday
Please dont touch-3.6-jsdayPlease dont touch-3.6-jsday
Please dont touch-3.6-jsday
 
Basic JavaScript Tutorial
Basic JavaScript TutorialBasic JavaScript Tutorial
Basic JavaScript Tutorial
 
Html5, a gentle introduction
Html5, a gentle introduction Html5, a gentle introduction
Html5, a gentle introduction
 
Intro to HTML 5 / CSS 3
Intro to HTML 5 / CSS 3Intro to HTML 5 / CSS 3
Intro to HTML 5 / CSS 3
 
2022 HTML5: The future is now
2022 HTML5: The future is now2022 HTML5: The future is now
2022 HTML5: The future is now
 
Day1
Day1Day1
Day1
 
Html5 For Jjugccc2009fall
Html5 For Jjugccc2009fallHtml5 For Jjugccc2009fall
Html5 For Jjugccc2009fall
 
HTML literals, the JSX of the platform
HTML literals, the JSX of the platformHTML literals, the JSX of the platform
HTML literals, the JSX of the platform
 
Html 5, a gentle introduction
Html 5, a gentle introductionHtml 5, a gentle introduction
Html 5, a gentle introduction
 

Último

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 

Último (20)

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 

Zotero Framework Translators

  • 1. Creating Zotero Translators Using Framework Sebastian Karcher Boston College, November 2012
  • 2. Why Learn About Translators? One-click import from the web is perhaps the key features that distinguishes Zotero In this session we will write a “screen scraper” type translator for Zotero Best Case: This will allow you to write translators for sites you or your “clients” need Minimal Case: This will give you an undertstanding on how translators work and what may be possible, even if you’re not going to do it yourself
  • 3. Some Notes on Zotero Translators Each Zotero translator is an individual file, written in javascript There are four types of translators: Web, Import, Search, Export Some web translators, like those for many libraries, call on an import translators (e.g. MARC) - we won’t learn about those. Other web translators “scrape” data from the page - that is what we will do now
  • 4. (I know last week was Halloween but:) This isn’t going to be scary! You cannot break Zotero by fiddling with translators - you can always “reset” from the advanced panel of the preferences About 2 years ago, Eric Hetzner of the UC libraries developed a “framework” for wrting translators –> now you don’t need any javascript. Just Xpaths and regular expressions. And those are easy!
  • 5. Xpaths are “directions” on a website xpaths are basically “directions” used to point to a part of a webpage A webpage is built up from a number of nested nodes This is what the most simple webpage looks like <html> <head> <title>A Basic Webpage</title> </head> <body> <div id="title">The Title of the webpage</div> <div id="content" class="text">The Content of the w </body> </html>
  • 6. The most basic Xpath Give directions: at every corner/node, tell Zotero where to go: Let’s say we want to go go to “The Content of the webpage” “Take the HTLM road, take a left at”body“, then take the”div” street, or in HTML: /html/body/div
  • 7. Making Xpaths more precise But we’re still “lost” - which of the two “div” streets do we go down? Option 1: Take the second <div> /html/body/div[2] Option 2: Take the <div> that has “content” as an id /html/body/div[@id="content"]
  • 8. Making Xpaths more efficient In an actual webpage, an xpath can be very long, so we’d like to make them shorter. we can use // to start anywhere in the html tree, e.g “the <div> with”content” as an “id” anywhere on the site: //div[@id="content"] Sometimes we don’t want the precise content of an attribute like id - in those case we can use contains() as in //div[contains(@id, "cont")] We can combine conditions with “and” or “or” (in lowercase!) //div[@id="content" and @class="text"]
  • 9. A sample Framework Translator - Single item /** Articles */ FW.Scraper({ itemType : ’blogPost’, detect : FW.Xpath(’//h1[@class="article-title"]’), title : FW.Xpath(’//h1[@class="article-title"]’).text().tri attachments : { url : FW.Url(),title : "voxEU snapshot",type : "text/html }, creators : FW.Xpath(’//div[@class="author"] //span[@class="field-content"]/a’).text().cleanAuthor("auth abstractNote : FW.Xpath(’//div[@class="article-teaser"]’).t date : FW.Xpath(’//h1[@class="article-title"]/following-sib publicationTitle : "VoxEU.org" });
  • 10. A sample Framework Translator - Multiples /** Search results */ FW.MultiScraper({ itemType : "multiple", detect : FW.Xpath(’//ul[contains(@class, "search-results")] choices : { titles : FW.Xpath(’//ul[contains(@class, "search-results")] /li/h2/a’).text(), urls : FW.Xpath(’//ul[contains(@class, "search-results")] /li/h2/a’).key(’href’).text() } });
  • 11. Our Tools Scaffold - a Firefox extension to write and test the translator Firefox “Inspect Element” - to help us understand the structure of a webpage (there are alternatives like “Firebug”)