ResourceSync: Leveraging Sitemaps
for Resource Synchronization
WWW 2013, Rio de Janeiro, May 17th
Bernhard Haslhofer | University of Vienna
Simeon Warner | Cornell University
Carl Lagoze | University of Michigan
Martin Klein, Robert Sanderson | Los Alamos National Labs
Michael L. Nelson | Old Dominion University
Herbert van de Sompel | Los Alamos National Labs
http://www.openarchives.org/rs/
WWW 2013, May 17th
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
What?
• A framework for synchronizing Web
resources from a Source to a Destination
$ resync http://example.com
Why?
• rsync: filesystem sync, but not Web
• OAI-PMH: metadata, but not resources
• WebDAV: extends HTTP, requires server
installation at source
• ...
… because lots of projects and services are doing
synchronization but rely on ad-hoc solutions!
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
arxiv.org mirroring
• 2.4M resources (PDF,
metadata, LaTeX source)
• ~800/day created or
updated
• uses homebrew
mirroring since 1994 (!)
• looking for a more general
solution to support
independent destinations
Wikipedia
• 1.4 updates / sec
• many dependent
services reusing
Wikipedia content (e.g.,
DBpedia, Freebase)
• harvest articles via
OAI-PMH, retrieve changes
via IRC, download
dumps
data.europeana.eu
• aggregates metadata
from >200 data
providers in Europe
• 10 largest providers
contribute 80%
• >190 providers
contribute 20%
Design Guidelines
• Sync small websites / repositories (few
resources) but also large data collections
(millions of resources)
• Support low change frequency (weeks /
months) to high change frequency
(seconds) sources
• Low adoption barrier!
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
Resource List
Destination
Source
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
  </url>
</urlset>
$ resync -b http://example.com
XML Sitemap
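A Source could generate a Resource List like the one above with the Python standard library. The following is a minimal sketch, not the resync library's own code; the function name `build_resource_list` is hypothetical, and the namespaces and attribute names are taken from the slide.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def build_resource_list(uris, modified):
    """Return a Resource List (an XML Sitemap plus an rs:md element) as a string.

    Hypothetical helper for illustration; not part of the resync client.
    """
    # Serialize the Sitemap namespace as the default one and use the "rs" prefix.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    ET.SubElement(
        urlset,
        f"{{{RS_NS}}}md",
        {"capability": "resourcelist", "modified": modified},
    )
    for uri in uris:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_resource_list(
    ["http://example.com/res1", "http://example.com/res2"],
    "2013-01-03T09:00:00Z",
)
print(xml_text)
```

Because the result is still a valid Sitemap, a Source that already publishes one would only need to add the `rs:md` element.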
Resource List
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
    <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"/>
  </url>
</urlset>
Source
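On the Destination side, the `lastmod` and `hash` values let a client decide whether a local copy is still current. A rough sketch of that check follows, assuming the `md5:<hex digest>` form shown on the slide; `parse_resource_list` and `matches_hash` are hypothetical names, not functions of the resync library.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

RESOURCE_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
    <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"/>
  </url>
</urlset>"""


def parse_resource_list(xml_text):
    """Map each resource URI to its lastmod and hash (None when absent)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = {}
    for url in root.findall("sm:url", NS):
        uri = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        entries[uri] = {
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "hash": md.get("hash") if md is not None else None,
        }
    return entries


def matches_hash(content, hash_attr):
    """Check downloaded bytes against an 'md5:<hex>' value (only md5 handled here)."""
    algorithm, _, expected = hash_attr.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return hashlib.md5(content).hexdigest() == expected


entries = parse_resource_list(RESOURCE_LIST)
for uri, info in entries.items():
    print(uri, info["lastmod"], info["hash"])
```

A Destination would call `matches_hash` on the bytes it already holds and only re-download the resources for which the check fails.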
Change List
Destination
Source
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         modified="2013-01-03T11:00:00Z"/>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res3</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>
$ resync -b http://example.com
$ resync -i http://example.com
XML Sitemap
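The incremental step (`resync -i`) amounts to replaying the Change List against what the Destination already holds. The sketch below illustrates the idea under simplifying assumptions: the local mirror is modeled as a plain dict of URI to lastmod with made-up initial values, `apply_change_list` is a hypothetical name, and the `created` change type is assumed in addition to the `updated` and `deleted` values shown on the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

CHANGE_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" modified="2013-01-03T11:00:00Z"/>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res3</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def apply_change_list(mirror, xml_text):
    """Update `mirror` (URI -> lastmod) in place and return the URIs to re-fetch."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    to_fetch = []
    for url in root.findall("sm:url", NS):
        uri = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change == "deleted":
            # The resource is gone at the Source, so drop the local copy.
            mirror.pop(uri, None)
        elif change in ("created", "updated"):
            # Record the new timestamp and schedule a download.
            mirror[uri] = lastmod
            to_fetch.append(uri)
    return to_fetch


# Hypothetical local state left behind by an earlier baseline sync.
mirror = {
    "http://example.com/res1": "2013-01-02T10:00:00Z",
    "http://example.com/res2": "2013-01-02T11:00:00Z",
    "http://example.com/res3": "2013-01-02T12:00:00Z",
}
to_fetch = apply_change_list(mirror, CHANGE_LIST)
print("re-fetch:", to_fetch)
print("mirror now holds:", sorted(mirror))
```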
Resource Dump
Source
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump"
         modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/resourcedump.zip</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </url>
</urlset>
XML Sitemap
Resource Dump
http://example.com/resourcedump.zip
|- manifest.xml
|- resources
   |- res1
   |- res2
Resource Dump Manifest
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest"
         modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-03T03:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           path="/resources/res1"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-03T04:00:00Z</lastmod>
    <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"
           path="/resources/res2"/>
  </url>
</urlset>
manifest.xml (XML Sitemap)
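The manifest ties each file inside the ZIP back to the URI it represents. The sketch below shows one way a Destination might unpack such a dump; it assumes the `path` attribute is relative to the archive root, builds a tiny in-memory archive with made-up content instead of downloading anything, and uses the hypothetical names `read_dump` and `make_example_dump`.

```python
import hashlib
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def read_dump(zip_bytes):
    """Return {URI: content} for every resource listed in manifest.xml."""
    resources = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        manifest = ET.fromstring(archive.read("manifest.xml"))
        for url in manifest.findall("sm:url", NS):
            uri = url.findtext("sm:loc", namespaces=NS)
            md = url.find("rs:md", NS)
            # Assumption: "/resources/res1" names the member "resources/res1".
            member = md.get("path").lstrip("/")
            content = archive.read(member)
            digest = "md5:" + hashlib.md5(content).hexdigest()
            if md.get("hash") != digest:
                raise ValueError(f"hash mismatch for {uri}")
            resources[uri] = content
    return resources


def make_example_dump():
    """Build a small dump in memory; file contents are illustrative only."""
    files = {"res1": b"first resource", "res2": b"second resource"}
    entries = "".join(
        f"<url><loc>http://example.com/{name}</loc>"
        f'<rs:md hash="md5:{hashlib.md5(data).hexdigest()}" '
        f'path="/resources/{name}"/></url>'
        for name, data in files.items()
    )
    manifest = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        'xmlns:rs="http://www.openarchives.org/rs/terms/">'
        '<rs:md capability="resourcedump-manifest" '
        'modified="2013-01-03T09:00:00Z"/>'
        f"{entries}</urlset>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.xml", manifest)
        for name, data in files.items():
            archive.writestr(f"resources/{name}", data)
    return buffer.getvalue()


dump = read_dump(make_example_dump())
for uri, content in dump.items():
    print(uri, len(content), "bytes")
```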
Capability List
Destination
Source
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="http://example.com/info-about-source.xml"
         rel="describedby"
         type="application/xml"/>
  <rs:md capability="capabilitylist"
         modified="2013-01-02T14:00:00Z"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
</urlset>
$ resync -x http://example.com
XML Sitemap
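A client can read the Capability List to find out which documents a Source offers before deciding how to sync. The following sketch extracts that mapping from the XML on the slide; `parse_capability_list` is a hypothetical name, and the preference for a Change List over a full Resource List is only an illustrative policy.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

CAPABILITY_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="http://example.com/info-about-source.xml"
         rel="describedby" type="application/xml"/>
  <rs:md capability="capabilitylist" modified="2013-01-02T14:00:00Z"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
</urlset>"""


def parse_capability_list(xml_text):
    """Map each advertised capability name to the URL of its document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    capabilities = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        if md is not None and md.get("capability"):
            capabilities[md.get("capability")] = url.findtext(
                "sm:loc", namespaces=NS
            )
    return capabilities


capabilities = parse_capability_list(CAPABILITY_LIST)

# Illustrative policy: use incremental changes when the Source offers them.
preferred = "changelist" if "changelist" in capabilities else "resourcelist"
print(preferred, "->", capabilities[preferred])
```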
Large Resource Lists
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         modified="2013-01-03T09:00:00Z"/>
  <sitemap>
    <loc>http://example.com/resourcelist-part2.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/resourcelist-part1.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
</sitemapindex>
Source
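When a Resource List is split this way, a client first reads the index and then each component list. A minimal sketch of that two-level walk follows; the `fetch` callable stands in for an HTTP GET, and the contents of the two component lists are invented for the example, since the slides do not show them.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
  <sitemap>
    <loc>http://example.com/resourcelist-part2.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/resourcelist-part1.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
</sitemapindex>"""

URLSET = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">{}</urlset>'

# Hypothetical component lists keyed by URL; a real client would download them.
PARTS = {
    "http://example.com/resourcelist-part1.xml": URLSET.format(
        "<url><loc>http://example.com/res1</loc></url>"
        "<url><loc>http://example.com/res2</loc></url>"
    ),
    "http://example.com/resourcelist-part2.xml": URLSET.format(
        "<url><loc>http://example.com/res3</loc></url>"
    ),
}


def component_lists(index_xml):
    """Return the URLs of the component Resource Lists named in the index."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    return [
        sitemap.findtext("sm:loc", namespaces=NS)
        for sitemap in root.findall("sm:sitemap", NS)
    ]


def collect_resource_uris(index_xml, fetch):
    """Walk every component list, in index order, and gather resource URIs."""
    uris = []
    for part_url in component_lists(index_xml):
        part = ET.fromstring(fetch(part_url).encode("utf-8"))
        uris.extend(loc.text for loc in part.iterfind("sm:url/sm:loc", NS))
    return uris


uris = collect_resource_uris(INDEX, PARTS.__getitem__)
print(uris)
```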
Other Capabilities
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics Walkthrough
• Demos
• Status and Next Steps
Available code
• ResourceSync client
and library (Python)
• ResourceSync source
simulator
http://github.com/resync
Install resync client/library
$ git clone git://github.com/resync/resync.git
$ cd resync/
$ python setup.py build
$ sudo python setup.py install
or
$ sudo easy_install resync
or
$ sudo pip install resync
Install resync simulator
$ sudo easy_install tornado
$ git clone git://github.com/resync/simulator.git
$ cd simulator/
$ chmod u+x simulate-source
$ ./simulate-source
Run client against simulator
$ resync -b http://localhost:8888
$ resync -i http://localhost:8888
resync @ arxiv.org
$ resync -v --noauth \
    http://resync.library.cornell.edu/arxiv-q-bio=/tmp/qbio \
    http://resync.library.cornell.edu/arxiv=/tmp/arxiv
resync @ en.wikipedia.org
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics Walkthrough
• Demos
• Status and Next Steps
Status
• Beta spec (v0.6) for public comment:
http://www.openarchives.org/rs/0.6/resourcesync
• Tool development started
• Separate documents for archiving and push
deployments
Next Steps
• Continue tool development & deployment
• Collect
• public comments on
resourcesync@googlegroups.com
• implementation issues on
https://github.com/resync/resync/issues
• Version 0.9 to be released in summer 2013
• Version 1.0 in fall 2013 (NISO standard)
Thanks!
@bhaslhofer
http://slideshare.net/bhaslhofer
http://openarchives.org/rs
resourcesync@googlegroups.com

Más contenido relacionado

Similar a ResourceSync: Leveraging Sitemaps for Resource Synchronization

LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...Ross Singer
 
Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...Chris Richardson
 
REST teori og praksis; REST in theory and practice
REST teori og praksis; REST in theory and practiceREST teori og praksis; REST in theory and practice
REST teori og praksis; REST in theory and practicehamnis
 
Enterprise integration options with Smallworld
Enterprise integration options with SmallworldEnterprise integration options with Smallworld
Enterprise integration options with SmallworldPeter Batty
 
Enterprise integration options with Smallworld
Enterprise integration options with SmallworldEnterprise integration options with Smallworld
Enterprise integration options with SmallworldPeter Batty
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapubeswcsummerschool
 
RDFa: an introduction
RDFa: an introductionRDFa: an introduction
RDFa: an introductionKai Li
 
Intro to Semantic Web
Intro to Semantic WebIntro to Semantic Web
Intro to Semantic WebTimea Turdean
 
Linked data: spreading data over the web
Linked data: spreading data over the webLinked data: spreading data over the web
Linked data: spreading data over the webshellac
 
Java colombo-deep-dive-into-jax-rs
Java colombo-deep-dive-into-jax-rsJava colombo-deep-dive-into-jax-rs
Java colombo-deep-dive-into-jax-rsSagara Gunathunga
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cMartin Toshev
 
RESTful Web Services with Spring MVC
RESTful Web Services with Spring MVCRESTful Web Services with Spring MVC
RESTful Web Services with Spring MVCdigitalsonic
 
The RESTful Soa Datagrid with Oracle
The RESTful Soa Datagrid with OracleThe RESTful Soa Datagrid with Oracle
The RESTful Soa Datagrid with OracleEmiliano Pecis
 
Oracle APEX URLs Untangled & SEOptimized
Oracle APEX URLs Untangled & SEOptimizedOracle APEX URLs Untangled & SEOptimized
Oracle APEX URLs Untangled & SEOptimizedChristian Rokitta
 
Unify Earth Observation products access with OpenSearch
Unify Earth Observation products access with OpenSearchUnify Earth Observation products access with OpenSearch
Unify Earth Observation products access with OpenSearchGasperi Jerome
 

Similar a ResourceSync: Leveraging Sitemaps for Resource Synchronization (20)

LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
LITA 2010: The Linked Library Data Cloud: it's time to stop think and start l...
 
Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...Polyglot persistence for Java developers: time to move out of the relational ...
Polyglot persistence for Java developers: time to move out of the relational ...
 
REST teori og praksis; REST in theory and practice
REST teori og praksis; REST in theory and practiceREST teori og praksis; REST in theory and practice
REST teori og praksis; REST in theory and practice
 
Enterprise integration options with Smallworld
Enterprise integration options with SmallworldEnterprise integration options with Smallworld
Enterprise integration options with Smallworld
 
Enterprise integration options with Smallworld
Enterprise integration options with SmallworldEnterprise integration options with Smallworld
Enterprise integration options with Smallworld
 
Web 3 0
Web 3 0Web 3 0
Web 3 0
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
RDFa: an introduction
RDFa: an introductionRDFa: an introduction
RDFa: an introduction
 
Intro to Semantic Web
Intro to Semantic WebIntro to Semantic Web
Intro to Semantic Web
 
Linked data: spreading data over the web
Linked data: spreading data over the webLinked data: spreading data over the web
Linked data: spreading data over the web
 
Java colombo-deep-dive-into-jax-rs
Java colombo-deep-dive-into-jax-rsJava colombo-deep-dive-into-jax-rs
Java colombo-deep-dive-into-jax-rs
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12c
 
RESTful Web Services with Spring MVC
RESTful Web Services with Spring MVCRESTful Web Services with Spring MVC
RESTful Web Services with Spring MVC
 
Restful webservices
Restful webservicesRestful webservices
Restful webservices
 
The RESTful Soa Datagrid with Oracle
The RESTful Soa Datagrid with OracleThe RESTful Soa Datagrid with Oracle
The RESTful Soa Datagrid with Oracle
 
Web Services
Web ServicesWeb Services
Web Services
 
Drupal and the Semantic Web
Drupal and the Semantic WebDrupal and the Semantic Web
Drupal and the Semantic Web
 
Oracle APEX URLs Untangled & SEOptimized
Oracle APEX URLs Untangled & SEOptimizedOracle APEX URLs Untangled & SEOptimized
Oracle APEX URLs Untangled & SEOptimized
 
Unify Earth Observation products access with OpenSearch
Unify Earth Observation products access with OpenSearchUnify Earth Observation products access with OpenSearch
Unify Earth Observation products access with OpenSearch
 

Más de Bernhard Haslhofer

Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...Bernhard Haslhofer
 
Token Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate CurrenciesToken Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate CurrenciesBernhard Haslhofer
 
Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?Bernhard Haslhofer
 
Measurements in Cryptocurrency Networks
Measurements in Cryptocurrency NetworksMeasurements in Cryptocurrency Networks
Measurements in Cryptocurrency NetworksBernhard Haslhofer
 
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur... Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...Bernhard Haslhofer
 
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...Bernhard Haslhofer
 
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency AnalyticsO Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency AnalyticsBernhard Haslhofer
 
Mind the Gap - Data Science Meets Software Engineering
Mind the Gap - Data Science Meets Software EngineeringMind the Gap - Data Science Meets Software Engineering
Mind the Gap - Data Science Meets Software EngineeringBernhard Haslhofer
 
GraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency EcosystemsGraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency EcosystemsBernhard Haslhofer
 
BITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection StrategiesBITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection StrategiesBernhard Haslhofer
 
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing DevelopmentsBitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing DevelopmentsBernhard Haslhofer
 
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Bernhard Haslhofer
 
The value of open data and the OpenGLAM network
The value of open data and the OpenGLAM networkThe value of open data and the OpenGLAM network
The value of open data and the OpenGLAM networkBernhard Haslhofer
 
Offene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische PerspektiveOffene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische PerspektiveBernhard Haslhofer
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and TechniquesBernhard Haslhofer
 
Semantic Tagging on Historical Maps
Semantic Tagging on Historical MapsSemantic Tagging on Historical Maps
Semantic Tagging on Historical MapsBernhard Haslhofer
 
OpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup GrazOpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup GrazBernhard Haslhofer
 
Semantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the WebSemantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the WebBernhard Haslhofer
 

Más de Bernhard Haslhofer (20)

Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
Decentralized Finance (DeFi) - Understanding Risks in an Emerging Financial P...
 
Token Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate CurrenciesToken Systems, Payment Channels, and Corporate Currencies
Token Systems, Payment Channels, and Corporate Currencies
 
Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?Can a blockchain solve the trust problem?
Can a blockchain solve the trust problem?
 
Measurements in Cryptocurrency Networks
Measurements in Cryptocurrency NetworksMeasurements in Cryptocurrency Networks
Measurements in Cryptocurrency Networks
 
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur... Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
Post-Bitcoin Cryptocurrencies, Off-Chain Transaction Channels, and Cryptocur...
 
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
Insight Into Cryptocurrencies - Methods and Tools for Analyzing Blockchain-ba...
 
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency AnalyticsO Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
O Bitcoin Where Art Thou? An Introduction to Cryptocurrency Analytics
 
Mind the Gap - Data Science Meets Software Engineering
Mind the Gap - Data Science Meets Software EngineeringMind the Gap - Data Science Meets Software Engineering
Mind the Gap - Data Science Meets Software Engineering
 
GraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency EcosystemsGraphSense - Real-time Insight into Virtual Currency Ecosystems
GraphSense - Real-time Insight into Virtual Currency Ecosystems
 
BITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection StrategiesBITCOIN - De-anonymization and Money Laundering Detection Strategies
BITCOIN - De-anonymization and Money Laundering Detection Strategies
 
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing DevelopmentsBitcoin - Introduction, Technical Aspects and Ongoing Developments
Bitcoin - Introduction, Technical Aspects and Ongoing Developments
 
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
Maphub und Pelagios: Anwendung von Linked Data in den Digitalen Geisteswissen...
 
The value of open data and the OpenGLAM network
The value of open data and the OpenGLAM networkThe value of open data and the OpenGLAM network
The value of open data and the OpenGLAM network
 
Things, not Strings
Things, not StringsThings, not Strings
Things, not Strings
 
Offene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische PerspektiveOffene Daten im Kulturbereich - Die pragmatische Perspektive
Offene Daten im Kulturbereich - Die pragmatische Perspektive
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
Semantic Tagging on Historical Maps
Semantic Tagging on Historical MapsSemantic Tagging on Historical Maps
Semantic Tagging on Historical Maps
 
The Story behind Maphub
The Story behind MaphubThe Story behind Maphub
The Story behind Maphub
 
OpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup GrazOpenGLAM Intro @ OKFN.AT Meetup Graz
OpenGLAM Intro @ OKFN.AT Meetup Graz
 
Semantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the WebSemantic Tagging for old maps...and other things on the Web
Semantic Tagging for old maps...and other things on the Web
 

Último

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Último (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

ResourceSync: Leveraging Sitemaps for Resource Synchronization

  • 1. ResourceSync: Leveraging Sitemaps for Resource Synchronization WWW 2013, Rio de Janeiro, May 17th Bernhard Haslhofer | University ofVienna Simeon Warner | Cornell University Carl Lagoze | University of Michigan Martin Klein, Robert Sanderson | Los Alamos National Labs Michael L. Nelson | Old Dominion University Herbert van de Sompel | Los Alamos National Labs http://www.openarchives.org/rs/
  • 2. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics
      • Demos
      • Status and Next Steps
  • 3. What?
      • A framework for synchronizing Web resources from a Source to a Destination
      [Diagram labels: Web, sync]
      $ resync http://example.com
  • 4. Why?
      • rsync: filesystem sync, but not Web
      • OAI-PMH: metadata, but not resources
      • WebDAV: extends HTTP, requires server installation at source
      • ...
      … because lots of projects and services are doing synchronization but rely on ad-hoc solutions!
  • 5. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics
      • Demos
      • Status and Next Steps
  • 6. arxiv.org mirroring
      • 2.4M resources (PDF, metadata, LaTeX source)
      • ~800/day created or updated
      • uses homebrew mirroring since 1994 (!)
      • looking for a more general solution to support independent destinations
  • 7. Wikipedia
      • 1.4 updates / sec
      • many dependent services reusing Wikipedia content (e.g., DBpedia, Freebase)
      • harvest articles via OAI-PMH, retrieve changes via IRC, download dumps
  • 8. data.europeana.eu
      • aggregates metadata from >200 data providers in Europe
      • 10 largest providers contribute 80%
      • >190 providers contribute 20%
  • 9. Design Guidelines
      • Sync small websites / repositories (few resources) but also large data collections (millions of resources)
      • Support sources with low change frequency (weeks / months) through high change frequency (seconds)
      • Low adoption barrier!
  • 10. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics
      • Demos
      • Status and Next Steps
  • 11. Resource List
      [Diagram labels: Destination, Source, XML Sitemap]
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
        <url>
          <loc>http://example.com/res1</loc>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
        </url>
      </urlset>
      $ resync -b http://example.com
  • 12. Resource List
      [Diagram label: Source]
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
        <url>
          <loc>http://example.com/res1</loc>
          <lastmod>2013-01-02T13:00:00Z</lastmod>
          <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"/>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod>2013-01-02T14:00:00Z</lastmod>
          <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"/>
        </url>
      </urlset>
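A Destination has to read a Resource List like the one on slide 12 before it can decide what to fetch. The following is a minimal sketch of that step, assuming Python's standard `xml.etree.ElementTree`. The function name `parse_resource_list` and the shape of the returned dictionary are illustrative choices, not part of the resync client or the specification.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used for lookups; the URIs are taken from the slide.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# The Resource List from slide 12, reproduced as test input.
RESOURCE_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
    <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e"/>
  </url>
</urlset>"""


def parse_resource_list(xml_text):
    """Return (capability, {uri: {"lastmod": ..., "hash": ...}}).

    Hypothetical helper: missing <lastmod> or <rs:md> elements yield None,
    which also covers the bare listing shown on slide 11.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    top_md = root.find("rs:md", NS)
    capability = top_md.get("capability") if top_md is not None else None

    resources = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        md = url.find("rs:md", NS)
        digest = md.get("hash") if md is not None else None
        resources[loc] = {"lastmod": lastmod, "hash": digest}
    return capability, resources


if __name__ == "__main__":
    capability, resources = parse_resource_list(RESOURCE_LIST)
    print(capability)
    for uri, meta in resources.items():
        print(uri, meta["lastmod"], meta["hash"])
```

Keying the result by URI makes it straightforward to compare against whatever the Destination already holds.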
  • 13. Change List
      [Diagram labels: Destination, Source, XML Sitemap]
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="changelist" modified="2013-01-03T11:00:00Z"/>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod>2013-01-02T13:00:00Z</lastmod>
          <rs:md change="updated"/>
        </url>
        <url>
          <loc>http://example.com/res3</loc>
          <lastmod>2013-01-02T18:00:00Z</lastmod>
          <rs:md change="deleted"/>
        </url>
      </urlset>
      $ resync -b http://example.com
      $ resync -i http://example.com
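The Change List on slide 13 tells a Destination which resources to re-fetch and which to drop. Below is a rough sketch of how a Destination might turn it into a work plan, with no network access involved. The function `plan_incremental_sync` is hypothetical, and treating a `created` value like `updated` is an assumption, since the slide only shows `updated` and `deleted`.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# The Change List from slide 13, reproduced as test input.
CHANGE_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" modified="2013-01-03T11:00:00Z"/>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res3</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>"""


def plan_incremental_sync(change_list_xml, local_uris):
    """Return (to_fetch, to_delete) for a Destination holding local_uris.

    Assumption: "created" and "updated" both mean "download again";
    "deleted" only matters for resources the Destination actually has.
    """
    root = ET.fromstring(change_list_xml.encode("utf-8"))
    to_fetch, to_delete = [], []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change in ("created", "updated"):
            to_fetch.append(loc)
        elif change == "deleted" and loc in local_uris:
            to_delete.append(loc)
    return to_fetch, to_delete


if __name__ == "__main__":
    local = {
        "http://example.com/res1",
        "http://example.com/res2",
        "http://example.com/res3",
    }
    fetch, delete = plan_incremental_sync(CHANGE_LIST, local)
    print("fetch:", fetch)
    print("delete:", delete)
```

Separating planning from fetching keeps the sketch testable offline. A real client would also need to remember the point up to which it has already processed changes.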
  • 14. Resource Dump
      [Diagram labels: Source, XML Sitemap]
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcedump" modified="2013-01-03T09:00:00Z"/>
        <url>
          <loc>http://example.com/resourcedump.zip</loc>
          <lastmod>2013-01-03T09:00:00Z</lastmod>
        </url>
      </urlset>
  • 15. Resource Dump
      http://example.com/resourcedump.zip
        |- manifest.xml
        |- resources
             |- res1
             |- res2
  • 16. Resource Dump Manifest
      manifest.xml (XML Sitemap)
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcedump-manifest" modified="2013-01-03T09:00:00Z"/>
        <url>
          <loc>http://example.com/res1</loc>
          <lastmod>2013-01-03T03:00:00Z</lastmod>
          <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" path="/resources/res1"/>
        </url>
        <url>
          <loc>http://example.com/res2</loc>
          <lastmod>2013-01-03T04:00:00Z</lastmod>
          <rs:md hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e" path="/resources/res2"/>
        </url>
      </urlset>
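The manifest on slide 16 maps each resource URI to a path inside the ZIP file and records an MD5 hash, so a Destination can check an unpacked dump. Here is a minimal sketch of that check using only the standard library. The helpers `build_demo_dump` and `verify_dump` and the payload bytes are invented for illustration; the demo computes its own hashes because the slide's hash values belong to content that is not available here.

```python
import hashlib
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def build_demo_dump():
    """Build an in-memory ZIP laid out like slide 15 (made-up payloads)."""
    payloads = {"res1": b"first resource", "res2": b"second resource"}
    entries = "".join(
        "<url><loc>http://example.com/%s</loc>"
        '<rs:md hash="md5:%s" path="/resources/%s"/></url>'
        % (name, hashlib.md5(data).hexdigest(), name)
        for name, data in payloads.items()
    )
    manifest = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        'xmlns:rs="http://www.openarchives.org/rs/terms/">'
        '<rs:md capability="resourcedump-manifest"/>%s</urlset>' % entries
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("manifest.xml", manifest)
        for name, data in payloads.items():
            zf.writestr("resources/" + name, data)
    return buf.getvalue()


def verify_dump(zip_bytes):
    """Return {uri: True/False} by checking each file against its manifest hash."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
        for url in root.findall("sm:url", NS):
            loc = url.findtext("sm:loc", namespaces=NS)
            md = url.find("rs:md", NS)
            algo, _, expected = md.get("hash").partition(":")
            # Manifest paths start with "/", ZIP member names do not.
            data = zf.read(md.get("path").lstrip("/"))
            results[loc] = algo == "md5" and hashlib.md5(data).hexdigest() == expected
    return results


if __name__ == "__main__":
    print(verify_dump(build_demo_dump()))
```

Only the `md5:` prefix shown on the slide is handled; any other algorithm is reported as a failed check rather than guessed at.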
  • 17. Capability List
      [Diagram labels: Destination, Source, XML Sitemap]
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:ln href="http://example.com/info-about-source.xml" rel="describedby" type="application/xml"/>
        <rs:md capability="capabilitylist" modified="2013-01-02T14:00:00Z"/>
        <url>
          <loc>http://example.com/dataset1/resourcelist.xml</loc>
          <rs:md capability="resourcelist"/>
        </url>
        <url>
          <loc>http://example.com/dataset1/resourcedump.xml</loc>
          <rs:md capability="resourcedump"/>
        </url>
        <url>
          <loc>http://example.com/dataset1/changelist.xml</loc>
          <rs:md capability="changelist"/>
        </url>
      </urlset>
      $ resync -x http://example.com
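The Capability List on slide 17 is the entry point a Destination can use to discover which documents a Source offers. As a sketch of that discovery step, the code below reads the slide's example into a capability-to-URL mapping. The name `parse_capability_list` is hypothetical and this is not how the resync client is necessarily implemented.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# The Capability List from slide 17, reproduced as test input.
CAPABILITY_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="http://example.com/info-about-source.xml"
         rel="describedby" type="application/xml"/>
  <rs:md capability="capabilitylist" modified="2013-01-02T14:00:00Z"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
</urlset>"""


def parse_capability_list(xml_text):
    """Return (describedby_href, {capability_name: document_url})."""
    root = ET.fromstring(xml_text.encode("utf-8"))

    # Pick up the optional "describedby" link, if the Source provides one.
    described_by = None
    for link in root.findall("rs:ln", NS):
        if link.get("rel") == "describedby":
            described_by = link.get("href")

    capabilities = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        if md is not None and md.get("capability"):
            capabilities[md.get("capability")] = url.findtext("sm:loc", namespaces=NS)
    return described_by, capabilities


if __name__ == "__main__":
    described_by, capabilities = parse_capability_list(CAPABILITY_LIST)
    print("described by:", described_by)
    for name, url in capabilities.items():
        print(name, "->", url)
```

A Destination could then prefer the Change List when it already holds a copy, and fall back to the Resource List or Resource Dump for a first sync.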
  • 18. Large Resource Lists
      [Diagram label: Source]
      <?xml version="1.0" encoding="UTF-8"?>
      <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
                    xmlns:rs="http://www.openarchives.org/rs/terms/">
        <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
        <sitemap>
          <loc>http://example.com/resourcelist-part2.xml</loc>
          <lastmod>2013-01-03T09:00:00Z</lastmod>
        </sitemap>
        <sitemap>
          <loc>http://example.com/resourcelist-part1.xml</loc>
          <lastmod>2013-01-03T09:00:00Z</lastmod>
        </sitemap>
      </sitemapindex>
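Slide 18 splits a large Resource List across several files by publishing a `<sitemapindex>` instead of a `<urlset>`. A Destination therefore has to check the root element before deciding how to proceed. The sketch below shows one way to do that; the function `list_parts` is an illustrative name, and returning an empty list for a plain `<urlset>` is a design choice made here, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS, "rs": "http://www.openarchives.org/rs/terms/"}

# The index from slide 18, reproduced as test input.
RESOURCE_LIST_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" modified="2013-01-03T09:00:00Z"/>
  <sitemap>
    <loc>http://example.com/resourcelist-part2.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/resourcelist-part1.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </sitemap>
</sitemapindex>"""


def list_parts(xml_text):
    """Return the part URLs of an index, in document order.

    Returns an empty list when the document is a plain <urlset>,
    meaning it can be processed directly without following parts.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    # ElementTree spells qualified tags as "{namespace}localname".
    if root.tag != "{%s}sitemapindex" % SITEMAP_NS:
        return []
    return [
        sitemap.findtext("sm:loc", namespaces=NS)
        for sitemap in root.findall("sm:sitemap", NS)
    ]


if __name__ == "__main__":
    for part_url in list_parts(RESOURCE_LIST_INDEX):
        print(part_url)
```

Each returned URL would then be fetched and handled as an ordinary Resource List.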
  • 19. Other Capabilities
  • 20. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics Walkthrough
      • Demos
      • Status and Next Steps
  • 21. Available code
      • ResourceSync client and library (Python)
      • ResourceSync source simulator
      http://github.com/resync
  • 22. Install resync client/library
      $ git clone git://github.com/resync/resync.git
      $ cd resync/
      $ python setup.py build
      $ sudo python setup.py install
      or
      $ sudo easy_install resync
      or
      $ sudo pip install resync
  • 23. Install resync simulator
      $ git clone git://github.com/resync/simulator.git
      $ cd simulator/
      $ chmod u+x simulate-source
      $ ./simulate-source
      $ sudo easy_install tornado
  • 24. Run client against simulator
      $ resync -b http://localhost:8888
      $ resync -i http://localhost:8888
  • 25. resync @ arxiv.org
      resync -v --noauth http://resync.library.cornell.edu/arxiv-q-bio=/tmp/qbio http://resync.library.cornell.edu/arxiv=/tmp/arxiv
  • 26. resync @ en.wikipedia.org
  • 27. ResourceSync
      • What and Why?
      • Synchronization Scenarios
      • ResourceSync Basics Walkthrough
      • Demos
      • Status and Next Steps
  • 28. Status
      • Beta spec (v0.6) for public comment: http://www.openarchives.org/rs/0.6/resourcesync
      • Tool development started
      • Separate documents for archiving and push deployments
  • 29. Next Steps
      • Continue tool development & deployment
      • Collect
          • public comments on resourcesync@googlegroups.com
          • implementation issues on https://github.com/resync/resync/issues
      • Version 0.9 to be released in Summer 2013
      • Version 1.0 in fall 2013 (NISO standard)
  • 30. Thanks!
      @bhaslhofer
      http://slideshare.net/bhaslhofer
      http://openarchives.org/rs
      resourcesync@googlegroups.com