ResourceSync: Leveraging Sitemaps for Resource Synchronization
1. ResourceSync: Leveraging Sitemaps for Resource Synchronization
WWW 2013, Rio de Janeiro, May 17th
Bernhard Haslhofer | University of Vienna
Simeon Warner | Cornell University
Carl Lagoze | University of Michigan
Martin Klein, Robert Sanderson | Los Alamos National Labs
Michael L. Nelson | Old Dominion University
Herbert van de Sompel | Los Alamos National Labs
http://www.openarchives.org/rs/
2. WWW 2013, May 17th
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
3. WWW 2013, May 17th
What?
• A framework for synchronizing Web
resources from a Source to a Destination
[Figure: Web sync]
$ resync http://example.com
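As the talk title suggests, the Source describes its resources in a Sitemap document. The block below is a minimal sketch of how a Destination might read such a listing, assuming only the standard Sitemap elements (urlset, url, loc, lastmod). The function name parse_resource_list and the sample URLs and timestamps are hypothetical, and this is not the resync tool's own code.

```python
import xml.etree.ElementTree as ET

# Namespace of the Sitemap protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical listing published by a Source; URLs and timestamps are
# invented for illustration only.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
  </url>
</urlset>
"""


def parse_resource_list(xml_text):
    """Return {resource URI: lastmod string or None} for a Sitemap document."""
    root = ET.fromstring(xml_text)
    resources = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc:  # skip entries without a location
            resources[loc.strip()] = lastmod.strip() if lastmod else None
    return resources


if __name__ == "__main__":
    for uri, lastmod in parse_resource_list(EXAMPLE_SITEMAP).items():
        print(uri, lastmod)
```

A Destination would fetch each listed URI with plain HTTP GET; the fetching step is left out here to keep the sketch offline and self-contained.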
4. WWW 2013, May 17th
Why?
• rsync: filesystem sync, but not Web
• OAI-PMH: metadata, but not resources
• WebDAV: extends HTTP, requires server
installation at source
• ...
… because lots of projects and services are doing
synchronization but rely on ad-hoc solutions!
5. WWW 2013, May 17th
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
6. WWW 2013, May 17th
arxiv.org mirroring
• 2.4M resources (PDF,
metadata, LaTeX source)
• ~800/day created or
updated
• uses homebrew
mirroring since 1994 (!)
• look for more general
solution to support
independent destinations
7. WWW 2013, May 17th
Wikipedia
• 1.4 updates / sec
• many dependent
services reusing
Wikipedia content (e.g.,
DBpedia, Freebase, etc.)
• harvest articles via OAI-PMH,
retrieve changes via IRC,
download dumps
8. WWW 2013, May 17th
data.europeana.eu
• aggregates metadata
from >200 data
providers in Europe
• 10 largest providers
contribute 80%
• >190 providers
contribute 20%
9. WWW 2013, May 17th
Design Guidelines
• Sync small websites / repositories (few
resources) but also large data collections
(millions of resources)
• Support low change frequency (weeks /
months) to high change frequency
(seconds) sources
• Low adoption barrier!
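To make the guidelines concrete, here is a minimal sketch of how a Destination could decide what to fetch or delete, assuming both sides are represented as a mapping from resource URI to a last-modified string. The function plan_sync and the sample data are hypothetical, and the comparison is a plain set difference chosen for illustration, not the mechanism defined by the ResourceSync specification.

```python
def plan_sync(source, destination):
    """Compare two {URI: lastmod} mappings and list the actions needed.

    Runs in time linear in the number of resources, so the same logic
    applies to a handful of resources or to millions.
    """
    created = sorted(uri for uri in source if uri not in destination)
    updated = sorted(
        uri for uri in source
        if uri in destination and source[uri] != destination[uri]
    )
    deleted = sorted(uri for uri in destination if uri not in source)
    return {"created": created, "updated": updated, "deleted": deleted}


if __name__ == "__main__":
    # Invented example state; not data from any real Source.
    source = {
        "http://example.com/a": "2013-05-17T10:00:00Z",
        "http://example.com/b": "2013-05-17T11:00:00Z",
        "http://example.com/c": "2013-05-16T09:00:00Z",
    }
    destination = {
        "http://example.com/b": "2013-05-10T08:00:00Z",
        "http://example.com/c": "2013-05-16T09:00:00Z",
        "http://example.com/d": "2013-04-01T00:00:00Z",
    }
    print(plan_sync(source, destination))
```

A full comparison like this suits sources that change every few weeks or months. For sources that change every few seconds, repeating it on each cycle would be wasteful, so a Destination would want the Source to publish only the recent changes.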
10. WWW 2013, May 17th
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics
• Demos
• Status and Next Steps
27. WWW 2013, May 17th
ResourceSync
• What and Why?
• Synchronization Scenarios
• ResourceSync Basics Walkthrough
• Demos
• Status and Next Steps
28. WWW 2013, May 17th
Status
• Beta spec (v0.6) for public comment:
http://www.openarchives.org/rs/0.6/resourcesync
• Tool development started
• Separate documents for archiving and push
deployments
29. WWW 2013, May 17th
Next Steps
• Continue tool development & deployment
• Collect
  • public comments on resourcesync@googlegroups.com
  • implementation issues on https://github.com/resync/resync/issues
• Version 0.9 to be released in summer 2013
• Version 1.0 in fall 2013 (NISO standard)
30. WWW 2013, May 17th
Thanks!
@bhaslhofer
http://slideshare.net/bhaslhofer
http://openarchives.org/rs
resourcesync@googlegroups.com