SlideShare una empresa de Scribd logo
1 de 79
Descargar para leer sin conexión
ResourceSync:
                           Web-Based
                             Resource
                        Synchronization
                            Herbert Van de Sompel
                                 Los Alamos National Laboratory
                                                    @hvdsomp



                                  ResourceSync is funded by
#resourcesync                   The Sloan Foundation & JISC


                   ResourceSync – Herbert Van de Sompel
                NISO Forum, September 24 2012, Denver, CO
ResourceSync Core Team – NISO & OAI

Los Alamos National Laboratory & OAI: Martin Klein, Robert
Sanderson, Herbert Van de Sompel

Cornell University & OAI: Berhard Haslhofer, Simeon
Warner

Old Dominion University & OAI: Michael L. Nelson

University of Michigan & OAI: Carl Lagoze

NISO: Todd Carpenter, Nettie Lagace, Peter Murray



                            ResourceSync – Herbert Van de Sompel
                         NISO Forum, September 24 2012, Denver, CO
ResourceSync Technical Group


•  Manuel Bernhardt, Delving B.V.
•  Kevin Ford, Library of Congress
•  Richard Jones, JISC
•  Graham Klyne, JISC
•  Stuart Lewis, JISC
•  David Rosenthal, LOCKSS
•  Christian Sadilek, Red Hat
•  Shlomo Sanders, Ex Libris, Inc.
•  Sjoerd Siebinga, Delving B.V.
•  Ed Summers, Library of Congress
•  Jeff Young, OCLC Online Computer Library Center


                           ResourceSync – Herbert Van de Sompel
                        NISO Forum, September 24 2012, Denver, CO
ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Possible Technical Choices

Q&A




                           ResourceSync – Herbert Van de Sompel
                        NISO Forum, September 24 2012, Denver, CO
ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Possible Technical Choices

Q&A




                           ResourceSync – Herbert Van de Sompel
                        NISO Forum, September 24 2012, Denver, CO
Synchronize What?

•  Web resources – things with a URI that can be dereferenced and
   are cache-able (no dependency on underlying OS, technologies
   etc.)

•  Small websites/repositories (a few resources) to large
   repositories/datasets/linked data collections (many millions of
   resources)

•  That change slowly (weeks/months) or quickly (seconds), and
   where latency needs may vary

•  Focus on needs of research communication and cultural heritage
   organizations, but aim for generality


                                   ResourceSync – Herbert Van de Sompel
                                NISO Forum, September 24 2012, Denver, CO
Why?

… because lots of projects and services are doing synchronization
but have to resort to ad-hoc, case by case, approaches!

•  Project team involved with projects that need this

•  Experience with OAI-PMH: widely used in repos but
    o  XML metadata only

    o  Attempts at synchronizing actual content via OAI-PMH

       (complex object formats, dc:identifier) not successful.
    o  Web technology has moved on since 1999




•  Devise a shared solution for data, metadata, linked data?


                                  ResourceSync – Herbert Van de Sompel
                               NISO Forum, September 24 2012, Denver, CO
Use Cases – The Basics




            ResourceSync – Herbert Van de Sompel
         NISO Forum, September 24 2012, Denver, CO
Use Cases - More




         ResourceSync – Herbert Van de Sompel
      NISO Forum, September 24 2012, Denver, CO
Out Of Scope (For Now)

•  Bidirectional synchronization

•  Destination-defined selective synchronization (query)

•  Bulk URI migration




                                  ResourceSync – Herbert Van de Sompel
                               NISO Forum, September 24 2012, Denver, CO
Use Case: arXiv Mirroring

•  1M article versions, ~800/day created or
   updated at 8 PM US Eastern Time

•  Metadata and full-text for each article

•  Accuracy important

•  Want low barrier for others to use

•  Look for more general solution than current
   homebrew mirroring (running with minor
   modifications since 1994!) and occasional rsync
   (filesystem layout specific, auth issues)

                                   ResourceSync – Herbert Van de Sompel
                                NISO Forum, September 24 2012, Denver, CO
Use Case: DBpedia Live Duplication

•  Average of 2 updates per second
•  Want low latency => need a push technology




                                ResourceSync – Herbert Van de Sompel
                             NISO Forum, September 24 2012, Denver, CO
ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Possible Technical Choices

Q&A




                           ResourceSync – Herbert Van de Sompel
                        NISO Forum, September 24 2012, Denver, CO
ResourceSync Problem


•  Consideration:
    •  Source (server) A has resources that change over time: they
       get created, modified, deleted
    •  Destination (servers) X, Y, and Z leverage (some) resources
       of Source A.
•  Problem:
    •  Destinations want to keep in step with the resource changes
       at source A: resource synchronization.
•  Goal:
    •  Design an approach for resource synchronization aligned
       with the Web Architecture that has a fair chance of adoption
       by different communities.
        •  The approach must scale better than recurrent HTTP
           HEAD/GET on resources.


                                ResourceSync – Herbert Van de Sompel
                             NISO Forum, September 24 2012, Denver, CO
Destination: 3 Basic Synchronization Needs

1.  Baseline synchronization – A destination must be able to
    perform an initial load or catch-up with a source
       -  avoid out-of-band setup

2.  Incremental synchronization – A destination must have some
    way to keep up-to-date with changes at a source
       -  subject to some latency; minimal: create/update/delete
       -  allow to catch-up after destination has been offline

3.  Audit – A destination should be able to determine whether it is
    synchronized with a source
       -  subject to some latency



                                  ResourceSync – Herbert Van de Sompel
                               NISO Forum, September 24 2012, Denver, CO
Source Capability 1: Describing Content

In order to advertise the resources that a source wants destinations
to know about, it may describe them:

    o    Publish an inventory of resource URIs and possibly
         associated metadata
         -  Destination GETs the Content Description
         -  Destination GETs listed resources by their URI




                                    ResourceSync – Herbert Van de Sompel
                                 NISO Forum, September 24 2012, Denver, CO
Source Capability 2: Communicating Change Events

In order to achieve lower latency, a source may communicate about
changes to its resources:

   o     2.1. Change Set: Publish a list of recent change events
         (create, update, delete resource)
        -  Destination acts upon change events, e.g. GETs created/
           updated resources, removes deleted resources.




                                  ResourceSync – Herbert Van de Sompel
                               NISO Forum, September 24 2012, Denver, CO
Source Capability 2: Communicating Change Events

In order to achieve lower latency, a source may communicate about
changes to its resources:

   o     2.1. Change Set: Publish a list of recent change events
         (create, update, delete resource)
        -  Destination acts upon change events, e.g. GETs created/
            updated resources, removes deleted resources.


   o     2.2. Push Change Set: Push a list of recent change events
         (create, update, delete resource) towards (a) destination(s)
        -  Destination acts upon change events, e.g. GETs created/
            updated resources, removes deleted resources.




                                   ResourceSync – Herbert Van de Sompel
                                NISO Forum, September 24 2012, Denver, CO
Source Capability 3: Providing Access to Versions

In order to allow a destination to catch up with missed changes, a
source may support:

   o    3.1. Historical Change Sets: Provide access to change events that
        occurred prior to the ones listed in the current Change Set




                                    ResourceSync – Herbert Van de Sompel
                                 NISO Forum, September 24 2012, Denver, CO
Source Capability 3: Providing Access to Versions

In order to allow a destination to catch up with missed changes, a
source may support:

   o    3.1. Historical Change Sets: Provide access to change events that
        occurred prior to the ones listed in the current Change Set

   o    3.2. Historical Content: Provide access to prior resource versions




                                     ResourceSync – Herbert Van de Sompel
                                  NISO Forum, September 24 2012, Denver, CO
Source Capability 4: Transferring Content

By default, content is transferred in response to a GET issued by a
destination against a URI of a source’s resource. But a source may
support additional mechanisms:

   o     4.1. Dump: Publish a package of resource representations
         and necessary metadata
        -  Destination GETs the Dump
        -  Destination unpacks the Dump




                                  ResourceSync – Herbert Van de Sompel
                               NISO Forum, September 24 2012, Denver, CO
Source Capability 4: Transferring Content

By default, content is transferred in response to a GET issued by a
destination against a URI of a source’s resource. But a source may
support additional mechanisms:

   o     4.1. Dump: Publish a package of resource representations
         and necessary metadata
        -  Destination GETs the Dump
        -  Destination unpacks the Dump

   o    4.2. Alternate Content Transfer: Support alternative
        mechanisms to optimize getting content (see later)




                                  ResourceSync – Herbert Van de Sompel
                               NISO Forum, September 24 2012, Denver, CO
Source: Advertise Capabilities

A source needs to advertise the capabilities it supports to allow a
destination to discover them

•     Some capabilities may be provided by a third party, not the
      source itself
     o   e.g. Historical Change Sets, Historical Content
     o   But the source should still make those third party capabilities
         discoverable - trust




                                    ResourceSync – Herbert Van de Sompel
                                 NISO Forum, September 24 2012, Denver, CO
ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Possible Technical Choices

Q&A




                           ResourceSync – Herbert Van de Sompel
                        NISO Forum, September 24 2012, Denver, CO
ResourceSync: A Framework of Capabilities


•  Modular framework allowing selective deployment of
   capabilities

•  A Source selects which capabilities to support in order
   to meet local and community needs

•  A Source’s Capabilities can be discovered via
   capability descriptions




                             ResourceSync – Herbert Van de Sompel
                          NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
BY REFERENCE!




    BY VALUE!




   ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
Sitemap


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
     <loc>http://example.com/res1</loc>
     <lastmod>2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
     <loc>http://example.com/res2</loc>
     <lastmod>2012-08-08T13:22:00Z</lastmod>
  </url>
</urlset>




                               ResourceSync – Herbert Van de Sompel
                            NISO Forum, September 24 2012, Denver, CO
Baseline Matching - Sitemap


•  Periodic publication of up-to-date Sitemap, which is a “by
   reference” inventory of a Source’s resources

•  Use ”as is” with resource location and last modification date as
   core elements

•  Introduce extension elements aimed at supporting audit: e.g.
   MD5 hash of content




                                 ResourceSync – Herbert Van de Sompel
                              NISO Forum, September 24 2012, Denver, CO
robots.txt!


              discovery




                             ResourceSync – Herbert Van de Sompel
                          NISO Forum, September 24 2012, Denver, CO
Baseline Matching – Dump


•  A Dump is a “by-value” inventory of a Source’s resources

•  Periodic publication of an up-to-date Dump

•  Possible technology: ZIP file consisting of:

    •  Special-purpose Sitemap that acts as a manifest for
       resources contained in the ZIP file
        •  Introduce an element to express correspondence
           between resource URI and filename in the ZIP file
    •  Resource bitsteams

•  Possible technology: WARC file


                                  ResourceSync – Herbert Van de Sompel
                               NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
Change Communication – Pull Change Sets


•  Periodic publication of a Change Set that describes recent
   changes

•  A Change Set is a Sitemap-style document, enhanced to
   express change events rather than inventory. Per change event,
   convey:
    •  About the event:
        •  datetime
        •  event type: create/update/delete (maybe move/copy)
    •  About the changed resource:
        •  URI
        •  Information relevant for audit, e.g. fixity, size, mime type
        •  Further information to aide accessing the resource (see
           later)

                                  ResourceSync – Herbert Van de Sompel
                               NISO Forum, September 24 2012, Denver, CO
Change Set, Based on Sitemap


<?xml version="1.0" encoding="UTF-8"?>
<urlset rs:type="changeset”
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
     <loc>http://example.com/res1</loc>
     <lastmod rs:type="updated">2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
     <loc>http://example.com/res2</loc>
     <lastmod rs:type="created">2012-08-08T10:22:00Z</lastmod>
  </url>
</urlset>


                              ResourceSync – Herbert Van de Sompel
                           NISO Forum, September 24 2012, Denver, CO
Change Set, from Scratch


<?xml version="1.0" encoding="UTF-8"?>
<changeset xmlns="http://www.openarchives.org/rs/changeset">
  <change>
     <link rel="created" length="1234" type="text/html”
           href="http://example.com/res1.html"/>
     <date>2012-09-25T09:00:00Z</date>
     <fixity>ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx</fixity>
   </change>
</changeset>




                              ResourceSync – Herbert Van de Sompel
                           NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
Change Communication – Push Change Sets


•  Use a push technology to convey changes

•  Express changes using same Sitemap-style document
    •  A Change Set in this case might convey only one change
       event

•  Possible technology: XMPP PubSub




                               ResourceSync – Herbert Van de Sompel
                            NISO Forum, September 24 2012, Denver, CO
<XMPP PubSub Intermezzo>

XMPP Publish-Subscribe: Client to Subscription Service,
  Subscription Service to Client(s) communication

•  One of the XMPP (Extensible Messaging and Presence Protocol)
   extensions http://xmpp.org/extensions/xep-0060.html

•  Apple Notifications based on XMPP PubSub

•  Both client and server tools widely available




                                   ResourceSync – Herbert Van de Sompel
                                NISO Forum, September 24 2012, Denver, CO
</XMPP PubSub Intermezzo>




Source   PubSub Server                                               Destination


                            ResourceSync – Herbert Van de Sompel
                         NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
Change Communication Memory


•  Publication of one or more Change Sets that convey historical
   (rather than recent) changes

•  All historical Change Sets use same Sitemap-style document

•  Same approach irrespective of whether pull or push is used for
   Change Communication




                                ResourceSync – Herbert Van de Sompel
                             NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
Resource Transfer


•  Resources are obtained in bulk by obtaining a Dump

•  An individual resource is, by default, obtained by dereferencing a
   resource’s URI listed in:
    •  Sitemap
    •  Change Set

•  Alternative access mechanisms are introduced to obtain an
   individual resource:
    •  From a mirror site
    •  Access to diff with previous version instead of access to the
       entire changed resource
    •  Resource version


                                 ResourceSync – Herbert Van de Sompel
                              NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
Resource Memory


•  Requires a (short or long term) archive of resource versions

•  Access to specific version can be expressed as an alternative
   access mechanism in e.g. Change Set.
    •  Via a link to a version resource that is the result of the
       change expressed in the Change Set
    •  Via a link to a Memento TimeGate that supports access to all
       available prior versions




                                 ResourceSync – Herbert Van de Sompel
                              NISO Forum, September 24 2012, Denver, CO
<Memento Intermezzo>




  http://www.mementoweb.org/

              ResourceSync – Herbert Van de Sompel
           NISO Forum, September 24 2012, Denver, CO
Original Resources and Mementos




                ResourceSync – Herbert Van de Sompel
             NISO Forum, September 24 2012, Denver, CO
Bridge from Present to Past




              ResourceSync – Herbert Van de Sompel
           NISO Forum, September 24 2012, Denver, CO
Bridge from Past to Present




              ResourceSync – Herbert Van de Sompel
           NISO Forum, September 24 2012, Denver, CO
Memento Framework




         ResourceSync – Herbert Van de Sompel
      NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
Memento Framework

Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png




                                 ResourceSync – Herbert Van de Sompel
                              NISO Forum, September 24 2012, Denver, CO
Time Travel across Versions of a Picture of the Day




Movie at: http://www.mementoweb.org/demo/picoftheday.mov
                           ResourceSync – Herbert Van de Sompel
                        NISO Forum, September 24 2012, Denver, CO
Memento Framework

Original Resource: http://dbpedia.org/resource/France




                           ResourceSync – Herbert Van de Sompel
                        NISO Forum, September 24 2012, Denver, CO
Time-Series Analysis across DBpedia Versions




      Data collected through HTTP Navigation

           Paper at http://arxiv.org/abs/1003.3661

                             ResourceSync – Herbert Van de Sompel
                          NISO Forum, September 24 2012, Denver, CO
</Memento Intermezzo>




   http://www.mementoweb.org/

               ResourceSync – Herbert Van de Sompel
            NISO Forum, September 24 2012, Denver, CO
ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO
ResourceSync Timeline
•  August 2012
    o  First draft spec shared for feedback with ResourceSync team




•  September 2012
    o  Problem Statement paper in D-Lib Magazine

    o  In-person meeting of ResourceSync Team




•  October 2012
    o  Revise spec, conduct experiments

    o  Solicit broad feedback




•  December 2012 – Finalize specification (?)



                                 ResourceSync – Herbert Van de Sompel
                              NISO Forum, September 24 2012, Denver, CO
Pointers
•  First ResourceSync draft spec (do not implement!):
   http://www.openarchives.org/rs/0.1/resourcesync!

•  ResourceSync Simulator code on github
   http://github.org/resync/simulator!

•  NISO ResourceSync workspace
   http://www.niso.org/workrooms/resourcesync/!

•  Memento
   http://mementoweb.org!




                           ResourceSync – Herbert Van de Sompel
                        NISO Forum, September 24 2012, Denver, CO
ResourceSync:
         Get the Sticker!

            Herbert Van de Sompel
                 Los Alamos National Laboratory
                                    @hvdsomp


                  ResourceSync is funded by
                The Sloan Foundation & JISC




   ResourceSync – Herbert Van de Sompel
NISO Forum, September 24 2012, Denver, CO

Más contenido relacionado

Destacado

NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
National Information Standards Organization (NISO)
 

Destacado (7)

The Future of eReaders
The Future of eReadersThe Future of eReaders
The Future of eReaders
 
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data ServicesNISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
 
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
 
NISO Forum, Denver, Sept. 24, 2012: Data Equivalence
NISO Forum, Denver, Sept. 24, 2012: Data EquivalenceNISO Forum, Denver, Sept. 24, 2012: Data Equivalence
NISO Forum, Denver, Sept. 24, 2012: Data Equivalence
 
Needs for Data Management & Citation Throughout the Information Lifecycle
Needs for Data Management & Citation Throughout  the Information LifecycleNeeds for Data Management & Citation Throughout  the Information Lifecycle
Needs for Data Management & Citation Throughout the Information Lifecycle
 
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
 
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
 

Similar a NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Synchronization

DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
Herbert Van de Sompel
 
ResourceSync Overview
ResourceSync OverviewResourceSync Overview
ResourceSync Overview
Herbert Van de Sompel
 
Carpenter - Wolfram Data Summit ResourceSync
Carpenter - Wolfram Data Summit ResourceSyncCarpenter - Wolfram Data Summit ResourceSync
Carpenter - Wolfram Data Summit ResourceSync
nisohq
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly Resources
Robert Sanderson
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
National Information Standards Organization (NISO)
 

Similar a NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Synchronization (20)

ResourceSync: Web-Based Resource Synchronization
ResourceSync: Web-Based Resource SynchronizationResourceSync: Web-Based Resource Synchronization
ResourceSync: Web-Based Resource Synchronization
 
ResourceSync Quick Overview
ResourceSync Quick OverviewResourceSync Quick Overview
ResourceSync Quick Overview
 
ResourceSync
ResourceSyncResourceSync
ResourceSync
 
ResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem PerspectiveResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem Perspective
 
ResourceSync - NISO Update Jan 2014
ResourceSync - NISO Update Jan 2014ResourceSync - NISO Update Jan 2014
ResourceSync - NISO Update Jan 2014
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
ResourceSync Overview
ResourceSync OverviewResourceSync Overview
ResourceSync Overview
 
ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13ResourceSync Introduction at SWIB13
ResourceSync Introduction at SWIB13
 
ResourceSync - An Introduction
ResourceSync - An IntroductionResourceSync - An Introduction
ResourceSync - An Introduction
 
NISO ResourceSync Training Session
NISO ResourceSync Training SessionNISO ResourceSync Training Session
NISO ResourceSync Training Session
 
Carpenter - Wolfram Data Summit ResourceSync
Carpenter - Wolfram Data Summit ResourceSyncCarpenter - Wolfram Data Summit ResourceSync
Carpenter - Wolfram Data Summit ResourceSync
 
Resource Sync - Introduction
Resource Sync - IntroductionResource Sync - Introduction
Resource Sync - Introduction
 
ResourceSync Tutorial
ResourceSync TutorialResourceSync Tutorial
ResourceSync Tutorial
 
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingPersistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
 
Annotating Scholarly Resources
Annotating Scholarly ResourcesAnnotating Scholarly Resources
Annotating Scholarly Resources
 
Refactoring HUBzero for Linked Data
Refactoring HUBzero for Linked DataRefactoring HUBzero for Linked Data
Refactoring HUBzero for Linked Data
 
A new approach to aggregation
A new approach to aggregation A new approach to aggregation
A new approach to aggregation
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
 
ResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource SynchronizationResourceSync: Web-based Resource Synchronization
ResourceSync: Web-based Resource Synchronization
 
Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application Scenarios
 

Más de National Information Standards Organization (NISO)

Más de National Information Standards Organization (NISO) (20)

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"Bazargan "NISO Webinar, Sustainability in Publishing"
Bazargan "NISO Webinar, Sustainability in Publishing"
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"Compton "NISO Webinar, Sustainability in Publishing"
Compton "NISO Webinar, Sustainability in Publishing"
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
Hazen, Morse, and Varnum "Spring 2024 ODI Conformance Statement Workshop for ...
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"Mattingly "Text and Data Mining: Building Data Driven Applications"
Mattingly "Text and Data Mining: Building Data Driven Applications"
 
Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"Mattingly "Text and Data Mining: Searching Vectors"
Mattingly "Text and Data Mining: Searching Vectors"
 
Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"Mattingly "Text Mining Techniques"
Mattingly "Text Mining Techniques"
 
Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"Mattingly "Text Processing for Library Data: Representing Text as Data"
Mattingly "Text Processing for Library Data: Representing Text as Data"
 
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
Carpenter "Designing NISO's New Strategic Plan: 2023-2026"
 
Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"Ross and Clark "Strategic Planning"
Ross and Clark "Strategic Planning"
 
Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"Mattingly "Data Mining Techniques: Classification and Clustering"
Mattingly "Data Mining Techniques: Classification and Clustering"
 
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...Straza "Global collaboration towards equitable and open science: UNESCO Recom...
Straza "Global collaboration towards equitable and open science: UNESCO Recom...
 
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
Lippincott "Beyond access: Accelerating discovery and increasing trust throug...
 
Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"
 
Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"
 
Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"
 

Último

Último (20)

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 

NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Synchronization

  • 1. ResourceSync: Web-Based Resource Synchronization Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp ResourceSync is funded by #resourcesync The Sloan Foundation & JISC ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 2. ResourceSync Core Team – NISO & OAI Los Alamos National Laboratory & OAI: Martin Klein, Robert Sanderson, Herbert Van de Sompel Cornell University & OAI: Berhard Haslhofer, Simeon Warner Old Dominion University & OAI: Michael L. Nelson University of Michigan & OAI: Carl Lagoze NISO: Todd Carpenter, Nettie Lagace, Peter Murray ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 3. ResourceSync Technical Group •  Manuel Bernhardt, Delving B.V. •  Kevin Ford, Library of Congress •  Richard Jones, JISC •  Graham Klyne, JISC •  Stuart Lewis, JISC •  David Rosenthal, LOCKSS •  Christian Sadilek, Red Hat •  Shlomo Sanders, Ex Libris, Inc. •  Sjoerd Siebinga, Delving B.V. •  Ed Summers, Library of Congress •  Jeff Young, OCLC Online Computer Library Center ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 4. ResourceSync ResourceSync: What & Why? Problem Perspective & Conceptual Approach Possible Technical Choices Q&A ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 5. ResourceSync ResourceSync: What & Why? Problem Perspective & Conceptual Approach Possible Technical Choices Q&A ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 6. Synchronize What? •  Web resources – things with a URI that can be dereferenced and are cache-able (no dependency on underlying OS, technologies etc.) •  Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources) •  That change slowly (weeks/months) or quickly (seconds), and where latency needs may vary •  Focus on needs of research communication and cultural heritage organizations, but aim for generality ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 7. Why? … because lots of projects and services are doing synchronization but have to resort to ad-hoc, case by case, approaches! •  Project team involved with projects that need this •  Experience with OAI-PMH: widely used in repos but o  XML metadata only o  Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) not successful. o  Web technology has moved on since 1999 •  Devise a shared solution for data, metadata, linked data? ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 8. Use Cases – The Basics ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 9. Use Cases - More ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 10. Out Of Scope (For Now) •  Bidirectional synchronization •  Destination-defined selective synchronization (query) •  Bulk URI migration ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 11. Use Case: arXiv Mirroring •  1M article versions, ~800/day created or updated at 8 PM US Eastern Time •  Metadata and full-text for each article •  Accuracy important •  Want low barrier for others to use •  Look for more general solution than current homebrew mirroring (running with minor modifications since 1994!) and occasional rsync (filesystem layout specific, auth issues) ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 12. Use Case: DBpedia Live Duplication •  Average of 2 updates per second •  Want low latency => need a push technology ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 13. ResourceSync ResourceSync: What & Why? Problem Perspective & Conceptual Approach Possible Technical Choices Q&A ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 14. ResourceSync Problem •  Consideration: •  Source (server) A has resources that change over time: they get created, modified, deleted •  Destination (servers) X, Y, and Z leverage (some) resources of Source A. •  Problem: •  Destinations want to keep in step with the resource changes at source A: resource synchronization. •  Goal: •  Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities. •  The approach must scale better than recurrent HTTP HEAD/GET on resources. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 15. Destination: 3 Basic Synchronization Needs 1.  Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source -  avoid out-of-band setup 2.  Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source -  subject to some latency; minimal: create/update/delete -  allow to catch-up after destination has been offline 3.  Audit – A destination should be able to determine whether it is synchronized with a source -  subject to some latency ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 16. Source Capability 1: Describing Content In order to advertise the resources that a source wants destinations to know about, it may describe them: o  Publish an inventory of resource URIs and possibly associated metadata -  Destination GETs the Content Description -  Destination GETs listed resources by their URI ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 17.
  • 18.
  • 19. Source Capability 2: Communicating Change Events In order to achieve lower latency, a source may communicate about changes to its resources: o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource) -  Destination acts upon change events, e.g. GETs created/ updated resources, removes deleted resources. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Source Capability 2: Communicating Change Events In order to achieve lower latency, a source may communicate about changes to its resources: o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource) -  Destination acts upon change events, e.g. GETs created/ updated resources, removes deleted resources. o  2.2. Push Change Set: Push a list of recent change events (create, update, delete resource) towards (a) destination(s) -  Destination acts upon change events, e.g. GETs created/ updated resources, removes deleted resources. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 25. Source Capability 3: Providing Access to Versions In order to allow a destination to catch up with missed changes, a source may support: o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 26.
  • 27.
  • 28.
  • 29. Source Capability 3: Providing Access to Versions In order to allow a destination to catch up with missed changes, a source may support: o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set o  3.2. Historical Content: Provide access to prior resource versions ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 30. Source Capability 4: Transferring Content By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms: o  4.1. Dump: Publish a package of resource representations and necessary metadata -  Destination GETs the Dump -  Destination unpacks the Dump ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 31.
  • 32. Source Capability 4: Transferring Content By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms: o  4.1. Dump: Publish a package of resource representations and necessary metadata -  Destination GETs the Dump -  Destination unpacks the Dump o  4.2. Alternate Content Transfer: Support alternative mechanisms to optimize getting content (see later) ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 33. Source: Advertise Capabilities A source needs to advertise the capabilities it supports to allow a destination to discover them •  Some capabilities may be provided by a third party, not the source itself o  e.g. Historical Change Sets, Historical Content o  But the source should still make those third party capabilities discoverable - trust ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 34. ResourceSync ResourceSync: What & Why? Problem Perspective & Conceptual Approach Possible Technical Choices Q&A ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 35. ResourceSync: A Framework of Capabilities •  Modular framework allowing selective deployment of capabilities •  A Source selects which capabilities to support in order to meet local and community needs •  A Source’s Capabilities can be discovered via capability descriptions ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 36. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 37. BY REFERENCE! BY VALUE! ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 38. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 39. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 40. Sitemap <?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://example.com/res1</loc> <lastmod>2012-08-08T08:15:00Z</lastmod> </url> <url> <loc>http://example.com/res2</loc> <lastmod>2012-08-08T13:22:00Z</lastmod> </url> </urlset> ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 41. Baseline Matching - Sitemap •  Periodic publication of up-to-date Sitemap, which is a “by reference” inventory of a Source’s resources •  Use ”as is” with resource location and last modification date as core elements •  Introduce extension elements aimed at supporting audit: e.g. MD5 hash of content ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 42. robots.txt! discovery ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 43. Baseline Matching – Dump •  A Dump is a “by-value” inventory of a Source’s resources •  Periodic publication of an up-to-date Dump •  Possible technology: ZIP file consisting of: •  Special-purpose Sitemap that acts as a manifest for resources contained in the ZIP file •  Introduce an element to express correspondence between resource URI and filename in the ZIP file •  Resource bitsteams •  Possible technology: WARC file ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 44. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 45. Change Communication – Pull Change Sets •  Periodic publication of a Change Set that describes recent changes •  A Change Set is a Sitemap-style document, enhanced to express change events rather than inventory. Per change event, convey: •  About the event: •  datetime •  event type: create/update/delete (maybe move/copy) •  About the changed resource: •  URI •  Information relevant for audit, e.g. fixity, size, mime type •  Further information to aide accessing the resource (see later) ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 46. Change Set, Based on Sitemap <?xml version="1.0" encoding="UTF-8"?> <urlset rs:type="changeset” xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <url> <loc>http://example.com/res1</loc> <lastmod rs:type="updated">2012-08-08T08:15:00Z</lastmod> </url> <url> <loc>http://example.com/res2</loc> <lastmod rs:type="created">2012-08-08T10:22:00Z</lastmod> </url> </urlset> ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 47. Change Set, from Scratch <?xml version="1.0" encoding="UTF-8"?> <changeset xmlns="http://www.openarchives.org/rs/changeset"> <change> <link rel="created" length="1234" type="text/html” href="http://example.com/res1.html"/> <date>2012-09-25T09:00:00Z</date> <fixity>ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx</fixity> </change> </changeset> ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 48. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 49. Change Communication – Push Change Sets •  Use a push technology to convey changes •  Express changes using same Sitemap-style document •  A Change Set in this case might convey only one change event •  Possible technology: XMPP PubSub ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 50. <XMPP PubSub Intermezzo> XMPP Publish-Subscribe: Client to Subscription Service, Subscription Service to Client(s) communication •  One of the XMPP (Extensible Messaging and Presence Protocol) extensions http://xmpp.org/extensions/xep-0060.html •  Apple Notifications based on XMPP PubSub •  Both client and server tools widely available ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 51. </XMPP PubSub Intermezzo> Source PubSub Server Destination ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 52. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 53. Change Communication Memory •  Publication of one or more Change Sets that convey historical (rather than recent) changes •  All historical Change Sets use same Sitemap-style document •  Same approach irrespective of whether pull or push is used for Change Communication ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 54. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 55. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 56. Resource Transfer •  Resources are obtained in bulk by obtaining a Dump •  An individual resource is, by default, obtained by dereferencing a resource’s URI listed in: •  Sitemap •  Change Set •  Alternative access mechanisms are introduced to obtain an individual resource: •  From a mirror site •  Access to diff with previous version instead of access to the entire changed resource •  Resource version ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 57. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 58. Resource Memory •  Requires a (short or long term) archive of resource versions •  Access to specific version can be expressed as an alternative access mechanism in e.g. Change Set. •  Via a link to a version resource that is the result of the change expressed in the Change Set •  Via a link to a Memento TimeGate that supports access to all available prior versions ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 59. <Memento Intermezzo> http://www.mementoweb.org/ ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 60. Original Resources and Mementos ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 61. Bridge from Present to Past ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 62. Bridge from Past to Present ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 63. Memento Framework ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 64. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 65. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 66. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 67. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 68. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 69. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 70. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 71. Memento Framework Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 72. Time Travel across Versions of a Picture of the Day Movie at: http://www.mementoweb.org/demo/picoftheday.mov ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 73. Memento Framework Original Resource: http://dbpedia.org/resource/France ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 74. Time-Series Analysis across DBpedia Versions Data collected through HTTP Navigation Paper at http://arxiv.org/abs/1003.3661 ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 75. </Memento Intermezzo> http://www.mementoweb.org/ ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 76. ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 77. ResourceSync Timeline •  August 2012 o  First draft spec shared for feedback with ResourceSync team •  September 2012 o  Problem Statement paper in D-Lib Magazine o  In-person meeting of ResourceSync Team •  October 2012 o  Revise spec, conduct experiments o  Solicit broad feedback •  December 2012 – Finalize specification (?) ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 78. Pointers •  First ResourceSync draft spec (do not implement!): http://www.openarchives.org/rs/0.1/resourcesync! •  ResourceSync Simulator code on github http://github.org/resync/simulator! •  NISO ResourceSync workspace http://www.niso.org/workrooms/resourcesync/! •  Memento http://mementoweb.org! ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO
  • 79. ResourceSync: Get the Sticker! Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp ResourceSync is funded by The Sloan Foundation & JISC ResourceSync – Herbert Van de Sompel NISO Forum, September 24 2012, Denver, CO