CESSDA Persistent Identifiers

dans.knaw.nl
DANS is an institute of KNAW and NWO
CESSDA Persistent Identifiers
Workshop PID Information Types for the Social Sciences
May 29, 2017, The Hague
Vyacheslav Tykhonov
Senior Information Scientist (DANS)
vyacheslav.tykhonov@dans.knaw.nl

DANS data repositories with Persistent Identifiers
Within the context of DANS’ mission, it is obligatory that every (digital) object
archived via DANS has a PID, so that it can be (re)located and cited. DANS
uses PIDs for both (digital) objects and people.
DataverseNL for ongoing research projects
• every dataset has its own handle (for Dutch universities)
• revisions of a dataset do not change the handle; each new version changes only
the citation
EASY for permanent data archiving (DOIs)
• every archived dataset has a DOI
• every version of a dataset archived from DataverseNL produces a new DOI
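
The identifier policy above can be sketched in a few lines of Python. This is only an illustrative model, not DANS code: the `Dataset` class, the handle value, the DOI prefix and the DOI suffix scheme are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy model of a DataverseNL dataset: one handle for its whole lifetime."""
    handle: str                                        # stays fixed across revisions
    version: int = 1
    archived_dois: dict = field(default_factory=dict)  # version -> DOI

    def revise(self) -> None:
        # A new revision changes only the version shown in the citation.
        self.version += 1

    def archive(self, doi_prefix: str) -> str:
        # Archiving the current version yields a new DOI for that version.
        # The suffix scheme is invented for illustration only.
        doi = f"{doi_prefix}/example-v{self.version}"
        self.archived_dois[self.version] = doi
        return doi


ds = Dataset(handle="hdl:12345/EXAMPLE")  # made-up handle
first = ds.archive("10.12345")            # made-up DOI prefix
ds.revise()
second = ds.archive("10.12345")
print(ds.handle, first, second)
```

The handle is the same before and after `revise()`, while each call to `archive()` records a distinct DOI for the version being archived.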

• DANS has developed a plugin to archive datasets deposited
in Dataverse temporary storage to Trusted Digital
Repositories (TDR)
• Before putting datasets in the long-term archive, users
should create an account in the TDR and get the proper
permissions to archive their data
• The Archival Plugin is open-source software and can easily be
extended to support any TDR:
https://github.com/DANS-KNAW/dataverse-bridge

The “Archive” button is available to local Dataverse administrators
to push datasets to the EASY archive for long-term preservation.

The administrator can choose where to archive the dataset:
Archivematica, Islandora, FEDORA or DANS EASY
(EASY is the default option).

The archiving process runs in the background: it extracts data and
metadata from the dataset and creates an archived (BagIt) package
containing all files and checksums.
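
A minimal sketch of what such a package looks like, using only the Python standard library. It follows the basic BagIt layout (a `bagit.txt` declaration, a `data/` payload directory and a `manifest-sha256.txt` checksum file); the function name `make_bag` and the sample files are assumptions for illustration, and this is not the code used by the Archival Plugin.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> Path:
    """Write payload files under data/ and a SHA-256 manifest, BagIt-style."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest format: "<checksum>  <path relative to the bag root>"
        manifest_lines.append(f"{digest}  data/{name}")

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(
        Path(tmp) / "dataset-bag",
        {"survey.csv": b"id,answer\n1,yes\n", "metadata.json": b"{}"},
    )
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    print(manifest)
```

A receiving repository can recompute the checksums of the files under `data/` and compare them with the manifest to verify that the transfer was complete and uncorrupted.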

Once archiving has finished, the “Archive” button disappears from
the page. The dataset citation is extended with a DOI pointing to
the archived version of the dataset in EASY.

The archived version of the dataset is available on the EASY
landing page and can be cited in research papers.

The archived dataset automatically gets a DOI and a URN pointing
to the archived revision (version) of the dataset.

All files in the dataset get permission levels corresponding to
those of the file versions stored in Dataverse.

Dataverse as Archival Service
• We are working on extending Dataverse so that a DOI is
generated for every version of a dataset, making it work as
permanent storage
• Citations can contain duplicate metadata, but the dataset content
(data files) should differ
• The archival part can be hosted by the same Dataverse,
depending on the plugin settings

CESSDA PID plugin
• Universal plugin to get DOIs and handles in the same
Dataverse instance
• The prefix for every organisation will be generated from the
configuration and authentication settings of the plugin
• Can switch Dataverse between supporting ongoing research and
archiving (in separate sub-dataverses)

Challenges
• We need a PID “Proxy” Service that collects information about all
DOIs generated for the different versions of datasets with handles
• depending on the location and status of a dataset, every
citation should contain a handle (Netherlands), URN:NBN
(Europe) and DOI (worldwide)
• statistics about all citations of datasets in research papers
should be aggregated and provided as part of the “Proxy” Service
to build our own “PageRank” index
• Big Data and Linked Open Data archiving with Persistent
Identifiers
• higher level of granularity for separate files, subsets,
fragments, time services to make citations more accurate
• maintenance of tombstone pages
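
To make the “Proxy” idea more concrete, here is a rough in-memory sketch in Python. No such service is described in detail in these slides, so the `PidProxy` class, its method names and all identifier values are hypothetical; a real service would need persistent storage and resolution against the handle, URN:NBN and DOI registries.

```python
from collections import defaultdict


class PidProxy:
    """Toy registry linking one handle to its related identifiers."""

    def __init__(self) -> None:
        self._dois = defaultdict(list)  # handle -> DOIs, one per archived version
        self._urns = {}                 # handle -> URN:NBN
        self._citations = {}            # DOI -> number of recorded citations

    def register_version(self, handle: str, doi: str) -> None:
        self._dois[handle].append(doi)

    def register_urn(self, handle: str, urn: str) -> None:
        self._urns[handle] = urn

    def record_citation(self, doi: str) -> None:
        self._citations[doi] = self._citations.get(doi, 0) + 1

    def identifiers(self, handle: str) -> dict:
        # Everything a citation would need: handle, URN:NBN and version DOIs.
        return {
            "handle": handle,
            "urn": self._urns.get(handle),
            "dois": list(self._dois.get(handle, [])),
        }

    def citation_total(self, handle: str) -> int:
        # Aggregate citations across all versions of the same dataset.
        return sum(self._citations.get(d, 0) for d in self._dois.get(handle, []))


proxy = PidProxy()
h = "hdl:12345/EXAMPLE"                       # made-up identifiers throughout
proxy.register_urn(h, "urn:nbn:nl:ui:00-example")
proxy.register_version(h, "10.12345/example-v1")
proxy.register_version(h, "10.12345/example-v2")
proxy.record_citation("10.12345/example-v1")
proxy.record_citation("10.12345/example-v2")
proxy.record_citation("10.12345/example-v2")
print(proxy.identifiers(h), proxy.citation_total(h))
```

Summing citations per handle rather than per DOI is the design choice that matters here: it lets citation statistics follow the dataset as a whole, even though each archived version has its own DOI.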

Big Data repository with Persistent Identifiers
The approach is suitable for product development companies (industry) and
organisations and institutions (CESSDA) looking for sustainable (Big) data
archiving services.
A Big Data object in Dataverse consists of:
• metadata with authorship and citation information
• data usage licence
• persistent DOI or handle
• information on how to obtain a key (API token) to start using the API endpoint(s)
• link to the API endpoint delivering the data
• representation of the API (interactive documentation, Swagger)
• data provenance
• controlled vocabularies to meet domain-specific community standards (optional)
A public demonstration is available on the Dataverse demo website.
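
The components listed above could be captured in a simple record type. The following Python sketch is an assumption about how one might model it, not the Dataverse data model: the class name, field names and example values (including the URLs) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BigDataObject:
    """Hypothetical record mirroring the components of a Big Data object."""
    metadata: dict               # authorship and citation information
    licence: str                 # data usage licence
    pid: str                     # persistent DOI or handle
    api_key_instructions: str    # how to obtain a key (API token)
    api_endpoint: str            # link to the endpoint delivering the data
    api_docs: str                # interactive documentation, e.g. Swagger
    provenance: str              # data provenance
    vocabularies: list = field(default_factory=list)  # optional

    def missing_fields(self) -> list:
        # Report required components that were left empty.
        required = [
            "metadata", "licence", "pid", "api_key_instructions",
            "api_endpoint", "api_docs", "provenance",
        ]
        return [name for name in required if not getattr(self, name)]


obj = BigDataObject(
    metadata={"author": "Example Author", "title": "Example stream"},
    licence="CC0",
    pid="doi:10.12345/example",                       # made-up DOI
    api_key_instructions="Request a token from the data provider.",
    api_endpoint="https://example.org/api/data",      # placeholder URL
    api_docs="https://example.org/api/docs",          # placeholder URL
    provenance="Collected by the example project.",
)
print(obj.missing_fields())
```

Note that the object holds a link to the data rather than the data itself, which is what makes the approach workable for sources too large to deposit as files.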

Linked Data hubs as archived object
Source: PID object

Questions?
