1. data citation made easy
Joan Starr
California Digital Library
@joan_starr
Data Curation for Practitioners Workshop
2. data citation made easy
what is it?
why does it matter?
what are identifiers?
sounds hard—is it?
where/how do you start?
3. data citation in the wild
4. data citation in the wild
11. what is an identifier?
By Joelk75: http://www.flickr.com/photos/75001512@N00/2728233597/
12. what is an identifier?
What you see: alphanumeric string (never changes)
Associated with: location of object (such as a URL)
Optional: who, what, when, etc (i.e. metadata)
By Joelk75: http://www.flickr.com/photos/75001512@N00/2728233597/
13. typical identifier
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.bologna.edu/biology/xfg/123.xls
metadata
creator: Dr. Felix Kottor
title: Data for chromosomal study of catfish (Ictalurus punctatus)
publisher: University of Bologna
date: 8/31/2011
14. typical identifier
string: doi:10.9999/FK40K2GTV
html version: http://dx.doi.org/10.9999/FK40K2GTV
location: http://www.state.edu/ecology/783s/123.xls
metadata
creator: Dr. Felix Kottor
title: Data for chromosomal study of catfish (Ictalurus punctatus)
publisher: Dryad Data Repository
date: 2/01/2013
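The two "typical identifier" slides differ only in location, publisher, and date; the string stays the same. A minimal Python sketch of that idea follows. The `IdentifierRecord` class and `move` helper are hypothetical names invented for illustration, not part of EZID or any DOI tooling.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class IdentifierRecord:
    """Hypothetical model of one identifier as shown on the slides."""
    identifier: str                      # the string: never changes
    location: str                        # the target URL: may change
    metadata: dict = field(default_factory=dict)  # who, what, when


def move(record, new_location, **metadata_updates):
    """Return a new record with the same identifier but an updated
    location and (optionally) updated metadata fields."""
    return replace(
        record,
        location=new_location,
        metadata={**record.metadata, **metadata_updates},
    )


original = IdentifierRecord(
    identifier="doi:10.9999/FK40K2GTV",
    location="http://www.bologna.edu/biology/xfg/123.xls",
    metadata={
        "creator": "Dr. Felix Kottor",
        "title": "Data for chromosomal study of catfish (Ictalurus punctatus)",
        "publisher": "University of Bologna",
        "date": "8/31/2011",
    },
)

# The dataset moves to a new home; only location and metadata are updated.
moved = move(
    original,
    "http://www.state.edu/ecology/783s/123.xls",
    publisher="Dryad Data Repository",
    date="2/01/2013",
)

print(moved.identifier)  # same string as before the move
print(moved.location)
```

Anyone who cited the original string still reaches the data after the move, because only the record behind the string was edited.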
15. identifier body language
• string: doi:10.9999/FK40K2GTV
• “prefix”: 10.9999
• “suffix”: FK40K2GTV
By Lara Photography: http://www.flickr.com/photos/87441638@N00/3747944914/
16. data citation made easy
what is it?
why does it matter?
what are identifiers?
sounds hard—is it?
where/how do you start?
17. identifiers made easy
• Precise identification of a dataset (DOI or ARK)
• Credit to data producers and data publishers
• A link from the traditional literature to the data
• Exposure and research metrics for datasets (Web of Knowledge, Google)
Primary Functions
1. Create long-term identifiers
2. Manage identifiers over time
3. Manage associated metadata over time
25. what else?
machine-to-machine
By Katy Lindemann: http://www.flickr.com/photos/kitschbitch/4060038899/
26. what else?
community of users
Courtesy of Oxnard Public Library: http://content.cdlib.org/ark:/13030/kt6c600758
29. start by starting
• Create test IDs using the UI http://n2t.net/ezid
• Try out the API
• Talk to your librarians about getting an account
• Put EZID in your data management plan
• Cite your data!
30. for more information
Web: n2t.net/ezid
Twitter: @ezidCDL
Email: uc3@ucop.edu
And here’s how to find me:
Twitter: @joan_starr
Email: joan.starr@ucop.edu
Editor's notes
This is a quick look at the topics I’ll try to cover this morning. And I’m going to make sure we have plenty of time for discussion afterward.
So here is what this looks like. Here is an example of a data set deposited with one of our clients, Dryad. Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences.
Image credit: http://www.flickr.com/photos/vixon/116447718/ by barryegan (VitorLeite)
Why should researchers bother with DATA CITATION? What is their motivation?
• To provide fair credit to those responsible: exposure
• To ensure scientific transparency and reasonable accountability for authors and stewards: transparency
• To aid in tracking the impact of the work: citation tracking
• To help data authors verify how their data are being used: verification
• To aid scientific reproducibility through direct, unambiguous connection to the precise data used in a particular study: scientific re-use
Source: ESIP, Earth Science Information Partners (http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines)
DOIs are one kind of persistent identifier. But what is an identifier? An identifier is an alphanumeric string assigned to an object, and if that assignment is managed with some metadata and the object is made available over time, the identifier becomes a VERY reliable way of keeping track of that object.
Let’s take a look at one. So you can see that with just the identifier and a simple set of metadata, you get:
• Location for VERIFICATION
• EXPOSURE & CITATION TRACKING
(This is not an actual DOI, nor an actual study.)
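To show how little is needed, here is a rough Python sketch that builds a citation string from the identifier and the four metadata fields on the slide. The pattern "Creator (Year): Title. Publisher. Identifier" is an assumption, modeled on common data citation recommendations rather than taken from these slides. The `format_citation` function is a hypothetical helper.

```python
def format_citation(creator, date, title, publisher, identifier):
    """Build a one-line data citation.

    Assumes the date is written month/day/year, as on the slide,
    and keeps only the year.
    """
    year = date.rsplit("/", 1)[-1]
    return f"{creator} ({year}): {title}. {publisher}. {identifier}"


citation = format_citation(
    creator="Dr. Felix Kottor",
    date="8/31/2011",
    title="Data for chromosomal study of catfish (Ictalurus punctatus)",
    publisher="University of Bologna",
    identifier="doi:10.9999/FK40K2GTV",
)
print(citation)
```

As with the slide, the DOI and the study are invented. A real citation would follow whatever style the target journal or repository requests.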
We’re going to look at that same DOI so we can talk about its structure. Remember: this is a STRING associated with a TARGET URL. DOI structure is based on the Handle system of identifiers, because you can think of DOIs as a special implementation of the Handle system.
So, here is the segment called the PREFIX. All DOI prefixes begin with “10”, and this is followed by a “dot” and more numbers. The prefix is a unique number assigned to the specific registrant of DOIs. CDL has its own prefix, for example. NCAR has one too. The prefix is the common element in every DOI the registrant makes.
The second part is the SUFFIX, the part after the slash. This part has to be unique for every DOI created with the prefix.
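The prefix/suffix split described here can be sketched in a few lines of Python. The `split_doi` and `resolver_url` functions are hypothetical helpers. The check is deliberately loose: it tests only that the prefix starts with "10." followed by digits and that something follows the slash. The resolver base is the http://dx.doi.org/ address shown on the slide.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI string into (prefix, suffix) at the first slash."""
    # Drop an optional "doi:" scheme label.
    body = doi[4:] if doi.lower().startswith("doi:") else doi
    prefix, slash, suffix = body.partition("/")
    if not slash or not suffix:
        raise ValueError(f"no suffix after the slash in {doi!r}")
    # Prefix: "10", a dot, and more numbers.
    if not prefix.startswith("10.") or not prefix[3:].replace(".", "").isdigit():
        raise ValueError(f"prefix does not look like '10.NNNN' in {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the 'html version' of a DOI, as shown on the slide."""
    prefix, suffix = split_doi(doi)
    return f"http://dx.doi.org/{prefix}/{suffix}"


prefix, suffix = split_doi("doi:10.9999/FK40K2GTV")
print(prefix)   # the registrant's part
print(suffix)   # the part that must be unique under that prefix
print(resolver_url("doi:10.9999/FK40K2GTV"))
```

Suffixes containing characters that are unsafe in URLs would need percent-encoding before being placed in the resolver address; the sketch skips that step.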
How can EZID be in the business of issuing DataCite DOIs? California Digital Library was one of the founding members. DataCite was formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.” The number has now grown to 17, with the National Research Council of Thailand joining this past December. In addition there are 5 associate members, including the Korea Institute of Science and Technology Information and BGI, so there is a presence in Asia. DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
In this new interface, we’ve introduced a number of new features:
a) One unified destination for all EZID information (screenshot of home page).
b) Enhanced management of identifiers (screenshots of browse and search).
c) The ability to designate the identifier’s status (reserved or public) from the UI, using advanced create.
If you click on any one of these links, you’ll get to information about how to use EZID, who’s using EZID, outreach materials, and so on.
By default, we take you to a SIMPLE create screen.
Advanced options like the ability to “choose your own identifier”—the suffix body part we saw earlier.
A machine-to-machine interface or API, for automated processing, when you need large numbers of identifiers or you want the identifier creation event to be part of a workflow.
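As a rough idea of what a workflow step might prepare before calling the API, the Python sketch below builds a plain-text metadata body and sends nothing over the network. Everything specific in it is an assumption to verify against the current EZID API documentation: the line-oriented `name: value` format, the percent-escaping rule, and the element names `_target` and `datacite.*`. The `escape` and `to_anvl` functions are hypothetical helpers.

```python
import re


def escape(text: str) -> str:
    """Percent-encode characters assumed to be special in the
    line-oriented format: '%', ':', and line breaks."""
    return re.sub(r"[%:\r\n]", lambda m: "%%%02X" % ord(m.group(0)), text)


def to_anvl(metadata: dict) -> str:
    """Serialize metadata as one 'name: value' line per element."""
    return "\n".join(
        f"{escape(name)}: {escape(value)}" for name, value in metadata.items()
    )


# Element names below are assumptions, not confirmed by the slides.
body = to_anvl({
    "_target": "http://www.bologna.edu/biology/xfg/123.xls",
    "datacite.creator": "Dr. Felix Kottor",
    "datacite.title": "Data for chromosomal study of catfish (Ictalurus punctatus)",
    "datacite.publisher": "University of Bologna",
})
print(body)
```

A script would then send this body in an authenticated HTTP request. The endpoint, credentials, and test prefixes are left out here on purpose; take them from the EZID documentation.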
All told:
• 20 research universities, including all but one of the UCs and 5 members of Orbis Cascade Alliance
• 6 more research groups on other university campuses
• 7 government agencies
• A couple of publishers and a handful of private companies offering data-related services