1. The Research Data Life Cycle
From Flickr by Velo Steve
Carly Strasser
California Digital Library
GeoData
18 June 2014
2. Why don’t people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
What role can libraries play in data education?
How can we promote storing data in repositories?
What barriers to sharing can we eliminate?
NSF-funded DataNet Project (Office of Cyberinfrastructure)
22. … “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.” (Feb 2013)
23. From Calisphere, courtesy of UC Riverside, California Museum of Photography
What do researchers think?
24. They don’t know about policies.
John Kratz, CLIR/DLF Postdoc at CDL
25. They aren’t taught data management.
Quality control and quality assurance
The proper way to name computer files
Types of files and software to use
Metadata generation
Workflows
Protecting data
Databases and data archiving
Data re-use
Meta-analysis
Data sharing
Reproducibility
Notebook protocols (lab or field)
Strasser & Hampton 2013. “Undergraduates & Ecological Data Management Training in the US.” DOI:10.1890/ES12-00139.1
26. They aren’t taught data management.
[Chart: “In Curriculum?”, BAS vs. RU, axis 0 to 70]
27. “No one reads it anyway.”
“It’s an unfunded mandate.”
“I wrote it the night before.”
They aren’t concerned.
28. What does success look like?
DMPs…
• are flexible
• are useful and used
• result in easily discoverable data
• are linked to open data
• are created in partnership with institutional service providers
• can serve as a compliance tool (possibly automated)
• are part of the research workflow
• include digital and non-digital materials (where relevant)
29. “Community-driven”
But what if the community doesn’t care (yet)?
“Generic, work for everyone”
But what about community-specific standards?
39. From Flickr by iowa_spirit_walker
• Cost
• Confusion about standards
• Lack of training
• Fear of lost rights or benefits
• No incentives
41. Data are being recognized as first-class products of research
From Flickr by Richard Moross
NSF biosketches can include data
Data Publication
Data Citation
48. What does “data publication” mean?
Data are:
1. Available
2. Citable
3. Trustworthy*
*Peer reviewed? Certified?
Props to Sarah Callaghan & colleagues
49. Available | Citable | Trustworthy
“Publish” means to “make public.”
You should not have to email the author.
The data do not have to be open access.
“Email me!”
CC0 on the web
50. Simple case…
Data citations should be in the reference list.
Five-element citation: author, year, title, publisher, identifier
Boettiger C, Dushoff J, Weitz JS (2009). Data from: Fluctuation domains in adaptive evolution. Theoretical Population Biology. Published in Dryad. doi:10.5061/dryad.j8n0p7vc
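The five-element rule can be made concrete with a small formatter. This is a minimal sketch, not a formal citation style: the `DataCitation` record, its `render` method, and the output layout `Authors (Year). Title. Publisher. Identifier` are hypothetical choices for illustration. It also leaves out the related article (Theoretical Population Biology) that the example above mentions.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical record holding the five elements of a data citation."""
    authors: list[str]  # e.g. ["Boettiger C", "Dushoff J"]
    year: int
    title: str
    publisher: str      # the repository that publishes the data
    identifier: str     # a persistent identifier such as a DOI

    def render(self) -> str:
        # Refuse to build an incomplete citation: all five elements are required.
        if not self.authors or not all([self.title, self.publisher, self.identifier]):
            raise ValueError("all five citation elements are required")
        author_list = ", ".join(self.authors)
        return f"{author_list} ({self.year}). {self.title}. {self.publisher}. {self.identifier}"


citation = DataCitation(
    authors=["Boettiger C", "Dushoff J", "Weitz JS"],
    year=2009,
    title="Data from: Fluctuation domains in adaptive evolution",
    publisher="Dryad",
    identifier="doi:10.5061/dryad.j8n0p7vc",
)
print(citation.render())
```

Requiring every element up front mirrors the slide's point: a citation missing its identifier or publisher is not enough for a reader to find the data without emailing the author.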
51. More complicated…
Deep data citation: what if you want to cite a subset?
Dynamic data: how do you create a reliable citation when a dataset is changing?
52. Technical vs. scientific
Sometimes consider impact and/or novelty
Guidelines provided
From Flickr by Percival Lowell
53. 1. Data as supplemental material
Data published alongside a traditional journal article.
Available + citable. Review varies.
Potential issues with long-term availability.
What does a data publication look like?
From Flickr by subsetsum
54. 2. Data paper:
Data + descriptive “data paper”
Most require that the data be in a trusted repository.
All have a component of peer review.
Examples:
• Standalone journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives
• Journals that publish data papers: GigaScience, F1000Research, Internet Archaeology
55. 3. Standalone data
Data published without a related journal article.
Rich metadata (structured or unstructured)
Examples:
• Open Context
• NASA PDS Peer Review Data
• figshare (but no validation)
60. Repositories for data
General content
Non-institutional
Publishers/for-profits
Other
Institutional
Discipline-specific
Repository choices…
61. Institutional
• All data associated with a paper
• Tells a story
• Clearinghouse for a researcher’s works
Discipline-specific
• Some of the data for a given paper
• Discoverable
• Integrated systems
• Collection policies
Which should a researcher use? Both.
Which is more important? Depends.
62. Simplify data deposit for UC researchers
Branded for campus
Merritt under the hood