This document summarizes a presentation about meeting federal data sharing requirements. It discusses the history of these requirements and defines good practices for data sharing and stewardship. It also reviews some public data sharing services and provides tips for evaluating them. Key aspects of good data sharing include maximizing access, protecting privacy, ensuring proper attribution, and having long-term preservation and sustainability plans. The presenter emphasizes that restricted-use or sensitive data can be effectively shared through secure virtual environments.
1. From Data Sharing to Data
Stewardship: Meeting Federal
Data Sharing Requirements
ACRL 2015
Thursday, March 26, 2015
ICPSR – University of Michigan
Hashtag: #icpsr
5. Direct identifiers
• Addresses, including ZIP and other postal codes
• Telephone numbers, including area codes
Indirect identifiers
• Exact dates of events (birth, death, marriage)
• Detailed income
• Detailed geographic information (e.g., county)
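To make the direct/indirect distinction concrete, here is a minimal, hypothetical Python sketch of stripping direct identifiers from survey records before sharing. The field names are illustrative only, not from any real study, and removing direct identifiers is never sufficient on its own:

```python
# Hypothetical example: dropping direct identifiers before sharing.
# Field names are illustrative. Real studies require a full disclosure
# review, since indirect identifiers (exact dates, detailed income,
# detailed geography) can still re-identify subjects in combination.

DIRECT_IDENTIFIERS = {"name", "address", "zip_code", "telephone"}

def scrub_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

record = {
    "name": "Jane Doe",
    "zip_code": "48109",
    "telephone": "734-555-0100",
    "income_bracket": "50-75k",   # indirect identifier: kept, but review it
    "response": "agree",
}

print(scrub_direct_identifiers(record))
```

A real disclosure review would also consider coarsening the indirect identifiers (e.g., collapsing county to region), not just dropping the direct ones.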
6. “The study is composed of about 180,000 autopsy x-ray image files taken of 58 corpses. The images originally arrived on DVD and are formatted to comply with the Digital Imaging and Communications in Medicine (DICOM) standard…. The images are the data of the study; the image files themselves contain metadata (metadata on the images) scrubbed of identifiers, but there isn't much in terms of documentation.”
8. Today
• History (brief!) of federal data sharing requirements
• What is good data sharing? How do you achieve data
stewardship?
• Public data sharing services – tours & take-away tips
• Resources for creating data management plans and
funding quotes
9. You should leave this session with:
• Keen understanding of several sustainable data
sharing models
• Ability to assess data sharing services
– Through review of several services
– Walk-away tips for evaluating
• Knowledge (a portal) of resources for creating
data management plans for grant applications
10. ICPSR
• 50+ years of experience
• Data stewardship
• Data management
• Data curation
• Data preservation
12. Recent Federal Data Sharing Initiatives
• NIH: 2003 – data sharing plans
• NSF: 2011 – data management plans
• OSTP: 2013 – Memo with subject “Increasing
Access to the Results of Federally Funded
Scientific Research”
19. Data Portion of Memo - 13 Elements
• The elements are also summarized online
within ICPSR’s Web site:
http://icpsr.umich.edu/content/datamanagement/ostp.html
21. UK results on data sharing attitudes
• In 2011 survey, 85% of researchers said they
thought their data would be of interest to
others.
• Only 41% said they would be happy to make
their data available.
• Only a third had previously published data.
Source: DaMaRO Project, University of Oxford
http://www.slideshare.net/DigCurv/15-meriel-patrick
22. Data Sharing Status

Federal Agency   Shared Formally,    Shared Informally,     Not Shared
                 Archived (n=111)    Not Archived (n=415)   (n=409)
NSF (27.3%)      22.4%               43.7%                  33.9%
NIH (72.7%)      7.4%                45.0%                  47.6%
Total            11.5%               44.6%                  43.9%
Pienta, Gutmann, & Lyle (2009). “Research Data in The Social Sciences: How Much is Being Shared?”
http://ori.hhs.gov/content/research-research-integrity-rri-conference-2009
See also: Pienta, Gutmann, Hoelter, Lyle, & Donakowski (2008). “The LEADS Database at ICPSR:
Identifying Important ‘At Risk’ Social Science Data.”
http://www.data-pass.org/sites/default/files/Pienta_et_al_2008.pdf
Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The
Use and Reuse of Primary Research Data”. http://hdl.handle.net/2027.42/78307
23. What is good data sharing – the basis of data stewardship?
1. Maximize access
2. Protect confidentiality and privacy
3. Appropriate attribution
4. Long-term preservation and sustainability
5. Data management planning
27. A well-prepared data collection
“contains information intended to
be complete and self-explanatory”
for future users.
Do no harm.
28. Protect confidentiality and privacy
• It is critically important to protect the identities of research
subjects
• Disclosure risk is a term that is often used for the possibility
that a data record from a study could be linked to a specific
person
• Data with these risks can be shared via a secured virtual
environment
• Data concerning very sensitive topics can also be shared via
a secured environment
29. Appropriate Attribution
• Properly citing data encourages the replication of
scientific results, improves research standards, guarantees
persistent reference, and gives proper credit to data
producers.
• Citing data is straightforward. Each citation must include
the basic elements that allow a unique dataset to be
identified over time: title, author, date, version, and
persistent identifier.
• Resources: ICPSR's Data Citations page , IASSIST's Quick
Guide to Data Citation, DataCite.
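As a rough illustration (my own sketch, not an official ICPSR or DataCite tool), a citation string can be assembled from those five basic elements. The layout and the DOI below are hypothetical; exact punctuation varies by style guide:

```python
# Sketch: assembling a data citation from the five basic elements
# named above (title, author, date, version, persistent identifier).
# The formatting here is illustrative, not a prescribed style.

def format_data_citation(author: str, date: str, title: str,
                         version: str, persistent_id: str) -> str:
    return f"{author} ({date}). {title} ({version}) [Data set]. {persistent_id}"

citation = format_data_citation(
    author="Smith, J.",
    date="2014",
    title="Example Survey of Public Opinion",
    version="Version 2",
    persistent_id="https://doi.org/10.0000/example",  # hypothetical DOI
)
print(citation)
```

The key point from the slide holds regardless of style: a persistent identifier (e.g., a DOI) plus a version lets a reader locate the exact dataset over time.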
35. Data Management Planning
• Data management plans describe how researchers
will provide for long-term preservation of, and
access to, scientific data in digital formats.
• Data management plans provide opportunities for
researchers to manage and curate their data more
actively from project inception to completion.
• See ICPSR's resource: Guidelines for Effective Data
Management Plans
36. The Status of Data Sharing
– Good data sharing exists!
– Good data sharing requires funding -
sustainable funding!
– Sustainable funding for free public access
remains a challenge
37. Sustainable Data Sharing Models –
Three to Explore
• Fee for access model (subscription model)
• Agency model (agency or foundation funds
public access)
• Fee for deposit model (researcher writes fee
into grant and pays at deposit to fund public
access)
38. I. Fee-for-Access Data Sharing
• Funding is maintained by annual subscription fees charged to
institutions; individuals at subscribing institutions have free
(open) access to data
• Pooled (ongoing) subscriber fees are used to acquire, curate,
and maintain the service
• The service, open to everyone, is thus sustained by subscribers,
but agencies indicate these models are not ‘open enough’
because of the access fees
39. II. Agency-funded Data Sharing
• Agency sponsors/funds (ongoing) data curation & sharing enabling the
public to access without charge
• The archive is hosted with a curation entity like ICPSR where the public
can easily discover and access data and restricted-use data can also be
securely shared
• Agency directs data selection and compliance policies
40. III. Fee-for-Deposit Data Sharing
• Depositor (individual or entity) pays for data to be
curated and stored – a fee at deposit
• Deposit fees should be written into the grant
application
• Incoming deposit fees sustain the service and the
professionals behind it
• Sustainability risk fairly high in this model as it
depends upon:
– Continuous influx of deposit fees
– Depositors to put allocated fees towards curation & sharing
• Data tend to be bit-level (not curated): WIDIWYG (what is deposited is what you get)
41. Fee for Deposit Services Arriving Daily!
(tips for evaluating coming shortly)
42. First: A Side-Note on Sharing Restricted-Use Data
What is Restricted-Use Data?
• Data with disclosure risk – potential to identify a research subject
• Data with highly sensitive personal information
43. Common Objection/Misperception:
“My data are too sensitive to share. . .”
• ICPSR has been sharing restricted-use data for
over a decade. Three methods are used:
– Secure Download
– Virtual Data Enclave
– Physical Enclave
• ICPSR stores & shares over 6,400 restricted-
use datasets associated with over 2,000
‘active’ restricted-use data agreements
44. Reality: Restricted-use data can be
effectively shared with the public
• Through the use of a virtual data enclave where
the data never leave the server
• Where there is a process (and understanding!)
to garner IRB approval from the requesting
scientist’s university
• Where there is a system, technology, data
professionals, and collaboration space in place
to disseminate (expensive to build!)
• Because agencies do allow for an incremental
charge to the data requestor to offset marginal
costs
45. Review of Public Data Sharing Services
• Overview of public data sharing services we have
reviewed
– Some key strengths of each
• Disclaimer: ICPSR has recently launched a public access
service (hosted)
– You’ll likely notice some bias when we talk about the
strengths of openICPSR
– And because we built the service, we know much more
about it
– Still, ICPSR’s public access service isn’t for everyone –
more on that shortly
48. How is openICPSR unique?
openICPSR is a public data-sharing service:
• Where the deposit is reviewed by professional data curators who
are experts in developing metadata (tags) for the social and
behavioral sciences = discoverable
• With an immediate distribution network of over 750 institutions
looking for research data, that has powerful search tools, and a
data catalog indexed by major search engines = usage
• Sustained by a respected organization with over 50 years of
experience in reliably protecting research data = sustainable
• Prepared to accept and disseminate sensitive and/or restricted-
use data in the public-access environment = protection of research
subjects
49. How will openICPSR disseminate sensitive
data to the public?
• The deposit of sensitive (restricted-use) data is similar
to the deposit of non-sensitive data except that the
depositor will indicate that the data should be for
restricted-use only
• Dissemination of sensitive data will be through
ICPSR’s virtual data enclave; in this environment, data
never leave the secure server and analysis takes place
in the virtual space
• Scientists desiring to access the data will need to
apply for the data and will pay an access fee
• openICPSR has already received sensitive (restricted-use) data, and dissemination of these data has begun
50. openICPSR for Institutions and Journals
• Uses openICPSR platform
• Fully hosted in the ICPSR
cloud – no tech or patches
needed
• Branded with a logo and
colors
• Deposits incorporated into
ICPSR’s data catalog
• On-demand administrative
usage tools
51. A final note: openICPSR accepts research data from
a wide array of disciplines/fields, but not all
52. Tips for Evaluating a Data Sharing Service
Questions to consider when selecting a data sharing service:
• How will the service sustain itself? Does it have a long-term funding stream?
• How will the service care for my data in the long term should the service fail? Is there a plan? A safety net?
• Can the service quickly maximize discoverability of my data? Does it explain how it will do so?
• Does the service have a network of interested researchers & students seeking data? Will my data get used?
• Does the service have knowledge of international archiving standards?
• Does the service provide a DOI, data citation, and version control should I need to update my files?
• I have sensitive data or data with some disclosure risk to deposit. Does the service understand how to secure it upon intake and when sharing? Does it have experience in this area?
55. Purpose of Data Management Plans
• Data management plans describe how researchers
will provide for long-term preservation of, and
access to, scientific data in digital formats.
• Data management plans provide opportunities for
researchers to manage and curate their data more
actively from project inception to completion.
59. And still more guidelines after the
project is awarded:
• Guide emphasizes
preparation for data
sharing throughout
the project
• Available online and
via download (pdf)
60. ICPSR Data Curation Training Workshops
• 1-5 day workshops on data curation/data
repository management decisions
– Participants learn about best practices and
tools for data curation, from selecting and
preparing data for archiving to optimizing and
promoting data for reuse
• Available via ICPSR Summer Program (Ann
Arbor – July 27-31, 2015) or onsite at your
institution
61. Copies of these Slides & Use
• Feel free to share it; present
it; cite it!
• Find copies of these slides
on Slideshare.net
– Several notes and
additional links are found in
the notes view
62. Get More information
• Visit ICPSR’s Data Management &
Curation site:
http://www.icpsr.umich.edu/datamanagement/index.jsp
• Contact us:
– netmail@icpsr.umich.edu
– (734) 647-2200
• More on Assuring Access to
Scientific Data: white paper –
“Sustaining Domain Repositories
for Digital Data”
Editor's Notes
Federal agencies are requiring data management plans as part of research proposals to increase public access to results (including research data) of federally funded scientific research. Join us for a session on sustainable data sharing models, including models for sharing restricted-use data. Demos of these models and tips for accessing hosted public data access services will be provided as well as resources for creating data management plans for grant applications.
Here’s the wave of ‘big data’.
Source of slide: Myron Gutmann’s IDF Meeting (June, 2007)
ICPSR exists to preserve and share research data to support researchers who:
Write research articles, books, and papers
Teach or utilize quantitative methods
Write grant/contract proposals (require data management plans)
Current archives/collections/repositories already meeting public access requirements regarding data
NACDA – NACJD – SAMHDA: examples of long term sustainability
NAHDAP – SAMHDA – DSDR: examples of sharing of confidential data
NACJD – example of depository/researcher compliance (holding 10% of funding to PI)
LGBT – MET: unique infrastructure and dissemination
Research Connections: reports and data dissemination; audiences including policymakers
In January 2011, the National Science Foundation released a new requirement for proposal submissions regarding the management of data generated using NSF support. All proposals must now include a data management plan (DMP). (NIH has similar DMP requirements.)
The plan is to be short, no more than two pages, and is submitted as a supplementary document. The plan needs to address two main topics:
What data are generated by your research?
What is your plan for managing the data?
The OSTP Memo
This memo directed funding agencies with an annual R&D budget over $100 million to develop a public access plan for disseminating the results of their research
concern for investment: “Policies that mobilize these publications and data for re-use through preservation and broader public access also maximize the impact and accountability of the Federal research investment.”
Federal agencies with over $100 M annually in R&D expenditures to develop plans to support increased public access to the results of research funded by the Federal Government
The OSTP Memo – Overview
Released February 22, 2013
“Maximize access, by the general public and without charge, to digitally formatted scientific data created with Federal funds…”
4,883 NIH & NSF PIs emailed a survey
1,217 responses (24.9% response rate)
1,003 valid (collected data, not dissertation)
We attempted to invite all 4,883 of these PIs.
The PI survey consisted of questions about research data collected, various methods for sharing research data, attitudes about data sharing, and demographic information. PIs were also asked about publications tied to the research project, including information about their own publications, research team publications, and publications outside the research team. We received 1,217 responses (24.9% response rate). For the analytic sample we selected PIs and their research data if (1) they confirmed they collected research data (86.6% of the responses), (2) they did not collect data for a dissertation award (n=33), and (3) they were not missing data on the dependent variable.
Today we’ll talk about how to prepare your data collection so – and this is the ultimate litmus goal -- it “contains information intended to be complete and self-explanatory” for future users. [Quote is from the National Longitudinal Survey of Youth’s explanation of its documentation (see: http://www.nlsinfo.org/nlsy97/97guide/chap3.htm#threethree).]
Why does this matter? 1) Others will be able to independently use/understand data, 2) Data will be readable (i.e., in useable formats) in the future, 3) It makes your life less complicated once you’re finished with the data collection -- you don’t need to continually explain, reformat, revise, etc.
This isn’t rocket science, but it’s still important. I recognize that many watching this Webinar have extensive data backgrounds, so I’m going to convey the information as quickly and directly as possible.
Cuneiform tablet
EBCDIC format
Such is the dilemma! Good data curation is sustained by subscriber fees that pay for good documentation, data cleaning, rendering into accessible formats, preservation including file migration and production of ASCII files, and sustained storage and website delivery (and all of the data and tech professionals conducting that work). However, this model has been determined at this time not to be open ‘enough.’
This model is a fantastic solution to sustainability – providing the agency continues to allot funds for data curation and sharing. Access to the public is free with this model.
The agency pays organizations like ICPSR for data and tech professionals to process, preserve, and share their data. The agency sets the rules for what data it will include in its archive (sets the data selection policy) since not all data can/should be curated. The agency can also set rules for compliance to encourage researchers to deposit and share their data. One agency requires that ICPSR provide confirmation that data and documentation have been received and are in workable shape prior to releasing the final payment to the researcher. Unfortunately, this kind of compliance mechanism is rare, even though it works!
The risk to sustainability is not zero however, since budget cuts to federal funding are always of concern.
The signals coming from federal agencies regarding public access as well as international momentum are resulting in a rush to provide solutions for open (free) public access to data.
Some entities are providing ‘open source’ solutions where an organization grabs the code & the tech group builds the repository out to its desire. (They’re expected to share this code back with the community later.) It makes sense that later these entities might share a catalog – otherwise, how these data will really get discovered (required for use of the data!), is a mystery.
Other entities are providing ‘fully hosted’ solutions where an organization or individual deposits data into the cloud (servers) of the hosting organization. This is the solution we’ll concentrate on when providing tips for evaluating data hosting services.
Sensitive personal information isn’t about names, addresses, credit card numbers, or other direct identifying information. Research scientists should never, never, ever submit this type of information to any hosted service – ever. What we’re talking about is highly personal information (topics) within research data that may include past/present drug use, illegal activities, or perhaps sexual habits.
We’re currently adding about 50 new agreements each month.
Figshare: $8-$15 per month per individual though public space free; funded by Digital Science out of London (funded by MacMillan Publishers); accepts all files types from all disciplines; provides DOI; will accept data from any discipline
Dryad: $80 per data package (individual) or bulk deposits from an institution for minimum $1,750 on up; also has member fees from $1,000 - $5,000, annually; focuses on data underlying international scientific and medical literature (replication); provides DOI
DataShare: There are at least two – one out of University of Edinburgh & one out of UC San Francisco – focusing on UCSF. Currently funded by the California Digital Library & closed to UCSF; however it plans to open up to other institutions in future phases; really nice interface and great presentation of Why you should share data! http://datashare.ucsf.edu/xtf/search?smode=stepsPage
DSpaceDirect: Product of Duraspace; $3750 to $8250 annually depending on the level of storage – targeted at institutional membership to sustain funding; set up so Google can discover content – search looks localized to the institution at this time (not yet a common catalog across DSpace entities); software provides DOI; will accept from any discipline
Dataverse Network: free deposits; funded by Harvard; accepts all files; provides DOI; is opensource and has 8 other sites using DVN – likely they share a catalog
Academic Torrents: distributed data repository where the focus is to accept really large (TBs) datasets; out of UMass Boston – has about 113 datasets as large as 84 GB; no fees found; looks like a new entry attempting to offer a solution to big data!
There is significant administrative burden required for the dissemination of restricted-use data. This includes the completion and review of restricted-use contracts that include IRB approval, data protection issues, placement of the data into the VDE and monitoring of progress and results with a disclosure review of results as well as server time. This is what the access fee to the data user will cover.
openICPSR for Institutions and Journals was built to:
Fulfill an organization’s governmental grant & journal replication requirements
Brand the data-sharing service with your logo, colors, and a unique URL
Provide DOIs & data citations upon publishing
Increase exposure & reach of the organization’s research via inclusion in ICPSR’s data catalog & integration with your social media
Administer the fully-hosted (cloud) service economically without the need for costly technical staff or equipment
Share and preserve restricted-use data
Provide confidence that the data and service are safe & available for the long term
It is sometimes easier to identify what openICPSR does not accept than what it does. openICPSR is not appropriate for the natural or hard sciences (bio-medical). It is also not appropriate for huge datasets – multiple GBs of data. Our metadata experts and our catalog are focused on a very broadly defined area known as the social and behavioral sciences.
For repositories outside ICPSR’s domain, see Stanford’s list: http://library.stanford.edu/research/data-management-services/share-and-preserve-research-data/domain-specific-data-repositories
A collection of resources (links) to assist in data management plans for grant proposals
Tools to prepare plans (templates & sample plans)
Contact information for plan advice
https://dmp.cdlib.org/
Puts together the basic structure & form for your DMP. Note that it isn’t plug and go – the reasoning behind your management plans based on the discipline and/or data being collected should be added.
22 pages of guidelines and references even including a sample plan (boilerplate!) available for download.
Link to pdf document: http://www.icpsr.umich.edu/files/datamanagement/DataManagementPlans-All.pdf
Pdf link to the data prep guide: http://www.icpsr.umich.edu/files/deposit/dataprep.pdf
More information on data preparation for archiving: http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/