Presentation to the UM Library Emergent Research Series

SEAD: Sustainable Environment
through Actionable Data
Margaret Hedstrom
Professor of Information
Faculty Associate, Institute for Social Research (ICPSR)
PI, SEAD
June 23, 2014

Overview
• What is SEAD?
• Vision and Rationale
• Target Audience and User Communities
• Current Status
• SEAD, Universities, and Libraries
• Some Lessons Learned (so far)
• Plans and Future Engagement

What is SEAD?
• A Cooperative Agreement funded by NSF to
develop sustainable cyberinfrastructure for
preservation and access to scientific data ($8
million/5 years)
• A partnership between the universities of
Michigan, Indiana and Illinois
• An emerging set of services for data management,
sharing, curation, discovery and preservation for
researchers in the “long tail”
• A case study of data needs in sustainability
science

SEAD Vision and Rationale
• Small teams, researchers with short-term
projects, and individual scientists (the long tail)
are underserved by today’s data preservation
and access infrastructure
• These communities will take advantage of
evolving data preservation and access
infrastructure if:
– it supports science objectives and enables new kinds
of science
– it is easy to use
– collaborators and peers are also using it
• Sustainability science is a good test case

Data Preservation and Access Today
[Diagram of the current data life cycle. Box labels:]
• Researcher(s) Create and Analyze Data
• Researchers Publish Results
• Libraries Acquire Publications
• Researchers Search for Publications
• Researchers Deposit Data
• Repositories Curate Data
• Researchers Search for Data
• Researchers Integrate, Create New Data, and Analyze Data
• “?” (one step in the diagram is marked only with a question mark)

SEAD Vision
[Diagram. Labels:]
• Research Question
• Activities: Search for People, Publications, Data; Combine, Integrate, Analyze; Share; Improve; Curate Data; Upload/Download Data
• Environments: Collaboration Environment; Discovery and Access Environments; Preservation Environments
• SEAD components: SEAD ACR; SEAD Virtual Archive; SEAD Social Network

Target Audience / User Communities
Sustainability Scientists
• Focus on problems that require data, methods,
tools, and expertise from multiple disciplines
• Require many different types of data about physical,
natural, and social phenomena in order to understand
interactions between natural and human systems
• Use a combination of observational (field) data,
experimental data, simulations, and models
• Conduct research in small to medium-sized labs or
centers under the direction of a single PI or a Center
Director

Target Audience / User Communities
the “Long Tail” of Scientific Research
• Data discovery is via targeted foraging and word-of-mouth
• Almost all data are stored locally
• Minimal local IT support
• Metadata standards and ontologies, where they do exist, are based
on disciplinary norms or local practices
• Data formats and metadata standards are often controlled by
multiple independent third parties (e.g., instrument and application
providers)
• Data are vulnerable to interruptions in organizational arrangements
(graduate students finish PhDs and move on; lab or center funding
sunsets)
• No single data set is likely to have great value standing alone, but
when aggregated, combined, and integrated, data become valuable
resources for discovery and innovation

Overview
Project Start 10/01/11
User Requirements Report 5/12
NCED Repository Ingest 8/12
Prototype Review 4/22/13
SEAD 1.0 Released 10/13
DataONE Member Node 11/13
End User Workshop 4/11/14
10th User Group 5/11/14
36-Month Review 10/14/14
Renewal (?) for Years 6-10

Summary of Current Status
• Working Platform
– SEAD Active Content Repository (ACR)
• Collaboration / File Sharing Space for Research Projects
• Staging Area for Data Prior to Publishing or Archiving
– SEAD Virtual Archive
• Capability to push data from ACR and/or local research
environments to preservation and discovery services
(Institutional Repositories/DataONE)
– SEAD Research Network
• Researcher initiated profiles with harvesting of citations,
linkage of data-people-publications, reporting

SEAD, Universities, and Libraries
• From the researcher’s perspective
– SEAD is a project work space that enables data
sharing, commenting, secure storage, extraction
of metadata, and active/social curation
• From the university research infrastructure
perspective
– SEAD is a staging area for data curation prior to
publication, submission, and preservation

Data Set Publishing Workflow
Stages (in order):
• NCED Data Set Ingested to ACR
• NCED Data Set Ingested to VA
• NCED Data Set Deposited with IR
• NCED Data Set Published to VIVO
Notes attached to the stages (in slide order):
• Data content used within ACR
• Researcher Profile Established in VIVO
• Data Set ready to publish
• DataCite minted DOI attached to finalized Data Set
• DOI Resolution to designated IR

Data Citation Example - Person

Data Citation Example - Dataset
Fields shown: DOI, Authors, Subject areas, Abstract, Geographic focus, Rights information
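
The dataset fields listed on the citation slide could be modeled as a small record from which a citation string is built. The following is a minimal sketch, not SEAD's actual data model: the class and function names are hypothetical, the `title`, `year`, and `publisher` fields are assumptions added so that a citation can be formed, and the "Creator (Year): Title. Publisher. Identifier" layout is one common data citation pattern rather than a format the slides specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical record holding the fields named on the slide."""
    doi: str
    authors: List[str]
    title: str       # assumption: not listed on the slide
    year: int        # assumption: not listed on the slide
    publisher: str   # assumption: not listed on the slide
    subject_areas: List[str] = field(default_factory=list)
    abstract: str = ""
    geographic_focus: str = ""
    rights: str = ""


def format_citation(rec: DatasetRecord) -> str:
    """Build a 'Creator (Year): Title. Publisher. Identifier' citation."""
    authors = "; ".join(rec.authors)
    return (
        f"{authors} ({rec.year}): {rec.title}. "
        f"{rec.publisher}. https://doi.org/{rec.doi}"
    )


if __name__ == "__main__":
    example = DatasetRecord(
        doi="10.5072/example",  # placeholder DOI, not a real data set
        authors=["Doe, J.", "Roe, R."],
        title="Example Flood Survey",
        year=2014,
        publisher="Example Repository",
        geographic_focus="Lower Mississippi",
    )
    print(format_citation(example))
```

Keeping the DOI as a plain field and composing the resolver URL at formatting time means the record does not need to change if the DOI's target repository is updated later.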

SEAD: Explore Sustainability Research
PEOPLE ORGANIZATIONS
RESEARCH
(DATA + PUBLICATIONS)

SEAD Virtual Archive
• Purpose: Long-term preservation and discovery
– Thin virtualization layer on top of multiple university
Institutional Repositories (IRs)
– Enhances IRs by being sustainability science-aware
• Team: IU Libraries, UIUC Libraries, and Data To
Insight Center at IU
• Starting point: Data Conservancy code (Johns
Hopkins U.)
– Extended for sustainability science long tail use cases

Making Data Sustainable: Use Case
[Diagram labels: Active Curation Repository (ACR); packaged object; SEAD Virtual Archive; IUScholarWorks; UIUC IDEALS. Annotations: preserve data; keep private for 5 years; index data, metadata, and relationships.]
• Collected data about Lower Mississippi flood
• Stored in Active Repository
• Organized as a collection
• Marked “Ready for publication”
• Collections visible to team only for 5 years
• Deposited to repository based on dataset creator affiliation
• Find by author, location, keywords, or repository
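
Two rules in this use case, routing a deposit by the dataset creator's affiliation and keeping a collection private for five years, can be sketched as follows. This is an illustrative sketch only: the mapping, the function names, and the affiliation strings are hypothetical, and the slides do not say how SEAD implements either rule.

```python
from datetime import date

# Hypothetical mapping from creator affiliation to institutional repository.
REPOSITORY_BY_AFFILIATION = {
    "Indiana University": "IUScholarWorks",
    "University of Illinois at Urbana-Champaign": "UIUC IDEALS",
}


def choose_repository(affiliation: str) -> str:
    """Pick the repository that matches the dataset creator's affiliation."""
    try:
        return REPOSITORY_BY_AFFILIATION[affiliation]
    except KeyError:
        raise ValueError(f"no repository registered for {affiliation!r}")


def embargo_end(deposit_date: date, years: int = 5) -> date:
    """Return the first day on which the collection may become public."""
    try:
        return deposit_date.replace(year=deposit_date.year + years)
    except ValueError:
        # Deposit on Feb 29 and the target year is not a leap year.
        return deposit_date.replace(year=deposit_date.year + years, day=28)


def is_public(deposit_date: date, today: date, years: int = 5) -> bool:
    """True once the embargo period has elapsed."""
    return today >= embargo_end(deposit_date, years)


if __name__ == "__main__":
    deposited = date(2014, 6, 23)
    print(choose_repository("Indiana University"))
    print(embargo_end(deposited))
    print(is_public(deposited, date(2016, 1, 1)))
```

Raising an error for an unknown affiliation, rather than falling back to a default repository, keeps a data set from silently landing somewhere its creator did not agree to.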

Ingest Workflow into SEAD VA
[Workflow diagram. Step labels:]
• Preview Data
• Upload Data to VA
• Run Virus Checking
• File Characterization
• Mint DOI
• Deposit to IR (& cloud)
• Update DOI target
• Index Metadata
• Index Scientific Metadata
• Large Dataset Decision
• Version Data
• IR Matchmaker
• Accept Repository Agreement
Link to live demo http://seadva.d2i.indiana.edu:8181/sead-access/#login
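
Several of the step labels in the ingest workflow can be sketched as a pipeline of functions that each update a shared context. This is a hedged illustration, not the SEAD VA code: the step order is an assumption (the diagram's arrows are not recoverable here), every function is a hypothetical stub, and the DOI and repository values are placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], None]]


def virus_check(ctx: Context) -> None:
    # Stub: a real step would call a scanner; here a flag simulates a hit.
    if ctx.get("infected"):
        raise RuntimeError("virus check failed")


def characterize_files(ctx: Context) -> None:
    # Stub characterization: record the lowercase file extensions seen.
    ctx["formats"] = sorted(
        {name.rsplit(".", 1)[-1].lower() for name in ctx["files"] if "." in name}
    )


def mint_doi(ctx: Context) -> None:
    ctx["doi"] = f"10.5072/{ctx['dataset_id']}"  # placeholder DOI


def deposit_to_ir(ctx: Context) -> None:
    ctx["location"] = f"{ctx['repository']}/{ctx['dataset_id']}"


def update_doi_target(ctx: Context) -> None:
    # Point the DOI at the location the repository assigned.
    ctx["doi_target"] = ctx["location"]


def index_metadata(ctx: Context) -> None:
    ctx["indexed"] = True


# Assumed order; the slide's diagram may order or branch these differently.
PIPELINE: List[Step] = [
    ("Run Virus Checking", virus_check),
    ("File Characterization", characterize_files),
    ("Mint DOI", mint_doi),
    ("Deposit to IR", deposit_to_ir),
    ("Update DOI target", update_doi_target),
    ("Index Metadata", index_metadata),
]


def run_ingest(ctx: Context, pipeline: List[Step] = PIPELINE) -> Context:
    """Run each step in order; any exception aborts the ingest."""
    completed = []
    for name, step in pipeline:
        step(ctx)
        completed.append(name)
    ctx["completed_steps"] = completed
    return ctx


if __name__ == "__main__":
    result = run_ingest(
        {"dataset_id": "ds1", "files": ["a.CSV", "b.txt"], "repository": "example-ir"}
    )
    print(result["doi"], result["doi_target"], result["completed_steps"])
```

Minting the DOI before deposit and updating its target afterwards mirrors the separate "Mint DOI" and "Update DOI target" labels on the slide, since the final location is only known once the repository has accepted the data.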

Successful automatic ingest into UIUC IDEALS repository

Communication with IRs
Datasets deposited into IU SDA, IU ScholarWorks, and UIUC IDEALS

Some Lessons Learned
• Some researchers and projects in the “long tail”
have sophisticated demands for active data
services
• Supporting analysis of data in SEAD adds
complexity and cost
• Users want some degree of customization of
bare-bones file storage and active project space
• A big gap remains between data producers and
the campus/library/archive infrastructures for
long-term access and preservation.

SEAD Priorities and Future Plans
• Make SEAD more stable and more usable
• Attract a larger, broader, and more diverse
user community
– Network effects in the long tail
– Self service
• Expand repository options
• Resolve Governance and Sustainability

More info
• www.sead-data.net
• Email myersjd@umich.edu for
access to the SEAD Demo site
