SEAD Prototype: Data Curation and Preservation for Sustainability Science

The SEAD Prototype: Data Curation and Preservation for Sustainability Science
Beth Plale, Robert H. McDonald, Kavitha Chandrasekar, Inna Kouper, Indiana University, {plale, rhmcdona, kavchand, inkouper}@indiana.edu
Margaret Hedstrom, James Myers, University of Michigan, {hedstrom, myersjd}@umich.edu
Praveen Kumar, Rob Kooper, Luigi Marini, University of Illinois at Urbana-Champaign, {kumar1, kooper, lmarini}@illinois.edu

SEAD Vision and Rationale







Serve interdisciplinary and data-driven
research in sustainability science
Enable access to publications, data and
people
Support new types of analyses with
heterogeneous data
Reduce overall cost of data curation
and preservation
Capture metadata to provide
immediate value for users, producers
and repositories
Increase capabilities for research data
re-use

Active Curation, Actionable
Data (ACR)

Community Exploration,
Research Analytics (VIVO)

SEAD Use Cases (focusing on curation)






Active
Content
Repository

APIs – Web Services

Role-based
Access Control

Data/Metadata
Management

People / Projects / Publications
Data Citations
Organizations
Visualized Networks and Community
Dynamics

People
Projects
Publications
Organizations
Data Citations
Visualizations

 Branded Public Access
 Active Project Spaces
 Individual Data Pages
Data pages
Collection pages
Tag – Search – Map
Project Summary
Geo-Web App
Branded Repository
Android – Desktop
Apps

 Ingestion of heterogeneous data types
(e.g., images, geo-spatial data, and sensor
data) and mapping of semantic
relationships among the research data
collections as well as semantic annotation
and tagging.
 Support of data discovery through
interoperable standards and algorithms,
social networking and data publishing.
 Enhancements of existing data through
automated scientific metadata extraction
and data visualization plugins.
 Ingestion of new data sets directly via
workbench tools.
 Curation of data via federated deposit into
institutional and disciplinary repositories.

Data Publication,
Preservation and Discovery (VA)

 Policy-Driven Curation
 Institutional / Cloud / Grid Storage
 Faceted Search
Curator’s Workbench
Ingest Processing
Matchmaking
Faceted Search
Geo-spatial Search



SEAD Prototype

VIVO

APIs – Web Services

APIs – Joseki – Web Services
Extractors and
Indexing

User Management
RDF –Tupelo 2 – Medici – Lucene – Geoserver

SPRAQL /
HTTP

User / Entity Management –
Analytics
Jena – RDF
MySQL – Local File System

Virtual Archive

SPRAQL / HTTP
BAGIT

Metadata Extraction
Persistent IDs
Indexing
Archiving

Solr Query
(XML)
Geospatial
Query

BagIt
Matchmaker
Conversion

DataONE
Member Node

MySQL – Local File System – Solr – PostGIS

MySQL – Local File System

Acknowledgements
SEAD is funded by the National Science Foundation under Cooperative Agreement #OCI0940824.
SEAD gratefully acknowledges all of our partner participants who have been involved in developing our services
framework. This includes the research teams from the following organizations: School of Information, University
of Michigan; Department of Civil and Environmental Engineering, the National Center for Supercomputing
Applications (NCSA) and UIUC Libraries, University of Illinois at Urbana-Champaign; Data to Insight Center, IU
Libraries and School of Informatics and Computing, Indiana University; the Interuniversity Consortium for Political
and Social Research (ICPSR); the National Center for Earth-Surface Dynamics (NCED) and the Data Conservancy
Project, John Hopkins University.

Currently, SEAD has implemented core functionality for uploading, annotating, and viewing data, linking data to researcher profiles, and mechanisms to package this information
and transfer it to institutional repositories or archival cloud storage. The curation pipeline to institutional repositories supports both long-term preservation and search and
discovery workflows. The SEAD prototype is currently being tested by ingesting, annotating, and preserving datasets from the National Center for Earth Surface Dynamics (1.6
terabytes of data containing over 450,000 files) which involves transfer of data and metadata between SEAD ACR, VIVO and VA components.

SEAD Prototype: Data Curation and Preservation for Sustainability Science

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a SEAD Prototype: Data Curation and Preservation for Sustainability Science

Similar a SEAD Prototype: Data Curation and Preservation for Sustainability Science (20)

Más de SEAD

Más de SEAD (16)

Último

Último (20)

SEAD Prototype: Data Curation and Preservation for Sustainability Science