CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. 2011)

SEAD
Sustainable Environment – Actionable Data
CNI Fall Members Meeting
Arlington, VA
12/12/2011

Margaret Hedstrom
SEAD PI/Project Director
Professor & Associate Dean
UM School of Information

Robert H. McDonald
SEAD Sr. Personnel
Assoc. Dean/Associate Director
Indiana University

NSF DataNet Program
• new types of organizations that integrate library & archival
sciences, cyberinfrastructure, computer & information sciences, &
domain science expertise
• provide reliable digital preservation, access, integration, and
analysis capabilities for science and/or engineering data over a
decades-long timeline;
• continuously anticipate and adapt to changes in technologies and in
user needs and expectations;
• engage in research to drive the leading edge forward
• serve as component elements of an interoperable data preservation
and access network

http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503141

SEAD’s Unique Contributions
• Address domain-driven needs & requirements
• Serve scientists and researchers in the “long tail”
• Integrate existing technologies, tools & services (rather than build
new from scratch)

[Figure: Partners]

Sustainability Science
[Diagram labels: Science, Cooperation, Technology, Policy, Economics,
Poverty & Justice]

Data challenges
• Heterogeneity of
all kinds
• Multiple scales
• Multidisciplinary
• Many small
datasets

The long tail of scientific research

• Small and derived data sets
• Heterogeneous data
• Multiple sources of data
• Short-lived data with long-term
value
• Value of data grows when combined
& integrated

SEAD’s Goals
• Provide data services that address the needs of
researchers working toward sustainability
• Integrate these services into a generalizable “Active and
Social Curation” infrastructure suited to the social
structure and economics of long-tail research
communities
• Develop capabilities to package and migrate the most
valuable datasets to a federated repository
infrastructure for long-term preservation
• Education, outreach, & training to disseminate SEAD’s
contributions to other projects & communities

SEAD’s Strategy

• Leverage social media for discovery of
data, interest, and expertise
• Move data curation upstream in the data life
cycle
• Involve domain scientists in setting priorities
for evolution of data and services
• Take advantage of existing infrastructures
(Institutional Repositories, ICPSR) for long-term
preservation

Active and Social Curation
• Engage researchers during projects, not at the
end
• Automatically capture metadata as defined by
the data producers
• Provide facilities for
commentary, recommendations, and mark-up
of data
• Further reduce costs by re-engineering
curation processes to leverage this rich
metadata and volunteered effort

Active Curation Model
[Diagram labels: Active Curation; Social Media; Workflows; Data;
Metadata; Review; Rating; Commenting]

SEAD Status

• Phase 1 (Months 1-18): Develop Prototype
• Phase 2 (Years 3-5): Grow SEAD users, data, and functionality

SEAD start date: 10/1/2011

In other words, SEAD is not ready to accept your data!

SEAD Personnel
• Margaret Hedstrom, PI (Michigan)
• Praveen Kumar, co-PI (Illinois)
• Jim Myers, co-PI (RPI)
• Beth Plale, co-PI (Indiana)
• Ann Zimmerman, co-PI/Project Manager
(Michigan)
• George Alter (ICPSR)
• Bryan Beecher (ICPSR)
• Katy Börner (Indiana)
• Robert McDonald (Indiana)
• Jude Yew, Post-doc (Michigan)
• + many more to come

SEAD TEAM
University of Michigan: Margaret Hedstrom (UM PI), Ann
Zimmerman (Co-PI and Project Manager), George Alter, Bryan
Beecher, Charles Severance, Karen Woollams, Jude Yew.
Indiana University: Beth Plale (IU PI), Katy Börner, Robert H.
McDonald, Kavitha Chandrasekar, Robert Ping, Stacy
Kowalczyk, Robert Light.
University of Illinois: Praveen Kumar (UIUC PI), Rob Kooper, Luigi
Marini, Terry McLaren.
Rensselaer Polytechnic Institute: Jim Myers (RPI PI), Ram Prasanna
Govind Krishnan, Lindsay Todd, Adam Wilson.

SEAD Cyberinfrastructure
• An international resource
for sustainability science
• Novel technical and
business approaches to
supporting the long-tail
of research data
• Lifecycle support:
actionable data services
integrated with curation
and preservation
infrastructure

Key Challenges for SEAD
Cyberinfrastructure
• Managed Data storage and services are expensive!
• Begging for metadata doesn’t work!
• Curation and preservation are time consuming!
• The long-tail is not standardized!
• Data collections are always missing something
valuable!
• Data models evolve!
• Cyberinfrastructure is obsolete by the time you build
it!
• Building Community as you leverage
cyberinfrastructure

SEAD: Social Networking
• Co-authorship
• Co-funding
• Micro-citation
• Shared project repositories
• Shared tags
• Threaded discussions
• Quoting, forwarding, …

Linked Data and Repositories
• Tag and annotate data
• Overlay it with reference data
• Organize it in domain terminology
• Link it to
people, papers, projects, conversations…
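
One way to picture the tagging and linking described on this slide is as a set of subject/predicate/object links. The sketch below is illustrative only: the `LinkStore` class, the predicate names, and the identifiers are hypothetical and are not taken from SEAD's software.

```python
class LinkStore:
    """Tiny in-memory store of (subject, predicate, object) links."""

    def __init__(self):
        self._links = {}  # subject -> set of (predicate, object) pairs

    def add(self, subject, predicate, obj):
        """Record one link, e.g. a tag or a pointer to a person or paper."""
        self._links.setdefault(subject, set()).add((predicate, obj))

    def objects(self, subject, predicate):
        """Return every object linked to `subject` by `predicate`, sorted."""
        pairs = self._links.get(subject, ())
        return sorted(o for p, o in pairs if p == predicate)


store = LinkStore()
dataset = "dataset:example-stream-gauge"  # hypothetical identifier

# Tag the dataset and link it to people and papers.
store.add(dataset, "taggedWith", "hydrology")
store.add(dataset, "taggedWith", "water-quality")
store.add(dataset, "createdBy", "person:example-researcher")
store.add(dataset, "citedIn", "paper:example-paper")

print(store.objects(dataset, "taggedWith"))
```

A production system would more likely use an RDF store and shared vocabularies; the point here is only that tags, annotations, and links share one simple shape.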

Using Science of Science to Link
Repositories

KEY SEAD Questions
• What could SEAD capture, and when?
• How can SEAD provide direct value
to data producers, users, and
curators?
• How can robust web-services and
social computing lower barriers and
reduce/realign costs?

SEAD: Active Content Repository
• With the ‘Big Picture’ graph in hand, curators
can:
▫ Focus on what to curate and when,
▫ Automate parts of the process
▫ Use existing/emerging technologies for packaging
and preserving datasets
▫ Better manage federated repositories

SEAD: Leveraging Existing Resources
• Cyberinfrastructure
▫ IU Data Capacitor/HPC Capabilities
▫ UIUC/NCSA HPC Capabilities
▫ Rensselaer CCNI Capabilities
• Repositories
▫ UM Deep Blue
▫ IU ScholarWorks
▫ ICPSR Repository
▫ UIUC IDEALS

SEAD LayerCake View
• Services over an active content layer that is backed by/harvested
into a federated archive infrastructure based on institutional
resources

[Diagram layers: Network of Data Producers; Web User Interface; Active
Content Repository; Services Provided (Content Mining, Curation
Decisions, Archival generation, Other data services); Virtual Archives;
Institutional Repositories (Data Conservancy, IU, RPI, UIUC, UM, ICPSR);
User Network]

CI Technical Approach
[Architecture diagram; labels recovered from the slide]
• Two regions separated by a Curation Boundary: Active and Social
Curation; OAIS Repository Federation
• Data Acquisition, Analysis and Simulation
• Metadata Management: DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, …
• Automated Curation Workflow/Rule Engine: operates on Metadata, Content
Objects and Trigger Events
• Scholarly Communication
• Ingest scripts: ingest, fixity, integrity, authentication,
transformation
• Appraisal and Selection
• AIPs
• Compound Objects - OAI-ORE
• VIVO/Linked Data
• Active Content Repository
• Digital Repository Federation (OAIS compliant)
• Preservation Actions
• Dissemination Packages
• Wide-Area File System
• Search, Browse, Annotation, Visualization
• Migration and Emulation
• Access Mechanisms
• Use, Reuse, Repurposing
• E-Scholarship
• Contributor Tools, User Tools, Services Tools
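
The diagram lists fixity and integrity among the ingest steps. As a minimal sketch of what a fixity check usually involves, the example below records a SHA-256 digest at ingest and recomputes it later. The function names and the sample file are hypothetical; the slides do not say which checksum or tooling SEAD uses.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")  # hypothetical dataset file
    with open(path, "wb") as f:
        f.write(b"site,value\nA,1\n")

    recorded = sha256_of(path)             # digest stored at ingest time
    unchanged = fixity_ok(path, recorded)  # file untouched since ingest

    with open(path, "ab") as f:
        f.write(b"B,2\n")                  # simulate a later modification
    changed = fixity_ok(path, recorded)    # digest no longer matches

print(unchanged, changed)
```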

Toward PetaScale Data

• Internet2 upgrade:
▫ Total bandwidth from 100 Gbps to 8.8 Tbps
▫ Moving a petabyte of data will go from 10 days to 25 hrs
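
The slide does not show how the transfer times were derived. A back-of-envelope check, assuming a decimal petabyte (10^15 bytes) and a single sustained stream, suggests they correspond to effective rates of roughly 9 Gbps and 89 Gbps. That is closer to one 10 Gbps path versus one 100 Gbps path than to the aggregate backbone figures above. This reading is an interpretation and is not stated in the slides.

```python
PETABYTE_BITS = 8 * 10**15  # assumption: 1 PB = 10^15 bytes


def transfer_seconds(size_bits, rate_bits_per_s):
    """Ideal transfer time at a sustained rate, ignoring protocol overhead."""
    return size_bits / rate_bits_per_s


def effective_rate_gbps(size_bits, seconds):
    """Sustained rate (in Gbps) implied by moving size_bits in `seconds`."""
    return size_bits / seconds / 1e9


ten_days = 10 * 24 * 3600
twenty_five_hours = 25 * 3600

# Rates implied by the slide's "10 days" and "25 hrs" figures.
rate_before = effective_rate_gbps(PETABYTE_BITS, ten_days)
rate_after = effective_rate_gbps(PETABYTE_BITS, twenty_five_hours)

# Ideal time for one petabyte over a fully used 100 Gbps path.
ideal_hours_at_100g = transfer_seconds(PETABYTE_BITS, 100e9) / 3600

print(f"{rate_before:.2f} Gbps, {rate_after:.1f} Gbps, {ideal_hours_at_100g:.1f} h")
```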

SEAD 18 Month Prototype Targets for
Cyberinfrastructure
• Active and Social Content Curation
▫ Pilot Active Content Repository, VIVO deployments
▫ Exemplar services for Data Ingest, Discovery, Re-use,
Curation
• CI for Long-term Access
▫ Data model, protocol design/development
▫ Pilot Federated Repository infrastructure

SEAD CI QuickView
• SEAD will quickly build a repository and data services infrastructure
for sustainability research that can be responsively adapted based on
community feedback – Community Agile Development
• SEAD will leverage existing tools and emerging practices to
dramatically enhance the interactions of researchers and data
librarians – Active Curation
• SEAD’s focus on the long-tail will force an emphasis on ease-of-use
and low costs that is critical for long-term sustainability – Leverage
Existing Institution Resources for Long-term Access
• SEAD will leverage experiences in the sustainability research
community to provide guidance for other long-tail communities
making the transition to an interdisciplinary, systems-oriented
approach to research – Sustainability and Resource Growth
Partnership and Collaboration

Acknowledgments
SEAD is funded by the National Science
Foundation under cooperative agreement
#OCI0940824

• For more on SEAD go to:
• http://sead-data.net

• Follow us on Twitter
@SEADdatanet

