Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)
1. GETTING THE MOST OUT OF DATANET: A
PANEL DISCUSSION OF THE NSF FUNDED
DATANET PARTNERSHIPS
Robert H. McDonald – SEAD – Indiana University
Catherine Fitch – TerraPop – Minnesota Population Center
Richard Marciano – Datanet Federation Consortium – University of
North Carolina
Sayeed Choudhury – Data Conservancy – Johns Hopkins University
William Michener – DataOne – University of New Mexico
NSF DATANET PROGRAM-
OFFICE OF CYBERINFRASTRUCTURE
3. NSF DATANET PROGRAM
• DataNet efforts effectively balance:
• Production infrastructure for operational data
curation services
• Research to create next-generation data
cyberinfrastructure
• DataNet awards are partnerships:
• Responsive to user communities to define their
meaningful and useful scope
• Form a coordinated network to provide national,
interdisciplinary data models and infrastructure
5. SEAD TEAM
University of Michigan: Margaret Hedstrom (UM PI), Ann
Zimmerman (Co-PI and Project Manager), George Alter, Bryan
Beecher, Charles Severance, Karen Woollams, Jude Yew. Indiana
University: Beth Plale (IU PI), Katy Börner, Robert H.
McDonald, Kavitha Chandrasekar, Robert Ping, Stacy
Kowalczyk, Robert Light. University of Illinois: Praveen Kumar
(UIUC PI), Rob Kooper, Luigi Marini, Terry McLaren. Rensselaer
Polytechnic Institute: Jim Myers (RPI PI), Ram Prasanna Govind
Krishnan, Lindsay Todd, Adam Wilson.
#OCI0940824
6. SEAD PARTNERSHIP
Beth Plale
Margaret Hedstrom, PI
Katy Börner
Ann Zimmerman
Robert H. McDonald
Praveen Kumar
James Myers
George Alter & Bryan Beecher
10. SEAD’S GOALS
Provide data services that address the pressing needs of
researchers working toward sustainability
Integrate these services into a generalizable “Active and
Social Curation” infrastructure well-suited to the social
structure and economics of long-tail research
communities
Develop capabilities to package and migrate datasets to
a federated repository infrastructure for long-term
preservation
Education, outreach, & training, to maximize value and
disseminate SEAD’s contributions to other projects and
communities
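To make the third goal concrete, packaging a dataset for migration to a repository could look roughly like the sketch below. It is an illustration only, not SEAD's implementation: the function names `build_manifest` and `package_dataset`, the `manifest.json` file name, and the `data/` folder layout are all assumptions made for this example. The idea is simply to ship a checksum for every file alongside the data so the receiving archive can verify what it was given.

```python
import hashlib
import io
import json
import os
import tarfile


def build_manifest(source_dir):
    """Map each file's path (relative to source_dir) to its SHA-256 checksum."""
    manifest = {}
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
            with open(full_path, "rb") as handle:
                manifest[rel_path] = hashlib.sha256(handle.read()).hexdigest()
    return manifest


def package_dataset(source_dir, archive_path):
    """Write a .tar.gz holding the dataset under data/ plus a manifest.json.

    Hypothetical layout chosen for this sketch. archive_path should live
    outside source_dir so the archive does not try to include itself.
    """
    manifest = build_manifest(source_dir)
    payload = json.dumps(manifest, indent=2, sort_keys=True).encode("utf-8")
    with tarfile.open(archive_path, "w:gz") as archive:
        # Add the whole dataset directory under a fixed top-level folder.
        archive.add(source_dir, arcname="data")
        # Add the manifest from memory, without writing a temporary file.
        info = tarfile.TarInfo(name="manifest.json")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return manifest
```

A receiving repository could recompute the checksums after transfer and compare them with `manifest.json` before accepting the package.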
11. SEAD’S STRATEGY
Move data curation upstream in the
data life cycle
• Involve domain scientists in setting
priorities for evolution of data and
services
• Use a wide variety of mechanisms to
remain resilient in a dynamic research
and technology environment
12. ACTIVE AND SOCIAL
CURATION
• Engage researchers during projects, not at the
end
• Use information that is automatically captured
or generated through tools to reduce the costs
of metadata collection and to capture its value
in actionable form
• Further reduce costs by re-engineering curation
processes to leverage this rich metadata and
volunteered effort
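As a small illustration of automatically captured metadata, the sketch below derives a few descriptive fields directly from a file, so the researcher does not have to type them in. It is a minimal example under stated assumptions: the function name `capture_file_metadata` and the field names are hypothetical and are not taken from any SEAD tool.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def capture_file_metadata(path):
    """Collect metadata that can be derived from the file itself."""
    stat = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    media_type, _encoding = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "sha256": digest.hexdigest(),
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        # Fall back to a generic type when the extension is not recognized.
        "media_type": media_type or "application/octet-stream",
    }
```

A record like this could be created whenever a file is uploaded during an active project, leaving only the fields that require human judgment for researchers or curators to fill in.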
13. ACTIVE CURATION MODEL
[Figure: two groups of elements. Active Curation: Workflows, Data, Metadata. Social Media: Review, Rating, Commenting.]
14. SEAD LAYERCAKE VIEW
[Figure: layered architecture diagram. Caption: Services over an active content layer that is backed by/harvested into a federated archive infrastructure based on institutional resources.]
Figure labels: Network of Data Producers; User Network; Web User Interface; Active Content Repository; Services Provided (content, curation, archival, data mining, decisions, generation, other services); Virtual Archives; Institutional Repositories; Data Conservancy, IU, RPI, UIUC, UM, ICPSR.
Currently, these data are difficult to find, obtain, and use because people from disciplines across the natural and social sciences collect, describe, and store their data in many different ways. These data could have significant value if it were possible to connect data collectors with potential users of data and if it were easy for individuals to search for, aggregate, and maintain valuable data for the long term.
To expand a bit on the previous slide … We characterize the needs of sustainability scientists as a “long tail problem”: scientists need diverse data from multiple sources that overlap in geographic coverage and time but also have gaps in location, time, resolution, and types of measurements. The data are heterogeneous and vary in format, metadata, size, and quality. One of the biggest challenges we face is supporting diverse needs for heterogeneous data.
Our strategies for coping effectively with the diversity of data are based on several underlying principles for long tail phenomena: While the aggregate demand for SEAD’s service is large and growing, demand for any particular collection of data is small and focused. Therefore, the investment SEAD makes in any particular set of data has to be quite low. Deciding which data merit investment in curation should be driven by their value to the community and their potential for productive use. Building on a strong foundation of existing infrastructure, collaborative relationships, and expertise, the SEAD team will be able to tackle challenging problems in the long tail with innovative, forward-looking, and outward-facing approaches.
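One way to picture the principle that curation investment should follow community value is to rank data sets by the ratings they receive. The sketch below is a hypothetical illustration, not a SEAD design: it assumes 1 to 5 star ratings and uses a Bayesian average, a standard technique that pulls sparsely rated items toward a neutral prior so that a single enthusiastic rating does not outrank many consistently good ones. The prior mean of 3.0 and prior weight of 5 are arbitrary choices for the example.

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=5):
    """Average the ratings as if prior_weight extra votes of prior_mean were cast."""
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


def rank_for_curation(ratings_by_dataset):
    """Return dataset names ordered from highest to lowest adjusted rating.

    ratings_by_dataset maps a dataset name to a list of 1-5 star ratings.
    """
    return sorted(
        ratings_by_dataset,
        key=lambda name: bayesian_average(ratings_by_dataset[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical data set names and ratings, for illustration only.
    signals = {
        "stream_gauges": [4] * 20,   # many good ratings
        "soil_cores": [5],           # one excellent rating
        "field_notes": [],           # not yet rated
    }
    print(rank_for_curation(signals))
```

In practice a score like this would be only one input; comments, citations, and curator judgment would also inform which data merit further investment.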
Mention something about the 18-month prototype and that the tasks during this time-frame focus on the first 3 bullets.
Additional text for bullet 1: Provide tools and services that benefit data providers during active projects. Provide tools and services that allow data users to collaboratively curate data.
We will build usable and useful tools that scientists can take advantage of as they collect, generate, and organize data in their active projects. This Active Curation approach will be designed with a great deal of user input to make sure that the tools are lightweight, easy to learn, easy to use, and more effective than the painstaking, hand-crafted approach that many sustainability scientists use today. The Active Curation approach will make data management easier for data producers and lower the curation costs to SEAD.
Another part of our strategy is to deploy a variety of social networking and social-media-inspired tools to engage the community of data producers and users. These include tools for annotation, rating, and commentary on data sets; visualizations of publication and citation networks that map the invisible college of sustainability science researchers; and social networking tools that help build network effects. We have designed our program with multiple mechanisms to encourage participation in SEAD and adoption of its approach. These include holding domain engagement workshops to surface needs and requirements and to ensure usability of tools, and enlisting key leaders in sustainability as early adopters and promoters of SEAD. These strategies, along with support for centralized curation services, education, outreach, and training, will create a model for sustainable access to and preservation of heterogeneous data for sustainability science and other small science disciplines in the long tail.
Robert, I wanted to illustrate the long-term repository piece, but couldn’t find anything very good from previous slides. I put this in for now, but you may have something better.