“Hot Topics: The DuraSpace Community Webinar Series,” Series Six: “Research Data in Repositories.” Curated by David Minor, Research Data Curation Program, UC San Diego Library. Webinar 2: “Metadata and Repository Services for Research Data Curation”
Presented by Declan Fleming, Chief Technology Strategist; Arwen Hutt, Metadata Librarian; and Matt Critchlow, Manager of Development and Web Services, UC San Diego Library.
10-15-13 “Metadata and Repository Services for Research Data Curation” Presentation Slides
1. Hot Topics: The DuraSpace
Community Webinar Series
Series Six:
“Research Data in Repositories”
Curated by David Minor
October 15, 2013
Hot Topics: DuraSpace Community Webinar Series
2. Webinar 2: Metadata & Repository
Services for Research Data Curation
Presented by:
Declan Fleming, Chief Technology Strategist, UC San Diego
Library
Matt Critchlow, Manager of Development and Web Services,
UC San Diego Library
Arwen Hutt, Metadata Librarian, UC San Diego Library
October 15, 2013
3. Hot Topics Web Seminar Series: Research
Data in Repositories
The UC San Diego Experience
Second Webinar: Metadata and Repository Services
for Research Data Curation
4. General Series Intro
• First webinar: Intro and Framing: UC San Diego
decisions and planning
• Second Webinar: Deep dive into technology and
metadata
• Third Webinar: The perspective from researchers,
next steps
5. Your esteemed presenters …
First webinar:
David Minor – Program Director, Research Data Curation
Declan Fleming - Chief Technology Strategist
Second webinar:
Declan Fleming - Chief Technology Strategist
Arwen Hutt - Metadata Librarian
Matt Critchlow - Manager of Development and Web Services
Third webinar:
Dick Norris – Professor, Scripps Institution of Oceanography
Rick Wagner – Data Scientist at San Diego Supercomputer Center
6. Today we will …
• Discuss real-world researcher interaction
• Document how metadata and files combine to make
digital objects
• Describe the DAMS data model and how it supports
complex research objects
• Detail the technology driving the DAMS
• Point to the future
7. Working with Researchers: Pilots
• The Brain Observatory
• NSF OpenTopography Facility
• Levantine Archaeology Laboratory
• Scripps Institution of Oceanography
Geological Collections
• The Laboratory for Computational
Astrophysics
8. Working with Researchers: Process
• Introductory meeting
• Metadata point person
• Ongoing discussions
• One on one work
Iterative, collaborative, customized, experimental…pilot!
10. Working with Researchers: What is an object?
• What are the boundaries on a discrete set or
subset of data? What is required to make the
data intelligible, usable and reusable?
• What needs to be preserved?
• What do they want to display and/or share?
• What do they want to be able to refer to or
cite?
12. Working with Researchers: Take Aways
They are the subject experts
There are a lot of broad-level similarities
But no such thing as one size fits all
13. We want a new data model…
• One that is flexible and accommodates disparate
metadata from a variety of sources
• While promoting consistency within the data store
• One that supports relationships within and between
objects
• One that is more community engaged, both sharing
vocabularies and technology, and utilizing others'
shared vocabularies and technologies
• One that supports improved management of objects
and metadata
14. DAMS Data Model Development Process
• Five people, in a room, 16 hours a week for 4
months
• Worked through existing data, use case scenarios,
known data requirements, investigated known
ontologies, etc.
• Lots and lots and lots of discussion
• Utilizes MADS (Metadata Authority Description
Schema)
• Results = a data dictionary and an OWL ontology
• Living document
15. DAMS Data Model: Flexibility
• The data model provides enough flexibility
that we can accommodate a wide variety of
data within the schema
– Vocabularies
– Use of “types” or “display labels” to distinguish
specific subtypes of a data field
– Flexible structures and relationships
– Extensible
16. DAMS Data Model: Consistency
• But enough consistency that searching and
display rules do not need to be customized for
each individual collection of material
– Rules can be applied at the level of the broader
concept
• As well as establishing the organizational
structure necessary for maintaining
consistency over time
– Evaluation and approval of modifications
17. DAMS Data Model: Relationships
• It allows us to create a number
of different relationships
– Collections and sub-collections
– Collections and objects
– Objects and components
(complex hierarchical objects)
– Other related resources internal
or external to the DAMS
[Figure: complex object example]
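The relationship types on this slide can be sketched as a small data structure. This is an illustrative sketch only: the class and field names (`Collection`, `DamsObject`, `Component`, `related`) and the sample data are hypothetical and are not taken from the DAMS ontology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A part of a complex object; components may nest (hypothetical name)."""
    title: str
    files: List[str] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)


@dataclass
class DamsObject:
    """A digital object: metadata plus files and/or components (hypothetical name)."""
    title: str
    files: List[str] = field(default_factory=list)
    components: List[Component] = field(default_factory=list)
    # URIs of related resources, internal or external to the repository
    related: List[str] = field(default_factory=list)


@dataclass
class Collection:
    """A collection holds objects and may hold sub-collections."""
    title: str
    objects: List[DamsObject] = field(default_factory=list)
    subcollections: List["Collection"] = field(default_factory=list)


def count_files(node) -> int:
    """Count files in an object or component, including nested components."""
    return len(node.files) + sum(count_files(c) for c in node.components)


# Hypothetical example of a complex hierarchical object.
core = DamsObject(
    title="Sediment core",
    components=[
        Component(
            "Section 1",
            files=["s1.tif"],
            components=[Component("Sample A", files=["a.csv", "a.jpg"])],
        ),
        Component("Section 2", files=["s2.tif"]),
    ],
    related=["http://example.org/cruise/1"],
)
geo = Collection("Geological Collections",
                 subcollections=[Collection("Cores", objects=[core])])
```

Nesting components inside components is what lets one object represent, for example, a core, its sections, and the samples taken from each section.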
18. DAMS Data Model: Vocabularies
• Allow management of local & community
vocabularies
– Vocabulary terms as entities
– Ability to encode authority data (vocabulary
source, value uri, etc.) as well as sameAs
relationships between the same term expressed in
multiple sources
– Ability to update authority records as community
vocabularies become more formalized.
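The idea of vocabulary terms as entities with authority data and sameAs links can be sketched as follows. This is a minimal illustration, not the DAMS implementation: `Term`, `Registry`, the field names, and the example.org URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Term:
    """A vocabulary term stored as its own entity (hypothetical structure)."""
    uri: str                          # local identifier that objects point to
    label: str
    source: str                       # vocabulary source, e.g. "local"
    value_uri: Optional[str] = None   # URI of the term in its source vocabulary
    same_as: Set[str] = field(default_factory=set)


class Registry:
    """Holds term entities and manages their authority data."""

    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        self.terms[term.uri] = term

    def link_same_as(self, uri_a: str, uri_b: str) -> None:
        """Record that two terms from different sources mean the same thing."""
        self.terms[uri_a].same_as.add(uri_b)
        self.terms[uri_b].same_as.add(uri_a)

    def update_authority(self, uri: str, source: str, value_uri: str) -> None:
        """Upgrade a term's authority data in place; its local URI is unchanged,
        so objects that reference the term need no edits."""
        term = self.terms[uri]
        term.source = source
        term.value_uri = value_uri


reg = Registry()
reg.add(Term("http://example.org/term/1", "Foraminifera", source="local"))
reg.add(Term("http://example.org/term/2", "Foraminifera", source="community",
             value_uri="http://example.org/community/foraminifera"))
reg.link_same_as("http://example.org/term/1", "http://example.org/term/2")

# An object references the term entity rather than a copied string.
object_subjects = ["http://example.org/term/1"]
reg.update_authority("http://example.org/term/1", "community",
                     "http://example.org/community/foraminifera")
```

Because objects hold a reference to the term entity, an authority update touches one record instead of every object that uses the term.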
19. DAMS Data Model: Management
• One that supports improved management of
objects and metadata
– Authority management of vocabulary terms
– Event metadata!
21. Preservation: Chronopolis
Current DAMS Process
1. Create BagIt bags for all objects
2. Host via HTTP(S)
3. Bags are retrieved and ingested into Chronopolis
DAMS4 Process
1. Create BagIt bags for Δ objects using Event metadata
2. Host via HTTP(S) or enqueue on messaging queue for
ingestion
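Selecting the Δ (changed) objects from event metadata could look like the sketch below. It assumes each event records an object identifier and a timestamp; the `Event` structure, its field names, and the sample data are hypothetical, and bag creation itself is left out.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """One event metadata record (hypothetical shape)."""
    object_id: str
    kind: str        # e.g. "created" or "modified"
    when: datetime


def changed_since(events: Iterable[Event], last_run: datetime) -> List[str]:
    """Return sorted IDs of objects with at least one event after last_run."""
    return sorted({e.object_id for e in events if e.when > last_run})


events = [
    Event("obj-1", "created", datetime(2013, 9, 1)),
    Event("obj-2", "created", datetime(2013, 10, 2)),
    Event("obj-1", "modified", datetime(2013, 10, 5)),
    Event("obj-3", "created", datetime(2013, 8, 15)),
]
last_preservation_run = datetime(2013, 10, 1)

# Only these objects would need new bags; obj-3 is unchanged since the last run.
to_bag = changed_since(events, last_preservation_run)
```

Compared with bagging every object on each run, this keeps the work proportional to what actually changed.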
23. Storage: EMC Isilon 72NL
Storage For Library Collections
1 cluster of 5 Nodes
1 Node = 36 x 2TB Drives
Total Current Usable Storage of 320TB
OneFS 7.0.2.1
24. Storage: OpenStack
Storage For Research Data Collections
Testing:
• Performance versus Local Storage
• Large Files (up to 1TB)
– Segmenting files > 5GB
– Lexical order bug fix: 1,10,2 -> 0001,0002,…0010
• Rackspace CloudFiles API vs. OpenStack REST API
Testing Notes:
https://libraries.ucsd.edu/blogs/dams/openstack-testing-notes/
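The lexical order fix can be illustrated with a short sketch. It assumes segments are reassembled in name order, which is what the slide's fix implies; the function name, the 5 GB segment size (taken from the slide's threshold), and the four-digit width are illustrative choices, not the DAMS code.

```python
from typing import List

GIB = 1024 ** 3


def segment_names(prefix: str, file_size: int,
                  segment_size: int = 5 * GIB, width: int = 4) -> List[str]:
    """Name the segments of a large file with zero-padded sequence numbers."""
    count = max(1, -(-file_size // segment_size))  # ceiling division
    return [f"{prefix}/{i:0{width}d}" for i in range(1, count + 1)]


# Unpadded numbers sort incorrectly as strings: "10" lands before "2".
unpadded = sorted(str(i) for i in (1, 2, 10))

# Zero-padded names sort lexically in the same order as numerically.
names = segment_names("bigfile.dat", 1024 ** 4)  # a 1 TiB file
```

With padding, `sorted(names) == names`, so a store that concatenates segments by name returns the bytes in the original order.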
43. Next Steps
Beta Release: Late October
Production Release: January
Future:
• Sufia/Curate Integration for administrative functionality
• Additional Linked Data Integration and Crosswalks
– Schema.org, OpenURL, Dublin Core, ResourceSync
• Fedora4
44. More Information
DAMS Overview
https://github.com/ucsdlib/dams/wiki/DAMS-Manual
DAMS Hydra Head
https://github.com/ucsdlib/damspas
DAMS Ontology
https://github.com/ucsdlib/dams/tree/master/ontology
DAMS REST API
https://github.com/ucsdlib/dams/wiki/REST-API
Hot Topics Series 3: Get a Head on the Repository with Hydra
http://duraspace.org/hot-topics
Hydra Technical Overview
https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts
OneFS Technical Overview
http://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf
Isilon Overview
http://www.emc.com/collateral/software/data-sheet/h10541-ds-isilon-platform.pdf
45. Coming Up Next
Final Webinar (October 31)
The researcher perspective from two of our pilot
participants
Dick Norris – Professor, Scripps Institution of
Oceanography
Rick Wagner – Data Scientist at San Diego
Supercomputer Center