2. A Data Discovery Index prototype that:
• Helps users find and access shared data
• Interoperates in the NIH Commons
(biomedical digital assets)
3. Designed as an element of the ecosystem
[Diagram: Data Discovery Index, aggregators (A, B, C) and data]
• Data Discovery Index: organizing framework and portal for data
• Dashed lines: mapping of metadata standards, links to aggregators, data
• Aggregators: repositories or various indices
• Data: digital research objects
• Pilot projects*, core development team
* There is work for everyone (and more)
4. • Define a metadata specification that supports the intended capability of the DataMed prototype
• Synergies with many groups, including:
  ◦ BD2K Center for Expanded Data Annotation and Retrieval (CEDAR)
  ◦ BD2K cross-centers Metadata WG
  ◦ ELIXIR EXCELERATE WP5 Interoperability
6. The model and serializations
Created using two complementary approaches:
• Top-down: analyzing use cases
• Bottom-up: mapping existing standards/schemas
7. Bottom-up approach: mapped schemas
• schema.org
• DataCite
• RIF-CS
• W3C HCLS dataset descriptions (mapping of many models including DCAT, PROV, VoID, Dublin Core)
• Project Open Metadata (used by HealthData.gov)
• ISA
• BioProject
• BioSample
• MINiML
• PRIDE-ml
• MAGE-TAB
• GA4GH metadata schema
• SRA XML
• CDISC SDM / element of BRIDG model
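A bottom-up mapping of this kind is often recorded as a crosswalk from each source schema's field names to a shared set of elements. The sketch below is illustrative only: the target element names (`title`, `description`, `identifier`, `keywords`), the `CROSSWALK` table and the `to_common` helper are hypothetical and are not taken from the DataMed model. The source field names are a small, flattened subset of schema.org `Dataset` properties and DataCite properties.

```python
# Hypothetical crosswalk: source schema -> {source field: shared element}.
# The shared element names are invented for this sketch.
CROSSWALK = {
    "schema.org": {
        "name": "title",
        "description": "description",
        "identifier": "identifier",
        "keywords": "keywords",
    },
    "DataCite": {
        "title": "title",
        "description": "description",
        "identifier": "identifier",
        "subject": "keywords",
    },
}


def to_common(record, schema):
    """Rename the fields of `record` to the shared element names.

    Fields with no crosswalk entry are collected under "unmapped"
    so that nothing is silently dropped.
    """
    mapping = CROSSWALK[schema]
    common, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            unmapped[field] = value
    if unmapped:
        common["unmapped"] = unmapped
    return common


# Invented example record using schema.org-style field names.
example = {"name": "Example dataset", "keywords": ["proteomics"], "version": "1"}
print(to_common(example, "schema.org"))
```

Keeping unmapped fields visible makes it easy to see where a source schema carries more detail than the shared elements can express.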
8. What is next?
• Model to be implemented and tested in DataMed
  ◦ We have aimed for maximum coverage of use cases with a minimal number of data elements
  ◦ We foresee that not all questions can be answered in full
• Repositories workshop on June 23
  ◦ Hands-on experience mapping to the model
  ◦ Many databases won't have all these metadata elements; conversely, domain-specific databases have more
• Discussion ongoing to create an extension as part of bioschemas.org
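The aim of covering many use cases with few data elements, while accepting that some questions cannot be answered in full, can be checked mechanically once each use case is written as a set of required elements. The following is a minimal sketch under that assumption. The use cases, the element names and the `coverage` helper are all invented for illustration and do not come from the project.

```python
def coverage(use_cases, model_elements):
    """Classify each use case as fully, partially or not answerable.

    `use_cases` maps a use-case name to the set of elements it needs;
    `model_elements` is the set of elements the model provides.
    Returns {name: (status, sorted list of missing elements)}.
    """
    available = set(model_elements)
    report = {}
    for name, required in use_cases.items():
        required = set(required)
        missing = sorted(required - available)
        if not missing:
            status = "full"
        elif len(missing) < len(required):
            status = "partial"
        else:
            status = "none"
        report[name] = (status, missing)
    return report


# Invented use cases and model elements, for illustration only.
use_cases = {
    "datasets about a disease": {"disease", "dataset_type"},
    "reusable data on a disease": {"disease", "license"},
    "datasets by funder": {"funder"},
}
model = {"disease", "dataset_type", "organism"}

for name, (status, missing) in coverage(use_cases, model).items():
    print(f"{name}: {status}, missing={missing}")
```

The same check can be run against the elements a single repository actually exposes, which shows which questions that repository can support.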
9. Prototype, model, mappings, documentation and more at
https://biocaddie.org and https://github.com/biocaddie
Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego