SlideShare una empresa de Scribd logo
1 de 27
Descargar para leer sin conexión
Supported by the NIH grant 1U24 AI117966-01 to UCSD
PI , Co-Investigators at:
The
annotated with schema.org
Susanna-Assunta Sansone,
Alejandra Gonzalez-Beltran, Philippe Rocca-Serra
Oxford e-Research Centre, University of Oxford, UK
bioCADDIE - DATS Workshop, Bethesda, 7 May 2017
The model
What is ?
Like the JATS (Journal Article Tag Suite) is used by PubMed to index literature,
a DATS (DatA Tag Suite) is needed for a scalable way to index data sources in
the DataMed prototype
Where do I find the documentation?
Like the JATS (Journal Article Tag Suite) is used by PubMed to index literature,
a DATS (DatA Tag Suite) is needed for a scalable way to index data sources in
the DataMed prototype
Mar15
Jun15
Dec15
Jun16
Aug15
May16
Sep16
Mar17
bioCADDIE team’s work on DATS
Our community engagement: input, feedback and links
May17
Mar15
Jun15
Dec15
Jun16
Aug15
May16
Sep16
Mar17
bioCADDIE team’s work on DATS
Our community engagement: input, feedback and links
Phase 1 Phase 2 Phase 3
Design and development Continued evaluation & consolidation
May17
Evaluation & iterative refinement
Mar15
Jun15
Dec15
Jun16
Aug15
May16
Sep16
Mar17
bioCADDIE team’s work on DATS
Our community engagement: input, feedback and links
Phase 1 Phase 2 Phase 3
Design and development
• Use cases collection= George, ExcTeam
• Method development (SOP); competency
questions; metadata mapping and
definition= Susanna, Alejandra, Philippe
SOP and
metadata
strawman
<DATS>
name
May17
Metadata
specification V1.0
with JSON
schema
Use cases
workshop
WG3 formed;
telecons start;
dissemination via
WG7 formed;
telecons start
Evaluation & iterative refinement Continued evaluation & consolidation
Mar15
Jun15
Dec15
Jun16
Aug15
May16
Sep16
Mar17
bioCADDIE team’s work on DATS
Our community engagement: input, feedback and links
Phase 1 Phase 2 Phase 3
Design and development
• Use cases collection= George, ExcTeam
• Method development (SOP); competency
questions; metadata mapping and
definition= Susanna, Alejandra, Philippe
SOP and
metadata
strawman
<DATS>
name
DATS
v1.1
May17
DATS v2.0
(with access
metadata,
WG7)
Metadata
specification V1.0
with JSON
schema
Use cases
workshop
1st DATS
workshop
WG3 formed;
telecons start;
dissemination via
WG7 formed;
telecons start
Evaluation & iterative refinement
• DATS specification, serializations,
refinement= (Susanna), Alejandra,
Philippe and CoreDevTeam, also
based on community feedback
Continued evaluation & consolidation
Mar15
Jun15
Dec15
Jun16
Aug15
May16
Sep16
Mar17
bioCADDIE team’s work on DATS
Our community engagement: input, feedback and links
Phase 1 Phase 2 Phase 3
Design and development
• Use cases collection= George, ExcTeam
• Method development (SOP); competency
questions; metadata mapping and
definition= Susanna, Alejandra, Philippe
SOP and
metadata
strawman
<DATS>
name
DATS
v1.1
May17
DATS v2.0
(with access
metadata,
WG7)
DATS v2.1
(schema.org
JSON-LD)
DATS
v2.2
Metadata
specification V1.0
with JSON
schema
Use cases
workshop
1st DATS
workshop
WG3 formed;
telecons start;
dissemination via
2nd DATS
workshop
WG7 formed;
telecons start
WG12 formed;
telecons start
Evaluation & iterative refinement
• DATS specification, serializations,
refinement= (Susanna), Alejandra,
Philippe and CoreDevTeam, also
based on community feedback
Continued evaluation & consolidation
• Alignment with other community efforts;
documentation and curation guidelines =
Susanna, Alejandra, Philippe, Jared,
ExcTeam and CoreDevTeam
Mar15
Jun15
Dec15
Jun16
Aug15
May16
Sep16
Mar17
Use cases
workshop
bioCADDIE team’s work on DATS
Our community engagement: input, feedback and links
Phase 1 Phase 2 Phase 3
Design and development
• Use cases collection= George, ExcTeam
• Method development (SOP); competency
questions; metadata mapping and
definition= Susanna, Alejandra, Philippe
Evaluation & iterative refinement
• DATS specification, serializations,
refinement= (Susanna), Alejandra,
Philippe and CoreDevTeam, also
based on community feedback
Continued evaluation & consolidation
• Alignment with other community efforts;
documentation and curation guidelines =
Susanna, Alejandra, Philippe, Jared,
ExcTeam and CoreDevTeam
1st DATS
workshop
SOP and
metadata
strawman
WG3 formed;
telecons start;
dissemination via
<DATS>
name
DATS
v1.1
May17
2nd DATS
workshop
DATS v2.0
(with access
metadata,
WG7)
WG7 formed;
telecons start
DATS v2.1
(schema.org
JSON-LD)
DATS
v2.2
primarily metadata modelers
primarily implementers
Metadata
specification V1.0
with JSON
schema
WG12 formed;
telecons start
❖ Enabling discoverability: find and access datasets
❖ Focusing on surfacing key metadata descriptors, such as
✧ information and relations between authors, datasets, publication,
funding sources, nature of biological signal and perturbation etc.
✧ Not the perfect model to represent the experimental details
✧ the level of details and metadata needed to ensure interoperability
and reusability are left to the indexed databases
❖ Better than just having keywords
✧ we have aimed to have maximum coverage of use cases with
minimal number of data elements and relations
What is supposed to do and be?
Metadata elements identified by combining the two complementary approaches
USE CASES: top-down approach SCHEMAS: bottom-up approach
The development process in a nutshell
(v1.0, v1.1, v2.0, v2.1, v2.2)
Extracting requirements from use cases
❖ Selected competency questions
✧ representative set collected from: use cases workshop, white paper, submitted by
the community and from NIH and Phil Bourne’s ADDS office
✧ key metadata elements processed: abstracted, color-coded and terms binned
binned as Material, Process, Information, Properties; relation identified
top-down approach
bottom-up approach
Standing on the shoulders of giants
❖ schema.org
❖ DataCite
❖ RIF-CS
❖ W3C HCLS dataset descriptions (mapping of many models including DCAT, PROV, VOID, Dublin
Core)
❖ Project Open Metadata (used by HealthData.gov is being added in this new iteration)
❖ ……(full list in the DATS specification)
❖ ISA
❖ BioProject
❖ BioSample
❖ ……(full list in the DATS specification)
❖ MiNIML
❖ PRIDE-ml
❖ MAGE-tab
❖ GA4GH metadata schema
❖ SRA xml
❖ CDISC SDM / element of BRIDGE model
❖ ……(full list in the DATS specification)
Convergence
of elements
extracted from
competency
questions
and existing
(generic and
biomedical)
data models
(incl. DataCite,
DCAT, schema.org,
HCLS dataset, RIF-
CS, ISA-Tab, SRA-
xml etc.)
model for scalable indexing
Adoption
of elements extracted
from
and from
core entities
extended entities
❖ The descriptors for each metadata element (Entity), include
✧ Property (describing the Entity), Definition (of each Entity and Property),
Value(s) (allowed for each Property)
Key features of
❖ The descriptors for each metadata element (Entity), include
✧ Property (describing the Entity), Definition (of each Entity and Property),
Value(s) (allowed for each Property)
❖ We have defined a set of core and extended entities
✧ Core elements are generic and applicable to any type of datasets, like the
JATS can describe any type of publication.
✧ Extended elements includes an additional elements, some of which are
specific for life, environmental and biomedical science domains
✧ this set can be further extended as needed
Key features of
❖ The descriptors for each metadata element (Entity), include
✧ Property (describing the Entity), Definition (of each Entity and Property),
Value(s) (allowed for each Property)
❖ We have defined a set of core and extended entities
✧ Core elements are generic and applicable to any type of datasets, like the
JATS can describe any type of publication.
✧ Extended elements includes an additional elements, some of which are
specific for life, environmental and biomedical science domains
✧ this set can be further extended as needed
❖ Entities are not mandatory, in both core and extended set
✧ An entity is used only when applicable to the dataset to be described
✧ In that case, only few of its properties are defined as mandatory
Key features of
❖ Dataset, a core entity catering for any unit of information
✧ archived experimental datasets, which do not change after deposition to the
repository => examples available for dbGAP, GEO, ClinicalTrials.org
✧ datasets in reference knowledge bases, describing dynamic concepts, such
as “genes”, whose definition morphs over time => examples available for
UniProt
❖ Dataset entity is also linked to other digital research objects
✧ Software and Data Standard, which are also part of the NIH Commons, but
the focus on other discovery indexes and therefore are not described in
detail in this model
General design of the
core and extended elements
Of the 20 core elements none is
mandatory
Only few properties of the 20
core elements are mandatory
❖ What is the dataset about?
✧ Material
❖ How was the dataset produced ? Which information does it hold?
✧ Dataset / Data Type with its Information, Method, Platform,
Instrument
❖ Where can a dataset be found?
✧ Dataset, Distribution, Access objects (links to License)
❖ When was the datasets produced, released etc.?
✧ Dates to specify the nature of an event {create, modify, start, end...}
and its timestamp
❖ Who did the work, funded the research, hosts the resources etc.?
✧ Person, Organization and their roles, Grant
Core elements provide the basic info
Interlinking to other indexes
also follows the W3C Data on
the Web Best Practices
DATS follows these, which
also recommend
DatasetDistribution
https://www.w3.org/TR/dwbp
Serializations and use of schema.org
❖ DATS model in JSON schema, serialized as:
✧ JSON* format, and
✧ JSON-LD** with vocabulary from schema.org
✧ serializations in other formats can also be done, as / if needed
❖ Benefits for DataMed and databases index by DataMed
✧ Increased visibility (by both popular search engines), accessibility
(via common query interfaces) and possibly improved ranking
❖ Extending schema.org
✧ Submitted to their tracker missing DATS core elements
✧ Coordinating via the bioschemas.org initiative (ELIXIR is also part of)
the extension of schema.org for life science
* JavaScript Object Notation
** JavaScript Object Notation for Linked Data

Más contenido relacionado

La actualidad más candente

Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-supportSherry Lake
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
RDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseRDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseASIS&T
 
Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future
Poster RDAP13: Research Data in eCommons @ Cornell: Present and FuturePoster RDAP13: Research Data in eCommons @ Cornell: Present and Future
Poster RDAP13: Research Data in eCommons @ Cornell: Present and FutureASIS&T
 
RDAP 15 Navigating the Rocky Road to Research Data Acceptance
RDAP 15 Navigating the Rocky Road to Research Data AcceptanceRDAP 15 Navigating the Rocky Road to Research Data Acceptance
RDAP 15 Navigating the Rocky Road to Research Data AcceptanceASIS&T
 
Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collectionSherry Lake
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceLizLyon
 
Landing Pages - Joe Hourcle - RDAP12
Landing Pages - Joe Hourcle - RDAP12Landing Pages - Joe Hourcle - RDAP12
Landing Pages - Joe Hourcle - RDAP12ASIS&T
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Merce Crosas
 
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...ASIS&T
 
Scholarly Information Practices In The Online Environment
Scholarly Information Practices In The Online EnvironmentScholarly Information Practices In The Online Environment
Scholarly Information Practices In The Online EnvironmentOCLC Research
 
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...sesrdm
 
Poster RDAP13: Data information literacy multiple paths to a single goal
Poster RDAP13: Data information literacy multiple paths to a single goalPoster RDAP13: Data information literacy multiple paths to a single goal
Poster RDAP13: Data information literacy multiple paths to a single goalASIS&T
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Merce Crosas
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?Jian Qin
 
Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018 Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018 Clare Dean
 

La actualidad más candente (20)

Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-support
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
RDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseRDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuse
 
Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future
Poster RDAP13: Research Data in eCommons @ Cornell: Present and FuturePoster RDAP13: Research Data in eCommons @ Cornell: Present and Future
Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future
 
RDAP 15 Navigating the Rocky Road to Research Data Acceptance
RDAP 15 Navigating the Rocky Road to Research Data AcceptanceRDAP 15 Navigating the Rocky Road to Research Data Acceptance
RDAP 15 Navigating the Rocky Road to Research Data Acceptance
 
Putnam Data Quality and the IR
Putnam Data Quality and the IRPutnam Data Quality and the IR
Putnam Data Quality and the IR
 
Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collection
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalface
 
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
 
Landing Pages - Joe Hourcle - RDAP12
Landing Pages - Joe Hourcle - RDAP12Landing Pages - Joe Hourcle - RDAP12
Landing Pages - Joe Hourcle - RDAP12
 
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...
 
McGeary Data Curation Network: Developing and Scaling
McGeary Data Curation Network: Developing and ScalingMcGeary Data Curation Network: Developing and Scaling
McGeary Data Curation Network: Developing and Scaling
 
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
 
Scholarly Information Practices In The Online Environment
Scholarly Information Practices In The Online EnvironmentScholarly Information Practices In The Online Environment
Scholarly Information Practices In The Online Environment
 
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
 
Poster RDAP13: Data information literacy multiple paths to a single goal
Poster RDAP13: Data information literacy multiple paths to a single goalPoster RDAP13: Data information literacy multiple paths to a single goal
Poster RDAP13: Data information literacy multiple paths to a single goal
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
 
Payton Eliminating Conflicts in Ebook Metadata
Payton Eliminating Conflicts in Ebook MetadataPayton Eliminating Conflicts in Ebook Metadata
Payton Eliminating Conflicts in Ebook Metadata
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
 
Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018 Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018
 

Similar a Introduction to DATS v2.2 - NIH May 2017

NIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS modelNIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS modelSusanna-Assunta Sansone
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxfPhilippe Rocca-Serra
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...Carole Goble
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataStuart Chalk
 
eROSA Stakeholder WS1: Data discovery through federated dataset catalogues
eROSA Stakeholder WS1: Data discovery through federated dataset catalogueseROSA Stakeholder WS1: Data discovery through federated dataset catalogues
eROSA Stakeholder WS1: Data discovery through federated dataset cataloguese-ROSA
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014Susanna-Assunta Sansone
 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016Susanna-Assunta Sansone
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...AKSHAY BHAGAT
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Riccardo Albertoni
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...SEAD
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesValeria Pesce
 
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All HandsBioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All HandsSusanna-Assunta Sansone
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...The University of Edinburgh
 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceSusanna-Assunta Sansone
 
Data discovery through federated dataset catalogs
Data discovery through federated dataset catalogsData discovery through federated dataset catalogs
Data discovery through federated dataset catalogsValeria Pesce
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Susanna-Assunta Sansone
 

Similar a Introduction to DATS v2.2 - NIH May 2017 (20)

NIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS modelNIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS model
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxf
 
NIH BD2K DataMed model, DATS
NIH BD2K DataMed model, DATSNIH BD2K DataMed model, DATS
NIH BD2K DataMed model, DATS
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series DataSharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
 
eROSA Stakeholder WS1: Data discovery through federated dataset catalogues
eROSA Stakeholder WS1: Data discovery through federated dataset catalogueseROSA Stakeholder WS1: Data discovery through federated dataset catalogues
eROSA Stakeholder WS1: Data discovery through federated dataset catalogues
 
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
NPG Scientific Data - Metabolomics Society meeting, Tsuruola, Japan, 2014
 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabularies
 
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All HandsBioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
BioCADDIE: Descriptive Metadata for Datasets WG3 - ELIXIR All Hands
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
 
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better ScienceNC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science
 
L07 metadata
L07 metadataL07 metadata
L07 metadata
 
Data discovery through federated dataset catalogs
Data discovery through federated dataset catalogsData discovery through federated dataset catalogs
Data discovery through federated dataset catalogs
 
Metadata as Standard: improving Interoperability through the Research Data Al...
Metadata as Standard: improving Interoperability through the Research Data Al...Metadata as Standard: improving Interoperability through the Research Data Al...
Metadata as Standard: improving Interoperability through the Research Data Al...
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
 

Más de Susanna-Assunta Sansone

FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
NFDI Physical Sciences Colloquium - FAIR
NFDI Physical Sciences Colloquium - FAIRNFDI Physical Sciences Colloquium - FAIR
NFDI Physical Sciences Colloquium - FAIRSusanna-Assunta Sansone
 
FAIR, community standards and data FAIRification: components and recipes
FAIR, community standards and data FAIRification: components and recipesFAIR, community standards and data FAIRification: components and recipes
FAIR, community standards and data FAIRification: components and recipesSusanna-Assunta Sansone
 
FAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRification is a Team Sport: FAIRsharing and the FAIR CookbookFAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRification is a Team Sport: FAIRsharing and the FAIR CookbookSusanna-Assunta Sansone
 
FAIRsharing: how we assist with FAIRness
FAIRsharing: how we assist with FAIRnessFAIRsharing: how we assist with FAIRness
FAIRsharing: how we assist with FAIRnessSusanna-Assunta Sansone
 
FAIRsharing - focus on standards and new features
FAIRsharing - focus on standards and new features FAIRsharing - focus on standards and new features
FAIRsharing - focus on standards and new features Susanna-Assunta Sansone
 
FAIR data and standards for a coordinated COVID-19 response
FAIR data and standards for a coordinated COVID-19 responseFAIR data and standards for a coordinated COVID-19 response
FAIR data and standards for a coordinated COVID-19 responseSusanna-Assunta Sansone
 

Más de Susanna-Assunta Sansone (20)

FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
FAIRsharing-Standards-4-GSC-Aug23.pdf
FAIRsharing-Standards-4-GSC-Aug23.pdfFAIRsharing-Standards-4-GSC-Aug23.pdf
FAIRsharing-Standards-4-GSC-Aug23.pdf
 
FAIR-4-GSC-Sansone-Aug23.pdf
FAIR-4-GSC-Sansone-Aug23.pdfFAIR-4-GSC-Sansone-Aug23.pdf
FAIR-4-GSC-Sansone-Aug23.pdf
 
FAIRsharing & FAIRcookbook at RDA 2023
FAIRsharing & FAIRcookbook at RDA 2023FAIRsharing & FAIRcookbook at RDA 2023
FAIRsharing & FAIRcookbook at RDA 2023
 
NFDI Physical Sciences Colloquium - FAIR
NFDI Physical Sciences Colloquium - FAIRNFDI Physical Sciences Colloquium - FAIR
NFDI Physical Sciences Colloquium - FAIR
 
Metadata Standards
Metadata StandardsMetadata Standards
Metadata Standards
 
FAIRcookbook: GSRS22-Singapore
FAIRcookbook: GSRS22-SingaporeFAIRcookbook: GSRS22-Singapore
FAIRcookbook: GSRS22-Singapore
 
FAIR Cookbook
FAIR Cookbook FAIR Cookbook
FAIR Cookbook
 
FAIR, community standards and data FAIRification: components and recipes
FAIR, community standards and data FAIRification: components and recipesFAIR, community standards and data FAIRification: components and recipes
FAIR, community standards and data FAIRification: components and recipes
 
FAIRsharing and the FAIR Cookbook
FAIRsharing and the FAIR Cookbook FAIRsharing and the FAIR Cookbook
FAIRsharing and the FAIR Cookbook
 
FAIRsharing for EOSC
FAIRsharing for EOSC FAIRsharing for EOSC
FAIRsharing for EOSC
 
FAIR: standards and services
FAIR: standards and servicesFAIR: standards and services
FAIR: standards and services
 
FAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRification is a Team Sport: FAIRsharing and the FAIR CookbookFAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
 
FAIRsharing: what we do for policies
FAIRsharing: what we do for policiesFAIRsharing: what we do for policies
FAIRsharing: what we do for policies
 
FAIRsharing: how we assist with FAIRness
FAIRsharing: how we assist with FAIRnessFAIRsharing: how we assist with FAIRness
FAIRsharing: how we assist with FAIRness
 
ELIXIR FAIR Activities - Examplars
ELIXIR FAIR Activities - ExamplarsELIXIR FAIR Activities - Examplars
ELIXIR FAIR Activities - Examplars
 
FAIRsharing - focus on standards and new features
FAIRsharing - focus on standards and new features FAIRsharing - focus on standards and new features
FAIRsharing - focus on standards and new features
 
FAIR data and standards for a coordinated COVID-19 response
FAIR data and standards for a coordinated COVID-19 responseFAIR data and standards for a coordinated COVID-19 response
FAIR data and standards for a coordinated COVID-19 response
 
FAIRsharing poster
FAIRsharing posterFAIRsharing poster
FAIRsharing poster
 
The FAIR Cookbook poster
The FAIR Cookbook posterThe FAIR Cookbook poster
The FAIR Cookbook poster
 

Último

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 

Último (20)

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 

Introduction to DATS v2.2 - NIH May 2017

  • 1. Supported by the NIH grant 1U24 AI117966-01 to UCSD PI , Co-Investigators at: The annotated with schema.org Susanna-Assunta Sansone, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra Oxford e-Research Centre, University of Oxford, UK bioCADDIE - DATS Workshop, Bethesda, 7 May 2017
  • 2.
  • 4. What is ? Like the JATS (Journal Article Tag Suite) is used by PubMed to index literature, a DATS (DatA Tag Suite) is needed for a scalable way to index data sources in the DataMed prototype
  • 5. Where do I find the documentation? Like the JATS (Journal Article Tag Suite) is used by PubMed to index literature, a DATS (DatA Tag Suite) is needed for a scalable way to index data sources in the DataMed prototype
  • 6. Mar15 Jun15 Dec15 Jun16 Aug15 May16 Sep16 Mar17 bioCADDIE team’s work on DATS Our community engagement: input, feedback and links May17
  • 7. Mar15 Jun15 Dec15 Jun16 Aug15 May16 Sep16 Mar17 bioCADDIE team’s work on DATS Our community engagement: input, feedback and links Phase 1 Phase 2 Phase 3 Design and development Continued evaluation & consolidation May17 Evaluation & iterative refinement
  • 8. Mar15 Jun15 Dec15 Jun16 Aug15 May16 Sep16 Mar17 bioCADDIE team’s work on DATS Our community engagement: input, feedback and links Phase 1 Phase 2 Phase 3 Design and development • Use cases collection= George, ExcTeam • Method development (SOP); competency questions; metadata mapping and definition= Susanna, Alejandra, Philippe SOP and metadata strawman <DATS> name May17 Metadata specification V1.0 with JSON schema Use cases workshop WG3 formed; telecons start; dissemination via WG7 formed; telecons start Evaluation & iterative refinement Continued evaluation & consolidation
  • 9. Mar15 Jun15 Dec15 Jun16 Aug15 May16 Sep16 Mar17 bioCADDIE team’s work on DATS Our community engagement: input, feedback and links Phase 1 Phase 2 Phase 3 Design and development • Use cases collection= George, ExcTeam • Method development (SOP); competency questions; metadata mapping and definition= Susanna, Alejandra, Philippe SOP and metadata strawman <DATS> name DATS v1.1 May17 DATS v2.0 (with access metadata, WG7) Metadata specification V1.0 with JSON schema Use cases workshop 1st DATS workshop WG3 formed; telecons start; dissemination via WG7 formed; telecons start Evaluation & iterative refinement • DATS specification, serializations, refinement= (Susanna), Alejandra, Philippe and CoreDevTeam, also based on community feedback Continued evaluation & consolidation
  • 10. Mar15 Jun15 Dec15 Jun16 Aug15 May16 Sep16 Mar17 bioCADDIE team’s work on DATS Our community engagement: input, feedback and links Phase 1 Phase 2 Phase 3 Design and development • Use cases collection= George, ExcTeam • Method development (SOP); competency questions; metadata mapping and definition= Susanna, Alejandra, Philippe SOP and metadata strawman <DATS> name DATS v1.1 May17 DATS v2.0 (with access metadata, WG7) DATS v2.1 (schema.org JSON-LD) DATS v2.2 Metadata specification V1.0 with JSON schema Use cases workshop 1st DATS workshop WG3 formed; telecons start; dissemination via 2nd DATS workshop WG7 formed; telecons start WG12 formed; telecons start Evaluation & iterative refinement • DATS specification, serializations, refinement= (Susanna), Alejandra, Philippe and CoreDevTeam, also based on community feedback Continued evaluation & consolidation • Alignment with other community efforts; documentation and curation guidelines = Susanna, Alejandra, Philippe, Jared, ExcTeam and CoreDevTeam
  • 11. Mar15 Jun15 Dec15 Jun16 Aug15 May16 Sep16 Mar17 Use cases workshop bioCADDIE team’s work on DATS Our community engagement: input, feedback and links Phase 1 Phase 2 Phase 3 Design and development • Use cases collection= George, ExcTeam • Method development (SOP); competency questions; metadata mapping and definition= Susanna, Alejandra, Philippe Evaluation & iterative refinement • DATS specification, serializations, refinement= (Susanna), Alejandra, Philippe and CoreDevTeam, also based on community feedback Continued evaluation & consolidation • Alignment with other community efforts; documentation and curation guidelines = Susanna, Alejandra, Philippe, Jared, ExcTeam and CoreDevTeam 1st DATS workshop SOP and metadata strawman WG3 formed; telecons start; dissemination via <DATS> name DATS v1.1 May17 2nd DATS workshop DATS v2.0 (with access metadata, WG7) WG7 formed; telecons start DATS v2.1 (schema.org JSON-LD) DATS v2.2 primarily metadata modelers primarily implementers Metadata specification V1.0 with JSON schema WG12 formed; telecons start
  • 12. ❖ Enabling discoverability: find and access datasets ❖ Focusing on surfacing key metadata descriptors, such as ✧ information and relations between authors, datasets, publication, funding sources, nature of biological signal and perturbation etc. ✧ Not the perfect model to represent the experimental details ✧ the level of details and metadata needed to ensure interoperability and reusability are left to the indexed databases ❖ Better than just having keywords ✧ we have aimed to have maximum coverage of use cases with minimal number of data elements and relations What is supposed to do and be?
  • 13. Metadata elements identified by combining the two complementary approaches USE CASES: top-down approach SCHEMAS: bottom-up approach The development process in a nutshell (v1.0, v1.1, v2.0, v2.1, v2.2)
  • 14. Extracting requirements from use cases ❖ Selected competency questions ✧ representative set collected from: use cases workshop, white paper, submitted by the community and from NIH and Phil Bourne’s ADDS office ✧ key metadata elements processed: abstracted, color-coded and terms binned binned as Material, Process, Information, Properties; relation identified top-down approach
  • 15. bottom-up approach Standing on the shoulders of giants ❖ schema.org ❖ DataCite ❖ RIF-CS ❖ W3C HCLS dataset descriptions (mapping of many models including DCAT, PROV, VOID, Dublin Core) ❖ Project Open Metadata (used by HealthData.gov is being added in this new iteration) ❖ ……(full list in the DATS specification) ❖ ISA ❖ BioProject ❖ BioSample ❖ ……(full list in the DATS specification) ❖ MiNIML ❖ PRIDE-ml ❖ MAGE-tab ❖ GA4GH metadata schema ❖ SRA xml ❖ CDISC SDM / element of BRIDGE model ❖ ……(full list in the DATS specification)
  • 16. Convergence of elements extracted from competency questions and existing (generic and biomedical) data models (incl. DataCite, DCAT, schema.org, HCLS dataset, RIF- CS, ISA-Tab, SRA- xml etc.) model for scalable indexing Adoption of elements extracted from and from core entities extended entities
  • 17. ❖ The descriptors for each metadata element (Entity), include ✧ Property (describing the Entity), Definition (of each Entity and Property), Value(s) (allowed for each Property) Key features of
  • 18. ❖ The descriptors for each metadata element (Entity), include ✧ Property (describing the Entity), Definition (of each Entity and Property), Value(s) (allowed for each Property) ❖ We have defined a set of core and extended entities ✧ Core elements are generic and applicable to any type of datasets, like the JATS can describe any type of publication. ✧ Extended elements includes an additional elements, some of which are specific for life, environmental and biomedical science domains ✧ this set can be further extended as needed Key features of
  • 19. ❖ The descriptors for each metadata element (Entity), include ✧ Property (describing the Entity), Definition (of each Entity and Property), Value(s) (allowed for each Property) ❖ We have defined a set of core and extended entities ✧ Core elements are generic and applicable to any type of datasets, like the JATS can describe any type of publication. ✧ Extended elements includes an additional elements, some of which are specific for life, environmental and biomedical science domains ✧ this set can be further extended as needed ❖ Entities are not mandatory, in both core and extended set ✧ An entity is used only when applicable to the dataset to be described ✧ In that case, only few of its properties are defined as mandatory Key features of
  • 20. ❖ Dataset, a core entity catering for any unit of information ✧ archived experimental datasets, which do not change after deposition to the repository => examples available for dbGAP, GEO, ClinicalTrials.org ✧ datasets in reference knowledge bases, describing dynamic concepts, such as “genes”, whose definition morphs over time => examples available for UniProt ❖ Dataset entity is also linked to other digital research objects ✧ Software and Data Standard, which are also part of the NIH Commons, but the focus on other discovery indexes and therefore are not described in detail in this model General design of the
  • 21. core and extended elements
  • 22. Of the 20 core elements none is mandatory
  • 23. Only few properties of the 20 core elements are mandatory
  • 24. ❖ What is the dataset about? ✧ Material ❖ How was the dataset produced ? Which information does it hold? ✧ Dataset / Data Type with its Information, Method, Platform, Instrument ❖ Where can a dataset be found? ✧ Dataset, Distribution, Access objects (links to License) ❖ When was the datasets produced, released etc.? ✧ Dates to specify the nature of an event {create, modify, start, end...} and its timestamp ❖ Who did the work, funded the research, hosts the resources etc.? ✧ Person, Organization and their roles, Grant Core elements provide the basic info
  • 26. also follows the W3C Data on the Web Best Practices DATS follows these, which also recommend DatasetDistribution https://www.w3.org/TR/dwbp
  • 27. Serializations and use of schema.org ❖ DATS model in JSON schema, serialized as: ✧ JSON* format, and ✧ JSON-LD** with vocabulary from schema.org ✧ serializations in other formats can also be done, as / if needed ❖ Benefits for DataMed and databases index by DataMed ✧ Increased visibility (by both popular search engines), accessibility (via common query interfaces) and possibly improved ranking ❖ Extending schema.org ✧ Submitted to their tracker missing DATS core elements ✧ Coordinating via the bioschemas.org initiative (ELIXIR is also part of) the extension of schema.org for life science * JavaScript Object Notation ** JavaScript Object Notation for Linked Data