S100
Martínez-Romero, M., O’Connor, M. J., Shankar, R., Panahiazar,
M., Willrett, D., Egyedi, A. L., Gevaert, O., Graybeal, J., Musen, M. A.
Stanford University
Fast and Accurate Metadata Authoring
Using Ontology-Based Recommendations
What is metadata?
AMIA 2017 | amia.org
• Data that describe data
• Crucial for:
  • Finding experimental datasets online
  • Understanding how the experiments were performed
  • Reusing the data to perform new analyses
age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
Poor metadata
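
The variants above all name the same attribute. As a rough illustration of why free-text field names are hard to reconcile after the fact, the following Python sketch collapses a few of them with hand-written rules. The `normalize_field_name` helper and its regular expressions are assumptions made for this example; they are not part of CEDAR or BioSample.

```python
import re

def normalize_field_name(name: str) -> str:
    """Collapse spelling variants of a field name into one key (illustrative only)."""
    key = name.lower()
    # Replace any run of non-letter characters (spaces, brackets, backticks, dots) with one space.
    key = re.sub(r"[^a-z]+", " ", key).strip()
    # Treat common unit spellings as the same token.
    key = re.sub(r"\b(y|yr|yrs|year|years)\b", "years", key)
    # Drop filler words.
    key = re.sub(r"\b(in|old)\b", " ", key)
    return " ".join(key.split())

# A few of the variants listed on the slide.
variants = ["age", "AGE", "`Age", "age (in years)", "Age(yrs.)",
            "age [y]", "age_years", "age (yr-old)"]
normalized = {v: normalize_field_name(v) for v in variants}
```

Even with these rules, "age" and "age years" remain distinct keys, and variants such as "age of patient" are not handled at all, which suggests that cleaning up names afterwards is fragile compared with standardizing them when the metadata are authored.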
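An analysis of metadata from NCBI’s BioSample found many invalid values:
• 73% of “Boolean” values are not valid Booleans
  • e.g., nonsmoker, former-smoker
• 26% of “integer” values are not valid integers
  • e.g., JM52, UVPgt59.4, pig
• 68% of values for ontology-term fields are not valid ontology terms
  • e.g., presumed normal, wild_type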
Gonçalves, R. S. et al. (2017). Metadata in the BioSample Online Repository are Impaired by Numerous
Anomalies. SemSci 2017 Workshop, co-located with ISWC 2017. Vienna, Austria.
Poor metadata
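
A check of this kind can be sketched in a few lines of Python. The example below is only an illustration: the accepted Boolean spellings, the helper names, and the two valid values ("true" and "42") are assumptions added here, and this is not the procedure used in the cited analysis.

```python
def is_valid_boolean(value: str) -> bool:
    """Assumption: only the literals 'true' and 'false' count as Boolean."""
    return value.strip().lower() in {"true", "false"}

def is_valid_integer(value: str) -> bool:
    """True when Python's int() can parse the value."""
    try:
        int(value.strip())
        return True
    except ValueError:
        return False

# Invalid examples come from the slide; "true" and "42" are hypothetical valid values.
boolean_values = ["nonsmoker", "former-smoker", "true"]
integer_values = ["JM52", "UVPgt59.4", "pig", "42"]

invalid_booleans = [v for v in boolean_values if not is_valid_boolean(v)]
invalid_integers = [v for v in integer_values if not is_valid_integer(v)]
```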
Metadata authoring is hard
• A computational platform for metadata management
• Goal: Overcome the impediments to creating high-quality metadata
[Figure: metadata template]
[Figure: metadata workflow in three stages]
• Design template: template authors (e.g., standards committees) use the Template Designer
• Fill in metadata: metadata authors (e.g., scientists) use the Metadata Editor
• Submit metadata: templates and metadata are kept in the Metadata Repository and submitted to LINCS and public databases
• Example values shown in the Metadata Editor: A sample study; Acute stress disorder; Stanford University; John Doe; Longitudinal
We developed a metadata recommendation system
Metadata recommendation system
[Figure: the Metadata Recommender connects the Metadata Editor and the Metadata Repository in three numbered steps: store metadata, analyze existing metadata, and generate suggestions]
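
The loop in the figure (store metadata, analyze it, suggest values) can be illustrated with a simple frequency-based sketch. This is not CEDAR's actual algorithm: the `suggest` function, its scoring rule, and the sample repository below are assumptions chosen only to show what a context-sensitive suggestion might look like.

```python
from collections import Counter

def suggest(instances, target_field, context, top_k=3):
    """Rank values of target_field by frequency among stored instances
    whose already-filled fields match the given context."""
    counts = Counter(
        inst[target_field]
        for inst in instances
        if target_field in inst
        and all(inst.get(field) == value for field, value in context.items())
    )
    total = sum(counts.values())
    # most_common() sorts by descending count; an empty Counter yields no suggestions.
    return [(value, count / total) for value, count in counts.most_common(top_k)]

# Hypothetical stored metadata instances (invented for illustration).
repository = [
    {"tissue": "lung", "disease": "asthma"},
    {"tissue": "lung", "disease": "lung cancer"},
    {"tissue": "lung", "disease": "asthma"},
    {"tissue": "blood", "disease": "lymphoma"},
]

# The author has already entered tissue = lung; suggest values for disease.
suggestions = suggest(repository, "disease", {"tissue": "lung"})
```

Here the values already entered act as the context, so a lung sample is offered "asthma" and "lung cancer" but not "lymphoma".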
Filling in a CEDAR template
Evaluation workflow
[Figure: evaluation workflow in four steps]
(1) Preprocessing and ingestion: gene expression metadata are ingested into the CEDAR Metadata Repository as ≈35K instances of the CEDAR BioSample template
(2) Semantic annotation: produces ≈35K annotated BioSample template instances
(3) Training: each set of instances is split into a training dataset (80%) and a test dataset (20%); the Metadata Recommender is trained on the training datasets
(4) Testing & Analysis: the Metadata Recommender is run on the test datasets to produce the evaluation results
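
An 80%/20% split like the one in the workflow can be sketched as follows. The slides do not say how the split was drawn, so the random shuffle, the fixed seed, and the placeholder instances are assumptions made for this example.

```python
import random

def split_instances(instances, train_fraction=0.8, seed=0):
    """Shuffle a copy of the instances and cut it into training and test sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder records standing in for the roughly 35K template instances.
instances = [{"id": i} for i in range(35_000)]
training, test = split_instances(instances)
```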
• For “disease”, “sex”, and “tissue”
• Top 3 suggestions
Testing & Analysis
Compared suggested vs. expected metadata
Measure: Reciprocal Rank (RR), appropriate for judging systems that return a ranked list of suggestions when there is only one relevant result
$$\text{Reciprocal Rank (RR)} = \frac{1}{K}$$
where K is the position of the expected result in the ranking of suggestions
How is the RR calculated?
| Expected | Suggested | K | Reciprocal Rank (RR) |
|---|---|---|---|
| asthma | 1) asthma 2) lung cancer 3) respiratory disease | 1 | 1/1 |
| lymphoma | 1) myeloma 2) lymphoma 3) acute myeloid leukemia | 2 | 1/2 |
| lung cancer | 1) respiratory disease 2) asthma 3) lung cancer | 3 | 1/3 |

Mean Reciprocal Rank (MRR) = (1/1 + 1/2 + 1/3) / 3 = 0.61
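
The calculation can be reproduced with a short Python sketch using the three cases from the slide. The function names are illustrative, and scoring 0 when the expected value does not appear among the suggestions is an assumption; the slide only covers cases where it does appear.

```python
def reciprocal_rank(expected, suggestions):
    """Return 1/K, where K is the 1-based position of the expected value."""
    for position, suggestion in enumerate(suggestions, start=1):
        if suggestion == expected:
            return 1.0 / position
    return 0.0  # assumption: no credit when the expected value is not suggested

def mean_reciprocal_rank(cases):
    """Average the reciprocal rank over (expected, suggestions) pairs."""
    return sum(reciprocal_rank(e, s) for e, s in cases) / len(cases)

# The three examples from the slide.
cases = [
    ("asthma", ["asthma", "lung cancer", "respiratory disease"]),
    ("lymphoma", ["myeloma", "lymphoma", "acute myeloid leukemia"]),
    ("lung cancer", ["respiratory disease", "asthma", "lung cancer"]),
]

mrr = mean_reciprocal_rank(cases)
print(f"{mrr:.2f}")  # prints 0.61
```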
Results
[Figure: bar chart of Mean Reciprocal Rank (MRR), on a scale from 0 to 1, for the disease, tissue, and sex fields, comparing the Baseline with the Metadata Recommender]
On average (MRR):
• Metadata Recommender = 0.77
• Baseline (majority vote) = 0.31
Better performance with respect to the baseline for:
• Fields with many different values
• Templates with many correlated fields
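
The slides name the baseline only as "majority vote". One plausible reading, sketched below as an assumption, is a recommender that ignores the other fields and always ranks a field's values by their overall frequency in the training data; the function name and sample data are invented for illustration.

```python
from collections import Counter

def majority_vote(instances, target_field, top_k=3):
    """Rank values of target_field by overall frequency, ignoring context."""
    counts = Counter(inst[target_field] for inst in instances if target_field in inst)
    return [value for value, _ in counts.most_common(top_k)]

# Hypothetical training instances (not taken from BioSample).
training = [
    {"tissue": "lung", "disease": "asthma"},
    {"tissue": "lung", "disease": "asthma"},
    {"tissue": "blood", "disease": "lymphoma"},
    {"tissue": "blood", "disease": "lymphoma"},
    {"tissue": "blood", "disease": "lymphoma"},
    {"tissue": "skin", "disease": "melanoma"},
]

baseline = majority_vote(training, "disease")
```

Under this reading the baseline ranks "lymphoma" first even for a lung sample, which is consistent with the slide's observation that the gap is larger for templates with many correlated fields.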
Summary
• We developed a metadata recommendation system as part of an end-to-end system for metadata management called CEDAR
• Generates context-sensitive suggestions in real time
• Incorporates both ontology-based and free-text suggestions
Summary
Our approach makes it easier for scientists to generate high-quality metadata for experimental datasets
• So that the datasets can be found, interpreted, and reused
• Essential to ensure scientific reproducibility
facebook.com/metadatacenter
@metadatacenter
http://cedar.metadatacenter.org
Channel: Metadata Center
github.com/metadatacenter
