About the use of biomedical ontologies to play with text in the context of the SIFR project
Clement Jonquet (jonquet@lirmm.fr)
Conférence Communication Science et Société (C2S)
LGI2P, Nîmes, 17 March 2015
A few introductory words
Biologists have adopted ontologies
• To provide a canonical representation of scientific knowledge
• To annotate experimental data to enable interpretation, comparison, and discovery across databases
• To facilitate knowledge-based applications for
  • Decision support
  • Natural language processing
  • Data integration
• But ontologies are spread out, in different formats, of different sizes, and with different structures
Working with terminologies & ontologies: a portal, please!
• You've built an ontology; how do you let the world know?
• You need an ontology; where do you go to get it?
• How do you know whether an ontology is any good?
• How do you find resources that are relevant to the domain of the ontology (or to specific terms)?
• How could you leverage your ontology to enable new science?
• How could you use ontologies without managing them?
Comparison in [IWBBIO'14]
Annotation challenge
• Explosion of biomedical data: diverse, distributed, unstructured, and not linked to ontologies
• Hard for biomedical researchers to find the data they need
• Data integration problem
• Translational discoveries are prevented
• Good examples
  • GO annotations
  • PubMed (biomedical literature) indexed with MeSH headings
• Annotate data with ontology concepts
• Horizontal approach
[Figure labels: ONTOLOGIES, RESOURCES]
A few words about the SIFR project
Semantic Indexing of French Biomedical Data Resources (SIFR) project
… in collaboration with…
Context: increasing amount of biomedical data + multilingualism
• Limits of keyword-based indexing
• The biomedical community has turned to ontologies to describe its data and turn them into structured, formalized knowledge
• Ontologies are put to use by creating semantic annotations
• Crucial need for tools & services for French biomedical data
• Biomedical data integration challenge
• Potential new scientific discoveries hidden in data
• Translational research
Use ontologies for indexing, mining and searching (French) biomedical data
• Obj1: Design, development and deployment of the French Annotator.
• Obj2: Obtain new research results to exploit and enhance ontology-based indexing services.
  • semantic distances
  • ontology alignment
  • ontology enrichment and disambiguation
• Obj3: Valorization of indexing services
A French biomedical Annotator
Use ontology-based biomedical annotations in end-user applications
Reuse of the NCBO technology
BioPortal Ontology Repository
http://bioportal.bioontology.org
http://data.bioontology.org
Ontology Services
• Search
• Traverse
• Comment
• Download
Widgets
• Tree-view
• Auto-complete
• Graph-view
Mapping Services
• Create
• Upload
• Download
Annotation
• Term recognition
Data Access
• Search "data" annotated with a given term
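As an illustration of the term recognition service, here is a minimal sketch that builds a request URL for the Annotator exposed at http://data.bioontology.org. It assumes the `/annotator` endpoint with the `text`, `ontologies` and `apikey` query parameters of the NCBO REST API. The sample sentence, the `MESH` acronym and the `YOUR_API_KEY` placeholder are illustrative assumptions. The function only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# REST entry point shown on the slide.
BASE_URL = "http://data.bioontology.org"


def annotator_url(text, api_key, ontologies=None):
    """Build an Annotator request URL (no network call is made here)."""
    params = {"text": text, "apikey": api_key}
    if ontologies:
        # Assumption: ontologies are passed as a comma-separated list of acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{BASE_URL}/annotator?{urlencode(params)}"


# Illustrative call; replace the placeholder with a real API key before use.
url = annotator_url("melanoma is a malignant tumor", "YOUR_API_KEY", ["MESH"])
print(url)
```

Fetching this URL with any HTTP client should return the recognized concepts as JSON, provided the key is valid.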
SIFR axes of research (1/2)
• Design of the SIFR (French) Annotator service
  • Deployment of a local instance of BioPortal at LIRMM
  • Scoring of annotations & RDF representation using the AO [SWAT4LS 2014]
  • Dealing with multilingualism within BioPortal [TOTh-w 2014]
• Automatic extraction of biomedical terminology from text
  • Presented hereafter [LBM 2013][ISWC 2014][TALN 2014][PolTAL 2014]
• Semantic distance framework
  • Collaboration with LGI2P to reuse the Semantic Measures Library (SML)
SIFR axes of research (2/2)
• Dealing with public patient data on blogs, forums and tweets (Sandra Bringay)
  • Detection of emotion [EGC 2014]
  • Patient vocabulary [eTELEMED 2014]
• Adverse drug event mining from EHRs
  • Project to compare pharmacogenomics literature and EHRs
• Design of a semantic annotation workflow for plant data, in collaboration with the IBC project [CO-PDI 2014]
  • AgroLD project [RDA 2014]
  • Cropontology.org
• Semantic indexing and user feedback: Viewpoint [IC 2014]
  • Collaboration with P. Lemoisson (CIRAD)
  • PhD project of Guillaume Surroca
Biomedical terminology extraction
Work carried out in the context of Juan Antonio Lossio Ventura's PhD
In collaboration with Mathieu Roche & Maguelonne Teisseire (TETIS)
Motivations for automatic terminology extraction
• Experiment with and validate approaches for French data
• Offer services for both English and French communities
• Go beyond the state of the art
• Contribute to the ontology enrichment process
• Acquire some NLP expertise to enhance the NCBO annotation workflow
Combining ATR & AKE
            ATR                          AKE
            Automatic Term Recognition   Automatic Keyword Extraction
Input       one large corpus             single document of a dataset
Output      technical terms of a domain  keywords that describe the document
Domain      very specific                none
Examples    C-value                      TF-IDF, Okapi
[Figure: ATR yields one list of terms (term1, term2, …, termn); AKE yields one list of keywords (Keyword1, Keyword2, …) per document]
(1) Part-of-speech tagging
(2) Candidate term extraction
(3) Ranking of candidate terms
(4) Computing of new combination measures
(5) Re-ranking using a web-based measure
(1) Part-of-speech tagging
Assign each word in a text to its grammatical category (e.g., noun, adjective).
We apply part-of-speech tagging to the whole corpus.
Three tools:
• TreeTagger
• Stanford Tagger
• Brill's rules
(2) Candidate term extraction following patterns
Unified Medical Language System (UMLS): ~5M concepts, 161 sources (MeSH, ICD, SNOMED, …)
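Pattern-based extraction can be sketched as a sliding window over part-of-speech tags. The example below is only an illustration: the tag sequences in `PATTERNS` and the tagged sentence are hypothetical, not the patterns or data used in the work, and the input is assumed to be already tagged by one of the tools listed in step (1).

```python
# Hypothetical part-of-speech patterns (Penn Treebank style tags), for illustration only.
PATTERNS = [("JJ", "NN", "NN"), ("NN", "NN"), ("NN",)]


def extract_candidates(tagged_tokens, patterns):
    """Return every word sequence whose tag sequence matches one of the patterns.

    tagged_tokens: list of (word, tag) pairs produced by a POS tagger.
    patterns: list of tag tuples.
    """
    candidates = []
    for start in range(len(tagged_tokens)):
        for pattern in patterns:
            window = tagged_tokens[start:start + len(pattern)]
            if len(window) < len(pattern):
                continue  # not enough tokens left for this pattern
            if tuple(tag for _, tag in window) == pattern:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


# Hand-tagged toy sentence (an assumption, not tagger output).
tagged = [("chronic", "JJ"), ("kidney", "NN"), ("disease", "NN"),
          ("is", "VBZ"), ("common", "JJ")]
candidates = extract_candidates(tagged, PATTERNS)
print(candidates)
```

The candidates produced this way are unranked; ordering them is the job of the ranking measures in step (3).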
(3) Ranking of candidate terms
Using C-value, in order to extract single-word and multi-word terms
[Equation: definition of C-value]
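The formula itself did not survive on the slide. As a hedged sketch, the code below implements the commonly cited C-value definition with one assumption: the length weight is log2(|a| + 1) rather than log2(|a|), so that single-word terms get a non-zero score, in line with the remark about single-word and multi-word terms. The exact variant used in the work may differ, and the frequencies in the example are invented.

```python
import math


def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Compute a C-value score for each candidate term.

    freq: dict mapping a term (tuple of words) to its corpus frequency.
    Assumed weight: log2(len(term) + 1), so single-word terms are scored too.
    """
    scores = {}
    for term, f_term in freq.items():
        # Longer candidates in which this term is nested.
        containers = [other for other in freq
                      if len(other) > len(term) and _contains(other, term)]
        weight = math.log2(len(term) + 1)
        if not containers:
            scores[term] = weight * f_term
        else:
            # Subtract the average frequency of the longer terms containing it.
            nested = sum(freq[other] for other in containers) / len(containers)
            scores[term] = weight * (f_term - nested)
    return scores


# Invented frequencies, for illustration only.
freq = {("soft", "contact", "lens"): 3, ("contact", "lens"): 5, ("lens",): 8}
scores = c_value(freq)
print(scores)
```

A term that appears mostly inside longer candidates is penalized, which is how the measure favors complete multi-word terms over their fragments.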
Using TF-IDF and Okapi BM25
[Figure: one ranked list of keywords (Keyword1, Keyword2, …) per document]
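For reference, both keyword measures can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the configuration used in the experiments: documents are non-empty lists of single-word tokens, TF-IDF uses relative term frequency with a natural-log IDF, and BM25 uses the smoothed IDF variant with the conventional defaults `k1=1.2` and `b=0.75`. The toy corpus is invented.

```python
import math


def tf_idf(term, doc, corpus):
    """TF-IDF with relative term frequency and idf = ln(N / df)."""
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    tf = doc.count(term) / len(doc)
    return tf * math.log(len(corpus) / df)


def bm25(term, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one term for one document (smoothed IDF variant)."""
    n_docs = len(corpus)
    df = sum(1 for d in corpus if term in d)
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    tf = doc.count(term)
    avg_len = sum(len(d) for d in corpus) / n_docs
    length_norm = 1 - b + b * len(doc) / avg_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Invented toy corpus: each document is a list of tokens.
corpus = [["fever", "malaria"], ["fever", "cough"], ["fever", "rash"]]
doc = corpus[0]
print(tf_idf("malaria", doc, corpus), bm25("malaria", doc, corpus))
```

In this toy corpus "fever" occurs in every document, so both measures rank "malaria" above it for the first document.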
(4) Computing of new combination measures
F-OCapi and F-TFIDF-C (harmonic mean)
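A harmonic-mean combination can be sketched as follows. The slide only names the measures, so this rests on two assumptions: that each combined measure pairs the C-value with one keyword measure (Okapi or TF-IDF), and that both score sets are first min-max normalized to [0, 1] so that they are comparable. The scores in the example are invented.

```python
def harmonic_mean(x, y):
    """Harmonic mean of two non-negative scores (0 when both are 0)."""
    return 2 * x * y / (x + y) if (x + y) > 0 else 0.0


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; assumed preprocessing, not stated on the slide."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {term: 0.0 for term in scores}
    return {term: (s - low) / (high - low) for term, s in scores.items()}


def combine(c_value_scores, keyword_scores):
    """Combine C-value with a keyword measure (e.g. Okapi or TF-IDF) per term."""
    c_norm = min_max_normalize(c_value_scores)
    k_norm = min_max_normalize(keyword_scores)
    return {term: harmonic_mean(c_norm[term], k_norm[term]) for term in c_norm}


# Invented scores, for illustration only.
combined = combine({"a": 4.0, "b": 2.0, "c": 0.0},
                   {"a": 1.0, "b": 3.0, "c": 0.0})
print(combined)
```

The harmonic mean rewards terms that score well on both measures: a term ranked high by only one of them ends up below a term that is moderately good on both.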
C-Okapi and C-TFIDF
(5) Re-ranking using a web-based measure
[Figure: each candidate term (term 1, term 2, …, term n) is sent to the Web as a query, both as an exact phrase ("treponema pallidum") and as separate words (treponema pallidum)]
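The quoted and unquoted queries suggest a measure based on search hit counts. The sketch below shows one plausible reading, labeled as an assumption because the slide does not state the formula: the score of a term is the number of hits for the exact phrase divided by the number of hits for the same words unquoted. `hit_count` is a hypothetical callable standing in for a search engine API, and the counts in `FAKE_HITS` are invented placeholders.

```python
def web_ratio(term, hit_count):
    """Exact-phrase hits divided by all-words hits (assumed definition)."""
    exact = hit_count(f'"{term}"')
    loose = hit_count(term)
    return exact / loose if loose else 0.0


def rerank(terms, hit_count):
    """Sort candidate terms by decreasing web ratio."""
    return sorted(terms, key=lambda t: web_ratio(t, hit_count), reverse=True)


# Invented hit counts standing in for real search engine results.
FAKE_HITS = {
    '"treponema pallidum"': 900, "treponema pallidum": 1000,
    '"pallidum infection"': 50, "pallidum infection": 1000,
}
ranked = rerank(["pallidum infection", "treponema pallidum"], FAKE_HITS.get)
print(ranked)
```

Under this reading, words that almost always appear together as a phrase score close to 1, which pushes genuine terms toward the top of the list.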
Experiments: datasets
• Drugs and Herbs (EN, FR)
• Medical Tests (EN, FR)
• PubMed (EN)
• Plus automatic validation using UMLS (EN) & MeSH (FR)
Experiments: results
[Figure: precision comparison of the best measures for term extraction for English]
[Figure: precision comparison of the best measures for term extraction for French]
[Figure: precision comparison between F-OCapiM and WebR with automatic validation for French]
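Precision over a ranked list is usually reported at a cutoff k. The following is a minimal sketch of such a computation, assuming a term counts as valid when it appears in a reference vocabulary, as in automatic validation against UMLS or MeSH. The ranked list and the reference set are invented, and the cutoffs actually used in the experiments are not given here.

```python
def precision_at_k(ranked_terms, reference, k):
    """Fraction of the top-k ranked terms found in the reference vocabulary."""
    top = ranked_terms[:k]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term.lower() in reference)
    return hits / len(top)


# Invented ranking and reference vocabulary, for illustration only.
ranked_terms = ["treponema pallidum", "patient", "syphilis", "result"]
reference = {"treponema pallidum", "syphilis"}
for k in (1, 2, 4):
    print(k, precision_at_k(ranked_terms, reference, k))
```

Comparing these values at the same k for two rankings of the same candidates shows which measure places more validated terms near the top.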
Current & future work on term extraction
• A methodology for term extraction and ranking for two languages, French and English.
• C-value adapted to extract French biomedical terms.
• Two new measures obtained by combining three existing methods, plus a new web-based measure.
• WebR was applied to re-rank the best list, positioning the true biomedical terms at the top of the list.
• Reuse such NLP techniques within the SIFR Annotator workflow to enhance semantic annotation
A few words by way of conclusion
• Terminologies & ontologies are relevant features for knowledge representation
• But a large majority of the data are texts
• Go beyond one language
• Share & pool relevant resources in the domain: ontologies, terminologies, mappings, annotations, technologies
