SlideShare una empresa de Scribd logo
1 de 40
Scientific Lenses over Linked Data:
Identity Management in the
Open PHACTS project
Alasdair J G Gray
A.J.G.Gray@hw.ac.uk
www.alasdairjggray.co.uk
@gray_alasdair
http://c745.r45.cf2.rackcdn.com/img/2009/le
ns_filter_coasters.jpg
Open PHACTS Use Case
“Let me compare MW, logP
and PSA for launched
inhibitors of human &
mouse oxidoreductases”
 Chemical Properties (Chemspider)
 Launched drugs (Drugbank)
 Human => Mouse (Homologene)
 Protein Families (Enzyme)
 Bioactivty Data (ChEMBL)
 … other info (Uniprot/Entrez etc.)
“Let me compare MW, logP
and PSA for launched
inhibitors of human &
mouse oxidoreductases”
21/05/2014 Brighton Seminar 1
Literature
PubChem
Genbank
Patents
Databases
Downloads
Data Integration Data Analysis
Firewalled Databases
Repeat @ each
company
x
Lowering industry firewalls: pre-competitive informatics in drug discovery
Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944
A single, shared
solution.
Funded under
• IMI: 2011-14
• ENSO: 2014-16
Pre-competitive Informatics
Open PHACTS Discovery Platform
21/05/2014 Brighton Seminar 3
Drug Discovery Platform
Apps
Domain API
Interactive
responses
Production quality
integration platform
Method
Calls
(April 2013 – March 2014)
15.8 million total hits
API Hits
An “App Store”?
http://www.openphactsfoundation.org/apps.html
Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium
MOE Collector Cytophacts Utopia Garfield SciBite
KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna
Drug
Disease
PathwayTarget
https://dev.openphacts.org/
Linked Data API
21/05/2014 Brighton Seminar 6
OPS Discovery Platform
Nanopub
Db
VoID
Data Cache
(Virtuoso Triple Store)
Semantic Workflow Engine
Linked Data API (RDF/XML, TTL, JSON)
Domain
Specific
Services
Identity
Resolution
Service
Chemistry
Registration
Normalisation
& Q/C
Identifier
Management
Service
Indexing
CorePlatform
P12374
EC2.43.4
CS4532
“Adenosine
receptor 2a”
VoID
Db
Nanopub
Db
VoID
Db
VoID
Nanopub
VoID
Public Content Commercial
Public Ontologies
User
Annotations
Apps
Platform Interaction
Provenance
Multiple Identities
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is
never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
21/05/2014 Brighton Seminar 10
P12047
X31045
GB:29384 Are these the
same thing?
Gleevec® = Imatinib Mesylate
21/05/2014 Brighton Seminar 11
DrugbankChemSpider PubChem
Imatinib
MesylateImatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
21/05/2014 Brighton Seminar 12
21/05/2014 Brighton Seminar 13
Multiple Links: Different Reasons
21/05/2014 Brighton Seminar 15
Link: skos:closeMatch
Reason: non-salt form
Link: skos:exactMatch
Reason: drug name
Strict Relaxed
Analysing Browsing
Dynamic Equality
21/05/2014 Brighton Seminar 16
skos:exactMatch
(InChI)
Strict Relaxed
Analysing Browsing
Dynamic Equality
21/05/2014 Brighton Seminar 17
skos:closeMatch
(Drug Name)
skos:closeMatch
(Drug Name)
skos:exactMatch
(InChI)
Initial Connectivity
21/05/2014 Brighton Seminar 18
Datasets 37
Linksets 104
Links 7,096,712
Justifications 7
Compound Information
Genes == Proteins?
BRCA1
Breast cancer type 1
susceptibility protein
21/05/2014 Brighton Seminar 20
http://en.wikipedia.org/wiki/File:Pr
otein_BRCA1_PDB_1jm7.png
http://en.wikipedia.org/wiki/File:BRCA1_en.p
ng
Proceed with Caution!
21/05/2014 Brighton Seminar 21
Co-reference Computation
Rules ensure
• Unrestricted transitivity
within conceptual type
• Restrict crossing
conceptual types
Based on justifications
Provenance captured
21/05/2014 Brighton Seminar 22
0..*
0..*
0..*
0..1
0..1
Initial Connectivity
21/05/2014 Brighton Seminar 23
Datasets 37
Linksets 104
Links 7,096,712
Justifications 7
Inferred Connectivity
21/05/2014 Brighton Seminar 24
Datasets 37
Linksets 883
Links 17,383,846
Justifications 7
BridgeDb
21/05/2014 Brighton Seminar 25
http://ops.rsc.org/OPS45975 http://ops.rsc.org/OPS45978
has_isotopically_unspecified_parent
[CHEMINF:000459]
has OPS normalized counterpart
[CHEMINF:000458]
http://ops.rsc.org/OPS45991
is_tautomer_of
[chebi:is_tautomer_of]
http://ops.rsc.org/OPS45987
has_stereoundefined_parent
[CHEMINF:000456]
http://ops.rsc.org/OPS45981
Lenses
OPS Discovery Platform
Nanopub
Db
VoID
Data Cache
(Virtuoso Triple Store)
Semantic Workflow Engine
Linked Data API (RDF/XML, TTL, JSON)
Domain
Specific
Services
Identity
Resolution
Service
Chemistry
Registration
Normalisation
& Q/C
Identifier
Management
Service
Indexing
CorePlatform
P12374
EC2.43.4
CS4532
“Adenosine
receptor 2a”
VoID
Db
Nanopub
Db
VoID
Db
VoID
Nanopub
VoID
Public Content Commercial
Public Ontologies
User
Annotations
Apps
?iri cheminf:logd ?logd .
FILTER (?iri = cw:979b545d-f9a9 ||
?iri = cs:2157 ||
?iri = chembl:1280 ||
?iri = db:db00945 )
cw:979b545d-f9a9 cheminf:logd ?logd .
GRAPH <http://rdf.chemspider.com> {
}
cw:979b545d-f9a9 cheminf:logd ?logd .
Query Expansion
Identity
Mapping Service
(BridgeDB)
Query Expander
Service
Profiles
Mappings
Q, L1 Q’
[cw:979b545d-f9a9,
cs:2157,
chembl:1280,
db:db00945]
cw:979b545d-f9a9, L1
Can also be achieved through UNION
21/05/2014 Brighton Seminar 28
Experiment
Is it feasible to use a stand-off
mapping service?
• Base lines (no external call):
– “Perfect” URIs
– Linked data querying
• Expansion approaches (external service call):
– FILTER by Graph
– UNION by Graph
C. Y. A. Brenninkmeijer, C. A. Goble, A. J. G. Gray, P. T. Groth, A. Loizou, S. Pettifer: Including Co-
referent URIs in a SPARQL Query. COLD 2013.
http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf
21/05/2014 Brighton Seminar 29
“Perfect” URI Baseline
WHERE {
GRAPH <chemspider> {
cs:2157 cheminf:logp ?logp .
}
GRAPH <chembl> {
chembl_mol:m1280 cheminf:mw ?mw .
}
}
21/05/2014 Brighton Seminar 30
Linked Data Baseline
WHERE {
GRAPH <chemspider> {
cs:2157 cheminf:logp ?logp .
}
GRAPH <chembl> {
?chemblid cheminf:mw ?mw .
}
cs:2157 skos:exactMatch ?chemblid .
}
21/05/2014 Brighton Seminar 31
Queries
Drawn from Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)
21/05/2014 Brighton Seminar 32
Queries
Drawn from Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)
21/05/2014 Brighton Seminar 33
Data:
167,783,592 triples
Mappings:
2,114,584 triples
Lenses:
1
Experiment Data
21/05/2014 Brighton Seminar 34
Average execution times
35
Average execution times
0.018
36
Q6: Target Pharmacology
43
Conclusions
• Computing co-reference advantageous
– Requires less raw linksets
– Larger coverage across datasets
• Rules ensure control
– Genes can equal proteins
– Compounds never equal proteins
• Provenance captured throughout
21/05/2014 Brighton Seminar 44
Conclusions
• Query expansion slower in general
– Due to separate service call
– Difference below human perception
– UNION faster than FILTER on Virtuoso
• Stand-off mappings feasible
• Infrastructure can support lenses
21/05/2014 Brighton Seminar 45
Strict Relaxed
Analysing Browsing
Questions
A.J.G.Gray@hw.ac.uk
www.alasdairjggray.co.uk
@gray_alasdair
pmu@openphacts.org
www.openphacts.org
@open_phacts

Más contenido relacionado

La actualidad más candente

Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Alasdair Gray
 
PubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyPubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyChris Southan
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem Sunghwan Kim
 
Capturing BIA-10-2474 and related FAAH inhibitor data
Capturing BIA-10-2474 and related FAAH inhibitor dataCapturing BIA-10-2474 and related FAAH inhibitor data
Capturing BIA-10-2474 and related FAAH inhibitor dataChris Southan
 
PubChem for chemical information literacy training
PubChem for chemical information literacy trainingPubChem for chemical information literacy training
PubChem for chemical information literacy trainingSunghwan Kim
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?Sunghwan Kim
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Sunghwan Kim
 
Data Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionPaul Groth
 
Collaboraive sharing of molecules and data in the mobile age
Collaboraive sharing of molecules and data in the mobile ageCollaboraive sharing of molecules and data in the mobile age
Collaboraive sharing of molecules and data in the mobile ageSean Ekins
 
PubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligencePubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligenceSunghwan Kim
 
Searching for chemical information using PubChem
Searching for chemical information using PubChemSearching for chemical information using PubChem
Searching for chemical information using PubChemSunghwan Kim
 
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...Frederik van den Broek
 
Connecting antimalarial data
Connecting antimalarial dataConnecting antimalarial data
Connecting antimalarial dataChris Southan
 
PubChem and its application for cheminformatics education
PubChem and its application for cheminformatics educationPubChem and its application for cheminformatics education
PubChem and its application for cheminformatics educationSunghwan Kim
 
PubChem and Big Data Chemistry
PubChem and Big Data ChemistryPubChem and Big Data Chemistry
PubChem and Big Data ChemistrySunghwan Kim
 
CINF 55: SureChEMBL: An open patent chemistry resource
CINF 55: SureChEMBL: An open patent chemistry resourceCINF 55: SureChEMBL: An open patent chemistry resource
CINF 55: SureChEMBL: An open patent chemistry resourceGeorge Papadatos
 
USUGM 2014 - Gregory Landrum (Novartis): What else can you do with the Marku...
USUGM 2014 -  Gregory Landrum (Novartis): What else can you do with the Marku...USUGM 2014 -  Gregory Landrum (Novartis): What else can you do with the Marku...
USUGM 2014 - Gregory Landrum (Novartis): What else can you do with the Marku...ChemAxon
 

La actualidad más candente (20)

Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...
 
PubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyPubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biology
 
Searching for patent information in PubChem
Searching for patent information in PubChem Searching for patent information in PubChem
Searching for patent information in PubChem
 
Capturing BIA-10-2474 and related FAAH inhibitor data
Capturing BIA-10-2474 and related FAAH inhibitor dataCapturing BIA-10-2474 and related FAAH inhibitor data
Capturing BIA-10-2474 and related FAAH inhibitor data
 
PubChem for chemical information literacy training
PubChem for chemical information literacy trainingPubChem for chemical information literacy training
PubChem for chemical information literacy training
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?
 
Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...Using open bioactivity data for developing machine-learning prediction models...
Using open bioactivity data for developing machine-learning prediction models...
 
Data Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tensionData Integration vs Transparency: Tackling the tension
Data Integration vs Transparency: Tackling the tension
 
Collaboraive sharing of molecules and data in the mobile age
Collaboraive sharing of molecules and data in the mobile ageCollaboraive sharing of molecules and data in the mobile age
Collaboraive sharing of molecules and data in the mobile age
 
PubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligencePubChem for drug discovery in the age of big data and artificial intelligence
PubChem for drug discovery in the age of big data and artificial intelligence
 
ChEMBL+KNIME
ChEMBL+KNIMEChEMBL+KNIME
ChEMBL+KNIME
 
Searching for chemical information using PubChem
Searching for chemical information using PubChemSearching for chemical information using PubChem
Searching for chemical information using PubChem
 
Sourcing high quality online data resources for computational toxicology
Sourcing high quality online data resources for computational toxicologySourcing high quality online data resources for computational toxicology
Sourcing high quality online data resources for computational toxicology
 
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
Data-driven drug discovery for rare diseases - Tales from the trenches (CINF ...
 
Connecting antimalarial data
Connecting antimalarial dataConnecting antimalarial data
Connecting antimalarial data
 
PubChem and its application for cheminformatics education
PubChem and its application for cheminformatics educationPubChem and its application for cheminformatics education
PubChem and its application for cheminformatics education
 
PubChem and Big Data Chemistry
PubChem and Big Data ChemistryPubChem and Big Data Chemistry
PubChem and Big Data Chemistry
 
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
ChemSpider - Does Community Engagement work to Build a Quality Online Resourc...
 
CINF 55: SureChEMBL: An open patent chemistry resource
CINF 55: SureChEMBL: An open patent chemistry resourceCINF 55: SureChEMBL: An open patent chemistry resource
CINF 55: SureChEMBL: An open patent chemistry resource
 
USUGM 2014 - Gregory Landrum (Novartis): What else can you do with the Marku...
USUGM 2014 -  Gregory Landrum (Novartis): What else can you do with the Marku...USUGM 2014 -  Gregory Landrum (Novartis): What else can you do with the Marku...
USUGM 2014 - Gregory Landrum (Novartis): What else can you do with the Marku...
 

Similar a Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project

Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Valery Tkachenko
 
Semantics and linked data at astra zeneca
Semantics and linked data at astra zenecaSemantics and linked data at astra zeneca
Semantics and linked data at astra zenecaKerstin Forsberg
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataAlasdair Gray
 
Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Kerstin Forsberg
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
Manchester Open Notebook Science Talk
Manchester Open Notebook Science TalkManchester Open Notebook Science Talk
Manchester Open Notebook Science TalkJean-Claude Bradley
 
Transparency in the Data Supply Chain
Transparency in the Data Supply ChainTransparency in the Data Supply Chain
Transparency in the Data Supply ChainPaul Groth
 
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open DataGraph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open DataMaulik Kamdar
 
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-upopen_phacts
 
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Michel Dumontier
 
Open PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeOpen PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeSciBite Limited
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug DiscoverySciBite Limited
 
Open Notebook Science Web Services - ACS Spring 2011
Open Notebook Science Web Services - ACS Spring 2011Open Notebook Science Web Services - ACS Spring 2011
Open Notebook Science Web Services - ACS Spring 2011Jean-Claude Bradley
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...open_phacts
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Maulik Kamdar
 
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS FoundationPistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS FoundationPistoia Alliance
 
IGERT Drexel Open Notebook Science Talk
IGERT Drexel Open Notebook Science TalkIGERT Drexel Open Notebook Science Talk
IGERT Drexel Open Notebook Science TalkJean-Claude Bradley
 
Adressing Research Questions by Integration of Open PHACTS with Pipeline Pilot
Adressing Research Questions by Integration of Open PHACTS with Pipeline PilotAdressing Research Questions by Integration of Open PHACTS with Pipeline Pilot
Adressing Research Questions by Integration of Open PHACTS with Pipeline PilotChristian Herhaus
 

Similar a Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project (20)

Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
 
Semantics and linked data at astra zeneca
Semantics and linked data at astra zenecaSemantics and linked data at astra zeneca
Semantics and linked data at astra zeneca
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry data
 
Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
Manchester Open Notebook Science Talk
Manchester Open Notebook Science TalkManchester Open Notebook Science Talk
Manchester Open Notebook Science Talk
 
Transparency in the Data Supply Chain
Transparency in the Data Supply ChainTransparency in the Data Supply Chain
Transparency in the Data Supply Chain
 
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open DataGraph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
 
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
2015-04-28 Open PHACTS at Swedish Linked Data Network Meet-up
 
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
Powering Scientific Discovery with the Semantic Web (VanBUG 2014)
 
Open PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeOpen PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry Programme
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug Discovery
 
Open Notebook Science Web Services - ACS Spring 2011
Open Notebook Science Web Services - ACS Spring 2011Open Notebook Science Web Services - ACS Spring 2011
Open Notebook Science Web Services - ACS Spring 2011
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...
 
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS FoundationPistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
 
IGERT Drexel Open Notebook Science Talk
IGERT Drexel Open Notebook Science TalkIGERT Drexel Open Notebook Science Talk
IGERT Drexel Open Notebook Science Talk
 
Online Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery SystemsOnline Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery Systems
 
Adressing Research Questions by Integration of Open PHACTS with Pipeline Pilot
Adressing Research Questions by Integration of Open PHACTS with Pipeline PilotAdressing Research Questions by Integration of Open PHACTS with Pipeline Pilot
Adressing Research Questions by Integration of Open PHACTS with Pipeline Pilot
 

Más de Alasdair Gray

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Alasdair Gray
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Alasdair Gray
 
An Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAn Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAlasdair Gray
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesAlasdair Gray
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Alasdair Gray
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceAlasdair Gray
 
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsThe HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsAlasdair Gray
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data TodayAlasdair Gray
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyAlasdair Gray
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data ContextAlasdair Gray
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileAlasdair Gray
 
Data Science meets Linked Data
Data Science meets Linked DataData Science meets Linked Data
Data Science meets Linked DataAlasdair Gray
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingAlasdair Gray
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSAlasdair Gray
 
Computing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsComputing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsAlasdair Gray
 
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...Alasdair Gray
 
Including Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL QueryIncluding Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL QueryAlasdair Gray
 

Más de Alasdair Gray (20)

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
 
An Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAn Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland Project
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life Sciences
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformance
 
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsThe HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data Today
 
Project X
Project XProject X
Project X
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case Study
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
 
Data Linkage
Data LinkageData Linkage
Data Linkage
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community Profile
 
SensorBench
SensorBenchSensorBench
SensorBench
 
Data Science meets Linked Data
Data Science meets Linked DataData Science meets Linked Data
Data Science meets Linked Data
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-being
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLS
 
Computing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsComputing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery Datasets
 
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
 
Including Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL QueryIncluding Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL Query
 

Último

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Último (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project

Notas del editor

  1. 1 of 83 business driver questions
  2. Pharma are all accessing, processing, storing & re-processing external research data OPS: 29 partners
  3. A platform for integrated pharmacology data Relied upon by pharma companies Public domain, commercial, and private data sources Provides domain specific API Making it easy to build multiple drug discovery applications: examples developed in the project
  4. Public launch April 2013
  5. 17 apps 5 external 1 in partnership
  6. Linked data API: multiple response formats (JSON, RDF, XML, CSV …) 3scala deployment Public dataset
  7. Import data into cache API calls populate SPARQL queries Integration approach Data kept in original model Data cached in central triple store API call translated to SPARQL query Query expressed in terms of original data Queries expanded by IMS to cover URIs of original datasets
  8. Example using Explorer application, see Ian’s demo of the new version in the demo session User starts typing Server sends back suggestions – User selects one URI sent to platform Integrated Information returned including provenance
  9. Each captures a subtly different view of the world Are they the same? … depends on your point of view
  10. Example drug: Gleevec Cancer drug for leukemia Lookup in three popular public chemical databases Different results Data is messy!
  11. Enter with ChemSpider URI for Imatinib This is not Gleevec
  12. sameAs != sameAs depends on your point of view Links relate individual data instances: source, target, predicate, reason. Links are grouped into Linksets which have VoID header providing provenance and justification for the link.
  13. Interested in physiochemical properties of Gleevec
  14. Interested in biomedical and pharmacological properties
  15. Can enter with IDs from any of the supported datasets
  16. Platform extracts data from certain datasets These need to be connected Here there is no issue in computing transitive as they are all the same compound based on InChI key Would compute the full set of links
  17. Do genes == proteins? Different conceptual types: gene and protein Often used as a shortcut for retrieval: BRCA1 easier to remember and type! Require the ability to equate them in the IMS ---- But if you’re saying why genes=proteins you may also want to be prepared for questions of when genes!=proteins. Splice variation is a common example, n the FAS receptor: http://en.wikipedia.org/wiki/Alternative_splicing#Exon_definition:_Fas_receptor there is one gene but it can be made into two distinct proteins - which have different biological effects), so you can obviously mix bio data that shouldnt be mixed by integrating these two functions on the same ID. [We currently dont handle this well in OPS] And the most used example here, the ghrelin gene is transcribed into a protein which is cleaved in two to form two completely different hormones, ghrelin and obestatin, which do very different things. But come from the same gene http://en.wikipedia.org/wiki/Ghrelin#Synthesis_and_variants
  18. Insulin Receptor Issue when linking through PDB due to the way that proteins are crystalised
  19. Can enter with IDs from any of the supported datasets
  20. These are 1.3 figures In 1.4 130 raw linksets with 6,985,278 links 40,802 computed linksets with 25,584,293 links
  21. Implementation available IMS takes query and expands URIs
  22. Retinoic Acid
  23. Reminder: enter with method and URI, implemented as a query Challenge: can we efficiently support lenses Lenses require stand-off mappings, implemented as extra service call
  24. Query with URIs Extract URIs Find equivalents Expand query Optimise based on context
  25. Result size in brackets
  26. Orange are actual OPS queries
  27. Subset of the OPS data
  28. Linked data approach performs badly with query 6 due to the query construction Name being bound to the chemical structure returned
  29. Focus on other queries In general expansion is slower than base lines Worst case delta: 0.01842 (under 20ms) Human perception is 0.050 to 0.2 (50 -200ms)
  30. Focus on query 6 No linked data as it performed very poorly on this query Size of result obliterates external call cost