Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project

Alasdair J G Gray
A.J.G.Gray@hw.ac.uk
www.alasdairjggray.co.uk
@gray_alasdair
http://c745.r45.cf2.rackcdn.com/img/2009/lens_filter_coasters.jpg

Open PHACTS Use Case
“Let me compare MW, logP and PSA for launched inhibitors of human & mouse oxidoreductases”
• Chemical Properties (ChemSpider)
• Launched drugs (DrugBank)
• Human => Mouse (HomoloGene)
• Protein Families (Enzyme)
• Bioactivity Data (ChEMBL)
• … other info (UniProt/Entrez etc.)
21/05/2014 Brighton Seminar 1

[Diagram] Sources: Literature, PubChem, Genbank, Patents, Databases, Downloads
Data Integration, Data Analysis
Firewalled Databases
Repeat @ each company
Lowering industry firewalls: pre-competitive informatics in drug discovery
Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944
A single, shared solution.
Funded under
• IMI: 2011-14
• ENSO: 2014-16
Pre-competitive Informatics

Open PHACTS Discovery Platform
Drug Discovery Platform
Apps
Domain API
Interactive responses
Production quality integration platform
Method Calls

API Hits
(April 2013 – March 2014)
15.8 million total hits

An “App Store”?
http://www.openphactsfoundation.org/apps.html
Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium
MOE Collector Cytophacts Utopia Garfield SciBite
KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna

Drug
Disease
Pathway
Target
https://dev.openphacts.org/
Linked Data API

OPS Discovery Platform
[Architecture diagram, top to bottom]
Apps
Linked Data API (RDF/XML, TTL, JSON)
Domain Specific Services
Core Platform:
• Semantic Workflow Engine
• Identity Resolution Service
• Identifier Management Service
• Chemistry Registration, Normalisation & Q/C
• Indexing
Data Cache (Virtuoso Triple Store)
Data sources (Db, VoID, Nanopub): Public Content, Commercial, Public Ontologies, User Annotations
Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”

Multiple Identities
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
P12047
X31045
GB:29384
Are these the same thing?

Gleevec® = Imatinib Mesylate
Drugbank ChemSpider PubChem
Imatinib
Mesylate Imatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N

Multiple Links: Different Reasons
Link: skos:closeMatch
Reason: non-salt form
Link: skos:exactMatch
Reason: drug name

Dynamic Equality
Strict (Analysing) to Relaxed (Browsing)
skos:exactMatch (InChI)

Dynamic Equality
Strict (Analysing) to Relaxed (Browsing)
skos:closeMatch (Drug Name)
skos:closeMatch (Drug Name)
skos:exactMatch (InChI)
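
One way to read the Dynamic Equality slides is that a lens selects which link justifications count as equality for the task at hand. The following is a minimal Python sketch of that reading only. The lens names, the `Link` type, the `equivalents` function, and the identifiers are all hypothetical placeholders, not the Open PHACTS implementation or real records.

```python
from typing import NamedTuple


class Link(NamedTuple):
    source: str
    target: str
    predicate: str      # e.g. skos:exactMatch or skos:closeMatch
    justification: str  # why the link was asserted


# Hypothetical lenses: each lists the justifications it accepts.
LENSES = {
    "strict": {"InChI"},
    "relaxed": {"InChI", "Drug Name"},
}

# Toy links with placeholder identifiers (not real records).
LINKS = [
    Link("a:1", "b:1", "skos:exactMatch", "InChI"),
    Link("a:1", "c:1", "skos:closeMatch", "Drug Name"),
    Link("b:1", "c:2", "skos:closeMatch", "Drug Name"),
]


def equivalents(iri: str, lens: str) -> set:
    """Return `iri` plus every IRI directly linked to it under the lens."""
    accepted = LENSES[lens]
    found = {iri}
    for link in LINKS:
        if link.justification not in accepted:
            continue  # this lens does not treat the link as equality
        if link.source == iri:
            found.add(link.target)
        elif link.target == iri:
            found.add(link.source)
    return found
```

Under these assumptions, the strict lens returns only the InChI-justified match for `a:1`, while the relaxed lens also returns the drug-name match.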

Initial Connectivity
Datasets 37
Linksets 104
Links 7,096,712
Justifications 7

Genes == Proteins?
BRCA1
Breast cancer type 1 susceptibility protein
http://en.wikipedia.org/wiki/File:Protein_BRCA1_PDB_1jm7.png
http://en.wikipedia.org/wiki/File:BRCA1_en.png

Proceed with Caution!

Co-reference Computation
Rules ensure
• Unrestricted transitivity within conceptual type
• Restrict crossing conceptual types
Based on justifications
Provenance captured
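
The two rules above can be sketched as a small closure computation. This is a minimal illustration under one assumed reading of the rules: links between IRIs of the same conceptual type are chained transitively, while links that cross types are kept as asserted but never chained through. The function `infer_links` and the toy identifiers are hypothetical, and the real rules are also driven by justifications, which this sketch omits.

```python
from collections import defaultdict


def infer_links(links, type_of):
    """links: iterable of (a, b) pairs; type_of: dict of IRI -> conceptual type.

    Returns a set of frozenset({a, b}) pairs: the transitive closure within
    each type, plus the asserted cross-type links unchanged.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    cross_type = []
    for a, b in links:
        if type_of[a] == type_of[b]:
            parent[find(a)] = find(b)  # same type: merge the groups
        else:
            find(a)
            find(b)
            cross_type.append((a, b))  # different types: keep, do not merge

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)

    inferred = set()
    for members in groups.values():
        ordered = sorted(members)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                inferred.add(frozenset((a, b)))
    for a, b in cross_type:
        inferred.add(frozenset((a, b)))
    return inferred
```

With three gene IRIs chained together and one of them linked to a protein IRI, the sketch links all three genes to each other but links only the asserted gene to the protein.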

Connectivity     Initial      Inferred
Datasets         37           37
Linksets         104          883
Links            7,096,712    17,383,846
Justifications   7            7
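
As a quick check on the figures above, the growth from initial to inferred connectivity can be worked out directly. The short Python sketch below uses only the numbers on the slides; the variable names are illustrative.

```python
# Figures taken from the Initial and Inferred Connectivity slides.
initial = {"linksets": 104, "links": 7_096_712}
inferred = {"linksets": 883, "links": 17_383_846}

# Growth factor per measure: inferred count divided by initial count.
growth = {key: inferred[key] / initial[key] for key in initial}

for key, factor in growth.items():
    print(f"{key}: x{factor:.2f}")
```

Computing co-reference therefore yields about 8.5 times as many linksets and about 2.45 times as many links, with the same 37 datasets and 7 justifications.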

BridgeDb

Lenses
Nodes: http://ops.rsc.org/OPS45975, http://ops.rsc.org/OPS45978, http://ops.rsc.org/OPS45991
Link types:
• has_isotopically_unspecified_parent [CHEMINF:000459]
• has OPS normalized counterpart [CHEMINF:000458]
• is_tautomer_of [chebi:is_tautomer_of]
• has_stereoundefined_parent [CHEMINF:000456]

Query Expansion
Input query Q, with lens L1:
cw:979b545d-f9a9 cheminf:logd ?logd .
Expanded query Q’:
GRAPH <http://rdf.chemspider.com> {
  ?iri cheminf:logd ?logd .
  FILTER (?iri = cw:979b545d-f9a9 ||
          ?iri = cs:2157 ||
          ?iri = chembl:1280 ||
          ?iri = db:db00945 )
}
Components:
• Query Expander Service: takes (Q, L1), returns Q’
• Identity Mapping Service (BridgeDB), holding Profiles and Mappings: takes (cw:979b545d-f9a9, L1), returns [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945]
Can also be achieved through UNION
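
The expansion step can be sketched as plain string rewriting. The Python below is an illustrative sketch, not the Query Expander Service itself: `lookup_equivalents` is a hypothetical stand-in for the call to the mapping service (it is not the BridgeDB API), and its table holds only the example IRIs from the slide. The two functions show the FILTER form and the UNION form of the same expansion.

```python
# Hypothetical stand-in for the mapping service: (IRI, lens) -> co-referent IRIs.
MAPPINGS = {
    ("cw:979b545d-f9a9", "L1"): [
        "cw:979b545d-f9a9",
        "cs:2157",
        "chembl:1280",
        "db:db00945",
    ],
}


def lookup_equivalents(iri, lens):
    """Return the co-referent IRIs for `iri` under `lens` (itself if unknown)."""
    return MAPPINGS.get((iri, lens), [iri])


def expand_filter(iri, predicate, var, graph, lens):
    """Replace the subject with a variable constrained by a FILTER."""
    tests = " || ".join(f"?iri = {e}" for e in lookup_equivalents(iri, lens))
    return (
        f"GRAPH <{graph}> {{\n"
        f"  ?iri {predicate} ?{var} .\n"
        f"  FILTER ({tests})\n"
        f"}}"
    )


def expand_union(iri, predicate, var, graph, lens):
    """Emit one triple pattern per co-referent IRI, joined by UNION."""
    branches = "\n  UNION\n".join(
        f"  {{ {e} {predicate} ?{var} . }}" for e in lookup_equivalents(iri, lens)
    )
    return f"GRAPH <{graph}> {{\n{branches}\n}}"


if __name__ == "__main__":
    args = ("cw:979b545d-f9a9", "cheminf:logd", "logd",
            "http://rdf.chemspider.com", "L1")
    print(expand_filter(*args))
    print(expand_union(*args))
```

Both forms return the same bindings for `?logd`; which one runs faster depends on the triple store.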

Experiment
Is it feasible to use a stand-off mapping service?
• Baselines (no external call):
– “Perfect” URIs
– Linked data querying
• Expansion approaches (external service call):
– FILTER by Graph
– UNION by Graph
C. Y. A. Brenninkmeijer, C. A. Goble, A. J. G. Gray, P. T. Groth, A. Loizou, S. Pettifer: Including Co-referent URIs in a SPARQL Query. COLD 2013.
http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf

“Perfect” URI Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    chembl_mol:m1280 cheminf:mw ?mw .
  }
}

Linked Data Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    ?chemblid cheminf:mw ?mw .
  }
  cs:2157 skos:exactMatch ?chemblid .
}
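
To make the difference between the two baselines concrete, the sketch below evaluates both over a tiny in-memory quad list in plain Python. It is an assumption-laden toy: the store, the helper `objects`, and the property values (deliberately written as placeholder strings) are invented for illustration, and the mapping triple is placed in the default graph (modelled as `None`) because it sits outside any GRAPH clause in the query.

```python
# Toy quad store: (graph, subject, predicate, object).
# Property values are placeholders, not real measurements.
QUADS = [
    ("chemspider", "cs:2157", "cheminf:logp", "LOGP_PLACEHOLDER"),
    ("chembl", "chembl_mol:m1280", "cheminf:mw", "MW_PLACEHOLDER"),
    (None, "cs:2157", "skos:exactMatch", "chembl_mol:m1280"),
]


def objects(graph, subject, predicate):
    """Return all objects matching the given graph, subject and predicate."""
    return [o for g, s, p, o in QUADS
            if g == graph and s == subject and p == predicate]


def perfect_uri_baseline():
    # Each graph is queried with the IRI it uses natively; no join is needed.
    logps = objects("chemspider", "cs:2157", "cheminf:logp")
    mws = objects("chembl", "chembl_mol:m1280", "cheminf:mw")
    return [(logp, mw) for logp in logps for mw in mws]


def linked_data_baseline():
    # The ChEMBL IRI is discovered by following skos:exactMatch at query time.
    logps = objects("chemspider", "cs:2157", "cheminf:logp")
    results = []
    for chembl_id in objects(None, "cs:2157", "skos:exactMatch"):
        for mw in objects("chembl", chembl_id, "cheminf:mw"):
            results.extend((logp, mw) for logp in logps)
    return results
```

Both functions return the same rows here; the linked data version simply pays for one extra join over the mapping triple.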

Queries
Drawn from Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)

Experiment Data
Data: 167,783,592 triples
Mappings: 2,114,584 triples
Lenses: 1

Average execution times
[Chart not reproduced; surviving axis labels: 0.018, 36]

Conclusions
• Computing co-reference advantageous
– Requires fewer raw linksets
– Larger coverage across datasets
• Rules ensure control
– Genes can equal proteins
– Compounds never equal proteins
• Provenance captured throughout

Conclusions
• Query expansion slower in general
– Due to separate service call
– Difference below human perception
– UNION faster than FILTER on Virtuoso
• Stand-off mappings feasible
• Infrastructure can support lenses
Strict (Analysing) to Relaxed (Browsing)

Questions
A.J.G.Gray@hw.ac.uk
www.alasdairjggray.co.uk
@gray_alasdair
pmu@openphacts.org
www.openphacts.org
@open_phacts
