Scientific lenses to support multiple views over linked Chemistry data
Alasdair J G Gray 
A.J.G.Gray@hw.ac.uk 
alasdairjggray.co.uk 
@gray_alasdair 
Open PHACTS 
pmu@openphacts.org 
openphacts.org 
@open_phacts
Multiple Identities 
GB:29384 
P12047 
X31045 
21 October 2014 Scientific Lenses – A. J. G. Gray 1
Gleevec®: Imatinib Mesylate
Imatinib
Are these records the same?
It depends upon your task!
Imatinib Mesylate
Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
ChemSpider DrugBank PubChem
Example Use Cases 
I need to perform an analysis, give me details of the active compound in Gleevec.
Which targets are known to interact with Gleevec?
Structure Lens
I need to perform an analysis, give me details of the active compound in Gleevec.
Strict Relaxed
Analysing Browsing
skos:exactMatch (InChI)
Name Lens
Which targets are known to interact with Gleevec?
Strict Relaxed
Analysing Browsing
skos:closeMatch (Drug Name)
skos:exactMatch (InChI)
skos:closeMatch (Drug Name)
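
The strict and relaxed settings on the two lens slides can be read as a choice of which link predicates count as equivalence for a given task. A minimal Python sketch of that idea follows, assuming a lens is simply a set of accepted predicates. The `Mapping` class, the `LENSES` table, and the `ex:` identifiers are hypothetical illustrations, not the Open PHACTS implementation, and which record is linked by which predicate is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    predicate: str  # link predicate, e.g. "skos:exactMatch"
    basis: str      # what the link was computed from, e.g. "InChI"


# Hypothetical lens table: each lens lists the predicates it accepts.
LENSES = {
    "strict": {"skos:exactMatch"},
    "relaxed": {"skos:exactMatch", "skos:closeMatch"},
}


def apply_lens(mappings, lens_name):
    """Keep only the mappings whose predicate the chosen lens accepts."""
    accepted = LENSES[lens_name]
    return [m for m in mappings if m.predicate in accepted]


# Hypothetical sample links (assumed for illustration only).
MAPPINGS = [
    Mapping("ex:gleevec", "ex:imatinib_mesylate", "skos:exactMatch", "InChI"),
    Mapping("ex:gleevec", "ex:imatinib", "skos:closeMatch", "Drug Name"),
]

strict_targets = [m.target for m in apply_lens(MAPPINGS, "strict")]
relaxed_targets = [m.target for m in apply_lens(MAPPINGS, "relaxed")]
```

Under these assumptions, an analysis task would use the strict lens and follow only structure-based links, while a browsing task would use the relaxed lens and also follow name-based links.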
What is a Scientific Lens?
A lens defines a conceptual view over the data
• Specifies operational equivalence conditions
Consists of:
• Identifier (URI)
• Title (dct:title)
• Description (dct:description)
• Documentation link (dcat:landingPage)
• Creator (pav:createdBy)
• Timestamp (pav:createdOn)
• Equivalence rules (bdb:linksetJustification)
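
The components listed on this slide can be pictured as a small record type. The sketch below is one possible Python representation, assuming each component maps to a single property; only the prefixed property names come from the slide. The class name, field names, URIs, date, and justification values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LensDescription:
    uri: str                # Identifier (URI)
    title: str              # dct:title
    description: str        # dct:description
    landing_page: str       # dcat:landingPage
    created_by: str         # pav:createdBy
    created_on: datetime    # pav:createdOn
    justifications: list    # bdb:linksetJustification (one per equivalence rule)

    def to_properties(self):
        """Return (property, value) pairs using the prefixed names from the slide."""
        pairs = [
            ("dct:title", self.title),
            ("dct:description", self.description),
            ("dcat:landingPage", self.landing_page),
            ("pav:createdBy", self.created_by),
            ("pav:createdOn", self.created_on.isoformat()),
        ]
        pairs += [("bdb:linksetJustification", j) for j in self.justifications]
        return pairs


# Hypothetical example values; none of these are taken from the slides.
example = LensDescription(
    uri="http://example.org/lens/stereoisomer",
    title="Stereoisomer lens",
    description="Treats stereoisomers of a compound as equivalent.",
    landing_page="http://example.org/docs/lenses",
    created_by="http://example.org/people/curator",
    created_on=datetime(2014, 10, 21),
    justifications=["ex:justification_a", "ex:justification_b"],
)
properties = example.to_properties()
```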
Lens Effects: Ibuprofen 
Ibuprofen consists of two equally active stereoisomers. 
• Stereoisomers not always represented in data 
Users wish to retrieve information for any stereoisomer. 
CHEMBL427526 
CHEMBL521 
CHEMBL175 
Default Lens 
Stereoisomer Lens 
Mapping Generation
ops:OPS437281
has_stereoundefined_parent [ci:CHEMINF_000456]
ops:OPS380297
is_stereoisomer_of [ci:CHEMINF_000461]
ops:OPS380292
Other relationships
• has part
• is tautomer of
• uncharged counterpart
• isotope
• …
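
One way to picture how links carrying a justification turn into lens-dependent equivalence sets is a union-find pass over the links the lens accepts. The Python sketch below assumes that a lens is a set of accepted justification codes and that a default lens accepts none of the stereo relationships. The `ex:` identifiers and the link topology are hypothetical; only the two CHEMINF codes come from the slide.

```python
def equivalence_classes(links, accepted):
    """Group identifiers connected by links whose justification is accepted."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, justification in links:
        root_a, root_b = find(a), find(b)  # registers both ends even if rejected
        if justification in accepted:
            parent[root_a] = root_b

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical links: (identifier, identifier, justification code).
LINKS = [
    ("ex:ibuprofen", "ex:ibuprofen_R", "ci:CHEMINF_000456"),
    ("ex:ibuprofen_R", "ex:ibuprofen_S", "ci:CHEMINF_000461"),
]

default_view = equivalence_classes(LINKS, accepted=set())
stereo_view = equivalence_classes(
    LINKS, accepted={"ci:CHEMINF_000456", "ci:CHEMINF_000461"}
)
```

With no stereo justifications accepted, each identifier stays in its own class; accepting both codes merges the three identifiers into one class, which mirrors the difference between the default and stereoisomer lenses for ibuprofen.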
Explorer Screenshot 
OPS Discovery Platform
[Architecture diagram; recoverable labels:]
• Apps
• Linked Data API (RDF/XML, TTL, JSON)
• Semantic Workflow Engine
• Data Cache (Virtuoso Triple Store)
• Domain Specific Services
• Identity Resolution Service
• Chemistry Registration, Normalisation & Q/C
• Identifier Management Service
• Indexing
• Core Platform
• Example identifiers: “Adenosine receptor 2a”, P12374, EC2.43.4, CS4532
• Data sources: Public Content, Commercial, Public Ontologies, User Annotations (labelled VoID, Nanopub, Db)
Lenses: Under the hood
GRAPH <http://rdf.chemspider.com> {
  cw:979b545d-f9a9 cheminf:logd ?logd .
  ?iri cheminf:logd ?logd .
  FILTER (?iri = cw:979b545d-f9a9 ||
          ?iri = cs:2157 ||
          ?iri = chembl:1280 ||
          ?iri = db:db00945 )
}
GRAPH <http://…
[Diagram]
Query Expander Service: Q, L1 → Q’
Identity Mapping Service (BridgeDB), with Mappings and Profiles: cw:979b545d-f9a9, L1 → [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945]
• IMS (Identity Mapping Service) call adds overhead
• Call time below human perception [1]
• Can also be achieved through UNION
[1] C. Y. A. Brenninkmeijer, C. Goble, A. J. G. Gray, P. Groth, A. Loizou, and S. Pettifer, “Including Co-referent URIs in a SPARQL Query,” COLD2013, http://ceur-ws.org/Vol-1034/
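
The expansion step on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Query Expander Service itself: the function names are hypothetical, the list of equivalent IRIs is passed in directly rather than fetched from the IMS, and plain string replacement stands in for proper SPARQL parsing. The IRIs and the triple pattern are the ones shown on the slide.

```python
def expand_with_filter(pattern, iri, equivalents, var="?iri"):
    """Replace `iri` in a triple pattern with a variable constrained by FILTER."""
    conditions = " ||\n        ".join(f"{var} = {e}" for e in equivalents)
    return f"{pattern.replace(iri, var)}\nFILTER ({conditions})"


def expand_with_union(pattern, iri, equivalents):
    """Alternative: one copy of the pattern per equivalent IRI, joined by UNION."""
    branches = ["{ " + pattern.replace(iri, e) + " }" for e in equivalents]
    return "\nUNION\n".join(branches)


PATTERN = "cw:979b545d-f9a9 cheminf:logd ?logd ."
# Equivalent IRIs as returned for (cw:979b545d-f9a9, L1) on the slide.
EQUIVALENTS = ["cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945"]

filter_form = expand_with_filter(PATTERN, "cw:979b545d-f9a9", EQUIVALENTS)
union_form = expand_with_union(PATTERN, "cw:979b545d-f9a9", EQUIVALENTS)
print(filter_form)
print(union_form)
```

The FILTER form keeps a single triple pattern and lists the alternatives as a disjunction, while the UNION form repeats the pattern once per IRI. Which performs better is likely to depend on the triple store.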
API Hits 
April 2013 – March 2014: 15.8 million
April 2014 – Sept 2014: 14 million
Total: 29.8 million
Conclusions
• Scientific data is complex and messy
• Requires flexibility in linking
• Equivalence depends upon context
• Lenses provide support for operational equivalence
• Chemical structures support automatic computing of links with justification
Co-authors
Royal Society of Chemistry
• Colin Batchelor
• Karen Karapetyan
• Jon Steele
• Valery Tkachenko
• Antony Williams
University of Manchester
• Christian Brenninkmeijer
• Ian Dunlop
• Carole Goble
• Steve Pettifer
• Robert Stevens
Swiss Institute of Bioinformatics
• Christine Chichester
European Bioinformatics Institute
• Mark Davies
• Anna Gaulton
• John Overington
University of Vienna
• Daniela Digles
Maastricht University
• Chris Evelo
• Andra Waagmeester
• Egon Willighagen
VU University of Amsterdam
• Paul Groth
• Antonis Loizou
Connected Discovery
• Lee Harland
Questions 
Demo at stall 33 this evening! 
Open PHACTS Data
Source          Initial Records   Triples          Properties
ChEMBL          1,481,473         304,360,749      77
DrugBank        19,628            517,584          74
UniProt         564,246           405,473,138      82
ENZYME          6,187             73,838           2
ChEBI           40,575            1,673,863        2
GeneOntology    38,137            2,447,682        26
GOA             661,232           1,765,622,393    15
ChemSpider      1,361,568         215,193,441      23
ConceptWiki     2,828,966         4,291,131        1
WikiPathways    946               1,949,074        34
App Ecosystem: An “App Store”?
Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium
MOE Collector Cytophacts Utopia Garfield SciBite
KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna
http://www.openphactsfoundation.org/apps.html
Discovery Platform
Apps
Method Calls
Domain API
Drug Discovery Platform
Interactive responses
Production quality integration platform
Linked Data API 
Drug, Target, Pathway, Disease (1.4)
https://dev.openphacts.org/

  • 3. Gleevec®: Imatinib Mesylate. Labels: Imatinib; Imatinib Mesylate; Mesylate. InChIKey: YLMAHDNUQAMNNX-UHFFFAOYSA-N. Sources: ChemSpider, DrugBank, PubChem.
  • 4. Gleevec®: Imatinib Mesylate. Are these records the same? It depends upon your task! Labels: Imatinib; Imatinib Mesylate; Mesylate. InChIKey: YLMAHDNUQAMNNX-UHFFFAOYSA-N. Sources: ChemSpider, DrugBank, PubChem.
  • 5. Example Use Cases. “I need to perform an analysis, give me details of the active compound in Gleevec.” “Which targets are known to interact with Gleevec?”
  • 6. Structure Lens. Use case: “I need to perform an analysis, give me details of the active compound in Gleevec.” Scale: Strict (analysing) to Relaxed (browsing). Links used: skos:exactMatch (InChI).
  • 7. Name Lens. Use case: “Which targets are known to interact with Gleevec?” Scale: Strict (analysing) to Relaxed (browsing). Links used: skos:exactMatch (InChI) and skos:closeMatch (Drug Name).
  • 8. What is a Scientific Lens? A lens defines a conceptual view over the data and specifies operational equivalence conditions. It consists of: Identifier (URI); Title (dct:title); Description (dct:description); Documentation link (dcat:landingPage); Creator (pav:createdBy); Timestamp (pav:createdOn); Equivalence rules (bdb:linksetJustification).
  • 9. Lens Effects: Ibuprofen. Ibuprofen consists of two equally active stereoisomers. Stereoisomers are not always represented in the data. Users wish to retrieve information for any stereoisomer. Records: CHEMBL427526, CHEMBL521, CHEMBL175.
  • 10. Default Lens. Ibuprofen consists of two equally active stereoisomers. Stereoisomers are not always represented in the data. Users wish to retrieve information for any stereoisomer.
  • 11. Stereoisomer Lens. Ibuprofen consists of two equally active stereoisomers. Stereoisomers are not always represented in the data. Users wish to retrieve information for any stereoisomer.
  • 12. Mapping Generation. ✔ ops:OPS437281 has_stereoundefined_parent [ci:CHEMINF_000456] ops:OPS380297 is_stereoisomer_of [ci:CHEMINF_000461] ops:OPS380292. Other relationships: has part; is tautomer of; uncharged counterpart; isotope; …
  • 13. Explorer Screenshot
  • 14. Explorer Screenshot
  • 15. OPS Discovery Platform (architecture diagram). Components: Apps; Linked Data API (RDF/XML, TTL, JSON); Semantic Workflow Engine; Domain Specific Services; Identity Resolution Service; Identifier Management Service; Chemistry Registration, Normalisation & Q/C; Indexing; Data Cache (Virtuoso Triple Store); Core Platform. Content: Public Content, Commercial, Public Ontologies, User Annotations (labelled Db, VoID, Nanopub). Example identifiers: “Adenosine receptor 2a”, P12374, EC2.43.4, CS4532.
  • 16. Lenses: Under the hood. Diagram: (Q, L1) → Query Expander Service → Q’. The Query Expander Service sends (cw:979b545d-f9a9, L1) to the Identity Mapping Service (BridgeDB; Mappings, Profiles) and receives [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945]. Example: within GRAPH <http://rdf.chemspider.com> { … }, the pattern cw:979b545d-f9a9 cheminf:logd ?logd . becomes ?iri cheminf:logd ?logd . FILTER (?iri = cw:979b545d-f9a9 || ?iri = cs:2157 || ?iri = chembl:1280 || ?iri = db:db00945 ) and the query continues with GRAPH <http://… Notes: the IMS call adds overhead; call time is below human perception [1]; the same effect can also be achieved through UNION. [1] C. Y. A. Brenninkmeijer, C. Goble, A. J. G. Gray, P. Groth, A. Loizou, and S. Pettifer, “Including Co-referent URIs in a SPARQL Query,” COLD2013, http://ceur-ws.org/Vol-1034/
  • 17. API Hits. April 2013 – March 2014: 15.8m. April 2014 – Sept 2014: 14m. Total: 29.8 million.
  • 18. Conclusions. Scientific data is complex and messy. It requires flexibility in linking. Equivalence depends upon context. Lenses provide support for operational equivalence. Chemical structures support automatic computing of links with justification.
  • 19. Co-authors. Royal Society of Chemistry: Colin Batchelor, Karen Karapetyan, Jon Steele, Valery Tkachenko, Antony Williams. University of Manchester: Christian Brenninkmeijer, Ian Dunlop, Carole Goble, Steve Pettifer, Robert Stevens. Swiss Institute for Bioinformatics: Christine Chichester. European Bioinformatics Institute: Mark Davies, Anna Gaulton, John Overington. University of Vienna: Daniela Digles. Maastricht University: Chris Evelo, Andra Waagmeester, Egon Willighagen. VU University of Amsterdam: Paul Groth, Antonis Loizou. Connected Discovery: Lee Harland.
  • 20. Questions. Alasdair J G Gray, A.J.G.Gray@hw.ac.uk, alasdairjggray.co.uk, @gray_alasdair. Open PHACTS, pmu@openphacts.org, openphacts.org, @open_phacts. Demo at stall 33 this evening!
  • 21. Open PHACTS Data
       Source        Initial Records        Triples  Properties
       ChEMBL              1,481,473    304,360,749          77
       DrugBank               19,628        517,584          74
       UniProt               564,246    405,473,138          82
       ENZYME                  6,187         73,838           2
       ChEBI                  40,575      1,673,863           2
       GeneOntology           38,137      2,447,682          26
       GOA                   661,232  1,765,622,393          15
       ChemSpider          1,361,568    215,193,441          23
       ConceptWiki         2,828,966      4,291,131           1
       WikiPathways              946      1,949,074          34
  • 22. App Ecosystem: An “App Store”? Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium MOE Collector Cytophacts Utopia Garfield SciBite KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna. http://www.openphactsfoundation.org/apps.html
  • 23. Discovery Platform (diagram). Labels: Apps; Method Calls; Domain API; Drug Discovery Platform; Interactive responses; Production quality integration platform.
  • 24. Linked Data API: Drug, Target, Pathway, Disease (1.4). https://dev.openphacts.org/
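
The query expansion shown on slide 16 can be sketched in Python. This is only an illustration under stated assumptions, not the Open PHACTS implementation: the dictionary `MAPPINGS` is a hypothetical in-memory stand-in for the Identity Mapping Service, the function names `equivalents` and `expand_pattern` are invented for this sketch, and the only data used is the set of equivalent URIs quoted on the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the Identity Mapping Service:
# (URI, lens) -> equivalent URIs. The L1 entry copies the example on slide 16.
MAPPINGS: Dict[Tuple[str, str], List[str]] = {
    ("cw:979b545d-f9a9", "L1"): [
        "cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945",
    ],
}


def equivalents(uri: str, lens: str) -> List[str]:
    """Return the URIs treated as equivalent to `uri` under `lens`.

    Falls back to the URI itself when no mapping is known (an assumption).
    """
    return MAPPINGS.get((uri, lens), [uri])


def expand_pattern(uri: str, predicate: str, obj: str, lens: str,
                   var: str = "?iri") -> str:
    """Rewrite `uri predicate obj .` so that it matches any equivalent URI."""
    alternatives = " || ".join(f"{var} = {u}" for u in equivalents(uri, lens))
    return f"{var} {predicate} {obj} . FILTER ({alternatives})"


expanded = expand_pattern("cw:979b545d-f9a9", "cheminf:logd", "?logd", "L1")
print(expanded)
```

Choosing a different lens only changes the list returned by `equivalents`; the rewritten pattern keeps the same shape, which is why the lens can be passed alongside the query rather than baked into it.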

Editor's notes

  1. A concept appears in multiple datasets, each with its own identifier. This talk is about supporting the multiple identities that exist. Rather than define a single approach, we want to support the use of multiple identifiers.
  2. Example drug: Gleevec, a cancer drug for leukemia. Looking it up in three popular public chemical databases gives different results.
  3. Are these records the same? It depends on what you are doing with the data! Each captures a subtly different view of the world. Data is messy!
  4. Analysis requires precise knowledge of the form of the compound across datasets. Finding targets is a search activity; some entries are likely to be mis-entered.
  5. Interested in physicochemical properties of Gleevec.
  6. Interested in biomedical and pharmacological properties. sameAs != sameAs: it depends on your point of view. Links relate individual data instances: source, target, predicate, reason. Links are grouped into linksets, which have a VoID header providing provenance and justification for the link.
  7. Validate structure: source data is messy! Identify common problems: charge imbalance, stereochemistry. Compute physicochemical properties. Identify related properties based on structure. 17 relationship types.
  8. Pharmacology count: 2370 → 3044.
  9. Import data into cache. API calls populate SPARQL queries. Integration approach: data kept in original model; data cached in central triple store; API call translated to SPARQL query; query expressed in terms of original data; queries expanded by IMS to cover URIs of original datasets.
  10. Query with URIs: extract URIs, find equivalents, expand query. Optimise based on context.
  11. OPS Discovery Platform is actively being used. Lenses are under active evaluation and refinement within the OPS consortium.
  12. Statistics to be added. 1,030,727,289 triples. Hosted on beefy hardware; data in memory (aim).
  13. A platform for integrated pharmacology data. Relied upon by pharma companies. Public domain, commercial, and private data sources. Provides a domain specific API, making it easy to build multiple drug discovery applications: examples developed in the project.
  14. Linked data API: multiple response formats (JSON, RDF, XML, CSV …). 3scale deployment. Public dataset.
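
The link model in note 6 (source, target, predicate, reason) and the Structure and Name lenses from slides 6 and 7 can be sketched as follows. This is a minimal illustration under assumptions, not the Open PHACTS data model: the identifiers in `LINKS` are placeholders, the `LENSES` table and the function `equivalent_ids` are invented names, and treating accepted links as symmetric and transitive is a simplification made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    predicate: str
    reason: str  # justification for the link, e.g. "InChI" or "Drug Name"


# Illustrative links only; the identifiers are placeholders, not real records.
LINKS: List[Link] = [
    Link("a:1", "b:1", "skos:exactMatch", "InChI"),
    Link("b:1", "c:1", "skos:closeMatch", "Drug Name"),
]

# Hypothetical lens definitions: each lens names the justifications it accepts.
LENSES: Dict[str, FrozenSet[str]] = {
    "structure": frozenset({"InChI"}),
    "name": frozenset({"InChI", "Drug Name"}),
}


def equivalent_ids(start: str, lens: str) -> Set[str]:
    """Collect every identifier reachable from `start` via links the lens accepts."""
    accepted = LENSES[lens]
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for link in LINKS:
            if link.reason not in accepted:
                continue
            # Assumption of this sketch: links are followed in both directions.
            if link.source == current:
                neighbour = link.target
            elif link.target == current:
                neighbour = link.source
            else:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return seen


print(sorted(equivalent_ids("a:1", "structure")))
print(sorted(equivalent_ids("a:1", "name")))
```

Under the stricter structure lens only the InChI-justified link is followed, while the more relaxed name lens also follows the drug-name link, so the same starting identifier yields a larger set of equivalent records.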