This document describes the LIDDI dataset, a linked dataset of drug-drug interactions with associated provenance information. The dataset combines data from multiple sources on 345 drugs and 10 adverse events. It represents drug-drug interactions as nanopublications with separate graphs for the assertion, provenance, and publication information. The dataset contains over 98,000 nanopublications and provides linkages between individual drug properties and mappings across data sources. It is intended to help prioritize drug-drug interactions found in electronic health records.
Provenance-Centered Dataset of Drug-Drug Interactions
1. Provenance-Centered Dataset of Drug-Drug Interactions
International Semantic Web Conference 2015
Datasets and Ontologies Track
Juan M. Banda 1, Tobias Kuhn 2, Nigam H. Shah 1, Michel Dumontier 1
1 Stanford Center for Biomedical Informatics Research, Stanford, CA;
2 Department of Humanities, Social and Political Sciences, ETH Zurich, Switzerland
2. Motivation
Studies analyzing costs over time have shown that adverse drug reactions (ADRs) cost over $136 billion a year
One significant cause of ADRs is drug-drug interactions (DDIs), which greatly affect older adults because of the multiple drugs they take
4. Our solution
We present LIDDI – a LInked Drug-Drug Interactions
nanopublication-based RDF dataset with trusty URIs
LIDDI combines data from disparate sources while keeping track of their provenance, since the sources differ in individual quality and overall confidence
5. Bonus features
LIDDI provides the linking components to branch from
DDIs into individual properties of each drug via
Drugbank and UMLS, as well as mappings between all
selected sources
6. Why nanopublications and trusty URIs?
A nanopublication consists of an assertion graph with triples expressing an atomic statement, a provenance graph reporting how this assertion came about, and a publication information graph that provides metadata for the nanopublication
We want URIs that are verifiable, immutable, and
permanent. Trusty URIs come with the possibility to
verify with 100% confidence that a retrieved file
represents the correct and original state of the resource
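The verification idea can be illustrated with a minimal sketch. It is a simplification and not the actual trusty URI algorithm: it assumes the artifact is a plain byte string, and the helper names (`content_code`, `make_hashed_uri`, `verify`) and the example.org base URI are hypothetical. A hash of the content is embedded in the URI, so anyone who retrieves the content can recompute the hash and compare.

```python
import base64
import hashlib


def content_code(content: bytes) -> str:
    """Return the SHA-256 digest as URL-safe base64 without padding."""
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_hashed_uri(base_uri: str, content: bytes) -> str:
    """Append the content hash to a base URI (hypothetical helper)."""
    return base_uri + content_code(content)


def verify(uri: str, retrieved: bytes) -> bool:
    """Check that the retrieved bytes match the hash embedded in the URI."""
    return uri.endswith(content_code(retrieved))


# Placeholder content standing in for a serialized nanopublication.
original = b"rosuvastatin interacts with caspofungin"
uri = make_hashed_uri("http://example.org/np/", original)

print(verify(uri, original))                 # content is unchanged
print(verify(uri, original + b" (edited)"))  # content was modified
```

Because the hash is part of the identifier, any change to the content yields a different URI, which is what makes such identifiers immutable in practice.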
7. Sample DDIs
Sample DDI with event
Drug 1: rosuvastatin
Drug 2: caspofungin
Event: rhabdomyolysis
Sample DDI no event
Drug 1: rosuvastatin
Drug 2: caspofungin
9. Data Schema: DDI
We use the drug-drug-interaction term from SIO to represent DDIs, and PROV for provenance
10. Data Schema: Drug and mappings
Contains all mappings between sources
11. How do we derive the event part of a DDI
Used to annotate clinical text at Stanford [1] and for
the DDI descriptions
[1] Iyer SV, Harpaz R, LePendu P, Bauer-Mehren A, Shah NH. Mining clinical text for signals of adverse drug-drug interactions.
J Am Med Inform Assoc. 2014;21(2):353-362.
12. LIDDI: DDIs from EHR [1]
5,983 DDI-Event (x2)
13. LIDDI: DDIs from public sources
Drugs.com: 21,580 DDI-Event (x2)
Drugbank: 2,130 DDI-Event (x2)
FAERS: 871 DDI-Event (x2)
14. LIDDI: DDIs generated by prediction methods (1)
TWOSIDES: 16,145 DDI-Event (x2) [3]
MEDLINE: 74 DDI-Event (x2) [4]
[3] Tatonetti NP, Fernald GH, Altman RB. A novel signal detection algorithm for identifying hidden drug-drug interactions in adverse event reports. J Am Med Inform Assoc 2012;19:79–85.
[4] Avillach P, Dufour JC, Diallo G, Salvo F, Joubert M, Thiessard F, Mougin F, Trifiro G, Fourrier-Reglat A, Pariente A, Fieschi M. Design and validation of an automated method to detect known adverse drug reactions in MEDLINE: a contribution from the EU-ADR project. J Am Med Inform Assoc. 2013;20(3):446–45
15. LIDDI: DDIs generated by prediction methods (2)
INDI: 4,185 DDI no event (x2) [5]
Similarity-based modeling: 112 DDI no event (x2) [6]
Predictive pharmacointeraction networks: 684 DDI no event [7]
[5] Gottlieb A, Stein GY, Oron Y, et al. INDI: a computational framework for inferring drug interactions and their associated recommendations. Mol Syst Biol 2012;8:592.
[6] Vilar S, Uriarte E, Santana L, et al. Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protocols. 2014;9(9):2147-63.
[7] Cami A, Manzi S, Arnold A, Reis BY. Pharmacointeraction Network Models Predict Unknown Drug-Drug Interactions. PLoS ONE. 2013;8(4):e61468.
18. Future Work
Add more drugs to LIDDI
- Over 1,200 drugs are being mapped to add to this resource
Add more data sources
- We have identified and started mapping 3 new data sources
Add additional information to the DDIs
- Some sources provide statistical confidence values for their DDIs
19. Acknowledgements to our data providers
Stanford
- Assaf Gottlieb
Harvard University
- Aurel Cami
- Ben Reis
Columbia University
- Santiago Vilar
- George Hripcsak
This is a very expensive issue that is actively researched, but people rarely combine sources or compare results.
We have more than enough statistically sound and validated methods, each with its own particular focus on where to look for interactions.
We are looking at the one based on EHR data.
By relying on Semantic Web technologies such as RDF -- and more specifically nanopublications -- these provenance paths can be represented in an
explicit and uniform manner.
LIDDI consolidates multiple data sources in one cohesive, linked place, giving researchers immediate access to multiple collections of DDI predictions from public databases and reports, biomedical literature, and methods that take different and sometimes more comprehensive approaches.
Nanopublication:
Provenance graph: data source or generation process
Publication information graph: metadata such as creators and timestamp
We use the PROV ontology to capture the provenance of each DDI by linking the assertion graph URI to a PROV activity. This activity is linked to the software used to generate the drug and event mappings, as well as to the citation for the method, the generation time, and the dataset from which the DDIs were extracted. For the dataset, we also provide a direct link to the original dataset used.
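The graph layout described above can be sketched with plain Python tuples, using only the standard library. This is an illustration under stated assumptions rather than the LIDDI schema itself: the `np:`, `prov:`, and `dcterms:` terms come from the nanopublication, PROV, and Dublin Core vocabularies, while every `ex:` term, the URI layout, the function name `build_nanopub`, and the timestamp are hypothetical placeholders.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def build_nanopub(np_uri: str, drug1: str, drug2: str,
                  event: Optional[str], generated_at: str) -> Dict[str, List[Triple]]:
    """Return the named graphs of one DDI nanopublication as lists of triples."""
    assertion = np_uri + "#assertion"
    provenance = np_uri + "#provenance"
    pubinfo = np_uri + "#pubinfo"
    ddi = np_uri + "#ddi"            # hypothetical node for the interaction
    activity = np_uri + "#activity"  # PROV activity that produced the assertion

    assertion_triples: List[Triple] = [
        (ddi, "rdf:type", "ex:DrugDrugInteraction"),   # placeholder for the SIO term
        (ddi, "ex:hasParticipant", "ex:" + drug1),
        (ddi, "ex:hasParticipant", "ex:" + drug2),
    ]
    if event is not None:  # some DDIs carry no adverse event
        assertion_triples.append((ddi, "ex:hasEvent", "ex:" + event))

    return {
        np_uri + "#head": [
            (np_uri, "rdf:type", "np:Nanopublication"),
            (np_uri, "np:hasAssertion", assertion),
            (np_uri, "np:hasProvenance", provenance),
            (np_uri, "np:hasPublicationInfo", pubinfo),
        ],
        assertion: assertion_triples,
        provenance: [
            # The assertion graph URI is linked to the PROV activity.
            (assertion, "prov:wasGeneratedBy", activity),
            (activity, "rdf:type", "prov:Activity"),
            (activity, "prov:wasAssociatedWith", "ex:mapping-software"),
            (activity, "ex:methodCitation", "ex:method-paper"),
            (activity, "prov:endedAtTime", generated_at),
            (activity, "prov:used", "ex:source-dataset"),
        ],
        pubinfo: [
            (np_uri, "dcterms:creator", "ex:creator"),
            (np_uri, "dcterms:created", generated_at),
        ],
    }


# Sample DDI from the slides; the URI and timestamp are placeholders.
graphs = build_nanopub("http://example.org/np1", "rosuvastatin", "caspofungin",
                       "rhabdomyolysis", "2015-01-01T00:00:00Z")
for name, triples in graphs.items():
    print(name, len(triples))
```

Keeping the provenance triples in their own graph means the same assertion layout can be reused for every source, with only the activity, software, citation, and dataset nodes changing.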
EMR DDIs are mined from at least 100 appearances; this dataset provides adjusted odds ratios for all predictions.
FDA Adverse Event Reporting System (FAERS)
We analyzed over 3.2 million FAERS reports that contain drug-drug-event triples. Our restrictive threshold and drug/event space produced a total of 871 triples.
Drugbank
We examined a total of 2,130 new triples for this analysis.
Drugs.com
We used web crawling agents to mine Drugs.com for their drug-drug interaction data, producing 21,580 triples.
TWOSIDES [3]
Provides 4,651,131 potential DDI associations mined from the FDA’s FAERS using data from the first quarter of 2004 to the first quarter of 2009. We took 16,145 triples from this database.
MEDLINE
We adapted a method by Avillach et al. [4] to find DDIs using MEDLINE queries. We found a total of 74 triples.
INferring Drug Interactions (INDI) [5]
Infers interactions based on their similarity to known adverse events that result from a common metabolizing enzyme (CYP). We used a total of 4,185 unique DDI tuples.
Similarity-based Modeling [6]
Uses drug similarity information extracted from multiple sources, such as 2D and 3D molecular structure, interaction profile, target similarities, and side-effect similarities. Trained on our gold standard, it found 112 DDI tuples.
Predictive Pharmacointeraction Networks [7]
Predicts unknown DDIs by exploiting the network structure of all known DDIs, alongside other intrinsic and taxonomic properties of drugs and adverse events. This method found 684 DDI predictions that match our EHR-based predicted tuples.