This document provides information about a datathon focused on drug repurposing for rare diseases. It discusses:
1) The goal of the datathon: to use data science approaches such as machine learning to identify potential drug repurposing opportunities for rare diseases.
2) The partners involved - The Pistoia Alliance, Cures Within Reach, and Mission: Cure.
3) How participants can access data on the Entellect platform and work collaboratively on the datathon challenge to find new treatments for rare diseases.
1. Datathon for Drug Repurposing for Rare Diseases
1st October 2018
Dr. Jabe Wilson, Consulting Director, Text and Data Analytics
Introduction to the Datathon
2. Datathon for Drug Repurposing for Rare Diseases
• Rare Diseases and Repurposing
• The Partners
• Why a Datathon - the Datathon approach
• The data to be made available
• Examples of what can be done
• How do you participate – what to expect
• Schedule
• Take part
02.10.2018
5. Rare Diseases and Repurposing
Chronic pancreatitis
• Chronic pancreatitis involves the progressive and permanent destruction of the pancreas, often resulting in exocrine and endocrine insufficiency and chronic pain. The cause is often multifactorial, usually involving structural or genetic abnormalities in children.
Meniere's disease
• Meniere's disease (i.e., endolymphatic hydrops) is an idiopathic condition of the membranous labyrinth. It is characterized by spontaneous bouts of prolonged vertigo, fluctuating hearing loss and tinnitus. Histologically, the amount of endolymph within the scala media is excessive. The disease primarily affects adults between the ages of 30 and 60 years and is somewhat more common in women.
Retinitis pigmentosa
• Retinitis pigmentosa is a group of hereditary disorders characterized by progressive deterioration of vision due to dysfunction, cell loss and eventual atrophy of the retina. Night blindness is usually the presenting feature, with gradual deterioration of retinal rods and cones.
Obsessive-compulsive disorder
• Obsessive-compulsive disorder (OCD) is characterized by the presence of either obsessions (uncontrollably recurring and intrusive thoughts) or compulsions (uncontrollably recurring needs to repeat behavior), but commonly both. The symptoms can cause significant functional impairment and/or distress.
9. The partners
• The Pistoia Alliance: AI/ML Centre of Excellence
• Cures Within Reach: Supporting Repurposing to Clinical Trials
• Mission: Cure: A new financing model for curing disease (based on patient outcomes)
11. Why a Datathon - the Datathon approach
• Hackathon vs Datathon
13. Why a Datathon - the Datathon approach
• Picked relevant pathways (from a collection of 1800 models)
• Explored functions of proteins using 6.2M pre-text mined relations and embedded Gene Ontology
• Summarized what is known about CHI mechanism in an overview model
14. Why a Datathon - the Datathon approach
Step 1: Find all targets that could be used to affect the disease state
Step 2: Query for each protein to find compounds that target it (>6 log units)
Step 3: Collate data by compound to summarize the targets/activities related to disease that the compound hits
• Compute geometric mean of activities for ranking
• Rank by number of targets and geometric mean of activities against targets
Result table:
• All compounds that were observed to bind to targets in pathway
• Sorted by number of active targets. Too many targets may suggest lack of specificity.
• Targets and activities for each compound
• Mean of activities among these targets
• Drug-likeness metrics for sorting/classification
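The three steps on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not the actual workflow implementation: the activity records, compound and target identifiers, and the descending sort order are assumptions made for the example. Only the "> 6 log units" cutoff and the ranking keys (number of targets, geometric mean of activities) come from the slide.

```python
import math
from collections import defaultdict

# Hypothetical activity records: (compound, target, activity in log units).
ACTIVITIES = [
    ("cmpd_A", "T1", 7.5),
    ("cmpd_A", "T2", 6.5),
    ("cmpd_B", "T1", 8.0),
    ("cmpd_C", "T1", 5.0),  # below the cutoff, so it is filtered out
    ("cmpd_C", "T3", 6.2),
]

# Step 1 output (assumed): targets judged relevant to the disease state.
DISEASE_TARGETS = {"T1", "T2", "T3"}

ACTIVITY_THRESHOLD = 6.0  # "> 6 log units" from the slide


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def rank_compounds(activities, targets, threshold=ACTIVITY_THRESHOLD):
    # Step 2: keep activities against disease targets above the cutoff.
    hits = defaultdict(dict)
    for compound, target, value in activities:
        if target in targets and value > threshold:
            # Keep the strongest activity if a compound/target pair repeats.
            previous = hits[compound].get(target, value)
            hits[compound][target] = max(value, previous)

    # Step 3: collate by compound and compute the two ranking keys.
    rows = [
        {
            "compound": compound,
            "n_targets": len(per_target),
            "geo_mean": geometric_mean(per_target.values()),
        }
        for compound, per_target in hits.items()
    ]
    # Assumed ordering: more targets first, then higher geometric mean.
    rows.sort(key=lambda r: (r["n_targets"], r["geo_mean"]), reverse=True)
    return rows


ranking = rank_compounds(ACTIVITIES, DISEASE_TARGETS)
for row in ranking:
    print(row["compound"], row["n_targets"], round(row["geo_mean"], 2))
```

Since the slide cautions that too many targets may suggest lack of specificity, a real ranking would probably penalize or cap very high target counts rather than sort on them directly.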
16. Data subset available in the Datathon
Entities: Person, Drug, Org, Assay, Action, Pathway, Disease, Implication, Trial, Adverse Event, Target, Species
Example question: What are the druggable targets in my disease?
Target:
- provenanceName
- target
- uniprotId
- sequence
- targetType
- label
- speciesId
- speciesName
- geneSymbol
Disease:
- provenanceName
- disease
- name
Implication:
- provenanceName
- implication
- disease
- target
- score
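As a rough sketch of how the example question might be answered from these three entities, the Python below joins Implication records to Disease and Target records. The field names are taken from the slide; everything else is an assumption for illustration: the sample records, the idea that `Implication.disease` and `Implication.target` hold identifiers of the other two entities, that a higher `score` means stronger evidence, and the `min_score` cutoff.

```python
# Hypothetical records using the field names listed on the slide.
TARGETS = [
    {"target": "t1", "label": "Target one", "geneSymbol": "GENE_A"},
    {"target": "t2", "label": "Target two", "geneSymbol": "GENE_B"},
    {"target": "t3", "label": "Target three", "geneSymbol": "GENE_C"},
]
DISEASES = [
    {"disease": "d1", "name": "Chronic pancreatitis"},
    {"disease": "d2", "name": "Retinitis pigmentosa"},
]
IMPLICATIONS = [
    {"implication": "i1", "disease": "d1", "target": "t1", "score": 0.9},
    {"implication": "i2", "disease": "d1", "target": "t2", "score": 0.3},
    {"implication": "i3", "disease": "d2", "target": "t2", "score": 0.8},
    {"implication": "i4", "disease": "d1", "target": "t3", "score": 0.7},
]


def targets_for_disease(disease_name, min_score=0.5):
    """Return (geneSymbol, score) pairs implicated in the named disease."""
    wanted = disease_name.lower()
    disease_ids = {d["disease"] for d in DISEASES if d["name"].lower() == wanted}
    targets_by_id = {t["target"]: t for t in TARGETS}

    rows = []
    for imp in IMPLICATIONS:
        # Keep implications for this disease that pass the assumed cutoff.
        if imp["disease"] in disease_ids and imp["score"] >= min_score:
            target = targets_by_id.get(imp["target"])
            if target is not None:
                rows.append((target["geneSymbol"], imp["score"]))
    # Strongest implication first.
    return sorted(rows, key=lambda r: r[1], reverse=True)


druggable = targets_for_disease("chronic pancreatitis")
print(druggable)
```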
18. Data subset available in the Datathon
Entities: Person, Drug, Org, Assay, Bio-Activity, Pathway, Disease, Implication, Trial, Adverse Event, Target, Species
Example question: What classes of drugs cause this adverse reaction?
Substance:
- provenanceName
- substance
- name
- compoundType
- substanceTypeName
- inchiCode
- molecularFormula
- charge
- numberOfAtoms
- numberOfComponents
- numberOfElements
- numberOfFragments
- numberOfStructure
- molWeightPublishedValue
- molWeightPublishedUnit
- molWeightStandardValue
- mpvalue
Adverse event:
- provenanceName
- adverseEvent
- disease
- inducedBy
Citation:
- provenanceName
- citation
- pui
- publicationShortName
- publicationName
- publicationYear
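The example question on this slide could be approached along the following lines. This is only a sketch under stated assumptions: the field names come from the slide, but the sample records are invented, `inducedBy` is assumed to hold a Substance identifier, the `disease` field of an adverse event is assumed to name the reaction, and `compoundType` is used as a stand-in for "drug class".

```python
from collections import Counter

# Hypothetical records using the field names listed on the slide.
SUBSTANCES = [
    {"substance": "s1", "name": "drug one", "compoundType": "class X"},
    {"substance": "s2", "name": "drug two", "compoundType": "class X"},
    {"substance": "s3", "name": "drug three", "compoundType": "class Y"},
]
ADVERSE_EVENTS = [
    {"adverseEvent": "ae1", "disease": "nausea", "inducedBy": "s1"},
    {"adverseEvent": "ae2", "disease": "nausea", "inducedBy": "s2"},
    {"adverseEvent": "ae3", "disease": "nausea", "inducedBy": "s3"},
    {"adverseEvent": "ae4", "disease": "rash", "inducedBy": "s3"},
]


def classes_causing(reaction):
    """Count distinct inducing substances per compoundType for a reaction."""
    type_by_substance = {s["substance"]: s["compoundType"] for s in SUBSTANCES}
    # Distinct substances reported to induce the reaction.
    inducing = {
        ae["inducedBy"] for ae in ADVERSE_EVENTS if ae["disease"] == reaction
    }
    counts = Counter(
        type_by_substance[s] for s in inducing if s in type_by_substance
    )
    return counts.most_common()


nausea_classes = classes_causing("nausea")
print(nausea_classes)
```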
19. Entellect platform overview (diagram)
• Input: scientific data from internal and external sources (Compound & Reaction, Assay, -omics, Translational, Trial, Post Market)
• Pipeline: Ingest & enrich, Connect, Serve
• Intelligence areas: Chemistry, Disease, Safety, Efficacy, Trial, Drug, Commercial
• Applications: Search & workflow, Visualization, Exploratory analytics, Predictive analytics
• Outcome: Accelerated data-science driven R&D
Entellect is a smart and flexible life sciences platform that powers R&D discovery by using Elsevier’s trusted approach towards data integration and harmonization. Entellect delivers connected and AI-ready data by linking and enriching disparate content against established life science taxonomies. Combined with the option of Elsevier data, the result is a scalable knowledge environment, enabling exploratory and predictive analytics applications.
20. Repurposing drugs – semantic data and machine learning
Quotes:
• “Machine learning won’t work if your data is rigidly siloed.”
• “One major challenge is collecting enough reliable information to properly train AI systems. AI is as good as the data.”
• “Organizations need to make sure that the data being accessed is treated and defined consistently across the sources. Otherwise, virtualization won't work.”
• “All the major AI advances have been fueled by advances in data sets. The algorithms are easy… Collecting, classifying and labeling datasets used to train the algorithms is the grunt work that’s difficult”
People quoted:
• Nick Patience, Founder, 451 Research
• Aspuru-Guzik, Professor of Chemistry & Machine Learning, Harvard University
• Michael Linhares, BIS team leader, Pfizer
• JJ Guy, CTO, Jask (AI co.)
From: ‘Siloed’; Lack of standards; Requires labeling and context; Poor quality
To: ‘Un-siloed’; Harmonized; Enriched and linked; Quality
• Access, curation of authoritative life science data
• Integration of disparate data, structured and unstructured
• Normalized and standardized data with industry standard taxonomies
• Professional Services
• Build custom and off-the-shelf analytics tools
R&D stages: Discovery, Pre-clinical, Clinical, Post Launch
22. Examples of what can be done
We are looking for innovative data science approaches:
• Semantic data analysis – reasoning with assertions and numeric data
• Machine Learning – looking for patterns beyond the assertions
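To make "reasoning with assertions and numeric data" a little more concrete, here is a small Python sketch that chains two kinds of assertion (a drug binds a target, a target is implicated in a disease) and combines their numeric values. Everything in it is hypothetical: the assertion format, the predicate names, the sample values, the activity cutoff, and in particular the use of the product of activity and implication score as an evidence value, which is just one simple heuristic.

```python
# Hypothetical assertions as (subject, predicate, object, numeric value).
ASSERTIONS = [
    ("drug_A", "binds", "target_1", 7.2),  # activity in log units
    ("drug_B", "binds", "target_2", 6.1),
    ("drug_B", "binds", "target_1", 5.0),  # too weak, filtered out below
    ("target_1", "implicated_in", "disease_X", 0.9),  # implication score
    ("target_2", "implicated_in", "disease_X", 0.4),
]


def infer_candidates(assertions, disease, min_activity=6.0):
    """Chain binds + implicated_in assertions into drug-disease candidates."""
    # Targets implicated in the disease, with their implication scores.
    implicated = {
        subj: value
        for subj, pred, obj, value in assertions
        if pred == "implicated_in" and obj == disease
    }

    best = {}  # drug -> (target, evidence) for its strongest path
    for subj, pred, obj, value in assertions:
        if pred == "binds" and obj in implicated and value >= min_activity:
            evidence = value * implicated[obj]  # illustrative heuristic only
            if subj not in best or evidence > best[subj][1]:
                best[subj] = (obj, evidence)

    rows = [(drug, target, evidence) for drug, (target, evidence) in best.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)


candidates = infer_candidates(ASSERTIONS, "disease_X")
for drug, target, evidence in candidates:
    print(drug, "via", target, round(evidence, 2))
```

A machine learning approach would go beyond such explicit chains, for example by learning from known drug-disease pairs which combinations of features tend to indicate a useful link.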
42. How do you participate?
Please go here:
https://www.elsevier.com/promo/entellect/datathon
1. Info page
2. Register
3. Get access (credentials will be emailed)
43. How do you participate?
4. Get a team together:
• join as a team
• find teammates through Slack discussion
• be assigned to a group on 15th October
5. Access via Elsevier Entellect:
Data + Environment to run ML + Jupyter notebooks
6. Collaborate on Slack
Note: As part of the judging process, you will be asked to comment on fellow participants' solutions, alongside the Elsevier Data Science team and representatives from the partner organisations.
45. Schedule
• October 1: Pistoia Alliance webinar and opening of registration
• October 9: Datathon launch event in Boston
• October 15: Teams to be selected (later registration is possible for teams)
• October – November: Progress webinars (TBC)
• November 30: Close of Datathon
• Early December (TBC): Judging (involving participants in peer review)
• End December: Decisions to be made before the end of the year – and taken forward
• March 2019: Spring conference in London with awards ceremony and reports on progress
47. Take part
Please go here:
https://www.elsevier.com/promo/entellect/datathon
We are looking for teams to:
• openly share their experiences as lessons in best practice;
• participate in ongoing communications around the Datathon (possible publication);
• get involved with the next steps in taking these findings to clinical trial!