Stephen Friend CRUK-MD Anderson Cancer Workshop 2012-02-28
1. Open Source pre-competitive drug discovery
Moving beyond linear investigations
Both of the science and of how we work
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ Amsterdam
February 28, 2012
2. Partnering & Collaboration - So what has been possible?
All patients (now >25,000) at a partnered Cancer Center provide consented expression on their pts for classifying sub-populations
Combination Therapies - each at Ph I - joint development, 2 Pharma
Sharing all the CT Onc Trial imaging files among 2 Pharma
Link Pharma with an “Institute for Applied Cancer Center”
Share genomic data on 25,000 samples with clinical records and Expression and Exomes among three Pharma
3. Partnering & Collaboration - So what has been possible?
All patients (now >25,000) at a partnered Cancer Center provide consented expression on their pts for classifying sub-populations
2006: Moffitt Cancer Center - Merck
Combination Therapies - each at Ph I - joint development, 2 Pharma
2007: AZ, Merck (Mek/Akt)
Sharing all the CT Onc Trial imaging files among 2 Pharma
2008: BMS & Merck
Link Pharma with an “Institute for Applied Cancer Center”
2008: Belfer - Merck
Share genomic data on 25,000 samples with clinical records and Expression and Exomes among three Pharma
2010: Asian Cancer Research Group (ACRG) - Lilly, Merck, Pfizer
4. So what is the problem?
Most approved therapies were assumed to be monotherapies for diseases representing homogenous populations
Our existing disease models often assume pathway knowledge sufficient to infer correct therapies
11. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease
Building Disease Maps Data Repository
Commons Pilots Discovery Platform
Sagebase.org
16. Why not share clinical/genomic data and model building within
teams in ways currently used by the software industry
(power of tracking workflows and versioning)?
22. Sage Metagenomics Project
Processed Data
(S3)
• > 10k genomic and expression standardized datasets indexed in SCR
• Error detection, normalization in mG
• Access raw or processed data via download or API in downstream analysis
• Building towards open, continuous community curation
23. Sage Metagenomics using Amazon Simple Workflow
Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/
24. Amazon SWF and Synapse
Amazon SWF
• Maintains state of analysis
• Tracks step execution
• Logs workflow history
• Dispatches work to Amazon or remote worker nodes
• Efficiently matches job size to hardware
• Provides error handling and recovery
Synapse
• Hosts raw and processed data for further reuse in public or private projects
• Provides visibility into intermediate results and algorithmic details
• Allows programmatic access to data; integration with R
• Provides standard terminologies for annotations
• Search across data sets
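The workflow-side responsibilities listed for this slide (maintaining state, tracking step execution, logging history, error handling and recovery) can be illustrated with a minimal, self-contained sketch. It does not use the Amazon SWF or Synapse APIs; `WorkflowTracker`, `run_step` and the two example steps are hypothetical names invented for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowTracker:
    """Toy in-memory stand-in for a workflow service (not the Amazon SWF API)."""
    state: dict = field(default_factory=dict)    # step name -> "completed" / "failed"
    history: list = field(default_factory=list)  # ordered log of (step, attempt, outcome)

    def run_step(self, name, func, data, max_attempts=3):
        """Run one step, logging every attempt and retrying on failure."""
        for attempt in range(1, max_attempts + 1):
            try:
                result = func(data)
            except Exception as exc:
                self.history.append((name, attempt, f"error: {exc}"))
                continue
            self.history.append((name, attempt, "ok"))
            self.state[name] = "completed"
            return result
        self.state[name] = "failed"
        raise RuntimeError(f"step {name!r} failed after {max_attempts} attempts")


def normalize(values):
    """Hypothetical processing step: centre values on their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


_calls = {"n": 0}


def flaky_qc(values):
    """Hypothetical QC step that fails once, to exercise the retry path."""
    _calls["n"] += 1
    if _calls["n"] == 1:
        raise ValueError("transient worker failure")
    return [v for v in values if abs(v) < 10]


tracker = WorkflowTracker()
centred = tracker.run_step("normalize", normalize, [1.0, 2.0, 3.0])
checked = tracker.run_step("qc", flaky_qc, centred)
```

After this run, `tracker.history` holds three entries (one successful normalization, one failed and one successful QC attempt), which is the kind of record that lets an analysis be audited or resumed.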
25. Synapse Roadmap
Synapse Platform Functionality
• Data Repository
• Projects and security
• R integration
• Analysis provenance
• Publishing figures
• Search
• Controlled Vocabularies
• Governance of restricted data
• Workflow templates
• Social networking
• Wiki & collaboration tools
• Integrated management of cloud resources
• User-customized dashboards
• R Studio integration
• Curation tool integration
Release phases: Internal Alpha, Public Beta Testing, Synapse 1.0, Synapse 1.5, Future
Timeline: Q1-2012 Q2-2012 Q3-2012 Q4-2012 Q1-2013 Q2-2013 Q3-2013 Q4-2013
Data / Analysis Capabilities
• TCGA
• METABRIC breast cancer challenge
• 40+ manually curated clinical studies
• 8000+ GEO / ArrayExpress datasets
• Clinical, genomic, compound sensitivity
• Bioconductor and custom R analysis
• Predictive modeling workflows
• Automated processing of common genomics platforms
• TBD: Integrations with other visualization and analysis packages
27. Five Pilots involving Sage Bionetworks
CTCAP
The Federation
Portable Legal Consent
Sage Congress Project
Arch2POCM
[Figure labels, partially recoverable: PLATFORM, NEW MAPS, RULES GOVERN]
28. Clinical Trial Comparator Arm
Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create a powerful
new tool for drug development.
Started Sept 2010
29. Shared clinical/genomic data sharing and analysis will
maximize clinical impact and enable discovery
• Graphic of curated to QCed to models
31. How can we accelerate the pace of scientific discovery?
2008
2009
2010
2011
Ways to move beyond
“traditional” collaborations?
Intra-lab vs Inter-lab
Communication
Colrain/ Industrial PPPs Academic
Unions
33. Sage Federation: model of biological age
[Figure: Predicted Age (liver expression) plotted against Chronological Age (years), with regions labelled Faster Aging and Slower Aging]
Age Differential
Clinical Association
- Gender
- BMI
- Disease
Genotype Association
Gene Pathway Expression
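The slide does not define how the age differential is computed. As a minimal sketch, assuming it is simply predicted (expression-based) age minus chronological age, the faster/slower aging labels could be derived as follows. The function names, the `tolerance` parameter and the sample values are all hypothetical and made up for illustration.

```python
def age_differential(predicted_age, chronological_age):
    """Assumed definition: predicted (expression-based) age minus chronological age."""
    return predicted_age - chronological_age


def classify(differential, tolerance=0.0):
    """Label a sample by the sign of its age differential."""
    if differential > tolerance:
        return "faster aging"
    if differential < -tolerance:
        return "slower aging"
    return "as expected"


# Made-up illustrative samples: (chronological age, predicted age), in years.
samples = {"s1": (40, 47), "s2": (60, 52), "s3": (55, 55)}
labels = {
    name: classify(age_differential(pred, chron))
    for name, (chron, pred) in samples.items()
}
```

The resulting differential is the quantity that would then be tested for clinical, genotype and expression associations.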
34. Reproducible science == shareable science
Sweave: combines programmatic analysis with narrative
Dynamic generation of statistical reports using literate data analysis
Friedrich Leisch. Sweave: Dynamic generation of statistical reports
using literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors, Compstat 2002 –
Proceedings in Computational Statistics, pages 575-580.
Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
36. Presentation outline
1) Predicting drug response from cancer cell lines
2) Future approaches: network-based predictors and multi-task learning
3) Standardized workflows for data management, versioning and method comparison
Cancer cell line encyclopedia
Molecular characterization (1,000 cell lines)
Currently: mRNA, copy number, somatic mutations (36 cancer-related genes)
In progress: targeted exon sequencing, epigenetics, microRNA, lncRNA, phospho-tyrosine kinase, metabolites
Viability screens (500 cell lines, 24 compounds)
Small molecule screen
Predictive model
Network / pathway prior information (Vaske, et al.)
TCGA / ICGC
Molecular characterization (50 tumor types): genomics, transcriptomics, epigenetics
Clinical data
Transfer learning
Predictive model
37. 1) Data management APIs to load standardized objects, e.g. R ExpressionSets (Matt Furia):
ccleFeatureData <- getEntity(ccleFeatureDataId)
ccleResponseData <- getEntity(ccleResponseDataId)
tcgaFeatureData <- getEntity(tcgaFeatureDataId)
tcgaResponseData <- getEntity(tcgaResponseDataId)
2) Automated, standardized workflows for curation and QC of large-scale datasets (Brig Mecham).
A. TCGA: Automated cloud-based processing.
B. GEO / ArrayExpress: Normalization workflows, curation of phenotype using standard ontologies.
C. Additional studies with genetic and phenotypic data in Sage repository (e.g. CCLE and Sanger cell line datasets)
Observed Data = Systematic Variation + Random Variation
Normalization: Remove the influence of adjustment variables on data...
3) Pluggable API to implement predictive modeling algorithms.
A) Support for all commonly used machine learning methods (for automated benchmarking against new methods).
B) Pluggable custom methods as R classes implementing customTrain() and customPredict() methods.
A) Can be arbitrarily complex (e.g. pathway and other priors)
B) Support for parallelization in for each loops.
custom model 1, custom model 2, custom model N
4) Statistical performance assessment across models.
5) Output of candidate biomarkers and feature evaluation (e.g. GSEA, pathway analysis)
6) Experimental follow-up on top predictions (TBD)
E.g. for cell lines: medium throughput suppressor / enhancer screens of drug sensitivity for knockdown / overexpression of predicted biomarkers.
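The pluggable-model idea on this slide (custom methods exposing a train and a predict method, then compared statistically across models) can be sketched as follows. The slide describes R classes implementing customTrain() and customPredict(); this Python analogue is not the Sage/Synapse API, and `custom_train`, `custom_predict`, `benchmark`, both model classes and the data are hypothetical, chosen only to illustrate the pattern.

```python
from math import sqrt


class MeanModel:
    """Baseline: always predicts the mean training response."""

    def custom_train(self, features, response):
        self.mean = sum(response) / len(response)
        return self

    def custom_predict(self, features):
        return [self.mean for _ in features]


class OneFeatureLinearModel:
    """Least-squares line fitted on the first feature only."""

    def custom_train(self, features, response):
        x = [row[0] for row in features]
        mx, my = sum(x) / len(x), sum(response) / len(response)
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, response))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def custom_predict(self, features):
        return [self.intercept + self.slope * row[0] for row in features]


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def benchmark(models, train_x, train_y, test_x, test_y):
    """Train every model on the same data and score it on the same held-out set."""
    scores = {}
    for name, model in models.items():
        model.custom_train(train_x, train_y)
        scores[name] = rmse(model.custom_predict(test_x), test_y)
    return scores


# Made-up data: one feature, response = 2 * feature + 1.
train_x, train_y = [[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0]
test_x, test_y = [[4.0], [5.0]], [9.0, 11.0]
scores = benchmark(
    {"mean": MeanModel(), "linear": OneFeatureLinearModel()},
    train_x, train_y, test_x, test_y,
)
```

Because every model is trained and scored through the same two methods, a new method can be benchmarked against existing ones simply by adding it to the dictionary passed to `benchmark`.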
39. Sage Congress Project, April 20 2012
RealNames Parkinson’s Project
Revisiting Breast Cancer Prognosis
Fanconi’s Anemia
(Responders Competitions - IBM-DREAM)
41. Arch2POCM
Restructuring the Precompetitive Space for Drug Discovery
How to potentially De-Risk High-Risk Therapeutic Areas
42. Arch2POCM: Highlights
A PPP To De-Risk Novel Targets That The Pharmaceutical Industry Can
Then Use To Accelerate The Development of New and Effective Medicines
• The Arch2POCM will be a charitable Public Private Partnership (PPP) that will file no patents and
whose scientific plan (including target selection) will be endorsed by its pharmaceutical, private
and public funders
• Arch2POCM will de-risk novel targets by developing and using pairs of test compounds (two
different chemotypes) that interact with the selected targets: the compounds will be developed
through Phase IIb clinical trials to determine if the selected target plays a role in the biology of
human disease
• Arch2POCM will work with and leverage patient groups and clinical CROs to enable patient
recruitment, and with regulators to design novel studies and to validate novel biomarkers
• Arch2POCM will make its GMP test compounds available to academic groups and foundations so
they can use them to perform clinical studies and publish on a multitude of additional indications
• Arch2POCM will release all reagents and data to the public at pre-defined stages in its drug
development process. To ensure scientific quality, data and reagents will be released once they
have been vetted by an independent scientific committee
• Arch2POCM will publish all negative POCM data immediately in order to reduce the number of
ongoing redundant proprietary studies (in pharma, biotech and academia) on an invalidated
target and thereby
– minimize unnecessary patient exposure
– provide significant economic savings for the pharmaceutical industry
• In the rare instance in which a molecule achieves positive POCM, Arch2POCM will ensure that
the compound has the ability to reach the market by arranging for exclusive access to the
proprietary IND database for the molecule
43. Arch2POCM: scale and scope
• Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/
Immunology. One for Neuroscience/Schizophrenia/Autism. Both
programs will have 6-8 drug discovery projects (targets) - ramped up
over a period of 2 years
– It is envisioned that Arch2POCM’s funding partners will select targets
that are judged as slightly too risky to be pursued at the top of pharma’s
portfolio, but that have significant scientific potential that could benefit
from Arch2POCM’s crowdsourcing effort
• These will be executed over a period of 5 years making a total of 16
drug discovery projects
– Projected pipeline attrition by Year 5 (assuming 12 targets loaded in
early discovery)
• 30% will enter Phase 1
• 20% will deliver Ph 2 POCM data
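The projected attrition figures translate into expected project counts with simple arithmetic. This sketch assumes both percentages are fractions of the 12 targets loaded in early discovery (the slide lists them as sibling bullets under that assumption); the variable names are illustrative.

```python
targets_loaded = 12                   # "assuming 12 targets loaded in early discovery"
fraction_entering_phase_1 = 0.30      # 30% will enter Phase 1
fraction_delivering_ph2_pocm = 0.20   # 20% will deliver Ph 2 POCM data

# Expected number of targets reaching each milestone by Year 5.
expected_phase_1 = targets_loaded * fraction_entering_phase_1
expected_ph2_pocm = targets_loaded * fraction_delivering_ph2_pocm
```

Under that reading, roughly 3.6 targets would enter Phase 1 and roughly 2.4 would deliver Ph 2 POCM data.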
44. Arch2POCM: proposed funding strategy
– Arch2POCM funding will come from a combination of public
funding from governments and private sector funding from
pharmaceutical and biotechnology companies and from private
philanthropists
– By investing $1.6 M annually into one or both of Arch2POCM’s
selected disease areas, partnered pharmaceutical companies:
1. obtain a vote on Arch2POCM target selection
2. gain real time data access to Arch2POCM’s 12-16 drug discovery
projects
3. have the strategic opportunity to expand their overall portfolio
45. Entry points for Arch2POCM programs:
Two compounds (different chemotypes) will be advanced per target
Pioneer targets
- genomic/genetic
- disease networks
- academic partners
- private partners
- SAGE, SGC,
Pipeline stages: Assay / in vitro probe, Lead identification, Lead optimisation, Preclinical, Phase I, Phase II
Entry assets: Lead, Clinical candidate, Phase I asset, Phase II asset
Stage-gate 1: Early Discovery and PCC Compounds (75%)
Stage-gate 2: Pharma’s re-purposed clinical assets (25%)
46. Pipeline flow for Arch2POCM
Five Year Objective: Initiate ≈ 8 drug discovery projects with 6 entering in Early Discovery, one entering in
pre-clinical and one entering in PH I
Months → 0-6 7-12 13-18 19-24 25-30 31-36 37-42 43-48 49-54 55-60
Early discovery (2) Pre-clinical Ph 11.3 Ph 2
Year #1 Pre-clinical (1) Ph 1 Ph 2
Arch2POCM
Target Load 11
Early discovery (4) Pre-clinical Ph 1
Year #2 Ph 1 (1) Ph 2
Arch2POCM
Target Load 1
Early discovery (45% PTRS)
Pre-clinical (70% PTRS)
Ph I (65% PTRS)
Ph II (10% PTRS)
*PTRS = Probability of technical and regulatory success
Arch2POCM Snapshot at Year 5
Targets Loaded: 8
Projected INDs filed: 3-4
Ph 1 or 2 Trials In Progress: 2
Projected Complete Ph 2 (POCM) Data Sets: 1
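The per-stage PTRS values can be chained to estimate how many projects survive to a given stage. This is a sketch under two assumptions not stated on the slide: each PTRS is the probability of passing that stage, and stages are independent so the probabilities multiply. The function `prob_of_clearing` and the dictionary keys are hypothetical names.

```python
from math import prod

# Per-stage PTRS values as listed on the slide.
PTRS = {
    "early_discovery": 0.45,
    "pre_clinical": 0.70,
    "phase_1": 0.65,
    "phase_2": 0.10,
}
STAGES = list(PTRS)  # dicts preserve insertion order, so this is pipeline order


def prob_of_clearing(entry_stage, last_stage):
    """Probability of passing every stage from entry_stage through last_stage."""
    first = STAGES.index(entry_stage)
    last = STAGES.index(last_stage)
    return prod(PTRS[stage] for stage in STAGES[first:last + 1])


# Entry mix from the five-year objective: 6 early discovery, 1 pre-clinical, 1 Ph I.
through_pre_clinical = prob_of_clearing("early_discovery", "pre_clinical")
through_phase_2 = prob_of_clearing("early_discovery", "phase_2")
expected_clearing_pre_clinical = (
    6 * through_pre_clinical + 1 * prob_of_clearing("pre_clinical", "pre_clinical")
)
```

Under these assumptions a project entering early discovery has about a 31.5% chance of clearing pre-clinical and about a 2% chance of clearing Ph II, and about 2.6 of the 7 projects entering at or before pre-clinical would be expected to clear it.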
47. The case for epigenetics/chromatin biology
1. There are epigenetic oncology drugs on the market (HDACs)
2. A growing number of links to oncology, notably many genetic links (i.e.
fusion proteins, somatic mutations)
3. A pioneer area: More than 400 targets amenable to small molecule
intervention - most of which only recently shown to be “druggable”, and
only a few of which are under active investigation
4. Open access, early-stage science is developing quickly – significant
collaborative efforts (e.g. SGC, NIH) to generate proteins, structures,
assays and chemical starting points
48. The current epigenetics universe
Domain Family                Typical substrate class*               Total Targets
Histone Lysine demethylase   Histone/Protein K/R(me)n / (meCpG)     30
Bromodomain                  Histone/Protein K(ac)                  57
Tudor domain                 Histone Kme2/3 - Rme2s                 59
Chromodomain                 Histone/Protein K(me)3                 34
MBT repeat                   Histone K(me)3                         9
PHD finger                   Histone K(me)n                         97
Acetyltransferase            Histone/Protein K                      17
Methyltransferase            Histone/Protein K&R                    60
PARP/ADPRT                   Histone/Protein R&E                    17
MACRO                        Histone/Protein (p)-ADPribose          15
Histone deacetylases         Histone/Protein KAc                    11
                                                                    395
ROYAL (label spanning the Tudor domain, Chromodomain and MBT repeat rows)
Now known to be amenable to small molecule inhibition
49. Why is Arch2POCM a “smart bet” for Pharma investment?
Arch2POCM: an external epigenetic think tank from which Pharma can load the most likely to succeed targets as proprietary programs or leverage Arch2POCM results for its other internal efforts
• A front row seat on the progression of 6-8 epigenetic targets means that:
– Pharma can select the epigenetic targets that best complement their internal portfolio and for which there is the greatest interest
– Pharma can structure Arch2POCM’s projects so that key objectives line up with internal go/no-go decisions
– Pharma can use Arch2POCM data to trigger its internal level of investment on a particular target
• Pharma can use Arch2POCM resources to enrich their internal epigenetics effort: active chemotypes, assays, pre-clinical models, biomarkers, genetic and phenotypic data for patient stratification, relationships to epigenetic experts
• Pharma can use Arch2POCM’s lead compound chemotypes to:
– inform their proprietary medicinal chemistry efforts on the target
– identify chemical scaffolds that impact epigenetic pathways: a proprietary combination therapy opportunity
• Toxicity screening of Arch2POCM compounds with FDA tools can be used to guide internal proprietary chemistry efforts in oncology, inflammation and beyond
• Arch2POCM’s crowd of scientists and clinicians provides its Pharma partners with parallel shots on goal at the best context for Arch2POCM’s compounds/targets
50. How will Arch2POCM provide “line of sight” to new
medicines?
Arch2POCM will partner with scientists, clinicians and CROs that:
• use “Omics” approaches to construct predictive models of disease networks
(genomic, proteomic, signaling and metabolic)
• have strategies available to identify those disease network gene(s) which
when perturbed, impact the overall functioning of the network
• already have epigenetic assays in place to identify chemotype structures
(from discovery and/or pharma’s re-purposed un-used clinical assets) that
impact the target and disease-correlated molecular phenotypes
• already have biomarker tools available that can be tested for correlation to
Arch2POCM’s targets
• already have access to patient data and/or patient groups to mine for
genetic and phenotypic signatures that may represent best responders for
Arch2POCM clinical trials
51. How will Arch2POCM provide “line of sight” to new
medicines?
• Arch2POCM’s Ph II validation of high risk high opportunity targets
focuses Pharma’s NME efforts
• Positive POCM data: De-risked validated targets for Pharma development
• Negative POCM data: public release of this data minimizes the amount of time
and money that Pharma and the industry place on failed targets
• Arch2POCM’s clinical candidate compounds provide Pharma with
multiple paths to new medicines
• Arch2POCM compounds that achieve POCM can be advanced into Ph 3 by
Arch2POCM Members
• The purchaser of Arch2POCM’s IND database obtains a significant time advantage
over competitors to generate Phase III data and proceed to market
• NMEs that derive from Arch2POCM will launch with database exclusivity protections:
5-8 years to garner a return on investment
• The crowd’s testing of Arch2POCM compounds may identify alternative/better
contexts for agonizing/antagonizing the disease biology target
• indications
• patient stratification
• combination therapy options
52. Arch2POCM: current partnering status
• Pharmaceutical Funding Partners
– Three companies are considering a potential role as industry anchors for Arch2POCM
– Two companies have demonstrated interest in Arch2POCM and their company leadership wants to
go to the next step; awaiting face-to-face discussions to go over the agreement
• Public Funding Partners
– Good progress is being made to obtain financial backing for Arch2POCM from public funders in a
number of countries (Canada, United Kingdom and Sweden) for both epigenetics and for CNS
– Ontario Brain Institute, Canada has allocated $3M to the development of an autism clinical network that is
committed to work with Arch2POCM
• Philanthropic Funding Partners: awaiting designation of anchor partners
• In kind partners
– GE Healthcare (imaging): lead diagnostics partner and willing to share its experimental oncology
biomarkers
– Cancer Research UK: considering participating in Arch2POCM through “in kind efforts”, using some of
its drug discovery and development resources
• Academic partners
– Institutions that have indicated willingness to let their scientists participate without patent filing:
UCSF, Massachusetts General Hospital, University of North Carolina, University of Toronto, Oxford
University, Karolinska Institute
– Academic community of epigenetic experts/resources already identified
• Regulatory partners: Because the objective of the Arch2POCM PPP is to probe and
elucidate disease biology as opposed to develop new proprietary products, FDA and
EMEA are ready to play an active role (toxicity screens, and legacy clinical trial data)
• Patient group partners: leaders from Genetic Alliance, Inspire2Live and the Love Avon
Army of Women are actively engaged