This document proposes using data intensive science to build better models of disease. It notes that current disease models make simplistic assumptions and that personalized medicine requires better representations of overlapping pathways. It advocates adopting the "fourth paradigm" of data intensive science to generate massive datasets, ensure interoperability, create open information systems, and host evolving computational models. Six pilot projects are described that involve collaborative data sharing between industry, academia, and non-profits to build disease maps and models. These include initiatives like CTCAP to share clinical trial data, Arch2POCM to de-risk drug targets, and forming a federation to enhance interoperability. The document argues this approach could help address issues like a lack of standard
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Stephen Friend AMIA Symposium 2012-03-21
1. Why not use data intensive science
to build models of disease
Current Reward Structures
Organizational Structures and Tools
Six Pilots
2. What
is
the
problem?
Most
approved
therapies
assume
indica2ons
would
represent
homogenous
popula2ons
Our
exis2ng
disease
models
o8en
assume
pathway
knowledge
sufficient
to
infer
correct
therapies
7. “Data Intensive” Science- Fourth Scientific Paradigm
Equipment capable of generating
massive amounts of data
IT Interoperability
Open Information System
Host evolving computational models
in a “Compute Space”
8.
9.
10. WHY
NOT
USE
“DATA
INTENSIVE”
SCIENCE
TO
BUILD
BETTER
DISEASE
MAPS?
11. what will it take to understand disease?
DNA
RNA
PROTEIN
(dark
maGer)
MOVING
BEYOND
ALTERED
COMPONENT
LISTS
13. How is genomic data used to understand biology?
RNA amplification
Tumors
Microarray hybirdization
Tumors
Gene Index
Standard GWAS Approaches Profiling Approaches
Identifies Causative DNA Variation but Genome scale profiling provide correlates of disease
provides NO mechanism Many examples BUT what is cause and effect?
Provide unbiased view of
molecular physiology as it
relates to disease phenotypes
trait
Insights on mechanism
Provide causal relationships
and allows predictions
13
Integrated Genetics Approaches
14. Preliminary Probabalistic Models- Rosetta /Schadt
Networks facilitate direct
identification of genes that are
causal for disease
Evolutionarily tolerated weak spots
Gene symbol Gene name Variance of OFPM Mouse Source
explained by gene model
expression*
Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics
Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics
Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg
Mirochnitchenko (University of
Medicine and Dentistry at New
Jersey, NJ) [12]
Lactb Lactamase beta 52% tg Constructed using BAC transgenics
Me1 Malic enzyme 1 52% ko Naturally occurring KO
Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple
(UCLA) [13]
Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg
(Columbia University, NY) [11]
C3ar1 Complement component 46% ko Purchased from Deltagen, CA
3a receptor 1
Tgfbr2 Transforming growth 39% ko Purchased from Deltagen, CA
Nat Genet (2005) 205:370 factor beta receptor 2
15. Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics
Metabolic "Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
Disease "Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
"Genetics of gene expression and its effect on disease." Nature. (2008)
"Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
….. Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp.Biology, etc
CVD "Identification of pathways for atherosclerosis." Circ Res. (2007)
"Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
…… Plus 5 additional papers in Genome Res., Genomics, Mamm.Genome
Bone "Integrating genotypic and expression data …for bone traits…" Nat Genet. (2005)
d
..approach to identify candidate genes regulating BMD…" J Bone Miner Res. (2009)
Methods "An integrative genomics approach to infer causal associations ... Nat Genet. (2005)
"Increasing the power to detect causal associations… PLoS Comput Biol. (2007)
"Integrating large-scale functional genomic data ..." Nat Genet. (2008)
…… Plus 3 additional papers in PLoS Genet., BMC Genet.
16. List of Influential Papers in Network Modeling
50 network papers
http://sagebase.org/research/resources.php
18. “Data Intensive” Science- Fourth Scientific Paradigm
Score Card for Medical Sciences
Equipment capable of generating
massive amounts of data A-
IT Interoperability D
Open Information System D-
Host evolving computational models
in a “Compute Space F
19. We still consider much clinical research as if we were
hunter gathers - not sharing
.
27. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease
Building Disease Maps Data Repository
Commons Pilots Discovery Platform
Sagebase.org
29. Bin Zhang
Model of Breast Cancer: Co-expression Xudong Dai
Jun Zhu
A) Miller 159 samples B) Christos 189 samples
NKI: N Engl J Med. 2002 Dec 19;347(25):1999.
Wang: Lancet. 2005 Feb 19-25;365(9460):671.
Miller: Breast Cancer Res. 2005;7(6):R953.
Christos: J Natl Cancer Inst. 2006 15;98(4):262.
C) NKI 295 samples
E) Super modules
Cell cycle
Pre-mRNA
ECM
D) Wang 286 samples Blood vessel
Immune
response
29
Zhang B et al., Towards a global picture of breast cancer (manuscript).
30.
31. Why not share clinical /genomic data and model building in the
ways currently used by the software industry
(power of tracking workflows and versioning
38. Six
Pilots
involving
Sage
Bionetworks
CTCAP
Arch2POCM
The
FederaMon
ORM
S
Portable
Legal
Consent
MAP
F
Sage
Congress
Project
PLAT
NEW
BRIDGE
RULES GOVERN
AZ/Merck
CombinaMons
before
approval
Asian
Cancer
Research
Project
BMS/Merck
Imagining
Data
39. CTCAP
Clinical Trial Comparator Arm Partnership “CTCAP”
Strategic Opportunities For Regulatory Science
Leadership and Action
FDA
September 27, 2011
40. Clinical Trial Comparator Arm
Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create powerful
new tool for drug development.
Started Sept 2010
41. Shared clinical/genomic data sharing and analysis will
maximize clinical impact and enable discovery
• Graphic
of
curated
to
qced
to
models
42. Arch2POCM
Restructuring
the
PrecompeMMve
Space
for
Drug
Discovery
How
to
potenMally
De-‐Risk
High-‐Risk
TherapeuMc
Areas
43.
44. Arch2POCM: Highlights
A PPP To De-Risk Novel Targets That The Pharmaceutical Industry Can
Then Use To Accelerate The Development of New and Effective Medicines
• The Arch2POCM will be a charitable Public Private Partnership (PPP) that will file no patents and
whose scientific plan (including target selection) will be endorsed by its pharmaceutical, private
and public funders
• Arch2POCM will de-risk novel targets by developing and using pairs of test compounds (two
different chemotypes) that interact with the selected targets: the compounds will be developed
through Phase IIb clinical trials to determine if the selected target plays a role in the biology of
human disease
• Arch2POCM will work with and leverage patient groups and clinical CROs to enable patient
recruitment, and with regulators to design novel studies and to validate novel biomarkers
• Arch2POCM will make its GMP test compounds available to academic groups and foundations so
they can use them to perform clinical studies and publish on a multitude of additional indications
• Arch2POCM will release all reagents and data to the public at pre-defined stages in its drug
development process. To ensure scientific quality, data and reagents will be released once they
have been vetted by an independent scientific committee
• Arch2POCM will publish all negative POCM data immediately in order to reduce the number of
ongoing redundant proprietary studies (in pharma, biotech and academia) on an invalidated
target and thereby
– minimize unnecessary patient exposure
– provide significant economic savings for the pharmaceutical industry
• In the rare instance in which a molecule achieves positive POCM, Arch2POCM will ensure that
the compound has the ability to reach the market by arranging for exclusive access to the
proprietary IND database for the molecule 44
45. Arch2POCM: scale and scope
• Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/
Immunology. One for Neuroscience/Schizophrenia/Autism. Both
programs will have 8 drug discovery projects (targets) - ramped up
over a period of 2 years
– It is envisioned that Arch2POCM’s funding partners will select targets
that are judged as slightly too risky to be pursued at the top of pharma’s
portfolio, but that have significant scientific potential that could benefit
from Arch2POCM’s crowdsourcing effort
• These will be executed over a period of 5 years making a total of 16
drug discovery projects
– Projected pipeline attrition by Year 5 (assuming 12 targets loaded in
early discovery)
• 30% will enter Phase 1
• 20% will deliver Ph 2 POCM data 45
46. Arch2POCM: proposed funding strategy
– $160-200M over five years is projected as necessary to advance
up to 8 drug discovery projects within each of the two therapeutic
programs
– Arch2POCM funding will come from a combination of public
funding from governments and private sector funding from
pharmaceutical and biotechnology companies and from private
philanthropists
– By investing $1.6 M annually into one or both of Arch2POCM’s
selected disease areas, partnered pharmaceutical companies:
1. obtain a vote on Arch2POCM target selection
2. have the opportunity to donate existing compounds from their
abandoned clinical programs for re-purposing on Arch2POCM’s
targets
3. gain real time data access to Arch2POCM’s 16 drug discovery
projects
4. have the strategic opportunity to expand their overall portfolio 46
47. Entry points for Arch2POCM programs:
Two compounds (different chemotypes) will be advanced per target
Pioneer targets - genomic/ genetic
- disease networks
- academic partners
- private partners
- SAGE, SGC,
Lead Lead
Preclinical Phase I Phase II
identification optimisation
Assay
in vitro
probe
Lead Clinical Phase I Phase II
candidate asset asset
Stage-gate 1: Early Discovery and Stage-gate 2: Pharma’s re-
PCC Compounds (75%) purposed clinical assets (25%) 47
48. Pipeline flow for Arch2POCM
Five Year Objective: Initiate ≈ 8 drug discovery projects with 6 entering in Early Discovery, one entering in
pre-clinical and one entering in PH I
Months → 0-6 7-12 13-18 19-24 25-30 31-36 37-42 43-48 49-54 55-60
Early discovery (2) Pre-clinical Ph 11.3 Ph 2
Year #1 Pre-clinical (1) Ph 1 Ph 2
Arch2POCM
Target Load 11
Early discovery (4) Pre-clinical Ph 1
Year #2 Ph 1 (1) Ph 2
Arch2POCM
Target Load 1
Early discovery (45% PTRS) Arch2POCM Snapshot at Year 5
Pre-clinical (70% PTRS) Targets
Loaded
8
Ph I (65% PTRS) Projected
INDs
filed
3-‐4
Ph II (10% PTRS) Ph
1
or
2
Trials
In
Progress
2
Projected
Complete
Ph
2
(POCM)
Data
1
*PTRS = Probability of technical and regulatory success
Sets
48
49. The case for epigenetics/chromatin biology
1. There are epigenetic oncology drugs on the market (HDACs)
2. A growing number of links to oncology, notably many genetic links (i.e.
fusion proteins, somatic mutations)
3. A pioneer area: More than 400 targets amenable to small molecule
intervention - most of which only recently shown to be “druggable”, and
only a few of which are under active investigation
4. Open access, early-stage science is developing quickly – significant
collaborative efforts (e.g. SGC, NIH) to generate proteins, structures,
assays and chemical starting points
49
50. The current epigenetics universe
Domain Family Typical substrate class* Total
Targets
Histone Lysine Histone/Protein K/R(me)n/ (meCpG) 30
demethylase
Bromodomain Histone/Protein K(ac) 57
R Tudor domain Histone Kme2/3 - Rme2s 59
O
Chromodomain Histone/Protein K(me)3 34
Y
A MBT repeat Histone K(me)3 9
L
PHD finger Histone K(me)n 97
Acetyltransferase Histone/Protein K 17
Methyltransferase Histone/Protein K&R 60
PARP/ADPRT Histone/Protein R&E 17
MACRO Histone/Protein (p)-ADPribose 15
Histone deacetylases Histone/Protein KAc 11
395
Now known to be amenable to small molecule inhibition 50
51. Required activities and possible participants for
Arch2POCM in Year 1
Some possible Arch2POCM Third Parties: funded or in-kind contributions
Ac2vity
Loca2on/Inves2gator
Target
Structure
SGC
and
academia
Compound
libraries
Partner/SGC/academia
Assay
development
for
epigeneMc
screens
and
biomarkers
Partner/SGC/academia/CRO
HTP
screens
for
epigeneMc
hits
Partner/SGC/academia/CRO
Med
Chem
SAR
To
ID
Two
Suitable
Binding
Arch2POCM
Test
WuXI,/Axon,/ChemDiv/Amira/UNC,
Dana
Farber,
Compounds
Oxford,
Non-‐GLP
scaleup
of
Arch2POCM
Test
Compounds
and
associated
???
analyMcs
DistribuMon
of
Arch2POCM
Test
Compounds
AAI
pharma
PK,
PD,
ADME,
Tox
TesMng
PPD,
CRL
GMP
Manufacturing
of
Arch2POCM
Test
Compounds
AAI
Pharma,
Ipsen,
Camrus,
Durect,
Patheon,
Catalent
GMP
FormulaMon
Camrus,
Durect,
Patheon,
Catalent,
SwissCo,
Pharmatec
GMP
Drug
Storage
and
DistribuMon
AAI
Pharma,
SwissCo,
Pharmatec
IND
PreparaMon
Support
Accurate
FDA
consult./Parexel/QuinMles
Clinical
Assay
Development
and
QualificaMon
GenopMx/Caris
Ph
I-‐II
Clinical
Trials
ICON/
Parexel/Covance/QuinMles
Ph
I-‐II
Database
Management
and
CSR
ProducMon
ICON/Parexel/Covance/QuinMles/AAAH
51
52. General benefits of Arch2POCM for drug
development
1. Arch2POCM s use of test compounds to de-risk previously unexplored
biology enables drug developers to initiate proprietary drug
development starting from an array of unbiased, clinically validated
targets
2. Arch2POCM’s crowdsourced research and trials provides the
pharmaceutical industry with parallel shots on goal: by aligning test
compounds to most promising unmet medical need
3. The positive and negative clinical trial data that Arch2POCM and the
crowd produce and publish will increase clinical success rates (as one
can pick targets and indications more smartly) and will save the
pharmaceutical industry money by reducing redundant proprietary
efforts on failed targets
53. Why is Arch2POCM a “smart bet” for Pharma
investment?
Arch2POCM:
an
external
epigeneMc
think
tank
from
which
Pharma
can
load
the
most
likely
to
succeed
targets
as
proprietary
programs
or
leverage
Arch2POCM
results
for
its
other
internal
efforts
• A
front
row
seat
on
the
progression
of
8
epigeneMc
targets
means
that:
• Pharma
can
select
the
epigeneMc
targets
that
best
compliment
their
internal
porholio
and
for
which
there
is
the
greatest
interest
• Pharma
can
structure
Arch2POCM’s
projects
so
that
key
objecMves
line
up
with
internal
go/no-‐
go
decisions
• Pharma
can
use
Arch2POCM
data
to
trigger
its
internal
level
of
investment
on
a
parMcular
target
• Pharma
can
use
Arch2POCM
resources
to
enrich
their
internal
epigeneMcs
effort:
acMve
chemotypes,
assays,
pre-‐clinical
models,
biomarkers,
geneMc
and
phenotypic
data
for
paMent
straMficaMon,
relaMonships
to
epigeneMc
experts
•
Pharma
can
use
Arch2POCM’s
lead
compound
chemotypes
to:
•
inform
their
proprietary
medicinal
chemistry
efforts
on
the
target
•
idenMfy
chemical
scaffolds
that
impact
epigeneMc
pathways:
a
proprietary
combinaMon
therapy
opportunity
•
Toxicity
screening
of
Arch2POCM
compounds
with
FDA
tools
can
be
used
to
guide
internal
proprietary
chemistry
efforts
in
oncology,
inflammaMon
and
beyond
• Arch2POCM’s
crowd
of
scienMsts
and
clinicians
provides
its
Pharma
partners
withparallel
shots
on
goal
at
the
best
context
for
Arch2POCM’s
compounds/targets
54. Arch2POCM: current partnering status
• Pharmaceutical Funding Partners
– Three companies are considering a potential role as industry anchors for
Arch2POCM
– Two companies have demonstrated interest in Arch2POCM and their company
leadership wants to go to next step- awaiting face to face discussions to go over
agreement
• Public Funding Partners
– Good progress is being made to obtain financial backing for Arch2POCM from
public funders in a number of countries (Canada, United Kingdom and Sweden)
for both epigenetics and for CNS
– Ontario Brain Institute, Canada has allocated $3M to the development of an
autism clinical network that is committed to work with Arch2POCM
• Philanthropic Funding Partners: awaiting designation of anchor partners
• In kind partners
– GE Healthcare (imaging): lead diagnostics partner and willing to share its
experimental oncology biomarkers
– Cancer Research UK: through some of its drug discovery and development
resources participating in Arch2POCM
54
55. Arch2POCM: current partnering status
• Academic partners
– Institutions that have indicated willingness to let their scientists
participate without patent filing: UCSF, Massachusetts General Hospital,
University of North Carolina, University of Toronto, Oxford University,
Karolinska Institute
– Academic community of epigenetic experts/resources already identified
• Regulatory partners: Because the objective of the Arch2POCM PPP is to
probe and elucidate disease biology as opposed to develop new proprietary
products, FDA and EMEA are ready to play an active role (toxicity screens,
and legacy clinical trial data)
• Patient group partners: leaders from Genetic Alliance, Inspire2Live and the
Love Avon Army of Women are actively engaged
55
56. Snapshot of Arch2POCM at Year 5:
Clinical Trials On Cancer and Immunology Targets In Progress
and Data-sharing Network Established
• 3-4 Arch2POCM clinical trials will be in progress or complete
• Ph 2 POCM data for at least one novel target will be available and released
so that either:
– Others can stop their proprietary studies on the same target to protect patient
safety and save money (negative POCM)
– Others can focus proprietary development efforts onto a validated oncology/
immunology target, and Arch2POCM partners can gain exclusive access to the
Arch2POCM IND and test compound for further development (positive POCM)
• Arch2POCM’s disease biology network teams will be conducting discovery
through Ph II clinical trials and sharing all the data
– Arch2POCM’s openly shared data will establish a common stream of knowledge
on Arch2POCM’s targets
– Safe Arch2POCM test compounds will be distributed to qualified labs and centers
– Crowdsourced experimental studies will be underway and providing information
about Arch2POCM targets and test compounds in a range of indications
56
58. How can we accelerate the pace of scientific discovery?
2008
2009
2010
2011
Ways to move beyond
“traditional” collaborations?
Intra-lab vs Inter-lab
Communication
Colrain/ Industrial PPPs Academic
Unions
61. sage federation:
model of biological age
Faster Aging
Predicted
Age
(liver
expression)
Slower Aging
Clinical Association
- Gender
- BMI
- Disease
Age Differential Genotype Association
Gene Pathway Expression
Chronological
Age
(years)
62. Reproducible
science==shareable
science
Sweave: combines programmatic analysis with narrative
Dynamic generation of statistical reports
using literate data analysis
Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reports
using literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 –
Proceedings in Computational Statistics,pages 575-580.
Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9
63. Federated
Aging
Project
:
Combining
analysis
+
narraMve
=Sweave Vignette
Sage Lab
R code + PDF(plots + text + code snippets)
narrative
HTML
Data objects
Califano Lab Ideker Lab Submitted
Paper
Shared
Data
JIRA:
Source
code
repository
&
wiki
Repository
68. Sage
Congress
Project
April
20
2012
RealNames
Parkinson’s
Project
RevisiMng
Breast
Cancer
Prognosis
Fanconi’s
Anemia
(Responders
CompeMMons-‐
IBM-‐DREAM)