1. Lessons Learned: Realities of Building Cancer Models - Sharing, Rewards and Affordability
Stephen Friend MD PhD
2.
3.
4.
5. Oncogenes only make good targets in particular molecular contexts: EGFR story
• EGFR pathway commonly mutated/activated in cancer
• 30% of all epithelial cancers
• Blocking Abs approved for treatment of metastatic colon cancer
• Subsequently found that RASMUT tumors don't respond: "Negative Predictive Biomarker"
• However, still EGFR+ / RASWT patients who don't respond? Need "Positive Predictive Biomarker"
• And in lung cancer, not clear that RASMUT status is a useful biomarker
Predicting treatment response to known oncogenes is complex and requires detailed understanding of how different genetic backgrounds function
[Pathway diagram labels: ERBB2, EGFR, EGFRi, BCR/ABL, KRAS, NRAS, BRAF, MEK1/2, Proliferation, Survival]
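The biomarker reasoning on this slide can be illustrated with a minimal sketch that stratifies response rate by biomarker status. The cohort below is invented purely for illustration (it is not data from the talk), and the function name `response_rate_by_status` is hypothetical.

```python
from collections import defaultdict


def response_rate_by_status(patients):
    """Return {biomarker_status: fraction of responders}.

    `patients` is a list of (status, responded) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # status -> [responders, total]
    for status, responded in patients:
        counts[status][0] += int(responded)
        counts[status][1] += 1
    return {status: r / n for status, (r, n) in counts.items()}


# Hypothetical toy cohort, made up for this example only.
toy_cohort = [
    ("RAS_MUT", False), ("RAS_MUT", False),
    ("RAS_MUT", False), ("RAS_MUT", False),
    ("RAS_WT", True), ("RAS_WT", False),
    ("RAS_WT", True), ("RAS_WT", False),
]

rates = response_rate_by_status(toy_cohort)
print(rates)
```

In this toy cohort the mutant group never responds, which is what a negative predictive biomarker looks like. The wild-type group still contains non-responders, which is the gap a positive predictive biomarker would have to close.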
7. Preliminary Probabilistic Models - Rosetta
Networks facilitate direct identification of genes that are causal for disease
Evolutionarily tolerated weak spots

Gene symbol  Gene name                                   Variance of OFPM explained by gene expression*  Mouse model  Source
Zfp90        Zinc finger protein 90                      68%                                             tg           Constructed using BAC transgenics
Gas7         Growth arrest specific 7                    68%                                             tg           Constructed using BAC transgenics
Gpx3         Glutathione peroxidase 3                    61%                                             tg           Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb        Lactamase beta                              52%                                             tg           Constructed using BAC transgenics
Me1          Malic enzyme 1                              52%                                             ko           Naturally occurring KO
Gyk          Glycerol kinase                             46%                                             ko           Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl          Lipoprotein lipase                          46%                                             ko           Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1        Complement component 3a receptor 1          46%                                             ko           Purchased from Deltagen, CA
Tgfbr2       Transforming growth factor beta receptor 2  39%                                             ko           Purchased from Deltagen, CA

Nat Genet (2005) 205:370
8. Extensive Publications now Substantiating Scientific Approach
Probabilistic Causal Bionetwork Models
• >80 Publications from Rosetta Genetics
Metabolic Disease
  "Genetics of gene expression surveyed in maize, mouse and man." Nature. (2003)
  "Variations in DNA elucidate molecular networks that cause disease." Nature. (2008)
  "Genetics of gene expression and its effect on disease." Nature. (2008)
  "Validation of candidate causal genes for obesity that affect..." Nat Genet. (2009)
  ... Plus 10 additional papers in Genome Research, PLoS Genetics, PLoS Comp. Biology, etc.
CVD
  "Identification of pathways for atherosclerosis." Circ Res. (2007)
  "Mapping the genetic architecture of gene expression in human liver." PLoS Biol. (2008)
  ... Plus 5 additional papers in Genome Res., Genomics, Mamm. Genome
Bone
  "Integrating genotypic and expression data ...for bone traits..." Nat Genet. (2005)
  "...approach to identify candidate genes regulating BMD..." J Bone Miner Res. (2009)
Methods
  "An integrative genomics approach to infer causal associations ..." Nat Genet. (2005)
  "Increasing the power to detect causal associations..." PLoS Comput Biol. (2007)
  "Integrating large-scale functional genomic data ..." Nat Genet. (2008)
  ... Plus 3 additional papers in PLoS Genet., BMC Genet.
9. List of Influential Papers in Network Modeling
➢ 50 network papers
➢ http://sagebase.org/research/resources.php
12. Sage Bionetworks
A non-profit organization with a vision to enable networked team
approaches to building better models of disease
BIOMEDICINE INFORMATION COMMONS INCUBATOR
Building Disease Maps Data Repository
Commons Pilots Discovery Platform
Sagebase.org
14. Predictive models of cancer phenotypes
Panel of tumor samples
Molecular characterization
➢ mRNA
➢ copy number
➢ somatic mutations
➢ epigenetics
➢ proteomics
Predictive model
Cancer phenotypes
➢ Drug sensitivity screens
➢ Clinical prognosis
Relating a genetic feature of a cancer to the efficacy of a drug: Gleevec (Imatinib) improves survival in CML patients harboring the BCR-ABL translocation
[Figure: overall survival (%) vs. months after beginning treatment. Brian J Druker, Nature Medicine 15, 1149-1152 (2009)]
[Diagram labels, targets: BCR/ABL, EGFR, ERBB2, MET, NRAS, KRAS, PIK3CA, BRAF, MEK1/2, TP53, ARF, MDM2]
[Diagram labels, drugs: Imatinib, Nilotinib, AZD0530, Erlotinib, ZD-6474, Lapatinib, PHA-665752, PF2341066, PLX4720, RAF265, AZD6244, PD-0325901, Nutlin-3]
15. Fundamentally, Biological Science hasn't changed yet because of the 'Omics Revolution...
...it is still about the process of linking a system to a hypothesis to some data to some analyses
[Diagram: Biological System, Data, Analysis]
16. Iterative Networked Approaches to Generating, Analyzing and Supporting New Models
[Diagram: Biological System, Data, Analysis]
Uncouple the automatic linkage between the data generators, analyzers, and validators
17. Networked Approaches
BioMedicine Information Commons
[Diagram: SYNAPSE with RAW DATA, CURATED DATA, TOOLS/METHODS, ANALYSES/MODELS; participants: Data Generators, Patients/Citizens, Data Analysts, Clinicians, Experimentalists]
18. Networked Team Approaches
BioMedical Information Commons
1 USABLE DATA
2 REWARDS RECOGNITION
3 PRIVACY BARRIERS
4 HOW TO DISTRIBUTE TASKS
5 REWARDS FOR SHARING
[Diagram: SYNAPSE with RAW DATA, CURATED DATA, TOOLS/METHODS, ANALYSES/MODELS; participants: Data Generators, Patients/Citizens, Data Analysts, Clinicians, Experimentalists]
19. Open and Networked Team Approaches
1 USABLE DATA
SYNAPSE
2 REWARDS RECOGNITION
20. Two approaches to building common scientific and technical knowledge

Text summary of the completed project
Assembled after the fact

Social Coding
Every code change versioned
Every issue tracked
Every project the starting point for new work
All evolving and accessible in real time
21. Synapse is GitHub for Biomedical Data

Social Science
Data and code versioned
Analysis history captured in real time
Work anywhere, and share the results with anyone

Social Coding
Every code change versioned
Every issue tracked
Every project the starting point for new work
All evolving and accessible in real time
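The idea of versioned data with analysis history captured as work happens can be sketched in a few lines. This is a hypothetical in-memory illustration, not the Synapse API: the class `VersionedStore`, its methods, and the file names are all invented for the example.

```python
import hashlib
import json


class VersionedStore:
    """Hypothetical store: every save creates a new immutable version."""

    def __init__(self):
        self._versions = {}  # name -> list of (sha256 hex digest, bytes)
        self.history = []    # provenance records, in the order written

    def save(self, name, content, used=(), executed=None):
        """Store `content` as the next version of `name` and log provenance."""
        digest = hashlib.sha256(content).hexdigest()
        versions = self._versions.setdefault(name, [])
        versions.append((digest, content))
        record = {
            "entity": name,
            "version": len(versions),   # versions are numbered from 1
            "sha256": digest,
            "used": list(used),         # (name, version) pairs this result depends on
            "executed": executed,       # script that produced it, if any
        }
        self.history.append(record)
        return record

    def get(self, name, version=None):
        """Return the bytes of a given version (latest when version is None)."""
        versions = self._versions[name]
        index = len(versions) - 1 if version is None else version - 1
        return versions[index][1]


store = VersionedStore()
store.save("expression.csv", b"gene,sample1\nTP53,2.1\n")
store.save("expression.csv", b"gene,sample1\nTP53,2.3\n")
model = store.save(
    "model.json",
    json.dumps({"coef": [0.4]}).encode(),
    used=[("expression.csv", 2)],
    executed="fit_model.py",
)
print(json.dumps(store.history, indent=2))
```

Old versions stay retrievable, and each result records which inputs and which script produced it, so a later project can start from any recorded state.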
25. Data Analysis with Synapse
Run Any Tool
On Any Platform
Record in Synapse
Share with Anyone
26. Synapse infrastructure for sharing, searching, and analyzing TCGA data
• Automated workflows for curation, QC, and sharing of large-scale datasets.
• All of TCGA, GEO, and user-submitted data processed with standard normalization methods.
• Searchable TCGA data:
  • 23 cancers
  • 11 data platforms
  • Standardized meta-data ontologies
[Diagram: Expression, Copy number, Mutation and Phenotype data; Predictive model generation; Performance assessment]
27. Synapse infrastructure for sharing, searching, and analyzing TCGA data
• Comparison of many modeling approaches applied to the same data.
• Models transparently shared and reusable through Synapse.
• Displayed is comparison of 6 modeling approaches to predict sensitivity to 130 drugs.
• Extending pipeline to evaluate prediction of TCGA phenotypes.
• Hosting of collaborative competitions to compare models from many groups.
[Diagram: Expression, Copy number, Mutation and Phenotype data; Predictive model generation; Performance assessment]
[Figure: Prediction accuracy (R2) for 130 drugs]
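Comparing several modeling approaches on the same data, scored by R2 as in the figure, can be sketched as follows. The data and the two models are toy values invented for illustration, and `compare_models` is a hypothetical helper, not part of Synapse.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def compare_models(models, features, y_true):
    """Score every model on the same data; best model first."""
    scores = {
        name: r_squared(y_true, [predict(x) for x in features])
        for name, predict in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one feature per sample, one measured drug sensitivity.
features = [1.0, 2.0, 3.0, 4.0]
sensitivity = [2.0, 4.0, 6.0, 8.0]

# Two stand-in "modeling approaches" expressed as prediction functions.
models = {
    "linear": lambda x: 2.0 * x,
    "constant_mean": lambda x: 5.0,
}

ranking = compare_models(models, features, sensitivity)
print(ranking)
```

Because every model is scored by the same function on the same samples, the ranking is directly comparable across approaches, which is the point of a shared evaluation pipeline.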
28. Open and Networked Approaches
3 PRIVACY BARRIERS
PORTABLE LEGAL CONSENT: weconsent.us
John Wilbanks
29. REDEFINING HOW WE WORK TOGETHER: Sage/DREAM Breast Cancer Prognosis Challenge
4 HOW TO DISTRIBUTE TASKS
COLLABORATIVE CHALLENGES
5 REWARDS FOR SHARING
30. What is the problem?
Our current models of disease biology are primitive and limit doctors' understanding and ability to treat patients
Current incentives reward those who silo information and work in closed systems
31. The Solution: Competitions to crowd-source research in biology and other fields
➢ Why competitions?
  • Objective assessments
  • Acceleration of progress
  • Transparency
  • Reproducibility
  • Extensible, reusable models
➢ Competitions in biomedical research
  • CASP (protein structure)
  • Foldit / EteRNA (protein / RNA structure)
  • CAGI (genome annotation)
  • Assemblathon / Alignathon (genome assembly / alignment)
  • sbv IMPROVER (industrial methodology benchmarking)
  • DREAM (co-organizer of Sage/DREAM competition)
➢ Generic competition platforms
  • Kaggle, InnoCentive, MLComp
33. Sage/DREAM Challenge: Details and Timing

Phase 1: July thru end-Sep 2012
➢ Training data: 2,000 breast cancer samples from METABRIC cohort
  • Gene expression
  • Copy number
  • Clinical covariates
  • 10 year survival
➢ Supporting data: Other Sage-curated breast cancer datasets
  • >1,000 samples from GEO
  • ~800 samples from TCGA
  • ~500 additional samples from Norway group
  • Curated and available on Synapse, Sage's compute platform
➢ Data released in phases on Synapse from now through end-September
➢ Will evaluate accuracy of models built on METABRIC data to predict survival in:
  • Held out samples from METABRIC
  • Other datasets

Phase 2: Oct 15 thru Nov 12, 2012
➢ Evaluation of models in novel dataset.
➢ Validation data: ~500 fresh frozen tumors from Norway group with:
  • Clinical covariates
  • 10 year survival
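Evaluating a survival model on held-out samples needs a score that copes with censored follow-up. The slide does not name the metric, so the sketch below uses the concordance index (Harrell's C) as an assumed, commonly used choice. The times, event flags and risk scores are toy values invented for the example.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    and a shorter follow-up time than subject j. The pair is concordant
    when subject i also received the higher predicted risk; ties in
    predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical held-out samples: follow-up in years, event observed or censored.
times = [2, 5, 7, 10]
events = [True, True, False, True]
predicted_risk = [0.9, 0.6, 0.7, 0.1]

c_index = concordance_index(times, events, predicted_risk)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk. The same function can score any submitted model on the same held-out cohort.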
34. Synapse transparent, reproducible, versioned machine learning infrastructure for method comparison
METABRIC cohort: 997 breast cancer samples
• Clinical covariates
• Gene expression (Illumina HT12v3)
• Copy number (Affy SNP 6.0)
• 10 year survival
Loaded through Synapse R client as Bioconductor objects.
[Diagram: Expression, Copy number, Mutation and Phenotype data; Predictive model generation; Performance assessment]
35. Synapse transparent, reproducible, versioned machine learning infrastructure for method comparison
Custom models implement train() and predict() API.
Implementation of simple clinical-only survival model used as baseline predictor.
[Diagram: Expression, Copy number, Mutation and Phenotype data; Predictive model generation; Performance assessment]
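The train()/predict() contract can be sketched as below. The deck describes models loaded through the Synapse R client, so this Python class is only an analogue: the name `ClinicalOnlyBaseline`, the covariates, and the scoring rule are all assumptions made for illustration. The rule is a deliberately crude stand-in that ignores censoring. It is not the baseline model used in the challenge.

```python
from statistics import mean, pstdev


class ClinicalOnlyBaseline:
    """Hypothetical baseline: risk = sum of signed z-scores of clinical covariates."""

    def train(self, covariates, survival_times):
        """Learn per-covariate mean, spread, and direction of association.

        covariates: one list of numbers per patient.
        survival_times: one survival time per patient (censoring ignored here).
        """
        columns = list(zip(*covariates))
        self.means = [mean(col) for col in columns]
        self.sds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
        t_mean = mean(survival_times)
        self.signs = []
        for col, m in zip(columns, self.means):
            cov = sum((x - m) * (t - t_mean) for x, t in zip(col, survival_times))
            # +1 when larger covariate values go with shorter survival.
            self.signs.append(1.0 if cov < 0 else (-1.0 if cov > 0 else 0.0))
        return self

    def predict(self, covariates):
        """Return one risk score per patient; higher means worse prognosis."""
        return [
            sum(s * (x - m) / sd
                for x, m, sd, s in zip(row, self.means, self.sds, self.signs))
            for row in covariates
        ]


# Hypothetical training data: [age, tumor grade] and survival time in years.
clinical = [[70, 3], [45, 1], [60, 2], [50, 1]]
survival_years = [2.0, 10.0, 4.0, 9.0]

model = ClinicalOnlyBaseline().train(clinical, survival_years)
risks = model.predict(clinical)
print(risks)
```

Any model exposing the same two methods can be dropped into a shared evaluation loop, which is what makes it possible to compare submissions from many groups.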
37. Sage-DREAM Breast Cancer Prognosis Challenge: one month of building better disease models together
breast cancer data
154 participants; 27 countries
268 participants; 32 countries
August 17 Status
Challenge Launch: July 17
290 models posted to Leaderboard
42. Summary of Breast Cancer Challenge #1
https://synapse.sagebase.org/ - BCCOverview:0
• Transparency, reproducibility
• Validation in novel dataset
• Publication in Science Translational Medicine
• Donation of Google-scale compute space.
For the goal of promoting democratization of medicine...
Registration starting NOW... sign up at: synapse.sagebase.org
[Diagram: Expression, Copy number, Mutation and Phenotype data; Predictive model generation; Performance assessment]
43. Breast Cancer Collaborative Challenges and Beyond
The challenge on molecular predictors of breast cancer will create a community-based effort to provide an unbiased assessment of the most accurate models and methodologies for prediction of breast cancer survival.
[Diagram steps:
• Start With Pre-Collated Cohort
• Collaborative Challenge Hosted on Synapse
• Announce best performing model to predict breast cancer survival
• Obtain research questions from breast cancer research community for Challenge 2
• Generate and fund Challenge 2 research proposal]
44. Networked Team Approaches
BioMedical Information Commons
1 USABLE DATA
2 REWARDS RECOGNITION
3 PRIVACY BARRIERS
4 HOW TO DISTRIBUTE TASKS
5 REWARDS FOR SHARING
[Diagram: SYNAPSE with RAW DATA, CURATED DATA, TOOLS/METHODS, ANALYSES/MODELS; participants: Data Generators, Patients/Citizens, Data Analysts, Clinicians, Experimentalists]
45. Lessons Learned: Realities of Building Cancer Models - Sharing, Rewards and Affordability
Stephen Friend MD PhD