2. • About GenomeDx
• Cancer and genomics
• Genomic information we use
‒ Genome-wide RNA expression for applications in cancer
• Our prostate cancer solution
• Why we use H2O?
• Applications tested:
‒ Tumor Gleason grade classifier tested for multiple-endpoint prediction
• Conclusions and Future Directions
Outline
3. GenomeDx Biosciences
About Us
A clinical genomics company founded to
transform the practice of oncology
Use machine learning and statistical
algorithms to generate clinical tests
Decipher® metastasis signature
More than 20 peer-reviewed
publications supporting analytical validity,
clinical validity, and utility
Over 5,000 patients tested in clinical
trials and oncology practice
Decipher GRID™ platform
Data sharing program for Decipher
users
Free access for academic research
Clinical Lab
San Diego, CA
Informatics Lab
Vancouver, BC
4. Cancer is a disease of the genome
Tissue-based genomics
• Cancer is a complex disease and has many, many subtypes
‒ Indolent, aggressive, hormone or chemo sensitive/resistant, etc.
[Figure: DNA, RNA, and protein. Image sources: vector.childrenshospital.org, people.duke.edu, fineartamerica.com]
5. • Measuring RNA expression (concentration) and activity of genes is
highly informative for a genomic-based understanding of cancer
Measure gene activity using genome-wide expression
analysis of clinical biosamples
Tissue-based genomics
[Figure: workflow from cancer patient to biopsy/surgery, tumor sample, RNA extraction, microarray, and expression data]
6. Decipher GRID: a novel data-sharing program
to accelerate cancer genomics innovation
[Logos: Medical Center; H. Lee Moffitt Cancer Center & Research Institute]
7. Prostate cancer is a significant burden on the US
healthcare system
Prostate cancer: most prevalent cancer affecting men
Prostate cancer alone is projected in 2015 to account for 26% of incident
cancer cases in men
Siegel, Rebecca L., Kimberly D. Miller, and Ahmedin Jemal. "Cancer statistics, 2015." CA: a cancer journal for clinicians 65.1 (2015): 5-29.
8. • Accurate forecasting of recurrence
risk is key to determining the optimal
treatment choice:
‒ Observation
‒ Radiation therapy
‒ Hormone therapy
‒ Chemotherapy
• Goal of risk-adapted therapy:
‒ Reduce side effects of treatment
‒ Reduce costs of treatment
Clinical genomics aims to improve cancer patient care
Prostate cancer: balancing the harms and benefits
9. • Advanced algorithms such
as deep learning
• Ready-to-use algorithms that work with
existing languages and tools
• Easily explore data and develop
models
• Multiple algorithms within the
same package
Why we use H2O?
http://h2o.ai/
10. • Genomics:
‒ High-dimensional dataset: ~46K
features
‒ Feature selection to reduce the
dimensionality of the data
• Deep Learning:
‒ Can exploit non-linear relationships
between features (genes)
‒ Can improve performance
‒ Deep features may help us
understand the biology
Deep Neural Network
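As a toy illustration of what "non-linear relationship between features" can mean, the sketch below uses two hypothetical binary gene features whose individual averages are identical in both classes, so neither feature separates the classes on its own, while a tiny network with one hidden layer does. The weights are set by hand for illustration; this is not the trained model from the slides.

```python
# Toy XOR-style example: the class depends on the combination of two
# hypothetical gene features, not on either feature alone.

def relu(x):
    return max(0.0, x)

def tiny_network(x1, x2):
    # One hidden layer with two ReLU units; weights chosen by hand.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return h1 - 2.0 * h2

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def class_mean(feature_index, cls):
    # Mean of one feature within one class.
    values = [s[feature_index] for s, y in zip(samples, labels) if y == cls]
    return sum(values) / len(values)

# Each feature alone has the same mean in both classes.
single_feature_gaps = [class_mean(i, 1) - class_mean(i, 0) for i in (0, 1)]

# The hidden layer recovers the labels exactly.
outputs = [tiny_network(a, b) for a, b in samples]

print(single_feature_gaps)  # [0.0, 0.0]
print(outputs)              # [0.0, 1.0, 1.0, 0.0]
```

Real expression data are continuous and noisy, so the separation would be far less clean than in this hand-built case.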
11. • Other packages for training deep
neural networks:
‒ Filtering to reduce # of features to ~100
‒ No grid search
‒ Cross-validation AUC ~ 0.5
• H2O deep neural network:
‒ Filtering to reduce # of features to ~100
‒ Good results (AUC)
Deep Neural Network
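A grid search over hidden-layer layouts with H2O might look roughly like the sketch below, written against the H2O Python API. The slides do not say which H2O interface was used, so the language, the file path, the response column name `gleason_group`, and the candidate layer widths are all assumptions for illustration. Running `run_grid_search` requires the `h2o` package and a Java runtime.

```python
from itertools import product

def hidden_layer_grid(widths=(50, 100, 200), max_depth=2):
    """Enumerate candidate hidden-layer layouts such as [50] or [100, 200].

    The widths and depth are illustrative, not values from the slides.
    """
    grid = []
    for depth in range(1, max_depth + 1):
        grid.extend(list(combo) for combo in product(widths, repeat=depth))
    return grid

def run_grid_search(csv_path, response="gleason_group"):
    """Train one deep learning model per layout with 10-fold cross-validation.

    csv_path and response are hypothetical; the file is assumed to hold the
    filtered features plus one label column.
    """
    # Imported lazily so the grid helper above works without h2o installed.
    import h2o
    from h2o.estimators import H2ODeepLearningEstimator
    from h2o.grid.grid_search import H2OGridSearch

    h2o.init()
    frame = h2o.import_file(csv_path)
    frame[response] = frame[response].asfactor()  # treat as classification
    predictors = [c for c in frame.columns if c != response]

    grid = H2OGridSearch(
        model=H2ODeepLearningEstimator(nfolds=10, seed=1),
        hyper_params={"hidden": hidden_layer_grid()},
    )
    grid.train(x=predictors, y=response, training_frame=frame)
    # Models sorted by cross-validated AUC, best first.
    return grid.get_grid(sort_by="auc", decreasing=True)

print(len(hidden_layer_grid()))  # 12 candidate layouts
```

With three widths and up to two layers, the grid holds 3 one-layer and 9 two-layer layouts; a wider grid multiplies training time accordingly.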
13. Tumor Gleason grade is a strong prognostic factor and is used to
guide treatment decisions
Digitizing the Gleason Grade
• Gleason grade is the current
gold standard in prostate
cancer:
• Assigns a score from 1 to 5
based on the microscopic
appearance of the tissue
• A higher score is associated with
more aggressive disease
• Men with higher-grade prostate
cancer are more likely to receive
chemical castration (hormone
therapy)
https://en.wikipedia.org/wiki/Gleason_grading_system
14. Why develop a genomic model for pathology tumor grading?
Digitizing the Gleason Grade
• Gleason grade is subjective:
• Depends on pathologist
experience
• Borderline cases are interpreted
differently
• Gleason grade on biopsy is
often 'upgraded' on final
pathology
• Genomics could provide a more
robust prediction of outcomes
https://en.wikipedia.org/wiki/Gleason_grading_system
15. Study Design
~ 7000 patients
1,537 patients
Training (n = 990): G3 (n = 366), G4+ (n = 624)
Testing (n = 537): G3 (n = 113), G4+ (n = 424)
G3: Patients who had Gleason 3
G4+: Patients who had Gleason 4 or 5
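The training cohort above is the input to the 10-fold cross-validation used during classifier development. A minimal sketch of assigning those 990 patients to folds follows. The slides do not say whether folds were stratified by Gleason group, so the stratification here is an assumption, as are the function name and the seed.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index in 0..k-1, balancing classes per fold."""
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled samples of this class round-robin across folds.
        for position, i in enumerate(indices):
            folds[i] = position % k
    return folds

# Training cohort sizes taken from the study design slide.
labels = ["G3"] * 366 + ["G4+"] * 624
folds = stratified_folds(labels)

fold_sizes = [folds.count(f) for f in range(10)]
g3_per_fold = [
    sum(1 for y, f in zip(labels, folds) if y == "G3" and f == fold)
    for fold in range(10)
]
print(fold_sizes)
print(g3_per_fold)
```

Each fold ends up with 36 or 37 G3 patients, so every held-out fold has roughly the same class mix as the full training set.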
16. Classifier Development Overview
Input: 46,000 features; G3 (n = 366), G4+ (n = 624)
• Univariate filtering: two-sample Wilcoxon ('Mann-Whitney') tests
• H2O grid search (10-fold C.V.) to optimize hidden layer size
• Deep neural network (H2O)
Array features on Affymetrix Human
Exon 1.0 ST microarrays were
summarized into ~ 46,000 features
(genes)
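The univariate filtering step can be sketched as ranking each feature by a two-sample Wilcoxon (Mann-Whitney) p-value and keeping the top k. The version below is a pure-Python sketch using the normal approximation with a tie correction; the gene names and values are made up, and the exact test variant and cut-off used in the study are not given in the slides.

```python
from math import erfc, sqrt

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney p-value via the normal approximation."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at positions i..j-1 share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n = n1 + n2
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / sqrt(var_u)
    return erfc(abs(z) / sqrt(2.0))

def top_features(expr, labels, k):
    """Return the k feature names with the smallest p-values (G3 vs. G4+)."""
    pvals = {}
    for name, values in expr.items():
        g3 = [v for v, y in zip(values, labels) if y == "G3"]
        g4 = [v for v, y in zip(values, labels) if y == "G4+"]
        pvals[name] = mann_whitney_p(g3, g4)
    return sorted(pvals, key=pvals.get)[:k]

# Hypothetical toy data: 5 G3 and 5 G4+ samples, three made-up genes.
labels = ["G3"] * 5 + ["G4+"] * 5
expr = {
    "gene_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],  # separates the groups
    "gene_b": [1, 6, 3, 8, 5, 2, 7, 4, 9, 10],  # mostly mixed
    "gene_c": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],   # constant
}
print(top_features(expr, labels, 1))  # ['gene_a']
```

The normal approximation is intended for larger groups such as the 366 and 624 training samples; the ten-sample toy data only shows the ranking mechanics.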
18. Determining Patient Risk
Metastatic prostate cancer
• Prostate cancer can spread to other parts of
the patient's body
• After surgery, up to 50% [1] of men will have
clinical risk factors that increase the chance
of metastasis
• Very few men will experience metastasis
and die of their cancer [2]
• Gleason grade is a surrogate for metastatic
disease
http://www.drugdevelopment-technology.com/projects/drug_abiateronecance/drug_abiateronecance5.html
[1] Swanson, G.P., et al., Pathologic findings at radical prostatectomy: risk factors for failure and death. Urol
Oncol, 2007. 25(2): p. 110-4.
[2] Pound, C.R., et al., Natural history of progression after PSA elevation following radical prostatectomy. JAMA,
1999. 281(17): p. 1591-7
19. Genomic Gleason Classifier Predicts
Metastatic Outcomes
AUC: 73.4 [67.36 – 79.43]
[Figure: classifier score (0 to 1.0) by metastasis status, MET vs. No-MET]
[Figure: probability of metastasis-free survival (0.0 to 1.0) vs. time from surgery to metastasis (0 to 240), MET vs. No-MET; p-value < 0.001; annotated values 0.75 and 0.90]
MET: Patients who developed metastatic disease
No-MET: Patients who did not develop metastatic disease
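One way to read the AUC above: it is the probability that a randomly chosen MET patient receives a higher score than a randomly chosen No-MET patient, so 0.5 corresponds to chance and 1.0 to perfect ranking. Assuming the reported 73.4 is a percentage, it would correspond to about 0.73 on that scale. The sketch below computes AUC by direct pairwise comparison on made-up scores, not study data.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for MET and 0 for No-MET; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores for three MET and three No-MET patients.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))  # 8 of 9 pairs ranked correctly

# A score that is identical for everyone carries no information.
print(auc([0.5] * 6, labels))  # 0.5
```

The pairwise loop is quadratic in cohort size, which is fine for a sketch; rank-based formulas give the same value more efficiently.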
21. • Applied an advanced machine learning algorithm to genomic
data
• The H2O Deep Learning model outperforms other Gleason-
predicting models
• Incorporate more genomic features (46K) into the analysis
to improve model development and performance
• Exploit nonlinear relationships between features (genes)
• Can deep learning help us understand the biology?
Conclusions and Future Directions