tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehouse for Translational Medicine at Takeda Pharmaceuticals International

tranSMART: a data warehouse for Translational Medicine
at Takeda Pharmaceuticals International Co.
tranSMART Community Workshop
November 2013
David Merberg
Bin Li
William Trepicchio

Outline

• Takeda’s tranSMART instance
– Goal

– Data content
– Enhancements

• Case Studies – Models for predicting erlotinib and sorafenib efficacy


Takeda rationale for implementing tranSMART

• To provide a large, well organized, and integrated dataset consisting
of MPI/Takeda proprietary data, outsourced data, and valuable public
data.
• To provide an integrated environment for accessing clinical data and
molecular profiling data
– Low dimensional data – age, sex, weight, previous treatments, survival,
etc.
– High dimensional data – gene expression microarray, SNP, mutation,
NGS

• To provide tools that will enable Medical and Discovery scientists to
use this data warehouse for biomarker identification, patient
stratification, drug targeting, disease prediction, etc.

Public data currently in Takeda tranSMART
• Gene Expression Omnibus (GEO)
– Approximately 1600 studies
– Approximately 200 key cancer studies manually curated; another ~150
cancer studies curated via text mining
– Most GEO datasets are cancer studies, but there are also samples from
cardiovascular disease, metabolic diseases, hematopoietic diseases,
and many others.

• The Cancer Genome Atlas (TCGA)
– Gene expression, SNP, and clinical data from close to 1000 patients
(brain, lung, and ovarian cancer)

• Large cell line panels
– The CCLE dataset, ~ 1000 cell lines, screened for 24 SOC drugs
– The Sanger dataset, ~ 1000 cell lines, screened on > 100 SOC drugs

Proprietary data currently in Takeda tranSMART
• Velcade Trials
– Clinical observations
– Gene expression results
– Mutation data

• Commissioned Studies
– Oncopanel 240 – cell line response to Takeda and SOC compounds
• Drug response (IC50, EC50, cell cycle blocks, apoptosis induction, etc.)
• Mutation status
• Gene expression

– Oncotest – xenograft response to Takeda and SOC compounds
• Drug response (IC50)
• Mutation status
• Gene expression
• SNP


OncoPanel 240 (Ricerca/Eurofins Panlabs)
• 240 well-defined tumor cell lines representing diverse tumor types

• Drug sensitivity screen results (IC50, EC50)
– for 13 Standard of Care anti-tumor compounds
– for 8 Takeda compounds targeting diverse pathways

• Baseline gene expression
• Mutation data

Normalization of information in the data warehouse

• Gene expression data
– Globally normalized GEO gene expression data using frozen Robust
Multiarray Analysis (fRMA)
• Quantile-based normalization
• Currently, only selected Affymetrix platforms are globally normalized

– Enabled grouping gene expression results from different labs and
different studies by disease

• Clinical information
– Curate clinical information to create consistent vocabulary
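
The quantile-based normalization mentioned above can be illustrated with a minimal base R sketch. This is plain quantile normalization on a toy matrix, not the fRMA procedure itself (fRMA relies on a precomputed, "frozen" reference). The function name `quantile_normalize` and the toy values are assumptions made for illustration only.

```r
# Plain quantile normalization of a genes x samples matrix.
# Illustrative only: this is not fRMA, which uses a frozen reference distribution.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), nrow(x) > 1, !anyNA(x))
  # Sort each column, then average across columns to get a reference distribution
  sorted <- apply(x, 2, sort)
  reference <- rowMeans(sorted)
  # Replace each value by the reference value at its within-column rank
  normalized <- apply(x, 2, function(col) reference[rank(col, ties.method = "first")])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy data: three "arrays" measured on different scales (made-up values)
expr <- cbind(lab_a = c(5, 2, 3, 4),
              lab_b = c(4, 1, 4.5, 2) * 2,
              lab_c = c(3, 4, 6, 8) + 1)
rownames(expr) <- paste0("gene", 1:4)

normalized <- quantile_normalize(expr)
print(normalized)
```

After this step every column shares the same set of values, so intensities from different labs sit on a common scale while each gene keeps its rank within its own array.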


R interface
• Enables direct access to tranSMART database tables
– Eliminates some limitations of the web interface, e.g. inability to perform
multi-study queries and analyses
– Provides a connection to the R environment, including diverse analysis
packages

• Sample functions
– getDistinctConcepts – given a keyword/string, returns study codes for
matching clinical concepts in the tranSMART database
– getGEXData – given study codes, gets gene expression data from the
tranSMART database
> # Find clinical concepts matching the keyword and collect their study codes
> br_concepts <- transmart.getDistinctConcepts(,'Breast_Cancer')
> study_list <- unique(br_concepts$STUDYCODE)
> # Fetch ITGB2 expression for those studies
> ITGB2_GEP_BR2 <- transmart.getGEXData(study_list,
gene.list='ITGB2', data.pivot=FALSE)
> # Plot the distribution of log intensities across all returned samples
> hist(ITGB2_GEP_BR2$LOG_INTENSITY, breaks=50, xlim=c(5,12),
main="All ITGB2 GEP", xlab="GEP")


Summary
• A data warehouse with a large store of gene expression, SNP, and
phenotypic data
– Clinical samples and cell lines
– Data normalized so that comparisons across studies are meaningful
– Vocabulary standardized across studies

• An R-interface to facilitate cross-study analysis using a large
collection of methods from statistics and machine learning
• A “toolbox” for achieving key Translational Medicine goals
– Bridging the gap between “omic” data generated in preclinical studies
and clinical results
– Predicting drug efficacy using clinical and pre-clinical information
collected for different purposes

• Case studies in using this toolbox follow . . .

Building and using a model to predict drug sensitivity

Can we identify a relationship between baseline gene expression and
drug sensitivity in cell lines . . . and then extrapolate from that
relationship to use gene expression to predict drug efficacy in the
clinic?

[Figure: MLN7243 IC50 distribution on Ricerca panel. x-axis: Cell lines (0 to 200); y-axis: IC50s (0 to 4)]

Building the predictive models

[Figure: MLN7243 IC50 distribution on Ricerca panel (x-axis: Cell lines; y-axis: IC50s), with Oncopanel 240 expression data and Oncopanel 240 drug sensitivity as model inputs]

• Normalize all Oncopanel 240 expression data
• Remove low-intensity and low-variance genes (to get robust signal)
• Correlation-based feature selection (gene expression vs IC50s)
• Develop a methodology for deriving drug sensitivity models
– Based on Partial Least Squares Regression (PLSR)
– Captures consensus information from cancer cell line panel data

• Use two SOC drugs as proof of concept for methodology
– Predict erlotinib (inhibits EGFR) sensitivity
– Predict sorafenib (inhibits VEGFR and PDGFR) sensitivity
– Use PFS from BATTLE trial to evaluate performance of models
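
The modeling steps listed above (gene filtering, correlation-based feature selection, PLSR) can be sketched in base R on synthetic data. This is an illustration under stated assumptions, not the authors' implementation: the filter thresholds, the number of selected genes, the number of components, and the helper names `fit_pls1` and `predict_pls1` are all hypothetical, and the PLS fit is a textbook single-response NIPALS variant.

```r
set.seed(1)

# Synthetic stand-in for a cell line panel: 100 cell lines x 200 genes (made-up data)
n_lines <- 100; n_genes <- 200
expr <- matrix(rnorm(n_lines * n_genes, mean = 8, sd = 1), nrow = n_lines,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))
# Make the last 50 genes nearly constant so the variance filter has work to do
expr[, 151:200] <- 8 + matrix(rnorm(n_lines * 50, sd = 0.01), nrow = n_lines)
# Simulated log2(IC50) driven by the first five genes plus noise
log_ic50 <- as.vector(expr[, 1:5] %*% c(1, -1, 0.8, -0.8, 0.5)) +
  rnorm(n_lines, sd = 0.3)

# Step 1: drop low-intensity and low-variance genes (thresholds are assumptions)
keep <- colMeans(expr) > 6 & apply(expr, 2, var) > 0.1
filtered <- expr[, keep, drop = FALSE]

# Step 2: correlation-based feature selection against log2(IC50)
gene_cor <- cor(filtered, log_ic50)[, 1]
top_genes <- names(sort(abs(gene_cor), decreasing = TRUE))[1:20]
selected <- filtered[, top_genes, drop = FALSE]

# Step 3: PLS regression with one response (NIPALS) on centered data
fit_pls1 <- function(x, y, ncomp) {
  x_mean <- colMeans(x); y_mean <- mean(y)
  xr <- sweep(x, 2, x_mean); yr <- y - y_mean
  W <- P <- matrix(0, ncol(x), ncomp); q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(xr, yr); w <- w / sqrt(sum(w^2))  # weight vector
    t_a <- xr %*% w                                  # component scores
    tt <- sum(t_a^2)
    P[, a] <- crossprod(xr, t_a) / tt                # X loadings
    q[a] <- sum(yr * t_a) / tt                       # y loading
    W[, a] <- w
    xr <- xr - t_a %*% t(P[, a])                     # deflate X
    yr <- yr - as.vector(t_a) * q[a]                 # deflate y
  }
  coefs <- W %*% solve(crossprod(P, W), q)
  list(coefs = as.vector(coefs), x_mean = x_mean, y_mean = y_mean)
}

predict_pls1 <- function(model, x_new) {
  as.vector(sweep(x_new, 2, model$x_mean) %*% model$coefs) + model$y_mean
}

model <- fit_pls1(selected, log_ic50, ncomp = 3)
predicted <- predict_pls1(model, selected)
cat("Correlation of re-predicted vs simulated log2(IC50):",
    round(cor(predicted, log_ic50), 2), "\n")
```

Re-predicting the training data, as done here, gives an optimistic view of accuracy; a real evaluation would hold out cell lines or use an independent dataset.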


Accuracy of the erlotinib sensitivity model
Re-predicting Oncopanel 240 log2(IC50)

Accuracy estimation:
Upper boundary: 91%
Lower boundary: 77%

Signature genes in the erlotinib model reflect known
drug mechanism
• Signature genes over-represent pathways that contain an EGFR node
• Signature genes are over-connected to EGFR
• Also, EGFR ligand NRG1 is among the signature genes
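
One common way to quantify pathway over-representation of a gene signature is a hypergeometric test. The slides do not say which statistic was used, so the sketch below is an assumption, and every count in it is hypothetical.

```r
# Hypothetical counts, for illustration only
universe_size  <- 10000  # genes measured on the platform
pathway_size   <- 200    # genes annotated to pathways that contain an EGFR node
signature_size <- 100    # genes in the model signature
overlap        <- 10     # signature genes that fall in those pathways

# Overlap expected by chance if signature genes were drawn at random
expected_overlap <- signature_size * pathway_size / universe_size

# P(X >= overlap) under the hypergeometric null
p_enrich <- phyper(overlap - 1, pathway_size, universe_size - pathway_size,
                   signature_size, lower.tail = FALSE)

cat("Expected overlap:", expected_overlap, " Observed:", overlap, "\n")
cat("Enrichment p-value:", signif(p_enrich, 3), "\n")
```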

Real data tests of the models

• Test 1: The BATTLE clinical trial
– 255 lung cancer (NSCLC) patients, 131 with gene expression profile
data (GSE33072)
• 25 patients in erlotinib arm
• 39 patients in sorafenib arm

– Are the predictions of the PLSR models consistent with the results of the
BATTLE trial?

• Test 2: Predicting drug sensitivity across indications
– Use model to predict erlotinib and sorafenib sensitivity based on gene
expression data from 484 Gene Expression Omnibus datasets in Takeda
tranSMART instance
• 11,331 samples grouped into 19 major oncology indications
• Calculate percentage predicted drug sensitive tumors for each indication
• Compare predictions to results of phase III clinical trials and FDA approvals
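
The per-indication summary described in Test 2 can be sketched in base R as follows. The sample values and the cutoff used to call a tumor "sensitive" are assumptions made for illustration; the slides do not state how the call was made.

```r
# Made-up model predictions: one row per tumor sample
samples <- data.frame(
  indication = c("Lung", "Lung", "Lung", "Lung",
                 "Kidney", "Kidney", "Liver", "Liver"),
  predicted_log2_ic50 = c(-1.2, 0.8, -0.5, 1.5, 2.0, 1.1, 0.9, 1.7)
)

# Assumed cutoff: call a sample sensitive when predicted log2(IC50) is below 0
sensitivity_cutoff <- 0
samples$sensitive <- samples$predicted_log2_ic50 < sensitivity_cutoff

# Percentage of predicted-sensitive tumors within each indication
pct_sensitive <- 100 * tapply(samples$sensitive, samples$indication, mean)
n_samples <- table(samples$indication)

summary_table <- data.frame(
  indication = names(pct_sensitive),
  n = as.vector(n_samples[names(pct_sensitive)]),
  pct_sensitive = as.vector(pct_sensitive)
)
print(summary_table)
```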


Test 1 – The BATTLE Trial: Survival analysis of groups
predicted to be drug sensitive/resistant by PLSR model

[Figure: four survival-curve panels, (A) to (D). x-axis: Months from Start of Therapy; y-axis: Proportion of Cases]
• E_model pred E_PFS: P = 0.09, HR = 0.43
• S_model pred S_PFS: P = 0.006, HR = 0.32
• S_model pred E_PFS: P = 0.32, HR = 1.87
• E_model pred S_PFS: P = 0.54, HR = 1.32

E: Erlotinib; S: Sorafenib; red: predicted sensitive; green: predicted resistant
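
A P value and hazard ratio of the kind reported for each panel can be obtained with the `survival` package that ships with R. The sketch below uses made-up PFS data, and the choice of a log-rank test plus a Cox model is an assumption; the slides do not state which tests produced the reported values.

```r
library(survival)

# Made-up PFS data for 12 patients (months); status 1 = progression, 0 = censored
pfs <- data.frame(
  months = c(1.0, 1.5, 1.8, 2.2, 3.0, 5.0,   # predicted resistant
             2.0, 2.5, 4.5, 6.5, 8.0, 9.0),  # predicted sensitive
  status = c(1, 1, 1, 1, 1, 1,
             1, 1, 1, 1, 0, 0),
  predicted = factor(rep(c("resistant", "sensitive"), each = 6),
                     levels = c("resistant", "sensitive"))
)

# Log-rank test comparing the two predicted groups
logrank <- survdiff(Surv(months, status) ~ predicted, data = pfs)
p_value <- pchisq(logrank$chisq, df = 1, lower.tail = FALSE)

# Cox model: hazard ratio of predicted-sensitive relative to predicted-resistant
cox_fit <- coxph(Surv(months, status) ~ predicted, data = pfs)
hazard_ratio <- unname(exp(coef(cox_fit)))

cat("Log-rank P =", signif(p_value, 2), " HR =", signif(hazard_ratio, 2), "\n")
```

A hazard ratio below 1 means the predicted-sensitive group progressed more slowly than the predicted-resistant group, which is the pattern expected when a model is matched to its own drug arm.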

Test 2: Are predictions of erlotinib sensitivity, grouped
by indication, consistent with clinical results?

• Kidney cancer is predicted to be erlotinib insensitive; a phase III
clinical trial failed
• Lung cancer is predicted to be erlotinib sensitive; a phase III
clinical trial succeeded (companion diagnostic available)
• Potential new indication? Multiple head and neck cancer trials are
under way

Test 2: Are predictions of sorafenib sensitivity,
grouped by indication, consistent with clinical results?

• Potential new indication?
• Kidney and liver cancers are predicted to be sorafenib sensitive;
sorafenib has been approved for kidney and liver cancers


Conclusions
• Using tranSMART, we created a large data warehouse to provide
computational support for biomarker identification, patient
stratification, and other Translational Medicine goals.
• Patient and cell line data can be grouped across studies by
indication or other attributes to increase statistical power. Grouping is
enabled by:
– Global normalization of numeric data
– Standardization of vocabulary
– An R interface that provides direct access to database tables

• Using erlotinib and sorafenib as case studies, we demonstrated that
the data warehouse and the R interface enable us to predict patient
stratification and drug efficacy in cancer indications.


Acknowledgements

Takeda
Andy Dorner
Gene Shin
Andrew Krueger
Seema Grover
Jike Cui (now at Sanofi)

Thomson Reuters
Elona Kolpakova-Hart


Recombinant by Deloitte
Jinlei Liu
Mike McDuffie
Hiaping Xia

Backup Slides


Model test 2: How well do the models predict
drug-indication efficacy profiles?

Cancer Type     Successful Phase III     Number of   % tumors predicted    % tumors predicted
                trial / FDA approval     samples     Erlotinib sensitive   Sorafenib sensitive
Lung Cancer     Erlotinib                329         15.81                 0.61
Liver Cancer    Sorafenib                85          0.00                  31.76
Kidney Cancer   Sorafenib                218         0.46 *                24.77

* Erlotinib failed to show efficacy for kidney cancer in a phase III trial
