1. DREAM Challenge
Gustavo Stolovitzky, IBM Computational Biology Center
Andrea Califano, Columbia University
2. DREAM
• DREAM is a Dialogue for Reverse Engineering
Assessments and Methods.
• The main objective is to catalyze the interaction
between experiment and theory in the area of
cellular network inference and quantitative model
building in systems biology
3. Challenges
• Network Topology and Parameter Inference Challenge
• Sage Bionetworks - DREAM Breast Cancer Prognosis Challenge
• The DREAM-Phil Bowen ALS Prediction Prize4Life
• NCI-DREAM Drug Sensitivity Prediction Challenge
4. Project/Challenges
• Network Topology and Parameter Inference challenge
• Develop and apply optimization methods, including the selection of the
most informative experiments, to accurately estimate parameters
and predict the outcomes of perturbations in systems biology models,
given:
• In Model 1, the complete structure of the model (including
expressions for the kinetic rate laws) for a gene regulatory
network composed of 9 genes. Both protein and mRNA are
explicitly modeled.
• In Model 2, an incomplete structure of the model, with missing
regulatory links, for a gene regulatory network composed of
11 genes. Here, participants also have to find the missing
links. Only proteins are explicitly modeled.
5. DREAM Phil Bowen ALS Prediction Prize4Life
• Goal
• Challenge is to predict the progression of disease in ALS patients based
on the patient’s current disease status
• Specifically develop an approach to predict a given patient’s disease status
within a year’s time based on 3 months of data
• Data
• Includes demographics, medical and family history data, functional
measures, vital signs, and lab data (blood
chemistry/hematology/urinalysis) collected at multiple times
• Disease progression will be calculated as the average change in the
Amyotrophic Lateral Sclerosis Functional Rating Scale (ALSFRS) over a
year's time from enrollment in a clinical trial
• At the end of the challenge, the prediction submitted (based on 3 months of
data) will be compared against the actual ALSFRS slope experienced by the
patient over a year
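The progression measure described above can be illustrated with a short sketch. The slides do not give the Challenge's exact formula, so this assumes the slope is the ordinary least-squares slope of ALSFRS score against time in months. The function name `alsfrs_slope` and the visit data are hypothetical, and Python is used only for illustration.

```python
def alsfrs_slope(months, scores):
    """Least-squares slope of ALSFRS score vs. time, in points per month.

    Assumption: the slope is estimated by ordinary least squares; the
    Challenge's own definition may differ.
    """
    if len(months) != len(scores) or len(months) < 2:
        raise ValueError("need at least two (month, score) pairs")
    n = len(months)
    mean_t = sum(months) / n
    mean_s = sum(scores) / n
    # Spread of the visit times around their mean
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("visits must occur at different times")
    # Co-variation of visit time and score
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(months, scores))
    return sxy / sxx


# Hypothetical patient: ALSFRS scores recorded every 3 months for a year
visit_months = [0, 3, 6, 9, 12]
visit_scores = [40, 38, 35, 33, 30]
slope = alsfrs_slope(visit_months, visit_scores)
print(f"ALSFRS slope: {slope:.3f} points per month")
```

A negative slope means functional decline; a predicted slope would be compared with the slope computed this way from the patient's observed scores.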
6. Output
• Improve disease prediction beyond the current capabilities by
• Developing more accurate (sensitive and specific) methods of
predicting progression,
• Identifying markers (variables) that would enable a
determination of expected future disease progression earlier
on in the course of the disease
• Validation
• Validate the model against a subset of patients that are neither
part of the training set nor the final validation (test) set.
• Submit the actual code, written in the R language (“Validation
Code”); InnoCentive will run the code against the interim
validation data set
7. NCI-DREAM Drug Sensitivity Prediction Challenge
• Use genomic information to build models capable of
ranking the sensitivity of cancer cell lines to a set of
small molecule compounds or their combinations
8. Sub Challenges
• Sub Challenge 1
• Predict the sensitivity of breast cancer cell lines to previously
untested compounds
• Model capable of ranking the sensitivity of 18 breast cancer
cell lines to 31 compounds
• Challenge in this case is to link the drug effects to the
underlying genetics of the 53 cell lines.
• Sub Challenge 2
• Predict compound combinations that have a synergistic
effect in reducing viability of a DLBCL cell line
• Predict the activity of pairs of compounds in the DLBCL LY3
cell line from expression profiles acquired after treatment
of the cell line with each of 14 individual compounds
9. Sage Bionetworks - DREAM Breast Cancer Prognosis Challenge
• Background
• Molecular diagnostics for cancer therapeutic decision making are among
the most promising applications of genomic technology
• Molecular profiles have proved particularly powerful in adding prognosis
information to standard clinical practice in breast cancer
• Trends emerging
• Genes defining predictive signatures of the same phenotype often do
not overlap across studies
• Predictive signatures are not very robust
• No consensus regarding the most accurate signatures or computational
methods for inferring predictive signatures
• No consensus regarding the added value of incorporating molecular
data in addition to or instead of traditionally used clinical covariates
10. Goal/Challenge
• Goal
• To assess the accuracy of computational models designed to predict
breast cancer survival, based on clinical information about the patient's
tumor as well as genome-wide molecular profiling data including gene
expression and copy number profiles
• Challenge
• Create a community-based effort to provide an unbiased assessment of
models and methodologies for the prediction of breast cancer survival
• Common dataset will be provided to all participants, with a validation
dataset held out for model evaluation
• Novel dataset will be generated at the end of the Challenge and used to
provide a final, unbiased score for each model
11. DATA
• Training data set from the METABRIC cohort of 2000 breast cancer samples
• Includes detailed clinical annotations
• Median survival time of 10 years; gene expression and copy number data
• Additional breast cancer datasets curated by Sage Bionetworks
• Can be used in model development
• Web based platform called Synapse
• Enables transparent, reproducible model building and analysis
workflows, as well as sharing of data, tools and models with the
challenge community
• Validation dataset
• Derived from 300-500 fresh frozen primary tumors with the same
clinical annotations and survival data as the METABRIC cohort
12. DATA
• Survival data
• Survival data are loaded into R as a Surv object, as defined in the R survival package.
• This object is essentially a 2-column matrix with one row per sample (sample names as row names) and the columns:
• time – time from diagnosis to last follow-up
• status – whether the patient was alive at the last follow-up time
• Feature data
• Gene expression data.
• Performed on the Illumina HT-12 v3 platform
• Loaded as Bioconductor ExpressionSet object
• Data normalized
• Copy number data.
• Performed on the Affymetrix SNP 6.0 platform
• Loaded as Bioconductor ExpressionSet object
• Data normalized
• Clinical covariates
• Loaded as a data.frame object with features
15. Submission
• Models built for this Challenge will be constructed using the
R programming language and uploaded to a common platform
(Synapse) provided by Sage Bionetworks
• Models will be uploaded as R objects implementing a function
called customPredict() that returns a vector of survival
predictions when given a set of feature data as input
• The customPredict() function will be run by a validation script for
each submitted model, and the resulting predictions will be scored
• Phase 3 submissions must be accompanied by a write-up that
includes a short description of the approach used in the final
model
16. Scoring
• Challenge models will be scored by calculating the
concordance index between the predicted survival and
the true survival information in the validation dataset
(accounting for the censor variable indicating whether
the patient was alive at last follow-up)
• Final assessment of models and the determination of
the best performer will be based on the concordance
index of predictions on the test dataset in Phase 3 of
the Challenge.
• In addition, other scoring metrics will be considered
depending on the suggestions of the community
throughout the Challenge
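As a rough illustration of the scoring described above, the sketch below computes a concordance index in the style of Harrell's C. It is not the Challenge's scoring code. It assumes that a higher prediction means longer predicted survival, that `died[i]` is True when the patient's death was observed (False means the patient was alive at last follow-up, i.e. censored), and that pairs with tied times are skipped. The function name and sample data are hypothetical, and Python is used only for illustration.

```python
def concordance_index(times, died, predicted):
    """Fraction of comparable patient pairs ordered correctly by the model.

    A pair (i, j) is comparable only when patient i has the strictly
    shorter follow-up time and i's death was observed; otherwise the true
    ordering of the two survival times is unknown.
    """
    if not (len(times) == len(died) == len(predicted)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and died[i]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0   # shorter survivor got the lower prediction
                elif predicted[i] == predicted[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data: the second patient is censored
times = [2.0, 4.0, 6.0, 8.0]
died = [True, False, True, True]
perfect = concordance_index(times, died, [1.0, 2.0, 3.0, 4.0])
reversed_order = concordance_index(times, died, [4.0, 3.0, 2.0, 1.0])
uninformative = concordance_index(times, died, [1.0, 1.0, 1.0, 1.0])
print(perfect, reversed_order, uninformative)
```

Under these assumptions a value of 1 means every comparable pair is ordered correctly, 0.5 is what a constant or random predictor achieves, and 0 means every pair is ordered wrongly.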
17. Time Line
• Deadline for submitting models for the Breast Cancer Prognosis Challenge
is 5 PM EST on October 15
• Best performers will be announced at the DREAM 7 Conference taking
place in San Francisco on November 12 to 16
• Final assessment of all models on newly generated data
• An additional cohort of 350 breast cancer samples with archived fresh
frozen tumor samples has been identified by Anne-Lise Borresen-Dale of
Oslo University Hospital, and a generous donation has been made by the
Avon Foundation to obtain gene expression and copy number data on
these samples
• Curation of the clinical records of this patient cohort, to harmonize
them with the current METABRIC dataset, and generation of the genomic
profiling data for these samples are currently under way
• Aim is to generate these data by the November 12 DREAM conference
Editor's notes
Much of the focus in reverse engineering of gene networks has been on identifying causal interaction topologies between genes. How can we decide between models that have very similar topologies, and how do we characterize the actual kinetics of these networks in a way that accurately reflects the causal relationships implied by the proposed topology?
Amyotrophic Lateral Sclerosis (ALS), also known as Lou Gehrig’s disease (in the US) or motor neurone disease (outside the US), is a fatal neurological disease that causes the death of the nerve cells in the brain and spinal cord that control voluntary muscle movements. In the early stages of the disease, it is currently very difficult to determine whether a given patient will experience slow or fast disease progression, because specific and reliable predictors are lacking.
Sub Challenge 1: genomic characterization of 53 cell lines; GI50 concentrations for 31 compounds on 35 cell lines; 18 cell lines for which GI50 concentrations are not given.
DLBCL: diffuse large B cell lymphoma cell lines.
Covariate: a variable that may be of direct interest, or that may be a confounding or interacting variable.
METABRIC: Molecular Taxonomy of Breast Cancer International Consortium