SlideShare una empresa de Scribd logo
1 de 67
Descargar para leer sin conexión
The Importance of Reproducibility in
  High-Throughput Studies: Case
 Studies in Forensic Bioinformatics
                Keith A. Baggerly
    Bioinformatics and Computational Biology
        UT M. D. Anderson Cancer Center
           kabagg@mdanderson.org

               CSE, May 3, 2011
G ENOMIC S IGNATURES                                            1


     Why is Reproducibility Important in H-T-S?
Our intuition about what “makes sense” is very poor in high
dimensions. To use “genomic signatures” as biomarkers, we
need to know they’ve been assembled correctly.

Without documentation, we may need to employ forensic
bioinformatics to infer what was done to obtain the results.

Let’s examine some case studies involving an important
clinical problem: can we predict how a given patient will
respond to available chemotherapeutics?




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            2


           Using the NCI60 to Predict Sensitivity




          Potti et al (2006), Nature Medicine, 12:1294-1300.

The main conclusion is that we can use microarray data from
cell lines (the NCI60) to define drug response “signatures”,
which can be used to predict whether patients will respond.

They provide examples using 7 commonly used agents.

This got people at MDA very excited.


c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            3

                                      Fit Training Data




We want the test data to split like this...
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            4

                                       Fit Testing Data




But it doesn’t. Did we do something wrong?
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                   5

                       5-FU Heatmaps




     Nat Med Paper
G ENOMIC S IGNATURES                    5

                       5-FU Heatmaps




     Nat Med Paper        Our t-tests
G ENOMIC S IGNATURES                                                             5

                                       5-FU Heatmaps




     Nat Med Paper                              Our t-tests     Reported Genes




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            6


                                  Their List and Ours
> temp <- cbind(
    sort(rownames(pottiUpdated)[fuRows]),
    sort(rownames(pottiUpdated)[
         fuTQNorm@p.values <= fuCut]);
> colnames(temp) <- c("Theirs", "Ours");
> temp
     Theirs        Ours
...
[3,] "1881_at"     "1882_g_at"
[4,] "31321_at"    "31322_at"
[5,] "31725_s_at" "31726_at"
[6,] "32307_r_at" "32308_r_at"
...

c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                                                                                                                                              7


                             Offset P-Values: Other Drugs
                                           Topotecan+1                                         Etoposide+1                                      Adriamycin+1




                                 1.0




                                                                                     1.0




                                                                                                                                      1.0
                                 0.8




                                                                                     0.8




                                                                                                                                      0.8
                                                                                                                                                                             q




                                 0.6




                                                                                     0.6




                                                                                                                                      0.6
                       P Value




                                                                           P Value




                                                                                                                            P Value
                                                                                                                                                                             q
                                 0.4                                                                                                                                         q




                                                                                     0.4




                                                                                                                                      0.4
                                 0.2




                                                                                     0.2




                                                                                                                                      0.2
                                                                                                                                                                             q
                                 0.0




                                                                                     0.0




                                                                                                                                      0.0
                                                                                                                                                                     q
                                        qqqqqqqqqqqqqqqqqqqqqqqqq
                                       qqqqqqqqqqqqqqqqqqqqqqqqq
                                       qqqqqqqqqqqqqqqqqqqqqqqqq
                                       qqqqqqqqqqqqqqqqqqqqqqqqq
                                       qqqqqqqqqqqqqqqqqqqqqqqqq
                                        qqqqqqqqqqqqqqqqqqqqqqqqq                          qqqqqqqqqqqqqqqqqqqqqqqqq
                                                                                           qqqqqqqqqqqqqqqqqqqqqqqqq                        qqqqqqqqqqqqqqqqqqqqqqqqq
                                                                                                                                            qqqqqqqqqqqqqqqqqqqqqqqqq
                                                                                                                                             qqqqqqqqqqqqqqqqqqqqqqqqq



                                       0      50     100             150                   0   10 20 30 40 50                               0     20     40      60          80

                                                Index                                               Index                                              Index



                                           Paclitaxel+1                                        Docetaxel+1                                        Cytoxan+1
                                 1.0




                                                                                     1.0




                                                                                                                                      1.0
                                                                      q                                                                                                    qq
                                                                                                                                                                          q
                                                                     q                                                  q                                                q
                                                                     q                                                                                                   q
                                                                                                                        q                                               q
                                 0.8




                                                                                     0.8




                                                                                                                                      0.8
                                                                                                                                                                       q
                                                                                                                                                                      q
                                                                                                                                                                     q
                                                                                                                    q
                                                                                                                    q
                                 0.6




                                                                                     0.6




                                                                                                                                      0.6
                       P Value




                                                                           P Value




                                                                                                                    q




                                                                                                                            P Value
                                                                                                                   q
                                                                                                                  qq                                              q
                                                                                                                 qq
                                                                                                                 q                                               q
                                                                 q
                                 0.4




                                                                                     0.4




                                                                                                                                      0.4
                                                                                                             q                                                   q
                                                                                                                                                               qq
                                                                 q                                                                                            q
                                                                                                                                                         qqqq
                                                                                                                                                            q
                                                             q
                                 0.2




                                                                                     0.2




                                                                                                                                      0.2
                                                                                                                                                      qq
                                                                                                                                                       q
                                                                                                             q
                                                                                                                                                   qq
                                                                                                                                                    q
                                                                                                             q
                                                                                                             q                                    q
                                                             q                                             qq                                    q
                                                                                                                                                q
                                 0.0




                                                                                     0.0




                                                                                                                                      0.0
                                       qqqqqqqqqqqqqq
                                        qqqqqqqqqqqqqq                                     qqqqqqqqqqqqqqqqq
                                                                                           qqqqqqqqqqqqqqqq                                  qqq
                                                                                                                                            qqq



                                       0 5      15      25           35                    0   10 20 30 40 50                               0 5        15       25           35

                                                Index                                               Index                                              Index




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            8


                                Using Their Software
Their software requires two input files:

1. a quantification matrix, genes by samples, with a header
   giving classifications (0 = Resistant, 1 = Sensitive, 2 = Test)

2. a list of probeset ids in the same order as the quantification
   matrix. This list must not have a header row.

What do we get?




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                            9

       Heatmaps Match Exactly for Most Drugs!
From the paper:




From the software:
G ENOMIC S IGNATURES                                            9

       Heatmaps Match Exactly for Most Drugs!
From the paper:




From the software:




We match heatmaps but not gene lists? We’ll come back to
this, because their software also gives predictions.
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            10


                  Predicting Docetaxel (Chang 03)




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            11

             Predicting Adriamycin (Holleman 04)




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                              12


                         There Were Other Genes...
The 50-gene list for docetaxel has 19 “outliers”.

The initial paper on the test data (Chang et al) gave a list of
92 genes that separated responders from nonresponders.



Entries 7-20 in Chang et al’s list comprise 14/19 outliers.

The others: ERCC1, ERCC4, ERBB2, BCL2L11, TUBA3.
These are the genes named to explain the biology.




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            13

    A Repro Theme: Don’t Take My Word For It!
Read the paper! Coombes, Wang & Baggerly, Nat Med, Nov
6, 2007, 13:1276-7, author reply 1277-8.

Try it yourselves! All of the raw data, documentation*, and
code* is available from our web site (*and from Nat Med):

http://bioinformatics.mdanderson.org/
Supplements/ReproRsch-Chemo.




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            14


          Potti/Nevins Reply (Nat Med 13:1277-8)
Labels for Adria are correct – details on their web page.

They’ve gotten the approach to work again. (Twice!)




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                              15


        Adriamycin 0.9999+ Correlations (Reply)
G ENOMIC S IGNATURES                                            15


        Adriamycin 0.9999+ Correlations (Reply)




Redone Aug 08, “using ... 95 unique samples” (also wrong)
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            16

                               Validation 1: Hsu et al




J Clin Oncol, Oct 1, 2007, 25:4350-7.

Same approach, using Cisplatin and Pemetrexed.

For cisplatin, U133A arrays were used for training. ERCC1,
ERCC4 and DNA repair genes are identified as “important”.

With some work, we matched the heatmaps. (Gene lists?)



c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                  17


                       The 4 We Can’t Match (Reply)
203719         at, ERCC1,
210158         at, ERCC4,
228131         at, ERCC1, and
231971         at, FANCM (DNA Repair).

The last two probesets are special.
G ENOMIC S IGNATURES                                            17


                       The 4 We Can’t Match (Reply)
203719         at, ERCC1,
210158         at, ERCC4,
228131         at, ERCC1, and
231971         at, FANCM (DNA Repair).

The last two probesets are special.


These probesets aren’t on the U133A arrays that were used.
They’re on the U133B.




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            18

                         Validation 2: Bonnefoi et al




Lancet Oncology, Dec 2007, 8:1071-8. (early access Nov 14)

Similar approach, using signatures for Fluorouracil, Epirubicin
Cyclophosphamide, and Taxotere to predict response to
combination therapies: FEC and TET.

Potentially improves ER- response from 44% to 70%.



c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                                              19


            We Might Expect Some Differences...




   High Sample Correlations                                     Array Run Dates
    after Centering by Gene

c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                           20


                       How Are Results Combined?
Potti et al predict response to TFAC, Bonnefoi et al to TET
and FEC. Let P() indicate prob sensitive. The rules used are
as follows.
G ENOMIC S IGNATURES                                           20


                       How Are Results Combined?
Potti et al predict response to TFAC, Bonnefoi et al to TET
and FEC. Let P() indicate prob sensitive. The rules used are
as follows.


P (T F AC) = P (T )+P (F )+P (A)+P (C)−P (T )P (F )P (A)P (C).
G ENOMIC S IGNATURES                                           20


                       How Are Results Combined?
Potti et al predict response to TFAC, Bonnefoi et al to TET
and FEC. Let P() indicate prob sensitive. The rules used are
as follows.


P (T F AC) = P (T )+P (F )+P (A)+P (C)−P (T )P (F )P (A)P (C).


                          P (ET ) = max[P (E), P (T )].
G ENOMIC S IGNATURES                                           20


                       How Are Results Combined?
Potti et al predict response to TFAC, Bonnefoi et al to TET
and FEC. Let P() indicate prob sensitive. The rules used are
as follows.


P (T F AC) = P (T )+P (F )+P (A)+P (C)−P (T )P (F )P (A)P (C).


                          P (ET ) = max[P (E), P (T )].


                            5                          1
                  P (F EC) = [P (F ) + P (E) + P (C)] − .
                            8                          4
G ENOMIC S IGNATURES                                            20


                       How Are Results Combined?
Potti et al predict response to TFAC, Bonnefoi et al to TET
and FEC. Let P() indicate prob sensitive. The rules used are
as follows.


P (T F AC) = P (T )+P (F )+P (A)+P (C)−P (T )P (F )P (A)P (C).


                                P (ET ) = max[P (E), P (T )].


                             5                          1
                   P (F EC) = [P (F ) + P (E) + P (C)] − .
                             8                          4

Each rule is different.
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            21

       Predictions for Individual Drugs? (Reply)




Does cytoxan make sense?
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                           22

                       Temozolomide Heatmaps




Augustine et al., 2009, Clin
Can Res, 15:502-10, Fig 4A.
Temozolomide, NCI-60.
G ENOMIC S IGNATURES                                                                        22

                           Temozolomide Heatmaps




Augustine et al., 2009, Clin                                    Hsu et al., 2007, J Clin
Can Res, 15:502-10, Fig 4A.                                     Oncol, 25:4350-7, Fig 1A.
Temozolomide, NCI-60.                                           Cisplatin, Gyorffy cell lines.


c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                      23


                       Some Timeline Here...
Nat Med Nov 06*, Nov 07*, Aug 08. JCO Feb 07*, Oct 07*.
Lancet Oncology Dec 07*. PLoS One Apr 08. CCR Jan 09*.
(* errors reported)
G ENOMIC S IGNATURES                                         23


                       Some Timeline Here...
Nat Med Nov 06*, Nov 07*, Aug 08. JCO Feb 07*, Oct 07*.
Lancet Oncology Dec 07*. PLoS One Apr 08. CCR Jan 09*.
(* errors reported)


May/June 2009: we learn clinical trials had begun.
2007: pemetrexed vs cisplatin, pem vs vinorelbine.
2008: docetaxel vs doxorubicin, topotecan vs dox (Moffitt).
G ENOMIC S IGNATURES                                            23


                               Some Timeline Here...
Nat Med Nov 06*, Nov 07*, Aug 08. JCO Feb 07*, Oct 07*.
Lancet Oncology Dec 07*. PLoS One Apr 08. CCR Jan 09*.
(* errors reported)


May/June 2009: we learn clinical trials had begun.
2007: pemetrexed vs cisplatin, pem vs vinorelbine.
2008: docetaxel vs doxorubicin, topotecan vs dox (Moffitt).


Sep 1. Paper submitted to Annals of Applied Statistics.
Sep 14. Paper online at Annals of Applied Statistics.
Sep-Oct: Story covered by The Cancer Letter, Duke starts
internal investigation, suspends trials.

c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                              24

                                           Jan 29, 2010




Their investigation’s results “strengthen ... confidence in this
evolving approach to personalized cancer treatment.”
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                           25

                       Why We’re Unhappy...
“While the reviewers approved of our sharing the report with
the NCI, we consider it a confidential document” (Duke). A
future paper will explain the methods.
G ENOMIC S IGNATURES                                            25

                               Why We’re Unhappy...
“While the reviewers approved of our sharing the report with
the NCI, we consider it a confidential document” (Duke). A
future paper will explain the methods.


There was also a major new development that the restart
announcement didn’t mention.

In mid-Nov (mid-investigation), the Duke team posted new
data for cisplatin and pemetrexed (in trials since ’07).

These included quantifications for 59 ovarian cancer test
samples (from GSE3149) used for predictor validation.



c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            26


                  We Tried Matching The Samples




43 samples are mislabeled; 16 don’t match at all.
The first 16 don’t match because the genes are mislabeled.
We reported this to Duke and to the NCI in mid-November.
All data was stripped from the websites within the week.

c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                           27

                       More Timeline
“While the reviewers approved of our sharing the report with
the NCI...”
G ENOMIC S IGNATURES                                           27

                       More Timeline
“While the reviewers approved of our sharing the report with
the NCI...”


April, 2010. Review report sought from the NCI under the
Freedom of Information Act (FOIA).
May, 2010. Redacted report supplied; gaps noted.
G ENOMIC S IGNATURES                                            27

                                        More Timeline
“While the reviewers approved of our sharing the report with
the NCI...”


April, 2010. Review report sought from the NCI under the
Freedom of Information Act (FOIA).
May, 2010. Redacted report supplied; gaps noted.

May, 2010. NCI and CALGB pull lung metagene signature
from an ongoing phase III trial.

Duke trials continue.




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            28

                                          July 16, 2010




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            29


                                  Subsequent Events
July 19/20: letter to Varmus; Duke resuspends trials
July 30: Varmus & Duke request IOM Involvement
Oct 22/29: call to retract JCO paper
Nov 9: Duke announces trials terminated
Nov 19: call to retract Nat Med paper, Potti resigns

There have been at least three developments since.

These involve the NCI, the FDA, and Duke.




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            30


  Dec 20, 2010: The IOM Meets, the NCI Speaks
At the first meeting of the IOM review panel, Lisa McShane
outlined some NCI interactions with the Duke group.
An MP3 is available from the Cancer Letter.

Our questions prompted the NCI to ask questions of its own.
The NCI released about 550 pages of documents giving the
details.

We posted our annotation of these documents and an overall
timeline on Jan 14th.

We’re going to look at one case they examined.




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            31


                  The Lung Metagene Score (LMS)




One of “the most signficant advances on the front lines of
cancer.” – Ozols et al., JCO, 25:146-62, 2007, ASCO survey
of 2006.
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                        32


                The Background of CALGB 30506
When this trial was proposed to the NCI, they had questions
about the amount of blinding, so they asked for one more test.
G ENOMIC S IGNATURES                                        32


                The Background of CALGB 30506
When this trial was proposed to the NCI, they had questions
about the amount of blinding, so they asked for one more test.


The LMS failed.

It almost achieved significance going the wrong way.
G ENOMIC S IGNATURES                                            32


                The Background of CALGB 30506
When this trial was proposed to the NCI, they had questions
about the amount of blinding, so they asked for one more test.


The LMS failed.

It almost achieved significance going the wrong way.


After post-hoc adjustments (adjusting for batches), the Duke
group attained some success (with IB, not IA).

At CALGB’s urging, this went forward, but the NCI only
allowed the LMS to be used for stratification, not allocation.



c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                     33

                       After our Paper
The NCI asked to see the data and code associated with the
post-hoc adjustment.

Using the code and data provided, the NCI couldn’t
reproduce the success of the post-hoc adjustment.

More, they noticed another problem.

They ran the algorithm and got predictions.
G ENOMIC S IGNATURES                                     33

                       After our Paper
The NCI asked to see the data and code associated with the
post-hoc adjustment.

Using the code and data provided, the NCI couldn’t
reproduce the success of the post-hoc adjustment.

More, they noticed another problem.

They ran the algorithm and got predictions.


They ran it again.
G ENOMIC S IGNATURES                                            33

                                       After our Paper
The NCI asked to see the data and code associated with the
post-hoc adjustment.

Using the code and data provided, the NCI couldn’t
reproduce the success of the post-hoc adjustment.

More, they noticed another problem.

They ran the algorithm and got predictions.


They ran it again.


The predictions changed.


c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                        34


                       The Changes Weren’t Subtle
Some scores changed from 5% to 95%.

From one run to the next, high/low classifications changed
about 25% of the time.
G ENOMIC S IGNATURES                                        34


                       The Changes Weren’t Subtle
Some scores changed from 5% to 95%.

From one run to the next, high/low classifications changed
about 25% of the time.


The NCI was unhappy. They had understood the rule to be
“locked down”, so that predictions wouldn’t change.
G ENOMIC S IGNATURES                                        34


                       The Changes Weren’t Subtle
Some scores changed from 5% to 95%.

From one run to the next, high/low classifications changed
about 25% of the time.


The NCI was unhappy. They had understood the rule to be
“locked down”, so that predictions wouldn’t change.


The NCI publicly yanked the LMS from CALGB 30506 in May
2010.
G ENOMIC S IGNATURES                                            34


                       The Changes Weren’t Subtle
Some scores changed from 5% to 95%.

From one run to the next, high/low classifications changed
about 25% of the time.


The NCI was unhappy. They had understood the rule to be
“locked down”, so that predictions wouldn’t change.


The NCI publicly yanked the LMS from CALGB 30506 in May
2010.


The NEJM paper was retracted March 2, 2011.

c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                         35

                       The FDA Visits
On Jan 28, 2011, the Cancer Letter reported that an FDA
Audit team was visiting Duke to examine how the trials had
been run.

A key question was whether Investigational Device
Exemptions (IDEs) were obtained before the signatures were
used to guide therapy. They weren’t.
G ENOMIC S IGNATURES                                            35

                                        The FDA Visits
On Jan 28, 2011, the Cancer Letter reported that an FDA
Audit team was visiting Duke to examine how the trials had
been run.

A key question was whether Investigational Device
Exemptions (IDEs) were obtained before the signatures were
used to guide therapy. They weren’t.


Some aspects of this issue were anticipated by the IOM
committee (Baggerly and Coombes, Clinical Chemistry,
57(5):688-90).

Per the FDA, genomic signatures are medical devices.


c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                        36

                       Duke’s Initial Investigation
One thing that had puzzled us since the trial restarts was how
the external reviewers missed the new errors we reported.
G ENOMIC S IGNATURES                                            36

                         Duke’s Initial Investigation
One thing that had puzzled us since the trial restarts was how
the external reviewers missed the new errors we reported.


Jan 11, 2011: Nature talks to Duke.

The Duke deans overseeing the investigation, in consultation
with the acting head of Duke’s IRB, decided not to forward
our report to the reviewers.

This was done to avoid “biasing the review”.




c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            37


                  The IOM Meets Again: Mar 30-31
Science: Ned Calogne, Rich Simon, Chuck Perou, Sumitha
Mandrekar

Institutional Responsibilty: Peter Pronovost (Hopkins)

Case Studies: Laura Van’t Veer, Steve Shak, Joe Nevins

Responsible Parties: journal editors (Cathy DeAngelis,
JAMA, Veronique Kiermer, Nature, Katrina Kelner, Science),
authors and PIs (Stott Zeger), institutions (Albert Reece, U
Md, Harold Paz, Penn State, Scott Zeger, Hopkins)

Forensics: Keith Baggerly

http://bioinformatics.mdanderson.org/
Supplements/ReproRsch-All/Modified/IOM/

c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            38


                       Some Cautions/Observations
We’ve seen problems like these before.

The most common mistakes are simple.

Confounding in the Experimental Design
Mixing up the sample labels
Mixing up the gene labels
Mixing up the group labels
(Most mixups involve simple switches or offsets)

This simplicity is often hidden.

Incomplete documentation

Unfortunately, we suspect
The most simple mistakes are common.

c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            39


        How Should We Pursue Reproducibility?
AAAS, Feb 19: All slides and an MP3 of the audio are
available from
http://www.stanford.edu/˜vcs/AAAS2011/
ENAR, Mar 23: Panel on ethics in biostatistics. Handouts are
available from our Google Group.
What’s Missing? Nature, Mar 23
AACR, Apr 5: Difficulties in moving biomarkers to the clinic.
CSE, May 3
NCI, Jun 23-4 What should the NCI be looking for in grants
that it funds?
Check back...
c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            40


                        What Should the Norm Be?
For our group? Since 2007, we have prepared reports in
Sweave.

For papers? (Baggerly, Nature, Sep 22, 2010)

Things we look for:
1. Data (often mentioned, given MIAME)
2. Provenance
3. Code
4. Descriptions of Nonscriptable Steps
5. Descriptions of Planned Design, if Used.

For clinical trials?



c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            41


                          Some Acknowledgements
Kevin Coombes

Shannon Neeley, Jing Wang

David Ransohoff, Gordon Mills

Jane Fridlyand, Lajos Pusztai, Zoltan Szallasi

MDACC Ovarian SPORE, Lung SPORE, Breast SPORE

Now in the Annals of Applied Statistics! Baggerly and
Coombes (2009), 3(4):1309-34.
http://bioinformatics.mdanderson.org/
Supplements/ReproRsch-All



c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
G ENOMIC S IGNATURES                                            42

                                                   Index
Title
Cell Line Story
Trying it Ourselves
Matching Features
Using Software/Making Predictions
Outliers
The Reply
Adriamycin Followup
Hsu et al (Cisplatin)
Timeline, Trials, Cancer Letter
Trial Restart and Objections
Final Lessons



c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes

Más contenido relacionado

La actualidad más candente

2q 08 quimica descriptiva(químicaindustrial)
2q 08 quimica descriptiva(químicaindustrial)2q 08 quimica descriptiva(químicaindustrial)
2q 08 quimica descriptiva(químicaindustrial)CAL28
 
Eliminacion e1 y e2
Eliminacion e1 y e2Eliminacion e1 y e2
Eliminacion e1 y e2noec85
 
Heterocyclic compounds part-I (Pyrrole)
Heterocyclic compounds part-I (Pyrrole)Heterocyclic compounds part-I (Pyrrole)
Heterocyclic compounds part-I (Pyrrole)pramod padole
 
Heterocyclic compounds part-I (Pyridine) by dr pramod r. padole
Heterocyclic compounds part-I (Pyridine) by dr pramod r. padoleHeterocyclic compounds part-I (Pyridine) by dr pramod r. padole
Heterocyclic compounds part-I (Pyridine) by dr pramod r. padolepramod padole
 
Pyrazole - Synthesis of Pyrazole - Characteristic Reactions of Pyrazole - Med...
Pyrazole - Synthesis of Pyrazole - Characteristic Reactions of Pyrazole - Med...Pyrazole - Synthesis of Pyrazole - Characteristic Reactions of Pyrazole - Med...
Pyrazole - Synthesis of Pyrazole - Characteristic Reactions of Pyrazole - Med...Dr Venkatesh P
 
Purine derivatives- Xanthine (Caffeine)
Purine derivatives- Xanthine (Caffeine)Purine derivatives- Xanthine (Caffeine)
Purine derivatives- Xanthine (Caffeine)SurendraKumar338
 
Aromatic carboxylic acid preparation and reaction
Aromatic carboxylic acid preparation and reactionAromatic carboxylic acid preparation and reaction
Aromatic carboxylic acid preparation and reactionSayyedMohsina
 
Heterocycilc compounds presentation
Heterocycilc compounds presentationHeterocycilc compounds presentation
Heterocycilc compounds presentationrutviklad
 
Practical Experiment 3: Benzocain
Practical Experiment 3: Benzocain Practical Experiment 3: Benzocain
Practical Experiment 3: Benzocain SONALI PAWAR
 
Synthesis of Aspirin.pptx
Synthesis of Aspirin.pptxSynthesis of Aspirin.pptx
Synthesis of Aspirin.pptxSarwarMHajeway
 
State of matter 3: Latent heat, Vapor pressure, Critical point, Eutectic mixt...
State of matter 3: Latent heat, Vapor pressure, Critical point, Eutectic mixt...State of matter 3: Latent heat, Vapor pressure, Critical point, Eutectic mixt...
State of matter 3: Latent heat, Vapor pressure, Critical point, Eutectic mixt...Sandeep Ambore
 
Pyrimidine
PyrimidinePyrimidine
PyrimidineTaj Khan
 

La actualidad más candente (20)

Determina las formas resonantes
Determina las formas resonantesDetermina las formas resonantes
Determina las formas resonantes
 
2q 08 quimica descriptiva(químicaindustrial)
2q 08 quimica descriptiva(químicaindustrial)2q 08 quimica descriptiva(químicaindustrial)
2q 08 quimica descriptiva(químicaindustrial)
 
Heterocyclic compounds unit III as per PCI syllabus
Heterocyclic compounds unit III as per PCI syllabusHeterocyclic compounds unit III as per PCI syllabus
Heterocyclic compounds unit III as per PCI syllabus
 
Pyrrole(azole)
Pyrrole(azole)Pyrrole(azole)
Pyrrole(azole)
 
Eliminacion e1 y e2
Eliminacion e1 y e2Eliminacion e1 y e2
Eliminacion e1 y e2
 
Teaching slides for ammonia
Teaching slides for ammoniaTeaching slides for ammonia
Teaching slides for ammonia
 
Heterocyclic compounds part-I (Pyrrole)
Heterocyclic compounds part-I (Pyrrole)Heterocyclic compounds part-I (Pyrrole)
Heterocyclic compounds part-I (Pyrrole)
 
Heterocyclic compounds part-I (Pyridine) by dr pramod r. padole
Heterocyclic compounds part-I (Pyridine) by dr pramod r. padoleHeterocyclic compounds part-I (Pyridine) by dr pramod r. padole
Heterocyclic compounds part-I (Pyridine) by dr pramod r. padole
 
Pyrazole - Synthesis of Pyrazole - Characteristic Reactions of Pyrazole - Med...
Pyrazole - Synthesis of Pyrazole - Characteristic Reactions of Pyrazole - Med...Pyrazole - Synthesis of Pyrazole - Characteristic Reactions of Pyrazole - Med...
Pyrazole - Synthesis of Pyrazole - Characteristic Reactions of Pyrazole - Med...
 
Unit 3 Pyrrole
Unit 3 PyrroleUnit 3 Pyrrole
Unit 3 Pyrrole
 
Purine derivatives- Xanthine (Caffeine)
Purine derivatives- Xanthine (Caffeine)Purine derivatives- Xanthine (Caffeine)
Purine derivatives- Xanthine (Caffeine)
 
Preparation of Benzocaine
Preparation of BenzocainePreparation of Benzocaine
Preparation of Benzocaine
 
Unit 4 Pyridine
Unit 4 PyridineUnit 4 Pyridine
Unit 4 Pyridine
 
Aromatic carboxylic acid preparation and reaction
Aromatic carboxylic acid preparation and reactionAromatic carboxylic acid preparation and reaction
Aromatic carboxylic acid preparation and reaction
 
Heterocycilc compounds presentation
Heterocycilc compounds presentationHeterocycilc compounds presentation
Heterocycilc compounds presentation
 
Practical Experiment 3: Benzocain
Practical Experiment 3: Benzocain Practical Experiment 3: Benzocain
Practical Experiment 3: Benzocain
 
Synthesis of Aspirin.pptx
Synthesis of Aspirin.pptxSynthesis of Aspirin.pptx
Synthesis of Aspirin.pptx
 
State of matter 3: Latent heat, Vapor pressure, Critical point, Eutectic mixt...
State of matter 3: Latent heat, Vapor pressure, Critical point, Eutectic mixt...State of matter 3: Latent heat, Vapor pressure, Critical point, Eutectic mixt...
State of matter 3: Latent heat, Vapor pressure, Critical point, Eutectic mixt...
 
Pyrimidine
PyrimidinePyrimidine
Pyrimidine
 
ALCOHOLES Y FENOLES
ALCOHOLES Y FENOLESALCOHOLES Y FENOLES
ALCOHOLES Y FENOLES
 

Destacado

The Calculus Of Rainbows
The Calculus Of RainbowsThe Calculus Of Rainbows
The Calculus Of Rainbowscathyma1993
 
Geological day 2012
Geological day 2012Geological day 2012
Geological day 2012Ale ZenaIT
 
Zoo Bank Talk Ms Ccourse09 Compressed Test
Zoo Bank Talk Ms Ccourse09 Compressed TestZoo Bank Talk Ms Ccourse09 Compressed Test
Zoo Bank Talk Ms Ccourse09 Compressed TestICZN
 
Intro computer
Intro computerIntro computer
Intro computerchamnong
 
Global Image Company Profile
Global Image Company ProfileGlobal Image Company Profile
Global Image Company ProfileGIS Global Image
 
Software Development
Software DevelopmentSoftware Development
Software Developmentebaad
 
IntelliManage (Hebrew)
IntelliManage (Hebrew)IntelliManage (Hebrew)
IntelliManage (Hebrew)IntelliManage
 
Accountability applying research to organizations and associates
Accountability   applying research to organizations and associatesAccountability   applying research to organizations and associates
Accountability applying research to organizations and associatesDani
 
Vestul Anatoliei- Milet, autor Nick Sava
Vestul Anatoliei- Milet, autor Nick SavaVestul Anatoliei- Milet, autor Nick Sava
Vestul Anatoliei- Milet, autor Nick SavaEmanuel Pope
 
Revista Itaca Nr. 2, 2013
Revista Itaca Nr. 2, 2013Revista Itaca Nr. 2, 2013
Revista Itaca Nr. 2, 2013Emanuel Pope
 
DEZBATERILE GRUPULUI DE REFLECȚIE PRIVIND DEMOCRATIA REALĂ
DEZBATERILE GRUPULUI DE REFLECȚIE PRIVIND DEMOCRATIA REALĂDEZBATERILE GRUPULUI DE REFLECȚIE PRIVIND DEMOCRATIA REALĂ
DEZBATERILE GRUPULUI DE REFLECȚIE PRIVIND DEMOCRATIA REALĂEmanuel Pope
 
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review? Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review? Ivan Oransky
 
Mobile web me2day_seminar
Mobile web me2day_seminarMobile web me2day_seminar
Mobile web me2day_seminarSang-il Jung
 
Caietul nr. 10 al Grupului de reflecție privind problemele democrației reale
Caietul nr. 10 al Grupului de reflecție privind problemele democrației reale Caietul nr. 10 al Grupului de reflecție privind problemele democrației reale
Caietul nr. 10 al Grupului de reflecție privind problemele democrației reale Emanuel Pope
 
Dangerous universe
Dangerous universeDangerous universe
Dangerous universeBrian Thomas
 
Prefix Group Capabilities Deck
Prefix Group Capabilities DeckPrefix Group Capabilities Deck
Prefix Group Capabilities Deckdanielledavis
 
Per de Place Bjørn - Revolutionizing taxonomy through an open-access web-regi...
Per de Place Bjørn - Revolutionizing taxonomy through an open-access web-regi...Per de Place Bjørn - Revolutionizing taxonomy through an open-access web-regi...
Per de Place Bjørn - Revolutionizing taxonomy through an open-access web-regi...ICZN
 

Destacado (20)

The Calculus Of Rainbows
The Calculus Of RainbowsThe Calculus Of Rainbows
The Calculus Of Rainbows
 
Geological day 2012
Geological day 2012Geological day 2012
Geological day 2012
 
Zoo Bank Talk Ms Ccourse09 Compressed Test
Zoo Bank Talk Ms Ccourse09 Compressed TestZoo Bank Talk Ms Ccourse09 Compressed Test
Zoo Bank Talk Ms Ccourse09 Compressed Test
 
Intro computer
Intro computerIntro computer
Intro computer
 
Global Image Company Profile
Global Image Company ProfileGlobal Image Company Profile
Global Image Company Profile
 
Software Development
Software DevelopmentSoftware Development
Software Development
 
IntelliManage (Hebrew)
IntelliManage (Hebrew)IntelliManage (Hebrew)
IntelliManage (Hebrew)
 
Accountability applying research to organizations and associates
Accountability   applying research to organizations and associatesAccountability   applying research to organizations and associates
Accountability applying research to organizations and associates
 
Vestul Anatoliei- Milet, autor Nick Sava
Vestul Anatoliei- Milet, autor Nick SavaVestul Anatoliei- Milet, autor Nick Sava
Vestul Anatoliei- Milet, autor Nick Sava
 
Revista Itaca Nr. 2, 2013
Revista Itaca Nr. 2, 2013Revista Itaca Nr. 2, 2013
Revista Itaca Nr. 2, 2013
 
DEZBATERILE GRUPULUI DE REFLECȚIE PRIVIND DEMOCRATIA REALĂ
DEZBATERILE GRUPULUI DE REFLECȚIE PRIVIND DEMOCRATIA REALĂDEZBATERILE GRUPULUI DE REFLECȚIE PRIVIND DEMOCRATIA REALĂ
DEZBATERILE GRUPULUI DE REFLECȚIE PRIVIND DEMOCRATIA REALĂ
 
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review? Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
 
Mobile web me2day_seminar
Mobile web me2day_seminarMobile web me2day_seminar
Mobile web me2day_seminar
 
Caietul nr. 10 al Grupului de reflecție privind problemele democrației reale
Caietul nr. 10 al Grupului de reflecție privind problemele democrației reale Caietul nr. 10 al Grupului de reflecție privind problemele democrației reale
Caietul nr. 10 al Grupului de reflecție privind problemele democrației reale
 
Autos
AutosAutos
Autos
 
Voz, audio e imagen
Voz, audio e imagenVoz, audio e imagen
Voz, audio e imagen
 
Dangerous universe
Dangerous universeDangerous universe
Dangerous universe
 
Prefix Group Capabilities Deck
Prefix Group Capabilities DeckPrefix Group Capabilities Deck
Prefix Group Capabilities Deck
 
War
WarWar
War
 
Per de Place Bjørn - Revolutionizing taxonomy through an open-access web-regi...
Per de Place Bjørn - Revolutionizing taxonomy through an open-access web-regi...Per de Place Bjørn - Revolutionizing taxonomy through an open-access web-regi...
Per de Place Bjørn - Revolutionizing taxonomy through an open-access web-regi...
 

Similar a Baggerly presentation from CSE

Ispra 2007 luis martín2
Ispra 2007 luis martín2Ispra 2007 luis martín2
Ispra 2007 luis martín2IrSOLaV Pomares
 
Objective Determination Of Minimum Engine Mapping Requirements For Optimal SI...
Objective Determination Of Minimum Engine Mapping Requirements For Optimal SI...Objective Determination Of Minimum Engine Mapping Requirements For Optimal SI...
Objective Determination Of Minimum Engine Mapping Requirements For Optimal SI...pmaloney1
 
ST.Monteiro-EmbeddedFeatureSelection.pdf
ST.Monteiro-EmbeddedFeatureSelection.pdfST.Monteiro-EmbeddedFeatureSelection.pdf
ST.Monteiro-EmbeddedFeatureSelection.pdfgrssieee
 
Bayesian hypothesis testing
Bayesian hypothesis testingBayesian hypothesis testing
Bayesian hypothesis testingJohn Cook
 
Presenting objective and subjective uncertainty information for spatial syste...
Presenting objective and subjective uncertainty information for spatial syste...Presenting objective and subjective uncertainty information for spatial syste...
Presenting objective and subjective uncertainty information for spatial syste...University of Adelaide
 
Artefatos para gestão de problemas
Artefatos para gestão de problemasArtefatos para gestão de problemas
Artefatos para gestão de problemasFernando Palma
 
Miltenburg M Sc Presentation Tu Delft
Miltenburg   M Sc Presentation Tu DelftMiltenburg   M Sc Presentation Tu Delft
Miltenburg M Sc Presentation Tu Delftimiltenburg
 

Similar a Baggerly presentation from CSE (10)

Ispra 2007 luis martín2
Ispra 2007 luis martín2Ispra 2007 luis martín2
Ispra 2007 luis martín2
 
Objective Determination Of Minimum Engine Mapping Requirements For Optimal SI...
Objective Determination Of Minimum Engine Mapping Requirements For Optimal SI...Objective Determination Of Minimum Engine Mapping Requirements For Optimal SI...
Objective Determination Of Minimum Engine Mapping Requirements For Optimal SI...
 
Sattose 2011
Sattose 2011Sattose 2011
Sattose 2011
 
ST.Monteiro-EmbeddedFeatureSelection.pdf
ST.Monteiro-EmbeddedFeatureSelection.pdfST.Monteiro-EmbeddedFeatureSelection.pdf
ST.Monteiro-EmbeddedFeatureSelection.pdf
 
Bayesian hypothesis testing
Bayesian hypothesis testingBayesian hypothesis testing
Bayesian hypothesis testing
 
Presenting objective and subjective uncertainty information for spatial syste...
Presenting objective and subjective uncertainty information for spatial syste...Presenting objective and subjective uncertainty information for spatial syste...
Presenting objective and subjective uncertainty information for spatial syste...
 
Artefatos para gestão de problemas
Artefatos para gestão de problemasArtefatos para gestão de problemas
Artefatos para gestão de problemas
 
9th ICCS Noordwijkerhout
9th ICCS Noordwijkerhout9th ICCS Noordwijkerhout
9th ICCS Noordwijkerhout
 
Newest Approach to Breast Cancer
Newest Approach to Breast CancerNewest Approach to Breast Cancer
Newest Approach to Breast Cancer
 
Miltenburg M Sc Presentation Tu Delft
Miltenburg   M Sc Presentation Tu DelftMiltenburg   M Sc Presentation Tu Delft
Miltenburg M Sc Presentation Tu Delft
 

Más de Ivan Oransky

The relationship between PubPeer comments and the speed of retraction
The relationship between PubPeer comments and the speed of retractionThe relationship between PubPeer comments and the speed of retraction
The relationship between PubPeer comments and the speed of retractionIvan Oransky
 
During COVID-19, Are Preprints The Problem?
During COVID-19, Are Preprints The Problem?During COVID-19, Are Preprints The Problem?
During COVID-19, Are Preprints The Problem?Ivan Oransky
 
Does Science Self-Correct?
Does Science Self-Correct?Does Science Self-Correct?
Does Science Self-Correct?Ivan Oransky
 
Why We Should Be Talking About Reproducibility -- But Not Forget About Fraud
Why We Should Be Talking About Reproducibility -- But Not Forget About FraudWhy We Should Be Talking About Reproducibility -- But Not Forget About Fraud
Why We Should Be Talking About Reproducibility -- But Not Forget About FraudIvan Oransky
 
WCRI 2019: What makes a story a story?
WCRI 2019: What makes a story a story?WCRI 2019: What makes a story a story?
WCRI 2019: What makes a story a story?Ivan Oransky
 
The Retraction Watch Database
The Retraction Watch DatabaseThe Retraction Watch Database
The Retraction Watch DatabaseIvan Oransky
 
Fake Peer Review: What We've Learned at Retraction Watch
Fake Peer Review: What We've Learned at Retraction WatchFake Peer Review: What We've Learned at Retraction Watch
Fake Peer Review: What We've Learned at Retraction WatchIvan Oransky
 
APRIN talk, Tokyo, March 2018
APRIN talk, Tokyo, March 2018APRIN talk, Tokyo, March 2018
APRIN talk, Tokyo, March 2018Ivan Oransky
 
FNLM Carlisle talk
FNLM Carlisle talkFNLM Carlisle talk
FNLM Carlisle talkIvan Oransky
 
Does Social Psychology Really Have More Retractions?
Does Social Psychology Really Have More Retractions?Does Social Psychology Really Have More Retractions?
Does Social Psychology Really Have More Retractions?Ivan Oransky
 
Scientific Fraud, Retractions, and the Future of Scientific Publishing
Scientific Fraud, Retractions, and the Future of Scientific PublishingScientific Fraud, Retractions, and the Future of Scientific Publishing
Scientific Fraud, Retractions, and the Future of Scientific PublishingIvan Oransky
 
How Often Are Studies Wrong?
How Often Are Studies Wrong?How Often Are Studies Wrong?
How Often Are Studies Wrong?Ivan Oransky
 
How Journalists Can Effectively -- And Safely -- Report on Scientific Fraud
How Journalists Can Effectively -- And Safely -- Report on Scientific FraudHow Journalists Can Effectively -- And Safely -- Report on Scientific Fraud
How Journalists Can Effectively -- And Safely -- Report on Scientific FraudIvan Oransky
 
How to pitch reporters without being annoying
How to pitch reporters without being annoyingHow to pitch reporters without being annoying
How to pitch reporters without being annoyingIvan Oransky
 
Reviewers (And Editors) Behaving Badly
Reviewers (And Editors) Behaving BadlyReviewers (And Editors) Behaving Badly
Reviewers (And Editors) Behaving BadlyIvan Oransky
 
Does Stem Cell Research Have A Retraction Problem?
Does Stem Cell Research Have A Retraction Problem?Does Stem Cell Research Have A Retraction Problem?
Does Stem Cell Research Have A Retraction Problem?Ivan Oransky
 
Coverage of Clinical Medicine: A Diagnosis and Treatment Plan
Coverage of Clinical Medicine: A Diagnosis and Treatment PlanCoverage of Clinical Medicine: A Diagnosis and Treatment Plan
Coverage of Clinical Medicine: A Diagnosis and Treatment PlanIvan Oransky
 
Shoot The Messenger? Challenges in Medical Journalism
Shoot The Messenger? Challenges in Medical JournalismShoot The Messenger? Challenges in Medical Journalism
Shoot The Messenger? Challenges in Medical JournalismIvan Oransky
 
Debunking Made Easy, AHCJ 2014
Debunking Made Easy, AHCJ 2014Debunking Made Easy, AHCJ 2014
Debunking Made Easy, AHCJ 2014Ivan Oransky
 

Más de Ivan Oransky (20)

The relationship between PubPeer comments and the speed of retraction
The relationship between PubPeer comments and the speed of retractionThe relationship between PubPeer comments and the speed of retraction
The relationship between PubPeer comments and the speed of retraction
 
During COVID-19, Are Preprints The Problem?
During COVID-19, Are Preprints The Problem?During COVID-19, Are Preprints The Problem?
During COVID-19, Are Preprints The Problem?
 
Does Science Self-Correct?
Does Science Self-Correct?Does Science Self-Correct?
Does Science Self-Correct?
 
Why We Should Be Talking About Reproducibility -- But Not Forget About Fraud
Why We Should Be Talking About Reproducibility -- But Not Forget About FraudWhy We Should Be Talking About Reproducibility -- But Not Forget About Fraud
Why We Should Be Talking About Reproducibility -- But Not Forget About Fraud
 
WCRI 2019: What makes a story a story?
WCRI 2019: What makes a story a story?WCRI 2019: What makes a story a story?
WCRI 2019: What makes a story a story?
 
The Retraction Watch Database
The Retraction Watch DatabaseThe Retraction Watch Database
The Retraction Watch Database
 
Fake Peer Review: What We've Learned at Retraction Watch
Fake Peer Review: What We've Learned at Retraction WatchFake Peer Review: What We've Learned at Retraction Watch
Fake Peer Review: What We've Learned at Retraction Watch
 
APRIN talk, Tokyo, March 2018
APRIN talk, Tokyo, March 2018APRIN talk, Tokyo, March 2018
APRIN talk, Tokyo, March 2018
 
FNLM Carlisle talk
FNLM Carlisle talkFNLM Carlisle talk
FNLM Carlisle talk
 
Does Social Psychology Really Have More Retractions?
Does Social Psychology Really Have More Retractions?Does Social Psychology Really Have More Retractions?
Does Social Psychology Really Have More Retractions?
 
Scientific Fraud, Retractions, and the Future of Scientific Publishing
Scientific Fraud, Retractions, and the Future of Scientific PublishingScientific Fraud, Retractions, and the Future of Scientific Publishing
Scientific Fraud, Retractions, and the Future of Scientific Publishing
 
How Often Are Studies Wrong?
How Often Are Studies Wrong?How Often Are Studies Wrong?
How Often Are Studies Wrong?
 
How Journalists Can Effectively -- And Safely -- Report on Scientific Fraud
How Journalists Can Effectively -- And Safely -- Report on Scientific FraudHow Journalists Can Effectively -- And Safely -- Report on Scientific Fraud
How Journalists Can Effectively -- And Safely -- Report on Scientific Fraud
 
How to pitch reporters without being annoying
How to pitch reporters without being annoyingHow to pitch reporters without being annoying
How to pitch reporters without being annoying
 
Reviewers (And Editors) Behaving Badly
Reviewers (And Editors) Behaving BadlyReviewers (And Editors) Behaving Badly
Reviewers (And Editors) Behaving Badly
 
Does Stem Cell Research Have A Retraction Problem?
Does Stem Cell Research Have A Retraction Problem?Does Stem Cell Research Have A Retraction Problem?
Does Stem Cell Research Have A Retraction Problem?
 
Rutgers
RutgersRutgers
Rutgers
 
Coverage of Clinical Medicine: A Diagnosis and Treatment Plan
Coverage of Clinical Medicine: A Diagnosis and Treatment PlanCoverage of Clinical Medicine: A Diagnosis and Treatment Plan
Coverage of Clinical Medicine: A Diagnosis and Treatment Plan
 
Shoot The Messenger? Challenges in Medical Journalism
Shoot The Messenger? Challenges in Medical JournalismShoot The Messenger? Challenges in Medical Journalism
Shoot The Messenger? Challenges in Medical Journalism
 
Debunking Made Easy, AHCJ 2014
Debunking Made Easy, AHCJ 2014Debunking Made Easy, AHCJ 2014
Debunking Made Easy, AHCJ 2014
 

Último

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 

Último (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

Baggerly presentation from CSE

  • 1. The Importance of Reproducibility in High-Throughput Studies: Case Studies in Forensic Bioinformatics Keith A. Baggerly Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org CSE, May 3, 2011
  • 2. G ENOMIC S IGNATURES 1 Why is Reproducibility Important in H-T-S? Our intuition about what “makes sense” is very poor in high dimensions. To use “genomic signatures” as biomarkers, we need to know they’ve been assembled correctly. Without documentation, we may need to employ forensic bioinformatics to infer what was done to obtain the results. Let’s examine some case studies involving an important clinical problem: can we predict how a given patient will respond to available chemotherapeutics? c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 3. G ENOMIC S IGNATURES 2 Using the NCI60 to Predict Sensitivity Potti et al (2006), Nature Medicine, 12:1294-1300. The main conclusion is that we can use microarray data from cell lines (the NCI60) to define drug response “signatures”, which can be used to predict whether patients will respond. They provide examples using 7 commonly used agents. This got people at MDA very excited. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 4. G ENOMIC S IGNATURES 3 Fit Training Data We want the test data to split like this... c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 5. G ENOMIC S IGNATURES 4 Fit Testing Data But it doesn’t. Did we do something wrong? c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 6. G ENOMIC S IGNATURES 5 5-FU Heatmaps Nat Med Paper
  • 7. G ENOMIC S IGNATURES 5 5-FU Heatmaps Nat Med Paper Our t-tests
  • 8. G ENOMIC S IGNATURES 5 5-FU Heatmaps Nat Med Paper Our t-tests Reported Genes c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 9. G ENOMIC S IGNATURES 6 Their List and Ours > temp <- cbind( sort(rownames(pottiUpdated)[fuRows]), sort(rownames(pottiUpdated)[ fuTQNorm@p.values <= fuCut]); > colnames(temp) <- c("Theirs", "Ours"); > temp Theirs Ours ... [3,] "1881_at" "1882_g_at" [4,] "31321_at" "31322_at" [5,] "31725_s_at" "31726_at" [6,] "32307_r_at" "32308_r_at" ... c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 10. G ENOMIC S IGNATURES 7 Offset P-Values: Other Drugs Topotecan+1 Etoposide+1 Adriamycin+1 1.0 1.0 1.0 0.8 0.8 0.8 q 0.6 0.6 0.6 P Value P Value P Value q 0.4 q 0.4 0.4 0.2 0.2 0.2 q 0.0 0.0 0.0 q qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq 0 50 100 150 0 10 20 30 40 50 0 20 40 60 80 Index Index Index Paclitaxel+1 Docetaxel+1 Cytoxan+1 1.0 1.0 1.0 q qq q q q q q q q q 0.8 0.8 0.8 q q q q q 0.6 0.6 0.6 P Value P Value q P Value q qq q qq q q q 0.4 0.4 0.4 q q qq q q qqqq q q 0.2 0.2 0.2 qq q q qq q q q q q qq q q 0.0 0.0 0.0 qqqqqqqqqqqqqq qqqqqqqqqqqqqq qqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqq qqq qqq 0 5 15 25 35 0 10 20 30 40 50 0 5 15 25 35 Index Index Index c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 11. G ENOMIC S IGNATURES 8 Using Their Software Their software requires two input files: 1. a quantification matrix, genes by samples, with a header giving classifications (0 = Resistant, 1 = Sensitive, 2 = Test) 2. a list of probeset ids in the same order as the quantification matrix. This list must not have a header row. What do we get? c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 12. G ENOMIC S IGNATURES 9 Heatmaps Match Exactly for Most Drugs! From the paper: From the software:
  • 13. G ENOMIC S IGNATURES 9 Heatmaps Match Exactly for Most Drugs! From the paper: From the software: We match heatmaps but not gene lists? We’ll come back to this, because their software also gives predictions. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 14. G ENOMIC S IGNATURES 10 Predicting Docetaxel (Chang 03) c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 15. G ENOMIC S IGNATURES 11 Predicting Adriamycin (Holleman 04) c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 16. G ENOMIC S IGNATURES 12 There Were Other Genes... The 50-gene list for docetaxel has 19 “outliers”. The initial paper on the test data (Chang et al) gave a list of 92 genes that separated responders from nonresponders. Entries 7-20 in Chang et al’s list comprise 14/19 outliers. The others: ERCC1, ERCC4, ERBB2, BCL2L11, TUBA3. These are the genes named to explain the biology. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 17. G ENOMIC S IGNATURES 13 A Repro Theme: Don’t Take My Word For It! Read the paper! Coombes, Wang & Baggerly, Nat Med, Nov 6, 2007, 13:1276-7, author reply 1277-8. Try it yourselves! All of the raw data, documentation*, and code* is available from our web site (*and from Nat Med): http://bioinformatics.mdanderson.org/ Supplements/ReproRsch-Chemo. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 18. G ENOMIC S IGNATURES 14 Potti/Nevins Reply (Nat Med 13:1277-8) Labels for Adria are correct – details on their web page. They’ve gotten the approach to work again. (Twice!) c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 19. G ENOMIC S IGNATURES 15 Adriamycin 0.9999+ Correlations (Reply)
  • 20. G ENOMIC S IGNATURES 15 Adriamycin 0.9999+ Correlations (Reply) Redone Aug 08, “using ... 95 unique samples” (also wrong) c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 21. G ENOMIC S IGNATURES 16 Validation 1: Hsu et al J Clin Oncol, Oct 1, 2007, 25:4350-7. Same approach, using Cisplatin and Pemetrexed. For cisplatin, U133A arrays were used for training. ERCC1, ERCC4 and DNA repair genes are identified as “important”. With some work, we matched the heatmaps. (Gene lists?) c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 22. G ENOMIC S IGNATURES 17 The 4 We Can’t Match (Reply) 203719 at, ERCC1, 210158 at, ERCC4, 228131 at, ERCC1, and 231971 at, FANCM (DNA Repair). The last two probesets are special.
  • 23. G ENOMIC S IGNATURES 17 The 4 We Can’t Match (Reply) 203719 at, ERCC1, 210158 at, ERCC4, 228131 at, ERCC1, and 231971 at, FANCM (DNA Repair). The last two probesets are special. These probesets aren’t on the U133A arrays that were used. They’re on the U133B. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 24. G ENOMIC S IGNATURES 18 Validation 2: Bonnefoi et al Lancet Oncology, Dec 2007, 8:1071-8. (early access Nov 14) Similar approach, using signatures for Fluorouracil, Epirubicin Cyclophosphamide, and Taxotere to predict response to combination therapies: FEC and TET. Potentially improves ER- response from 44% to 70%. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 25. G ENOMIC S IGNATURES 19 We Might Expect Some Differences... High Sample Correlations Array Run Dates after Centering by Gene c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 26. G ENOMIC S IGNATURES 20 How Are Results Combined? Potti et al predict response to TFAC, Bonnefoi et al to TET and FEC. Let P() indicate prob sensitive. The rules used are as follows.
  • 27. G ENOMIC S IGNATURES 20 How Are Results Combined? Potti et al predict response to TFAC, Bonnefoi et al to TET and FEC. Let P() indicate prob sensitive. The rules used are as follows. P (T F AC) = P (T )+P (F )+P (A)+P (C)−P (T )P (F )P (A)P (C).
  • 28. G ENOMIC S IGNATURES 20 How Are Results Combined? Potti et al predict response to TFAC, Bonnefoi et al to TET and FEC. Let P() indicate prob sensitive. The rules used are as follows. P (T F AC) = P (T )+P (F )+P (A)+P (C)−P (T )P (F )P (A)P (C). P (ET ) = max[P (E), P (T )].
  • 29. G ENOMIC S IGNATURES 20 How Are Results Combined? Potti et al predict response to TFAC, Bonnefoi et al to TET and FEC. Let P() indicate prob sensitive. The rules used are as follows. P (T F AC) = P (T )+P (F )+P (A)+P (C)−P (T )P (F )P (A)P (C). P (ET ) = max[P (E), P (T )]. 5 1 P (F EC) = [P (F ) + P (E) + P (C)] − . 8 4
  • 30. G ENOMIC S IGNATURES 20 How Are Results Combined? Potti et al predict response to TFAC, Bonnefoi et al to TET and FEC. Let P() indicate prob sensitive. The rules used are as follows. P (T F AC) = P (T )+P (F )+P (A)+P (C)−P (T )P (F )P (A)P (C). P (ET ) = max[P (E), P (T )]. 5 1 P (F EC) = [P (F ) + P (E) + P (C)] − . 8 4 Each rule is different. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 31. G ENOMIC S IGNATURES 21 Predictions for Individual Drugs? (Reply) Does cytoxan make sense? c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 32. G ENOMIC S IGNATURES 22 Temozolomide Heatmaps Augustine et al., 2009, Clin Can Res, 15:502-10, Fig 4A. Temozolomide, NCI-60.
  • 33. G ENOMIC S IGNATURES 22 Temozolomide Heatmaps Augustine et al., 2009, Clin Hsu et al., 2007, J Clin Can Res, 15:502-10, Fig 4A. Oncol, 25:4350-7, Fig 1A. Temozolomide, NCI-60. Cisplatin, Gyorffy cell lines. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 34. G ENOMIC S IGNATURES 23 Some Timeline Here... Nat Med Nov 06*, Nov 07*, Aug 08. JCO Feb 07*, Oct 07*. Lancet Oncology Dec 07*. PLoS One Apr 08. CCR Jan 09*. (* errors reported)
  • 35. G ENOMIC S IGNATURES 23 Some Timeline Here... Nat Med Nov 06*, Nov 07*, Aug 08. JCO Feb 07*, Oct 07*. Lancet Oncology Dec 07*. PLoS One Apr 08. CCR Jan 09*. (* errors reported) May/June 2009: we learn clinical trials had begun. 2007: pemetrexed vs cisplatin, pem vs vinorelbine. 2008: docetaxel vs doxorubicin, topotecan vs dox (Moffitt).
  • 36. G ENOMIC S IGNATURES 23 Some Timeline Here... Nat Med Nov 06*, Nov 07*, Aug 08. JCO Feb 07*, Oct 07*. Lancet Oncology Dec 07*. PLoS One Apr 08. CCR Jan 09*. (* errors reported) May/June 2009: we learn clinical trials had begun. 2007: pemetrexed vs cisplatin, pem vs vinorelbine. 2008: docetaxel vs doxorubicin, topotecan vs dox (Moffitt). Sep 1. Paper submitted to Annals of Applied Statistics. Sep 14. Paper online at Annals of Applied Statistics. Sep-Oct: Story covered by The Cancer Letter, Duke starts internal investigation, suspends trials. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 37. G ENOMIC S IGNATURES 24 Jan 29, 2010 Their investigation’s results “strengthen ... confidence in this evolving approach to personalized cancer treatment.” c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 38. G ENOMIC S IGNATURES 25 Why We’re Unhappy... “While the reviewers approved of our sharing the report with the NCI, we consider it a confidential document” (Duke). A future paper will explain the methods.
  • 39. G ENOMIC S IGNATURES 25 Why We’re Unhappy... “While the reviewers approved of our sharing the report with the NCI, we consider it a confidential document” (Duke). A future paper will explain the methods. There was also a major new development that the restart announcement didn’t mention. In mid-Nov (mid-investigation), the Duke team posted new data for cisplatin and pemetrexed (in trials since ’07). These included quantifications for 59 ovarian cancer test samples (from GSE3149) used for predictor validation. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 40. G ENOMIC S IGNATURES 26 We Tried Matching The Samples 43 samples are mislabeled; 16 don’t match at all. The first 16 don’t match because the genes are mislabeled. We reported this to Duke and to the NCI in mid-November. All data was stripped from the websites within the week. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 41. G ENOMIC S IGNATURES 27 More Timeline “While the reviewers approved of our sharing the report with the NCI...”
  • 42. G ENOMIC S IGNATURES 27 More Timeline “While the reviewers approved of our sharing the report with the NCI...” April, 2010. Review report sought from the NCI under the Freedom of Information Act (FOIA). May, 2010. Redacted report supplied; gaps noted.
  • 43. G ENOMIC S IGNATURES 27 More Timeline “While the reviewers approved of our sharing the report with the NCI...” April, 2010. Review report sought from the NCI under the Freedom of Information Act (FOIA). May, 2010. Redacted report supplied; gaps noted. May, 2010. NCI and CALGB pull lung metagene signature from an ongoing phase III trial. Duke trials continue. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 44. G ENOMIC S IGNATURES 28 July 16, 2010 c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 45. G ENOMIC S IGNATURES 29 Subsequent Events July 19/20: letter to Varmus; Duke resuspends trials July 30: Varmus & Duke request IOM Involvement Oct 22/29: call to retract JCO paper Nov 9: Duke announces trials terminated Nov 19: call to retract Nat Med paper, Potti resigns There have been at least three developments since. These involve the NCI, the FDA, and Duke. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 46. G ENOMIC S IGNATURES 30 Dec 20, 2010: The IOM Meets, the NCI Speaks At the first meeting of the IOM review panel, Lisa McShane outlined some NCI interactions with the Duke group. An MP3 is available from the Cancer Letter. Our questions prompted the NCI to ask questions of its own. The NCI released about 550 pages of documents giving the details. We posted our annotation of these documents and an overall timeline on Jan 14th. We’re going to look at one case they examined. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 47. G ENOMIC S IGNATURES 31 The Lung Metagene Score (LMS) One of “the most signficant advances on the front lines of cancer.” – Ozols et al., JCO, 25:146-62, 2007, ASCO survey of 2006. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 48. G ENOMIC S IGNATURES 32 The Background of CALGB 30506 When this trial was proposed to the NCI, they had questions about the amount of blinding, so they asked for one more test.
  • 49. G ENOMIC S IGNATURES 32 The Background of CALGB 30506 When this trial was proposed to the NCI, they had questions about the amount of blinding, so they asked for one more test. The LMS failed. It almost achieved significance going the wrong way.
  • 50. G ENOMIC S IGNATURES 32 The Background of CALGB 30506 When this trial was proposed to the NCI, they had questions about the amount of blinding, so they asked for one more test. The LMS failed. It almost achieved significance going the wrong way. After post-hoc adjustments (adjusting for batches), the Duke group attained some success (with IB, not IA). At CALGB’s urging, this went forward, but the NCI only allowed the LMS to be used for stratification, not allocation. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 51. G ENOMIC S IGNATURES 33 After our Paper The NCI asked to see the data and code associated with the post-hoc adjustment. Using the code and data provided, the NCI couldn’t reproduce the success of the post-hoc adjustment. More, they noticed another problem. They ran the algorithm and got predictions.
  • 52. G ENOMIC S IGNATURES 33 After our Paper The NCI asked to see the data and code associated with the post-hoc adjustment. Using the code and data provided, the NCI couldn’t reproduce the success of the post-hoc adjustment. More, they noticed another problem. They ran the algorithm and got predictions. They ran it again.
  • 53. G ENOMIC S IGNATURES 33 After our Paper The NCI asked to see the data and code associated with the post-hoc adjustment. Using the code and data provided, the NCI couldn’t reproduce the success of the post-hoc adjustment. More, they noticed another problem. They ran the algorithm and got predictions. They ran it again. The predictions changed. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 54. G ENOMIC S IGNATURES 34 The Changes Weren’t Subtle Some scores changed from 5% to 95%. From one run to the next, high/low classifications changed about 25% of the time.
  • 55. G ENOMIC S IGNATURES 34 The Changes Weren’t Subtle Some scores changed from 5% to 95%. From one run to the next, high/low classifications changed about 25% of the time. The NCI was unhappy. They had understood the rule to be “locked down”, so that predictions wouldn’t change.
  • 56. G ENOMIC S IGNATURES 34 The Changes Weren’t Subtle Some scores changed from 5% to 95%. From one run to the next, high/low classifications changed about 25% of the time. The NCI was unhappy. They had understood the rule to be “locked down”, so that predictions wouldn’t change. The NCI publicly yanked the LMS from CALGB 30506 in May 2010.
  • 57. G ENOMIC S IGNATURES 34 The Changes Weren’t Subtle Some scores changed from 5% to 95%. From one run to the next, high/low classifications changed about 25% of the time. The NCI was unhappy. They had understood the rule to be “locked down”, so that predictions wouldn’t change. The NCI publicly yanked the LMS from CALGB 30506 in May 2010. The NEJM paper was retracted March 2, 2011. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 58. G ENOMIC S IGNATURES 35 The FDA Visits On Jan 28, 2011, the Cancer Letter reported that an FDA Audit team was visiting Duke to examine how the trials had been run. A key question was whether Investigational Device Exemptions (IDEs) were obtained before the signatures were used to guide therapy. They weren’t.
  • 59. G ENOMIC S IGNATURES 35 The FDA Visits On Jan 28, 2011, the Cancer Letter reported that an FDA Audit team was visiting Duke to examine how the trials had been run. A key question was whether Investigational Device Exemptions (IDEs) were obtained before the signatures were used to guide therapy. They weren’t. Some aspects of this issue were anticipated by the IOM committee (Baggerly and Coombes, Clinical Chemistry, 57(5):688-90). Per the FDA, genomic signatures are medical devices. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 60. G ENOMIC S IGNATURES 36 Duke’s Initial Investigation One thing that had puzzled us since the trial restarts was how the external reviewers missed the new errors we reported.
  • 61. G ENOMIC S IGNATURES 36 Duke’s Initial Investigation One thing that had puzzled us since the trial restarts was how the external reviewers missed the new errors we reported. Jan 11, 2011: Nature talks to Duke. The Duke deans overseeing the investigation, in consultation with the acting head of Duke’s IRB, decided not to forward our report to the reviewers. This was done to avoid “biasing the review”. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 62. G ENOMIC S IGNATURES 37 The IOM Meets Again: Mar 30-31 Science: Ned Calogne, Rich Simon, Chuck Perou, Sumitha Mandrekar Institutional Responsibilty: Peter Pronovost (Hopkins) Case Studies: Laura Van’t Veer, Steve Shak, Joe Nevins Responsible Parties: journal editors (Cathy DeAngelis, JAMA, Veronique Kiermer, Nature, Katrina Kelner, Science), authors and PIs (Stott Zeger), institutions (Albert Reece, U Md, Harold Paz, Penn State, Scott Zeger, Hopkins) Forensics: Keith Baggerly http://bioinformatics.mdanderson.org/ Supplements/ReproRsch-All/Modified/IOM/ c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 63. G ENOMIC S IGNATURES 38 Some Cautions/Observations We’ve seen problems like these before. The most common mistakes are simple. Confounding in the Experimental Design Mixing up the sample labels Mixing up the gene labels Mixing up the group labels (Most mixups involve simple switches or offsets) This simplicity is often hidden. Incomplete documentation Unfortunately, we suspect The most simple mistakes are common. c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 64. G ENOMIC S IGNATURES 39 How Should We Pursue Reproducibility? AAAS, Feb 19: All slides and an MP3 of the audio are available from http://www.stanford.edu/˜vcs/AAAS2011/ ENAR, Mar 23: Panel on ethics in biostatistics. Handouts are available from our Google Group. What’s Missing? Nature, Mar 23 AACR, Apr 5: Difficulties in moving biomarkers to the clinic. CSE, May 3 NCI, Jun 23-4 What should the NCI be looking for in grants that it funds? Check back... c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 65. G ENOMIC S IGNATURES 40 What Should the Norm Be? For our group? Since 2007, we have prepared reports in Sweave. For papers? (Baggerly, Nature, Sep 22, 2010) Things we look for: 1. Data (often mentioned, given MIAME) 2. Provenance 3. Code 4. Descriptions of Nonscriptable Steps 5. Descriptions of Planned Design, if Used. For clinical trials? c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 66. G ENOMIC S IGNATURES 41 Some Acknowledgements Kevin Coombes Shannon Neeley, Jing Wang David Ransohoff, Gordon Mills Jane Fridlyand, Lajos Pusztai, Zoltan Szallasi MDACC Ovarian SPORE, Lung SPORE, Breast SPORE Now in the Annals of Applied Statistics! Baggerly and Coombes (2009), 3(4):1309-34. http://bioinformatics.mdanderson.org/ Supplements/ReproRsch-All c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes
  • 67. G ENOMIC S IGNATURES 42 Index Title Cell Line Story Trying it Ourselves Matching Features Using Software/Making Predictions Outliers The Reply Adriamycin Followup Hsu et al (Cisplatin) Timeline, Trials, Cancer Letter Trial Restart and Objections Final Lessons c Copyright 2007-2011, Keith A. Baggerly and Kevin R. Coombes