De Finetti meets Popper
or Should Bayesians care about falsificationism?
Stephen Senn, Edinburgh
(C) Stephen Senn 2019
Lecture at the Popper Symposium on 7 August 2019 at the
16th International Congress on Logic, Methodology & Philosophy of Science, Prague
Basic thesis
The distinction between refuting and ‘corroborating’ a hypothesis is fundamental.
It does not become irrelevant by adopting a Bayesian approach to inference.
It has no direct bearing on the choice of meaning for probability: subjective, relative frequency, propensity, logical, etc.
Various practical problems in analysing clinical trials illustrate this.
Outline
Basic background
• De Finetti’s falsificationism
• Simple illustration
• Jeffreys’s alternative approach
• Inspired by Broad’s challenge
Falsificationist issues in clinical trials
• Bioequivalence
• Equivalence and falsificationism
• Blinding
• Competence
• Causal analysis versus prediction in clinical trials
Conclusions
A puzzle to keep you thinking
Suppose we are to have 1 million independent trials with a binary outcome.
We wish to decide, in advance of beginning the trials, which of the following is more likely:
A: 1 million successes and no failures
B: 500,000 successes and 500,000 failures in any order
We use a Bayesian approach with a uniform prior for the probability of success (such as would have been employed by Laplace).
What is the correct answer?
Basic background
Very elementary – please accept my apologies
“The acquisition of a further piece of information, H - in other words
experience, since experience is nothing more than the acquisition of
further information - acts always and only in the way we have just
described: suppressing the alternatives that turn out to be no longer
possible..”
Popper?
No, de Finetti
Example
• A man has a CD of popular music with 12 tracks on it
• He can play tracks in random order (Shuffle) or in sequential order (Play)
• On a particular occasion he thinks he has pressed Shuffle (that was his intention) but the first track played is the first track, F, on the CD
• What is the probability that he did, in fact, press Shuffle as intended?
We can put this together as follows (X denotes any track other than F):
“Hypothesis”   Prior probability P   Evidence   Likelihood L   P x L
Shuffle        9/10                  F          1/12           9/120
Shuffle        9/10                  X          11/12          99/120
Play           1/10                  F          1              12/120
Play           1/10                  X          0              0
TOTAL                                                          120/120 = 1
Note that in de Finetti’s theory the relevant historical process is that of the individual’s thought process, not “real world” events.
After seeing (hearing) the evidence, however, only two rows remain
“Hypothesis”   Prior probability P   Evidence   Likelihood L   P x L
Shuffle        9/10                  F          1/12           9/120
Shuffle        9/10                  X          11/12          99/120   (ruled out)
Play           1/10                  F          1              12/120
Play           1/10                  X          0              0        (ruled out)
TOTAL (remaining rows)                                         21/120
So we rescale by dividing by the total probability
“Hypothesis”   Prior probability P   Evidence   Likelihood L   P x L    Posterior probability
Shuffle        9/10                  F          1/12           9/120    (9/120)/(21/120) = 9/21
Shuffle        9/10                  X          11/12          99/120
Play           1/10                  F          1              12/120   (12/120)/(21/120) = 12/21
Play           1/10                  X          0              0
TOTAL                                                          21/120   21/21 = 1
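The table arithmetic can be checked mechanically. The following is a minimal Python sketch (not part of the original talk) that stores the P x L column as exact fractions, suppresses the rows incompatible with the evidence F, and rescales by the surviving total; the function name `posterior` and the row labels are illustrative choices.

```python
from fractions import Fraction

# Joint probability P x L for each (hypothesis, evidence) row of the table.
# "F" = the first track on the CD is played first; "X" = any other track is played first.
joint = {
    ("Shuffle", "F"): Fraction(9, 10) * Fraction(1, 12),
    ("Shuffle", "X"): Fraction(9, 10) * Fraction(11, 12),
    ("Play", "F"): Fraction(1, 10) * 1,
    ("Play", "X"): Fraction(1, 10) * 0,
}


def posterior(joint, observed):
    """Suppress rows incompatible with the observed evidence, then renormalise."""
    surviving = {h: p for (h, e), p in joint.items() if e == observed}
    total = sum(surviving.values())  # 21/120 when the evidence is F
    return {h: p / total for h, p in surviving.items()}


post = posterior(joint, "F")
print(post)  # {'Shuffle': Fraction(3, 7), 'Play': Fraction(4, 7)}
```

The values 3/7 and 4/7 are the table's 9/21 and 12/21 in lowest terms: hearing track F first makes it somewhat more likely than not that Play was pressed.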
Returning to De Finetti’s general approach
• Suppose we declare all possible sequences of some binary outcome (say S=
success and F = failure) equally likely
• Then no learning is possible
• This is because, whatever sequence of S and F outcomes has been observed so far, every possible continuation of it remains equally likely
• Thus, eliminating the sequences that have not occurred and renormalising changes nothing
• Caution is required!
• This is one reason why De Finetti was sceptical about any automatic approaches to Bayesian inference
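A small enumeration makes the point concrete. This Python sketch (an illustration, not de Finetti's own formulation) puts a prior on all 2^n sequences of S and F for a hypothetical n = 5, conditions on a run of four successes by discarding incompatible sequences, and, for contrast, repeats the calculation with Laplace's uniform prior on the success probability.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def prob_next_success(prior, observed):
    """P(next outcome is S | observed prefix), for a prior over complete S/F sequences."""
    m = len(observed)
    # Suppress every sequence that disagrees with what has been seen so far ...
    compatible = {seq: w for seq, w in prior.items() if seq[:m] == observed}
    # ... and renormalise over the survivors.
    total = sum(compatible.values())
    return sum(w for seq, w in compatible.items() if seq[m] == "S") / total


n = 5
sequences = list(product("SF", repeat=n))

# (a) Every complete sequence equally likely.
uniform_sequences = {s: Fraction(1, len(sequences)) for s in sequences}

# (b) Laplace: uniform prior on the success probability p, so a sequence with
#     k successes has probability k!(n-k)!/(n+1)!.
laplace = {
    s: Fraction(factorial(s.count("S")) * factorial(n - s.count("S")), factorial(n + 1))
    for s in sequences
}

seen = ("S", "S", "S", "S")
print(prob_next_success(uniform_sequences, seen))  # 1/2
print(prob_next_success(laplace, seen))            # 5/6
```

Under the first prior the predictive probability stays at 1/2 whatever has been observed; under Laplace's prior it moves to (k + 1)/(m + 2) after k successes in m trials, here 5/6.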
What Jeffreys Understood
[Figure: excerpts from Jeffreys, Theory of Probability, 3rd edition, p. 128, and from C. D. Broad (1918), pp. 393 to 394]
As m goes to infinity, the first approaches 1
If n is much greater than m, the latter is small
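Assuming, as in the standard statement of Broad's argument, a uniform (Laplacian) prior on the success probability and m successes in m trials, "the first" can presumably be read as the probability that the next trial is a success, (m + 1)/(m + 2), and "the latter" as the probability that the next n trials are all successes, (m + 1)/(m + n + 1). The minimal Python sketch below evaluates both; the values of m and n are illustrative, not taken from the slides.

```python
from fractions import Fraction


def next_success(m):
    """Laplace's rule: P(next trial succeeds | m successes in m trials) = (m + 1)/(m + 2)."""
    return Fraction(m + 1, m + 2)


def next_n_all_succeed(m, n):
    """P(next n trials all succeed | m successes in m trials), uniform prior on p.

    Ratio of integrals: int p^(m+n) dp / int p^m dp = (m + 1)/(m + n + 1).
    """
    return Fraction(m + 1, m + n + 1)


m = 1000  # hypothetical number of sunrises observed so far
for n in (1, 1000, 1_000_000):
    # n = 1 reproduces next_success(m); large n makes the "law" improbable
    print(n, float(next_n_all_succeed(m, n)))
```

Holding m fixed and letting n grow without bound, the probability that the "law" holds for all future trials tends to zero, which is why a "near-certainty that the sun will always rise" does not follow under this prior.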
The Economist gets it wrong
The canonical example is to imagine that a precocious newborn observes
his first sunset, and wonders whether the sun will rise again or not. He
assigns equal prior probabilities to both possible outcomes, and
represents this by placing one white and one black marble into a bag. The
following day, when the sun rises, the child places another white marble
in the bag. The probability that a marble plucked randomly from the bag
will be white (ie, the child’s degree of belief in future sunrises) has thus
gone from a half to two-thirds. After sunrise the next day, the child adds
another white marble, and the probability (and thus the degree of belief)
goes from two-thirds to three-quarters. And so on. Gradually, the initial
belief that the sun is just as likely as not to rise each morning is modified
to become a near-certainty that the sun will always rise.
The Economist, ‘In praise of Bayes’, September 2000
Jeffreys’s solution
• Jeffreys took the fact that ‘laws’ cannot be proved using Bayes’s theorem, if the Laplacian approach to choosing prior distributions is adopted, to mean that this choice of prior distribution is wrong
• His solution is to place a lump of prior probability on the hypothesis (the law) being true
• This gives simpler representations of the world more prior weight than more complex ones
• In his view this is necessary to permit induction to work
• Prior probability replaces (or reflects) parsimony as a principle
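Jeffreys's device can be sketched numerically. Assume, for illustration only, a prior that puts probability `pi_law` on the law p = 1 and spreads the remaining probability uniformly over p. After m successes in m trials the law has likelihood 1 and the alternative has likelihood 1/(m + 1), so the posterior probability of the law is pi_law / (pi_law + (1 - pi_law)/(m + 1)). The value `pi_law = 1/2` below is a hypothetical choice, not one from the slides.

```python
from fractions import Fraction


def posterior_law(pi_law, m):
    """P(law 'p = 1' | m successes in m trials) under a point mass pi_law on p = 1
    mixed with a uniform prior (weight 1 - pi_law) on p."""
    like_law = 1                           # the law predicts every success with certainty
    like_alternative = Fraction(1, m + 1)  # integral of p**m over [0, 1]
    num = pi_law * like_law
    return num / (num + (1 - pi_law) * like_alternative)


half = Fraction(1, 2)  # hypothetical prior weight on the law
for m in (0, 1, 10, 1000):
    print(m, posterior_law(half, m))
```

Unlike the purely Laplacian calculation, the probability of the law itself now tends to 1 as m grows; with `pi_law = 1/2` it equals (m + 1)/(m + 2).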
Falsificationist issues in clinical trials
Rather more technical – again please accept my apologies
Equivalence studies
(including bioequivalence)
• Studies in which one tries to prove that treatments do not differ
• The most extreme examples are so-called bioequivalence studies
• The molecule is the same but the formulation differs
• The same manufacturer may wish to replace one route of administration by
another
• For example a suppository by a pill
• Or a single-dose inhaler with a multi-dose one
• Or a different, so-called generic, manufacturer may wish to supply the market with its version of a now off-patent brand-name product
• Or a manufacturer may wish for labelling reasons to prove that a drug does
not differ whether given with or without food
But surely, a drug is a drug?
• In fact, no, changing the formulation can have dramatic effects on
potency of a drug
• Here is an example I was involved with
• Bronchodilator in asthma
• Seven treatments compared over twelve hours using forced expiratory
volume in one second (FEV1)
• Placebo
• 6,12 and 24 g of new formulation (MTA)
• 6,12 and 24 g of old formulation (ISF)
• Other details omitted for the sake of brevity
• The results follow (high values of FEV1 are good)
Senn, S.J., et al., An incomplete blocks cross-over in asthma: a case study in collaboration, in Cross-over Clinical Trials, J. Vollmar and L.A. Hothorn, Editors. 1997, Fischer: Stuttgart. p. 3-26.
[Figure: FEV1 (L) against time (minutes, 0 to 720); treatments Placebo, MT&A 6, MT&A 12, MT&A 24]
Placebo and the 3 doses of the new formulation
[Figure: FEV1 (L) against time (minutes, 0 to 720); treatments Placebo, MT&A 6, MT&A 12, MT&A 24, ISF 6, ISF 12, ISF 24]
With the 3 doses of the reference formulation added
Bioequivalence in terms of confidence intervals
What is considered ‘proven’
A: neither equivalence nor difference proven
B: exact equivalence rejected
C: inconclusive
D & E: practical equivalence proven
F: practical equivalence proven but exact equivalence rejected
G: exact and practical equivalence rejected
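The classification above can be expressed as three yes/no questions about where the confidence interval lies. This is a minimal Python sketch; it assumes the interval is for a ratio of formulation means (test/reference), that "exact equivalence" means a ratio of exactly 1, and that the practical equivalence limits are the commonly used 0.80 to 1.25. These limits and the names `classify` and `Verdict` are assumptions for illustration, not details from the slides, and the sketch does not attempt to reproduce the letters A to G.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    exact_equivalence_rejected: bool      # interval excludes a ratio of exactly 1
    practical_equivalence_proven: bool    # interval lies wholly inside the limits
    practical_equivalence_rejected: bool  # interval lies wholly outside the limits


def classify(ci_low, ci_high, lower_limit=0.80, upper_limit=1.25):
    """Classify a confidence interval (ci_low, ci_high) for a test/reference ratio."""
    if not ci_low <= ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return Verdict(
        exact_equivalence_rejected=ci_low > 1 or ci_high < 1,
        practical_equivalence_proven=lower_limit < ci_low and ci_high < upper_limit,
        practical_equivalence_rejected=ci_high < lower_limit or ci_low > upper_limit,
    )


print(classify(0.95, 1.10))  # practical equivalence proven, exact equivalence not rejected
print(classify(1.05, 1.20))  # practical equivalence proven but exact equivalence rejected
print(classify(1.30, 1.60))  # exact and practical equivalence rejected
print(classify(0.70, 1.40))  # nothing proven either way
```

Only the second question, whether the interval lies wholly inside the limits, can establish equivalence; the first is the conventional test of difference.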
First issue: Blinding and Equivalence
• Running a double-blind trial does not protect you against a false conclusion of equivalence
• You do not need to know the treatment code to bias results towards equivalence
• Consider a particularly simple (and very common) form of trial in which two oral formulations of a molecule are compared by looking at the concentration-time profile in a cross-over trial
• Equivalence of these profiles is taken to mean equivalence of the formulations
• “The blood is a gate through which the drug must pass”
The Unscrupulous Pharmacokineticist
• Take the 12 test tubes for day one for a given
volunteer
• hour 1,2…12
• Take the 12 test tubes for day two for the same
volunteer
• hour 1,2…12
• Mix each pair (by hour) together
• Divide them into two
• Et voilà!
• Perfect equivalence without having to unblind
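The trick is easy to simulate. In this minimal Python sketch the two concentration-time profiles are invented (a hypothetical exponential decay with different peak heights, sampled at hours 1 to 12), the tubes are assumed to be mixed in equal volumes, and the area under the curve (AUC) is computed by the trapezoidal rule. Pooling each hour's pair of samples and splitting it in two gives both "days" the same profile, so the AUC ratio is exactly 1 whatever the formulations really did, and no treatment code is needed.

```python
import math


def auc(times, conc):
    """Area under a concentration-time curve by the trapezoidal rule."""
    return sum((t2 - t1) * (c1 + c2) / 2
               for t1, t2, c1, c2 in zip(times, times[1:], conc, conc[1:]))


hours = list(range(1, 13))
# Hypothetical profiles for one volunteer: day two has 50% more exposure than day one.
day_one = [10.0 * math.exp(-0.3 * t) for t in hours]
day_two = [15.0 * math.exp(-0.3 * t) for t in hours]

# Mix each pair of tubes (same hour) in equal volumes, then divide into two.
# Each half has the average concentration, so the two "days" become identical.
mixed = [(a + b) / 2 for a, b in zip(day_one, day_two)]
tube_a, tube_b = list(mixed), list(mixed)

print(auc(hours, day_two) / auc(hours, day_one))  # honest ratio, about 1.5
print(auc(hours, tube_b) / auc(hours, tube_a))    # 1.0: "perfect equivalence"
```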
Fanciful?
• In fact blinding does not protect against false conclusions of
equivalence
• Pharmaceutical companies commonly prosecute cheating doctors
• Reason
• Trial fails to show any effect whereas others do
• Explanation
• The trial never took place
• The data have been invented
• This will produce a conclusion of equivalence
Second issue: Competence
• An experiment is fair if treatments are handled equivalently
• in all aspects except those that form the essence (definition) of the treatment
• cannot be determined by looking at outcomes
• Competence is the ability to detect differences
• can only partly be determined on external grounds
• can be established if difference is detected
• It is a matter of “assay sensitivity”
A Model for Competence
C, C̄: competent, not competent
E, Ē: equivalent, inequivalent
D, D̄: observed difference, no difference
Likelihoods
$$P(D \mid EC) = \alpha, \qquad P(\bar D \mid EC) = 1 - \alpha$$
$$P(D \mid E\bar C) = \alpha, \qquad P(\bar D \mid E\bar C) = 1 - \alpha$$
$$P(D \mid \bar E C) = \beta, \qquad P(\bar D \mid \bar E C) = 1 - \beta$$
$$P(D \mid \bar E \bar C) = \gamma, \qquad P(\bar D \mid \bar E \bar C) = 1 - \gamma$$
"Priors"
$$P(E) = \theta, \qquad P(C \mid \bar E) = \psi$$
See Senn, S.J., Inherent difficulties with active control equivalence studies. Statistics in Medicine, 1993. 12(24): p. 2367-75.
Interpretation of These Parameters
• 1 − α and β reflect the ‘precision’ of ‘competent’ experiments
• Their converses, α and 1 − β, are analogous to type I and type II error rates
• α and 1 − β can be reduced by more and more precise experiments
• γ represents the probability that, where a difference between treatments really does exist, a poor (not competent) experiment will indicate that it exists
• The joint effect of θ and ψ represents factors beyond our control
• θ is the probability that ‘Nature’ has decided the two treatments are equivalent
• ψ is the probability that the trial is competent given that the treatments are not equivalent
Notes
Under this formulation of the likelihoods, it is irrelevant whether the trial is competent if the treatments are equivalent.
We could regard the combination EC as impossible.
We require β > γ, but this is a linguistic convention.
For those who like formulae
$$P(E \mid D) = \frac{\theta\alpha}{\theta\alpha + (1-\theta)\{\psi\beta + (1-\psi)\gamma\}}$$
$$P(E \mid \bar D) = \frac{\theta(1-\alpha)}{\theta(1-\alpha) + (1-\theta)\psi(1-\beta) + (1-\theta)(1-\psi)(1-\gamma)}$$
As β → 1 and α → 0,
$$P(E \mid D) \to 0, \qquad P(E \mid \bar D) \to \frac{\theta}{\theta + (1-\theta)(1-\psi)(1-\gamma)}$$
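The two posterior probabilities can be evaluated directly. This minimal Python sketch assumes P(E) = θ (`theta`), P(C | Ē) = ψ (`psi`), P(D | E) = α whether or not the trial is competent, P(D | Ē, C) = β and P(D | Ē, C̄) = γ. The parameter values in the example are hypothetical, chosen only to show the direction of the effects.

```python
from fractions import Fraction as Fr


def posterior_equivalence(theta, psi, alpha, beta, gamma):
    """Return (P(E | D), P(E | no D)) for the competence model.

    Assumes P(E) = theta, P(C | not E) = psi, P(D | E) = alpha whether or not the
    trial is competent, P(D | not E, C) = beta and P(D | not E, not C) = gamma.
    """
    p_d_if_equiv = alpha
    p_d_if_inequiv = psi * beta + (1 - psi) * gamma  # average over competence
    e_and_d = theta * p_d_if_equiv
    ne_and_d = (1 - theta) * p_d_if_inequiv
    e_and_nd = theta * (1 - p_d_if_equiv)
    ne_and_nd = (1 - theta) * (1 - p_d_if_inequiv)
    return e_and_d / (e_and_d + ne_and_d), e_and_nd / (e_and_nd + ne_and_nd)


# Hypothetical values: an ideal competent trial (alpha = 0, beta = 1), gamma = 1/10.
for psi in (Fr(1), Fr(9, 10), Fr(1, 2)):
    p_e_d, p_e_nd = posterior_equivalence(Fr(1, 2), psi, Fr(0), Fr(1), Fr(1, 10))
    print(psi, p_e_d, p_e_nd)
```

Even with a perfectly precise competent trial, P(E | no difference) falls below 1 as soon as ψ < 1, in line with the limit θ / (θ + (1 − θ)(1 − ψ)(1 − γ)); a difference, by contrast, rules out equivalence whatever ψ is.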
[Figure: axes ψ (competence) and θ (prior equivalence)]
[Figure: axes ψ (competence) and θ (prior equivalence)]
NB  has been reduced to 0.005
[Figure: axes ψ (competence) and θ (prior equivalence)]
 = 0.05
Consequences
• Asymmetry between concluding equivalence and difference
• The former is more problematic
• Not just a matter of reformulating the problem
• Conditional on an assumption of competence we can conclude
equivalence
• However, if we have any doubts about competence, these doubts are increased by failing to find a difference
• Speculation: this is a concrete instance of the more general point made by Popper and Miller (1987)
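The asymmetry can be checked numerically by asking how the probability that the trial is not competent changes with the outcome. This minimal Python sketch conditions on the treatments really being inequivalent and assumes P(C) = ψ, P(D | C) = β and P(D | C̄) = γ with β > γ; the numbers are hypothetical.

```python
from fractions import Fraction as Fr


def prob_not_competent(psi, beta, gamma, difference_found):
    """P(trial not competent | treatments inequivalent, outcome) by Bayes' theorem."""
    if difference_found:
        competent, not_competent = psi * beta, (1 - psi) * gamma
    else:
        competent, not_competent = psi * (1 - beta), (1 - psi) * (1 - gamma)
    return not_competent / (competent + not_competent)


psi, beta, gamma = Fr(9, 10), Fr(4, 5), Fr(1, 10)  # hypothetical values
print(1 - psi)                                                       # prior doubt: 1/10
print(prob_not_competent(psi, beta, gamma, difference_found=True))   # 1/73
print(prob_not_competent(psi, beta, gamma, difference_found=False))  # 1/3
```

Finding a difference reduces the doubt about competence (from 1/10 to 1/73 here), while failing to find one increases it (to 1/3), much as in the thimble example.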
Hunt the thimble
• You are looking for a thimble in a room
• Consider two cases
• You find the thimble
• You search but don’t find the thimble
• Inferences about whether the thimble is in the room or not are
fundamentally different in the two cases
• In the first case, you conclude it is, and your competence as a searcher for
thimbles is irrelevant to this conclusion
• In the second case, you may believe that the thimble is not in the room but
this belief depends on your competence as a thimble-searcher, about
which you may come to have doubts
Third issue: causal versus predictive inference
• Clinical trials can be used to try to answer a number of very different questions
• Two examples are
• Did the treatment have an effect in these patients?
• A causal purpose
• What will the effect be in future patients?
• A predictive purpose
• Unfortunately, in practice, an answer is produced without stating
what the question was
• Given certain assumptions these questions can be answered using the
same analysis but the assumptions are strong and rarely stated
Two models
Predictive
• The population is taken to be ‘patients in
general’
• Of course this really means future patients
• They are the ones to whom the treatment
will be applied
• We treat the patients in the trial as an
appropriate selection from this
population
• This does not require them to be typical
but it does require additivity of the
treatment effect
Causal
• We take the patients as fixed
• We want to know what the effect
was for them
• Unfortunately there are missing
counterfactuals
• What would have happened to
control patients given intervention
and vice-versa
• The population is the population of
all possible allocations to the
patients studied
Coverage probabilities for two questions
Average treatment effect in the population is 300 mL FEV1
[Figure: two panels, Predictive (left) and Causal (right)]
The horizontal dashed line is the population average effect (both panels). The blue horizontal bar is the true trial effect (right panel). Black CIs cover the true effect; red ones don't.
Conclusion
• There is a fundamental difference between
• Demonstrating that things are different
• Demonstrating they are the same
• There is a fundamental difference between
• Concluding something had an effect
• Concluding it must always have this effect
• Many features of clinical trials reflect this
• The value of blinding
• Competence (assay sensitivity)
• Causal versus predictive inference
• These are not a consequence of being frequentist
• They are not vanquished by becoming Bayesian
• The choice of a Bayesian or frequentist framework does not depend on this
In summary
“Equivalence is different”
The answer to the puzzle
Both are equally likely
The prior distribution is uniform.
By the time we have completed the trials, the relative frequency will be the probability
But the prior distribution says every probability is equally likely
Therefore it is hardly surprising that every relative frequency will be equally likely
Senn, S.J., Dicing with Death. 2003,
Cambridge: Cambridge University Press.
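The answer rests on a one-line identity: with a uniform prior on the success probability, the probability of exactly k successes in n trials is C(n, k) × k!(n − k)!/(n + 1)! = 1/(n + 1) for every k. The Python sketch below is an illustration that checks this for outcomes A and B, using a smaller hypothetical n = 1000 instead of 1 million so that the exact arithmetic stays fast.

```python
from fractions import Fraction
from math import comb, factorial


def prob_k_successes(n, k):
    """Prior predictive P(exactly k successes in n trials) under a uniform prior on p.

    Uses int_0^1 p^k (1-p)^(n-k) dp = k!(n-k)!/(n+1)!.
    """
    beta_integral = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    return comb(n, k) * beta_integral


n = 1000  # a smaller stand-in for 1 million
print(prob_k_successes(n, n))       # A: all successes -> 1/1001
print(prob_k_successes(n, n // 2))  # B: half successes, in any order -> 1/1001
```

For n = 1 million the same identity gives 1/1,000,001 for both A and B. B counts every ordering; any single particular sequence with 500,000 successes is far less probable than A.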

Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

De Finetti meets Popper

  • 1. De Finetti meets Popper or Should Bayesians care about falsificationism? Stephen Senn, Edinburgh (C) Stephen Senn 2019 Lecture at the Popper Symposium on 7 August 2019 at the 16th International Congress on Logic, Methodology & Philosophy of Science, Prague
  • 2. Basic thesis: The distinction between refuting and ‘corroborating’ a hypothesis is fundamental. It does not become irrelevant by adopting a Bayesian approach to inference. It has no direct bearing on the choice of meaning for probability: subjective, relative frequency, propensity, logical, etc. Various practical problems in analysing clinical trials illustrate this. Outline: Basic background (de Finetti’s falsificationism; a simple illustration; Jeffreys’s alternative approach, inspired by Broad’s challenge). Falsificationist issues in clinical trials (bioequivalence; equivalence and falsificationism; blinding; competence; causal analysis versus prediction in clinical trials). Conclusions.
  • 3. A puzzle to keep you thinking. Suppose we are to have 1 million independent trials with a binary outcome. We wish to decide, in advance of beginning the trials, which of the following is more likely. A: 1 million successes and no failures. B: 500,000 successes and 500,000 failures in any order. We use a Bayesian approach with a uniform prior for the probability of the binary outcome (such as would have been employed by Laplace). What is the correct answer?
  • 4. Basic background. Very elementary – please accept my apologies.
  • 5. “The acquisition of a further piece of information, H - in other words experience, since experience is nothing more than the acquisition of further information - acts always and only in the way we have just described: suppressing the alternatives that turn out to be no longer possible.” Popper? No, de Finetti.
  • 6. Example • A man has a CD of popular music with 12 tracks on it • He can play tracks in random order (Shuffle) or in sequential order (Play) • On a particular occasion he thinks he has pressed Shuffle (that was his intention) but the track played is the first track, F, on the CD • What is the probability that he did, in fact, press Shuffle as intended?
  • 7. We can put this together as follows:
      "Hypothesis" | Prior probability P | Evidence | Likelihood L | P x L
      Shuffle      | 9/10                | F        | 1/12         | 9/120
      Shuffle      | 9/10                | X        | 11/12        | 99/120
      Play         | 1/10                | F        | 1            | 12/120
      Play         | 1/10                | X        | 0            | 0
      TOTAL        |                     |          |              | 120/120 = 1
    Note that in de Finetti’s theory the relevant historical process is the individual’s thought process, not “real world” events.
  • 8. After seeing (hearing) the evidence, however, only two rows remain:
      "Hypothesis" | Prior probability P | Evidence | Likelihood L | P x L
      Shuffle      | 9/10                | F        | 1/12         | 9/120
      Shuffle      | 9/10                | X        | 11/12        | 99/120 (struck out)
      Play         | 1/10                | F        | 1            | 12/120
      Play         | 1/10                | X        | 0            | 0 (struck out)
      TOTAL        |                     |          |              | 21/120
  • 9. So we rescale by dividing by the total probability:
      "Hypothesis" | Prior probability P | Evidence | Likelihood L | P x L  | Posterior probability
      Shuffle      | 9/10                | F        | 1/12         | 9/120  | (9/120)/(21/120) = 9/21
      Shuffle      | 9/10                | X        | 11/12        | 99/120 |
      Play         | 1/10                | F        | 1            | 12/120 | (12/120)/(21/120) = 12/21
      Play         | 1/10                | X        | 0            | 0      |
      TOTAL        |                     |          |              | 21/120 | 21/21 = 1
  • 10. Returning to de Finetti’s general approach • Suppose we declare all possible sequences of some binary outcome (say S = success and F = failure) equally likely • Then no learning is possible • This is because, for any sequence of S and F outcomes observed so far, every possible continuation of S and F outcomes is also equally likely • Thus, observing which sequences have not occurred and renormalising changes nothing • Caution is required! • This is one reason why de Finetti was sceptical about any automatic approaches to Bayesian inference
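  • 11. What Jeffreys understood. Source: Theory of Probability, 3rd edition, p. 128
  • 12. C. D. Broad, 1918, pp. 393-394. As m (the number of successes observed so far) goes to infinity, the first probability (that the next trial is also a success, (m + 1)/(m + 2)) approaches 1. If n (the number of further trials) is much greater than m, the latter (the probability that all n further trials succeed, (m + 1)/(m + n + 1)) is small.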
  • 13. The Economist gets it wrong. The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child’s degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise. The Economist, ‘In praise of Bayes’, September 2000
  • 14. Jeffreys’s solution • The fact that ‘laws’ cannot be proved using Bayes theorem if the Laplacian approach to choosing prior distributions is adopted means that the choice of prior distribution is wrong • His solution is to place a mass of probability on the hypothesis being true • This gives simpler representations of the world more prior weight than more complex ones • In his view this is necessary to permit induction to work • Prior probability replaces (or reflects) parsimony as a principle
  • 15. Falsificationist issues in clinical trials. Rather more technical – again please accept my apologies.
  • 16. Equivalence studies (including bioequivalence) • Studies in which one tries to prove that treatments do not differ • The most extreme examples are so-called bioequivalence studies • The molecule is the same but the formulation differs • The same manufacturer may wish to replace one route of administration by another • For example a suppository by a pill • Or a single-dose inhaler with a multi-dose one • Or a different, so-called generic, manufacturer may wish to supply the market with its version of a now off-patent brand-name product • Or a manufacturer may wish, for labelling reasons, to prove that a drug does not differ whether given with or without food
  • 17. But surely, a drug is a drug? • In fact, no: changing the formulation can have dramatic effects on the potency of a drug • Here is an example I was involved with • Bronchodilator in asthma • Seven treatments compared over twelve hours using forced expiratory volume in one second (FEV1) • Placebo • 6, 12 and 24 μg of new formulation (MTA) • 6, 12 and 24 μg of old formulation (ISF) • Other details omitted for the sake of brevity • The results follow (high values of FEV1 are good). Senn, S.J., et al., An incomplete blocks cross-over in asthma: a case study in collaboration, in Cross-over Clinical Trials, J. Vollmar and L.A. Hothorn, Editors. 1997, Fischer: Stuttgart. p. 3-26.
  • 18. Figure: FEV1 (L) against time (minutes, 0 to 720) for placebo and the three doses of the new formulation (MT&A 6, 12 and 24).
  • 19. Figure: the same plot with the three doses of the reference formulation (ISF 6, 12 and 24) added.
  • 20. Bioequivalence in terms of confidence intervals: what is considered ‘proven’
      A: neither equivalence nor difference proven
      B: exact equivalence rejected
      C: inconclusive
      D & E: practical equivalence proven
      F: practical equivalence proven but exact equivalence rejected
      G: exact and practical equivalence rejected
  • 21. First issue: Blinding and Equivalence • Running a double-blind trial does not protect you against a spurious conclusion of equivalence • You do not need to know the treatment code to bias results towards equivalence • Consider a particularly simple (and very common) form of trial in which two oral formulations of a molecule are compared by looking at the concentration-time profile in a cross-over trial • Equivalence of these profiles is taken to mean equivalence of the formulations • “The blood is a gate through which the drug must pass”
  • 22. The Unscrupulous Pharmacokineticist • Take the 12 test tubes for day one for a given volunteer • hour 1, 2, …, 12 • Take the 12 test tubes for day two for the same volunteer • hour 1, 2, …, 12 • Mix each pair (by hour) together • Divide them into two • Et voilà • Perfect equivalence without having to unblind
  • 23. Fanciful? • In fact blinding does not protect against false conclusions of equivalence • Pharmaceutical companies commonly prosecute cheating doctors • Reason • Trial fails to show any effect whereas others do • Explanation • The trial never took place • The data have been invented • This will produce a conclusion of equivalence
  • 24. Second issue: Competence • An experiment is fair if treatments are handled equivalently • in all aspects except those that form the essence (definition) of the treatment • this cannot be determined by looking at outcomes • Competence is the ability to detect differences • it can only partly be determined on external grounds • it can be established if a difference is detected • It is a matter of “assay sensitivity”
  • 25. (C) Stephen Senn 2019 A Model for Competence competent, not competent equivalent, inequivalent observed difference, no difference Likelihoods ( ) ( ) 1 ( ) ( ) 1 ( ) ( ) 1 ( ) ( ) 1 1 0 "Priors" ( C C E E D D P D E C P D E C P D E C P D E C P D EC P D EC P D EC P D EC P E                                       ) , ( )P C E   See Senn, S.J., Inherent difficulties with active control equivalence studies. Statistics in Medicine, 1993. 12(24): p. 2367-75.
  • 26. (C) Stephen Senn 2019 Interpretation of These Parameters • 1- and  reflect the ‘precision’ of ‘competent’ experiments • Their converses  and 1- are analogous to type I and II error rates •  and 1-, can be reduced by more and more precise experiments •  represents the probability that where a difference between treatments really does exist a poor (not competent) experiment will indicate it exists • Joint effect of  and  represents factors beyond our control •  is the probability that ‘Nature’ has decided the two treatments are equivalent •  is the probability that the trial is competent given that the treatments are not equivalent
  • 27. Notes: Under this formulation of the likelihoods it is irrelevant whether the trial is competent if the treatments are equivalent. We could regard the combination EC as impossible. We require  > , but this is a linguistic convention.
  • 28. (C) Stephen Senn 2019 For those who like formulae      1 (1 ) ( ) (1 ) (1 ) (1 ) ( ) (1 )(1 ) (1 )(1 )(1 ) (1 ) as 1 and 0 ( ) 1 ( ) (1 )(1 )(1 ) P E D P E D P E D P E D                                                        
  • 29. Figure: posterior probability of non-equivalence (vertical axis) against π (horizontal axis). Legend: competence, prior equivalence.
  • 30. Figure: as on slide 29. Legend: competence, prior equivalence. NB  has been reduced to 0.005
  • 31. Figure. Legend: competence, prior equivalence.  = 0.05
  • 32. Consequences • Asymmetry between concluding equivalence and difference • The former is more problematic • Not just a matter of reformulating the problem • Conditional on an assumption of competence we can conclude equivalence • However, if we have any doubts about competence, these doubts increase by failing to find a difference • Speculation: this is a concrete instance of the more general point made by Popper and Miller 1987
  • 33. Hunt the thimble • You are looking for a thimble in a room • Consider two cases • You find the thimble • You search but don’t find the thimble • Inferences about whether the thimble is in the room or not are fundamentally different in the two cases • In the first case, you conclude it is, and your competence as a searcher for thimbles is irrelevant to this conclusion • In the second case, you may believe that the thimble is not in the room but this belief depends on your competence as a thimble-searcher, about which you may come to have doubts
  • 34. Third issue: causal versus predictive inference • Clinical trials can be used to try to answer a number of very different questions • Two examples are • Did the treatment have an effect in these patients? • A causal purpose • What will the effect be in future patients? • A predictive purpose • Unfortunately, in practice, an answer is produced without stating what the question was • Given certain assumptions these questions can be answered using the same analysis, but the assumptions are strong and rarely stated
  • 35. Two models. Predictive • The population is taken to be ‘patients in general’ • Of course this really means future patients • They are the ones to whom the treatment will be applied • We treat the patients in the trial as an appropriate selection from this population • This does not require them to be typical but it does require additivity of the treatment effect. Causal • We take the patients as fixed • We want to know what the effect was for them • Unfortunately there are missing counterfactuals • What would have happened to control patients given intervention and vice versa • The population is the population of all possible allocations to the patients studied
  • 36. Coverage probabilities for two questions. Figure with two panels, Predictive (left) and Causal (right); the average treatment effect in the population is 300 ml FEV1. The horizontal dashed line is the population average effect (both panels). The blue horizontal bar is the true trial effect (right panel). Black CIs cover the true effect; red ones do not.
  • 37. Conclusion • There is a fundamental difference between • Demonstrating that things are different • Demonstrating they are the same • There is a fundamental difference between • Concluding something had an effect • Concluding it must always have this effect • Many features of clinical trials reflect this • The value of blinding • Competence (assay sensitivity) • Causal versus predictive inference • These are not a consequence of being frequentist • They are not vanquished by becoming Bayesian • The choice of a Bayesian or frequentist framework does not depend on this
  • 38. In summary: “Equivalence is different”
  • 39. The answer to the puzzle. Both are equally likely. The prior distribution is uniform. By the time we have completed the trials, the relative frequency will be the probability. But the prior distribution says every probability is equally likely. Therefore it is hardly surprising that every relative frequency will be equally likely. Senn, S.J., Dicing with Death. 2003, Cambridge: Cambridge University Press.
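De Finetti's "strike out and renormalise" updating (slides 5 to 10) can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture. The helper `condition` and the dictionary layout are assumptions made here. The sketch reproduces the CD posterior of slide 9; `Fraction` prints 9/21 and 12/21 in lowest terms, as 3/7 and 4/7. It also checks the claim of slide 10: if all 2^n sequences are declared equally likely, conditioning on the past always leaves the probability of S next at 1/2.

```python
from fractions import Fraction
from itertools import product

def condition(joint, observed):
    """Strike out every case incompatible with what was observed, then
    renormalise the survivors so that they sum to 1."""
    kept = {case: p for case, p in joint.items() if observed(case)}
    total = sum(kept.values())
    return {case: p / total for case, p in kept.items()}

# Slides 7 to 9: joint probabilities P x L for (hypothesis, evidence);
# F = the first track on the CD is played, X = any other track.
cd_joint = {
    ("Shuffle", "F"): Fraction(9, 10) * Fraction(1, 12),
    ("Shuffle", "X"): Fraction(9, 10) * Fraction(11, 12),
    ("Play", "F"): Fraction(1, 10) * 1,
    ("Play", "X"): Fraction(1, 10) * 0,
}
posterior = condition(cd_joint, lambda case: case[1] == "F")
print(posterior[("Shuffle", "F")], posterior[("Play", "F")])  # prints 3/7 4/7

# Slide 10: all 2**n sequences of S and F declared equally likely.
def prob_next_success(n, observed_prefix):
    """P(next outcome is S | observed prefix) when every sequence of
    length n is equally likely a priori (prefix must be shorter than n)."""
    sequences = {seq: Fraction(1, 2**n) for seq in product("SF", repeat=n)}
    k = len(observed_prefix)
    post = condition(sequences, lambda seq: seq[:k] == tuple(observed_prefix))
    return sum(p for seq, p in post.items() if seq[k] == "S")

for prefix in ["", "S", "SSSS", "SSSSSSSSS"]:
    print(repr(prefix), prob_next_success(10, prefix))  # 1/2 every time
```

Even after nine successes in a row the predictive probability stays at 1/2. Striking out sequences and renormalising changes nothing, which is the sense in which no learning takes place.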

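Slides 3, 12 to 14 and 39 all turn on what Laplace's uniform prior implies. The sketch below works in Python and assumes that "uniform prior" means p ~ Uniform(0, 1) for the probability of success. The function names are hypothetical. The prior mass of 1/2 in `jeffreys_law_posterior` is an illustrative choice, not a value taken from the slides. The sketch checks three things:
- every count k of successes has probability 1/(n + 1), so the puzzle's A and B are equally likely;
- Broad's two probabilities behave very differently;
- putting a lump of prior probability on the law itself lets the law become probable.

```python
from fractions import Fraction
from math import comb, factorial, lgamma

def prob_k_successes(n, k):
    """P(exactly k successes in n trials) when p ~ Uniform(0, 1).
    Integrating C(n, k) p^k (1 - p)^(n - k) over p gives
    C(n, k) * k! (n - k)! / (n + 1)!, which simplifies to 1 / (n + 1)."""
    return Fraction(comb(n, k) * factorial(k) * factorial(n - k), factorial(n + 1))

def log_prob_k_successes(n, k):
    """The same probability on the log scale, usable for n = 1,000,000."""
    log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_beta = lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)
    return log_binom + log_beta

def broad(m, n):
    """After m successes in m trials (uniform prior), return
    (P(next trial succeeds), P(all of the next n trials succeed))."""
    return Fraction(m + 1, m + 2), Fraction(m + 1, m + n + 1)

def jeffreys_law_posterior(m, mass=Fraction(1, 2)):
    """Posterior probability of the law 'p = 1' after m successes in m trials,
    with prior mass `mass` on the law and the rest spread uniformly over p.
    P(data | law) = 1 and P(data | uniform part) = 1 / (m + 1)."""
    like_uniform = Fraction(1, m + 1)
    return mass / (mass + (1 - mass) * like_uniform)

# Puzzle: outcomes A (k = n) and B (k = n / 2) have the same probability.
n = 10**6
print(log_prob_k_successes(n, n), log_prob_k_successes(n, n // 2))  # both about -log(1,000,001)

# The Economist's urn is the first of Broad's two numbers.
print(broad(1, 0)[0], broad(2, 0)[0])  # prints 2/3 3/4
next_day, forever = broad(10_000, 1_000_000)
print(float(next_day), float(forever))  # about 0.9999 and about 0.0099

# Jeffreys: with prior mass on the law, the law itself becomes probable.
print([jeffreys_law_posterior(m) for m in (1, 10, 100)])
```

The gap between `next_day` and `forever` is the confusion attributed to The Economist. Under the uniform prior the probability that the sun rises tomorrow approaches 1, but the probability that it will always rise does not.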
Editor's notes

  1. Views of the role of hypothesis falsification in statistical testing do not divide as cleanly between frequentist and Bayesian views as is commonly supposed. This can be shown by considering the two major variants of the Bayesian approach to statistical inference and the two major variants of the frequentist one. A good case can be made that the Bayesian, de Finetti, just like Popper, was a falsificationist. A thumbnail view, which is not just a caricature, of de Finetti’s theory of learning, is that your subjective probabilities are modified through experience by noticing which of your predictions are wrong, striking out the sequences that involved them and renormalising. On the other hand, in the formal frequentist Neyman-Pearson approach to hypothesis testing, you can, if you wish, switch the conventional null and alternative hypotheses, making the latter the strawman and, by ‘disproving’ it, assert the former. The frequentist Fisher, however, at least in his approach to testing of hypotheses, seems to have taken a strong view that the null hypothesis was quite different from any other and that there was a strong asymmetry in the inferences that followed from the application of significance tests. Finally, to complete a quartet, the Bayesian geophysicist Jeffreys, inspired by Broad, specifically developed his approach to significance testing in order to be able to ‘prove’ scientific laws. By considering the controversial case of equivalence testing in clinical trials, where the object is to prove that ‘treatments’ do not differ from each other, I shall show that there are fundamental differences between ‘proving’ and falsifying a hypothesis and that this distinction does not disappear by adopting a Bayesian philosophy. I conclude that falsificationism is important for Bayesians also, although it is an open question whether it is enough for frequentists.
  2. In other words, falsificationism is a valuable perspective for Bayesian and frequentist statisticians alike.
  3. See Dicing with Death, Cambridge, 2003, chapter 4
  4. Some general discussion of what it means to be a Bayesian (and also a frequentist) will be found in Senn, S.J., You may believe you are a Bayesian but you are probably wrong. Rationality, Markets and Morals, 2011. 2: p. 48-66. See http://www.frankfurt-school-verlag.de/rmm/downloads/Article_Senn.pdf
  5. de Finetti, B., Theory of Probability, Volume 1. 1974, Chichester: Wiley. p. 141
  6. This is based on a real example. I was playing the CD Hysteria by Def Leppard when this happened. The example is discussed in more detail in chapter 4 of Statistical Issues in Drug Development.
  7. The four rows give the four combinations of the two hypotheses and the two kinds of evidence. The P column gives the marginal prior probability of the “hypothesis”. The evidence column indicates two sorts of evidence: F for the first track on the CD and X for any other track. The Likelihood column gives the conditional probability of the evidence given the hypothesis. The column headed P x L gives the joint probability of a given hypothesis and evidence combination. Strictly speaking, in the de Finetti view, P x L exists directly.
  8. The probabilities of the two cases which remain do not add up to 1. However, since these two cases cover all the possibilities which remain, their combined probability must be 1. Therefore, we rescale the individual probabilities to make them add to 1. We can do this without changing their relative value by dividing by their total, 21/120. This has been done in the table below.
  9. This completes the Bayesian solution and the posterior probability is given in the extra final column
  10. “In an article entitled, "In praise of Bayes", that appeared in The Economist in September 2000, the unnamed author tried to show how a newborn baby could, through successively observed sunrises and the application of Laplace's Law of succession, acquire increasing certainty that the sun would always rise. As The Economist put it, "Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise". This is false: not so much praise as hype. The Economist had confused the probability that the sun will rise tomorrow with the probability that it will always rise. One can only hope this astronomical confusion at that journal does not also attach to beliefs about share prices. In praise of Bayes. September 2000.” Dicing with Death, 2003 p77
  11. See https://errorstatistics.com/2015/05/09/stephen-senn-double-jeopardy-judge-jeffreys-upholds-the-law-guest-post/ and also   http://www.senns.demon.co.uk/Papers/Comment%20on%20Robert.pdf
  12. See also http://www.senns.demon.co.uk/Papers/Falsificationism.pdf
  13. There has been a surprising amount of disagreement amongst frequentists as well as amongst Bayesians, and of course between the two major camps, as to how to analyse such studies. There is no time here to go over all this. However, see http://www.senns.demon.co.uk/Papers/Bioequivalence%20SiM.pdf for an overview. This blog post also gives an overview: https://errorstatistics.com/2014/06/05/stephen-senn-blood-simple-the-complicated-and-controversial-world-of-bioequivalence-guest-post/
  14. In fact this was a so-called incomplete blocks cross-over design in which each patient received five of the seven treatments on a total of five days (one day for each treatment) separated by a suitable wash-out. Twenty-one sequences were chosen so that each treatment was used equally often, each of the 21 pairs of treatments was studied in the same number of patients and each treatment appeared equally often in each period. The trial was double-blind and a sixfold replication was targeted (6 x 21 patients were planned to be recruited). Many different centres were employed and in the event more patients were recruited than planned. The model to analyse the treatment effect used “patient” and “period” (that is to say day 1, 2, 3, 4 or 5) in addition to treatment as factors. Rather than presenting the confidence intervals for the difference here I shall just show the time curves for FEV1 (appropriately adjusted for other effects), since these are sufficient to make the point. A full description of the trial will be found in Senn, S.J., et al., An incomplete blocks cross-over in asthma: a case study in collaboration, in Cross-over Clinical Trials, J. Vollmar and L.A. Hothorn, Editors. 1997, Fischer: Stuttgart. p. 3-26. http://www.senns.demon.co.uk/Papers/SELIPATI.pdf
  15. This is the time course in which patient and period effects have been eliminated (in other words, it is a fair comparison). Only placebo and the three doses of the new formulation (MTA) are shown. The efficacy of MTA is clearly shown and there is a gratifying dose response.
  16. Unfortunately the highest dose of MTA has an observed effect that is lower than the lowest dose of ISF. The conclusion was, much to everyone’s surprise and dismay, that the formulations differed in potency by a factor of 4 to 1.
  17. “In the case of trial A, the treatment estimate lies outside the region of equivalence. However, the confidence intervals are so wide that exact equality of the treatments is not ruled out. In case B exact equality is ruled out (if the conventions of hypothesis testing are accepted), since there is a significant difference, but the possibility that the true treatment difference lies within the region of equivalence is not. In case C, no treatment difference is observed, but the confidence intervals are so wide that values outside the regions of equivalence are still plausible. In cases D, E and F, practical equivalence is ‘demonstrated’. However, in case E it corresponds to no observed difference at all, whereas in case F the treatments are significantly different (confidence interval does not straddle zero) even though practical equivalence appears to have been demonstrated (confidence interval lies within region of equivalence). In case G there is a significant difference and equivalence may be rejected. ” Statistical Issues in Drug Development (2nd edition, 2007), Chapter 15
  18. This concrete illustration was first proposed to me by Joachim Roehmel. See also http://www.senns.demon.co.uk/Papers/Fisher%27s%20game%20with%20the%20Devil.pdf
  19. In other words, to fake results to produce a conclusion that two treatments are different, you would have to know which treatment was which. To fake results that two treatments are equivalent you do not need to know which treatment is which. The difference is that in the first case you wish to assert that two distributions are necessary. Thus assignment to the correct distribution is crucial. In the second case you wish to assert that only one distribution is needed. “The value of blinding in clinical trials, is essentially this: despite making sure that there are no superficial and nonpharmacological differences which enable us to distinguish one treatment from another (the trial is double-blind), the labels ‘experimental’ and ‘control’ do have an importance for prognosis. Thus, for a conventional trial where such a difference between groups is observed, because the trial has been run double-blind, we are able to assert that the difference between the groups cannot be due to prejudice and must therefore be due either to pharmacology or to chance. The whole purpose of ACES, however, is to be able to assert that there is no difference between treatment and clearly, therefore, blinding does not protect us against the prejudice that all patients ought to have similar outcomes. The point can be illustrated quite simply by considering the task of a statistician who has been ordered to fake equivalence by simulating suitable data. It is clear that he does not even need to know what the treatment codes are. All he needs to do is simulate data from a single Normal distribution with a suitable standard deviation (Senn, 1994). Whatever the allocation of patients, he is almost bound to demonstrate equivalence. If he is required to prove that one treatment is superior to another, however, such a strategy will not work. He needs to know the treatment codes.” Statistical Issues in Drug Development (2nd edition, 2007), Chapter 15
  20. “There is a paradox of competence associated with equivalence trials and that is that the more we tend to provide proof within a trial of the equivalence of the two treatments, the more we ought to suspect that we have not been looking at the issue in the correct way: that the trial is incapable of finding a difference where it exists. In other words, there is more to a proof of equivalence than the matter of reversing the usual roles of null and alternative hypotheses. Even if in a given trial the test results indicated that the effects of the treatments being compared were very similar (as, say, in case D) the possibility could not be ruled out that a trial with different patients, or alternative measurements or some different approach altogether would have succeeded in finding a difference. No probabilistic calculation on the data in hand has anything to say about this possibility: it is essentially a matter of data not collected. There is a difference in kind between ‘proving’ that drugs are similar and proving that they are not similar. This difference is analogous to the difference which exists in principle between a proof of marital infidelity and fidelity. The first may be provided simply enough (in principle) by evidence; the second, if at all, only by a repeated failure to find the evidence which the first demands.” Statistical Issues in Drug Development (2nd edition, 2007), Chapter 15
  21. See Senn, S.J., Inherent difficulties with active control equivalence studies. Statistics in Medicine, 1993. 12(24): p. 2367-75. http://www.senns.demon.co.uk/Papers/ACES%20SiM%201993.pdf. In the paper the symbol α was used instead of  and β was used instead of ; the change has been made here to avoid confusion, since α and β are often used for the type I and type II error rates.
  22. One could argue that it is the joint effect of , and  that reflects matters beyond our control. On the other hand, our knowledge of statistics (and experimental design) enables us to fix  and 
  23. This slide will probably be skipped in the lecture.
  24. Reminder:  is the prior probability of equivalence,  is the probability of competence given non-equivalence It is assumed that a ‘difference’ has been observed The horizontal axis gives the probability, 𝑃(𝐷 𝐸 ′ 𝐶)=𝜋 of observing a ‘difference’ given that the trial is competent (C) and that non-Equivalence (E’) obtains. OTBE, we expect that the more precise the experiment, the bigger this value will be The vertical axis gives the posterior probability of non-equivalence A limit is reached as  approaches 1 but this is because  does not increase Note to self. The program is “ACES Bayesian.gen” and the location is C:\Users\Stephen\Documents\Genstat\GenStat Files\Research\Equivalence
  25. Now that the value of  has been reduced, the limit for the posterior probability is much higher. In principle, simply by designing better experiments, we can make better and better inferences regarding differences.
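The behaviour described in notes 24 and 25 can be reproduced with a simple three-way model. This is a sketch under assumptions, not necessarily the model of the ACES paper or of the GenStat program. Write θ for the prior probability of equivalence and ψ for the probability of competence given non-equivalence (symbols chosen here for illustration), keep π = P(D | E′C), and assume that an incompetent trial behaves like a trial of equivalent treatments, declaring a ‘difference’ with a small false-positive probability γ (a hypothetical parameter). Then P(E′ | D) = (1 − θ)[ψπ + (1 − ψ)γ] / {θγ + (1 − θ)[ψπ + (1 − ψ)γ]}.

```python
def posterior_non_equivalence(theta, psi, pi, gamma):
    """P(E' | D): posterior probability of non-equivalence given a 'difference'.

    theta: prior probability of equivalence, P(E)
    psi:   probability of competence given non-equivalence, P(C | E')
    pi:    P(D | E', C), chance that a competent trial detects a real difference
    gamma: ASSUMED false-positive rate, taken as P(D | E) = P(D | E', not C)
    """
    p_d_given_non_equiv = psi * pi + (1 - psi) * gamma
    numerator = (1 - theta) * p_d_given_non_equiv
    return numerator / (theta * gamma + numerator)


theta, psi = 0.5, 0.8  # hypothetical values for illustration only
for gamma in (0.05, 0.001):
    curve = [posterior_non_equivalence(theta, psi, p, gamma)
             for p in (0.2, 0.5, 0.8, 0.99, 1.0)]
    print(gamma, [round(x, 4) for x in curve])
```

In this sketch the posterior levels off as π approaches 1, and the level it reaches rises towards 1 as γ is made smaller, which matches the point that better-designed experiments permit better inferences about differences.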
  26. However, this slide shows that the same is not true of equivalence. There is a limit to what we can conclude unless we can make a judgement of competence that relies on external matters. This may seem puzzling: what is equivalence but that which applies when non-equivalence does not? The real reason is that three alternatives are involved: ‘equivalent’, ‘not competent’ and ‘different’. The problem is distinguishing between the first two.
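The ceiling on equivalence can be illustrated with an assumed three-way model. This is a sketch, not necessarily the model of the ACES paper: θ is the prior probability of equivalence and ψ the probability of competence given non-equivalence (symbols chosen here for illustration), π = P(D | E′C), and γ is a hypothetical false-positive rate shared by equivalent treatments and incompetent trials. When no difference is observed (D′), P(E | D′) = θ(1 − γ) / {θ(1 − γ) + (1 − θ)[ψ(1 − π) + (1 − ψ)(1 − γ)]}, which can never exceed θ / {θ + (1 − θ)(1 − ψ)}, however close π is to 1 and γ to 0.

```python
def posterior_equivalence(theta, psi, pi, gamma):
    """P(E | no difference observed) under the assumed three-way model.

    theta: prior probability of equivalence, P(E)
    psi:   probability of competence given non-equivalence, P(C | E')
    pi:    P(D | E', C)
    gamma: ASSUMED false-positive rate, P(D | E) = P(D | E', not C)
    """
    numerator = theta * (1 - gamma)
    # A non-equivalent pair yields 'no difference' if the trial is competent
    # but misses (1 - pi), or if it is incompetent (behaves like equivalence).
    p_nd_given_non_equiv = psi * (1 - pi) + (1 - psi) * (1 - gamma)
    return numerator / (numerator + (1 - theta) * p_nd_given_non_equiv)


def equivalence_ceiling(theta, psi):
    """Limit of posterior_equivalence as pi -> 1 and gamma -> 0.
    Only a better judgement of competence (larger psi) raises it."""
    return theta / (theta + (1 - theta) * (1 - psi))


theta, psi = 0.5, 0.8  # hypothetical values for illustration only
for pi in (0.5, 0.9, 0.999999):
    print(pi, round(posterior_equivalence(theta, psi, pi, 1e-9), 4))
print("ceiling:", round(equivalence_ceiling(theta, psi), 4))
```

However precise the experiment is made in this sketch, the posterior probability of equivalence stalls below the ceiling; raising it requires a larger ψ, that is, external grounds for believing the trial competent.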
  27. Popper, K. and Miller, D. ‘Why probabilistic support is not inductive’, Philosophical Transactions of the Royal Society of London, Series A, 321, 569-591 (1987).
  28. See also ‘The Jealous Husband’s dilemma’, Dicing with Death, chapter 4.
  29. Example of a trial in asthma comparing a bronchodilator to placebo using forced expiratory volume in one second (FEV1) in mL. This is a simulation to illustrate the issues. In the simulation, a population of patients for whom the treatment effect is not identical has been considered. Each clinical trial has a different average treatment effect because it involves different (possibly unidentifiable) subpopulations of patients. This is done by drawing, for each trial, a common patient effect from an overall distribution. Sixty trials are simulated. Once this value has been established for a trial, individual patient values are simulated from the distribution for that trial. The point estimates (diamonds) and 95% confidence intervals (whiskers) are calculated. On the LHS the confidence intervals are judged according to whether they cover the population value (given by the horizontal line at 300 mL): black, yes; red, no. It can be seen that the claimed 95% coverage does not apply. On the RHS coverage is judged by whether the intervals cover the ‘true’ local effect (given by the small blue horizontal bar, which varies from trial to trial). The theory holds up well: 3 out of 60, that is to say 5%, of the true values are not within the intervals.
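A simulation along the lines described can be sketched as follows. The note does not give the settings, so the between-trial standard deviation of the effect (100 mL), the within-trial standard deviation (400 mL), the 100 patients per arm and the placebo mean of zero are all assumptions made for illustration, and a Normal approximation (±1.96 standard errors) is used for the 95% intervals.

```python
import numpy as np


def simulate_coverage(n_trials=60, n_per_arm=100, global_effect=300.0,
                      tau=100.0, sigma=400.0, seed=2019):
    """Return the proportion of 95% CIs covering (a) the population effect
    and (b) each trial's own 'local' effect. All parameters are assumed."""
    rng = np.random.default_rng(seed)
    covers_global = 0
    covers_local = 0
    for _ in range(n_trials):
        # Trial-specific effect drawn from the overall distribution of effects
        local_effect = rng.normal(global_effect, tau)
        # Individual FEV1 responses (mL) within this trial
        placebo = rng.normal(0.0, sigma, n_per_arm)
        active = rng.normal(local_effect, sigma, n_per_arm)
        diff = active.mean() - placebo.mean()
        se = np.sqrt(active.var(ddof=1) / n_per_arm + placebo.var(ddof=1) / n_per_arm)
        lower, upper = diff - 1.96 * se, diff + 1.96 * se
        covers_global += int(lower <= global_effect <= upper)
        covers_local += int(lower <= local_effect <= upper)
    return covers_global / n_trials, covers_local / n_trials


g60, l60 = simulate_coverage()
print(f"60 trials: cover 300 mL {g60:.2f}, cover local effect {l60:.2f}")
```

With only 60 trials the coverage proportions are noisy; running many more (for example `simulate_coverage(n_trials=4000)`) makes the contrast clearer, with coverage of the local effect staying close to 95% while coverage of the 300 mL population value falls well short.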
  30. Of course, formal proofs using either calculus (integrating out) or mathematical induction are possible. However, to understand what your choice of prior distribution commits you to, you have to see the answer. This is an example of what Popper once wrote about scientists liking ‘weak’ proofs because they often bring more understanding. The example is discussed and a proof by induction is given in chapter 4 of Dicing with Death.
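The answer to the opening puzzle can also be checked numerically. Under Laplace's uniform prior, integrating out the success probability gives k!(n − k)!/(n + 1)! for any one particular sequence with k successes; multiplying by the C(n, k) possible orderings gives 1/(n + 1) for every k. So A and B are equally probable, each with probability 1/1,000,001, even though any single ordering of B is astronomically less probable than the one sequence that makes up A. The sketch works on the log scale to avoid overflow and adds an exact rational check for a small n.

```python
from fractions import Fraction
from math import comb, exp, factorial, isclose, lgamma


def log_prob_one_sequence(n, k):
    """Log-probability of one particular ordered sequence with k successes in n
    trials under a uniform prior: integral of p^k (1-p)^(n-k) dp = k!(n-k)!/(n+1)!."""
    return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)


def log_num_sequences(n, k):
    """Log of the binomial coefficient C(n, k): the number of orderings."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def prob_k_successes(n, k):
    """Prior predictive probability of exactly k successes in any order."""
    return exp(log_num_sequences(n, k) + log_prob_one_sequence(n, k))


def prob_k_successes_exact(n, k):
    """Exact rational version; practical only for small n."""
    return comb(n, k) * Fraction(factorial(k) * factorial(n - k), factorial(n + 1))


n = 1_000_000
p_a = prob_k_successes(n, n)       # A: all successes (a single sequence)
p_b = prob_k_successes(n, n // 2)  # B: 500,000 successes in any order
print(p_a, p_b, 1 / (n + 1))
# Any single ordering of B is astronomically unlikely (log-probability shown)
print(log_prob_one_sequence(n, n // 2))
```

The design choice of computing the count of orderings and the probability of each ordering separately makes visible what the uniform prior commits you to: the huge number of orderings for B exactly compensates for the tiny probability of each.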