Barbara Osimani, Problems with Evidence of Pharmaceutical Harm. King's College London, Department of Philosophy CHH - Concepts of Health Seminar 14 May 2013

Problems with evidence for
pharmaceutical harm
Barbara Osimani
University of Camerino
KCL London, 14 May 2013

Topics
• Philosophical debate on evidence hierarchies
and RCTs
• Epistemological rationale underpinning
evidence hierarchies and alternative
approaches (Principle of total evidence)
• Distinctive roles in causal assessment of
intended vs. unintended effects.
• Case study (acetaminophen/paracetamol side
effects)

Evidence hierarchies:
best evidence for clinical decision and health policies
1. Meta-analyses of RCTs
2. Single RCTs
3. Meta-analyses of observational studies
4. Comparative studies which are not randomized (e.g. cohort
or case-control studies),
5. Reasoning about pathophysiologic mechanisms
6. Expert judgment

The problem of confounders
If you have a big enough sample you might discount spurious
correlations by “controlling” for specific variables.
For instance, you might check whether the correlation still holds when
you compare a group of people who regularly exercise and take
vitamin C with a group of people who exercise but do not
take vitamin C:
P(F/C & E) vs. P(F/¬C & E)
If the rate of flu incidence still differs between the two groups (to a
statistically significant degree), then vitamin C might be
considered to make a distinctive contribution to decreasing the flu rate.
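As a rough illustration of this comparison, the following Python sketch simulates a hypothetical population in which exercise makes both vitamin C use and a low flu risk more likely, while vitamin C itself has no effect. All probabilities and function names are assumptions made for the example, not data from any study.

```python
import random

def simulate_population(n, seed=0):
    """Simulate n people as (takes_c, exercises, gets_flu) triples.
    Assumed set-up: exercise raises vitamin C use and lowers flu risk;
    vitamin C has no effect of its own."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        exercises = rng.random() < 0.5
        takes_c = rng.random() < (0.8 if exercises else 0.2)
        gets_flu = rng.random() < (0.1 if exercises else 0.3)
        people.append((takes_c, exercises, gets_flu))
    return people

def flu_rate(people, takes_c, exercises=None):
    """Estimate P(F/C) or, when `exercises` is given, P(F/C & E)."""
    group = [p for p in people
             if p[0] == takes_c and (exercises is None or p[1] == exercises)]
    return sum(p[2] for p in group) / len(group)

people = simulate_population(200_000)
# Crude comparison: P(F/not-C) - P(F/C), ignoring exercise.
crude_gap = flu_rate(people, False) - flu_rate(people, True)
# Controlled comparison: P(F/not-C & E) - P(F/C & E), exercisers only.
adjusted_gap = flu_rate(people, False, True) - flu_rate(people, True, True)
print(f"crude gap: {crude_gap:.3f}, gap among exercisers: {adjusted_gap:.3f}")
```

In this toy population the crude comparison suggests a protective effect of vitamin C, whereas the comparison within the group of exercisers shows a gap close to zero.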

The problem of confounders
• If people who take vitamins generally also have a healthier lifestyle (they have
healthier eating habits, they are less likely to smoke and they practice sport
regularly), then the difference in specific health indicators (such as, for instance, the
frequency of infections incurred in a defined time lapse) could be due not to vitamin
intake but to the other concomitant factors.
• This phenomenon is called self-selection bias, because it refers to the fact that the
group of people who take the treatment is biased by the very fact that they choose to
take it.
[Diagram: nodes Health behaviour, Vitamin C intake, Exercise, Healthy diet, Non-smoker, Flu frequency]

Controlling for confounders
Group 1: Vitamin C intake, Exercise, Healthy diet, Non-smoker
Group 2: ¬Vitamin C intake, Exercise, Healthy diet, Non-smoker
If the outcome difference is statistically significant, then Vitamin C intake
is considered to bring a distinctive contribution to the reduced
frequency of flu.
However, what about other possible causes which we know nothing of
and could as well make Vitamin C seem to be causal whereas it is not?

Adjusting for potential confounders is not
possible for unknown confounders, i.e. for
causal factors of which the researcher is
unaware.
Hence randomization is meant to ensure that the
causal link between health behaviour and
vitamin intake is severed, by making vitamin
intake independent of health behaviour
and its effects.
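The severing of this link can be sketched in a few lines of Python. The example compares self-selected vitamin intake with a coin-flip allocation and reports the share of people with a healthy lifestyle in each arm. The probabilities and the function name `healthy_share_by_arm` are illustrative assumptions.

```python
import random

def healthy_share_by_arm(n, randomized, seed=1):
    """Return the share of 'healthy lifestyle' people among those who take
    vitamin C and among those who do not."""
    rng = random.Random(seed)
    healthy_treated = treated = healthy_untreated = untreated = 0
    for _ in range(n):
        healthy = rng.random() < 0.5
        if randomized:
            takes_c = rng.random() < 0.5                        # coin flip, ignores lifestyle
        else:
            takes_c = rng.random() < (0.8 if healthy else 0.2)  # self-selection
        if takes_c:
            treated += 1
            healthy_treated += healthy
        else:
            untreated += 1
            healthy_untreated += healthy
    return healthy_treated / treated, healthy_untreated / untreated

obs_t, obs_c = healthy_share_by_arm(100_000, randomized=False)
rct_t, rct_c = healthy_share_by_arm(100_000, randomized=True)
print(f"self-selected: {obs_t:.2f} vs {obs_c:.2f}; randomized: {rct_t:.2f} vs {rct_c:.2f}")
```

Under self-selection the two arms differ markedly in lifestyle; under random allocation the shares are roughly equal, for this known factor and, by the same reasoning, for unknown ones.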

Putative functions of randomization
• 1) Balance between treatment and control group: this
allows one to experimentally isolate the (distinctive
contribution of the) cause under investigation from
other prognostic factors (confounders);
• 2) Repeated randomization of the treatment among the
subjects in the sample allows one to approach in the limit
(in the long run) the true mean difference between
treated and untreated sample population (see Basu
1980, and Teira 2011) = true effect size;
• 3) Randomization as an aid against (self-)selection bias.

Randomization works together with:
1. control (partition of the sample into
treatment and control group/s)
2. intervention (treatment administration by
the experimenter), and
3. double-blinding and placebo (concealment
of treatment allocation from subjects and
researchers).

1. Philosophers’ skepticism against RCTs and
evidence hierarchies in general
2. Rationale underpinning ranking approach
3. Alternative approaches
4. Justification of “lower level evidence” in
different approaches
5. Distinctive advantages of alternative
approaches when dealing with unintended
effects

Worrall’s critique
1) Clinical researchers never randomize forever, so RCTs do not
reflect the “limiting average”
2) There is “no sense in which we can ever know how close a particular RCT
is to yielding this ‘limiting average’” (2007: 15);
3) Repeated randomization is, epistemically speaking, impossible:
“If a particular patient in the study receives, say, the ‘active drug’ on
the first round, then since this is expected to have some effect on
his or her condition, the second randomization would not be
rigorously a repetition of the first. The second trial population,
though consisting of the same individuals, would, in a possibly
epistemically significant sense, not be the same population as
took part in the initial trial” (2007: 22)

Worrall’s critique
4) allowing sufficient “wash out” times between the rounds does not represent a
perfect warrant against “contamination”,
5) repeated randomization is practically and ethically unfeasible
6) randomization is only a means to the end of balancing the experimental groups and
this aim can be reached also through other tools such as deliberate matching and
“haphazard” allocation;
7) strictly speaking, it is not randomization but rather masking treatment allocation,
which wards off bias due to experimenters’ or subjects’ interests and expectations
(allocation and self-selection bias);
8) Comparison of reliability of observational vs. randomized studies by taking the latter
as the gold standard amounts to a petitio principii.

Papineau
Sampling error vs. confounders
Worrall’s worry about disproportionate representation of
some possibly confounding factor in the treatment or
control arm = worry about sampling error
Bigger samples alleviate sampling error
But they do not alleviate worry about confounders
→ Randomization alleviates confounding
But does not affect sampling error

Papineau
Equipoise = spurious ethical justification from an equivocation on
uncertainty
It is enough to warrant an RCT that the medical community/
reasonable doctor is not certain that T is better than non T.
0 < P(U(T) > U(¬T)) < 1
But real equipoise would require that they are indifferent on the
balance of the probabilities:
P(T)U(T) = P(¬T)U(¬T)

Teira’s Defence: Impartiality through randomization
• David Teira (2011) acknowledges the
methodological limitations attributed to RCTs
• However according to him, randomization “is still
a warrant that the allocation was not done on
purpose with a view to promoting somebody’s
interests”.
• Randomization serves to prevent the
uncertainty related to causal inference from being
exploited to the advantage of one party or the
other → impartiality.

Nancy Cartwright on RCTs: the problem of extrapolation
• Cartwright (2007) details the assumptions which should be met in
order to export the claim of efficacy from the sample to the target
population:
• at least one causally homogeneous subgroup in the target population
must have the same causal structure and probability measures of at
least one causally homogeneous subpopulation in the experimental
sample.
• Thus the evidence provided even by an ideal (i.e. perfectly internally
valid) RCT can be only with great caution extended to the target
population.
• Nor is randomization to be recommended for most of the practical purposes
it is supposed to serve (see also Cartwright, 2010).

RCTs test whether a given causal law works in a given
study situation
Ex: Y(u) c= a(u) + β(u)X(u) + W(u).
Thus, granted that
T =def <Y(u)/X(u) = x_t> − <Y(u)/X(u) = x_c>
then
T = <a(u)/X(u) = x_t> − <a(u)/X(u) = x_c>
  + <β(u)/X(u) = x_t> x_t − <β(u)/X(u) = x_c> x_c
  + <W(u)/X(u) = x_t> − <W(u)/X(u) = x_c>.

If, on grounds of random assignment, we are prepared to assume that for
the units in the study X is probabilistically independent of a, β and W,
then the expectations of a, β and W will be the same for any value of X.
Thus:
<a(u)/X(u) = x_t> = <a(u)/X(u) = x_c>;
<W(u)/X(u) = x_t> = <W(u)/X(u) = x_c>;
<β(u)/X(u) = x_t> = <β(u)/X(u) = x_c> = <β(u)>
Hence the first and last addends cancel out:
<a(u)/X(u) = x_t> − <a(u)/X(u) = x_c> = 0
<W(u)/X(u) = x_t> − <W(u)/X(u) = x_c> = 0
And <β(u)/X(u) = x_t> x_t − <β(u)/X(u) = x_c> x_c = <β(u)> (x_t − x_c)
So T = <β(u)> (x_t − x_c)
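The result of this derivation can be checked numerically. The sketch below is only an illustration under assumed distributions: individual effects β(u) are drawn from three hypothetical values, X is assigned by coin flip, and the observed mean difference T is compared with the mean of β(u) times the difference between the two treatment values.

```python
import random

def avg(values):
    return sum(values) / len(values)

def simulate_rct(n, x_t=1.0, x_c=0.0, seed=2):
    """Simulate Y(u) = a(u) + beta(u) X(u) + W(u) with X randomly assigned.
    Returns the observed mean difference T and mean(beta) * (x_t - x_c)."""
    rng = random.Random(seed)
    treated, control, betas = [], [], []
    for _ in range(n):
        a = rng.gauss(5.0, 1.0)               # baseline term a(u)
        beta = rng.choice([0.0, 1.0, 3.0])    # heterogeneous individual effects
        w = rng.gauss(0.0, 1.0)               # other contributions W(u)
        betas.append(beta)
        in_treatment = rng.random() < 0.5     # random assignment, independent of a, beta, W
        x = x_t if in_treatment else x_c
        y = a + beta * x + w
        (treated if in_treatment else control).append(y)
    return avg(treated) - avg(control), avg(betas) * (x_t - x_c)

T, expected = simulate_rct(100_000)
print(f"observed T = {T:.3f}, mean beta times (x_t - x_c) = {expected:.3f}")
```

With a large sample the two numbers agree up to sampling error, although no single unit needs to have an effect equal to the mean of β(u).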

Now, in order to use T = <β(u)> (x_t − x_c)
to predict whether a given intervention will
work in your target population you need to
know:
1) Whether the same law holds in the target
population;
2) whether at least some subgroups have the
right set of support factors

β does not represent a single factor but a
complex function of further factors that
together fix whether and how much X
contributes to Y:
β= f1 (z11, …, z1n) + … + fm (zm1, …, zmp)
Hence, you will get the same result in the target population
only if it shares the same mean value of β, i.e.
the same distribution of the different values of β,
which represent different combinations of
values for the support factors.
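A minimal numerical sketch of this point, assuming a single binary support factor: units that have it respond with a hypothetical β of 2.0, units that lack it with 0.0. The shares assigned to the study sample and to the target population are invented for illustration.

```python
def mean_beta(share_with_support, beta_with=2.0, beta_without=0.0):
    """Mean of beta(u) when a given share of the population has the support factor."""
    return share_with_support * beta_with + (1 - share_with_support) * beta_without

def predicted_effect(share_with_support, x_t=1.0, x_c=0.0):
    """T = mean(beta) * (x_t - x_c) for a population with the given share."""
    return mean_beta(share_with_support) * (x_t - x_c)

study_T = predicted_effect(0.8)    # hypothetical trial sample: 80% have the support factor
target_T = predicted_effect(0.2)   # hypothetical target population: 20% have it
print(study_T, target_T)
```

The same causal law yields a much smaller average effect in the target population, simply because the support factor is distributed differently there.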

Indeed there might be different sets of complexly
interacting factors, any one of which allows X to
contribute to Y:
Ex: Y(u) c= a(u) + β(u)X(u) + γ(u)X(u) - δ(u)X(u) + W(u).
Or more generally:
Y(u) c= C1 + … + Cn – P1 – … – Pm.

Support factors for adverse reactions
For instance you might have that together with
β= f1 (z11, …, z1n) + f2 (z21, …, z2p) the drug
produces an intended outcome Y,
whereas with δ = f1 (z11, …, z1n) + f3 (z31, …, z3p) it
produces an undesired outcome (adverse drug
reaction) Q:
Y(u) c= a(u) + β(u)X(u) + W(u).
Q(u) c= k(u) + δ(u)X(u) + Z(u).

Support factors for adverse reactions
Indeed, the possibility of undesired
outcomes might be even enhanced
in the target population because of
different demographic and clinical
conditions (age, co-morbidity,
multidrug therapy).
→ Post-marketing surveillance

Evidence hierarchies do not differentiate between benefits and risks. So when
it comes to evaluating evidence, observational data on side-effects tend to
be discounted until they are “proved” by RCTs.
• http://www.cebm.net/index.aspx?o=1025
• GRADE (Guyatt et al.)
• The CEBM levels of evidence, which are at pains to distinguish between different
hierarchies depending on different evaluation goals (therapy, prognosis,
diagnosis, economic analysis), coalesce efficacy and harm assessment in
one and the same column: therapy-prevention-etiology-harm, putting
meta-analyses of RCTs, followed by single RCTs, at the top of the ranking.
• Similarly, Guyatt and colleagues’ GRADE system (Guyatt et al. 2011) admits
the difficulties inherent in the evaluation of evidence for harm, but
proposes a framework where its quality is assessed with the same criteria
proposed for efficacy evaluation. In particular, evidence for harm coming,
say, from observational studies is given lower weight than evidence for
efficacy coming, say, from RCTs, thus biasing the overall risk-benefit
assessment in favor of the drug.

The problem with evidence guidelines:
They concentrate on
1) whether independence of X from other factors holds
in the study situation (INTERNAL VALIDITY)
but they fail to appreciate the importance of:
2) how to tell that the law holds in the target population and what
support factors can be assumed to be there
(RELEVANCE);
3) whether the set of evidence we consider in support
of H is total, i.e. no relevant facts are left out (TOTAL
EVIDENCE);
4) whether the evidence speaks for the truth of H (EVIDENTIAL
SUPPORT).

Clinchers vs. vouchers
Clinchers = deductive methods of evaluating
hypotheses → hypothesis testing
Vouchers = non-deductive methods, where
evidence is symptomatic for the conclusion but
not sufficient for it.
RCTs belong to the clinchers = statistical version of the
hypothetico-deductive method:
a statistically significant result rejects the null
hypothesis that the treatment has no effect.

Hypothetico-deductive method and
modus tollens
1. Conjecture: H (Vitamin C has some effect on
flu)
Experimental hypothesis: ¬H → ¬Δ
2. Test and observe result: Δ
3. Infer: ¬¬H (reject ¬H)
p-value = probability of observing Δ if ¬H = true
A very low probability speaks for rejecting ¬H.

Hypothesis testing = H-D + Abduction
Peircean abduction:
Something surprising happened (let's call it S)
But if H were the case, then S would no longer be surprising
Hence we have good reasons to believe that H is the case.
Hypothesis testing:
the experimental result would be very improbable if the null hypothesis
were true
hence either something very improbable happened
or the null hypothesis is false
→ we reject the null hypothesis (on what grounds do we choose this
second alternative?)

Clinchers: Hypothesis testing (I)
• The aim of hypothesis testing is to provide a means to
reject hypotheses on the basis of statistical evidence.
• In classical hypothesis-testing, the result is expressed
as the probability of observing the experimental result
– or more “extreme” results in the sample space – (p-
value), if the treatment makes no difference (so called
null Hypothesis: H0).
• For the result to be at all meaningful, it is essential that
the observed difference among groups is due to the
treatment and only to it. Which in turn explains the
insistence on the exclusion of confounders.
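To make the notion of a p-value concrete, here is a small Python sketch that uses a permutation test rather than a parametric test. It estimates the probability of a mean difference at least as large as the observed one if the treatment labels made no difference (H0). The outcome values are invented for illustration.

```python
import random

def permutation_p_value(treated, control, n_perm=10_000, seed=3):
    """One-sided p-value: share of random relabellings whose mean difference
    is at least as large as the observed one, assuming H0 (no effect)."""
    rng = random.Random(seed)
    k = len(treated)
    observed = sum(treated) / k - sum(control) / len(control)
    pooled = list(treated) + list(control)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                  # under H0 the group labels are arbitrary
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical outcomes (e.g. symptom-free days) in two groups.
treated = [12, 14, 11, 15, 13, 16, 14, 15]
control = [9, 10, 8, 11, 10, 9, 12, 10]
p = permutation_p_value(treated, control)
print(f"p-value: {p:.4f}")
```

A small p-value says only that the data would be improbable under H0. Whether the difference is attributable to the treatment still depends on the exclusion of confounders, as the last bullet stresses.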

Clinchers: Hypothesis testing (I)
• → The more likely a method is to exclude
confounders (i.e. additional contributing factors to the
observed result), the more reliable is the inference we
base on it
• → the higher the method is ranked in the hierarchy
(the better the evidence);
• Corollary: case reports and observational data are
considered sufficient evidence for causal claims only to
the extent that possible confounders can be
confidently excluded.

Clinchers: conclusive evidence without
randomization
• Glasziou et al. (2007) for instance consider cases where the relation
between treatment and effect is so dramatic that bias and
confounding can be safely excluded even in the absence of
randomization.
• Howick et al. (2009) relax the requirement of dramatic effect and
reduce it to the desideratum that the effect size be greater than
the combined effect of plausible confounders.
• Vandenbroucke (2008) ascribes to RCTs and observational studies the
same reliability when assessing harm, on the grounds that ignorance
about the unexpected consequences of an intervention achieves
the same lack of bias obtained through blinding (i.e. ignorance
about who will receive the treatment).

Vouchers
(inductive methods – Bayesian epistemology)
• Distinctive points between the inductive-bayesian
framework and classical hypothesis testing:
• Hypothesis testing = hypotheses are formulated
and then tested for rejection/acceptance.
• Bayesian updating = hypotheses are assigned a
probability, which is then updated in light of data.
Also, evidence is interpreted in light of all
possible alternative hypotheses. Probability
measures specify the degree of support enjoyed
by hypotheses.

Vouchers
(abduction)
• Instead of experimentally isolating the causal factor under
investigation, different pieces of evidential facts are put together
and the implication of their joint occurrence is then inferred.
• Rather than filtering evidence by ranking it, this approach aims to
accommodate all data in a unifying picture. It is more or less
knowingly advocated by different authors:
• Aronson and Hauben (2006): “In some cases other types of
evidence may be more useful than a randomised controlled trial.
And combining randomised trials with observational studies and
case series can sometimes yield information that is not available
from randomised trials alone” (my emphasis).
• Howick et al. (2009) and Stegenga (2011) propose to integrate
evidence hierarchies with Bradford-Hill criteria for causal inference:
Bradford-Hill criteria are not meant as truth conditions for causality
but rather as imperfect indicators which jointly support the
hypothesis of causation.

Bradford Hill criteria for causal assessment
1. Consistency of data within population / across
populations;
2. Strength of the association;
3. Relationship in time;
4. Biological gradient;
5. Specificity;
6. Coherence of evidence;
7. Biological plausibility;
8. Reasoning by analogy;
9. Experimental evidence.

Bradford Hill criteria for causal assessment
1. “None of my nine viewpoints can bring indisputable evidence for
or against the cause-and-effect hypothesis and none can be
required as a sine qua non. What they can do, with greater or less
strength, is to help us make up our minds in the fundamental
question – is there any other way of explaining the set of facts
before us, is there any other equally, or more, likely than cause and
effect?”
2. Thus, Bradford Hill both refers to explanatory power and
likelihood as reliable grounds to justify causal judgments, and
explicitly presents his approach as an alternative to hypothesis
testing:
3. “No formal tests of significance can answer those questions. Such
tests can, and should, remind us of the effects that the play of
chance can create, and they will instruct us on the likely
magnitude of those effects. Beyond that, they contribute nothing
to the proof of our hypothesis”.

Epistemology | Method | Main assumptions | Justification of “lower level” evidence
Hypothetico-deductive (statistical mode) | Hypothesis testing: likelihood of evidence if H0 = true (p-value) | Investigated factor is isolated by balancing the experimental groups as to all other prognostic factors | Only if alternative explanations for the observed result (confounders) can be safely excluded, or treatment effect swamps them by a statistically significant amount.
Abduction | Connection of data in light of explanatory hypothesis | Account for as much evidence as possible | Explanatory power of hypothesis in light of data.
Inductive-Bayesian | Bayes theorem | Principle of total evidence – coherence | Probability of hypothesis given likelihood function and prior.

Principle of total evidence
• The essential distinction between clinchers
and vouchers is that the latter are guided by
the idea that all relevant evidence – or as
much data as possible in the case of
abduction – should be taken into account in
order for the inferential procedure to be valid.

Principle of total evidence
The principle of total evidence has been a topic of
hot debate among philosophers such as Hempel,
Carnap, Ayer, Braithwaite, and Kneale among
others.
Keynes (1921) traces back the origin of the principle
of total evidence to Bernoulli’s maxim that:
“in reckoning a probability, we must take into
account all the information which we have”
(Carnap, 1947: 138, footnote 10; Keynes, 1921:
313).

Principle of total evidence – nonmonotonic logic
• Induction/abduction can be characterized as
an inference where the evidence does not
entail the hypothesis, but only more or less
strongly supports/undermines it (Ayer, 1956).
• Thus, whereas in deductive inference,
additional evidence can neither add further
support nor disconfirm a hypothesis when
conclusive evidence is already available; not
so in the case of induction.

• A doctor thinks that a patient is celiac, because all his
available evidence E (adverse reactions to certain foods,
iron deficiency, a series of additional symptomatic
phenomena) points to this diagnosis.
• H → E
• However he cannot be sure that this evidence necessarily
entails his diagnosis: ¬(E → H).
• Thus he prescribes a series of serum tests and they all
come back negative (evidence F).
• → The strong support for the diagnosis of celiac disease
provided by E is “corroded” by the negative evidence F and
the doctor needs to look for a hypothesis which accounts
for both E and F: for instance a simple food intolerance.

Probabilistic consequences of entailment relation
If H → E, then P(H/E) > P(H), where P(H) > 0 and P(E) < 1
Proof (from Howson and Urbach 2006):
P(H/E) = P(H&E)/P(E) > P(H&E), since P(E) < 1
Hence P(H/E) > P(H&E) = P(E/H) x P(H)
So: P(H/E) > P(E/H) x P(H)
But P(E/H) = 1, since H → E
So P(H/E) > P(H)
QED

Inconclusive evidence is used to assess the
plausibility of a hypothesis and to possibly
quantify it in a probabilistic fashion, so that, for
instance P(H/E) = .9; but there may always be
additional information F, which may lower this
support, so that, for instance, P (H/E&F) = .2.
Nonmonotonicity = conclusions of inductive
inferences are contingent and may be invalidated
by additional information (Kyburg & Teng 2001).
This means that “acquired support” may get lost if
additional information undermines it.
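The figures in this example can be reproduced with Bayes' theorem in odds form. The prior of 0.5 and the two likelihood ratios below are assumptions chosen only so that the posteriors come out at .9 and .2. The sketch also assumes that E and F are conditionally independent given H and given ¬H, so that updating can proceed one piece of evidence at a time.

```python
def update(prior, likelihood_ratio):
    """Posterior P(H/evidence) from the prior P(H) and the likelihood ratio
    P(evidence/H) / P(evidence/not-H), using Bayes' theorem in odds form."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

p_h = 0.5                                     # assumed prior for H
p_h_given_e = update(p_h, 9.0)                # E is 9 times more likely under H
p_h_given_e_f = update(p_h_given_e, 1 / 36)   # F is 36 times more likely under not-H
print(round(p_h_given_e, 3), round(p_h_given_e_f, 3))
```

The support acquired through E is not cancelled from the record; it is simply outweighed once F is taken into account, which is what nonmonotonicity amounts to here.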

Deductive inference: conclusive evidence
Modus ponens
E → H
E
∴ H
No other additional evidence can change
the conclusion. If, in addition to E, you
come to know F, you always have H as a
conclusion:
E → H
E, F, ...
∴ H
The same is valid for modus tollens:
H → E
¬E
∴ ¬H
No additional evidence would change this
conclusion.

Inductive inference: inconclusive evidence
When E represents non-conclusive
evidence for H, it may be that
P(H/E) > P(H),
and yet additional evidence F might
reverse this inequality, thus leading to the
following result:
P(H/E,F) < P(H)
The bearing of this phenomenon is most
evident when comparing the strength of
support provided by the evidence to the
hypothesis H and its complement (¬H). So
that you may have:
P(H/E) > P(¬H/E)
And, after learning F:
P(H/E,F) < P(¬H/E,F).

• Statistical hypothesis-testing is an approach which admittedly
follows a Popperian hypothetico-deductive method of scientific
enquiry.
→ It does not feel urged to address the issue of non-monotonicity:
→ Once you have conclusive evidence E rejecting hypothesis H, any
other piece of evidence becomes irrelevant.
→ Thus the closer the evidence gets to this deductive ideal, the
better: best evidence means evidence which gives you the
guarantee that the observed difference is due to the treatment
and only to it = internal validity maximization.
→ Evidence hierarchies are grounded on the assumption that if you
have a study which has the capacity to eliminate more confounders
than others, then the former should trump the latter.

• Trumping means that higher level evidence discards any evidence of
inferior ranking, and also makes it irrelevant.
• Lexicographic rule: when two studies of different levels deliver
contradictory findings, then the higher in the evidence hierarchy is
considered more reliable and is allowed to discard the lower level
one;
furthermore, lower level evidence adds nothing to higher level one
and thus it can be neglected without loss of information.
• More generally, the very idea of ranking or up- and downgrading
evidence on the basis of its internal validity is opposed to
a unifying approach which aims to account for all the evidence at
our disposal. In fact, non-deductive approaches must take into account
all available evidence because, no matter how strongly a piece of
evidence supports a given hypothesis, defeating
evidence is always possible.
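The lexicographic rule can be written down as a very short procedure, which makes plain how much information it throws away. The following Python sketch is a schematic reading of the rule, not a reconstruction of any official grading tool. The `Study` class and the example evidence base are hypothetical, and the level numbers follow the hierarchy listed at the start of these slides.

```python
from dataclasses import dataclass

@dataclass
class Study:
    design: str
    level: int          # 1 = top of the hierarchy, larger = lower rank
    finds_harm: bool

def lexicographic_rule(studies):
    """Keep only the findings of the highest-ranked studies; every
    lower-ranked study is discarded, whatever it reports."""
    best = min(s.level for s in studies)
    kept = [s for s in studies if s.level == best]
    discarded = [s for s in studies if s.level > best]
    return kept, discarded

# Hypothetical evidence base, for illustration only.
evidence = [
    Study("single RCT", 2, finds_harm=False),
    Study("cohort study", 4, finds_harm=True),
    Study("case-control study", 4, finds_harm=True),
]
kept, discarded = lexicographic_rule(evidence)
print([s.design for s in kept], "trump", [s.design for s in discarded])
```

Under this rule two concordant observational signals of harm have no influence on the verdict, which is precisely what an approach based on total evidence would not allow.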

Harm – benefit distinction
• Recent contributions by philosophers and health
scientists have acknowledged the role of so called
"lower level" evidence as a valid source of information
contributory to assessing the risk profile of medications
both on theoretical (Aronson and Hauben, 2006;
Howick et al. 2009) and on empirical grounds (Benson
and Hartz, 2000; Golder et al. 2011).
• Nevertheless current practices have difficulty in
assigning a precise epistemic status to this kind of
evidence and in amalgamating it with standard
methods of hypothesis testing.

In their comparative analysis of RCTs and
observational studies, Papanikolaou et al. (2006)
assert:
“it may be unfair to invoke bias and confounding
to discredit observational studies as a source
of evidence on harms” (p. 640, my emphasis).

1. Different epistemologies may justify “lower
level” evidence on different grounds
(Vandenbroucke’s proposal to reverse
rankings);
2. In the case of risk detection and assessment
non-deductive epistemologies (“Vouchers”)
are better suited to the purpose.

Hierarchy reversal for risk assessment
Vandenbroucke J.P. (2008) Observational Research, Randomised Trials, and Two
Views of Medical Science. PLoS Medicine, 5 (3): 339-43
Hierarchy of study designs for intended effects of therapy | Hierarchy of study designs for discovery and explanation
i. Randomised controlled trials | i. Anecdotal: case report and series, findings in data, literature
ii. Prospective follow-up studies | ii. Case-control studies
iii. Retrospective follow-up studies | iii. Retrospective follow-up studies
iv. Case-control studies | iv. Prospective follow-up studies
v. Anecdotal: case report and series | v. Randomised controlled trials

Vandenbroucke’s defence of hierarchy
reversal (I)
1. Methodological point:
Observational studies concerning adverse reactions
will not suffer from confounding in the same way
as observational studies for intended effects do.
Selection bias is less likely to affect observational studies of
adverse reactions.
This is because unintended effects, qua unintended, are not known
in advance, and thus also not known to the drug prescriber, who
cannot take them into consideration and thus cannot bias treatment
allocation.
Ignorance of possible effect = “natural masking”

Vandenbroucke’s defence of hierarchy
reversal (II)
2. Epistemological point:
Context of discovery vs. context of evaluation:
Discovery is focused on explanation and hypothesis
generation;
Evaluation instead on hypothesis testing/confirmation.
And research methods differ in the opportunities they offer
with respect to either of these goals.

Vandenbroucke’s defence of hierarchy
reversal (III)
• Vandenbroucke (2008) formalizes the contrast
between the context of evaluation and the
context of discovery in terms of different
priors assigned to hypotheses of benefits and
of adverse reactions.
• High priors for intended effects
• Low priors for unintended ones

Vandenbroucke’s defence of hierarchy
reversal (III)
1. It is the higher priors which make the results more robust,
not the method (Vandenbroucke, 2008: 16-17).
2. The reason why we accept uncertain results for risks
rather than for benefits is that evaluation and discovery
studies are associated with different loss functions:
1. evaluation is related to the approval of health technologies
and is required to assure stakeholders about their efficacy and
safety,
2. whereas discovery is more related to the context of research
for its own sake, which might explain why certain study
designs are preferred to others in different circumstances.

Vandenbroucke’s defence of hierarchy
reversal (III)
1. Priors are quickly swamped by data
2. Stakes are not lower for detecting risks than
for testing the drug benefit: adverse drug
reactions might be so severe as to reverse
the safety profile of the drug and determine
its withdrawal.
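The first point can be illustrated with a standard Beta-binomial model. This is a simplification: it puts priors on the rate of an adverse event rather than on a causal hypothesis as such, and the two priors and the data are invented for the example.

```python
def posterior_mean(prior_a, prior_b, events, n):
    """Posterior mean of a rate under a Beta(prior_a, prior_b) prior after
    observing `events` occurrences in `n` patients (Beta-binomial updating)."""
    return (prior_a + events) / (prior_a + prior_b + n)

low_prior, high_prior = (1, 9), (9, 1)   # prior means 0.1 and 0.9
for n, events in [(0, 0), (10, 3), (100, 30), (1000, 300)]:
    low = posterior_mean(*low_prior, events, n)
    high = posterior_mean(*high_prior, events, n)
    print(f"n={n:4d}  low prior -> {low:.3f}  high prior -> {high:.3f}")
```

Two observers who start far apart end up with nearly the same estimate once a thousand observations are in, so a difference in priors alone cannot carry much weight for long.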

Prior knowledge about drug’s general capacity
to produce unintended adverse reactions
• The acceptability of anecdotal evidence or of uncontrolled
studies for assessing risk has to do with a high prior about
the general capacity of the drug to bring about side-effects.
• Whereas there is total ignorance as to some specific side
effects which might possibly be caused by the drug, it is
almost certain that the drug will
indeed cause side-effects beyond the ones already
detected in the pre-marketing phase.
• This high prior derives from historical knowledge and past
experience with pharmaceutical products and is also
strongly reflected in the regulation which introduced the
notion of “development (or potential) risk”, the
pharmacosurveillance system, and the precautionary
principle.

Reasons for preferring vouchers to clinchers in
causal assessment for harms
1. Explicit integration of prior knowledge
2. Categorical vs. probabilistic causal
assessment
3. Internal vs. external validity
4. Impartiality

Explicit integration of prior knowledge
Frequentist statistics does not allow priors to be
incorporated in hypothesis evaluation.
----------------------------------------------------------------
1. Knowledge of the drug behavior may be
inferred analogically from same-class
molecules or similar entities.
2. Theory
3. Historical knowledge about the harmfulness
of drugs in general

Categorical vs. probabilistic assessment of causality
From the time a risk is not known, to the moment in
which it is incontrovertibly proven to be causally
associated with the drug, there is a period of evidence
accumulation which constitutes a state of partial and
imperfect (but continuously increasing) knowledge.
In this period it cannot be claimed that there is a causal
link between the drug and the detected risk; but
neither can we behave as if we knew nothing about it.
Still, the latter attitude is precisely the only possible
policy allowed by an epistemology grounded on
hypothesis rejection.

Internal and external validity
• The problem of external validity is a particularly delicate one when
inferring efficacy in the real population of users (a.k.a effectiveness)
from the efficacy assessed by RCTs.
• In biological sciences such phenomena as feedback loops
(homeostasis), interactive and multiple causality as well as
threshold effects characterize causality in a very peculiar way (see
for instance Joffe, 2011).
• In the case of side-effects this problem is even more striking,
because the contributing factors necessary to trigger adverse drug
reactions are supposed to be rare and may contribute to the side
effect through a totally unexpected pathway (Hauben and Aronson
2006; Smith et al. 2012).
• Thus RCTs might be unable to detect side effects not only because
no or too few subjects in the sample have the necessary
characteristics, but also because they are too simplistic a tool for
that purpose.

Impartiality (I)
• The issue of impartiality takes on opposite
characteristics in benefit assessment and in risk
assessment.
• Since benefit is intended and desired, but may be
falsely claimed for obvious commercial interests,
the most natural way to deal with bogus products
is to put the claim of efficacy to the test of strict
trials.
• For the risk, the situation is quite different.
• By regimenting benefit and risk assessment
within the same standards, we forget that in the
case of risk, the question we want to answer is
not whether the drug really causes it, but
whether we can safely exclude that it does.

Impartiality (II)
• The higher the expected harm with respect to the
expected benefit, the less we need to be sure about
the causal link in order to decide to adopt risk
preventive measures (precautionary principle).
• The causal link need be no more probable than is
necessary to compensate the risk-benefit imbalance
(Osimani, 2013 a,b).
• Industry, by contrast, has every interest
in discounting the drug as a possible causal contributor
to the side effects; thus the stricter the standards
for causal assessment, the easier it is for it to
provide whitewashed drug profiles.
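The first two bullets can be given a simple expected-utility reading. The sketch below is an assumption of this illustration, not the formalization in Osimani (2013 a,b): preventive action is taken to be warranted as soon as the probability p of the causal link makes the expected harm, p times the harm, at least as large as the benefit forgone. The function name and the numbers are hypothetical.

```python
def min_probability_for_action(benefit, harm):
    """Smallest probability of a causal link at which the expected harm
    (p * harm) matches the benefit forgone by acting on the suspicion."""
    if benefit <= 0 or harm <= 0:
        raise ValueError("benefit and harm must be positive")
    return min(1.0, benefit / harm)

# Hypothetical magnitudes on a common utility scale.
print(min_probability_for_action(benefit=10, harm=100))  # severe harm
print(min_probability_for_action(benefit=10, harm=20))   # moderate harm
```

On this reading, the more severe the suspected harm relative to the benefit, the lower the probability of causation that suffices for precautionary measures.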

Impartiality (III)
• Teira (2011) conceptualizes impartiality as a way
to deal with uncertainty such that it cannot be
exploited by some party’s private interest.
• Waiting for an RCT to definitively prove that an
observed risk is really associated with a
suspected drug exactly represents the case in
which the uncertainty about the causal
association is exploited by the industry’s private
interest.
• I think this is what Papanikolaou et al. (2006) have
in mind when they say that “it may be unfair to
invoke bias and confounding to discredit
observational studies as a source of evidence on
harms”.

Case Study: hypothesis of causal connection
between paracetamol and asthma
Asthma has increased in the United States and in
Western countries over the last three decades:
by up to 75% among adults
and by up to 160% among children in the same
period.
(Burr et al., 1989; Eneli et al., 2005; Ninan and
Russell, 1992; Mannino et al., 1998, 2002;
Seaton et al., 1994).

Explanatory hypotheses for asthma epidemic
1) increased exposure to outdoor and indoor pollutants;
2) decreased exposure to bacteria and childhood illnesses
during infancy (the “hygiene hypothesis”);
3) increased obesity incidence and prevalence;
4) changes in diet and oxidant intake;
5) cytokine imbalance as a reaction to environmental allergens
in early childhood leading to lifelong T-helper type 2 (allergic)
dominance over T-helper type 1 (nonallergic) reactions, thus
increasing the risk for atopic disease
(Eneli et al., 2005; Seaton et al., 1994; Shaheen et al., 2000).

How suspicion fell upon paracetamol
• Varner and colleagues (1998) detected a precise correspondence between
increase of asthma incidence and increased paracetamol use as a substitute
for aspirin (following the recognition of an association between aspirin and
Reye’s syndrome).
• The trend levelled off in the 1990s, i.e. at a time when paracetamol had
already become one of the most widespread analgesics.
• Varner and colleagues' tentative explanation, however, was that the asthma
increase was due to aspirin avoidance, since aspirin may protect
from asthma through inhibition of prostaglandins.
• This hypothesis was soon discounted on the grounds that, if this had
been the case, one should have observed a decrease in asthma incidence
when aspirin was first introduced (Shaheen et al. 2000).
• Thus the suspicion finally fell upon paracetamol itself and subsequent
investigations explicitly aimed to examine the hypothesis of causal connection
between paracetamol and asthma.

Evidence for causal association between
paracetamol and asthma
“Many observations suggest that the epidemiologic association between
acetaminophen and asthma is causative:
1) consistency of the association across geography, culture and age;
2) strength of the association (comparative studies);
3) the dose-response relationship between paracetamol exposure and
asthma;
4) the coincidence of the timing of increasing asthma prevalence and
increasing paracetamol use;
5) the relationship between per-capita sales of paracetamol and asthma
morbidity across countries;
6) our inability to identify any other abrupt environmental change that
could explain this increase in asthma morbidity;
7) plausible mechanism: glutathione depletion in airway mucosa caused by
paracetamol”.
McBride JT (2011) The Association of Acetaminophen and Asthma Prevalence
and Severity, Pediatrics, 128 (6).

Consistency of the association across geography, culture and age (I)
Source: Beasley et al. (cross-cultural study)
Year of study: 2008
Study objective: Examine the risk of asthma, rhinoconjunctivitis and eczema in children using paracetamol
Population: 122 centers in 54 countries; 200,000 children aged 6-7 yr
Results: Dose-dependent increase in prevalence and severity of asthma. > once per year: OR 1.61 (95% CI 1.46-1.77); ≥ once per month: OR 3.23 (95% CI 2.91-3.60). Association identified at almost all sites regardless of geography, culture, stage of development.

Source: Beasley et al. (cross-cultural study)
Year of study: 2011
Study objective: Examine the risk of asthma, rhinoconjunctivitis and eczema in adolescents using paracetamol
Population: 122 centers in 54 countries; 320,000 children aged 13-14 yr
Results: Dose-dependent increase in prevalence and severity of asthma. > once per year: OR 1.43 (95% CI 1.33-1.53); ≥ once per month: OR 2.51 (95% CI 2.33-2.70). Association identified at almost all sites regardless of geography, culture, stage of development.

Source: Etminan et al. (systematic review and meta-analysis of epidemiologic studies)
Year of study: 2009
Study objective: Quantify the association between acetaminophen use and the risk of asthma in children and adults.
Population: Thirteen cross-sectional studies, four cohort studies, and two case-control studies comprising 425,140 subjects
Results: Pooled odds ratio (OR) for asthma among subjects using acetaminophen was 1.63 (95% CI, 1.46 to 1.77). The risk of asthma in children among users of acetaminophen in the year prior to asthma diagnosis and within the first year of life was elevated (OR: 1.60 [95% CI, 1.48 to 1.74] and 1.47 [95% CI, 1.36 to 1.56], respectively). Only one study reported the association between high acetaminophen dose and asthma in children (OR, 3.23; 95% CI, 2.9 to 3.6). There was an increase in the risk of asthma and wheezing with prenatal use of acetaminophen (OR: 1.28 [95% CI, 1.16 to 1.41] and 1.50 [95% CI, 1.10 to 2.05], respectively).
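The odds ratios and confidence intervals reported above follow a regularity that a short sketch makes explicit: a conventional 95% interval for an odds ratio is symmetric on the log scale, so its width gives the standard error of the log odds ratio, and study estimates can be combined by inverse-variance weighting. The sketch below assumes a fixed-effect model and uses hypothetical helper names. It pools the two Beasley estimates purely as an arithmetic illustration and does not reproduce the analysis of Etminan et al.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def implied_upper(odds_ratio: float, lower: float) -> float:
    """Upper bound implied by log-scale symmetry: OR / lower == upper / OR."""
    return odds_ratio ** 2 / lower


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (OR, lower, upper) triples."""
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in estimates:
        weight = 1.0 / log_se_from_ci(lower, upper) ** 2
        total_weight += weight
        weighted_sum += weight * math.log(odds_ratio)
    mean = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return (math.exp(mean),
            math.exp(mean - Z95 * se),
            math.exp(mean + Z95 * se))


if __name__ == "__main__":
    # Upper bound implied by an OR of 1.28 with lower bound 1.16.
    print(round(implied_upper(1.28, 1.16), 2))  # 1.41

    # Illustration only: pooling the two Beasley estimates listed above.
    pooled = pool_fixed_effect([(1.61, 1.46, 1.77), (1.43, 1.33, 1.53)])
    print("pooled OR %.2f (95%% CI %.2f-%.2f)" % pooled)
```

A fixed-effect model treats all studies as estimating one common effect. With heterogeneous populations a random-effects model would usually be preferred, so the pooled figure here is not a substantive result.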

Consistency of the association across geography, culture and age (II)
Source: Amberbir et al. (longitudinal birth-cohort study)
Year of study: 2011
Study objective: Investigate the independent effects of paracetamol and geohelminth infection on the incidence of wheeze and eczema in a birth cohort.
Population: Population-based cohort of 1,065 pregnant women from Butajira, Ethiopia
Results: Paracetamol use was significantly associated with a dose-dependent increased risk of incident wheeze (adjusted odds ratio = 1.88 and 95% confidence interval 1.03-3.44 for one to three tablets and 7.25 and 2.02-25.95 for ≥ 4 tablets in the past month at age 1 vs. never), but not eczema.

Source: Wickens et al. (birth cohort study)
Year of study: 2011
Study objective: Investigate the associations between infant and childhood paracetamol use and atopy and allergic disease at 5-6 years.
Population: New Zealand. Paracetamol exposure between birth and 15 months in Christchurch (n=505) and between 5 and 6 years for all participants (Christchurch and Wellington) (n=914). Outcome data collected at 6 years for all participants. Logistic regression models were adjusted for potential confounders.
Results: Paracetamol exposure before the age of 15 months was associated with atopy at 6 years [adjusted odds ratio (OR)=3.61, 95% confidence interval (CI) 1.33-9.77]. Paracetamol exposure between 5 and 6 years showed dose-dependent associations with reported wheeze and current asthma but there was no association with atopy. Compared with use 0-2 times, the adjusted OR (95% CI) were: wheeze 1.83 (1.04-3.23) for use 3-10 times, and 2.30 (1.28-4.16) for use >10 times; current asthma 1.63 (0.92-2.89) for use 3-10 times and 2.16 (1.19-3.92) for use >10 times; atopy 0.96 (0.59-1.56) for use 3-10 times, and 1.05 (0.62-1.77) for use >10 times.

Source: McKeever et al. (cross-sectional analysis)
Year of study: 2005
Study objective: To investigate the associations between use of pain medication, particularly paracetamol, and asthma, COPD, and FEV1 in adults.
Population: Data from the Third National Health and Nutrition Examination Survey (U.S.). Participants aged between 20 and 80 years, with complete data for relevant exposures, outcomes
Results: Dose–response association of paracetamol use and asthma (adjusted odds ratio, 1.20; 95% CI, 1.12–1.28; p value for trend 0.001).

Consistency of the association across geography, culture and age (III)
Source: Shaheen et al. (case-control study)
Year of study: 2000
Study objective: To investigate whether frequent paracetamol use in humans was associated with asthma.
Population: Adults aged 16–49 years registered with 40 general practices in Greenwich, South London. Frequency of use of paracetamol and aspirin was compared in 664 individuals with asthma and in 910 without asthma.
Results: After controlling for potential confounding factors, the OR for asthma, compared with never users, was 1.06 (95% CI 0.77 to 1.45) in infrequent users (<monthly), 1.22 (0.87 to 1.72) in monthly users, 1.79 (1.21 to 2.65) in weekly users, and 2.38 (1.22 to 4.64) in daily users (p (trend) = 0.0002). This association was present in users and nonusers of aspirin.

Source: Shaheen et al. (multicentric case-control study)
Year of study: 2008
Study objective: To examine whether or not frequent paracetamol use is associated with adult asthma across Europe.
Population: The network compared 521 cases with a diagnosis of asthma and reporting of asthma symptoms with 507 controls with no diagnosis of asthma and no asthmatic symptoms across 12 European centres.
Results: Weekly use of paracetamol, compared with less frequent use, was strongly positively associated with asthma after controlling for confounders (OR 2.87, 95% CI 1.49-5.37). No association was seen between use of other analgesics and asthma.

Comparative studies
Source: Shaheen et al. (case-control study)
Year of study: 2000
Study objective: Determine if frequent paracetamol use is a risk factor for asthma.
Population: Adults aged 16-51 yr in South London. Cases: n = 720 (51% response rate); controls: n = 980 (49% response rate).
Results (compared with never users): Infrequent users (< monthly): OR 1.06 (95% CI 0.77-1.45); monthly users: OR 1.22 (95% CI 0.87-1.72); weekly users: OR 1.79 (95% CI 1.21-2.65); daily users: OR 2.38 (95% CI 1.22-4.64); p value for trend = 0.0002.

Source: Shaheen et al. (prospective cohort study)
Year of study: 2002
Study objective: Examine the relationship between prenatal paracetamol use and wheezing in offspring at 6 mo.
Population: 9400 women
Results: Increased risk of wheezing before 6 mo for offspring of frequent paracetamol users over 20-32 wk prenatally: OR 2.34 (95% CI 1.24-4.40).

Source: Barr et al., Nurses’ Health Study (prospective cohort study)
Year of study: 2004
Study objective: Examine the relationship between paracetamol use and new onset of asthma
Population: 73,321 women (44-69 yr)
Results: Increased risk of diagnosis of new-onset asthma with frequency of use: adjusted RR 1.63, 95% CI 1.11-2.39. Dose dependence: p value for trend = 0.006.

Source: Boston University Fever Study (randomized double-blind trial without placebo)
Year of study: 2002
Study objective: Compare the incidence of adverse reactions among children administered paracetamol or ibuprofen
Population: 84,000 febrile children, age ≤ 12 yr, randomly assigned paracetamol, low-dose ibuprofen, or high-dose ibuprofen
Results: Among 1879 children with pre-existing asthma, outpatient visits for asthma were lower in the ibuprofen arm than in the paracetamol arm (RR 0.56, 95% CI 0.34-0.95), with dose-dependence. Hospitalizations were nonsignificantly lower (RR 0.63, 95% CI 0.25-1.60).
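For readers unfamiliar with how figures such as "OR 1.79 (95% CI 1.21-2.65)" are obtained, the following sketch computes an unadjusted odds ratio and its Woolf (log-scale) confidence interval from a 2x2 case-control table. The counts are hypothetical and are not taken from any of the studies above, which additionally adjust for confounders through regression models.

```python
import math


def odds_ratio_ci(exposed_cases: int, unexposed_cases: int,
                  exposed_controls: int, unexposed_controls: int,
                  z: float = 1.959964):
    """Unadjusted odds ratio with a Woolf 95% confidence interval.

    Returns (odds_ratio, lower, upper). All four cells must be positive,
    since the Woolf standard error uses their reciprocals.
    """
    cells = (exposed_cases, unexposed_cases,
             exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocals.
    se = math.sqrt(sum(1.0 / count for count in cells))
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical table: 20 of 100 cases and 10 of 100 controls exposed.
    estimate, lower, upper = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1 corresponds to an association that is statistically significant at the 5% level. That is why the weekly and daily categories in the first study above count as significant while the two lower-frequency categories do not.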

Relationship between per-capita sales of paracetamol and asthma morbidity
across countries (5)
Source: Newson et al. (ecologic study)
Year of study: 2000
Study objective: Examine the rate of asthma and aggregate consumption of acetaminophen in 1994-95.
Population: English-speaking countries in the ECRHS study.
Results: For each gram increase in per-capita paracetamol sales, prevalence of wheeze increased by 0.52% for 13-14 yr olds and by 0.26% for young adults. Prevalence of childhood wheezing in 36 countries around the world is predicted by each country’s per-capita sales of paracetamol.

[Diagram: proposed mechanistic pathways from acetaminophen to asthma. The arrows are not recoverable from the extracted text; node labels are listed in the order extracted.]
• Acetaminophen
• Reduced glutathione in the airways
• Alteration of antigen presentation and recognition
• Shift from Th1 (non-allergic) to Th2 (allergic) cytokine profile
• Lower ability to counteract oxidative stress
• Tissue injury
• Smooth muscle contraction
• Bronchial hyper-responsiveness
• Release of pro-inflammatory mediators (leukotrienes)
• Impaired β-receptor function
• Stimulation of additional inflammatory cells
• Lower ability to scavenge the acetaminophen toxic metabolite N-acetyl-p-benzoquinone imine (NAPQI)
• Acetaminophen toxic metabolite: N-acetyl-p-benzoquinone imine (NAPQI)
• Reduced immune response to and prolongation of rhinovirus infection
• Antipyretic effect
• Cytokine storm
• Reduced IFN-γ and IL-2
• ASTHMA

• While all pathways are only indirectly relevant to
asthma pathogenesis, their plausibility is strongly
supported by experimental data at different
levels (in vitro, in vivo, and clinical studies).
• For some, this evidence provides a
mechanistic rationale and strengthens the
support that population-level evidence gives to
the causal hypothesis, to the
point that no additional randomized studies are
needed in order to consider acetaminophen a
causative factor for asthma exacerbation or
onset.
• Others instead hold a more conservative view and are
concerned about confounding.

Some authors are reluctant
to accept such evidence as a sufficient
basis for practice change
and for establishing a causal relationship
between acetaminophen and asthma,
on the grounds that it does not result from
randomized clinical trials
(Eneli et al. 2005, Allmers et al. 2009, Johnson
and Ownby, 2011;
Karimi et al., 2006, Wickens et al. 2011, Chang
et al. 2011).
Particularly, these authors express the
concern that the acetaminophen-asthma
relationship may be explained by
1) reverse causation,
2) confounding by indication or
3) preference for acetaminophen rather than
ibuprofen in children at risk for asthma

Other authors, although less
sceptical about the causal
relationship, nevertheless also
require or recommend that
adequately powered
placebo-controlled trials be performed
to establish causation
(Holgate, 2011; Henderson and
Shaheen, 2013).

Martinez-Gimeno and García-Marcos (2013):
“apart from tobacco smoke exposure, no other
genetic or environmental factors, including
genes, allergens, infections and bacterial
substances, has shown the stubborn and
consistent association with wheezing disorders
prevalence as acetaminophen has done”
They recommend against too liberal a use of
acetaminophen in children, while waiting for
regulatory agencies to do their part and
reconsider the safety profile of acetaminophen.
Furthermore, they oppose performing
double-blind RCTs with placebo:
“contrary to common claims, a placebo arm
would be impractical and unethical, because it
would subject participants to a substandard and
unacceptable treatment during a very long time”
(p. 114).

Beasley et al (2011):
“When the study findings are considered
together with other available data, there is
substantive evidence that acetaminophen
use in childhood may be an important risk
factor for the development and/or
maintenance of asthma, and that its
widespread increasing use over the last 30
years may have contributed to the rising
prevalence of asthma in different countries
worldwide”

McBride (2011):
“The balance between the likely risks and
benefits of acetaminophen has shifted for
children with a history or family history of
asthma. I can understand how those
responsible for regulation or policy
statements of professional organizations
might be more comfortable waiting for
incontrovertible evidence [...]
At present, however, I need further studies
not to prove that acetaminophen is
dangerous but, rather, that it is safe. Until
such evidence is forthcoming, I will
recommend avoidance of acetaminophen by
all children with asthma or those at risk for
asthma.”

By shifting the burden of proof,
McBride assumes that, given the
available evidence, the hypothesis of
causal connection between
acetaminophen and asthma is
stronger than that of its absence; or,
at least, that given the expected
harm and benefit, the probability of
causal connection between
acetaminophen and asthma is high
enough to shift the balance against
its use.

The disagreement among scholars about the best course of action
is ultimately caused by differing epistemological
views, which are left implicit.
Who’s right?

Integration of prior knowledge and
available evidence
• Biological data point to potential inflammatory effects of acetaminophen on the airways
through multiple (possibly additive) pathways.
• Dismissing the causal link because of possible confounding factors at the epidemiological
level disregards this evidence.
• The same holds for other supporting evidence, such as the dose-response relationship
found in many studies, and in general for the higher likelihood of the entire set of data under
the hypothesis of causation than under its denial.
• However, the prior is low: in the acetaminophen case, prior knowledge about the molecule itself
speaks rather against the hypothesis of harmfulness, in that it has generally been
considered a harmless analgesic.
• According to Martinez-Gimeno and García-Marcos (2013), this might explain the reluctance
to accept the causal hypothesis.
• This means that, instead of being taken into account explicitly, prior knowledge is
allowed to influence the interpretation of observational evidence implicitly:
the concern about confounders hides a conservative low prior for harmfulness.
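The interplay between a low prior and data that are more likely under causation than under its denial can be sketched with Bayes' theorem in odds form: posterior odds equal prior odds times the likelihood ratio of the evidence. The prior and the likelihood ratios below are hypothetical placeholders, not estimates from the literature. The sketch also assumes that the pieces of evidence are conditionally independent given each hypothesis, which real epidemiological and mechanistic evidence need not be.

```python
from math import prod


def posterior_probability(prior: float, likelihood_ratios) -> float:
    """Posterior probability of causation via Bayes' theorem in odds form.

    Each likelihood ratio is P(evidence | causation) / P(evidence | no
    causation). Assumption: the pieces of evidence are conditionally
    independent, so their ratios multiply.
    """
    ratios = list(likelihood_ratios)
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if any(ratio <= 0 for ratio in ratios):
        raise ValueError("likelihood ratios must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(ratios)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical: a sceptical prior of 5% and three moderately
    # supportive lines of evidence (e.g. consistency, dose-response,
    # mechanism), each three times likelier under causation.
    print(round(posterior_probability(0.05, [3.0, 3.0, 3.0]), 3))
```

Making the prior explicit in this way shows how much work it does: the same evidence yields very different posteriors for a 5% and a 50% prior, which is the contribution that remains hidden when the objection is phrased only as a concern about confounders.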

Categorical vs. cumulative causal assessment
• Detractors of the causal hypothesis seem to feel
uncommitted until proven otherwise and advocate
performing RCTs before taking any
action.
• Supporters feel challenged by the evidence
already available and consider what should be
thought and done on its basis.
• Contrary to what one might expect, the former attitude is
not neutral, since its default is that there is no
causal association until one is proved by RCTs, whereas
the available evidence no longer warrants
the categorical denial of this hypothesis.

Internal vs. external validity
• Premarketing studies (RCTs) show
acetaminophen to be relatively harmless
• But extreme conservatism (together with ignoring
non-experimental evidence because of its lower
internal validity) ends up neglecting data
on the robustness of the association across contexts in
observational studies.

Impartiality
• Dismissal of the causal association between
acetaminophen and asthma on the grounds that the
overwhelming epidemiological evidence may be
produced by confounders represents a case
where uncertainty about the causal connection may
be exploited by interested parties (Lowe et al.
2010 and Holgate 2011, for instance, have conflicts
of interest).
• Too rigid an attitude towards evidence quality may
run against the reasons for which quality
standards were introduced.
