The use of Prediction Intervals in Meta-Analysis
Nikesh Patel
March 28, 2013
Abstract
Background
Systematic reviews containing meta-analyses of randomised controlled trials provide
the best and most reliable information on health care interventions. Meta-analysis
combines treatment effects from included studies to produce overall summary results.
In the fixed-effect analysis, a common effect is assumed whereas in a random-effects
analysis, the model allows for between-study heterogeneity. The goal of analysing
heterogeneous studies is not only to report a summary estimate but to explain the observed differences. Whilst a random-effects model remains the gold standard for analysing heterogeneous studies, solely reporting the summary estimate and its 95% confidence interval masks the potential effects of heterogeneity. A 95% prediction interval, which takes into account the full uncertainty surrounding the summary estimate, describes the whole distribution of effects in a random-effects model and the degree of between-study heterogeneity, and conveniently gives a range within which we are 95% sure the treatment effect of a new study lies.
Aims
I aim to apply a 95% prediction interval to a collection of meta-analyses of randomised controlled trials and observe the impact it has on their outcomes. I also aim to apply 95% prediction intervals to meta-epidemiological studies, which assess the influence of trial characteristics on the treatment effect estimates in meta-analyses.
Results
I carried out an empirical review of the impact of 95% prediction intervals on existing meta-analyses of randomised controlled trials published in the Lancet. From 26 studies, I extracted 36 meta-analyses containing between three and thirty-four randomised controlled trials (median eight, interquartile range seven) and reproduced each using a random-effects model with a 95% prediction interval. I found 19 (52.8%) had significant 95% confidence intervals, of which 10 (27.8%) had insignificant 95% prediction intervals and 9 (25%) had significant 95% prediction intervals. I also applied 95% prediction intervals to 4 meta-epidemiological studies, revealing extra information concerning their summary estimates.
Conclusion
Every random-effects meta-analysis should include a 95% prediction interval, but for best performance the analysis should include a sufficient number of good-quality, unbiased randomised controlled trials. To enhance the quality and robustness of meta-epidemiological studies, a 95% prediction interval should also be included.

Contents

1 Introduction
  1.1 Systematic Review
  1.2 Meta-Analysis
  1.3 Fixed-Effect Meta-Analysis
  1.4 Carrying out a Fixed-Effect Meta-Analysis
  1.5 Heterogeneity
  1.6 Random-Effects Meta-Analysis
  1.7 Carrying out a Random-Effects Meta-Analysis
  1.8 Fixed-Effect v Random-Effects

2 Prediction Interval
  2.1 95% Prediction Interval
  2.2 Calculating a Prediction Interval
  2.3 Discussion

3 Empirical review of the impact of using prediction intervals on existing meta-analyses
  3.1 Introduction
  3.2 Methods
    3.2.1 Search Strategy and Selection Criteria
    3.2.2 Data Calculations
    3.2.3 Software
  3.3 Results
  3.4 Discussion
    3.4.1 Principal Findings
    3.4.2 Limitations
    3.4.3 Comparison with other studies
    3.4.4 Final Remarks and Implications

4 Prediction intervals in Meta-Epidemiological studies
  4.1 Meta-Epidemiological Study
  4.2 Prediction Intervals in Meta-Epidemiological Studies
    4.2.1 Example 1
    4.2.2 Example 2
    4.2.3 Example 3
    4.2.4 Example 4
  4.3 Discussion

5 Final Discussion and Conclusion

A STATA Codes

Chapter 1
Introduction
In health care and medicine, clinicians, researchers and other important figures require accurate, high-quality information to assist them in making the best possible decisions on health care interventions. Such information is normally found in systematic reviews containing meta-analyses of randomised controlled trials. 1 The aim of this paper is to investigate the use of prediction intervals in meta-analysis, a typical statistical component of a systematic review, and how their application can aid the interpretation of meta-analysis results with a higher degree of quality and accuracy.

1.1 Systematic Review

Since the 1990s, systematic reviews have become very important in medicine and health care. The reasons for this are the sheer volume of medical literature produced annually and the requirement for clinicians and other health care officials to have up-to-date, accurate information on health care interventions. 1 The objective of a systematic review is to present a balanced and impartial summary of all the available research on a well-defined research question. 1 It uses systematic and explicit methods to identify, assess, select and synthesise all the evidence relevant to answering that question in an objective and unbiased manner. Systematic reviews have replaced traditional narrative reviews since the latter do not follow a pre-defined protocol, do not use any kind of rigorous methods and tend to lack transparency, causing bias; a systematic review corrects these issues. 2
A systematic review begins by clearly defining a research question of interest; this may include what treatments are being compared, what outcomes are being measured, what the population of interest is, etc. The next step is to search for studies that are relevant to the research question; this is done by searching all of the published and unpublished information against a well-defined quality search criterion, which can involve searching databases such as MEDLINE, PubMed etc. The studies which pass through the search criterion go through further quality assessment to remove any irrelevant studies. The next step is to extract all the relevant data from the included studies and then carry out a statistical synthesis of the data, which is done using meta-analysis (see Meta-Analysis). The final step is to present all the findings from the analysis as well as analysing any possible heterogeneity between the studies, commenting on the quality of the studies (e.g. bias) and identifying areas of further research. 1
Examples of systematic reviews can be found easily on the internet, on the websites of the British Medical Journal (BMJ), the Cochrane Collaboration and many more. These websites dedicate themselves to providing information on health care interventions to the health care and medicine industry. A robust methodology for preparing and producing systematic reviews can be found on these websites, for example, The Cochrane handbook for systematic reviews of interventions. 3

1.2 Meta-Analysis

“The statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings”
Gene V. Glass's definition of meta-analysis
A meta-analysis is a statistical technique whereby results from the studies included in the analysis are combined to produce an overall summary of the studies. In epidemiology, a typical systematic review of randomised controlled trials will use meta-analysis as its statistical component, whereby treatment effects from individual trials are synthesised with the aim of assessing the clinical effectiveness of healthcare interventions. 4 Meta-analysis is based on one of two models, the fixed-effect and the random-effects model. In this chapter, I discuss both models and when each type should be used.
It first seems appropriate to address the reasons why we would want to use a meta-analysis and not the traditional narrative approach. In a narrative approach, the focus tends to be on p-values of individual studies and observing whether there is a significant effect in each study. Since there is no rigorous way of synthesising p-values, the findings from a narrative approach tend to lack transparency and, in many cases, the researchers may only include studies that support their own opinions, which leads to the results being biased towards those opinions. 1;2 A meta-analysis, on the other hand, works directly with the treatment effects of each study and their respective standard errors and performs one single synthesis of all the data to produce an aggregate summary estimate, which I denote θ̂. 4 Since we are combining all the information across the studies, we reduce uncertainty compared to any individual study: we are increasing the sample size and, in turn, increasing the power to detect clinically meaningful
results. 2 A meta-analysis also addresses the consistency of treatment effects across the studies, something a narrative approach fails to do. If the treatment effects are consistent, then the focus is on the summary estimate and making sure we estimate it as accurately as possible. If the treatment effects are not consistent, then not only should we estimate the summary effect but also explain the differences that exist between the studies. 2;5 Treatment effects are generally much more important to clinicians and other health care officials than p-values. The effect size tells us not only whether the treatment effect is better/worse (i.e. greater or less than the null value), but also the magnitude of the effect. Also, p-values can easily be misinterpreted, as some researchers may deem a non-significant p-value to suggest the treatment has no effect. 2 I later return to the argument of a narrative approach against a meta-analysis when I consider an example (see Example 1).
A vital requirement for a strong meta-analysis is a well-conducted systematic review. If the underlying systematic review isn't carried out to a good standard, the meta-analysis may produce results that lead to misleading conclusions. 1;2 A meta-analysis should also be conducted to a good standard; once again I recommend the Cochrane handbook for systematic reviews on how to conduct a good meta-analysis. 3

1.3 Fixed-Effect Meta-Analysis

The first type of meta-analysis I discuss is the fixed-effect meta-analysis. The fixed-effect model assumes that all the studies included in the analysis are estimating the same underlying treatment effect; in other words, we believe the true treatment effect is common across all the studies and each study is estimating that same true treatment effect. The repercussion of this model is that any differences observed between the individual treatment effects are down solely to random sampling error (within-study error). If we had an infinite number of studies, each with an infinitely large sample size, we would expect the within-study error in each study to tend to zero and the individual treatment effects to equal the true common treatment effect. 2
In the fixed-effect model, we can express the observed treatment effects in the following way,

Y_k = θ + ε_k    (1.1)

where Y_k is the observed treatment effect in study k, θ is the common treatment effect and ε_k is the random sampling error in study k. We can assume that the errors follow a normal distribution with mean 0 and variance equal to the variance of the treatment effect in study k, i.e. that ε_k ∼ N(0, Var(Y_k)). Here the errors account for the within-study error in each study since, in the fixed-effect model, we assume this is the only source of variation. 2
For the fixed-effect meta-analysis, the aim is to compute the summary estimate θ̂, which is interpreted as the best estimate of the common treatment effect that underlies each of the studies in the analysis, along with a 95% confidence interval.

1.4 Carrying out a Fixed-Effect Meta-Analysis

A general approach to meta-analysis is given by the inverse-variance method; this method works for any type of data as long as we can obtain a treatment effect and its standard error. 2 For continuous data we need a mean difference (or another kind of difference), for survival data we need a log hazard ratio, and for binary outcomes we need a log odds ratio or log relative risk, along with their respective standard errors (standard errors on the log scale for ratios).
In the fixed-effect model, the weight assigned to each study is one over the variance of the study, hence the term inverse-variance method. Studies with smaller variances are assigned larger weights than studies with larger variances.
The fixed-effect inverse-variance weighting is therefore given by

W_k = 1 / Var(Y_k)    (1.2)

where Var(Y_k) is the variance of the observed treatment effect in study k.
The formula for θ̂ using a fixed-effect model is given by

θ̂ = ( Σ_{k=1}^n W_k Y_k ) / ( Σ_{k=1}^n W_k )    (1.3)

which has variance given by

Var(θ̂) = 1 / Σ_{k=1}^n W_k .    (1.4)

Here W_k is the inverse-variance weighting given by (1.2). I note that θ̂ is the maximum likelihood estimate of θ and is asymptotically unbiased, efficient and normal. 6 I reiterate that θ̂ should be interpreted as the best estimate of the common treatment effect, since the fixed-effect model assumes that each of the studies in the analysis is estimating the same treatment effect.
We also calculate a 95% confidence interval to express our uncertainty around the summary estimate θ̂, assuming that θ̂ is approximately normally distributed, using the following formula

θ̂ ± 1.96 s.e.(θ̂) .    (1.5)

If we are working on the log scale, i.e. we are using some type of ratio, we must remember to exponentiate θ̂ in (1.3) and the end points of the confidence interval in (1.5). I could also present a 100(1 − α)% confidence interval but, by convention, I am only going to calculate 95% confidence intervals in this paper.
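As a compact illustration of (1.2) through (1.5), the whole fixed-effect calculation can be sketched in a few lines. This is a hypothetical Python helper of my own (the thesis appendix uses STATA), not the author's code:

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect meta-analysis, eqs (1.2)-(1.5).

    effects   : observed treatment effects Y_k
    variances : within-study variances Var(Y_k)
    Returns the summary estimate, its standard error and a 95% CI.
    """
    weights = [1.0 / v for v in variances]                               # (1.2)
    theta = sum(w * y for w, y in zip(weights, effects)) / sum(weights)  # (1.3)
    se = math.sqrt(1.0 / sum(weights))                                   # (1.4)
    ci = (theta - 1.96 * se, theta + 1.96 * se)                          # (1.5)
    return theta, se, ci
```

For ratio measures, pass the log effects and exponentiate the estimate and both confidence limits afterwards, as noted above.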
Example 1
Table 1.1, presented below, shows the results from ten randomised controlled trials, each comparing the benefit of an anti-hypertensive treatment, treatment A, against placebo. Each trial is presented with its unbiased estimated mean difference in change in systolic blood pressure (mmHg), its variance and a 95% confidence interval. 7
Trial (k)    Y_k      Var_k    95% C.I.
1           -0.49     0.12     [-1.17, 0.19]
2           -0.17     0.05     [-0.61, 0.27]
3           -0.52     0.06     [-1.00, -0.04]
4           -0.48     0.14     [-1.21, 0.25]
5           -0.26     0.06     [-0.74, 0.22]
6           -0.36     0.08     [-0.91, 0.19]
7           -0.47     0.05     [-0.91, 0.03]
8           -0.30     0.02     [-0.58, -0.02]
9           -0.15     0.07     [-0.67, 0.37]
10          -0.28     0.25     [-1.26, 0.70]

Table 1.1: Results of trials comparing treatment A against placebo (a value < 0 represents a reduction in blood pressure and is therefore beneficial)
Using a fixed-effect model, we weight each study using (1.2) and then obtain a summary estimate for the treatment effect along with a 95% confidence interval. Using (1.3), we calculate our summary estimate θ̂ to be -0.33, so we expect treatment A to consistently reduce systolic blood pressure by 0.33 mmHg. Our 95% confidence interval calculated using (1.5) is [-0.48, -0.18]. Since the null value of 0 is not in the 95% confidence interval for θ̂, there is strong evidence at the 5% level that treatment A is effective in reducing systolic blood pressure. The results are presented in a forest plot in figure 1.1.
Figure 1.1: Forest plot of a meta-analysis of randomised controlled trials showing the
effects of treatment A on reducing systolic blood-pressure (SMD = standardised mean
difference) 7
On the forest plot, the squares represent the weight assigned to the corresponding study, with the centre of the square depicting the observed treatment effect for that study. The 95% confidence interval for each study is represented by the line going through the square, beginning and ending at the end points of the interval. The diamond at the bottom of the forest plot represents the 95% confidence interval of the summary estimate, with its centre representing the summary estimate.
I now return to the argument for using meta-analysis over a narrative approach. If we observe the forest plot in figure 1.1, eight trials have a confidence interval that contains the null value 0 and therefore have insignificant p-values. If we took a narrative approach and considered each study separately, we would most likely conclude that, since 80% of the studies produced insignificant p-values, the treatment isn't beneficial. When we perform a meta-analysis, the 95% confidence interval for the summary estimate doesn't contain the null value and we therefore obtain a significant p-value, since we have increased the power to detect significant results. 2
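This contrast can be checked numerically. The following sketch uses the Table 1.1 values (it is my own illustration in Python, not the thesis's STATA code): it counts how many individual 95% confidence intervals cross the null, then pools the effects with inverse-variance weights.

```python
import math

# 95% CIs as reported in Table 1.1
ci = [(-1.17, 0.19), (-0.61, 0.27), (-1.00, -0.04), (-1.21, 0.25),
      (-0.74, 0.22), (-0.91, 0.19), (-0.91, 0.03), (-0.58, -0.02),
      (-0.67, 0.37), (-1.26, 0.70)]
# Narrative view: how many individual CIs contain the null value 0?
null_crossing = sum(1 for lo, hi in ci if lo <= 0 <= hi)

# Meta-analysis view: pooled fixed-effect estimate and its 95% CI
Y   = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
Var = [0.12, 0.05, 0.06, 0.14, 0.06, 0.08, 0.05, 0.02, 0.07, 0.25]
W = [1 / v for v in Var]
theta = sum(w * y for w, y in zip(W, Y)) / sum(W)
se = math.sqrt(1 / sum(W))
pooled_significant = not (theta - 1.96 * se <= 0 <= theta + 1.96 * se)

print(null_crossing, pooled_significant)  # 8 True
```

Eight of the ten trials are individually inconclusive, yet the pooled interval excludes the null, which is exactly the gain in power described above.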

1.5 Heterogeneity

In the fixed-effect model, we assumed that all the studies in the analysis are estimating the same treatment effect and the only error we allow for is random sampling error (within-study heterogeneity), but is this always a plausible assumption? In general, studies looking at the same treatment may differ in many ways, such as patient characteristics (age, patient health etc.), location of study, intervention applied (dosage etc.) and many more known and unknown factors, causing the treatment effects across the studies to no longer remain consistent. 2 If the treatment effects are no longer consistent, then there exist real differences between the studies (between-study heterogeneity) and the aim of a meta-analysis should be to assess the heterogeneity between the treatment effects as well as to calculate a summary estimate. 2;5;8 If we used a fixed-effect method in the presence of between-study heterogeneity, we would be wrongly implying that a common effect exists, leading to misleading conclusions about the treatment.
I now discuss ways in which we can assess heterogeneity. Since the total variation is made up of real differences (between-study heterogeneity) and random sampling error (within-study error), we need some tools to help us see if between-study heterogeneity is present. I first introduce the Q-statistic, which is based on the result of the Q-test. This test is useful if we believe the presence of between-study heterogeneity is causing more variation in the treatment effects than is expected from random sampling error alone. 2;9
The Q-test is defined as follows:
H0 : Y1 = Y2 = · · · = Yk (for all k studies)
H1 : At least one Yk differs,
where Yk is the observed treatment effect in study k and Wk is the fixed-effect weighting of study k.
The Q-statistic, which is given by the following formula

Q = Σ_{k=1}^n W_k Y_k² − ( Σ_{k=1}^n W_k Y_k )² / Σ_{k=1}^n W_k ,    (1.6)

is compared to χ²_{n−1}(α). If we find Q > χ²_{n−1}(α), then we reject the null hypothesis at the α significance level, which suggests there is evidence of between-study heterogeneity. If Q < χ²_{n−1}(α), then we accept the null hypothesis at the α level, which suggests there is no evidence of between-study heterogeneity. 2;9
Another useful statistic is the I²-statistic, which measures approximately the percentage of the total variation that is down to between-study heterogeneity. 9 It is given by the following formula

I² = 100% × (Q − (k − 1)) / Q    (1.7)

where Q is the Q-statistic worked out using (1.6).
If our I² is 0%, this suggests that all the variability in our summary estimate is down to random sampling error (within-study heterogeneity) and not between-study variation, and therefore it could make sense to use a fixed-effect model. I² values of 25%, 50% and 75% are considered by Higgins et al. to be low, moderate and high respectively. 2;9 If we obtain a negative value for I², the value is set to 0 and interpreted in the same way as 0.
I must stress that both the Q-test and the I²-statistic should be used as tools to help us decide which model to use; the decision shouldn't be based solely on the outcome of the Q-test and the I²-statistic, since they aren't precise. If we consider the Q-test, while a significant p-value suggests that there exists variation in the individual treatment effects, a non-significant p-value doesn't necessarily mean a common effect exists. The lack of significance can be the result of a lack of power: if there are few trials, or we have lots of within-study error because the trials have small sample sizes, then even the presence of a large amount of between-study heterogeneity may result in a non-significant p-value. 2 If there are few studies, a significance level of 10% is often used because of the lack of power, so a p-value strictly less than 0.1 would be enough to reject the null hypothesis that there exists no between-study heterogeneity.
The I²-statistic itself is dependent on the Q-statistic, therefore if the Q-test lacks power then I² will be imprecise. Also, I² may tell us what proportion of the variation is down to real error, but what it doesn't tell us is how spread out that error is. A high value of I² implies a high proportion of the variation is down to real error, but this error may be spread only narrowly, since the studies may have high precision. Conversely, a low I² only implies a low proportion of the variation is down to real error, but doesn't imply the effects are grouped together in a narrow range; they could easily vary over a wide range if the studies used lack precision. 2 Higgins, in his paper, 10 discusses the misunderstanding of the I²-statistic and believes it should only be used as a descriptive statistic.
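Equations (1.6) and (1.7), including the rule that negative I² values are set to 0, can be sketched as follows. This is a hypothetical Python helper of my own, not part of the thesis's STATA code:

```python
def heterogeneity(effects, variances):
    """Cochran's Q-statistic (1.6) and the I^2 statistic (1.7).

    effects   : observed treatment effects Y_k
    variances : within-study variances Var(Y_k)
    """
    W = [1 / v for v in variances]                       # fixed-effect weights (1.2)
    swy = sum(w * y for w, y in zip(W, effects))
    # Q = sum(W*Y^2) - (sum(W*Y))^2 / sum(W), eq (1.6)
    q = sum(w * y * y for w, y in zip(W, effects)) - swy ** 2 / sum(W)
    k = len(effects)
    # I^2 in percent, eq (1.7); negative values are truncated to 0
    i2 = max(0.0, 100 * (q - (k - 1)) / q)
    return q, i2
```

Applied to the Table 1.1 data this returns Q ≈ 2.49 and I² = 0, the values used in the continuation of Example 1.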
Example 1 (Continued)
I now apply both the Q-test and the I²-statistic to example 1 to see if conducting a fixed-effect analysis in that example was appropriate. Conducting a Q-test leads to a Q-statistic of 2.490 using (1.6); this is compared to χ²₉(0.1) = 14.684 (we use a 10% level of significance, since we only have a few studies). Since our test statistic of 2.490 < 14.684, there is no statistical evidence against H0 at the 10% level of significance. This suggests that there is no sign of between-study heterogeneity. I also work out the I²-statistic: here our I² value is −261.385% using (1.7), which is set to 0 and suggests that the total variation across the studies is only down to within-study error. If we observe the forest plot in figure 1.1, it's fairly clear to see that the observed treatment effects do not deviate too far from the summary estimate, so using a fixed-effect model seems appropriate, and I can regard our summary estimate as the common effect.
If we conclude that between-study heterogeneity is present, we cannot use the fixed-effect model; we instead use the random-effects model, which is discussed in the next section. I briefly discuss two alternatives that try to eradicate all presence of between-study heterogeneity, which can be ideal from a researcher's perspective. The first is sub-group analysis: in this case, a series of fixed-effect meta-analyses are performed on each sub-group, where the studies in each group are deemed similar enough to assume a common effect. The problems with sub-group analysis are that each sub-group will contain fewer studies, so we have a loss of power, and instead of carrying out one synthesis we are doing several, yet we still aren't guaranteed that a sufficient amount of between-study heterogeneity will be removed. 2 The second option is meta-regression, where the covariates in the model explain the variation in the data and we can obtain the treatment effect for each covariate while adjusting for the others. A problem with this method is that unidentified sources of heterogeneity aren't accounted for. 11 A problem inherent in both alternatives is that with few studies neither is useful, since there is a loss of power; i.e. in the case of meta-regression, we have low power to detect which covariates explain heterogeneity. 2;11

1.6 Random-Effects Meta-Analysis

The second type of meta-analysis I discuss is the random-effects meta-analysis. This model assumes that the individual treatment effects vary across the studies because of the presence of real differences (between-study heterogeneity) as well as random sampling error. A random-effects model assumes that the true effects of the individual studies come from a distribution of true effects with mean θ and variance equal to the magnitude of the between-study heterogeneity, which I denote τ² and term the between-study variance (we can usually assume a normal distribution). The repercussion of this model is that if we had an infinite number of studies, each with an infinitely large sample size, we would expect the random sampling error to tend to zero but the individual treatment effects to still differ, because of the real differences that exist between them. 2;5
In the random-effects model, we can express the observed treatment effects in the following way,

Y_k = θ + ζ_k + ε_k    (1.8)

where θ is the mean true effect, ε_k is the sampling error in study k and ζ_k is the between-study error in study k. We again assume that ε_k ∼ N(0, Var(Y_k)) and that ζ_k ∼ N(0, τ²). Here the errors account for both the within-study error and the between-study error, since in the random-effects model we allow for two sources of variation. 2
For the fixed-effect meta-analysis, the aim was to compute the summary estimate θ̂, interpreted as the best estimate of the common treatment effect underlying each of the studies, along with a 95% confidence interval. For the random-effects meta-analysis, computing the summary estimate and its 95% confidence interval alone is insufficient. Since we assume there exist real differences between the treatment effects, the aim of a random-effects meta-analysis is not only to compute the summary estimate but also to explain the differences that exist between the trials and learn how the individual treatment effects are distributed about the summary estimate. 2;5 I note that the summary estimate θ̂ is now interpreted as the average effect.

1.7 Carrying out a Random-Effects Meta-Analysis

To carry out a random-effects meta-analysis, we first need to estimate the between-study variance, since it describes the magnitude of the between-study heterogeneity and has to be incorporated into the calculation of the summary estimate θ̂.
To estimate τ², we use the DerSimonian and Laird method, which provides an unbiased point estimate of τ². 12 This is given by the following formula,

τ̂² = (Q − (k − 1)) / ( Σ_{k=1}^n W_k − Σ_{k=1}^n W_k² / Σ_{k=1}^n W_k )    (1.9)

where Q is the Q-statistic calculated using (1.6) and the W_k are the weights for each study from the fixed-effect meta-analysis, calculated using (1.2). I note that should Q < (k − 1), we set τ̂² = 0. If our point estimate of the between-study variance is zero (implying no between-study heterogeneity), then the random-effects model reduces to the fixed-effect model.
Similar to the fixed-effect model, we use the inverse-variance method to weight the individual studies. In the fixed-effect model, since we assume each study is estimating the same common effect, the study with the highest precision is given the largest weighting, since it contains the most information about the true summary effect θ. In a random-effects model, the weighting has to be given more care, since each study is no longer estimating the same treatment effect. 2 The weighting must now take into account the estimate of the between-study variance τ̂², so that the study with the largest precision doesn't have as much influence as it would if a fixed-effect model were assumed. So, in a random-effects model, the weight given to each study is

W*_k = 1 / ( Var(Y_k) + τ̂² ) .    (1.10)
The formula for θ̂ using a random-effects model is given by

θ̂ = ( Σ_{k=1}^n W*_k Y_k ) / ( Σ_{k=1}^n W*_k )    (1.11)

and has variance

Var(θ̂) = 1 / Σ_{k=1}^n W*_k .    (1.12)

I reiterate that θ̂ should be interpreted as the average or mean treatment effect and not the common effect, since by using a random-effects model I am assuming that the true effects of the studies are distributed about the mean of a distribution of true effects, and θ̂ is the estimate of this mean. I also note that the true treatment effect in an individual study could be lower or higher than this average effect.
A 95% confidence interval for θ̂ is given by

θ̂ ± 1.96 s.e.(θ̂) .    (1.13)
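The full random-effects calculation, (1.9) through (1.13), can be sketched as follows. Again this is hypothetical Python of my own rather than the appendix's STATA code:

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects meta-analysis with the DerSimonian-Laird
    estimate of tau^2, eqs (1.9)-(1.13)."""
    W = [1 / v for v in variances]                        # fixed-effect weights (1.2)
    sw = sum(W)
    swy = sum(w * y for w, y in zip(W, effects))
    q = sum(w * y * y for w, y in zip(W, effects)) - swy ** 2 / sw   # (1.6)
    k = len(effects)
    # tau^2 truncated at 0 when Q < k - 1, eq (1.9)
    tau2 = max(0.0, (q - (k - 1)) / (sw - sum(w * w for w in W) / sw))
    Wstar = [1 / (v + tau2) for v in variances]           # (1.10)
    theta = sum(w * y for w, y in zip(Wstar, effects)) / sum(Wstar)  # (1.11)
    se = math.sqrt(1 / sum(Wstar))                        # (1.12)
    return tau2, theta, (theta - 1.96 * se, theta + 1.96 * se)       # (1.13)
```

For the Table 1.1 data, Q < k − 1, so τ̂² = 0 and the result coincides with the fixed-effect estimate of −0.33, illustrating the reduction to the fixed-effect model noted above.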

Example 2
Table 1.2, presented below, shows the results from ten randomised trials, each comparing the benefit of another anti-hypertensive treatment, treatment B, against placebo. Each trial is presented with its unbiased estimated mean difference in change in systolic blood pressure (mmHg), its variance and a 95% confidence interval. 7

Trial (k)    θ_k      Var_k    95% C.I.
1            0.00     0.423    [-0.829, 0.829]
2            0.10     0.219    [-0.329, 0.529]
3           -0.40     0.026    [-0.451, -0.349]
4           -0.80     0.199    [-1.190, -0.410]
5           -0.63     0.301    [-1.220, -0.040]
6           -0.22     0.301    [-0.370, 0.810]
7           -0.34     0.071    [-0.480, -0.201]
8           -0.51     0.102    [-0.710, -0.310]
9           -0.03     0.122    [-0.209, 0.269]
10          -0.81     0.301    [-1.340, -0.220]

Table 1.2: Results of trials comparing treatment B against placebo (a value < 0 represents a reduction in blood pressure and is therefore beneficial)
I first test for heterogeneity to help decide what type of meta-analysis we should use. We obtain a Q-statistic of 30.876 > χ²₉(0.1) = 14.684 using (1.6), which suggests evidence of heterogeneity at the 10% level of significance. I also obtained an I² value of 70.85% using (1.7), which suggests that 70.85% of the variation in treatment effects is due to between-study heterogeneity and the rest is due to chance. This is considered a high level of between-study heterogeneity and therefore a random-effects meta-analysis would seem appropriate to use.
Using formulas (1.9) through (1.13), I obtained τ̂² = 0.029 and a summary estimate of -0.33, along with a 95% confidence interval of [-0.48, -0.18]. So, on average, treatment B reduces systolic blood pressure by 0.33 mmHg, but in an individual study the treatment effect can vary from this average; since the null value of 0 is not in the 95% confidence interval, there is strong evidence at the 5% level that treatment B, on average, is beneficial. A forest plot of the results from the meta-analysis is shown in figure 1.2.
We can see that, unlike in figure 1.1, there are clear deviations of the individual treatment effects from the summary estimate, so it seems appropriate to assume that each trial is estimating a different treatment effect and to use a random-effects model to account for this.

1.8 Fixed-Effect v Random-Effects

It is imperative that when conducting a meta-analysis, the right model is chosen since
it influences how we interpret the results. If we look at examples 1 (figure 1.1 on page
8) and 2 (figure 1.2 on page 15), both of these produce the same summary estimate of
-0.33 and have the same 95% confidence interval of [-0.48,-0.18]. Despite these similarities, the ways in which they are interpreted are very different.

Figure 1.2: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment B on reducing systolic blood pressure (SMD = standardised mean difference) 7

In example 1, I used
a fixed-effect model, which I justified because I believed there were no real differences between the studies, so the summary estimate is the common effect across the studies. In example 2, I decided to use a random-effects model since I believed the variation between the individual treatment effects was down to real differences as well as random sampling error; I therefore regard the summary estimate as the average across the studies, though in an individual study the treatment effect can vary from this average effect.
Despite these differences, there still seems to be some misunderstanding when it comes to choosing which model to use and interpreting the results. Riley et al. 7 reviewed 44 Cochrane reviews that wrongly interpreted θ̂ as the common effect rather than the average effect when using a random-effects approach. They also reviewed 31 Cochrane reviews that used a fixed-effect meta-analysis and found that 26 of these had I² values of 25% or more without justifying why a fixed-effect model was used. Using a fixed-effect model in these situations must be justified, otherwise we end up making inaccurate conclusions from the results, since we are suggesting there is a single common effect when actually no common treatment effect exists because of real differences amongst the studies. One reason for misinterpretation is that the forest plots for examples 1 and 2 present the results in the same way, which can cause confusion. Skipka et al. 13 point this out and also note that the point estimate of τ² is never displayed on the forest plot.
I have already commented that the choice of model shouldn't be based solely on the Q-test and the I²-statistic, but how should we go about choosing? Let us say we wish to carry out a meta-analysis on a sufficient number of studies looking at some treatment against placebo. If we know these studies have a sufficient number of properties in common, for example similar age range, similar dosage, similar follow-up time, then it would seem appropriate to use a fixed-effect model, since we believe there are negligible real differences between the studies and any factors that do affect the treatment effects are the same across the studies. A common procedure is to carry out a fixed-effect meta-analysis and observe the forest plot to see if the observed treatment effects are similar. 2 There are two problems with this: firstly, it isn't clear whether the observed differences are only down to random sampling error; and secondly, if this was the incorrect model, then carrying it out was a waste of time.
If we believe there are real differences, then a random-effects model should be implemented. In this model, each study is expected to be estimating a different treatment effect, and the job of this type of meta-analysis is to make sense of the differences between the studies and how the true individual treatment effects are distributed about the summary estimate. 2;5 A clear advantage of a random-effects meta-analysis is that, given a sufficient number of studies, we can generalise our results to a range of populations not included in the analysis; this may be one of the goals of the underlying systematic review. 2;5 If we wanted to estimate what the treatment effect will be in a new study, we can draw it from our results as long as we can describe, with adequate precision, how the individual treatment effects are distributed about the summary estimate. 5 In a fixed-effect model, we cannot generalise since our results are exclusive to certain properties, for example a particular population. 2

Chapter 2
Prediction Interval
In the presence of between-study heterogeneity, the aim of a meta-analysis isn't just to calculate the summary estimate but also to make sense of the heterogeneity. I have already pointed out that eradicating all presence of heterogeneity can be difficult because of unknown sources of heterogeneity, so it would seem better to assess heterogeneity rather than try to remove it. Higgins 10 believes any amount of heterogeneity is acceptable provided there is a "sound predefined eligibility criteria" and that the "data is correct", but stresses that a meta-analysis must provide a stern assessment of heterogeneity.
Since a random-effects meta-analysis accounts for unidentified sources of heterogeneity 7 , I believe it should be the gold standard for explaining heterogeneous data. Unfortunately, once researchers have carried out a random-effects meta-analysis, they tend to focus on the summary estimate and its 95% confidence interval. This isn't sufficient since, by the assumption of a random-effects model, we allow for real differences between the individual studies. 2;7 If we were using a fixed-effect model, then focusing on the summary estimate, which gives the best estimate of the common effect, and its 95% confidence interval, which describes the impact of within-study heterogeneity on the summary estimate, would be adequate. The random-effects summary estimate tells us the average effect across the studies and its 95% confidence interval indicates the region in which we are 95% sure that our estimate lies; neither tells us how the individual treatment effects are distributed about the random-effects summary estimate. 5 This leads us to the prediction interval, which is discussed in this chapter.

2.1 95% Prediction Interval

A 95% prediction interval gives the range within which we are 95% sure that the potential treatment effect of a brand new individual study lies. The beauty of a prediction interval is that not only does it quantitatively give a range for a treatment effect in a new study, thus allowing researchers, clinicians etc. to apply the results to future applications, but it also offers a suitable way to express the full uncertainty around the summary estimate in a way which acknowledges heterogeneity. A prediction interval can also describe how the true individual treatment effects are distributed about the summary estimate. 2;5;7;13 For these reasons, the inclusion of a prediction interval in a random-effects meta-analysis can make its conclusions more robust and provide a more complete summary of the results, thereby making them more relevant to clinical practice. 14
The notion of a prediction interval was first proposed by Ades et al. 8 , who derive a predictive distribution of a future treatment effect in a brand new study using a Bayesian approach to meta-analysis. A further push for the prediction interval in meta-analysis is seen in a paper by Higgins et al. 5 . The authors acknowledge the little attention that has been given to prediction in meta-analysis and present the prediction interval in a classical (frequentist) framework. Higgins et al. 10;5 believe that a prediction interval is the most convenient way to present the findings of a random-effects meta-analysis in a way that acknowledges heterogeneity, since it takes into account the full distribution of effects in the analysis.

2.2 Calculating a Prediction Interval

When calculating a prediction interval, we account not only for the between-study and within-study heterogeneity, but also for the uncertainty of the summary estimate θ̂ and the uncertainty of the between-study variance τ̂². 2 Suppose we knew the true values of the summary effect θ and the between-study variance τ². If we assume that the treatment effects across the studies are normally distributed, the 95% prediction interval would be given by

θ ± 1.96·√τ².    (2.1)

The problem with (2.1) is that we do not know the exact values of θ and τ²; rather, we are estimating them, and because of this there is uncertainty surrounding these estimates. 2 To account for this, we use the following formula for a 95% prediction interval, provided by Higgins et al. 5 :

θ̂ ± t^{0.05}_{n−2} · √(τ̂² + Var(θ̂)).    (2.2)

Here, θ̂ is the summary estimate from the random-effects meta-analysis, Var(θ̂) is the variance of the summary estimate accounting for the uncertainty of the estimate of θ, τ̂² is the estimate of the between-study variance, and t^{0.05}_{n−2} is the two-sided 5% critical value of the t-distribution with n − 2 degrees of freedom (where n is the number of studies), which accounts for the uncertainty of the estimate of τ². 2;5 We require at least three studies to calculate a prediction interval 7 and we must also remember to exponentiate the end points of (2.2) if we are working on the log scale.
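Formula (2.2) translates directly into code. This sketch assumes the standard error of θ̂ is supplied alongside τ̂² and the number of studies; the function name is my own, and scipy is used for the t critical value.

```python
from scipy import stats

def prediction_interval(theta_hat, se_theta, tau2, n, alpha=0.05):
    """95% prediction interval per formula (2.2):
    theta_hat +/- t_{n-2} * sqrt(tau2 + Var(theta_hat))."""
    if n < 3:
        raise ValueError("at least three studies are required")
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # two-sided 5% critical value
    half = t_crit * (tau2 + se_theta ** 2) ** 0.5
    return theta_hat - half, theta_hat + half
```

With the example 2 values (θ̂ = -0.33, a standard error of roughly 0.0765 back-calculated from the confidence interval, τ̂² = 0.029 and ten trials), this gives an interval close to the [-0.76,0.09] reported below.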
Example 2 with a Prediction Interval

In example 2, I used a random-effects model and found the summary estimate to be -0.33mmHg, the between-study variance τ̂² to be 0.029 and the 95% confidence interval for the summary estimate to be [-0.48,-0.18] (see figure 1.2). Using (2.2), I obtained a 95% prediction interval of [-0.76,0.09]. We notice that the null value of 0 is now in the prediction interval, so it isn't statistically significant at the 5% level. So, in a brand new individual study setting, we are 95% sure that the potential treatment effect for this study will be between -0.76mmHg and 0.09mmHg. Although on average the treatment will be beneficial (as indicated by the 95% confidence interval), in a single study setting we cannot rule out that the treatment may actually be harmful (since the 95% prediction interval contains values > 0). The prediction interval therefore acknowledges the impact of heterogeneity that was masked by focusing on the random-effects summary estimate and its 95% confidence interval by themselves. A forest plot for example 2 is given in figure 2.1, which now includes a 95% prediction interval.
The prediction interval is given by the diamond at the bottom of the forest plot in figure 2.1. The centre of the diamond represents the random-effects summary estimate, the width of the diamond represents the 95% confidence interval for the summary estimate and the width of the lines going through the diamond represents the 95% prediction interval. Skipka et al. 13 discuss different methods that have been proposed for how a prediction interval should be presented in a forest plot. They also suggest that the inclusion of a prediction interval in a forest plot is a good way of distinguishing between a random-effects and a fixed-effect forest plot. Throughout this paper, I will present a 95% prediction interval in a forest plot as seen in figure 2.1.

Figure 2.1: Forest plot of a meta-analysis of randomised controlled trials showing
the effects of treatment B on reducing systolic blood pressure with a 95% prediction
interval (SMD = standardised mean difference) 7

2.3 Discussion

It is important that I address a few issues that arise when working with a prediction interval. A problem that exists for both a prediction interval and a random-effects meta-analysis is when the analysis has few studies. If we have few studies, regardless of how large they are, the prediction interval will be wide because of the lack of precision in the DerSimonian and Laird estimate of τ² (using (1.9)). 2;5 If our meta-analysis contains few studies and has substantial between-study heterogeneity, a random-effects meta-analysis remains the correct option, but an alternative approach could be to use a Bayesian approach to estimate τ² instead of the DerSimonian and Laird method, which is sensitive to the number of studies in the analysis. A Bayesian approach uses prior information from outside the studies to calculate an estimate of τ². This approach has the advantage of naturally allowing for the full uncertainty around all the parameters in the model and incorporating information that may not be considered in a frequentist model. The approach, however, can be difficult to implement and could be prone to bias. I refer to papers by Higgins et al. 5 and Ades et al. 8 which provide a more thorough description of the Bayesian approach to prediction intervals.
Another problem that occurs with a small number of studies concerns the validity of the assumption, made when calculating a prediction interval, that the population in a new study is "sufficiently similar" to those in the studies already included in the analysis. In a random-effects meta-analysis, since we allow for real differences, each study will be different in many ways; the more studies we have, the broader the range of populations we cover, thus supporting this assumption. 5 We also assume that each study has a low risk of bias, i.e. that each study included in the analysis has been well conducted. If this isn't the case, the prediction interval will inherit heterogeneity caused by these biases. 7
Finally, it seems meaningful to make absolutely clear the differences between a random-effects 95% confidence interval and a 95% prediction interval. A 95% confidence interval in a random-effects meta-analysis contains the region in which we are 95% sure that our summary estimate (regarded as the average effect) lies. The width of the confidence interval accounts for the error in the summary estimate, and with an infinite number of infinitely large studies, the end points of the confidence interval will tend to the summary estimate. 2 The mistake that is made is to treat the 95% confidence interval from a random-effects meta-analysis as measuring the extent of heterogeneity, but this is wrong since it only considers the error in the summary estimate. 5 A 95% prediction interval contains the region in which we are 95% sure that the potential treatment effect in a brand new individual study lies. Another way to describe a 95% prediction interval is that we can draw the potential treatment effect, denoted y_new, with 95% precision from the prediction interval, since the prediction interval describes how the true individual treatment effects are distributed about the summary estimate. 5 If we had an infinite number of infinitely large studies, we would expect the width of the prediction interval to reflect the actual variation between the true treatment effects. 2 Since the 95% prediction interval accounts for all the uncertainty, it will never be narrower than its corresponding 95% random-effects confidence interval, so we can regard the 95% random-effects confidence interval as a subset of the 95% prediction interval.

Chapter 3

Empirical review of the impact of using prediction intervals on existing meta-analyses

3.1 Introduction

A random-effects meta-analysis should remain the gold standard for analysing heterogeneous studies, but solely presenting the summary estimate and its 95% confidence interval masks the potential effects of heterogeneity. 7 The addition of a prediction interval gives a more complete summary of the results from a random-effects meta-analysis in a way that acknowledges heterogeneity, thereby making it easier to apply to clinical practice. 5 A 95% prediction interval, with enough studies, can describe the distribution of true treatment effects and therefore gives a range within which we can be 95% sure that the potential treatment effect in a brand new study, y_new, lies. 2;5

The aim of this review is to assess the impact of a 95% prediction interval on the outcomes of existing meta-analyses of randomised controlled trials. I want to see if the inclusion of a 95% prediction interval can help interpret the results of a random-effects meta-analysis to a higher degree of accuracy and therefore recommend whether or not a random-effects meta-analysis should always include a 95% prediction interval.

3.2 Methods

3.2.1 Search Strategy and Selection Criteria

To find the studies for the review, I electronically searched for studies on the Lancet
website (www.lancet.com). I used the Lancet since it is one of the oldest and most
respected medical journals and has vast amounts of medical literature. I used the
advanced search toolbar on the Lancet website using the key words “RANDOMISED
TRIAL” and “META ANALYSIS” in the abstract of all research, reviews and seminars in all years in all Lancet journals. The search was carried out on 20/12/2011 and
produced 61 studies. For each study, I initially obtained a PDF file of the study plus
any supplementary material using Sciencedirect via access through the University of
Birmingham student portal.
The eligibility criterion for a study to enter the review is that it must include at least one meta-analysis of three or more randomised controlled trials on its primary outcomes as defined by the authors of the study. Of the 61 studies, I reviewed their abstracts to remove any irrelevant studies. I excluded studies that only contained a meta-analysis of non-randomised controlled trials (e.g. observational studies) since I am only interested in meta-analyses of randomised controlled trials, whereby patients are randomly assigned to the treatment or control group. Randomised controlled trials cancel the effects of known and unknown confounding factors as well as selection bias. 2 I also excluded studies that had a meta-analysis of fewer than three randomised controlled trials, which is seen as the minimum number of trials required to calculate a prediction interval. 7 In the case where the meta-analysis contained a mixture of randomised and non-randomised controlled trials, I took the meta-analysis of the randomised controlled trials only if the authors had explicitly presented it alongside the overall meta-analysis; if they only presented a meta-analysis covering all randomised and non-randomised trials, the study was excluded. I also excluded any studies that didn't display data by trial. Other reasons for study exclusion were that some of the studies were only randomised controlled trials and not meta-analyses, some studies were informative studies or research papers on meta-analysis, and a couple of studies were network meta-analyses, which were removed since they are potentially more subject to error than typical meta-analyses. I also came across studies that were duplicates, for which I only considered the most recent study.

The flow chart given in figure 3.1 describes the process. The boxes contain the reasons for excluding studies and the numbers represent the studies that were removed for each reason.

Figure 3.1: Flow chart describing the process of excluding studies for the review

3.2.2 Data Calculations

I had a total of 26 studies that passed my eligibility criteria to enter the review. From these studies, I extracted 36 meta-analyses containing between three and thirty-four randomised controlled trials. For each meta-analysis, I reproduced the analysis using a random-effects model (using formulas (1.9) to (1.13)) with a 95% prediction interval (using formula (2.2)) as well as calculating the I²-statistic (using formula (1.7)). For 20 of the studies, from which 26 meta-analyses were extracted, I could directly calculate the individual trial treatment effects and their variances (log variance if the effect-size of interest is a ratio). For these, the individual treatment effects are calculated using the following formulas depending on the relevant outcome of interest.
We define the following
a = Number of events in the treatment group
b = Number of events in the control group
NT = Total number of patients in the treatment group
NC = Total number of patients in the control group
c = NT − a
d = NC − b
Odds Ratio

The odds ratio for trial k is given by 2

Y_k^{OR} = (a·d)/(b·c)    (3.1)

and has log variance

Var(ln(Y_k^{OR})) = 1/a + 1/b + 1/c + 1/d.    (3.2)

A 95% confidence interval for the odds ratio in the k-th trial is given by

exp( ln(Y_k^{OR}) ± 1.96·√Var(ln(Y_k^{OR})) ).    (3.3)
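Formulas (3.1) to (3.3) can be sketched in a short Python function; the function name and return layout are assumptions of mine.

```python
import math

def odds_ratio_ci(a, b, n_t, n_c):
    """Odds ratio and 95% CI for one trial (a sketch of (3.1)-(3.3)):
    a, b = events in treatment/control; n_t, n_c = group sizes."""
    c, d = n_t - a, n_c - b
    or_k = (a * d) / (b * c)                      # (3.1)
    var_log = 1 / a + 1 / b + 1 / c + 1 / d       # (3.2), variance on the log scale
    half = 1.96 * math.sqrt(var_log)
    # (3.3): compute the CI on the log scale and exponentiate the end points
    return or_k, (or_k * math.exp(-half), or_k * math.exp(half))
```

The relative risk and risk difference formulas below follow the same pattern, differing only in the point estimate and variance used.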

Relative Risk

The relative risk for trial k is given by 2

Y_k^{RR} = (a·N_C)/(b·N_T)    (3.4)

and has log variance

Var(ln(Y_k^{RR})) = 1/a + 1/b − 1/N_T − 1/N_C.    (3.5)

A 95% confidence interval for the relative risk in the k-th trial is given by

exp( ln(Y_k^{RR}) ± 1.96·√Var(ln(Y_k^{RR})) ).    (3.6)

Risk Difference

The risk difference for trial k is given by 2

Y_k^{RD} = a/N_T − b/N_C    (3.7)

and has variance

Var(Y_k^{RD}) = (a/N_T)(1 − a/N_T)/N_T + (b/N_C)(1 − b/N_C)/N_C.    (3.8)

A 95% confidence interval for the risk difference in the k-th trial is given by

Y_k^{RD} ± 1.96·√Var(Y_k^{RD}).    (3.9)

Hazard Ratio

To calculate the hazard ratio for the k-th trial, we require the difference between the observed and expected deaths (O − E) and the variance Var(O − E). 15

Y_k^{HR} = exp( (O − E)/Var(O − E) )    (3.10)

and has log variance

Var(ln(Y_k^{HR})) = 1/Var(O − E).    (3.11)

A 95% confidence interval for the hazard ratio in the k-th trial is given by

exp( ln(Y_k^{HR}) ± 1.96·√Var(ln(Y_k^{HR})) ).    (3.12)
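The hazard ratio formulas (3.10) to (3.12) can likewise be sketched from the log-rank quantities (O − E) and Var(O − E); the function name is my own.

```python
import math

def hazard_ratio_ci(o_minus_e, var_o_minus_e):
    """Hazard ratio and 95% CI from (O - E) and Var(O - E),
    a sketch of formulas (3.10)-(3.12)."""
    log_hr = o_minus_e / var_o_minus_e            # (3.10) on the log scale
    se_log = math.sqrt(1.0 / var_o_minus_e)       # square root of (3.11)
    half = 1.96 * se_log
    return math.exp(log_hr), (math.exp(log_hr - half), math.exp(log_hr + half))
```

A negative (O − E) gives a hazard ratio below 1, i.e. fewer deaths than expected in the treatment group.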
Extra Formulas

For 6 of the studies, from which 10 meta-analyses were extracted, only the individual trial treatment effects along with their 95% confidence intervals were reported. For these studies, I couldn't directly calculate the individual trial standard errors, and therefore the standard errors are estimated using the following formulas.

We let x− and x+ be the lower and upper bounds respectively of the 95% confidence interval for θ_k. For effect-sizes that require us to work on the log scale, i.e. odds ratios, relative risks and hazard ratios, the standard error in the k-th trial is calculated using the formula

s.e.(Y_k^{HR,RR,OR}) = (1/2) · ( ln(x+) − ln(x−) ) / 1.96.    (3.13)

For differences (continuous outcomes), the standard error in the k-th trial is calculated using the formula

s.e.(Y_k^{RD}) = (1/2) · ( x+ − x− ) / 1.96.    (3.14)
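Back-calculating a standard error from a reported 95% confidence interval, as in (3.13) and (3.14), is a one-line computation; the function name and boolean flag here are my own.

```python
import math

def se_from_ci(lower, upper, log_scale):
    """Standard error from a reported 95% CI (a sketch of (3.13)-(3.14)).
    Set log_scale=True for ratio measures (OR, RR, HR)."""
    if log_scale:
        return (math.log(upper) - math.log(lower)) / (2 * 1.96)  # (3.13)
    return (upper - lower) / (2 * 1.96)                          # (3.14)
```

For instance, the confidence interval [-0.48,-0.18] from example 2 back-calculates to a standard error of roughly 0.077 on the raw scale.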

3.2.3 Software

I used the statistical software STATA v10.1 to perform a random-effects meta-analysis with a 95% prediction interval on each meta-analysis included in the review. The software incorporates formulas (1.7), (1.9) to (1.13), (2.2) and any of the relevant formulas from (3.1) to (3.12). All forest plots in this paper are produced using STATA (see Appendix for the STATA code).

3.3 Results

From 26 studies, I took 36 meta-analyses containing between three and thirty-four randomised controlled trials (median eight trials, IQ range seven trials) and reproduced each meta-analysis using a random-effects model with a 95% prediction interval. The results of all 36 random-effects meta-analyses with a 95% prediction interval are presented in the table in figure 3.2.

Figure 3.2: Main characteristics of studies included in the review (Note: outcome of interest defined as given by the authors, HR = Hazard Ratio, OR = Odds Ratio, RD = Risk Difference, RR = Relative Risk, θ̂ is the random-effects summary estimate, 95% C.I. = 95% confidence interval, I² is the percentage of heterogeneity down to real differences, τ̂² is the estimate of between-study variance, 95% P.I. = 95% prediction interval)
I classified each study into the following groups:
1. Their 95% confidence and prediction intervals contained their null value
2. Their 95% confidence and prediction intervals excluded their null value
3. Their 95% confidence interval excluded the null value but their 95% prediction interval included the null value

For the first type, I found 17 (47.2%) of the meta-analyses had their 95% confidence interval contain their respective null values. For these meta-analyses, the 95% prediction interval will also contain the null value since the 95% confidence interval is a subset of the 95% prediction interval. Focusing on these studies, 6 of them had only three trials, which is the minimum required to calculate a prediction interval. In fact, 11 of these 17 meta-analyses had fewer than ten trials in their analysis, which may explain why their 95% confidence intervals contain their null value, since a random-effects meta-analysis will have low power to detect significant results when there are few studies in the analysis. 2
In study ID 15 30 , the meta-analysis contains only three trials (there were originally four trials but no events occurred in one of the trials, so that trial was discarded from the analysis), yet there is a significant amount of between-study heterogeneity as indicated by the large I² value of 49.4% (suggesting that almost half of the variation in treatment effects is down to real differences) and τ̂² value of 0.3369. The study itself is primarily a randomised controlled trial assessing whether granulocyte-macrophage colony stimulating factor (GM-CSF), administered as prophylaxis to preterm neonates at high risk of neutropenia, reduces sepsis, mortality and morbidity. The authors also carried out a meta-analysis of their trial along with two other published randomised controlled trials to see if there is a treatment benefit. Each trial estimated an odds ratio, with an odds ratio < 1 indicating the treatment is beneficial. The authors used a fixed-effect model, stating "there was no evidence of between-trial heterogeneity", yet the large τ̂² and I² values suggest otherwise, so a random-effects model would be better suited to analyse the data. I obtained a summary estimate of 0.84 (authors obtained 0.94) and a 95% confidence interval of [0.32,2.17] (authors obtained [0.55,1.60]). In both cases, the 95% confidence intervals included the null value, so on average there isn't any evidence at the 5% level that the treatment is beneficial. The authors look to subgroup analysis to analyse the data, but a prediction interval can further explain the results in a way that acknowledges heterogeneity. The 95% prediction interval was calculated to be (0,12655.86]. All the results are presented in a forest plot in figure 3.3.
The 95% prediction interval is extremely large in this case. This occurs because we are using the t-distribution, which accounts for the uncertainty in τ̂², with few studies, resulting in a large value of t_{k−2}, as well as accounting for large between-study heterogeneity. When using a random-effects meta-analysis, we make the assumption that each study is estimating a different treatment effect; if we have studies in the presence of substantial between-study heterogeneity, irrespective of how large they are, we have low power to detect significant results. 2;5

Figure 3.3: Forest plot showing a meta-analysis of randomised controlled trials of GM-CSF for preventing neonatal infections 30
Study ID 17 32 , a meta-analysis of three randomised controlled trials, also has a large 95% prediction interval, given by (0,91064.69], but unlike study ID 15 30 has no evidence of between-study heterogeneity, as suggested by I² and τ̂² values of 0. In this case, the large prediction interval is attributed to the uncertainty in the estimate of τ² since there are too few trials. In these cases, a Bayesian approach to calculating τ̂² may work better. 5;8 The studies with more than ten trials whose 95% confidence and prediction intervals both contained the null value tended to have narrower 95% confidence intervals and, apart from study ID 3c 18 , only slightly included their respective null values.
For the second type, 9 (25%) meta-analyses had both their 95% confidence and prediction intervals exclude their respective null values. In these cases, the prediction interval remains significant at the 5% level even after we have considered the whole distribution of effects. Out of these 9 meta-analyses, 7 had I² and τ̂² values of 0 (or very close to 0) and 1 other meta-analysis had an I² value of 6.1% and a τ̂² value of 0.0027. In the case of these 8 meta-analyses, the 95% prediction intervals are only slightly wider than the 95% confidence intervals. In the general case where a prediction interval slightly increases the width of a random-effects confidence interval and I² and τ̂² are 0 (suggesting no evidence of between-study heterogeneity), a common effect may be assumed, since the impact of heterogeneity is negligible and the extra width in the prediction interval is only attributable to the uncertainty surrounding the estimate of τ² (which is 0 or very close to 0 in these cases).
In study ID 11a 26 , the authors carried out two meta-analyses of individual patient data to investigate the effect of adjuvant chemotherapy in operable non-small-cell lung cancer. The first meta-analysis observed the effect of surgery and chemotherapy against surgery alone on survival, by type of chemotherapy, and the second the effect of surgery, radiotherapy and chemotherapy versus surgery and radiotherapy on survival, by type of chemotherapy. Both meta-analyses were extracted for the review, but the first is the one of interest. The analysis included thirty-four randomised controlled trials, each estimating a hazard ratio where a hazard ratio < 1 indicates survival is better with surgery and chemotherapy. I calculated I² and τ̂² values to be 6.1% (authors calculated 4%) and 0.0027 respectively, indicating little between-study heterogeneity across the trials despite the trials differing by number of patients, drug used, number of cycles, etc. The authors used a fixed-effect model to analyse the data and used a χ² test to investigate any differences in treatment effects across the trials. Using a random-effects meta-analysis, I obtained a summary estimate of 0.86 (authors also obtained 0.86), a 95% confidence interval of [0.80,0.92] (authors obtained [0.81,0.92]) and a 95% prediction interval of [0.75,0.97]; the results are displayed in figure 3.4.
The summary estimate suggests that, on average, survival is better with surgery and chemotherapy compared to surgery alone. The 95% confidence interval didn't contain the null value and is entirely < 1, so there is strong evidence that, on average, survival is better with surgery and chemotherapy. The authors acknowledge this and state, along with their second meta-analysis, "The results showed a clear benefit of chemotherapy with little heterogeneity", but is this always the case? The 95% prediction interval is also entirely < 1, so having now considered the whole distribution of effects, we can say that surgery and chemotherapy will increase survival in at least 95% of brand new individual study settings. I point out that the authors' results, using a fixed-effect meta-analysis, were very similar to my results using a random-effects meta-analysis. Furthermore, the 95% prediction interval is only slightly wider than the 95% confidence interval, which indicates that the impact of between-study heterogeneity is small across all the trials, and there may be justification for using a fixed-effect model. Despite this, a random-effects model is still useful since it accounts for all uncertainty 5 . We've already seen how a prediction interval can be wide (e.g. study ID 15 30 , study ID 17 32 ) if there is uncertainty in the actual estimates, regardless of whether there is evidence of between-study heterogeneity or not.
Figure 3.4: Forest plot showing a meta-analysis of randomised controlled trials assessing the effect of surgery (S) and chemotherapy (CT) versus surgery alone 26
The one other meta-analysis that is yet unaccounted for is study ID 3d 18 . The authors are assessing the use of recombinant tissue plasminogen activator (rt-Pa) for acute ischaemic stroke. They had updated a previous systematic review by adding a new large randomised controlled trial to the analysis. The review contained four meta-analyses, all of which were extracted for the review, but the meta-analysis of interest (study ID 3d) looks at the effect of rt-Pa on symptomatic intracranial haemorrhage (SICH) within 7 days in patients who have suffered an acute ischaemic stroke. The
analysis included twelve randomised controlled trials, each estimating an odds ratio where an odds ratio < 1 indicates rt-Pa reduced development of SICH. The trials used in this study differed by dosage, final follow-up time, stroke type etc., which has resulted in large I² and τ̂² values of 43.4% and 0.2320 respectively. The authors used a standard fixed-effect model and assessed heterogeneity with the χ²-statistic to check for the presence of substantial heterogeneity. Given the large values of I² and τ̂², and observing the treatment effect as well as taking into account the differences between the trials, a random-effects meta-analysis seems more appropriate. So, using a random-effects meta-analysis, I obtained a summary estimate of 3.93 (authors obtained 3.72), a 95% confidence interval of [3.44,6.35] (authors obtained [2.98,4.64]) and a 95% prediction interval of [1.18,13.10]; the results are displayed in figure 3.5.

Figure 3.5: Forest plot showing a meta-analysis of randomised controlled trials assessing the effect of rt-Pa on SICH within 7 days (treatment up to 6 hours) 18
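The computation behind results like these can be sketched as follows: a minimal illustration of a DerSimonian and Laird random-effects meta-analysis with the t-based 95% prediction interval described by Higgins et al. 5 . The function name, its interface and the example data are mine for illustration only, not the trial data.

```python
import math

def random_effects_pi(effects, ses, t_crit):
    """DerSimonian-Laird random-effects meta-analysis with a 95%
    prediction interval. `effects` are treatment effects on an additive
    scale (e.g. log odds ratios); `t_crit` is the 0.975 t-quantile on
    k - 2 degrees of freedom."""
    k = len(effects)
    w = [1 / s ** 2 for s in ses]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                 # DL between-study variance
    wr = [1 / (s ** 2 + tau2) for s in ses]            # random-effects weights
    mu = sum(wi * y for wi, y in zip(wr, effects)) / sum(wr)
    se_mu = math.sqrt(1 / sum(wr))
    ci = (mu - 1.96 * se_mu, mu + 1.96 * se_mu)        # 95% confidence interval
    half = t_crit * math.sqrt(tau2 + se_mu ** 2)       # PI adds tau^2 to the variance
    return mu, tau2, ci, (mu - half, mu + half)
```

For twelve trials the prediction interval uses the 0.975 quantile of a t-distribution on ten degrees of freedom (about 2.228), and exponentiating the end-points returns the results to the odds ratio scale.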
The summary estimate suggests that on average, the odds of developing SICH in the treatment group are 3.93 times the odds of developing SICH in the control group. The 95% confidence interval didn't contain the null value and is entirely > 1, so it provides strong evidence that on average, the treatment is likely to increase the odds of SICH, but it doesn't indicate whether this will always be the case. The 95% prediction interval is entirely > 1, suggesting that the treatment will increase the odds of SICH when carried out in at least 95% of brand new individual settings. Like study ID 11a 26 , the 95% prediction interval remains significant but, unlike study ID 11a, the 95% prediction interval in study ID 3d is much wider than its 95% random-effects confidence interval. Here the impact of between-study heterogeneity is large (in study ID 11a, the impact is low); this can also be seen from the large I² and τ̂² values, which result in the large width of the 95% prediction interval. The impact is such that in some cases, the odds of SICH, when rt-Pa is given, could be as low as 1.18 times the odds in the control group but could be as high as 13.1 times the odds in the control group. The authors, by using a fixed-effect method, fail to acknowledge the potential effects of heterogeneity. They report that "42 more patients were alive and independent, 55 more were alive with a favourable outcome at the end of follow up despite an increase in the number of early symptomatic intracranial haemorrhages and early deaths". Since the odds of SICH in the treatment group could be as high as 13.1, further research could be carried out to identify scenarios when this may occur, since this could increase the number of patients that will have favourable results come the end of follow up.
For the third type, 10 (27.8%) of the meta-analyses had their 95% confidence intervals exclude the null value but had their 95% prediction intervals include the null. In these cases, the 95% prediction intervals are not significant at the 5% level once we have considered the whole distribution of effects. Most of the studies, apart from two, tended to have a significant amount of between-study heterogeneity, with I² values ranging from 22.3% to 62.7% and τ̂² values ranging from 0.022 to 0.098. Two studies had I² and τ̂² values of 0. These were study ID 9 24 , which had 3 trials and justifiably used a fixed-effect method, and study ID 16b 31 , which had 9 trials and used a random-effects meta-analysis, though caution is needed since there are few trials, which can result in the summary estimates carrying large uncertainty.
In study ID 20 35 , the authors are looking at the efficacy of probiotics in the prevention of acute diarrhoea. They carried out a meta-analysis of thirty-four randomised controlled trials, each estimating a relative risk, with a relative risk < 1 indicating the probiotic has a beneficial effect. The authors used a random-effects meta-analysis, acknowledging the potential effects of heterogeneity since the studies differed in many ways, such as study setting, age group, follow-up duration, probiotic administered, dosage etc., which resulted in a large I² value of 62.7% and τ̂² value of 0.0980. I obtained identical results to the authors, a summary estimate of 0.65 and a 95% confidence interval of [0.55,0.78]. Additionally, I obtained a 95% prediction interval of [0.34,1.27]; the results are displayed in figure 3.6.

Figure 3.6: Forest plot of a meta-analysis of randomised controlled trials assessing
the effects of probiotics on diarrhoeal morbidity 35
The summary estimate of 0.65 indicates that on average, the risk of diarrhoeal morbidity in the probiotic group is 0.65 times the risk of diarrhoeal morbidity in the placebo group. The 95% confidence interval is entirely < 1, providing strong evidence that on average, the probiotics are beneficial, but is this always the case? The authors acknowledge heterogeneity first by using a random-effects model and then by carrying out a subgroup and stratified analysis, assessing the effect of age, setting of trial, type of diarrhoea, probiotic strains used, formulation of probiotics administered, influence of setting and quality score of trials. A more formal way of acknowledging heterogeneity is to consider a 95% prediction interval, which I calculated to be [0.34,1.27]. This interval now contains the null value and contains values > 1, so although on average the use of probiotics is beneficial, it may not always be the case in a brand new individual setting; in fact, in some cases it may be harmful, and further research is required to identify these scenarios.
In study 23 38 , the authors are looking at the efficacy and safety of electroconvulsive therapy in depressive disorders. They carried out a meta-analysis of twenty-two randomised controlled trials, each estimating a standardised risk difference where a risk difference > 0 favoured unilateral ECT and a risk difference < 0 favoured bilateral ECT. The authors reported both fixed-effect and random-effects results and acknowledge heterogeneity, since the trials differ by dosage, methods of administration etc., and this can be seen by the I² value of 24.00% and τ̂² value of 0.0286. I obtained slightly different results to the authors when using a random-effects meta-analysis, a summary estimate of -0.34 (authors obtained -0.32) and a 95% confidence interval of [-0.49,-0.20] (authors obtained [-0.46,-0.19]). I also obtained a 95% prediction interval of [-0.73,0.04]; the results are displayed in figure 3.7.
The summary estimate suggests that on average, out of 100 patients, 34 more patients had favourable results in the bilateral group compared to the unilateral group. The 95% confidence interval is entirely < 0, providing strong evidence that on average, the bilateral group is better, but is this always the case? The authors acknowledge heterogeneity by firstly reporting random-effects results and then by carrying out a meta-regression analysis, but considering a prediction interval would be a more formal way of acknowledging heterogeneity. The 95% prediction interval is [-0.73,0.04], which now contains the null value 0 and slightly exceeds it. This suggests that although on average the bilateral group is better, in a brand new individual study setting the bilateral group may not be better, and further research is required to identify such scenarios.

3.4 Discussion

From the 26 studies that entered my review, 36 meta-analyses were extracted and each was reproduced using a random-effects model with a 95% prediction interval. My aim was to see whether or not these intervals had a significant impact on the conclusions of these studies. Most of the studies that I found reported a summary estimate (fixed or random-effects) along with a 95% confidence interval and carried out some type of analysis to assess heterogeneity. An observation worth noting is that none of the studies post 2005 mentioned the idea of predictions in the context of meta-analysis.
Figure 3.7: Forest plot of a meta-analysis of randomised controlled trials assessing
the effect of bilateral versus unilateral electrode placement on depressive symptoms 38
Papers by Ades et al. 8 and Higgins et al. 5 set the foundations for the use of prediction intervals in traditional and Bayesian meta-analysis, showing how presenting one can describe the extent of heterogeneity and how the true individual treatment effects are distributed about the random-effects summary estimate, as well as giving a range within which the true treatment effect in a brand new individual study setting lies. 2;5

3.4.1 Principal Findings

I found that 17 (47.2%) of the 36 meta-analyses had their 95% confidence interval contain the null value. In these cases, the average effect across the trials is not significant at the 5% level and the 95% prediction interval will also include the null value. Presenting a 95% prediction interval in these cases is still useful since it helps describe the distribution of effects across the studies given there is between-study heterogeneity. The other 19 (52.8%) meta-analyses had their 95% confidence interval exclude the null value. In these cases, the average effect is significant at the 5% level, and the aim is to see how many of their 95% prediction intervals now include the null value. I found that 9 of the meta-analyses had their 95% prediction interval exclude the null value whilst the other 10 included the null value. In terms of clinical practice, the prediction interval excluding the null indicates that in 95% of the times the treatment is applied in brand new study settings, the treatment will be beneficial/worse, which is much more useful to clinicians than just reporting the average effect and the uncertainty around it. If the prediction interval includes the null, then although the average effect is beneficial/worse, in some brand new individual study settings the effect may be worse/beneficial. Again, this is much more useful to clinicians and researchers since it reveals the impact of heterogeneity and can motivate further research to identify such cases.
Another way of discussing our results is to consider the size of heterogeneity across the meta-analyses. I reiterate that describing heterogeneity is a key motivation for a prediction interval. If heterogeneity weren't a problem, then we could use a fixed-effect model in all cases, but even the slightest differences between studies must be considered. 2 I found 12 meta-analyses had no evidence of between-study heterogeneity (I² and τ̂² values of 0); only in two of these cases 20;26 did they have more than ten trials. In many of these cases, the authors would tend to use a fixed-effect model, but since there are few studies, we have low power to detect heterogeneity and therefore there may be uncertainty around the I² and τ̂² values. 2 A common effect should be assumed if there is no evidence of between-study heterogeneity and the 95% confidence and prediction intervals are close, suggesting that the impact of heterogeneity is negligible and the uncertainty around the parameters is low (e.g. Study ID 11b 26 ). In some cases, there may seem to be no evidence of heterogeneity, but if there are few studies, the uncertainty around τ̂² can be large, resulting in wide prediction intervals (e.g. Study ID 17 32 ).
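The I² statistic referred to throughout can be obtained directly from Cochran's Q; a minimal sketch (the function name is mine):

```python
def i_squared(q, k):
    """Higgins' I^2 (%): the proportion of total variation across k
    studies attributable to between-study heterogeneity rather than
    chance, truncated at 0 when Q is below its expected value k - 1."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100
```

With few studies, Q (and hence I²) is estimated imprecisely, which is exactly why a prediction interval can be wide even when I² is reported as 0.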
The other 24 meta-analyses had evidence of between-study heterogeneity (I² ranging from 0.30% to 62.90% and τ̂² ranging from 0.0001 to 0.3369). Whilst the random-effects model wasn't always used in these cases, in most of them the authors did carry out some analysis of heterogeneity (e.g. subgroup analysis, meta-regression etc.). The problem that occurs is that if there are few trials in the analysis, the power to detect sources of heterogeneity is low and therefore the analysis lacks precision. 2;11 A prediction interval calculated with few studies will be large (e.g. study ID 15 30 ) and may not be useful from a clinician's point of view since the range of effects is so wide. On the other hand, in study ID 3d 18 , the 95% prediction interval is large yet was entirely above the null value, so even though there is uncertainty about what the effect could be in an individual study setting, we know that in 95% of the times the treatment will have a negative effect (in that case); we just don't know how bad an effect it could be. From a researcher's point of view, large prediction intervals can still have meaning since they reveal the uncertainty surrounding the parameters and therefore may just indicate that more trials, further research or other information (incorporating a Bayesian approach 5;8 ) is required, whereas a 95% confidence interval only tells us the average effect is significant/insignificant, and this result may be imprecise due to the lack of trials.

3.4.2 Limitations

It is important that potential limitations of this review are acknowledged. I decided to only use the Lancet database to search for studies since it is regarded as one of the world's most respected medical journals. I expected each study to be of a high standard in terms of methodology and conduct. Unfortunately, I cannot be sure that this is the case; flaws in procedure at trial level and meta-analysis level can result in error-prone results that may not reflect the true performance of the intervention. 42 In these cases, the prediction interval will be wider since it mixes heterogeneity caused by real differences with heterogeneity as a result of methodological errors. 7 I also only included meta-analyses of randomised controlled trials since such trials cancel the effects of known and unknown confounding factors. I did come across meta-analyses of non-randomised trials (mainly observational studies) but excluded them since they are more influenced by confounders. Whilst randomised controlled trials are held in higher regard relative to observational studies, the jury remains out on whether we would take randomised trials of low or even average quality over high quality observational studies. Stroup et al. 44 state that "inclusion of sufficient detail to allow a reader to replicate meta-analytic methods was the only characteristic related to acceptance for publication", suggesting that high quality observational studies could be considered. I could have extended my search beyond the Lancet to other databases but I felt the Lancet already covered a wide variety of studies. There are also technical limitations to the review that must be addressed. Whilst there was a criterion that every meta-analysis must have at least three randomised controlled trials, with few studies, assumptions made when calculating a prediction interval may become violated. We assume a normal distribution but with few studies, this may be an inappropriate choice. 5 When considering the true treatment effect of a brand new study, I assume the population in this new study is "sufficiently similar" to those already covered in the analysis. If we have few studies, we fail to cover a sufficient range of populations, resulting in a wider prediction interval accounting for large uncertainty. 2;5 I also wasn't specific on what types of outcomes we allowed into the review. There is evidence that suggests certain biases are more likely to arise with subjective outcomes (e.g. favourable outcome (Study ID 3d 18 ), poor outcome (Study ID 2 17 ) or any outcome that requires human input). 45 It may have been more prudent to only consider outcomes such as survival, mortality or continuous outcomes that have no chance of being influenced by an external source.

3.4.3 Comparison with other studies

A related study compiled by Graham et al. 14 explored prediction intervals in meta-analysis. They performed a meta-epidemiological study of binary outcomes from meta-analyses published between 2002 and 2010. Their study included 72 meta-analyses from 70 studies, each containing between 3 and 80 studies, and for each, they calculated a random-effects meta-analysis incorporating the DerSimonian and Laird 12 method and calculated traditional and Bayesian 95% prediction intervals for odds ratios and risk ratios. They found that 50 out of 72 meta-analyses had their 95% random-effects confidence interval for odds ratios exclude the null value; of these, 18 had their 95% prediction intervals exclude the null. They also found that 46 out of the 72 meta-analyses had their 95% random-effects confidence interval for risk ratios exclude the null value; of these, 19 had their 95% prediction intervals exclude the null. They concluded "meta-analytic conclusions may be appropriately signaled by consideration of initial interval estimates with prediction intervals" but also stress that increasing heterogeneity can result in wide prediction intervals and caution must be taken when writing conclusions on a meta-analysis. 14
Comparing my results to theirs, I found fewer meta-analyses had their 95% prediction interval include the null when their 95% confidence interval had excluded it. Their study was larger than mine and they were also able to directly calculate odds ratios and relative risks for each meta-analysis. I worked out the effect size according to the authors of the studies and in some cases couldn't directly work out the summary estimate, since the relevant data wasn't available; only the individual treatment effects along with their 95% confidence intervals were reported.

3.4.4 Final Remarks and Implications

Perhaps focusing only on cases where prediction intervals include the null when their corresponding 95% confidence intervals didn't may somewhat deviate from why a prediction interval is useful. Since we were able to apply a 95% prediction interval to all cases, whether the analysis had high between-study heterogeneity or no between-study heterogeneity, and whether the analysis had few or many trials, I was able to describe the results of a random-effects meta-analysis more accurately since we are considering the whole distribution of effects, even if what I am deducing is that the authors require more trials or further research/information in cases where there are few studies. In the case where there is no evidence of between-study heterogeneity (indicated by I² and τ̂² equal to 0), if we used a random-effects model with a prediction interval and the prediction interval is significantly wider than the random-effects confidence interval, then this suggests there is uncertainty amongst the parameters (e.g. lack of power if there are few studies). If the prediction interval is fairly close to the confidence interval, then this suggests a common effect may exist, since we have considered the whole distribution of effects and the impact of heterogeneity is negligible. If there is evidence of between-study heterogeneity, then a prediction interval can reveal the impact of between-study heterogeneity, which is useful to clinicians/researchers regardless of whether the average effect is significant. I therefore believe a 95% prediction interval should be presented in every random-effects meta-analysis to enhance the interpretation of its results, but I stress the need for the analysis to have a sufficient number of good quality, unbiased randomised controlled trials.

Chapter 4
Prediction intervals in Meta-Epidemiological studies
It seems widely agreed that systematic reviews which contain a meta-analysis of randomised controlled trials provide the strongest and most reliable evidence of the effects of health care interventions, since they use systematic and explicit methods to summarise all the evidence to answer a research question of interest. 1;42;46 Unfortunately, they are not impervious to bias: if the meta-analysis is biased or includes biased trials, the results from the meta-analysis will incorporate these biases, resulting in either an over- or underestimation of the summary treatment effect, which can lead to misleading conclusions about how well the intervention works. 42;46
In the process of a systematic review, when the relevant trials are searched for, we must make sure that all of the evidence (published and unpublished) is searched for so we can get the most accurate results. There is evidence supporting the fact that published studies are more likely to report statistically significant results and more likely to report larger treatment effects; moreover, published studies are more likely to be used in a systematic review and therefore a meta-analysis, which can lead to a biased summary treatment effect in a meta-analysis (publication bias). 2;47 Furthermore, randomised controlled trials themselves are in danger of bias if there are imperfections in their methodological properties, i.e. there wasn't proper allocation concealment, a lack of blinding etc. 46 If we were to calculate a prediction interval in the presence of bias, heterogeneity accounting for real differences mixes with heterogeneity caused by these biases, resulting in a much wider prediction interval. 7 Other biases that can arise are citation bias, language bias, cost bias etc. 2 The fundamental idea here is that bias must be assessed to make the conclusions of a meta-analysis more robust; failure to acknowledge it can result in misleading results.

4.1 Meta-Epidemiological Study

A way in which we can inspect bias is to carry out a meta-epidemiological study, which assesses the influence of trial characteristics on the treatment effect estimates in a meta-analysis. 43;42;46 A meta-epidemiological study will assess a specific trial characteristic by carrying out a meta-analysis on summary effects from a collection of meta-analyses (essentially a 'meta-analysis of meta-analyses'). 43;42;46 Like a normal meta-analysis, a meta-epidemiological study should describe the distribution of all evidence, describe any heterogeneity between the meta-analyses, inspect associated risk factors and identify and control bias.
The first time meta-epidemiology surfaced was in an editorial in the BMJ by David Naylor 48 , in 1997, where cautions were raised concerning the summary effect of a meta-analysis. The author mentions how meta-analyses can generate "inflated and unduly precise" estimates if biases exist. He also refers to evidence stating that statistically significant outcomes were more likely to be published than non-significant ones and adds "readers need to examine any meta-analyses critically to see whether researchers have overlooked important sources of clinical heterogeneity among the included trials". In 2002, meta-epidemiology was defined, by Sterne et al. 46 , as a statistical method to "identify and quantify the influence of study level characteristics". In 2007, the method was generalised in a systematic review conducted by OARSI (Osteoarthritis Research Society International). 49 This has resulted in many published meta-epidemiological studies, which can be found on the internet, such as on the BMJ website. These types of studies have provided strong evidence that flaws in trial characteristics lead, on average, to exaggeration of intervention effect estimates and in turn increase heterogeneity. 42

4.2 Prediction Intervals in Meta-Epidemiological Studies

The aim of this chapter is to apply a 95% prediction interval to meta-epidemiological studies. Meta-epidemiological studies will use either a fixed-effect or a random-effects model and report a summary estimate with a 95% confidence interval. They still, however, need to describe the extent of heterogeneity that exists across all the evidence, so the inclusion of a prediction interval can help formally describe it.
We searched for meta-epidemiological studies on the website of the British Medical Journal (www.bmj.com). We used the advanced search toolbar and used the keyword "META EPIDEMIOLOGICAL" in text, abstract and title in all articles in all years. Any meta-epidemiological study looking at a trial characteristic was eligible as long as we were able to carry out their meta-analysis ourselves. We took 4 studies at random and carried out their meta-epidemiological meta-analysis using a random-effects meta-analysis with a 95% prediction interval, using the formulas (1.9 to 1.13) and (2.2). In all 4 of the examples we use, we estimated the standard errors using formulas (3.13) or (3.14), depending on the outcome of interest, since we couldn't work them out directly.
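The study's formulas (3.13) and (3.14) are not reproduced here, but the generic back-calculation they represent, recovering a standard error from a reported symmetric 95% confidence interval and working on the log scale for ratio measures, can be sketched as follows (the function name is mine):

```python
import math

def se_from_ci(lower, upper, ratio_scale=False, z=1.96):
    """Back-calculate a standard error from a reported 95% confidence
    interval. Ratio measures (odds/risk ratios) are handled on the log
    scale, where the interval is symmetric about the point estimate."""
    if ratio_scale:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * z)
```

For example, `se_from_ci(-0.17, -0.01)` would recover the standard error of a mean difference, while `se_from_ci(0.5, 2.0, ratio_scale=True)` would recover the log-scale standard error of an odds ratio.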

4.2.1 Example 1

A trial characteristic that can influence the estimates of individual trial treatment effects is the status of the study centre, i.e. whether it is carried out in a single centre or in multiple centres. Bafeta et al. 50 carried out a meta-epidemiological study with the aim of comparing estimates of intervention effects between single centre and multicentre randomised controlled trials on continuous outcomes. They address a previous study that concluded the effects of interventions using binary outcomes are larger in single centre randomised controlled trials compared to multicentre ones 51 and address a paper by Bellomo et al. 52 , who state single centre trials often contradict multicentre trials. The authors included 26 meta-analyses with a total of 292 randomised controlled trials (177 in single centres and 115 in multicentres) with continuous outcomes that were published between January 2007 and January 2010 in the Cochrane database of systematic reviews (which they state as having "high methodological quality"). They ignored meta-analyses of non-randomised trials, IPD meta-analyses, meta-analyses where all trials were only single centre or only multicentre, and any meta-analysis that had fewer than 5 randomised controlled trials. They used the risk of bias tool as recommended by the Cochrane Collaboration 3 to assess risk of bias from individual reports for each trial. For each meta-analysis, they used a random-effects meta-analysis incorporating the DerSimonian and Laird estimate for τ² to combine treatment effects across the trials and assessed heterogeneity using χ² and I². The authors then estimated a standardised mean difference between single centre and multicentre trials using a random-effects meta-regression to incorporate potential heterogeneity between trials. They then synthesised these using a random-effects model and used the I² and Q-test to assess between-meta-analysis heterogeneity. A standardised mean difference < 0 indicates that single centre trials, on average, showed larger treatment effects than multicentre trials. They calculated a summary estimate of -0.09 with a 95% confidence interval of [-0.17,-0.01] with low between-meta-analysis heterogeneity (I² and τ̂² values of 0). We obtained the same random-effects summary estimate of -0.09 and the same 95% confidence interval of [-0.17,-0.01]; additionally, we calculated a 95% prediction interval of [-0.18,0.00]. The results are shown in the forest plot below in figure 4.1.
The summary estimate (-0.09) indicates that on average, single centre trials produced
a larger estimate of the intervention effect than multicentre trials. Since the 95%
confidence interval ([-0.17,-0.01]) is entirely < 0, there is strong evidence that on
average, single centre trials show a larger effect than multicentre trials looking at the
Figure 4.1: Forest plot of a meta-epidemiological analysis assessing the difference
in intervention effect estimates between single centre and multicentre randomised
controlled trials 50
same intervention, but is this always the case? The authors report "on average single centre trials with continuous outcomes showed slightly larger intervention effects than multicentre" and acknowledge between-meta-analysis heterogeneity and risk of bias by using subgroup and sensitivity analyses, but a 95% prediction interval can describe all the uncertainty more formally. The calculated 95% prediction interval ([-0.18,0.00]) now includes the null value 0 but doesn't exceed it, and is only slightly wider than the 95% random-effects confidence interval, revealing that the impact of heterogeneity is low. We can say that, after considering the whole distribution of effects, in at least 95% of the times, the effect in a multicentre trial will never be strictly larger than the corresponding effect in a single centre trial, but we cannot rule out that the effects might be the same. We mirror the authors' view that further research is needed to investigate
potential causes of these differences.

4.2.2 Example 2

Nuesch et al. 53 carried out a meta-epidemiological study to examine whether or not excluding patients from the analysis of randomised controlled trials is associated with biased estimates of treatment effects and whether or not it causes heterogeneity between trials. They address evidence that departures from protocol and losses to follow-up in randomised controlled trials can lead to exclusion of patients from the final analysis, and such handling of these patients leads to treatment effects that differ systematically from the true treatment effects. 54;55 Such bias is termed attrition bias 56 or selection bias, and this study aims to see how it affects the summary effects in a meta-analysis and whether it increases between-study heterogeneity. The authors include 14 meta-analyses, with a total of 167 trials (39 with all patients in the analysis, 128 where some patients were excluded). Eligible meta-analyses were those of randomised/quasi-randomised trials in patients with osteoarthritis of the knee or hip that reported a non-binary patient reported outcome (e.g. pain intensity) and assessed any intervention against placebo or a non-intervention control. If a meta-analysis only included trials that had patient exclusions, or only trials where there were no exclusions, it was ignored. Within each meta-analysis, the authors used a random-effects meta-analysis to calculate a summary effect for trials with and trials without exclusions before deriving differences between them. A difference < 0 suggests trials with exclusions have a more beneficial treatment effect. These differences were then synthesised using a random-effects meta-analysis which the authors state "fully accounted for variability in bias between meta-analyses", and they estimate τ² as a measure of between-study heterogeneity. They obtained a summary estimate of -0.13 with a 95% confidence interval of [-0.29,0.04] with what they consider to be high between-meta-analysis heterogeneity, indicated by a τ̂² value of 0.07. We obtained the same random-effects summary estimate of -0.13 but a different confidence interval of [-0.31,0.05], noticing an error in the 3rd meta-analysis in the forest plot presented in the paper. We also obtained an I² value of 78.2% and a slightly larger τ̂² value of 0.0811, as well as a 95% prediction interval of [-0.78,0.52]. The results are shown in the forest plot below in figure 4.2.
The summary estimate (-0.13) indicates that on average, trials with exclusions produce a larger estimate of the treatment effect compared to those without exclusions. The 95% confidence interval ([-0.31,0.05]) contains the null value, so the average isn't significant (nor is the authors' 95% confidence interval). However, both our and the authors' 95% confidence intervals suggest there is evidence (albeit non-significant at the 5% level) that on average, patient exclusion leads to more beneficial treatment effects. This may have led the authors to report that "excluding patients from the analysis of randomised trials often resulted in biased estimates of treatment effects, but the
Figure 4.2: Forest plot of a meta-epidemiological analysis assessing the difference in
effect sizes between trials with and without exclusions of patients from analysis 50
extent and direction of bias remained unpredictable in a specific situation” and recommend “results from intention to treat analysis should always be described in reports of
randomised trials”. They acknowledge the large between-meta-analysis heterogeneity
by carrying out stratified analysis but a 95% prediction interval can reveal the full
uncertainty around the summary estimate. The calculated 95% prediction interval
([-0.78,0.52]) is fairly wide since it is accounting for the large between-meta-analysis
heterogeneity (indicated by I 2 and τ 2 values of 78.2% and 0.0811 respectively). I
ˆ
can say that after considering the whole distribution of effects, although on average
it seems as though studies with exclusions lead to more beneficial treatment effect,
analysis where the trials have no patient exclusions could quite easily have a more
beneficial treatment effect compared to those where there are exclusions. Here, the
impact of heterogeneity is much more evidential than the 95% confidence interval and
further reveals in a brand new situation, the chance of a trial with exclusion being
better than a trial without exclusions is unpredictable. Possible reasons for such unpredictability could be down to the fact the analysis had a combined 39 trials without
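This unpredictability can be quantified under the random-effects model: a new setting's true difference is approximately normally distributed around the summary estimate with variance τ̂² + SE², so the chance that it falls below zero (trials with exclusions looking more beneficial) is easy to compute. The sketch below is a rough normal approximation, not a calculation from the paper: it reuses the reported θ̂ = -0.13 and τ̂² = 0.0811 and back-derives the standard error from the 95% confidence interval, assuming a 1.96 multiplier.

```python
from statistics import NormalDist

theta = -0.13                          # summary estimate (difference in effect sizes)
tau2 = 0.0811                          # estimated between-meta-analysis variance
se = (0.05 - (-0.31)) / (2 * 1.96)     # SE back-derived from the 95% CI [-0.31, 0.05]

pred_sd = (tau2 + se ** 2) ** 0.5      # predictive standard deviation for a new setting
# probability that a new setting's difference is below 0,
# i.e. trials with exclusions appear more beneficial there
p_exclusions_better = NormalDist(theta, pred_sd).cdf(0.0)
```

Under these assumptions the probability comes out at only about two thirds, so in roughly one new setting in three the direction would reverse, which is exactly why reporting the summary estimate alone is misleading here.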
Prediction Intervals Provide Valuable Insight in Meta-Analyses

  • 1. The use of Prediction Intervals in Meta-Analysis Nikesh Patel March 28, 2013
  • 2. Abstract Background Systematic reviews containing meta-analyses of randomised controlled trials provide the best and most reliable information on health care interventions. Meta-analysis combines treatment effects from included studies to produce overall summary results. In the fixed-effect analysis, a common effect is assumed whereas in a random-effects analysis, the model allows for between-study heterogeneity. The goal of analysing heterogeneous studies is not only to report a summary estimate but to explain the observed differences. Whilst a random-effects model remains gold standard for analysing heterogeneous studies, solely reporting the summary estimate and its 95% confidence interval masks the potential effects of heterogeneity. A 95% prediction interval, which takes into the account the full uncertainty surround the summary estimate, describes the whole distribution of effects in a random-effects model, the degree of betweenstudy heterogeneity and conveniently gives a range for which we are 95% sure that the treatment effect in a brand new study lies within. Aims I aim to apply a 95% prediction interval to a collection of meta-analyses of randomised controlled trials and observe the impact it has on their outcomes. I also aim to apply a 95% prediction interval to meta-epidemiological studies which assesses the influence of trial characteristics on the treatment effect estimates in meta-analyses. Results I carried out an empirical review to look at the impact of 95% prediction intervals on existing meta-analyses of randomised controlled trials on the Lancet. From 26 studies, I extracted 36 meta-analyses containing between three and thirty-four randomised controlled trials (median eight, IQ range seven) and reproduced each using a randomeffects model with a 95% prediction interval. 
I found 19 (52.8%) had significant 95% confidence intervals of which 10 (27.8%) had insignificant 95% prediction intervals, 9 (25%) had significant 95% prediction intervals. Also, 95% prediction intervals were applied to 4 meta-epidemiological studies revealing extra information concerning the summary estimates.
  • 3. Conclusion Every random-effects meta-analysis should include a 95% prediction interval but for best performance, the analysis should include a sufficient number of good quality unbiased randomised controlled trials. To enhance quality and robustness of metaepidemiological studies, a 95% prediction interval should be included. 2
  • 4. Contents 1 Introduction 1.1 Systematic Review . . . . . . . . . . . . . . . 1.2 Meta-Analysis . . . . . . . . . . . . . . . . . . 1.3 Fixed-Effect Meta-Analysis . . . . . . . . . . . 1.4 Carrying out a Fixed-Effect Meta-Analysis . . 1.5 Heterogeneity . . . . . . . . . . . . . . . . . . 1.6 Random-Effects Meta-Analysis . . . . . . . . . 1.7 Carrying out a Random-Effects Meta-Analysis 1.8 Fixed-Effect v Random-Effects . . . . . . . . . . . . . . . . . 3 3 4 5 6 9 11 12 14 2 Prediction Interval 2.1 95% Prediction Interval . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Calculating a Prediction Interval . . . . . . . . . . . . . . . . . . . . 2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 18 18 20 3 Empirical review of the impact of using prediction isting meta-analyses 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Search Strategy and Selection Criteria . . . . 3.2.2 Data Calculations . . . . . . . . . . . . . . . . 3.2.3 Software . . . . . . . . . . . . . . . . . . . . . 3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Principal Findings . . . . . . . . . . . . . . . 3.4.2 Limitations . . . . . . . . . . . . . . . . . . . 3.4.3 Comparison with other studies . . . . . . . . . 3.4.4 Final Remarks and Implications . . . . . . . . . . . . . . . . . . . 22 22 23 23 24 27 27 36 37 39 40 40 4 Prediction intervals in Meta-Epidemiological studies 4.1 Meta-Epidemiological Study . . . . . . . . . . . . . . . . . . . . . . . 4.2 Prediction Intervals in Meta-Epidemiological Studies . . . . . . . . . 42 43 43 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . intervals on ex. . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  • 5. 4.3 4.2.1 Example 4.2.2 Example 4.2.3 Example 4.2.4 Example Discussion . . . 1 2 3 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 46 48 50 52 5 Final Discussion and Conclusion 53 A STATA Codes 57 2
  • 6. Chapter 1 Introduction In health care and medicine, clinicians, researchers and other important figures require quality and accurate information to assist them in being able to make the best possible decisions on health care interventions. Such information is normally found in systematic reviews containing meta-analyses of randomised controlled trials. 1 The aim of this paper is to investigate the use of prediction intervals in meta-analysis, a typical statistical component of a systematic review and how its application can help aid interpretation of meta-analysis results to a higher degree of quality and accuracy. 1.1 Systematic Review Since the 1990s, systematic reviews have become very important in medicine and health care. Reasons for this are down to the sheer volume of medical literature produced annually and the requirement for clinicians and other health care officials to have up to data quality and accurate information on health care interventions. 1 The objective of a systematic review is to present a balanced and impartial summary of all the available research on a well-defined research question. 1 It uses systematic and explicit methods to identify, assess, select and synthesise all the evidence that is relevant to answering a well-defined research questions in an objective and unbiased manner. Systematic reviews have replaced traditional narrative reviews since the former does not follow peer-protocol, do not use any kind of rigorous methods and tend to lack transparency causing bias; a systematic review corrects these issues. 2 A systematic review begins by clearly defining a research question of interest, this may include what treatments are being compared, what outcomes are being measured, what is the population of interest etc. The next step is to search for studies that are relevant to the research question, this is done by searching all of the published and unpublished information against a well-defined quality search criterion which can 3
  • 7. involve searching databases such as MEDLINE, PubMed etc. The studies which pass through the search criterion go through further quality assessment to remove any irrelevant studies. The next step is extract all the relevant data from the included studies and then carry out a statistical synthesis of the data which is done using meta-analysis (see Meta-Analysis). The final step is to present all the findings from the analysis as well as analysing any possible heterogeneity between the studies, commenting on the quality of the studies (e.g. bias) and identifying areas of further research. 1 Examples of systematic reviews can be found easily on the internet on websites of the British Medical Journal (BMJ) or the Cochrane Collaboration and many more. These websites dedicate themselves to provide information on health care interventions to the health care and medicine industry. A robust methodology for preparing and producing systematic reviews can be found on these websites for example, The Cochrane handbook for systematic reviews of interventions. 3 1.2 Meta-Analysis “The Statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings” Gene V. Glass definition of Meta-Analysis A meta-analysis is a statistical technique whereby results from studies, included in the analysis, are combined to produce a total and complete summary of the studies. In epidemiology, a stereotypical systematic review of randomised controlled trials will use meta-analysis as its statistical component whereby treatment effects from individual trials are synthesised in the aim of assessing clinical effectiveness of healthcare interventions. 4 Meta-analysis is based on one of two models, the fixed-effect and the random-effects model. In this chapter, I discuss both models and when each type should be used. 
It first seems appropriate to address the reasons why we would want to use a meta-analysis rather than the traditional narrative approach. In a narrative approach, the focus tends to be on the p-values of the individual studies and on observing whether there is a significant effect in each study. Since there is no rigorous way of synthesising p-values, the findings from a narrative approach tend to lack transparency and, in many cases, researchers may only include studies that support their own opinions, which biases the results towards those opinions. [1;2] A meta-analysis, on the other hand, works directly with the treatment effect of each study and its standard error and performs one single synthesis of all the data to produce an aggregate summary estimate, which I denote θ̂. [4] Since we combine the information across the studies, we reduce uncertainty relative to any individual study: we are increasing the sample size and, in turn, increasing the power to detect clinically meaningful results. [2]

A meta-analysis also addresses the consistency of treatment effects across the studies, something a narrative approach fails to do. If the treatment effects are consistent, then the focus is on the summary estimate and on estimating it as accurately as possible. If the treatment effects are not consistent, then not only should we estimate the summary effect but we should also explain the differences that exist between the studies. [2;5] Treatment effects are generally much more important to clinicians and other health care officials than p-values. The effect size tells us not only whether the treatment effect is better or worse (i.e. greater or less than the null value), but also the magnitude of the effect. Moreover, p-values can easily be misinterpreted, as some researchers may take a non-significant p-value to suggest that the treatment has no effect. [2] I return to the argument of a narrative approach against a meta-analysis when I consider an example (see page 8).

A vital requirement for a strong meta-analysis is a well-conducted systematic review. If the underlying systematic review is not well conducted, the meta-analysis will produce results that may lead to misleading conclusions. [1;2] The meta-analysis itself should also be well conducted; once again I recommend the Cochrane Handbook for Systematic Reviews for guidance on how to conduct a good meta-analysis. [3]

1.3 Fixed-Effect Meta-Analysis

The first type of meta-analysis I discuss is the fixed-effect meta-analysis. The fixed-effect model assumes that all the studies included in the analysis are estimating the same underlying treatment effect; in other words, we believe the true treatment effect is common across the studies and each study is estimating that same true treatment effect. The repercussion of this model is that any differences observed between the individual treatment effects are due solely to random sampling error (within-study error).
If we had an infinite number of studies, each with an infinitely large sample size, we would expect the within-study error in each study to tend to zero and the individual treatment effects to coincide with the true common treatment effect. [2] In the fixed-effect model, we can express the observed treatment effects in the following way:

Y_k = \theta + \varepsilon_k ,    (1.1)

where Y_k is the observed treatment effect in study k, θ is the common treatment effect and ε_k is the random sampling error in study k. We can assume that the errors follow a normal distribution with mean 0 and variance equal to the variance of the treatment effect in study k, i.e. ε_k ∼ N(0, Var(Y_k)). Here the errors account for the within-study error in each study, since in the fixed-effect model we assume this is the only source of variation. [2]

For the fixed-effect meta-analysis, the aim is to compute the summary estimate θ̂, which is interpreted as the best estimate of the common treatment effect that underlies each of the studies in the analysis, along with a 95% confidence interval.

1.4 Carrying out a Fixed-Effect Meta-Analysis

A general approach to meta-analysis is given by the inverse-variance method; this method works for any type of data as long as we can obtain a treatment effect and its standard error. [2] For continuous data we need a mean difference (or some other kind of difference), for survival data we need a log hazard ratio, and for binary outcomes we need a log odds ratio or log relative risk, along with their respective standard errors (standard errors on the log scale for ratios). In the fixed-effect model, the weight assigned to each study is one over the variance of that study, hence the term inverse-variance method. Studies with smaller variances are assigned larger weights than studies with larger variances. The fixed-effect inverse-variance weight is therefore given by

W_k = \frac{1}{\mathrm{Var}(Y_k)} ,    (1.2)

where Var(Y_k) is the variance of the observed treatment effect in study k. The formula for θ̂ using a fixed-effect model is

\hat{\theta} = \frac{\sum_{k=1}^{n} W_k Y_k}{\sum_{k=1}^{n} W_k} ,    (1.3)

which has variance

\mathrm{Var}(\hat{\theta}) = \frac{1}{\sum_{k=1}^{n} W_k} .    (1.4)

Here W_k is the inverse-variance weight given by (1.2). I note that θ̂ is the maximum likelihood estimate of θ and is asymptotically unbiased, efficient and normal. [6] I reiterate that θ̂ should be interpreted as the best estimate of the common treatment effect, since the fixed-effect model assumes that each of the studies in the analysis is estimating the same treatment effect.
We also calculate a 95% confidence interval to express the uncertainty around our summary estimate θ̂, assuming that θ̂ is approximately normally distributed, using the following formula:

\hat{\theta} \pm 1.96\,\mathrm{s.e.}(\hat{\theta})    (1.5)

If we are working on the log scale, i.e. we are using some type of ratio, we must remember to exponentiate θ̂ from (1.3) and the end points of the confidence interval in (1.5). I could also present a 100(1 − α)% confidence interval but, by convention, I only calculate 95% confidence intervals in this paper.

Example 1

Table 1.1, presented below, shows the results from ten randomised controlled trials, each comparing the benefit of an anti-hypertensive treatment, treatment A, against placebo. Each trial is presented with its unbiased estimated mean difference in change in systolic blood pressure (mmHg), its variance and a 95% confidence interval. [7]

Trial (k)    Y_k     Var_k    95% C.I.
1           -0.49    0.12     [-1.17, 0.19]
2           -0.17    0.05     [-0.61, 0.27]
3           -0.52    0.06     [-1.00, -0.04]
4           -0.48    0.14     [-1.21, 0.25]
5           -0.26    0.06     [-0.74, 0.22]
6           -0.36    0.08     [-0.91, 0.19]
7           -0.47    0.05     [-0.91, 0.03]
8           -0.30    0.02     [-0.58, -0.02]
9           -0.15    0.07     [-0.67, 0.37]
10          -0.28    0.25     [-1.26, 0.70]

Table 1.1: Results of trials comparing treatment A against placebo (a value < 0 represents a reduction in blood pressure and is therefore beneficial)

Using a fixed-effect model, we weight each study using (1.2) and then obtain a summary estimate of the treatment effect along with a 95% confidence interval. Using (1.3), the summary estimate θ̂ is -0.33, so we expect treatment A to consistently reduce systolic blood pressure by 0.33 mmHg. The 95% confidence interval, calculated using (1.5), is [-0.48, -0.18]. Since the null value of 0 is not contained in the 95% confidence interval for θ̂, there is strong evidence at the 5% level that treatment A is effective in reducing systolic blood pressure. The results are presented in the forest plot given below in figure 1.1.
Figure 1.1: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment A on reducing systolic blood pressure (SMD = standardised mean difference) [7]

On the forest plot, the squares represent the weights assigned to the corresponding studies, with the centre of each square depicting the observed treatment effect for that study. The 95% confidence interval for each study is represented by the horizontal line running through its square, beginning and ending at the end points of the interval. The diamond at the bottom of the forest plot represents the 95% confidence interval of the summary estimate, with its centre representing the summary estimate itself.

I now return to the argument for using a meta-analysis over a narrative approach. If we observe the forest plot in figure 1.1, eight trials have a confidence interval that contains the null value 0 and therefore have non-significant p-values. If we took a narrative approach and considered each study separately, we would most likely conclude that, since 80% of the studies produced non-significant p-values, the treatment is not beneficial. When we perform a meta-analysis, the 95% confidence interval for the summary estimate does not contain the null value and we therefore obtain a significant p-value, since we have increased the power to detect significant results. [2]
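The Example 1 fixed-effect calculation can be reproduced in a few lines. The sketch below (plain Python, an illustration rather than part of the original analysis) applies the inverse-variance formulas (1.2)–(1.5) to the Table 1.1 data.

```python
import math

# Observed mean differences Y_k and variances Var(Y_k) from Table 1.1
y   = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
var = [ 0.12,  0.05,  0.06,  0.14,  0.06,  0.08,  0.05,  0.02,  0.07,  0.25]

w = [1.0 / v for v in var]                            # inverse-variance weights (1.2)
theta = sum(wi * yi for wi, yi in zip(w, y)) / sum(w) # summary estimate (1.3)
se = math.sqrt(1.0 / sum(w))                          # from the variance (1.4)
ci = (theta - 1.96 * se, theta + 1.96 * se)           # 95% confidence interval (1.5)

print(round(theta, 2))                   # -0.33
print(round(ci[0], 2), round(ci[1], 2))  # -0.48 -0.18
```

This recovers the summary estimate of -0.33 and the interval [-0.48, -0.18] reported above.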
1.5 Heterogeneity

In the fixed-effect model, we assumed that all the studies in the analysis are estimating the same treatment effect and that the only error we allow for is random sampling error (within-study heterogeneity), but is this always a plausible assumption? In general, studies looking at the same treatment may differ in many ways, such as patient characteristics (age, patient health, etc.), location of study, intervention applied (dosage, etc.) and many more known and unknown factors, causing the treatment effects across the studies to no longer remain consistent. [2] If the treatment effects are no longer consistent, then real differences exist between the studies (between-study heterogeneity) and the aim of a meta-analysis should be to assess the heterogeneity between the treatment effects as well as to calculate a summary estimate. [2;5;8] If we used a fixed-effect method in the presence of between-study heterogeneity, we would be wrongly implying that a common effect exists, which would lead to misleading conclusions about the treatment.

I now discuss ways in which we can assess heterogeneity. Since the total variation is made up of real differences (between-study heterogeneity) and random sampling error (within-study error), we need some tools to help us see whether between-study heterogeneity is present. I first introduce the Q-statistic, which is based on the result of the Q-test. This test is useful if we believe the presence of between-study heterogeneity is causing more variation in the treatment effects than would be expected from random sampling error alone. [2;9] The Q-test is defined as follows:

H_0: θ_1 = θ_2 = · · · = θ_n (all n studies share the same true treatment effect)
H_1: at least one θ_k differs,

where Y_k is the observed treatment effect in study k and W_k is the fixed-effect weight of study k. The Q-statistic, given by the formula

Q = \sum_{k=1}^{n} W_k Y_k^2 - \frac{\left(\sum_{k=1}^{n} W_k Y_k\right)^2}{\sum_{k=1}^{n} W_k} ,    (1.6)

is compared to χ²_{n−1}(α). If Q > χ²_{n−1}(α), then we reject the null hypothesis at the 100α% significance level, which suggests there is evidence of between-study heterogeneity. If Q < χ²_{n−1}(α), then we fail to reject the null hypothesis, which suggests there is no evidence of between-study heterogeneity. [2;9]

Another useful statistic is the I²-statistic, which measures approximately the percentage of total variation that is due to between-study heterogeneity. [9] It is given by the following formula:
I^2 = 100\% \times \frac{Q - (n-1)}{Q}    (1.7)

where Q is the Q-statistic worked out using (1.6) and n is the number of studies. If I² is 0%, this suggests that all the variability in our summary estimate is due to random sampling error (within-study heterogeneity) and not to between-study variation, and therefore it could make sense to use a fixed-effect model. I² values of 25%, 50% and 75% are considered by Higgins et al. to be low, moderate and high respectively. [2;9] If we obtain a negative value for I², the value is set to 0 and interpreted in the same way as 0.

I must stress that both the Q-test and the I²-statistic should be used as tools to help us decide which model to use; the decision should not be based solely on the Q-test and the I²-statistic, since they are not precise. Considering the Q-test, while a significant p-value suggests that variation exists between the individual treatment effects, a non-significant p-value does not necessarily mean a common effect exists; the lack of significance may be the result of a lack of power. If there are few trials, or there is a lot of within-study error because the trials have small sample sizes, then even the presence of a large amount of between-study heterogeneity may result in a non-significant p-value. [2] When there are few studies, a significance level of 10% is often used because of this lack of power, so a p-value strictly less than 0.1 would be enough to reject the null hypothesis that no between-study heterogeneity exists. The I²-statistic itself depends on the Q-statistic, so if the Q-test lacks power then I² will be imprecise. Also, I² may tell us what proportion of the variation is due to real differences, but it does not tell us how spread out those differences are. A high value of I² implies that a high proportion of the variation is due to real differences, but these differences may be spread over only a narrow range if the studies have high precision.
Conversely, a low I² only implies that a low proportion of the variation is due to real differences; it does not imply that the effects are grouped together in a narrow range, as they could easily vary over a wide range if the studies lack precision. [2] Higgins, in his paper [10], discusses the misunderstanding of the I²-statistic and believes it should only be used as a descriptive statistic.

Example 1 (Continued)

I now apply both the Q-test and the I²-statistic to example 1 to see whether conducting a fixed-effect analysis was appropriate. Conducting a Q-test using (1.6) leads to a Q-statistic of 2.490, which is compared to χ²_9(0.10) = 14.684 (we use a 10% level of significance, since we only have a few studies). Since our test statistic satisfies 2.490 < 14.684, there is no statistical evidence against H_0 at the 10% level of significance. This suggests there is no sign of between-study heterogeneity. I also work out the I²-statistic: here the I² value is −261.385% using (1.7), which is set to 0, suggesting that the total variation across the studies is due only to within-study error. If we observe the forest plot in figure 1.1, it is fairly clear that the observed treatment effects do not deviate far from the summary estimate, so using a fixed-effect model seems appropriate and I can regard the summary estimate as the common effect.

If we conclude that between-study heterogeneity is present, we cannot use the fixed-effect model; we instead use the random-effects model, which is discussed next. I briefly discuss two alternatives that try to eradicate all between-study heterogeneity, which can be ideal from a researcher's perspective. The first is sub-group analysis, in which a series of fixed-effect meta-analyses is performed, one on each sub-group, where the studies in each group are deemed similar enough to assume a common effect. The problems with sub-group analysis are that each sub-group contains fewer studies, so we lose power; that instead of carrying out one synthesis we are doing several; and that we still are not guaranteed that a sufficient amount of between-study heterogeneity will be removed. [2] The second option is meta-regression, where covariates in the model explain the variation in the data and we can obtain the treatment effect for each covariate while adjusting for the others. A problem with this method is that unidentified sources of heterogeneity are not accounted for. [11] A problem inherent in both alternatives is that with few studies neither is useful, because of the loss of power; in the case of meta-regression, for example, we have low power to detect which covariates explain the heterogeneity. [2;11]
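To make the Q and I² calculations concrete, here is a short illustrative sketch (plain Python, not from the original thesis) implementing (1.6) and (1.7) and checking them against the Example 1 figures.

```python
def q_statistic(y, w):
    """Cochran's Q from (1.6): sum(W*Y^2) - (sum(W*Y))^2 / sum(W)."""
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swy2 = sum(wi * yi * yi for wi, yi in zip(w, y))
    return swy2 - swy * swy / sum(w)

def i_squared(q, n):
    """I^2 from (1.7), truncated at zero when Q < n - 1."""
    return max(0.0, 100.0 * (q - (n - 1)) / q)

# Example 1 data (Table 1.1): effects and inverse-variance weights
y   = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
var = [ 0.12,  0.05,  0.06,  0.14,  0.06,  0.08,  0.05,  0.02,  0.07,  0.25]
w = [1.0 / v for v in var]

q = q_statistic(y, w)
print(round(q, 2))           # 2.49, well below chi^2_9(0.10) = 14.684
print(i_squared(q, len(y)))  # 0.0, since Q < n - 1
```

The truncation at zero in `i_squared` mirrors the convention described above of setting negative I² values to 0.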
1.6 Random-Effects Meta-Analysis

The second type of meta-analysis I discuss is the random-effects meta-analysis. This model assumes that the individual treatment effects vary across the studies because of the presence of real differences (between-study heterogeneity) as well as random sampling error. A random-effects model assumes that the true effects of the individual studies come from a distribution of true effects with mean θ and variance equal to the magnitude of the between-study heterogeneity, which I denote τ² and term the between-study variance (we can usually assume a normal distribution). The repercussion of this model is that if we had an infinite number of studies with infinitely large sample sizes, we would expect the random sampling error to tend to zero but still expect the individual treatment effects to differ, because of the real differences that exist between them. [2;5] In the random-effects model, we can express the observed treatment effects in the following way:
Y_k = \theta + \zeta_k + \varepsilon_k ,    (1.8)

where θ is the mean of the distribution of true effects, ε_k is the sampling error in study k and ζ_k is the between-study error in study k. We again assume that ε_k ∼ N(0, Var(Y_k)), and we assume that ζ_k ∼ N(0, τ²). Here the errors account for both the within-study error and the between-study error, since in the random-effects model we allow for two sources of variation. [2]

For the fixed-effect meta-analysis, the aim was to compute the summary estimate θ̂, interpreted as the best estimate of the common treatment effect underlying each of the studies, along with a 95% confidence interval. For the random-effects meta-analysis, computing the summary estimate and its 95% confidence interval alone is insufficient. Since we assume real differences exist between the treatment effects, the aim of a random-effects meta-analysis is not only to compute the summary estimate but also to explain the differences that exist between the trials and to learn how the individual treatment effects are distributed about the summary estimate. [2;5] I note that the summary estimate θ̂ is now interpreted as the average effect.

1.7 Carrying out a Random-Effects Meta-Analysis

To carry out a random-effects meta-analysis, we first need to estimate the between-study variance, since it describes the magnitude of the between-study heterogeneity, and this has to be incorporated into the calculation of the summary estimate θ̂. To estimate τ², we use the DerSimonian and Laird method, which provides an unbiased point estimate of τ². [12] This is given by the following formula:

\hat{\tau}^2 = \frac{Q - (n-1)}{\sum_{k=1}^{n} W_k - \sum_{k=1}^{n} W_k^2 \big/ \sum_{k=1}^{n} W_k}    (1.9)

where Q is the Q-statistic calculated using (1.6) and the W_k are the weights for each study from the fixed-effect meta-analysis, calculated using (1.2). I note that should Q < n − 1, we set τ̂² = 0.
If our point estimate of the between-study variance is zero (implying no between-study heterogeneity), then the random-effects model reduces to the fixed-effect model. Similar to the fixed-effect model, we use the inverse-variance method to weight the individual studies. In the fixed-effect model, since we assume each study is estimating the same common effect, the study with the highest precision is given the largest weight, since it contains the most information about the true summary effect θ. In a random-effects model, the weighting must be treated with more care, since each study is no longer estimating the same treatment effect. [2] The weights must now take into account the estimate of the between-study variance τ̂², so that the study with the largest precision does not have as much influence as it would under a fixed-effect model. So, in a random-effects model, the weight given to each study is

W_k^* = \frac{1}{\mathrm{Var}(Y_k) + \hat{\tau}^2} .    (1.10)

The formula for θ̂ using a random-effects model is

\hat{\theta} = \frac{\sum_{k=1}^{n} W_k^* Y_k}{\sum_{k=1}^{n} W_k^*} ,    (1.11)

which has variance

\mathrm{Var}(\hat{\theta}) = \frac{1}{\sum_{k=1}^{n} W_k^*} .    (1.12)

I reiterate that θ̂ should be interpreted as the average or mean treatment effect and not the common effect, since by using a random-effects model I am assuming that the true effects of the studies are distributed about the mean of a distribution of true effects, and θ̂ is the estimate of this mean. I also note that the true treatment effect in an individual study could be lower or higher than this average effect. A 95% confidence interval for θ̂ is given by

\hat{\theta} \pm 1.96\,\mathrm{s.e.}(\hat{\theta}) .    (1.13)

Example 2

Table 1.2, presented below, shows the results from ten randomised trials, each comparing the benefit of another anti-hypertensive treatment, treatment B, against placebo. Each trial is presented with its unbiased estimated mean difference in change in systolic blood pressure (mmHg), its variance and a 95% confidence interval. [7]
Trial (k)    θ_k     Var_k    95% C.I.
1            0.00    0.423    [-0.829, 0.829]
2            0.10    0.219    [-0.329, 0.529]
3           -0.40    0.026    [-0.451, -0.349]
4           -0.80    0.199    [-1.190, -0.410]
5           -0.63    0.301    [-1.220, -0.040]
6           -0.22    0.301    [-0.370, 0.810]
7           -0.34    0.071    [-0.480, -0.201]
8           -0.51    0.102    [-0.710, -0.310]
9           -0.03    0.122    [-0.209, 0.269]
10          -0.81    0.301    [-1.340, -0.220]

Table 1.2: Results of trials comparing treatment B against placebo (a value < 0 represents a reduction in blood pressure and is therefore beneficial)

I first test for heterogeneity to help decide which type of meta-analysis to use. We obtain a Q-statistic of 30.876 > χ²_9(0.10) = 14.684 using (1.6), which suggests evidence of heterogeneity at the 10% level of significance. I also obtained an I² value of 70.85% using (1.7), which suggests that 70.85% of the variation in treatment effects is due to between-study heterogeneity, with the rest due to chance. This is considered a high level of between-study heterogeneity, so a random-effects meta-analysis seems appropriate.

Using formulas (1.9) through (1.13), I obtained a τ̂² of 0.029 and a summary estimate of -0.33, with a 95% confidence interval of [-0.48, -0.18]. So, on average, treatment B reduced systolic blood pressure by 0.33 mmHg, although in an individual study the treatment effect can vary from this average; since the null value of 0 is not in the 95% confidence interval, there is strong evidence at the 5% level that treatment B is, on average, beneficial. A forest plot of the results from the meta-analysis is shown in figure 1.2. Unlike in figure 1.1, there are clear deviations of the individual treatment effects from the summary estimate, so it seems appropriate to assume that each trial is estimating a different treatment effect and to use a random-effects model to account for this.
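A sketch of the DerSimonian and Laird estimate (1.9) and the random-effects pooling (1.10)–(1.13) is given below (illustrative Python, not part of the original analysis). Applied to the Example 1 data, where Q < n − 1, it returns τ̂² = 0 and collapses to the fixed-effect result, as noted at the start of section 1.7.

```python
import math

def dersimonian_laird(y, var):
    """DerSimonian-Laird estimate of tau^2 from (1.9), truncated at zero."""
    w = [1.0 / v for v in var]
    sw = sum(w)
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swy2 = sum(wi * yi * yi for wi, yi in zip(w, y))
    q = swy2 - swy * swy / sw                     # Q-statistic (1.6)
    return max(0.0, (q - (len(y) - 1)) / (sw - sum(wi * wi for wi in w) / sw))

def random_effects(y, var):
    """Random-effects summary estimate and 95% CI via (1.10)-(1.13)."""
    tau2 = dersimonian_laird(y, var)
    w = [1.0 / (v + tau2) for v in var]           # random-effects weights (1.10)
    theta = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return tau2, theta, (theta - 1.96 * se, theta + 1.96 * se)

# Example 1 data: Q < n - 1, so tau^2 = 0 and the random-effects
# analysis reduces to the fixed-effect one
y   = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
var = [ 0.12,  0.05,  0.06,  0.14,  0.06,  0.08,  0.05,  0.02,  0.07,  0.25]
tau2, theta, ci = random_effects(y, var)
print(tau2, round(theta, 2))  # 0.0 -0.33
```

With heterogeneous data such as Example 2, `dersimonian_laird` returns a positive τ̂², which widens the confidence interval relative to the fixed-effect analysis.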
1.8 Fixed-Effect v Random-Effects

Figure 1.2: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment B on reducing systolic blood pressure (SMD = standardised mean difference) [7]

It is imperative that, when conducting a meta-analysis, the right model is chosen, since it influences how we interpret the results. If we look at examples 1 (figure 1.1 on page 8) and 2 (figure 1.2 on page 15), both produce the same summary estimate of -0.33 and the same 95% confidence interval of [-0.48, -0.18]. Despite these similarities, the ways in which they are interpreted are very different. In example 1, I used a fixed-effect model, which I justified because I believed there were no real differences between the studies, so the summary estimate is the common effect across the studies. In example 2, I decided to use a random-effects model, since I believed the variation between the individual treatment effects was due to real differences as well as random sampling error; I therefore regard the summary estimate as the average across the studies, while in an individual study the treatment effect can vary from this average effect.

Despite these differences, there still seems to be some misunderstanding when it comes to choosing which model to use and interpreting the results. Riley et al. [7] reviewed 44 Cochrane reviews that wrongly interpreted θ̂ as the common effect rather than the average effect when using a random-effects approach. They also reviewed 31 Cochrane reviews that used a fixed-effect meta-analysis and found that 26 of these had I² values of 25% or more without justifying why a fixed-effect model was used. Using a fixed-effect model in these situations must be justified, otherwise we end up drawing inaccurate conclusions from the results, since we are suggesting there is a single common effect when in fact no common treatment effect exists because of real differences among the studies. One reason for misinterpretation may be that the forest plots for examples 1 and 2 present the results in the same way, which causes confusion. Skipka et al. [13] point this out, and also note that the point estimate of τ² is never displayed on the forest plot.

I have already commented that the choice of model should not be based solely on the Q-test and the I²-statistic, but how then should we go about choosing? Say we wish to carry out a meta-analysis on a sufficient number of studies looking at some treatment against placebo. If we know these studies have a sufficient number of properties in common, for example a similar age range, similar dosage and similar follow-up time, then it would seem appropriate to use a fixed-effect model, since we believe there are negligible real differences between the studies and any factors that do affect the treatment effects are the same across the studies. A common procedure is to carry out a fixed-effect meta-analysis and observe the forest plot to see whether the observed treatment effects are similar. [2] There are two problems with this: firstly, it is not clear whether the observed differences are due only to random sampling error, and secondly, if this was the incorrect model, then carrying it out was a waste of time. If we believe there are real differences, then a random-effects model should be implemented. In this model, each study is expected to be estimating a different treatment effect, and the job of this type of meta-analysis is to make sense of the differences between the studies and of how the true individual treatment effects are distributed about the summary estimate. [2;5]
A clear advantage of a random-effects meta-analysis is that we can generalise our results to a range of populations not included in the analysis, given that the analysis includes a sufficient number of studies; this may be one of the goals of the underlying systematic review. [2;5] If we wanted to estimate what the treatment effect would be in a new study, we can draw it from our results, as long as we can describe with adequate precision how the individual treatment effects are distributed about the summary estimate. [5] In a fixed-effect model, we cannot generalise, since our results are exclusive to certain properties, for example a particular population. [2]
Chapter 2

Prediction Interval

In the presence of between-study heterogeneity, the aim of a meta-analysis is not just to calculate the summary estimate but also to make sense of the heterogeneity. I have already pointed out that methods of eradicating all heterogeneity can be difficult because of unknown sources of heterogeneity, so it seems better to assess heterogeneity rather than try to remove it. Higgins [10] believes any amount of heterogeneity is acceptable provided there are "sound predefined eligibility criteria" and the "data is correct", but stresses that a meta-analysis must provide a stern assessment of heterogeneity. Since a random-effects meta-analysis accounts for unidentified sources of heterogeneity [7], I believe it should be the gold standard for analysing heterogeneous data.

Unfortunately, once researchers have carried out a random-effects meta-analysis, they tend to focus on the summary estimate and its 95% confidence interval. This, however, is insufficient since, by the assumption of a random-effects model, we allow for real differences between the individual studies. [2;7] If we were using a fixed-effect model, then focusing on the summary estimate, which gives the best estimate of the common effect, and its 95% confidence interval, which describes the impact of within-study heterogeneity on the summary estimate, would be adequate. The random-effects summary estimate tells us the average effect across the studies, and its 95% confidence interval indicates the region in which we are 95% sure that our estimate lies; neither tells us how the individual treatment effects are distributed about the random-effects summary estimate. [5] This leads us to the prediction interval, which is discussed in this chapter.
2.1 95% Prediction Interval

A 95% prediction interval gives the range in which we are 95% sure that the potential treatment effect of a brand new individual study lies. The beauty of a prediction interval is that not only does it quantitatively give a range for the treatment effect in a new study, thus allowing researchers, clinicians and others to apply the results to future applications, but it also offers a suitable way to express the full uncertainty around the summary estimate in a way that acknowledges heterogeneity. A prediction interval can also describe how the true individual treatment effects are distributed about the summary estimate. [2;5;7;13] For these reasons, the inclusion of a prediction interval in a random-effects meta-analysis can make its conclusions more robust and provide a more complete summary of the results, thereby making the results more relevant to clinical practice. [14]

The notion of a prediction interval was first proposed by Ades et al. [8], who propose a predictive distribution for a future treatment effect in a brand new study using a Bayesian approach to meta-analysis. A further push for the prediction interval in meta-analysis is seen in a paper by Higgins et al. [5]. The authors acknowledge the small amount of attention that has been given to prediction in meta-analysis and present the prediction interval in a classical (frequentist) framework. Higgins et al. [10;5] believe that a prediction interval is the most convenient way to present the findings of a random-effects meta-analysis in a way that acknowledges heterogeneity, since it takes into account the full distribution of effects in the analysis.

2.2 Calculating a Prediction Interval

When calculating a prediction interval, we account not only for the between-study and within-study heterogeneity, but also for the uncertainty in the summary estimate θ̂ and the uncertainty in the between-study variance τ̂². [2]
Suppose we knew the true values of the summary effect θ and the between-study variance τ². If we made the assumption that the treatment effects across the studies are normally distributed, the 95% prediction interval would be given by

\theta \pm 1.96\sqrt{\tau^2} .    (2.1)

The problem with (2.1) is that we do not know the exact values of θ and τ²; rather, we are estimating them, and because of this there is uncertainty surrounding these estimates. [2] To account for this, we use the following formula, provided by Higgins et al. [5], for a 95% prediction interval:
\hat{\theta} \pm t_{n-2}^{0.05} \sqrt{\hat{\tau}^2 + \mathrm{Var}(\hat{\theta})}    (2.2)

Here, θ̂ is the summary estimate from the random-effects meta-analysis; Var(θ̂) is the variance of the summary estimate, accounting for the uncertainty in the estimate of θ; τ̂² is the estimate of the between-study variance; and t_{n−2}^{0.05} is the critical value for a two-sided 5% significance level from the t-distribution with n − 2 degrees of freedom (where n is the number of studies), which accounts for the uncertainty in the estimate of τ². [2;5] We require at least three studies to calculate a prediction interval [7], and we must also remember to exponentiate the end points of (2.2) if we are working on the log scale.

Example 2 with a Prediction Interval

In example 2, I used a random-effects model and found the summary estimate to be -0.33 mmHg, the between-study variance τ̂² to be 0.029 and the 95% confidence interval for the summary estimate to be [-0.48, -0.18] (see figure 1.2). I can now calculate a prediction interval for example 2 using (2.2): I obtained the interval [-0.76, 0.09]. We notice that the null value of 0 is now inside the prediction interval, so the result is not statistically significant at the 5% level. So, in a brand new individual study setting, we are 95% sure that the potential treatment effect for this study will lie between -0.76 mmHg and 0.09 mmHg. Although on average the treatment is beneficial (as indicated by the 95% confidence interval), in a single study setting we cannot rule out that the treatment may actually be harmful (since the 95% prediction interval contains values > 0). The prediction interval therefore acknowledges the impact of heterogeneity that was masked by focusing only on the random-effects summary estimate and its 95% confidence interval. A forest plot for example 2, now including a 95% prediction interval shown at the bottom of the plot, is given in figure 2.1.
The centre of the diamond represents the random-effects summary estimate, the width of the diamond represents the 95% confidence interval for the summary estimate, and the width of the lines extending through the diamond represents the 95% prediction interval. Skipka et al. 13 discuss different methods that have been proposed for presenting a prediction interval in a forest plot. They also suggest that the inclusion of a prediction interval in a forest plot is a good way of distinguishing a random-effects forest plot from a fixed-effect one. Throughout this paper, I will present 95% prediction intervals in forest plots as shown in figure 2.1.
Figure 2.1: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment B on reducing systolic blood pressure with a 95% prediction interval (SMD = standardised mean difference) 7

2.3 Discussion

It is important that I address a few issues that arise when working with a prediction interval. A problem shared by the prediction interval and the random-effects meta-analysis itself occurs when the analysis has few studies. If we have few studies, regardless of how large they are, the prediction interval will be wide because of the lack of precision in the DerSimonian and Laird estimate of τ² (using (1.9)). 2;5 If our meta-analysis contains few studies and has substantial between-study heterogeneity, a random-effects meta-analysis remains the correct option, but an alternative approach could be to estimate τ² with a Bayesian approach instead of the DerSimonian and Laird method, which is sensitive to the number of studies in the analysis. A Bayesian approach uses prior information from outside the studies to calculate an estimate of τ². This approach has the advantage of naturally allowing for the full uncertainty around all the parameters in the model and of incorporating information that may not be considered in a frequentist model. The approach, however, can be difficult to implement and could be prone to bias. I refer to papers by Higgins et al. 5 and Ades et al. 8 which provide a more thorough description of the Bayesian approach to prediction intervals.

Another problem that occurs with a small number of studies is the validity of the assumption, made when calculating a prediction interval, that the population in a new study is "sufficiently similar" to those in the studies already included in the analysis. In a random-effects meta-analysis, since we allow for real differences, each study will differ in many ways; the more studies we have, the broader the range of populations we cover, thus supporting this assumption. 5 We also assume that each study has a low risk of bias, i.e. that each study included in the analysis has been carried out under good conduct. If this is not the case, the prediction interval will inherit heterogeneity caused by these biases. 7

Finally, it seems worthwhile to make absolutely clear the difference between a random-effects 95% confidence interval and a 95% prediction interval. A 95% confidence interval in a random-effects meta-analysis gives the region in which we are 95% sure that the summary estimate (regarded as the average effect) lies. The width of the confidence interval accounts for the error in the summary estimate, and with an infinite number of infinitely large studies, the end points of the confidence interval would tend to the summary estimate. 2 A common mistake is to take the 95% confidence interval from a random-effects meta-analysis as measuring the extent of heterogeneity, but this is wrong, since it only accounts for the error in the summary estimate. 5 A 95% prediction interval gives the region in which we are 95% sure that the potential treatment effect in a brand new individual study lies.
Another way to describe a 95% prediction interval is that we can draw the potential treatment effect in a new study, denoted y_new, from it with 95% certainty, since the prediction interval describes how the true individual treatment effects are distributed about the summary estimate. 5 If we had an infinite number of infinitely large studies, we would expect the width of the prediction interval to reflect the actual variation between the true treatment effects. 2 Since the 95% prediction interval accounts for all the uncertainty, it will never be narrower than its corresponding 95% random-effects confidence interval, so we can regard the 95% random-effects confidence interval as a subset of the 95% prediction interval.
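To put numbers to the few-studies problem discussed above (the heterogeneity and standard-error values here are made up for illustration), the half-width of interval (2.2) is the t critical value times √(τ̂² + Var(θ̂)), and that critical value falls sharply as studies are added:

```python
import math

# Two-sided 5% t critical values for the degrees of freedom used below.
T_CRIT_95 = {1: 12.706, 3: 3.182, 8: 2.306, 18: 2.101}

def pi_width(tau2, se_theta, n):
    """Full width of the 95% prediction interval from formula (2.2)."""
    return 2 * T_CRIT_95[n - 2] * math.sqrt(tau2 + se_theta ** 2)

# Identical (illustrative) heterogeneity and summary-estimate precision,
# varying only the number of studies n:
widths = {n: round(pi_width(tau2=0.03, se_theta=0.08, n=n), 2)
          for n in (3, 5, 10, 20)}
# The width shrinks from roughly 4.8 at n = 3 towards 0.8 at n = 20.
```

With three studies the interval is several times wider than with twenty, even though nothing about the evidence in each study has changed; only the uncertainty in τ̂² has.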
Chapter 3

Empirical review of the impact of using prediction intervals on existing meta-analyses

3.1 Introduction

A random-effects meta-analysis should remain the gold standard for analysing heterogeneous studies, but solely presenting the summary estimate from the random-effects meta-analysis and its 95% confidence interval masks the potential effects of heterogeneity. 7 The addition of a prediction interval gives a more complete summary of the results from a random-effects meta-analysis in a way that acknowledges heterogeneity, making them easier to apply to clinical practice. 5 A 95% prediction interval, with enough studies, can describe the distribution of true treatment effects and therefore gives a range within which we can be 95% sure that the potential treatment effect in a brand new study, y_new, lies. 2;5

The aim of this review is to assess the impact of a 95% prediction interval on the outcomes of existing meta-analyses of randomised controlled trials. I want to see if the inclusion of a 95% prediction interval can help interpret the results of a random-effects meta-analysis more accurately and therefore recommend whether or not a random-effects meta-analysis should always include a 95% prediction interval in its analysis.
3.2 Methods

3.2.1 Search Strategy and Selection Criteria

To find the studies for the review, I electronically searched for studies on the Lancet website (www.lancet.com). I used the Lancet since it is one of the oldest and most respected medical journals and has vast amounts of medical literature. I used the advanced search toolbar on the Lancet website with the key words "RANDOMISED TRIAL" and "META ANALYSIS" in the abstract of all research, reviews and seminars in all years in all Lancet journals. The search was carried out on 20/12/2011 and produced 61 studies. For each study, I initially obtained a PDF file of the study plus any supplementary material using ScienceDirect via access through the University of Birmingham student portal.

The eligibility criterion for the studies to enter the review was that each study must include at least one meta-analysis of three or more randomised controlled trials on its primary outcomes as defined by the authors of the study. Of the 61 studies, I reviewed their abstracts to remove any irrelevant studies. I excluded studies that only contained a meta-analysis of non-randomised controlled trials (e.g. observational studies), since I am only interested in meta-analyses of randomised controlled trials, whereby patients are randomly assigned to the treatment or control group. Randomised controlled trials cancel the effects of known and unknown confounding factors as well as selection bias. 2 I also excluded studies that had a meta-analysis of fewer than three randomised controlled trials, three being the minimum number of trials required to calculate a prediction interval. 7

In the case where a meta-analysis contained a mixture of randomised and non-randomised controlled trials, I took the meta-analysis of the randomised controlled trials only if the authors had explicitly presented it alongside the overall meta-analysis; if they only presented a meta-analysis covering all randomised and non-randomised trials, the study was excluded. I also excluded any studies that didn't display data by trial. Other reasons for study exclusion were that some of the studies were only randomised controlled trials and not meta-analyses, some studies were informative studies or research papers on meta-analysis, and a couple of studies were network meta-analyses, which were removed since they are potentially more subject to error than typical meta-analyses. I also came across studies that were duplicates, for which I only considered the most recent study. The flow chart given below in figure 3.1 describes the process. The boxes contain the reasons for excluding studies and the numbers give how many studies were removed for each reason.
Figure 3.1: Flow chart describing the process of excluding studies for the review

3.2.2 Data Calculations

I had a total of 26 studies that passed my eligibility criteria to enter the review. From these studies, I extracted 36 meta-analyses containing between three and thirty-four randomised controlled trials. For each meta-analysis, I reproduced the analysis using a random-effects model (using formulas (1.9) to (1.13)) with a 95% prediction interval (using formula (2.2)), as well as calculating the I²-statistic (using formula (1.7)). For 20 of the studies, from which 26 meta-analyses were extracted, I could directly calculate the individual trial treatment effects and their variances (log variance if the effect size of interest is a ratio). For these, the individual treatment effects are calculated using
the following formulas, depending on the relevant outcome of interest. We define the following:

a = number of events in the treatment group
b = number of events in the control group
N_T = total number of patients in the treatment group
N_C = total number of patients in the control group
c = N_T − a
d = N_C − b

Odds Ratio

The odds ratio for trial k is given by 2

Y_k^OR = (a · d)/(b · c)  (3.1)

and its logarithm has variance

Var(ln(Y_k^OR)) = 1/a + 1/b + 1/c + 1/d.  (3.2)

A 95% confidence interval for the odds ratio in the k-th trial is given by

exp( ln(Y_k^OR) ± 1.96 √Var(ln(Y_k^OR)) ).  (3.3)

Relative Risk

The relative risk for trial k is given by 2

Y_k^RR = (a · N_C)/(b · N_T)  (3.4)

and its logarithm has variance

Var(ln(Y_k^RR)) = 1/a + 1/b − 1/N_T − 1/N_C.  (3.5)

A 95% confidence interval for the relative risk in the k-th trial is given by
exp( ln(Y_k^RR) ± 1.96 √Var(ln(Y_k^RR)) ).  (3.6)

Risk Difference

The risk difference for trial k is given by 2

Y_k^RD = a/N_T − b/N_C  (3.7)

and has variance

Var(Y_k^RD) = (a/N_T)(1 − a/N_T)/N_T + (b/N_C)(1 − b/N_C)/N_C.  (3.8)

A 95% confidence interval for the risk difference in the k-th trial is given by

Y_k^RD ± 1.96 √Var(Y_k^RD).  (3.9)

Hazard Ratio

To calculate the hazard ratio for the k-th trial, we require the difference between the observed and expected deaths (O − E) and its variance Var(O − E). 15 The hazard ratio is given by

Y_k^HR = exp( (O − E)/Var(O − E) )  (3.10)

and its logarithm has variance

Var(ln(Y_k^HR)) = 1/Var(O − E).  (3.11)

A 95% confidence interval for the hazard ratio in the k-th trial is given by

exp( ln(Y_k^HR) ± 1.96 √Var(ln(Y_k^HR)) ).  (3.12)
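The per-trial calculations in (3.1) to (3.9) can be sketched as one helper (the variable names are mine; the report performs these calculations in STATA):

```python
import math

def two_by_two(a, b, n_t, n_c):
    """Effect sizes from a 2x2 table: a, b are events and n_t, n_c the group
    sizes in the treatment and control arms; c and d are the non-events.

    Returns (estimate, variance) pairs, where the variance is that of the
    log estimate for the ratio measures, (3.2) and (3.5), and of the
    estimate itself for the risk difference, (3.8).
    """
    c, d = n_t - a, n_c - b
    p_t, p_c = a / n_t, b / n_c
    return {
        "OR": ((a * d) / (b * c), 1/a + 1/b + 1/c + 1/d),          # (3.1),(3.2)
        "RR": ((a * n_c) / (b * n_t), 1/a + 1/b - 1/n_t - 1/n_c),  # (3.4),(3.5)
        "RD": (p_t - p_c,
               p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c),     # (3.7),(3.8)
    }

def ratio_ci(estimate, var_log, z=1.96):
    """(3.3)/(3.6): build the 95% CI on the log scale, then exponentiate."""
    half = z * math.sqrt(var_log)
    return (math.exp(math.log(estimate) - half),
            math.exp(math.log(estimate) + half))
```

For example, `two_by_two(10, 20, 100, 100)` gives an odds ratio of 800/1800 ≈ 0.44 and a relative risk of 0.5, each paired with its log-scale variance ready for pooling.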
Extra Formulas

For 6 of the studies, from which 10 meta-analyses were extracted, only the individual trial treatment effects along with their 95% confidence intervals were reported. For these studies, I couldn't directly calculate the individual trial standard errors, so the standard errors are estimated using the following formulas. We let x− and x+ be the lower and upper bounds respectively of the 95% confidence interval for θ_k. For effect sizes that require us to work on the log scale, i.e. odds ratios, relative risks and hazard ratios, the standard error in the k-th trial is calculated using the formula

s.e.(Y_k^{OR,RR,HR}) = (ln(x+) − ln(x−)) / (2 × 1.96).  (3.13)

For differences (continuous outcomes), the standard error in the k-th trial is calculated using the formula

s.e.(Y_k) = (x+ − x−) / (2 × 1.96).  (3.14)

3.2.3 Software

I used the statistical software STATA v10.1 to perform a random-effects meta-analysis with a 95% prediction interval on each meta-analysis included in the study. The software incorporates formulas (1.7), (1.9) to (1.13), (2.2) and any of the relevant formulas from (3.1) to (3.12). All forest plots produced in this paper were created using STATA (see Appendix for the STATA code).

3.3 Results

From 26 studies, I took 36 meta-analyses containing between three and thirty-four randomised controlled trials (median eight trials, IQ range seven trials) and reproduced each meta-analysis using a random-effects model with a 95% prediction interval. The results of all 36 random-effects meta-analyses with 95% prediction intervals are presented in the table in figure 3.2.
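The standard-error back-calculation in formulas (3.13) and (3.14) amounts to dividing the (log) CI width by 2 × 1.96; a minimal sketch (the function name is mine):

```python
import math

def se_from_ci(lower, upper, log_scale):
    """Estimate a trial's standard error from its reported 95% CI.

    log_scale=True  applies (3.13) for ratio measures (OR, RR, HR);
    log_scale=False applies (3.14) for differences.
    """
    if log_scale:
        return (math.log(upper) - math.log(lower)) / (2 * 1.96)
    return (upper - lower) / (2 * 1.96)
```

For example, a reported odds ratio CI of [0.5, 2.0] implies a log-scale standard error of ln(4)/3.92 ≈ 0.354.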
Figure 3.2: Main characteristics of studies included in the review (Note: outcome of interest defined as given by the authors; HR = hazard ratio, OR = odds ratio, RD = risk difference, RR = relative risk; θ̂ is the random-effects summary estimate; 95% C.I. = 95% confidence interval; I² is the percentage of heterogeneity down to real differences; τ̂² is the estimate of between-study variance; 95% P.I. = 95% prediction interval)
I classified each study into one of the following groups:

1. Its 95% confidence and prediction intervals contained the null value
2. Its 95% confidence and prediction intervals excluded the null value
3. Its 95% confidence interval excluded the null value but its 95% prediction interval included the null value

For the first type, I found that 17 (47.2%) of the meta-analyses had their 95% confidence interval contain their respective null values. For these meta-analyses, the 95% prediction interval will also contain the null value, since the 95% confidence interval is a subset of the 95% prediction interval. Focusing on these studies, 6 of them had only three trials, which is the minimum required to calculate a prediction interval. In fact, 11 of these 17 meta-analyses had fewer than ten trials in their analysis, which may explain why their 95% confidence intervals contain their null value, since a random-effects meta-analysis has low power to detect significant results when there are few studies in the analysis. 2

In study ID 15 30 , the meta-analysis contains only three trials (there were originally four trials, but no events occurred in one of the trials, so that trial was discarded from the analysis), yet there is a significant amount of between-study heterogeneity, as indicated by the large I² value of 49.4% (suggesting that almost half of the variation in treatment effects is down to real differences) and τ̂² value of 0.3369. The study itself is primarily a randomised controlled trial assessing whether granulocyte-macrophage colony stimulating factor (GM-CSF), administered as prophylaxis to preterm neonates at high risk of neutropenia, reduces sepsis, mortality and morbidity. The authors also carried out a meta-analysis of their trial along with two other published randomised controlled trials to see if there is a treatment benefit. Each trial estimated an odds ratio, with an odds ratio < 1 indicating the treatment is beneficial.
The authors used a fixed-effect model, stating "there was no evidence of between-trial heterogeneity", yet the large τ̂² and I² values suggest otherwise, so a random-effects model would be better suited to analyse the data. I obtained a summary estimate of 0.84 (the authors obtained 0.94) and a 95% confidence interval of [0.32,2.17] (the authors obtained [0.55,1.60]). In both cases, the 95% confidence intervals included the null value, so on average there isn't any evidence at the 5% level that the treatment is beneficial. The authors look to subgroup analysis to analyse the data, but a prediction interval can further explain the results in a way that acknowledges heterogeneity. A 95% prediction interval was calculated to be (0,12655.86]. All the results are presented in a forest plot in figure 3.3.

The 95% prediction interval is extremely large in this case. This occurs because we are using the t-distribution, which accounts for the uncertainty in τ̂², with few studies, resulting in a large value of t_{n−2}, as well as accounting for large between-study heterogeneity. When using a random-effects meta-analysis, we make the assumption
Figure 3.3: Forest plot showing a meta-analysis of randomised controlled trials of GM-CSF for preventing neonatal infections 30

that each study is estimating a different treatment effect; if we have few studies in the presence of substantial between-study heterogeneity, irrespective of how large they are, we have low power to detect significant results. 2;5

Study ID 17 32 , a meta-analysis of three randomised controlled trials, also has a large 95% prediction interval, given by (0,91064.69], but unlike study ID 15 30 has no evidence of between-study heterogeneity, suggested by I² and τ̂² values of 0. In this case, the large prediction interval is attributed to the uncertainty in the estimate of τ², since there are too few trials. In such cases, a Bayesian approach to calculating τ̂² may work better. 5;8 The studies with more than ten trials that had both their 95% confidence and prediction intervals contain the null value tended to have narrower 95% confidence intervals and, apart from study ID 3c 18 , only slightly include their respective null values.

For the second type, 9 (25%) meta-analyses had both their 95% confidence and prediction intervals exclude their respective null values. In these cases, the prediction interval remains significant at the 5% level even after we have considered the whole distribution of effects. Of these 9 meta-analyses, 7 had I² and τ̂² values of 0 (or very close to 0), and 1 other meta-analysis had an I² value of 6.1% and a τ̂²
value of 0.0027. For these 8 meta-analyses, the 95% prediction intervals are only slightly wider than the 95% confidence intervals. In the general case where a prediction interval only slightly increases the width of a random-effects confidence interval and I² and τ̂² are 0 (suggesting no evidence of between-study heterogeneity), a common effect may be assumed, since the impact of heterogeneity is negligible and the extra width of the prediction interval is attributable only to the uncertainty surrounding the estimate of τ² (which is 0 or very close to 0 in these cases).

In study ID 11a 26 , the authors carried out two meta-analyses of individual patient data to investigate the effect of adjuvant chemotherapy in operable non-small-cell lung cancer. The first meta-analysis observed the effect of surgery and chemotherapy against surgery alone on survival by type of chemotherapy, and the second the effect of surgery, radiotherapy and chemotherapy versus surgery and radiotherapy on survival by type of chemotherapy. Both meta-analyses were extracted for the review, but the first is the one of interest. The analysis included thirty-four randomised controlled trials, each estimating a hazard ratio, where a hazard ratio < 1 indicates survival is better with surgery and chemotherapy. I calculated the I² and τ̂² values to be 6.1% (the authors calculated 4%) and 0.0027 respectively, indicating little between-study heterogeneity across the trials despite the trials differing by number of patients, drug used, number of cycles, etc. The authors used a fixed-effect model to analyse the data and used a χ² test to investigate any differences in treatment effects across the trials. Using a random-effects meta-analysis, I obtained a summary estimate of 0.86 (the authors also obtained 0.86), a 95% confidence interval of [0.80,0.92] (the authors obtained [0.81,0.92]) and a 95% prediction interval of [0.75,0.97]; the results are displayed in figure 3.4.
The summary estimate suggests that on average, survival is better with surgery and chemotherapy compared to surgery alone. The 95% confidence interval didn't contain the null value and is entirely < 1, so there is strong evidence that on average, survival is better with surgery and chemotherapy. The authors acknowledge this and state, along with their second meta-analysis, "The results showed a clear benefit of chemotherapy with little heterogeneity", but is this always the case? The 95% prediction interval is also entirely < 1, so now, having considered the whole distribution of effects, we can say that surgery and chemotherapy will increase survival in at least 95% of brand new individual study settings. I point out that the authors' results, using a fixed-effect meta-analysis, were very similar to my results using a random-effects meta-analysis. Furthermore, the 95% prediction interval is only slightly wider than the 95% confidence interval, which indicates that the impact of between-study heterogeneity is small across all the trials and there may be justification for using a fixed-effect model. Despite this, a random-effects model is still useful since it accounts for all uncertainty 5 . We've seen already how a prediction interval can be wide (e.g. study ID 15 30 , study ID 17 32 ) if there is uncertainty in the actual estimates, regardless of whether there is evidence of between-study heterogeneity or not.
Figure 3.4: Forest plot showing a meta-analysis of randomised controlled trials assessing the effect of surgery (S) and chemotherapy (CT) versus surgery alone 26

The one other meta-analysis yet unaccounted for is study ID 3d 18 . The authors are assessing the use of recombinant tissue plasminogen activator (rt-Pa) for acute ischaemic stroke. They had updated a previous systematic review by adding a new large randomised controlled trial to the analysis. The review contained four meta-analyses, all of which were extracted for the review, but the meta-analysis of interest (study ID 3d) looks at the effect of rt-Pa on symptomatic intracranial haemorrhage (SICH) within 7 days in patients who have suffered an acute ischaemic stroke. The
analysis included twelve randomised controlled trials, each estimating an odds ratio, where an odds ratio < 1 indicates rt-Pa reduced development of SICH. The trials used in this study differed by dosage, final follow-up time, stroke type etc., which resulted in large I² and τ̂² values of 43.4% and 0.2320 respectively. The authors used a standard fixed-effect model and assessed heterogeneity using a χ²-statistic. Given the large values of I² and τ̂², and taking into account the differences between the trials, a random-effects meta-analysis seems more appropriate. Using a random-effects meta-analysis, I obtained a summary estimate of 3.93 (the authors obtained 3.72), a 95% confidence interval of [3.44,6.35] (the authors obtained [2.98,4.64]) and a 95% prediction interval of [1.18,13.10]; the results are displayed in figure 3.5.

Figure 3.5: Forest plot showing a meta-analysis of randomised controlled trials assessing the effects of SICH within 7 days (treatment up to 6 hours) 18

The summary estimate suggests that on average, the odds of developing SICH in the treatment group are 3.93 times the odds of developing SICH in the control group. The 95% confidence interval didn't contain the null value and is entirely > 1, so it provides
strong evidence that on average, the treatment is more likely to increase the odds of SICH, but it doesn't indicate whether this will always be the case. The 95% prediction interval is entirely > 1, suggesting that the treatment will increase the odds of SICH in at least 95% of brand new individual settings. Like study ID 11a 26 , the 95% prediction interval remains significant, but unlike study ID 11a, the 95% prediction interval in study ID 3d is much wider than its 95% random-effects confidence interval. Here the impact of between-study heterogeneity is large (in study ID 11a, the impact is low); this can also be seen from the large I² and τ̂² values, which result in the large width of the 95% prediction interval. The impact is such that in some cases, the odds of SICH when rt-Pa is given could be as low as 1.18 times the odds in the control group but could be as high as 13.1 times the odds in the control group. The authors, by using a fixed-effect method, fail to acknowledge the potential effects of heterogeneity. They report that "42 more patients were alive and independent, 55 more were alive with a favourable outcome at the end of follow up despite an increase in the number of early symptomatic intracranial haemorrhages and early deaths". Since the odds of SICH in the treatment group could be as high as 13.1, further research could be carried out to identify the scenarios in which this may occur, since identifying them could increase the number of patients with favourable results at the end of follow up.
For the third type, 10 (27.8%) of the meta-analyses had their 95% confidence intervals exclude the null value but their 95% prediction intervals include it. In these cases, the 95% prediction intervals are not significant at the 5% level once we have considered the whole distribution of effects. Most of the studies, apart from two, tended to have a significant amount of between-study heterogeneity, with I² values ranging from 22.3% to 62.7% and τ̂² values ranging from 0.022 to 0.098. Two studies had I² and τ̂² values of 0. These were study ID 9 24 , which had 3 trials and justifiably used a fixed-effect method, and study ID 16b 31 , which had 9 trials and used a random-effects meta-analysis, although caution is needed since there are few trials, which can result in the summary estimates carrying large uncertainty.

In study ID 20 35 , the authors are looking at the efficacy of probiotics in the prevention of acute diarrhoea. They carried out a meta-analysis of thirty-four randomised controlled trials, each estimating a relative risk, with a relative risk < 1 indicating the probiotic has a beneficial effect. The authors used a random-effects meta-analysis, acknowledging the potential effects of heterogeneity since the studies differed in many ways, such as study setting, age group, follow-up duration, probiotic administered, dosage etc., which resulted in a large I² value of 62.7% and τ̂² value of 0.0980. I obtained results identical to the authors': a summary estimate of 0.65 and a 95% confidence interval of [0.55,0.78]. Additionally, I obtained a 95% prediction interval of [0.34,1.27];
the results are displayed in figure 3.6.

Figure 3.6: Forest plot of a meta-analysis of randomised controlled trials assessing the effects of probiotics on diarrhoeal morbidity 35

The summary estimate of 0.65 indicates that on average, the risk of diarrhoeal morbidity in the treatment group is 0.65 times the risk in the placebo group. The 95% confidence interval is entirely < 1, providing strong evidence that on average the probiotics are beneficial, but is this always the case? The authors acknowledge heterogeneity, first by using a random-effects model and then by carrying out a subgroup and stratified
analysis assessing the effect of age, setting of trial, type of diarrhoea, probiotic strains used, formulation of probiotics administered, influence of setting and quality score of trials. A more formal way of acknowledging heterogeneity is to consider a 95% prediction interval, which I calculated to be [0.34,1.27]. This interval now contains the null value and contains values > 1, so although on average the use of probiotics is beneficial, this may not always be the case in a brand new individual setting; in fact, in some cases it may be harmful, and further research is required to identify these scenarios.

In study ID 23 38 , the authors are looking at the efficacy and safety of electroconvulsive therapy in depressive disorders. They carried out a meta-analysis of twenty-two randomised controlled trials, each estimating a standardised risk difference, where a risk difference > 0 favoured unilateral ECT and a risk difference < 0 favoured bilateral ECT. The authors reported both fixed-effect and random-effects results and acknowledge heterogeneity, since the trials differ by dosage, methods of administration etc.; this can be seen from the I² value of 24.0% and τ̂² value of 0.0286. I obtained slightly different results to the authors when using a random-effects meta-analysis: a summary estimate of −0.34 (the authors obtained −0.32) and a 95% confidence interval of [−0.49,−0.20] (the authors obtained [−0.46,−0.19]). I also obtained a 95% prediction interval of [−0.73,0.04]; the results are displayed in figure 3.7. The summary estimate suggests that on average, out of 100 patients, 34 more patients had favourable results in the bilateral group compared to the unilateral group. The 95% confidence interval is entirely < 0, providing strong evidence that on average the bilateral group is better, but is this always the case?
The authors acknowledge heterogeneity firstly by reporting random-effects results and then by carrying out a meta-regression analysis, but considering a prediction interval would be a more formal way of acknowledging heterogeneity. The 95% prediction interval is [−0.73,0.04], which now contains the null value 0 and slightly exceeds it. This suggests that although on average the bilateral group is better, in a brand new individual study setting the bilateral group may not be better, and further research is required to identify such scenarios.

3.4 Discussion

From the 26 studies that entered my review, 36 meta-analyses were extracted and each reproduced using a random-effects model with a 95% prediction interval. My aim was to see whether or not these intervals had a significant impact on the conclusions of these studies. Most of the studies that I found reported a summary estimate (fixed or random-effects) along with a 95% confidence interval and carried out some type of analysis to assess heterogeneity. An observation worth noting is that none of the studies post 2005 mentioned the idea of predictions in the context of meta-analysis.
Figure 3.7: Forest plot of a meta-analysis of randomised controlled trials assessing the effect of bilateral versus unilateral electrode placement on depressive symptoms 38

Papers by Ades et al. 8 and Higgins et al. 5 set the foundations for the use of prediction intervals in traditional and Bayesian meta-analysis, and for how presenting one can describe the extent of heterogeneity, how the true individual treatment effects are distributed about the random-effects summary estimate, and the range within which the true treatment effect in a brand new individual study setting lies. 2;5

3.4.1 Principal Findings

I found that 17 (47.2%) of the 36 meta-analyses had their 95% confidence interval contain the null value. In these cases, the average effect across the trials is not significant at the 5% level and the 95% prediction interval will also include the null value. Presenting a 95% prediction interval in these cases is still useful, since it helps describe
  • 41. the distribution of effects across the studies given there is between-study heterogeneity. The other 19 (52.8%) meta-analyses had their 95% confidence interval exclude the null values. In these cases, the average effect is significant at the 5% level, the aim is to see how many of their 95% predictions intervals now include the null value. I found that 9 of the meta-analyses had their 95% prediction interval exclude the null value whilst the other 10 included the null value. In terms of clinical practice, the prediction interval excluding the null indicates that in 95% of the times the treatment is applied in brand new study settings, the treatment will be beneficial/worse which is much more useful to clinicians than just reporting the average effect and the uncertainty around it. If the prediction interval included the null, then although the average effect is beneficial/worse, in some brand new individual study settings, the effect may be worse/beneficial. Again, this is much useful to clinicians and researchers since it reveals the impact of heterogeneity and can motivate further research to identify such cases. Another way of discussing our results is to consider the size of heterogeneity across the meta-analyses. I reiterate that describing heterogeneity is a key motivation for a prediction interval. If heterogeneity wasn’t a problem, then we could use a fixed-effect model in all cases but even the slightest differences between studies must be considered. 2 I found 12 meta-analyses had no evidence of between-study heterogeneity (I 2 and τ 2 values of 0), only in two of these cases 20;26 did they have more than ten trials. ˆ In many of these cases, the authors would tend to use a fixed-effect model but since there are few studies, we have low power to detect heterogeneity and therefore there may be uncertainty around I 2 and τ 2 values. 
A common effect should be assumed if there is no evidence of between-study heterogeneity and the 95% confidence and prediction intervals are close, suggesting that the impact of heterogeneity is negligible and the uncertainty around the parameters is low (e.g. Study ID 11b 26 ). In some cases there may seem to be no evidence of heterogeneity, but if there are few studies, the uncertainty around τ̂² can be large, resulting in wide prediction intervals (e.g. Study ID 17 32 ). The other 24 meta-analyses had evidence of between-study heterogeneity (I² ranging from 0.30% to 62.90% and τ̂² ranging from 0.0001 to 0.3369). Whilst the random-effects model was not always used in these cases, in most of them the authors did carry out some analysis of heterogeneity (e.g. subgroup analysis, meta-regression etc.). The problem is that if there are few trials in the analysis, the power to detect sources of heterogeneity is low and the analysis therefore lacks precision. 2;11 A prediction interval calculated with few studies will be large (e.g. Study ID 15 30 ) and may not be useful from a clinician's point of view since the range of effects is so wide. On the other hand, in Study ID 3d 18 , the 95% prediction interval is large yet lies entirely above the null value, so even though there is uncertainty about what the effect could be in an individual study setting, we know that in 95% of the times the treatment will have a negative effect (in that case); we just don't know how bad of
  • 42. an effect it could be. From a researcher's point of view, large prediction intervals can still have meaning, since they reveal the uncertainty surrounding the parameters and may therefore indicate that more trials, further research or other information (e.g. incorporating a Bayesian approach 5;8 ) is required, whereas a 95% confidence interval only tells us whether the average effect is significant, and this result may be imprecise due to the lack of trials. 3.4.2 Limitations It is important that potential limitations of this review are acknowledged. I decided to only use the Lancet database to search for studies since it is regarded as one of the world's most respected medical journals. I expected each study to be of a high standard in terms of methodology and conduct. Unfortunately, I cannot be sure that this is the case; flaws in procedure at the trial level and the meta-analysis level can produce error-prone results that may not reflect the true performance of the intervention. 42 In such cases, the prediction interval will be wider since it mixes heterogeneity caused by real differences with heterogeneity arising from methodological errors. 7 I also only included meta-analyses of randomised controlled trials, since randomisation balances known and unknown confounding factors. I did come across meta-analyses of non-randomised trials (mainly observational studies) but excluded them since they are more influenced by confounders. Whilst randomised controlled trials are held in higher regard relative to observational studies, the jury remains out on whether we would take randomised trials of low or even average quality over high-quality observational studies. Stroup et al. 44 found that “inclusion of sufficient detail to allow a reader to replicate meta-analytic methods was the only characteristic related to acceptance for publication”, suggesting that high-quality observational studies could be considered.
I could have extended the search beyond the Lancet to other databases, but I felt the Lancet already covered a wide variety of studies. There are also technical limitations to the review that must be addressed. Whilst there was a criterion that every meta-analysis must have at least three randomised controlled trials, with few studies the assumptions made when calculating a prediction interval may be violated. We assume a normal distribution, but with few studies this may be an inappropriate choice. 5 When considering the true treatment effect of a brand new study, I assume the population in this new study is “sufficiently similar” to those already covered in the analysis. If we have few studies, we fail to cover a sufficient range of populations, resulting in a wider prediction interval accounting for the large uncertainty. 2;5 I also was not specific about what types of outcomes were allowed into the review. There is evidence suggesting that certain biases are more likely to arise when subjective outcomes are used (e.g. favourable outcome (Study ID 3d 18 ), poor outcome (Study ID 2 17 ) or any outcome that requires human input). 45 It may have been more prudent to only consider outcomes such as survival, mortality or continuous outcomes that have no
  • 43. chance of being influenced by an external source. 3.4.3 Comparison with other studies A related study compiled by Graham et al. 14 explored prediction intervals in meta-analysis. They performed a meta-epidemiological study of binary outcomes from meta-analyses published between 2002 and 2010. Their study included 72 meta-analyses from 70 studies, each containing between 3 and 80 studies, and for each they carried out a random-effects meta-analysis using the DerSimonian and Laird 12 method and calculated traditional and Bayesian 95% prediction intervals for odds ratios and risk ratios. They found that 50 out of 72 meta-analyses had their 95% random-effects confidence interval for odds ratios exclude the null value; of these, 18 had their 95% prediction intervals exclude the null. They also found that 46 out of the 72 meta-analyses had their 95% random-effects confidence interval for risk ratios exclude the null value; of these, 19 had their 95% prediction intervals exclude the null. They concluded “meta-analytic conclusions may be appropriately signaled by consideration of initial interval estimates with prediction intervals” but also stress that increasing heterogeneity can result in wide prediction intervals and that caution must be taken when writing conclusions on a meta-analysis. 14 Comparing my results to theirs, I found fewer meta-analyses had their 95% prediction interval include the null when the 95% confidence interval had excluded it. Their study was larger than mine, and they were also able to directly calculate odds ratios and relative risks for each meta-analysis. I worked out the effect size according to the authors of the studies, and in some cases could not directly work out the summary estimate since the relevant data were not available; only the individual treatment effects along with their 95% confidence intervals were reported.
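The back-calculation mentioned here, recovering a standard error when only an effect estimate and its 95% confidence interval are reported, can be sketched as follows. The function name and the example interval are my own illustrations, not values taken from any study in the review:

```python
import math

def se_from_ci(lower, upper, ratio_measure=True):
    """Back-calculate a standard error from a reported 95% confidence interval.
    Ratio measures (odds ratios, risk ratios) are assumed symmetric on the
    log scale, so the bounds are log-transformed before taking the width."""
    z = 1.959964  # standard normal 97.5% quantile
    if ratio_measure:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * z)

# e.g. a hypothetical odds ratio reported with a 95% CI of 0.60 to 0.94
print(round(se_from_ci(0.60, 0.94), 3))  # -> 0.115
```

The same function with `ratio_measure=False` applies to mean differences and other effect sizes reported on their natural scale.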
3.4.4 Final Remarks and Implications Perhaps focusing only on cases where prediction intervals include the null when their corresponding 95% confidence intervals did not may somewhat deviate from the reason a prediction interval is useful. Since I was able to apply a 95% prediction interval to all cases, whether the analysis had high or no between-study heterogeneity and whether it contained few or many trials, I was able to describe the results of a random-effects meta-analysis more accurately, since the whole distribution of effects is considered, even if what I am deducing is that the authors require more trials or further research/information in cases where there are few studies. In the case where there is no evidence of between-study heterogeneity (indicated by I² and τ̂² equal to 0), if we use a random-effects model with a prediction interval and the prediction interval is significantly wider than the random-effects
  • 44. confidence interval, then this suggests there is uncertainty amongst the parameters (e.g. lack of power if there are few studies). If the prediction interval is fairly close to the confidence interval, then this suggests a common effect may exist, since we have considered the whole distribution of effects and the impact of heterogeneity is negligible. If there is evidence of between-study heterogeneity, then a prediction interval can reveal the impact of that heterogeneity, which is useful to clinicians and researchers regardless of whether the average effect is significant. I therefore believe a 95% prediction interval should be presented in every random-effects meta-analysis to enhance the interpretation of its results, but I stress the need for the analysis to have a sufficient number of good quality, unbiased randomised controlled trials.
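As a concrete illustration of this recommendation, a random-effects meta-analysis with a 95% prediction interval can be sketched as below. This is a minimal sketch assuming the DerSimonian-Laird estimate of τ² and the prediction interval formula of Higgins et al., μ̂ ± t_{k−2}·sqrt(τ̂² + SE(μ̂)²); the function name, the illustrative data and the t-quantile fallback for large k are my own choices:

```python
import math

# Two-sided 97.5% Student-t quantiles t_{0.975, df} for small df;
# for larger meta-analyses we fall back to the normal quantile 1.96.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def dl_meta_analysis(effects, ses):
    """Random-effects meta-analysis with the DerSimonian-Laird estimate of
    tau^2 and a 95% prediction interval following Higgins et al.:
    mu_hat +/- t_{k-2} * sqrt(tau2_hat + se(mu_hat)^2)."""
    k = len(effects)
    w = [1 / s ** 2 for s in ses]  # fixed-effect (inverse-variance) weights
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # DL between-study variance, truncated at 0
    w_re = [1 / (s ** 2 + tau2) for s in ses]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    ci = (mu - 1.96 * se_mu, mu + 1.96 * se_mu)
    t = T_975.get(k - 2, 1.96)
    half = t * math.sqrt(tau2 + se_mu ** 2)
    return mu, tau2, ci, (mu - half, mu + half)
```

Because the prediction interval uses a t-quantile on k − 2 degrees of freedom and adds τ̂² to the variance, it is always at least as wide as the confidence interval, and markedly wider when the meta-analysis is small or heterogeneous.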
  • 45. Chapter 4 Prediction intervals in Meta-Epidemiological studies It seems widely agreed that systematic reviews which contain a meta-analysis of randomised controlled trials provide the strongest and most reliable evidence of the effects of health care interventions, since they use systematic and explicit methods to summarise all the evidence to answer a research question of interest. 1;42;46 Unfortunately, they are not impervious to bias: if the meta-analysis is biased or includes biased trials, its results will incorporate these biases, resulting in either an over- or underestimation of the summary treatment effect, which can lead to misleading conclusions about how well the intervention works. 42;46 In the process of a systematic review, when the relevant trials are searched for, we must make sure that all of the evidence (published and unpublished) is sought so that we can get the most accurate results. There is evidence that published studies are more likely to report statistically significant results and larger treatment effects; moreover, published studies are more likely to be used in a systematic review and therefore a meta-analysis, which can lead to a biased summary treatment effect (publication bias). 2;47 Furthermore, randomised controlled trials themselves are in danger of bias if there are imperfections in their methodological properties, i.e. there was not proper allocation concealment, there was a lack of blinding etc. 46 If we were to calculate a prediction interval in the presence of bias, heterogeneity accounting for real differences mixes with heterogeneity caused by these biases, resulting in a much wider prediction interval. 7 Other biases that can arise are citation bias, language bias, cost bias etc. 2 The fundamental idea here is that bias must be assessed to make the conclusions of a meta-analysis more robust; failure to acknowledge it can result in misleading results.
  • 46. 4.1 Meta-Epidemiological Study A way in which we can inspect bias is to carry out a meta-epidemiological study, which assesses the influence of trial characteristics on the treatment effect estimates in a meta-analysis. 43;42;46 A meta-epidemiological study will assess a specific trial characteristic by carrying out a meta-analysis on summary effects from a collection of meta-analyses (essentially a ‘meta-analysis of meta-analyses’). 43;42;46 Like a normal meta-analysis, a meta-epidemiological study should describe the distribution of all the evidence, describe any heterogeneity between the meta-analyses, inspect associated risk factors and identify and control bias. Meta-epidemiology first surfaced in a 1997 editorial in the BMJ by David Naylor 48 , where cautions were raised concerning the summary effect of a meta-analysis. The author mentions how meta-analyses can generate “inflated and unduly precise” estimates if biases exist. He also refers to evidence stating that statistically significant outcomes were more likely to be published than non-significant ones and adds “readers need to examine any meta-analyses critically to see whether researchers have overlooked important sources of clinical heterogeneity among the included trials”. In 2002, meta-epidemiology was defined by Sterne et al. 46 as a statistical method to “identify and quantify the influence of study level characteristics”. In 2007, the method was generalised in a systematic review conducted by OARSI (Osteoarthritis Research Society International). 49 This has resulted in many published meta-epidemiological studies, which can be found online, for example on the BMJ website. These types of studies have provided strong evidence that flaws in trial characteristics lead, on average, to exaggeration of intervention effect estimates and in turn increase heterogeneity.
4.2 Prediction Intervals in Meta-Epidemiological Studies The aim of this chapter is to apply a 95% prediction interval to meta-epidemiological studies. Meta-epidemiological studies will use either a fixed-effect or a random-effects model and report a summary estimate with a 95% confidence interval. They still, however, need to describe the extent of heterogeneity that exists across all the evidence, so the inclusion of a prediction interval can help formally describe it. We searched for meta-epidemiological studies on the website of the British Medical Journal (www.bmj.com). We used the advanced search toolbar with the keyword “META EPIDEMIOLOGICAL” in text, abstract and title, in all articles in all years. Any meta-epidemiological study looking at a trial characteristic was eligible as long as we were able to carry out its meta-analysis ourselves. We took 4 studies at random and carried out their meta-epidemiological meta-analysis using a random-effects
  • 47. meta-analysis with a 95% prediction interval, using the formulas (1.9 to 1.13) and (2.2). In all 4 of the examples, we estimated the standard errors using formula (3.13) or (3.14), depending on the outcome of interest, since we could not work them out directly. 4.2.1 Example 1 A trial characteristic that can influence the estimates of individual trial treatment effects is the status of the study centre, i.e. whether it is carried out in a single centre or in multiple centres. Bafeta et al. 50 carried out a meta-epidemiological study with the aim of comparing estimates of intervention effects between single centre and multicentre randomised controlled trials on continuous outcomes. They address a previous study that concluded the effects of interventions using binary outcomes are larger in single centre randomised controlled trials compared to multicentre ones 51 and a paper by Bellomo et al. 52 who state that single centre trials often contradict multicentre trials. The authors included 26 meta-analyses with a total of 292 randomised controlled trials (177 in single centres and 115 in multicentres) with continuous outcomes, published between January 2007 and January 2010 in the Cochrane Database of Systematic Reviews (which they state as having “high methodological quality”). They excluded meta-analyses of non-randomised trials, IPD meta-analyses, meta-analyses where the trials were only single centre or only multicentre, and any meta-analysis that had fewer than 5 randomised controlled trials. They used the risk of bias tool recommended by the Cochrane Collaboration 3 to assess risk of bias from the individual reports for each trial. For each meta-analysis, they used a random-effects meta-analysis incorporating the DerSimonian and Laird estimate of τ² to combine treatment effects across the trials and assessed heterogeneity using χ² and I².
The authors then estimated a standardised mean difference between single centre and multicentre trials using a random-effects meta-regression to incorporate potential heterogeneity between trials. They then synthesised these using a random-effects model and used I² and the Q test to assess between-meta-analysis heterogeneity. A standardised mean difference < 0 indicates that single centre trials, on average, showed larger treatment effects than multicentre trials. They calculated a summary estimate of -0.09 with a 95% confidence interval of [-0.17,-0.01] with low between-meta-analysis heterogeneity (I² and τ̂² values of 0). We obtained the same random-effects summary estimate of -0.09 and the same 95% confidence interval of [-0.17,-0.01]; additionally we calculated a 95% prediction interval of [-0.18,0.00]. The results are shown in the forest plot in Figure 4.1. The summary estimate (-0.09) indicates that on average, single centre trials produced a larger estimate of the intervention effect than multicentre trials. Since the 95% confidence interval ([-0.17,-0.01]) is entirely < 0, there is strong evidence that on average, single centre trials show a larger effect than multicentre trials looking at the
  • 48. Figure 4.1: Forest plot of a meta-epidemiological analysis assessing the difference in intervention effect estimates between single centre and multicentre randomised controlled trials 50 same intervention, but is this always the case? The authors report “on average single centre trials with continuous outcomes showed slightly larger intervention effects than multicentre” and acknowledge between-meta-analysis heterogeneity and risk of bias by using subgroup and sensitivity analyses, but a 95% prediction interval can describe all the uncertainty more formally. The calculated 95% prediction interval ([-0.18,0.00]) now includes the null value 0 but does not exceed it, and is only slightly wider than the 95% random-effects confidence interval, revealing that the impact of heterogeneity is low. We can say that, after considering the whole distribution of effects, in at least 95% of the times the effect in a multicentre trial will never be strictly larger than the corresponding effect in a single centre trial, but we cannot rule out that the effects might be the same. We mirror the authors' view that further research is needed to investigate
  • 49. potential causes of these differences. 4.2.2 Example 2 Nuesch et al. 53 carried out a meta-epidemiological study to examine whether excluding patients from the analysis of randomised controlled trials is associated with biased estimates of treatment effects and whether it causes heterogeneity between trials. They address evidence that departures from protocol and losses to follow-up in randomised controlled trials can lead to exclusion of patients from the final analysis, and such handling of these patients leads to treatment effects that differ systematically from the true treatment effects. 54;55 Such bias is termed attrition bias 56 or selection bias, and this study aims to see how it affects the summary effects in a meta-analysis and whether it increases between-study heterogeneity. The authors included 14 meta-analyses, with a total of 167 trials (39 with all patients in the analysis, 128 where some patients were excluded). Eligible meta-analyses were those of randomised/quasi-randomised trials in patients with osteoarthritis of the knee or hip that reported a non-binary patient-reported outcome (e.g. pain intensity) and assessed any intervention against placebo or a non-intervention control. If a meta-analysis only included trials that had patient exclusions, or only trials with no exclusions, it was excluded. Within each meta-analysis, the authors used a random-effects meta-analysis to calculate a summary effect for trials with and trials without exclusions before deriving the differences between them. A difference of < 0 suggests trials with exclusions have a more beneficial treatment effect. These differences were then synthesised using a random-effects meta-analysis, which the authors state “fully accounted for variability in bias between meta-analysis”, and they estimated τ² as a measure of between-study heterogeneity.
They obtained a summary estimate of -0.13 with a 95% confidence interval of [-0.29,0.04], with what they consider high between-meta-analysis heterogeneity, indicated by a τ̂² value of 0.07. We obtained the same random-effects summary estimate of -0.13 but a different confidence interval of [-0.31,0.05], after noticing an error in the 3rd meta-analysis in the forest plot presented in the paper. We also obtained an I² value of 78.2% and a slightly larger τ̂² value of 0.0811, as well as a 95% prediction interval of [-0.78,0.52]. The results are shown in the forest plot in Figure 4.2. The summary estimate (-0.13) indicates that on average, trials with exclusions produce a larger estimate of the treatment effect compared to those without exclusions. The 95% confidence interval ([-0.31,0.05]) contains the null value, so the average effect is not significant (nor is the authors' 95% confidence interval). However, both our and the authors' 95% confidence intervals suggest there is evidence (albeit non-significant at the 5% level) that on average, patient exclusion leads to more beneficial treatment effects. This may have led the authors to report that “excluding patients from the analysis of randomised trials often resulted in biased estimates of treatment effects, but the
  • 50. Figure 4.2: Forest plot of a meta-epidemiological analysis assessing the difference in effect sizes between trials with and without exclusions of patients from analysis 50 extent and direction of bias remained unpredictable in a specific situation” and to recommend “results from intention to treat analysis should always be described in reports of randomised trials”. They acknowledge the large between-meta-analysis heterogeneity by carrying out stratified analyses, but a 95% prediction interval can reveal the full uncertainty around the summary estimate. The calculated 95% prediction interval ([-0.78,0.52]) is fairly wide, since it accounts for the large between-meta-analysis heterogeneity (indicated by I² and τ̂² values of 78.2% and 0.0811 respectively). I can say that, after considering the whole distribution of effects, although on average it seems as though trials with exclusions lead to a more beneficial treatment effect, analyses where the trials have no patient exclusions could quite easily show a more beneficial treatment effect than those with exclusions. Here, the impact of heterogeneity is much more evident than the 95% confidence interval suggests, and it further reveals that in a brand new situation, whether a trial with exclusions will show a more beneficial effect than a trial without exclusions is unpredictable. Possible reasons for such unpredictability could be down to the fact that the analysis had a combined 39 trials without