Clinical prediction models

Regression analyses, calibration, internal and external validation
Maarten van Smeden, PhD
Leiden University Medical Center
The Netherlands
Department of Clinical Epidemiology
15 Feb 2019
Clinical Trials and Research Methodology in Cardiology Meeting
Turkish Society of Cardiology
Istanbul
Slides available: https://www.slideshare.net/MaartenvanSmeden
Twitter: @MaartenvSmeden

Prediction algorithms
SPAM
• $20 billion annual expenses
• Spam filter algorithms since 1990s
• Prediction (!)
• Noise filter function
• Dominated by classification
Rao, Journal of Economic Perspectives, 2012. doi: 10.1257/jep.26.3.87

SPAM
• $20 billion annual expenses
• Prediction (!)
• Noise filter function
Healthcare
• $7,200 billion expenses (WHO, 2015)
• Algorithms since at least 1950s

Apgar score
Apgar, JAMA, 1958. doi: 10.1001/jama.1958.03000150027007

SPAM
• $20 billion annual expenses globally
• Prediction (!)
• "Noise" filtering function
Healthcare
• $7,200 billion expenses (WHO, 2015)
• Algorithms popularized since 1950s
• Prediction (!)
• Informing medical decision making
• Dominated by explicit risk prediction
"It is important to distinguish [risk] prediction and classification.
In many decision making contexts, classification represents a
premature decision, because classification combines prediction
and decision making..."
Frank Harrell
source: http://www.fharrell.com/post/classification/ (last accessed: Feb 12, 2019); bold-facing and [risk] added for clarity

Risk estimation example: SCORE
10 year fatal cardiovascular disease risk
Conroy, European Heart Journal, 2003. doi: 10.1016/S0195-668X(03)00114-3

Development of a risk prediction model
Probability of outcome = f(predictor variables)
Pr(Y = 1) = f(X)

Pr(10 year coronary heart disease risk) = f(age, cholesterol, SBP, diabetes, smoking)

Pr(Y = 1) = f(β1 age + β2 cholesterol + β3 SBP + β4 diabetes + β5 smoking)
These are the building blocks (simplified) of the Framingham risk score.

Framingham risk score
10 year CVD risk
D’Agostino, Circulation, 2008. doi: 10.1161/CIRCULATIONAHA.107.699579

Model specification
f(X) → linear predictor (lp)
Simplest case: lp = β0 + β1x1 + ... + βPxP (only "main effects")
Note: in practice this simplest case is often too simple
linear regression
Y = lp + ε
logistic regression
ln{Pr(Y = 1)/(1 - Pr(Y = 1))} = lp
Pr(Y = 1) = 1/(1 + exp{-lp})
Cox regression
h(t) = h0(t)exp(lp)
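
The logistic regression formulas above can be sketched in a few lines of Python. This is a minimal illustration only: the intercept, coefficients and patient values are hypothetical and do not come from any published risk score.

```python
import math

def linear_predictor(intercept, coefs, x):
    """lp = b0 + b1*x1 + ... + bP*xP (main effects only)."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def predicted_risk(lp):
    """Inverse logit: Pr(Y = 1) = 1 / (1 + exp(-lp))."""
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical values, for illustration only.
intercept = -5.0
coefs = [0.05, 0.7]   # per year of age; current smoker (0/1)
patient = [60, 1]     # a 60-year-old smoker

lp = linear_predictor(intercept, coefs, patient)
risk = predicted_risk(lp)
print(f"lp = {lp:.2f}, predicted risk = {risk:.3f}")
```

An lp of 0 corresponds to a predicted risk of 0.5; negative values give lower risks and positive values give higher risks.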

Logistic function

Discrimination
• Sensitivity/specificity trade-off
• Arbitrary choice of threshold → many possible sensitivity/specificity pairs
• All pairs in 1 graph: ROC curve
• Area under the ROC curve: probability that a random individual with event has a higher
predicted probability than a random individual without event
• Area under the ROC curve: the c-statistic (for logistic regression) typically takes on
values between 0.5 (no better than a coin-flip) and 1.0 (perfect discrimination)
Read more: Sedgwick, BMJ, 2015, doi: 10.1136/bmj.h2464
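
The pairwise definition of the area under the ROC curve can be computed directly. The sketch below assumes binary outcomes coded 0/1 and counts tied predictions as one half, which is a common convention; the data are made up for illustration.

```python
def c_statistic(y, p):
    """Proportion of event/non-event pairs in which the individual with the
    event has the higher predicted probability (ties count as 1/2)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))

# Made-up outcomes and predicted probabilities, for illustration only.
y = [0, 0, 1, 0, 1, 1]
p = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
auc = c_statistic(y, p)
print(f"c-statistic = {auc:.3f}")
```

Here 8 of the 9 event/non-event pairs are concordant, so the c-statistic is 8/9.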

Calibration plot

The curse of statistical modeling: overfitting
What you see is not what you get [1]
Idiosyncrasies in the data are fitted rather than
generalizable patterns. A model may hence not be
applicable to new patients, even when the setting of
application is very similar to the development setting [2]
Note: prediction models are developed for new patients
[1] Babyak, Psychosomatic Medicine, 2004, PMID: 15184705; [2] Steyerberg, 2009, Springer, ISBN 978-0-387-77244-8.

Overfitting artist impression
https://twitter.com/LesGuessing/status/997146590442799105

Overfitting causes and consequences
Steyerberg, 2009, Springer, ISBN 978-0-387-77244-8.

Overfitting: typical calibration plot
• Low probabilities are predicted too low
• High probabilities are predicted too high

How to avoid overfitting?
Be conservative when selecting/removing predictor variables
• Avoid univariable, stepwise and forward selection
• When using backward elimination use conservative p-values (e.g. p = 0.10 or 0.20)
Figure: Steyerberg, JCE, 2018, doi: 10.1016/j.jclinepi.2017.11.013; Read more: Heinze, Biometrical J, 2018, doi: 10.1002/bimj.201700067

Apply penalized regression
• Ridge regression (penalizes high regression coefficients)
• Lasso regression (penalizes high regression coefficients + automatic variable selection)
See: https://www.slideshare.net/MaartenvanSmeden/improving-predictions-lasso-ridge-and-steins-paradox-91544782
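
As a rough sketch of what "penalizing" means, the function below writes out the quantity that ridge and lasso logistic regression minimize: the negative log-likelihood plus a penalty on the size of the coefficients. The function names are hypothetical, and details such as the scaling of the penalty and whether predictors are standardized first differ between software packages.

```python
import math

def neg_log_likelihood(beta0, betas, X, y):
    """Negative log-likelihood of a logistic regression model."""
    total = 0.0
    for xi, yi in zip(X, y):
        lp = beta0 + sum(b * v for b, v in zip(betas, xi))
        # log(1 + exp(lp)) - y*lp, written in a numerically stable form
        total += max(lp, 0.0) + math.log1p(math.exp(-abs(lp))) - yi * lp
    return total

def penalized_objective(beta0, betas, X, y, lam, penalty):
    """Objective for ridge ('l2') or lasso ('l1') logistic regression.
    The intercept is left unpenalized (an assumption, but a common one)."""
    if penalty == "l2":
        pen = sum(b * b for b in betas)      # ridge: sum of squared coefficients
    elif penalty == "l1":
        pen = sum(abs(b) for b in betas)     # lasso: sum of absolute coefficients
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return neg_log_likelihood(beta0, betas, X, y) + lam * pen

# Tiny made-up dataset, for illustration only.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1, 0, 1]
for kind in ("l2", "l1"):
    print(kind, round(penalized_objective(0.2, [2.0, -0.5], X, y, 1.0, kind), 3))
```

With lam = 0 both objectives reduce to ordinary maximum likelihood; larger values of lam pull the coefficients toward zero, and the absolute-value penalty of the lasso can set some of them exactly to zero.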

• Adequate sample size: sufficient number of "events" relative to number of variables
(considered) in the prediction model
• Traditional rule of thumb (10 events per variable) has been shown to have no theoretical
basis and to perform poorly in simulation studies [1]; in many cases too lenient for development
of prediction models
• Alternative and more formal sample size calculations have recently been proposed
[1] van Smeden, BMC Med Res Meth, 2016, doi: 10.1186/s12874-016-0267-3
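
For reference, the events-per-variable figure behind the rule of thumb is simple arithmetic. The sketch below assumes the usual convention of taking the smaller outcome group as the "events" and counting every candidate parameter considered, not only those kept in the final model; the numbers are made up.

```python
def events_per_variable(n_events, n_nonevents, n_candidate_parameters):
    """Size of the smaller outcome group divided by the number of candidate
    parameters considered during model development."""
    if n_candidate_parameters <= 0:
        raise ValueError("need at least one candidate parameter")
    return min(n_events, n_nonevents) / n_candidate_parameters

# Made-up study: 1000 patients, 80 events, 12 candidate parameters.
epv = events_per_variable(80, 920, 12)
print(f"EPV = {epv:.1f}")
```

The value is descriptive only; as noted above, a fixed EPV threshold is not a reliable basis for deciding on sample size.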

Optimism
Predictive performance is overestimated when it is evaluated on the
same data on which the risk prediction model was developed. Performance
measured this way is called the apparent performance of the model
• Optimism can be large, especially in small datasets and with a large number of predictors
• To get a better estimate of the predictive performance:
- Internal validation (same data sample)
- External validation (other data sample)

Internal validation
• Evaluate performance of risk prediction model on data from the same population from
which the model was developed
• Say that we start with one dataset with all data available: the original data
• Option 1: Splitting original data
- One portion to develop ('training set'); one portion to evaluate ('test set')
- Non-random vs random split
- Generates 1 test of performance
• Option 2: Resampling from original data
- Cross-validation
- Bootstrapping
- Generates a distribution of performances
• General advice: avoid splitting (option 1) because
- Inefficient → especially when original data is small
- Usually leads to a too small test set
See: Steyerberg, JCE, 2001, doi: 10.1016/S0895-4356(01)00341-9
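
One common way to use bootstrapping for internal validation is an optimism correction: refit the model in each bootstrap sample, compare its performance in that sample with its performance in the original data, and subtract the average difference from the apparent performance. The sketch below is one possible implementation under stated assumptions: simulated data, a plain Newton-Raphson logistic fit, the c-statistic as performance measure, and no variable selection step (which would otherwise have to be repeated inside every bootstrap sample). All names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood logistic regression via Newton-Raphson.
    X must already contain a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

def c_statistic(y, lp):
    """Concordance between outcome and linear predictor (ties count 1/2)."""
    diff = lp[y == 1][:, None] - lp[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def bootstrap_optimism(X, y, n_boot=200, seed=1):
    """Return (apparent, optimism-corrected) c-statistic."""
    rng = np.random.default_rng(seed)
    apparent = c_statistic(y, X @ fit_logistic(X, y))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))   # resample with replacement
        Xb, yb = X[idx], y[idx]
        if yb.min() == yb.max():                # skip single-class resamples
            continue
        beta_b = fit_logistic(Xb, yb)
        # performance in the bootstrap sample minus performance in original data
        optimism.append(c_statistic(yb, Xb @ beta_b) - c_statistic(y, X @ beta_b))
    return apparent, apparent - np.mean(optimism)

# Simulated data: 1 true predictor and 4 noise predictors (illustration only).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
true_lp = -1.0 + 0.8 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_lp)))

apparent, corrected = bootstrap_optimism(X, y)
print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
```

Unlike a single split, every observation is used both to develop and to evaluate, and the bootstrap replicates give a distribution of performance estimates rather than one number.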

External validation
• Study of the predictive performance of the risk prediction model in data of new subjects
that were not used to develop it
• The larger the difference between development and validation data, the stronger the
evidence that a well-performing model will be useful in (as yet) untested populations
- Case-mix (distributions of predictors and outcome)
• External validation is the strongest test of a prediction model
- Different time period ('temporal')
- Different areas/centres ('geographical')
- Ideally by independent investigators
See: Collins, BMJ, 2012, doi: 10.1136/bmj.e3186

External validation is not
It is not repeating model development steps
• Whether the same predictors, regression coefficients and predictive performance would be
found in new data is not in question
It is not re-estimating a previously developed model
• Updating regression coefficients is sometimes done when the performance at external
validation is unsatisfactory. This can be viewed as model updating (model revision) and calls
for new external validation

What to expect at external validation
• Decreased predictive performance compared to development is expected
• Many possible causes:
- Overfitting of the model at development
- Different type of patients (case mix)
- Different outcome occurrence
- Differences in care over time
- Differences in treatments
- Improvement in measurements over time (e.g. previous CTs less accurate than spiral
CT for PE detection)
- ...
• When predictive performance is judged too low → consider model updating

Model updating
• Recalibration in the large: re-estimate the intercept
• Recalibration: re-estimate the intercept and estimate one additional factor that multiplies
all coefficients (calibration slope)
Table from Vergouwe, Stat Med, 2017, doi: 10.1002/sim.7179
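
Both updating methods can be written as small logistic regressions on the linear predictor (lp) of the existing model, computed in the validation data. The sketch below is an illustration under stated assumptions: simulated validation data in which the original model is too extreme, and a hand-written Newton-Raphson fit. All names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, offset=None, n_iter=25):
    """Newton-Raphson logistic regression with an optional fixed offset."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(offset + X @ beta, -30, 30)))
        w = p * (1.0 - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

def recalibrate(lp, y):
    """Return (intercept correction, recalibration intercept, calibration slope)."""
    ones = np.ones((len(y), 1))
    # Recalibration in the large: lp enters as offset (slope fixed at 1), so only
    # a correction to the original intercept is estimated.
    a_large = fit_logistic(ones, y, offset=lp)[0]
    # Logistic recalibration: intercept plus one slope that multiplies the whole lp.
    a, slope = fit_logistic(np.column_stack([ones, lp]), y)
    return a_large, a, slope

# Simulated validation data (illustration only): the true lp is
# 0.5 + 0.6 * lp_old, so the original predictions are too extreme.
rng = np.random.default_rng(0)
n = 5000
lp_old = rng.normal(-1.0, 1.5, n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 0.6 * lp_old))))

a_large, a, slope = recalibrate(lp_old, y)
p_new = 1.0 / (1.0 + np.exp(-(a_large + lp_old)))
print(f"intercept correction = {a_large:.2f}, calibration slope = {slope:.2f}")
```

A calibration slope below 1 is the pattern expected from an overfitted model: low risks were predicted too low and high risks too high.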

Discrimination vs calibration
• Discrimination: extent to which risks differentiate between cases and non-cases
• Calibration: extent to which estimated risks agree with observed outcome frequencies
• Discrimination is usually the no. 1 performance measure
- Risk models are typically compared on discriminative performance; not calibration
- A risk prediction model with no discriminative performance is uninformative
- A risk prediction model that is poorly calibrated is misleading
Read more: Van Calster, JCE, 2016, doi: 10.1016/j.jclinepi.2015.12.005
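
A small made-up example shows why discrimination alone is not enough. Any transformation of the predicted risks that preserves their ranking leaves the c-statistic unchanged, yet it can badly distort calibration. The square-root transformation and the data below are arbitrary choices for illustration, and the calibration summary used here (observed event proportion minus mean predicted risk) is only the crudest one.

```python
def c_statistic(y, p):
    """Proportion of concordant event/non-event pairs (ties count 1/2)."""
    ev = [pi for yi, pi in zip(y, p) if yi == 1]
    ne = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e in ev for n in ne)
    return score / (len(ev) * len(ne))

def mean_calibration(y, p):
    """Observed event proportion minus mean predicted risk (0 is ideal)."""
    return sum(y) / len(y) - sum(p) / len(p)

# Made-up outcomes and predicted risks, for illustration only.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
p_inflated = [x ** 0.5 for x in p]   # same ranking, systematically higher risks

for label, pred in (("original", p), ("inflated", p_inflated)):
    print(label, round(c_statistic(y, pred), 4), round(mean_calibration(y, pred), 3))
```

Both sets of predictions discriminate equally well, but the inflated risks overestimate the event rate by a wide margin and would mislead anyone using them as absolute risks.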

Books

TRIPOD statement
TRIPOD, Ann Int Med, 2016, doi: 10.7326/M14-0697 and 10.7326/M14-0698

Final remarks

Toward machine learning and artificial intelligence?
Source: Topol, Nature Medicine, 2019. doi: 10.1038/s41591-018-0300-7

Value of ML and AI for clinical prediction models?
A systematic review of 282 direct comparisons between machine learning and logistic regression:
"We found no evidence of superior performance of ML over LR for clinical prediction modeling..."
Christodoulou, J Clin Epi, 2019, doi: 10.1016/j.jclinepi.2019.02.004

Do we even need new clinical prediction models at all?
• > 110 models for prostate cancer (Shariat 2008)
• > 100 models for traumatic brain injury (Perel 2006)
• 83 models for stroke (Counsell 2001)
• 54 models for breast cancer (Altman 2009)
• 43 models for type 2 diabetes (Collins 2011; Dieren 2012)
• 31 models for osteoporotic fracture (Steurer 2011)
• 29 models in reproductive medicine (Leushuis 2009)
• 26 models for hospital readmission (Kansagara 2011)
• > 25 models for length of stay in cardiac surgery (Ettema 2010)
• > 350 models for cardiovascular disease outcomes (Damen 2016)
• What if your model becomes number 300-something?
• What about the clinical benefit/utility of number 300-something?
Courtesy of KGM Moons and GS Collins for this overview

Flow diagram

Flow diagram in Turkish
Courtesy of Prof Ibrahim Halil Tanboga

This presentation is available at https://www.slideshare.net/MaartenvanSmeden
