MRCPsych Teaching 2010 Critical Appraisal of Diagnostic Tests
1. MRCPsych Teaching 2010 www.slideshare.net/ajmitchell
MRCPsych 2010
A. Critical Appraisal of Diagnostic Tests
Studies of Accuracy, Validity, Screening & Case finding
Alex J Mitchell
Consultant in Liaison Psychiatry
University of Leicester
2. 1. Importance of understanding diagnostic tests
1. Importance of diagnostic tests
2. Concept of diagnostic tests: traits to diseases
3. Statistics of diagnostic tests
4. Clinical Value of diagnostic tests
5. Worked examples
6. Advanced techniques
3. What Is a Diagnostic Test in Psychiatry?
• CT/MRI
• CSF
• Blood tests eg TFTs
• SCAN/SCID/PSE/MINI
• Neuropsychological Testing
• MMSE
• HADS/BDI/CESD?
• Clinical Judgement
• Self-report
4. Why Is a HADS score not a diagnosis?
1. No core features
2. No symptom ranking
3. No functional assessment
4. Duration unclear
5. What if Missing items?
6. Imprecise
5. Defining Diagnostic Testing
INTENTION
• Screening
– The systematic application of a test or inquiry, to identify
individuals at sufficient risk of a specific disorder to warrant
further actions among those who have not sought medical help
for that disorder
• Case-Finding
– The selected application of a test or inquiry, to identify
individuals with a suspected disorder and exclude those without
a disorder, usually in those who have sought medical help for
that disorder
Adapted from Department of Health. Annual report of the national screening committee.
London: DoH, 1997.
6. Defining Diagnostic Testing
PRACTICAL
• Screening
– Rule out those without the disorder with high accuracy
(high NPV)
• Case-Finding
– Rule in those with the disorder with high accuracy
(high PPV)
7. Defining Diagnostic Testing
APPLICATION
• Routine Screening
– The systematic application of a test or inquiry to all individuals
who may have the disorder (or who have not sought medical help for
that disorder)
• Targeted (High Risk)
– The highly selected application of a test or inquiry, to identify
individuals at high risk of a specific disorder by virtue of known
risk factors
Adapted from Department of Health. Annual report of the national screening committee.
London: DoH, 1997.
8. Defining Diagnostic Testing
COMPARATOR
• Accuracy (aka convergent validity)
– The degree of approximation (veracity) to a robust comparator
• Validity (aka criterion validity)
– The degree of approximation (veracity) to a criterion reference
• Precision
– The degree of predictability (low SD) in the measure
9. Aims of Detection
• Screening:
– Short; easy; some false +ve (low Sp, PPV), few false –ve (high Sens, NPV)
• Diagnosis (case-finding)
– Accurate; few false +ve or –ve
• Rating
– Simple, patient rated, correlates with QoL and other outcomes
10. UK National Screening Committee Guidelines
• The condition should:
– Be an important health issue
– Have a well-understood history, with a detectable risk factor or disease marker
– Have cost-effective primary preventions implemented.
• The screening tool should:
– Be a valid tool with known cut-off
– Be acceptable to the public
– Have agreed diagnostic procedures.
• The treatment should:
– Be effective, with evidence of benefits of early intervention
– Have adequate resources
– Have appropriate policies as to who should be treated.
• The screening program should:
– Show evidence that benefits of screening outweigh risks
– Be acceptable to public and professionals
– Be cost effective (and have ongoing evaluation)
– Have quality-assurance strategies in place.
Adapted from: UK National Screening Committee. Criteria for appraising the viability, effectiveness and appropriateness of a screening programme. http://www.nsc.nhs.uk/pdfs/criteria.pdf
11. Development of Diagnostic Tests
Stage | Type | Purpose | Description
Pre-clinical | Development | Development of the proposed tool or test | Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. Issues of acceptability of the tool to both patients and staff must be considered in order for implementation to be successful.
Phase I_screen | Diagnostic validity | Early diagnostic validity testing in a selected sample and refinement of tool | The aim is to evaluate the early design of the screening method against a known (ideally accurate) standard known as the criterion reference. In early testing the tool may be refined, selecting the most useful aspects and deleting redundant aspects in order to make the tool as efficient (brief) as possible whilst retaining its value.
Phase II_screen | Diagnostic validity | Diagnostic validity in a representative sample | The aim is to assess the refined tool against a criterion (gold standard) in a real-world sample where the comparator subjects may comprise several competing conditions which may otherwise cause difficulty regarding differential diagnosis.
Phase III_screen | Implementation | Screening RCT; clinicians using vs not using a screening tool | This is an important step in which the tool is evaluated clinically in one group with access to the new method compared to a second group (ideally selected in a randomized fashion) who make assessments without the tool.
Phase IV_screen | Implementation | Screening implementation studies using real-world outcomes | In this last step the screening tool/method is introduced clinically but monitored to discover the effect on important patient outcomes such as new identifications, new cases treated and new cases entering remission.
Citation: Mitchell AJ. Screening for depression in clinical practice: evidence based approach
13. Graphical – Screening principles
[Figure: two distributions, non-depressed and depressed. y-axis: number of individuals; x-axis: severity of depression.]
14. Graphical – Screening principles
[Figure: the non-depressed and depressed distributions with a cut-off marked. y-axis: number of individuals; x-axis: severity of depression (low to high). Labels: "<<<< high Specificity" and "High Sensitivity >>>>".]
15. Graphical – Screening principles
[Figure: the non-depressed and depressed distributions with a cut-off marked. y-axis: number of individuals; x-axis: severity of depression (low to high). Labels: "<<<< low Specificity" and "High Sensitivity >>>>".]
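The trade-off these screening-principle figures illustrate can be sketched numerically. The example below is only an illustration: it assumes scores in each group are normally distributed, and the means, standard deviations and cut-offs are hypothetical values chosen for the sketch, not data from the slides.

```python
from statistics import NormalDist

# Hypothetical score distributions (illustrative values, not from the slides).
non_depressed = NormalDist(mu=5.0, sigma=3.0)
depressed = NormalDist(mu=12.0, sigma=3.0)


def sens_spec(cutoff: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when scores at or above cutoff count as test positive."""
    sensitivity = 1.0 - depressed.cdf(cutoff)  # depressed people scoring above the cut-off
    specificity = non_depressed.cdf(cutoff)    # non-depressed people scoring below the cut-off
    return sensitivity, specificity


# Moving the cut-off upwards trades sensitivity for specificity.
for cutoff in (6, 8.5, 11):
    se, sp = sens_spec(cutoff)
    print(f"cut-off {cutoff:>4}: sensitivity {se:.2f}, specificity {sp:.2f}")
```

A low cut-off catches almost every depressed person at the cost of more false positives; a high cut-off does the reverse.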
16. Graphical – Definition of PPV
[Figure: non-depressed and depressed distributions with a cut-off (low to high), highlighting test positives: true +ve and false alarms (false +ve). True +ve / all +ve = PPV.]
17. Graphical – Definition of NPV
[Figure: non-depressed and depressed distributions with a cut-off (low to high), highlighting test negatives: true -ve and missed cases (false -ve). True -ve / all -ve = NPV.]
18. Theory of Diagnostic Tests
[Figure: non-depressed and depressed distributions separated by a cut-off value. y-axis: number of individuals; x-axis: test result. Regions: true -ve, false -ve, false +ve, true +ve.]
19. Low Prevalence (Se Sp = same)
[Figure: non-depressed vs major depression distributions at a fixed cut-off value. y-axis: number of individuals; x-axis: test result. False -ve: SMALL; false +ve: LARGE.]
20. High Prevalence (Se Sp = same)
[Figure: non-depressed vs major plus minor depression distributions at the same cut-off value. y-axis: number of individuals; x-axis: test result. False -ve: LARGE; false +ve: SMALL.]
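The effect of prevalence on a test with fixed sensitivity and specificity can be checked with a few lines of arithmetic. This is a minimal sketch: the sensitivity (0.80), specificity (0.90) and the two prevalence values are hypothetical numbers chosen for illustration.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    tp = sensitivity * prevalence                 # true positives, as a share of everyone tested
    fn = (1 - sensitivity) * prevalence           # false negatives (missed cases)
    fp = (1 - specificity) * (1 - prevalence)     # false positives (false alarms)
    tn = specificity * (1 - prevalence)           # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Same test (hypothetical Se = 0.80, Sp = 0.90) in a low and a high prevalence setting.
for prevalence in (0.05, 0.40):
    ppv, npv = predictive_values(0.80, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With sensitivity and specificity held constant, false positives dominate the positive results when the disorder is rare, so PPV falls, while NPV rises.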
22. Example: A Clear Disease [#1]
[Figure: no-disorder and disorder distributions separated at a point of partial rarity. y-axis: number of individuals; x-axis: test result. Regions: true -ve, true +ve, false +ve, false -ve.]
23. Example: A Probable Syndrome [#2]
[Figure: no-disorder and disorder distributions. y-axis: number of individuals; x-axis: MMSE cognitive score. Regions: true -ve, true +ve, false +ve, false -ve.]
24. Example: A Normally Distributed Trait [#3]
[Figure: no-disorder and disorder distributions. y-axis: number of individuals; x-axis: MMSE cognitive score. Regions: true -ve, true +ve, false +ve, false -ve.]
28. [Figure: histogram of number of individuals (0 to 3000) by score (zero to eighteen). Thompson et al (2001), n=18,414.]
29. Mitchell, Coyne et al (2008)
[Figure: scores on the CES-D during pregnancy, 3 and 12 months post-partum in 947 women. y-axis: 0 to 110. Series: early pregnancy, 3 months post-partum, 12 months post-partum. Categories: healthy, depressive symptoms, mild depression, moderate to severe depression.]
30. PHQ9 Linear distribution
[Figure: distribution of PHQ9 scores (x-axis: score, from zero upwards; y-axis: 0 to 35). Series: PHQ9 (major depression), PHQ9 (minor depression), PHQ9 (non-depressed). Baker-Glen, Mitchell et al (2008).]
32. Accuracy 2x2 Table
Test result | Reference standard: disorder present | Reference standard: no disorder | Predictive value
Test +ve | A | B | PPV = A / (A + B)
Test -ve | C | D | NPV = D / (C + D)
Accuracy | Sn = A / (A + C) | Sp = D / (B + D) |

Test result | Depression PRESENT | Depression ABSENT | Total
Test +ve | True +ve | False +ve | PPV
Test -ve | False -ve | True -ve | NPV
Total | Sensitivity | Specificity | Prevalence
34. Basic Measures of Accuracy
• Sensitivity (Se) a/(a + c) TP / (TP + FN)
– A measure of accuracy defined as the proportion of patients with disease in whom the test result is positive
• Specificity (Sp) d/(b + d) TN / (TN + FP)
– A measure of accuracy defined as the proportion of patients without disease in whom the test result is negative
• Positive Predictive Value (PPV) a/(a + b) TP / (TP + FP)
– A measure of rule-in accuracy defined as the proportion of true positives among those who screen positive
• Negative Predictive Value (NPV) d/(c + d) TN / (TN + FN)
– A measure of rule-out accuracy defined as the proportion of true negatives among those who screen negative
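The four basic measures follow directly from the TP, FP, FN and TN counts. The sketch below is one way to compute them; the class name `TwoByTwo` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 accuracy table (test result vs reference standard)."""
    tp: int  # cell a: test +ve, disorder present
    fp: int  # cell b: test +ve, disorder absent
    fn: int  # cell c: test -ve, disorder present
    tn: int  # cell d: test -ve, disorder absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical counts, not taken from any study in the slides.
table = TwoByTwo(tp=90, fp=30, fn=10, tn=170)
print(f"Se {table.sensitivity:.2f}, Sp {table.specificity:.2f}, "
      f"PPV {table.ppv:.2f}, NPV {table.npv:.2f}, prevalence {table.prevalence:.2f}")
```

Sensitivity and specificity are computed down the columns of the table (within those with and without the disorder), whereas PPV and NPV are computed along the rows (within those testing positive and negative).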
35. Accuracy in words
• Sensitivity
– The chance of testing positive among those with the condition
– The chance of rejecting the null hypothesis among those that do not satisfy the null hypothesis
• Specificity
– The chance of testing negative among those without the condition
– The chance of accepting the null hypothesis among those that satisfy the null hypothesis
• Positive Predictive Value
– The chance of having the condition among those that test positive
– The chance of not satisfying the null hypothesis among those that reject the null hypothesis
• Negative Predictive Value
– The chance of not having the condition among those that test negative
– The chance of satisfying the null hypothesis among those that accept the null hypothesis
• Type I Error or α (alpha) or p-Value or false positive rate
– The chance of testing positive among those without the condition
– The chance of rejecting the null hypothesis among those that satisfy the null hypothesis
• Type II Error or β (beta) or false negative rate
– The chance of testing negative among those with the condition
– The chance of accepting the null hypothesis among those that do not satisfy the null hypothesis
• False Discovery Rate or q-Value
– The chance of not having the condition among those that test positive
– The chance of satisfying the null hypothesis among those that reject the null hypothesis
• False Omission Rate
– The chance of having the condition among those that test negative
– The chance of not satisfying the null hypothesis among those that accept the null hypothesis
36. Rule-in Accuracy
Test result | Depression PRESENT | Depression ABSENT | Predictive value
Test +ve | True +ve | False +ve (type I error) | PPV (discrimination)
Test -ve | False -ve (type II error) | True -ve | NPV
Accuracy | Sensitivity | Specificity | Prevalence (occurrence)
38. Likelihood Ratios
Likelihood Ratio for Positive Tests (LR+)
The chance of testing positive among those with the condition, divided by the chance of testing positive among those without the condition
Sensitivity / (1 - Specificity)
[ TP / (TP + FN) ] / [ FP / (FP + TN) ]
= post-test odds after a positive test / pre-test odds
= [ PPV / (1 - PPV) ] / [ Prevalence / (1 - Prevalence) ]
Likelihood Ratio for Negative Tests (LR-)
The chance of testing negative among those with the condition, divided by the chance of testing negative among those without the condition
(1 - Sensitivity) / Specificity
[ FN / (FN + TP) ] / [ TN / (TN + FP) ]
= post-test odds after a negative test / pre-test odds
= [ (1 - NPV) / NPV ] / [ Prevalence / (1 - Prevalence) ]
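A short sketch can show how likelihood ratios convert a pre-test probability into a post-test probability. It follows the count-based definitions above, so LR+ is Se / (1 - Sp) and LR- is (1 - Se) / Sp. The function names and the example values (Se 0.80, Sp 0.95, pre-test probability 0.20) are assumptions made for illustration.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)   # P(test +ve | disorder) / P(test +ve | no disorder)
    lr_negative = (1 - sensitivity) / specificity   # P(test -ve | disorder) / P(test -ve | no disorder)
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio on the odds scale and convert back to a probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Illustrative values: Se 0.80, Sp 0.95, pre-test probability (prevalence) 0.20.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.95)
after_positive = post_test_probability(0.20, lr_pos)  # equals the PPV at this prevalence
after_negative = post_test_probability(0.20, lr_neg)  # equals 1 - NPV at this prevalence
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
print(f"post-test probability: {after_positive:.2f} after +ve, {after_negative:.2f} after -ve")
```

The multiplication has to be done on odds, not probabilities, which is why a likelihood ratio cannot be read off as a simple ratio of predictive value to prevalence.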
46. Added Value
• Definition 1:
– The additional ability of a test to rule-in or rule-out
compared with the baseline rate
– PPV minus Prevalence
– NPV minus prevalence
• Definition 2:
– The additional ability of a test to rule-in or rule-out compared
with the unassisted rate
– PPV test minus PPV no test (assuming equal prevalence)
– LR+ test minus LR+ no test
– AUC test minus AUC no test
47. [Figure: proportion (0.00 to 1.00) reporting each depressive symptom, shown as all-case proportion, depressed proportion and non-depressed proportion. Symptoms: loss of energy, diminished drive, sleep disturbance, concentration/indecision, depressed mood, anxiety, diminished concentration, insomnia, diminished interest/pleasure, psychic anxiety, helplessness, worthlessness, hopelessness, somatic anxiety, thoughts of death, anger, excessive guilt, psychomotor change, indecisiveness, decreased appetite, psychomotor agitation, psychomotor retardation, decreased weight, lack of reactive mood, increased appetite, hypersomnia, increased weight.]
Mitchell, Zimmerman et al MIDAS Database. Psychol Med 2007 Submitted
48. [Figure: rule-in added value (PPV-Prev) and rule-out added value (NPV-Prev), on a scale from -0.10 to 0.50, for each depressive symptom: anger, anxiety, decreased appetite, decreased weight, depressed mood, diminished concentration, diminished drive, diminished interest/pleasure, excessive guilt, helplessness, hopelessness, hypersomnia, increased appetite, increased weight, indecisiveness, insomnia, lack of reactive mood, loss of energy, psychic anxiety, psychomotor agitation, psychomotor change, psychomotor retardation, sleep disturbance, somatic anxiety, thoughts of death, worthlessness.]
49. Accuracy of Tests: Visual Post-test Probabilities
[Figure: post-test probability ranges plotted on a 0% to 100% scale, banded as very unlikely, unlikely, likely and very likely.]
• Overall: 10% - (22) - 50% = 54%
• CIDI (computer), any depression, PHQ-2: 3% - (16) - 32% = 29%. Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci
• CIDI (computer), any depression, WHO5 (1+3): 3% - (16) - 32% = 29%. Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci
• CIDI (computer), major depression, 1 question: 3% - (37) - 63% = 60%. Arroll B et al (2003) BMJ
• CIDI (computer), major depression, 2 questions: 32% - (37) - 96% = 64%
50. [Figure: post-test probability (y-axis, 0.00 to 1.00) against pre-test probability (x-axis, 0 to 1). Curves: Clinician Positive (Fallowfield et al, 2001), Clinician Negative (Fallowfield et al, 2001), HADS-D Positive (meta-analysis), HADS-D Negative (meta-analysis), Baseline Probability.]
53. PostStroke Mj Depression vs NonMj
• Clinicians' diagnosis using DSMIV vs SCAN/PSE
– 50 people with major depression
– 150 healthy people
– 50 with subsyndromal depression
54. Clinicians using DSMIV
• IF: Clinicians diagnosed 50 cases with Mj depression
• IF: Their specificity was 95%
• Q. What was the sensitivity?
• Q. What was the prevalence?
• Q. What was the PPV?
• Q. What was the % correctly identified per every 100
screened?
55. Test vs Major Depression
Test result | Depression on SCAN | Depression ABSENT | Total | Predictive value
Test +ve (Clinician) | ?? | ?? | 50 | PPV ??%
Test -ve (Clinician) | ?? | ?? | ?? | NPV ??%
Total | 50 | 200 | |
Accuracy | Sensitivity ??% | Specificity 95% | | Prevalence ??%
56. Test vs Major Depression
Test result | Depression on SCAN | Depression ABSENT | Total | Predictive value
Test +ve (Clinician) | 40 | 10 | 50 | PPV 80%
Test -ve (Clinician) | 10 | 190 | 200 | NPV 95%
Total | 50 | 200 | 250 |
Accuracy | Sensitivity 80% | Specificity 95% | | Prevalence 20%
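The worked example can be solved step by step from the information given: 50 people with major depression, 200 without (150 healthy plus 50 subsyndromal), 50 clinician diagnoses and a specificity of 95%. The sketch below is one way to fill in the table; the function name `solve_two_by_two` and its parameter names are hypothetical.

```python
def solve_two_by_two(n_disorder: int, n_no_disorder: int,
                     n_test_positive: int, specificity: float) -> dict:
    """Fill in a 2x2 table from its column totals, the number of positive tests and the specificity."""
    tn = round(specificity * n_no_disorder)  # true negatives among those without the disorder
    fp = n_no_disorder - tn                  # remaining non-cases were false alarms
    tp = n_test_positive - fp                # positive tests that were not false alarms
    fn = n_disorder - tp                     # cases the test missed
    total = n_disorder + n_no_disorder
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": tp / n_disorder,
        "prevalence": n_disorder / total,
        "ppv": tp / n_test_positive,
        "npv": tn / (tn + fn),
        "correct_per_100": 100 * (tp + tn) / total,
    }


# Figures from the post-stroke example: 50 major depression, 150 healthy + 50 subsyndromal.
result = solve_two_by_two(n_disorder=50, n_no_disorder=200,
                          n_test_positive=50, specificity=0.95)
for name, value in result.items():
    print(f"{name}: {value}")
```

The last entry answers the final question on the slide: the number correctly identified (true positives plus true negatives) per 100 people screened.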
57. 6. Advanced Techniques
sROC
Real World Numbers
NND; NNS
Bivariate meta-analysis
Economics
61. Further Reading
• Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet 2002; 359: 881–84
• Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004; 329 (17 July)
• Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways. BMJ 2006; 332: 1089–1092
• Reitsma JB et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 2005; 58: 982–990
62. MRCPsych Teaching 2010
B. Critical Appraisal of Prognostic Tests
Risk, predictors, measuring outcomes
Alex J Mitchell
Consultant in Liaison Psychiatry
University of Leicester
63. Measuring Risk
Risk
– the probability of some untoward event
•e.g., disease, death
Risk Factor
– characteristics or behaviours associated with an
increased risk of becoming diseased
64. [Figure: progression diagram with the labels: healthy; healthy with SMC; MCI with SMC; dementia (FTD, VaD, AD, LBD, mixed).]
65. Modelling Progression on MCI-Dementia
[Figure: disease severity against time in years (T0, T+4, T+8, T+12). Severity bands with MMSE boundaries: healthy (MMSE 30), MCI (23v24), mild dementia (20v21), moderate dementia (11v12), severe dementia (0).]
67. Modelling Progression on MCI-Dementia
[Figure: disease severity against time in years (T0, T+4, T+8, T+12). Severity bands with MMSE boundaries: healthy (MMSE 30), MCI (23v24), mild dementia (20v21), moderate dementia (11v12), severe dementia (0). Trajectories shown: MCI; MCI-progressive, moderate risk; MCI-progressive, high risk.]
68. Accuracy 2x2 Table
Predictor | OUTCOME PRESENT / POOR | OUTCOME ABSENT / GOOD | Predictive value
RISK +ve | True +ve | False +ve | PPV of predictor
RISK -ve | False -ve | True -ve | NPV of predictor
Accuracy | Sensitivity of predictor | Specificity of predictor | Prevalence
69. Test for AD vs HC…fill in missing cells
Test result | AD | MCI | Total | Predictive value
Test +ve | 600 | 100 | 700 | PPV 85.7%
Test -ve | | | | NPV 81.8%
Total | 800 | 1000 | 1800 |
Accuracy | Sensitivity 75% | Specificity 90% | | Prevalence 44%
Mitchell (2005) Meta-analysis N=14x
70. Test for AD vs HC…..good test?
Test result | AD | MCI | Total | Predictive value
Test +ve | 600 | 100 | 700 | PPV 85.7%
Test -ve | 200 | 900 | 1100 | NPV 81.8%
Total | 800 | 1000 | 1800 |
Accuracy | Sensitivity 75% | Specificity 90% | | Prevalence 44%
Mitchell (2005) Meta-analysis N=14x
73. AD vs HC – P-Tau181
Test result | AD | MCI | Total | Predictive value
Test +ve | 595 | 113 | 708 | PPV 84.0%
Test -ve | 257 | 576 | 833 | NPV 69.1%
Total | 852 | 689 | 1541 |
Accuracy | Sensitivity 69.8% | Specificity 83.6% | |
Mitchell (2005) Meta-analysis N=14x
74. Classifying Predictors
Demographic
• Age
• Gender
• Education
Service Related
• Recruitment Setting
• Education
• Length of follow-up
• Delay in diagnosis
• Treatment
• Size of study
Disease Related
• MCI Type
• MCI Subtype
• Structural Imaging
• Functional Imaging
• CSF Studies
• Genetic testing (ApoE4)
• Cognitive Testing
• Non-memory impairment
• Depression/anxiety
• Subjective Performance
• Functional status
• Vascular status
75. Mayo Data Survival (Kaplan-Meier)
[Figure: Kaplan-Meier survival curves. y-axis: alive (%), 0 to 100; x-axis: years after enrollment, 0 to 17. Groups: normals, all amnestic MCI, A-MCI single domain, A-MCI multidomain. P<0.023.]
76. Measuring Risk
• Absolute Risk Reduction (ARR)
– is the absolute difference in event rates between
the experimental and control patients.
ARR = CER - EER (control event rate minus experimental event rate)
• Number Needed to Treat (NNT)
– is the number of patients a clinician needs to
treat in order to prevent one additional adverse
outcome
NNT = 1 / ARR
77. Measuring Risk - Examples
CER | EER | RRR | ARR | NNT
0.6 | 0.4 | 33% | 20% | 5
0.06 | 0.04 | 33% | 2% | 50
0.006 | 0.004 | 33% | 0.2% | 500
RRR remains the same despite differences in absolute rate of events.
ARRs reflect underlying susceptibility of patients
NNTs provide a measure of the clinical effort that must be expended
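The three example rows can be reproduced with a few lines of arithmetic. This sketch assumes the standard relationships ARR = CER - EER, RRR = ARR / CER and NNT = 1 / ARR (rounded up to a whole patient). The function name `risk_measures` is hypothetical, and exact fractions are used so that rounding up is not disturbed by floating-point error.

```python
import math
from fractions import Fraction


def risk_measures(cer: str, eer: str) -> tuple[Fraction, Fraction, int]:
    """Return (RRR, ARR, NNT) from control and experimental event rates given as decimal strings."""
    control, experimental = Fraction(cer), Fraction(eer)
    arr = control - experimental   # absolute risk reduction
    if arr <= 0:
        raise ValueError("NNT is only defined here when the treatment reduces the event rate")
    rrr = arr / control            # relative risk reduction
    nnt = math.ceil(1 / arr)       # number needed to treat, rounded up
    return rrr, arr, nnt


# The three CER/EER pairs from the examples table.
for cer, eer in (("0.6", "0.4"), ("0.06", "0.04"), ("0.006", "0.004")):
    rrr, arr, nnt = risk_measures(cer, eer)
    print(f"CER {cer}, EER {eer}: RRR {float(rrr):.0%}, ARR {float(arr):.1%}, NNT {nnt}")
```

The relative risk reduction is one third in every row, while the number needed to treat grows tenfold each time the baseline risk falls tenfold.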
78. Checklist criteria
• Is the study population similar to our patients?
• Is the study design appropriate for the research question?
• Were the study subjects representative of patients with the disease in question?
• Were the patients at a similar point in the course of their disease?
• Were the outcomes objectively-defined, and were the people recording the outcomes
blinded to the prognostic factors?
• Were patients followed long enough for outcomes to occur?
• Was the dropout rate excessive?
• Did the authors adjust for differences between groups?