Absence of a gold standard in diagnostic test accuracy research
1. Absence of a gold standard in diagnostic test
accuracy research
with an application in the context of childhood TB
Maarten van Smeden, PhD
Post-doctoral researcher, Julius Center for Health Sciences and Primary Care
WEON 2017 Pre-conference Accounting for Measurement Error in Epidemiology
Antwerp, June 7, 2017
2. Outline
• Diagnostic test accuracy
• The problem: absence of a gold standard
• Possible solution: latent class analysis in context of TB
6. Diagnostic testing
• “New test better than the existing test(s)?”
• “(Where to) add new test to diagnostic pathway?”
• “Recommend new test in practice guidelines?”
Fig from: Bossuyt, BMJ, 2006
7. Diagnostic test accuracy studies (DTA)
• Evaluation of “new” diagnostic tests (=index test) by
comparison to a “gold standard”
• Misclassification probabilities of index test: sensitivity,
specificity, negative/positive predictive values, etc.
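As a minimal sketch of these accuracy measures, the Python snippet below computes them from a 2x2 table of index test results against a reference standard that is treated as error-free. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the talk.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Accuracy of an index test from a 2x2 table against a reference standard.

    tp / fn: reference-positive subjects with a positive / negative index test.
    fp / tn: reference-negative subjects with a positive / negative index test.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
measures = accuracy_measures(tp=70, fp=90, fn=30, tn=810)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values depend on the disease prevalence in the study sample, whereas sensitivity and specificity condition on the (reference) disease status.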
12. All that glitters is not gold
• Commonly, the best available reference standard has Se < 1 and
Sp < 1: it is not a “gold standard”.
Reasons include:
detection limits (e.g. culture), procedures that are infeasible or unethical
in some patients (e.g. biopsy), observer errors (e.g. MRI), etc.
-> misclassifications of the target condition by the reference
standard (= measurement error)
14. When using imperfect reference standard
Assuming: reference standard Se = 1, index test Se = Sp = 0.7, and conditional independence between reference standard and index test
Figure: expected sensitivity of the index test (y-axis, 0.3 to 0.7) against specificity of the reference standard (x-axis, 0.5 to 1.0), with one curve each for disease prevalence = 0.05, 0.25 and 0.50
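The kind of curve described on this slide can be reproduced with a few lines of arithmetic. The sketch below is not the author's code; it assumes the setup stated above (conditional independence of the two tests given true disease status) and computes the apparent sensitivity, that is, the probability of a positive index test among reference-positive subjects. The function name is hypothetical.

```python
def apparent_sensitivity(prev, se_index, sp_index, se_ref, sp_ref):
    """P(index test + | reference standard +), assuming the errors of the two
    tests are conditionally independent given the true disease status."""
    # Reference-positives are a mix of true cases and false positives.
    both_pos = (prev * se_index * se_ref
                + (1 - prev) * (1 - sp_index) * (1 - sp_ref))
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    return both_pos / ref_pos


# Values from the slide: reference Se = 1, index test Se = Sp = 0.7.
for prev in (0.05, 0.25, 0.50):
    row = [
        apparent_sensitivity(prev, 0.7, 0.7, 1.0, sp_ref)
        for sp_ref in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
    ]
    print(prev, [round(value, 3) for value in row])
```

Under these assumptions the apparent sensitivity equals the true value of 0.7 only when the reference standard specificity is 1, and the underestimation is largest at low prevalence.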
15. When using imperfect reference standard
• Estimates of index test accuracy are biased, sometimes called
“reference standard bias”. The biased estimates are not
necessarily a lower bound of the true Se/Sp
• Philosophical problems when index test is believed to be
more accurate than the best available reference standard
16. When using imperfect reference standard
Absence of a gold standard
Misclassifications by the reference standard ->
no straightforward, valid approaches to estimating the
misclassification probabilities of index tests
18. Tuberculosis (TB)
Paulsen, Nature, 2013
Figure 2.16a: Top causes of death worldwide in 2012, in millions of deaths (bar chart, axis from 0 to 7 million). Causes shown: ischaemic heart disease; stroke; lower respiratory infections; chronic obstructive pulmonary disease; TB; tracheal, bronchus, lung cancers; diarrheal diseases; diabetes mellitus; HIV/AIDS; road injury. Deaths from TB among HIV-positive people are shown in grey.
Footnote: 2012 is the latest year for which estimates for all causes are currently available. See WHO Global Health Observatory data repository, available at http://apps.who.int/gho/data/node.main.GHECOD (accessed 27 August 2015).
WHO Global TB report 2015
19. Data
• 749 hospitalised children with suspected pulmonary TB in
Cape Town, South Africa
• Study procedures: a number of tests for TB were performed for each subject:
• Microscopy
• Culture
• Xpert (NAAT)
• TST (skin test)
• Radiography
28. Heuristic model for TB data
• Conditional independence
between all tests is unlikely
• Conditional dependence
between: Xpert, culture,
microscopy, and TST among TB
diseased due to “bacterial load”
• Bacterial load modelled by a
random effect
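To illustrate why a shared random effect induces conditional dependence, the sketch below computes, among TB-diseased children, the probability that two tests are both positive and compares it with the product of their marginal sensitivities. It is an assumption-laden illustration, not the model fitted in the talk: the probit link, the intercepts and the loadings are all hypothetical, and the random effect (standing in for bacterial load) is taken to be standard normal.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def expect_over_load(f, lo=-8.0, hi=8.0, steps=4000):
    """Trapezoid-rule approximation of E[f(b)] for b ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        b = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(b) * normal_pdf(b)
    return total * h


# Hypothetical (intercept, loading) pairs on the probit scale.
TESTS = {"xpert": (0.3, 1.0), "culture": (0.5, 1.0)}


def p_positive(test, b):
    """P(test positive | TB diseased, bacterial load b)."""
    intercept, loading = TESTS[test]
    return phi(intercept + loading * b)


# Marginal sensitivities: average over the bacterial load distribution.
se_xpert = expect_over_load(lambda b: p_positive("xpert", b))
se_culture = expect_over_load(lambda b: p_positive("culture", b))

# Joint probability that both tests are positive in a diseased child.
joint = expect_over_load(
    lambda b: p_positive("xpert", b) * p_positive("culture", b)
)

print(f"Se Xpert   = {se_xpert:.3f}")
print(f"Se culture = {se_culture:.3f}")
print(f"P(both +)  = {joint:.3f} vs product {se_xpert * se_culture:.3f}")
```

With positive loadings the joint probability exceeds the product of the marginals, which is exactly the conditional dependence that a model assuming conditional independence would ignore.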
32. Is latent class analysis useful?
• In the TB example, I believe: yes
• More realistic than assuming reference standard (culture)
has Se = Sp = 1
• Results ‘robust’ to changing prior distributions and
conditional dependence structure
• Lack of robust alternative approaches for DTA in the
absence of a gold standard
33. Is latent class analysis useful?
• But:
• Latent class analysis for DTA is still rare
34. Latent class analysis in diagnostic research
Systematic review from 2014
• 69 theoretical papers
• 64 applied papers in human research + 47 in veterinary sciences
• applications of LCA still not common in human diagnostic research
van Smeden, AJE, 2014
35. Is latent class analysis useful?
• But:
• Latent class analysis for DTA is still rare
• Robustness to misspecification of the conditional
dependence structure is a concern
• Identifiability requirements
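One commonly used necessary (but not sufficient) check for identifiability is to compare the number of free parameters with the degrees of freedom in the data. The sketch below does this counting for a two-class model with conditional independence between K binary tests; the function names are hypothetical, and models with extra dependence terms such as a random effect need more parameters than counted here.

```python
def lca_parameter_count(n_tests):
    """Free parameters of a two-class model with conditional independence:
    prevalence plus one sensitivity and one specificity per test."""
    return 2 * n_tests + 1


def data_degrees_of_freedom(n_tests):
    """Independent cell probabilities in the 2^K table of test results."""
    return 2 ** n_tests - 1


for k in range(2, 6):
    params = lca_parameter_count(k)
    df = data_degrees_of_freedom(k)
    status = "fails counting check" if params > df else "passes counting check"
    print(f"K = {k}: {params} parameters, {df} degrees of freedom -> {status}")
```

With two tests there are more parameters than degrees of freedom, so additional information (for example constraints or informative priors) is needed; passing the counting check does not by itself guarantee identifiability.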
38. Why Bayesian?
• Practical arguments:
• Model specifications in non-commercial software packages
(e.g. randomLCA vs rjags in R)
• (Weakly) informative prior distributions can solve
non-identifiability problems
• Additional calculations (e.g. positive/negative predictive
values with CrI)
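The additional calculations mentioned above can be sketched as follows: each posterior draw of prevalence, sensitivity and specificity is transformed into a predictive value, and the credible interval (CrI) is read off the empirical quantiles. This is an illustration only. The draws here are simulated from arbitrary Beta distributions as a stand-in for real MCMC output, and all function names are hypothetical.

```python
import random


def ppv(prev, se, sp):
    """Positive predictive value from prevalence, sensitivity, specificity."""
    return prev * se / (prev * se + (1 - prev) * (1 - sp))


def npv(prev, se, sp):
    """Negative predictive value from prevalence, sensitivity, specificity."""
    return (1 - prev) * sp / ((1 - prev) * sp + prev * (1 - se))


def credible_interval(draws, level=0.95):
    """Equal-tailed interval from the empirical quantiles of the draws."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1 - tail) * (len(ordered) - 1))]
    return lower, upper


# Stand-in for posterior draws (prevalence, Se, Sp). In practice these would
# come from the sampler output; the Beta parameters here are arbitrary.
rng = random.Random(2017)
posterior = [
    (rng.betavariate(25, 75), rng.betavariate(70, 30), rng.betavariate(90, 10))
    for _ in range(5000)
]

ppv_draws = [ppv(*draw) for draw in posterior]
npv_draws = [npv(*draw) for draw in posterior]
ppv_lo, ppv_hi = credible_interval(ppv_draws)
npv_lo, npv_hi = credible_interval(npv_draws)

print(f"PPV 95% CrI: {ppv_lo:.3f} to {ppv_hi:.3f}")
print(f"NPV 95% CrI: {npv_lo:.3f} to {npv_hi:.3f}")
```

Because the transformation is applied draw by draw, the uncertainty in prevalence, sensitivity and specificity carries through to the predictive values without extra approximations.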
39. Final remarks
• Misclassification in DTA studies is often both the primary topic
of study (for the index test) and the problem (when occurring
in the reference standard)
• Model based estimation of index test accuracy by latent class
analysis can be useful
• There is some evidence that robustness of the latent class
model can be improved when disease status can be verified
with certainty in a subset
• While the focus of this talk was on DTA, other studies such as
“incremental value” studies suffer from the same problems
40. Acknowledgements
Thanks to all co-authors in:
Supported by a grant from the Canadian Institutes of Health Research (MOP #89857)