Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.

Seventy years of RCTs

241 visualizaciones

Publicado el

This year marks the 70th anniversary of the Medical Research Council randomised clinical trial (RCT) of streptomycin in tuberculosis led by Bradford Hill. This is widely regarded as a landmark in clinical research. Despite its widespread use in drug regulation and in clinical research more widely and its high standing with the evidence based medicine movement, the RCT continues to attracts criticism. I show that many of these criticisms are traceable to failure to understand two key concepts in statistics: probabilistic inference and design efficiency. To these methodological misunderstandings can be added the practical one of failing to appreciate that entry into clinical trials is not simultaneous but sequential.
I conclude that although randomisation should not be used as an excuse for ignoring prognostic variables, it is valuable and that many standard criticisms of RCTs are invalid.

Publicado en: Salud y medicina
  • Sé el primero en comentar

  • Sé el primero en recomendar esto

Seventy years of RCTs

  1. 1. 70 Years and Still Here The Randomised Clinical Trial and its Critics Stephen Senn 1(c)Stephen Senn 2018 stephen@senns.demon.co.uk @stephensenn
  2. 2. Acknowledgements and a Clarification (c)Stephen Senn 2018 2 Acknowledgements Many thanks to Ursula Garczarek, EUGM & Cytel for the invitation The historical aspect of this work has benefitted greatly from reading articles by Peter Armitage, John Gower and Nancy Hall A Clarification I am going to pick on a number of papers and take the authors to task regarding some of the misunderstanding regarding randomisation that the promote 1. This does not mean that the papers as a whole are bad 2. Where they are wrong, it does not mean that they are uniquely wrong. I could have picked on many other examples
  3. 3. Outline Topic Number of Slides Randomisation: from psychology to clinical trials via agriculture 10 A game of two dice 8 The critics of randomisation 6 The critics answered 19 Conclusions 6 (c)Stephen Senn 2018 3
  4. 4. Randomisation: from psychology to clinical trials via agriculture Strength from uncertainty (c)Stephen Senn 2018 4
  5. 5. A timeline of randomisation & related matters When Who Where What 1843-1900 JB Laws & JH Gilbert Rothamsted 57 year scientific partnership developing agricultural experiments 1883-1885 CS Peirce & J Jastrow Johns Hopkins First? use of randomisation. Experiment to determine ability to detect small weight differences 1910 TH Wood & FJM Stratton Cambridge Applied to agriculture approach astronomers used to determining precision of means 1925-1935 RA Fisher Rothamsted Proposed using randomisation in experiments developed ANOVA 1933-1940 F Yates Rothamsted Developed theory of factorials, confounding, recovering inter-block information 1948 Bradford Hill LSHTM, London MRC Streptomycin trial has random allocation 1965 John Nelder NVRS Wellsbourne Theory of general balance based on block and treatment structure (c)Stephen Senn 2018 5
  6. 6. Seventy years ago (c)Stephen Senn 2018 6
  7. 7. Bradford Hill on randomisation (c)Stephen Senn 2018 7 It ensures that neither our personal idiosyncrasies (our likes or dislikes consciously or unwittingly applied) nor our lack of balanced judgement has entered into the construction of the different treatment groups—the allocation has been outside our control and the groups are therefore unbiased; … it removes the danger, inherent in an allocation based on personal judgement, that believing we may be biased in our judgements we endeavour to allow for that bias, to exclude it, and that in doing so we may overcompensate and by thus ‘leaning over backward’ introduce a lack of balance from the other direction; … and, having used a random allocation, the sternest critic is unable to say when we eventually dash into print that quite probably the groups were differentially biased through our predilections or through our stupidity Selected by Armitage, 2003
  8. 8. The Rothamsted School (c)Stephen Senn 2018 8 RA Fisher 1890-1962 Variance, ANOVA Randomisation, design, significance tests Frank Yates 1902-1994 Factorials, recovering Inter-block information John Nelder 1924-2010 General balance, computing Genstat® and Frank Anscombe, David Finney, Rosemary Bailey, Roger Payne etc
  9. 9. General Balance History • An idea of John Nelder’s • Two papers in the Proceedings of the Royal Society, 1965 concerning “The analysis of randomized experiments with orthogonal block structure” • Block structure and the null analysis of variance • Treatment structure and the general analysis of variance Basic idea • Splits an experiment into two radically different components • The block structure, which describes the way that the experimental units are organised • The way that variation amongst units can be described • Null ANOVA – an idea of Anscombe’s • The treatment structure, which reflects the way that treatments are combined for the scientific purpose of the experiment (c)Stephen Senn 2018 9
  10. 10. (c)Stephen Senn 2018 10 Design Driven Modelling • Together with a third piece of information, the design matrix, these determine the analysis of variance • Note that because both block and treatments structure can be hierarchical such a design matrix is not, on its own sufficient to derive an ANOVA • But together with John’s block and treatment structure it is • For designs exhibiting general balance • This approach is incorporated in Genstat®
  11. 11. Genstat® Help File Example (c)Stephen Senn 2018 11 Block Plot S N Yield 1 1 0 0 0.750 1 4 0 180 1.204 1 3 0 230 0.799 1 12 10 0 0.925 1 5 10 180 1.648 1 8 10 230 1.463 1 7 20 0 0.654 1 2 20 180 1.596 1 10 20 230 1.594 1 11 40 0 0.526 1 9 40 180 1.672 1 6 40 230 1.804 2 8 0 0 0.503 2 10 0 180 0.489 etc " This is a field experiment to study the effects of nitrogen and sulphur on the yield of wheat with a randomized block design." BLOCKSTRUCTURE Block / Plot TREATMENTSTRUCTURE N * S ANOVA [PRINT=aov; FPROBABILITY=yes] Yield
  12. 12. (c)Stephen Senn 2018 12
  13. 13. Morals • Design matters! • Experimental material may have some structure • Block structure may be complex • More than one variance may be relevant • The way that the design maps treatments onto the block structure is important • Determines the error term • Randomisation of that which is declared irrelevant guarantees the marginal probability statements • These, in turn calibrate the conditional ones (c)Stephen Senn 2018 13
  14. 14. A Game of Two Dice The role of the roll (c)Stephen Senn 2018 14
  15. 15. Game of Chance • Two dice are rolled – Red die – Black die • You have to call correctly the odds of a total score of 10 • Three variants – Game 1 You call the odds and the dice are rolled together – Game 2 the red die is rolled first, you are shown the score and then must call the odds – Game 3 the red die is rolled first, you are not shown the score and then must call the odds 15(c)Stephen Senn 2018
  16. 16. Total Score when Rolling Two Dice Variant 1. Three of 36 equally likely results give a 10. The probability is 3/36=1/12. 16(c)Stephen Senn 2018
  17. 17. Variant 2: If the red die score is 1,2 or 3, probability of a total of 10 is 0. If the red die score is 4,5 or 6 the probability of a total of 10 is 1/6. Variant 3: The probability = (½ x 0) + (½ x 1/6) = 1/12 Total Score when Rolling Two Dice 17(c)Stephen Senn 2018
  18. 18. The Morals • You can’t treat game 2 like game 1. – You must condition on the information you receive in order to act wisely – You must use the actual data from the red die • You can treat game 3 like game 1. – You can use the distribution in probability that the red die has • You can’t ignore an observed prognostic covariate in analysing a clinical trial just because you randomised – That would be to treat game 2 like game 1 • You can ignore an unobserved covariate precisely because you did randomise – Because you are entitled to treat game 3 like game 1 18(c)Stephen Senn 2018
  19. 19. The Critics of Randomisation It ain’t what we don’t know that harms us but what we know that ain’t so (c)Stephen Senn 2018 19
  20. 20. Some criticisms Who When What Urbach 1985 You might just as well let the patients choose Worral 2007 Infinitely many confounders: something must be imbalanced Borgerson 2009 “As Worral has shown” Deaton and Cartwright 2017 Type I error rate not maintained if data skewed Krauss 2018 You should balance the patients at the start of the trial (c)Stephen Senn 2018 20
  21. 21. Some quotations (c)Stephen Senn 2018 21
  22. 22. More quotations (c)Stephen Senn 2018 22
  23. 23. Yet more quotations (c)Stephen Senn 2018 23
  24. 24. To sum up the claims • Trials aren’t perfectly balanced • There are indefinitely many confounders • We could do better by creating equal groups • Blinding is important but randomisation isn’t • We might as well let the patients choose which (blinded) group they join (c)Stephen Senn 2018 24
  25. 25. The Critics Answered Being a statistician means never having to say you are certain (c)Stephen Senn 2018 25
  26. 26. Points to understand • Balance isn’t necessary • Indefinitely many confounders argument is a red herring • It’s all about ratios • Probability matters • You can’t gather the patients together at the beginning of a trial • Effective blinding requires randomisation • Allowing patients to choose is not a good idea (c)Stephen Senn 2018 26
  27. 27. A Tale of Two Tables Trial 1 Treatment Sex Verum Placebo Total Male 34 26 60 Female 15 25 40 Total 49 51 100 Trial 2 Treatment Sex Verum Placebo Male 26 26 52 Female 15 15 30 Total 41 41 82 • Trial two balanced but trial one not • Surely trial two must be more reliable • Things are not so simple 27(c)Stephen Senn 2018
  28. 28. A Tale of Two Tables Trial 1 Treatment Sex Verum Placebo Total Male 26+8 26 60 Female 15 15+10 40 Total 49 51 100 Trial 2 Treatment Sex Verum Placebo Male 26 26 52 Female 15 15 30 Total 41 41 82 • Trial two contains trial one • How can more information be worse than less • If statistical theory could not deal with Trial 1 there would be something wrong with it 28(c)Stephen Senn 2018
  29. 29. A Red Herring • One sometimes hears that the fact that there are indefinitely many covariates means that randomisation is useless • This is quite wrong • It is based on a misunderstanding that variant 3 of our game should not be analysed like variant 1 • I showed you that it should • Just because a series of terms is not finite does not mean that their sum is not bounded (c)Stephen Senn 2018 29 1 1 1 2 4 81 .... 2   
  30. 30. You are not free to imagine anything at all • Imagine that you are in control of all the thousands and thousands of covariates that patients will have • You are now going to allocate the covariates and their effects to patients o As in a simulation • If you respect the actual variation in human health that there can be you will find that the net total effect of these covariates is bounded 𝑌 = 𝛽0 + 𝑍 + 𝛽1 𝑋1 + ⋯ 𝛽 𝑘 𝑋 𝑘 + ⋯ Where Z is a treatment indicator and the X are covariates. You are not free to arbitrarily assume any values you like for the Xs and the 𝛽𝑠 because the variance of Y must be respected. (c)Stephen Senn 2018 30
  31. 31. The importance of ratios • In fact from one point of view there is only one covariate that matters o potential outcome  If you know this, all other covariates are irrelevant • And just as this can vary between groups in can vary within • The t-statistic is based on the ratio of differences between to variation within • Randomisation guarantees (to a good approximation) the unconditional behaviour of this ratio and that is all that matters for what you can’t see (game 3) • An example follows (c)Stephen Senn 2018 31
  32. 32. Corollary – unobserved covariates can be ignored if you have randomised • The error is to assume that because you can’t use randomisation as a justification for ignoring information it is useless • It is useful for what you don’t see • Knowing that the two-dice game is fairly run is important even though the average probability is not relevant to game two • Average probabilities are important for calibrating your inferences o Your conditional probabilities must be coherent with your marginal ones  See the relationship between the games (c)Stephen Senn 2018 32
  33. 33. Hills andArmitageEneuresis Data 10 8 14 2 12 6 1210 6 4 2 0 40 8 Drynights placebo Line of equality Sequence Drug Placebo Sequence placebo drug Cross-over trial in Eneuresis Two treatment periods of 14 days each 1. Hills, M, Armitage, P. The two-period cross-over clinical trial, British Journal of Clinical Pharmacology 1979; 8: 7-20. 33(c)Stephen Senn 2018
  34. 34. 0.7 4 0.5 2 0.3 0 0.1 -2-4 0.6 0.2 0.4 0.0 Permutatedtreatment effect Blue diamond shows treatment effect whether or not we condition on patient as a factor. It is identical because the trial is balanced by patient. However the permutation distribution is quite different and our inferences are different whether we condition (red) or not (black) and clearly balancing the randomisation by patient and not conditioning the analysis by patient is wrong 34(c)Stephen Senn 2018
  35. 35. The two permutation* distributions summarised Summary statistics for Permuted difference no blocking Number of observations = 10000 Mean = 0.00561 Median = 0.0345 Minimum = -3.828 Maximum = 3.621 Lower quartile = -0.655 Upper quartile = 0.655 P-value for observed difference 0.0340 *Strictly speaking randomisation distributions Summary statistics for Permuted difference blocking Number of observations = 10000 Mean = 0.00330 Median = 0.0345 Minimum = -2.379 Maximum = 2.517 Lower quartile = -0.517 Upper quartile = 0.517 P-value for observed difference 0.0014 35(c)Stephen Senn 2018
  36. 36. Two Parametric Approaches Not fitting patient effect Estimate s.e. t(56) t pr. 2.172 0.964 2.25 0.0282 (P-value for permutation is 0.034) Fitting patient effect Estimate s.e. t(28) t pr . 2.172 0.616 3.53 0.00147 (P-value for Permutation is 0.0014) 36(c)Stephen Senn 2018
  37. 37. What happens if you balance but don’t condition? Approach Variance of estimated treatment effect over all randomisations* Mean of variance of estimated treatment effect over all randomisations* Completely randomised Analysed as such 0.987 0.996 Randomised within- patient Analysed as such 0.534 0.529 Randomised within- patient Analysed as completely randomised 0.534 1.005 *Based on 10000 random permutations (c)Stephen Senn 2018 37 That is to say, permute values respecting the fact that they come from a cross- over but analysing them as if they came from a parallel group trial
  38. 38. In terms of t-statistics Approach Observed variance of t-statistic over all randomisations* Predicted theoretical variance Completely randomised Analysed as such 1.027 1.037 Randomised within- patient Analysed as such 1.085 1.077 Randomised within- patient Analysed as completely randomised 0.534 1.037@ *Based on 10000 random permutations @ Using the common falsely assumed theory (c)Stephen Senn 2018 38
  39. 39. The Shocking Truth • The validity of conventional analysis of randomised trials does not depend on covariate balance • It is valid because they are not perfectly balanced • If they were balanced the standard analysis would be wrong (c)Stephen Senn 2018 39
  40. 40. Randomisation is Necessary for Blinding Fisher, in a letter to Jeffreys, explained the dangers of using a haphazard method thus … if I want to test the capacity of the human race for telepathically perceiving a playing card, I might choose the Queen of Diamonds, and get thousands of radio listeners to send in guesses. I should then find that considerably more than one in 52 guessed the card right... Experimentally this sort of thing arises because we are in the habit of making tacit hypotheses, e.g. ‘Good guesses are at random except for a possible telepathic influence.’ But in reality it appears that red cards are always guessed more frequently than black(Bennett, 1990).(pp268-269) …if the trial was, and remained, double-blind then randomization could play no further role in this respect. (Worrall, 2007)(P454) 40(c)Stephen Senn 2018
  41. 41. Avoiding Double Guessing • If you don’t randomise you have to assume that your strategy has not been guessed by the investigator • You are using ‘the argument from the stupidity of others’ • Not publishing the block size in your protocol is a classic example 41(c)Stephen Senn 2018
  42. 42. Deaton and Cartwright Their claim • Generate from a highly skewed population • Just have 50 patients per arm • Mean difference between treatments in the population is zero • Type I error rate is 11% for a nominal 5% My simulation (c)Stephen Senn 2018 42
  43. 43. One reason why Urbach’s proposal is not a good idea • Other things being equal distributions centred on 0.5 are better • The narrower the distribution the better • The narrowest distribution centred on 0.5 is randomisation (c)Stephen Senn 2018 43
  44. 44. Conclusion The beginning of the end (c)Stephen Senn 2018 44
  45. 45. My Philosophy of Clinical Trials • Your (reasonable) beliefs dictate the model • You should try measure what you think is important • You should try fit what you have measured – Caveat : random regressors and the Gauss-Markov theorem • If you can balance what is important so much the better – But fitting is more important than balancing • Randomisation deals with unmeasured covariates – You can use the distribution in probability of unmeasured covariates – For measured covariates you must use the actual observed distribution • Claiming to do ‘conservative inference’ is just a convenient way of hiding bad practice – Who thinks that analysing a matched pairs t as a two sample t is acceptable? 45(c)Stephen Senn 2018
  46. 46. What’s out and What’s in Out In • Log-rank test • T-test on change scores • Chi-square tests on 2 x 2 tables • Responder analysis and dichotomies • Balancing as an excuse for not conditioning • Proportional hazards • Analysis of covariance fitting baseline • Logistic regression fitting covariates • Analysis of original values • Modelling as a guide for designs 46(c)Stephen Senn 2018
  47. 47. Unresolved Issue • In principle you should never be worse off by having more information • The ordinary least squares approach has two potential losses in fitting covariates – Loss of orthogonality – Losses of degrees of freedom • This means that eventually we lose by fitting more covariates 47(c)Stephen Senn 2018
  48. 48. Resolution? • The Gauss-Markov theorem does not apply to stochastic regressors • In theory we can do better by having random effect models • However there are severe practical difficulties • Possible Bayesian resolution in theory • A pragmatic compromise of a limited number of prognostic factors may be reasonable 48(c)Stephen Senn 2018
  49. 49. To sum up • There are a lot of people out there who fail to understand what randomisation can and cannot do for you • We need to tell them firmly and clearly what they need to understand • The RCT may be 70 years old but it still looks quite lively 49(c)Stephen Senn 2018
  50. 50. Finally I leave you with this thought Statisticians are always tossing coins but do not own many 50(c)Stephen Senn 2018
  51. 51. References Papers 1. Armitage P. Fisher, Bradford Hill, and randomization. International journal of epidemiology. 2003;32(6):925-928; discussion 945-928. 2. Gower J. Statistics and agriculture. Journal of the Royal Statistical Society. 1988;151(1):179-200. 3. Hall NS. RA Fisher and his advocacy of randomization. Journal of the History of Biology. 2007;40(2):295- 325. 4. Senn SJ. Seven myths of randomisation in clinical trials. Statistics in Medicine. 2013;32(9):1439-1450. Blogs 3 blogs on Deborah Mayo’s Error Statistics website. Email stephen@senns.demon.co.uk for details (c)Stephen Senn 2018 51

×