Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.

In Search of Lost Infinities: What is the “n” in big data?

136 visualizaciones

Publicado el

In designing complex experiments, agricultural scientists, with the help of their statistician collaborators, soon came to realise that variation at different levels had very different consequences for estimating different treatment effects, depending on how the treatments were mapped onto the underlying block structure. This was a key feature of the Rothamsted approach to design and analysis and a strong thread running through the work of Fisher, Yates and Nelder, being expressed in topics such as split-pot designs, recovering inter-block information and fractional factorials. The null block-structure of an experiment is key to this philosophy of design and analysis. However modern techniques for analysing experiments stress models rather than symmetries and this modelling approach requires much greater care in analysis, with the consequence that you can easily make mistakes and often will.
In this talk I shall underline the obvious, but often unintentionally overlooked, fact that understanding variation at the various levels at which it occurs is crucial to analysis. I shall take three examples, an application of John Nelder’s theory of general balance to Lord’s Paradox, the use of historical data in drug development and a hybrid randomised non-randomised clinical trial, the TARGET study, to show that the data that many, including those promoting a so-called causal revolution, assume to be ‘big’ may actually be rather ‘small’. The consequence is that there is a danger that the size of standard errors will be underestimated or even that the appropriate regression coefficients for adjusting for confounding may not be identified correctly.
I conclude that an old but powerful experimental design approach holds important lessons for observational data about limitations in interpretation that mere numbers cannot overcome. Small may be beautiful, after all.

Publicado en: Ciencias
  • Sé el primero en comentar

  • Sé el primero en recomendar esto

In Search of Lost Infinities: What is the “n” in big data?

  1. 1. In Search of Lost Infinities What is the “n” in big data? Stephen Senn, Edinburgh (c) Stephen Senn 2019 1
  2. 2. Acknowledgements (c) Stephen Senn 2019 2 My thanks for the kind invitation This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552. “IDEAL”. Work on historical controls is joint with Olivier Collignon, Anna Schritz and Riccardo Spezia
  3. 3. Outline • Why block structure matters • An example analysed with the help of the Rothamsted School and Genstat® • The TARGET study • Historical controls • Lord’s paradox • Conclusions and lessons (c) Stephen Senn 2019 3
  4. 4. The Rothamsted School (c) Stephen Senn 2019 4 RA Fisher 1890-1962 Variance, ANOVA Randomisation, design, significance tests Frank Yates 1902-1994 Factorials, recovering Inter-block information John Nelder 1924-2010 General balance, computing Genstat® and Frank Anscombe, David Finney, Rosemary Bailey, Roger Payne etc
  5. 5. Trial in asthma Basic situation • Two beta-agonists compared • Zephyr(Z) and Mistral(M) • Block structure has several levels • Different designs will be investigated • Cluster • Parallel group • Cross-over Trial • Each design will be blocked at a different level • NB Each design will collect 336 measurements Block structure Level Number within higher level Total Number Centre 6 6 Patient 4 24 Episodes 2 48 Measurements 7 336 (c) Stephen Senn 2019 5
  6. 6. Block structure • Patients are nested with centres • Episodes are nested within patients • Measurements are nested within episodes • Centres/Patients/Episodes/Measurements (c) Stephen Senn 2019 6 Measurements not shown
  7. 7. Possible designs • Cluster randomised • In each centre all the patients either receive Zephyr (Z) or Mistral (M) in both episodes • Three centres are chosen at random to receive Z and three to receive M • Parallel group trial • In each centre half the patients receive Z and half M in both episodes • Two patients per centre are randomly chosen to receive Z and two receive M • For each patient the patient receives M in one episode and Z in another • The order of allocation, ZM or MZ is random (c) Stephen Senn 2019 7
  8. 8. (c) Stephen Senn 2019 8
  9. 9. (c) Stephen Senn 2019 9
  10. 10. (c) Stephen Senn 2019 10
  11. 11. Null (skeleton) analysis of variance with Genstat ® Code Output (c) Stephen Senn 2019 11 BLOCKSTRUCTURE Centre/Patient/Episode/Measurement ANOVA
  12. 12. Full (skeleton) analysis of variance with Genstat ® Additional Code Output (c) Stephen Senn 2019 12 TREATMENTSTRUCTURE Design[] ANOVA (Here Design[] is a pointer with values corresponding to each of the three designs.)
  13. 13. Variance matters Points • Which variances apply depends on the design • All three for cluster trial • First two for parallel trial • Third only for cross-over trial • It is possible for the number of observations to go to infinity without the variance going to zero • There is no ‘design-free’ n (c) Stephen Senn 2019 13 Variances       2 2 2 , centres , patients per centre , episodes per patient 1) 2) 3) C P E C C P C P E C P E n n n n between centre contribution n n between patient contribution n n n within patient contribution   
  14. 14. What about the measurement level? • I put this in to remind us that not everything you measure brings exploitable information to the same degree • Randomisation between measurements was not possible in any of the schemes • This makes it difficult to exploit them except in a summary way • By averaging • Warning: some repeated measures analyses are very strongly reliant on assumed model structure (c) Stephen Senn 2019 14       1 6 1 12 1 7 1 6 1 7 2 2 1 1 1 , , 1 , , usually 7 7 7 M i i Y Y Var Y Y Var Y Var Y                           Y Y
  15. 15. The TARGET study • One of the largest studies ever run in osteoarthritis • 18,000 patients • Randomisation took place in two sub-studies of equal size • Lumiracoxib versus ibuprofen • Lumiracoxib versus naproxen • Purpose to investigate CV and GI tolerability of lumiracoxib (c) Stephen Senn 2019 15
  16. 16. Baseline Demographics Sub-Study 1 Sub Study 2 Demographic Characteristic Lumiracoxib n = 4376 Ibuprofen n = 4397 Lumiracoxib n = 4741 Naproxen n = 4730 Use of low-dose aspirin 975 (22.3) 966 (22.0) 1195 (25.1) 1193 (25.2) History of vascular disease 393 (9.0) 340 (7.7) 588 (12.4) 559 (11.8) Cerebro- vascular disease 69 (1.6) 65 (1.5) 108 (2.3) 107 (2.3) Dyslipidaemias 1030 (23.5) 1025 (23.3) 799 (16.9) 809 (17.1) Nitrate use 105 (2.4) 79 (1.8) 181 (3.8) 165 (3.5) (c) Stephen Senn 2019 16
  17. 17. Baseline Deviances Model Term Demographic Characteristic Sub-study (DF=1) Treatment given Sub- study (DF=2) Treatment (DF=2) Use of low-dose aspirin 23.57 0.13 13.40 History of vascular disease 70.14 5.23 47.41 Cerebro- vascular disease 13.54 0.14 7.75 Dyslipidaemias 117.98 0.17 54.72 Nitrate use 39.83 4.62 29.17 (c) Stephen Senn 2019 17
  18. 18. Baseline Chi-square P-values Model Term Demographic Characteristic Sub-study (DF=1) Treatment given Sub- study (DF=2) Treatment (DF=2) Use of low-dose aspirin < 0.0001 0.94 0.0012 History of vascular disease < 0.0001 0.07 <0.0001 Cerebro- vascular disease 0.0002 0.93 0.0208 Dyslipidaemias <0.0001 0.92 <0.0001 Nitrate use < 0.0001 0.10 <0.0001 (c) Stephen Senn 2019 18
  19. 19. Outcome Variables Lumiracoxib only Sub-Study 1 Sub Study 2 Outcome Variables Lumiracoxib n = 4376 Lumiracoxib n = 4741 Total of discontinuations 1751 (40.01) 1719 (36.26) CV events 33 (0.75) 52 (1.10) At least one AE 699 (15.97) 710 (14.98) Any GI 1855 (42.39) 1785 (37.65) Dyspepsia 1230 (28.11) 1037 (21.87) (c) Stephen Senn 2019 19
  20. 20. Deviances and P-Values Lumiracoxib only fitting Sub-study Statistic Outcome Variables Deviance P-Value Total of discontinuations 37.43 < 0.0001 CV events 0.92 0.33 At least one AE 0.005 0.94 Any GI 0.004 0.95 Dyspepsia 16.85 < 0.0001 (c) Stephen Senn 2019 20
  21. 21. Lessons from TARGET • If you want to use historical controls you will have to work very hard • You need at least two components of variation in your model • Between centre • Between trial • And possibly a third • Between eras • What seems like a lot of information may not be much (c) Stephen Senn 2019 21
  22. 22. Implications for historical controls Variation between studies puts a severe limit on what can be learned using historical controls. The fact that you have lots of patients in your current one-armed study, the fact that you had masses in all previous studies and even the fact that there were many previous studies cannot compensate for the fact that there is only one current study. The most efficient way to deal with between-study variation is nearly always to have concurrent controls. (c) Stephen Senn 2019 22 2 2 22 22 2 , historical studies patients per historical study patients in current study γ between study variance σ between patient variance lim c h c h c k n k n n n nk                    Control ‘group’ variance Experimental group variance
  23. 23. (c) Stephen Senn 2019 23 Lord’s Paradox Lord, F.M. (1967) “ A paradox in the interpretation of group comparisons”, Psychological Bulletin, 68, 304- 305. “A large university is interested in investigating the effects on the students of the diet provided in the university dining halls….Various types of data are gathered. In particular the weight of each student at the time of his arrival in September and his weight in the following June are recorded” We shall consider this in the Wainer and Brown version (also considered by Pearl & McKenzie) in which there are two halls each assigned a different one of two diets being compared.
  24. 24. (c) Stephen Senn 2019 24 Two Statisticians Statistician One (Say John) • Calculates difference in weight (outcome-baseline) for each hall • No significant difference between diets as regards this ‘change score’ • Concludes no evidence of difference between diets Statistician Two (Say Jane) • Adjusts for initial weight as a covariate • Finds significant diet effect on adjusted weight • Concludes there is a difference between diets
  25. 25. (c) Stephen Senn 2019 25
  26. 26. (c) Stephen Senn 2019 26 John’s analysis: comparing change-scores)
  27. 27. (c) Stephen Senn 2019 27 Jane’s analysis: Comparing covariate adjusted scores
  28. 28. Pearl & Mackenzie, 2018 (c) Stephen Senn 2019 28 D (Diet) WF W1 “However, for statisticians who are trained in ‘conventional’ (i.e. model-blind) methodology and avoid using causal lenses, it is deeply paradoxical “ The Book of Why p217 “In this diagram, W1, is a confounder of D and WF and not a mediator. Therefore, the second statistician would be unambiguously right here.” The Book of Why p216 NB This diagram adapted from theirs, which covers change rather than final weight.
  29. 29. Start with the randomised equivalent • We suppose that the diets had been randomised to the two halls • Le us suppose there are 100 students per hall • Generate some data • See what Genstat® says about analysis • Note that ( as we have seen) it is a particular feature of Genstat® that it does not have to have outcome data to do this • Given the block and treatment structure alone it will give us a skeleton ANOVA • We start by ignoring the covariate (c) Stephen Senn 2019 29
  30. 30. Skeleton ANOVA (c) Stephen Senn 2019 30 BLOCKSTRUCTURE Hall/Student TREATMENTSTRUCTURE Diet ANOVA Analysis of variance Source of variation d.f. Hall stratum Diet 1 Hall.Student stratum 198 Total 199 Code Output Gentstat® points out the obvious (which, however, has been universally overlooked). There are no degrees of freedom to estimate the variability of the Diet estimate which appears in the Hall and not the Hall.Student stratum
  31. 31. Adding initial weight as a covariate (c) Stephen Senn 2019 31 BLOCKSTRUCTURE Hall/Student TREATMENTSTRUCTURE Diet COVARIATE Base ANOVA Analysis of variance (adjusted for covariate) Covariate: Base Source of variation d.f. Hall stratum Diet 0 Covariate 1 Residual 0 Hall.Student stratum Covariate 1 Residual 197 Total 199 Code Output Again Gentstat® points out the obvious (which, however, has been universally overlooked). There are no degrees of freedom to estimate the treatment effect because the single degree of freedom is needed to estimate the between-hall slope. Conclusion: The Book of Why is far from being unambiguously right. It is only right if the strong but untestable assumption can be made that the between-hall regression is the same as the within-hall regression
  32. 32. A warning for epidemiology (c) Stephen Senn 2019 32 Things that are a problem for controlled clinical trials are very rarely less of a problem so for observational analysis. Propensity score, Mendelian randomisation, causal analysis blah, blah, blah are all very well but if you aren’t thinking about components of variation you should be.
  33. 33. Conclusions • Local control is valuable • Design matters • Components of variation matter • The Rothamsted approach brings insight • Causal analysis needs to be developed further to include components of variation • Just because you are rich in data does not mean you are rich in information • Be sceptical about “big data” (c) Stephen Senn 2019 33
  34. 34. Finally, I leave you with this thought (c) Stephen Senn 2019 34 A big data-analyst is an expert at producing misleading conclusions from huge datasets. It is much more efficient to use a statistician, who can do the same with small ones.
  35. 35. References 35 1. Nelder JA. The analysis of randomised experiments with orthogonal block structure I. Block structure and the null analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:147-62. 2. Nelder JA. The analysis of randomised experiments with orthogonal block structure II. Treatment structure and the general analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:163-78. 3. Lord FM. A paradox in the interpretation of group comparisons. Psychological Bulletin. 1967;66:304-5. 4. Holland PW, Rubin DB. On Lord's Paradox. In: Wainer H, Messick S, editors. Principals of Modern Psychological Measurement. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983. 5. Liang KY, Zeger SL. Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhya-the Indian Journal of Statistics Series B. 2000;62:134-48. 6. Wainer H, Brown LM. Two statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data. American Statistician. 2004;58(2):117-23. 7. Senn SJ. Change from baseline and analysis of covariance revisited. Statistics in Medicine. 2006;25(24):4334–44. 8. Senn SJ, Graf E, Caputo A. Stratification for the propensity score compared with linear regression techniques to assess the effect of treatment or exposure. Statistics in Medicine. 2007;26(30):5529-44. 9. Van Breukelen GJ. ANCOVA versus change from baseline had more power in randomized studies and more bias in nonrandomized studies. Journal of clinical epidemiology. 2006;59(9):920-5. 10. Pearl J, Mackenzie D. The Book of Why: Basic Books; 2018.
  36. 36. Blogpost with the first part of the talk (c) Stephen Senn 2019 36 To infinity and beyond: how big are your data, really?