OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM
An Institute of Medicine Workshop sponsored by the Patient-Centered Outcomes Research Institute
April 25-26, 2013
The National Academies
2101 Constitution Avenue, NW
Washington, DC 20418
OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM

Table of Contents
SECTION 1: WORKSHOP FRAMING MATERIALS
• Agenda
• Planning Committee Roster
• Participant List
SECTION 2: ENGAGING THE ISSUE OF BIAS
• Hernan, Miguel A. and Hernandez-Diaz, Sonia. Beyond the intention-to-treat in comparative
effectiveness research. Clinical Trials. 9:48–55. 2012.
• Prasad, Vinay and Jena, Anupam. Prespecified falsification end points: Can they validate true
observational associations? JAMA. 309(3). 2013.
• Walker, Alexander A. Orthogonal predictions: Follow-up questions for suggestive data.
Pharmacoepidemiology and Drug Safety. 19: 529–532. 2010.
• Ryan, PB, et al. Empirical assessment of methods for risk identification in healthcare data:
results from the experiments of the Observational Medical Outcomes Partnership. Statistics
in Medicine. 2012.
• Lorch, SA, et al. The differential impact of delivery hospital on the outcomes of premature
infants. Pediatrics. 130(2). 2012.
• Small, Dylan S. and Rosenbaum, Paul R. War and wages: The strength of instrumental
variables and their sensitivity to unobserved biases. Journal of the American Statistical Association.
103(483). 2008.
• Brookhart, MA, et al. Comparative Mortality Risk of Anemia Management Practices in
Incident Hemodialysis Patients. JAMA. 303(9). 2010.
• Cornfield, J. Principles of Research. Statistics in Medicine. 31:2760-2768. 2012.
SECTION 3: GENERALIZING RCT RESULTS TO BROADER POPULATIONS
• Kaizar, Eloise E. Estimating treatment effect via simple cross design synthesis. Statistics in
Medicine. 30:2986–3009. 2011.
• Go, AS, et al. Anticoagulation therapy for stroke prevention in atrial fibrillation: How well
do randomized trials translate into clinical practice? JAMA. 290(20). 2003.
• Hernan, MA, et al. Observational studies analyzed like randomized experiments: An
application to postmenopausal hormone therapy and Coronary Heart Disease. Epidemiology.
19(6). 2008.
• Weintraub, WS, et al. Comparative effectiveness of revascularization strategies. The New
England Journal of Medicine. 366(16). 2012.
SECTION 4: DETECTING TREATMENT EFFECT HETEROGENEITY
• Hlatky, MA, et al. Coronary artery bypass surgery compared with percutaneous coronary
interventions for multivessel disease: A collaborative analysis of individual patient data from
ten randomised trials. The Lancet. 373: 1190–97. 2009.
• Kent, DM, et al. Assessing and reporting heterogeneity in treatment effects in clinical trials:
A proposal. Trials. 11:85. 2010.
• Kent, David M. and Hayward, Rodney A. Limitations of applying summary results of clinical
trials to individual patients: The need for risk stratification. JAMA. 298(10):1209-1212. 2007.
• Basu, A, et al. Heterogeneity in Action: The Role of Passive Personalization in Comparative
Effectiveness Research. 2012
• Basu, Anirban. Estimating Person-centered Treatment (PeT) Effects using Instrumental
Variables: An application to evaluating prostate cancer treatments. 2013.
SECTION 5: PREDICTING INDIVIDUAL RESPONSES
• Byar, David P. Why Databases should not replace Randomized Clinical Trials. Biometrics.
36(2): 337-342. 1980.
• Lee, KL, et al. Clinical judgment and statistics. Lessons from a simulated randomized trial in
coronary artery disease. Circulation. 61:508-515. 1980.
• Pencina, Michael J. and D’Agostino, Ralph B. Thoroughly modern risk prediction? Science
Translational Medicine. 4(131). 2012.
• Tatonetti, NP, et al. Data-driven prediction of drug effects and interactions. Science
Translational Medicine. 4(125). 2012.
SECTION 6: ORGANIZATIONAL BACKGROUND
• IOM Roundtable on Value & Science-Driven Health Care (VSRT)
1. VSRT Background Information and Roster
2. VSRT Charter and Vision
3. Clinical Effectiveness Research Innovation Collaborative Background Information
• Patient-Centered Outcomes Research Institute (PCORI)
1. National Priorities for Research and Research Agenda
SECTION 7: BIOGRAPHIES AND MEETING LOGISTICS
• Planning Committee Biographies
• Speaker Biographies
• Location, Hotel, and Travel
----------------------------------
Workshop Framing Materials
OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM

An Institute of Medicine Workshop
Sponsored by the Patient-Centered Outcomes Research Institute

A LEARNING HEALTH SYSTEM ACTIVITY
IOM ROUNDTABLE ON VALUE & SCIENCE-DRIVEN HEALTH CARE
APRIL 25-26, 2013
THE NATIONAL ACADEMY OF SCIENCES
2101 CONSTITUTION AVENUE, NW
WASHINGTON, DC
Day 1: Thursday, April 25th
8:00 am Coffee and light breakfast available
8:30 am Welcome, introductions and overview
Welcome, framing of the meeting and agenda overview
Welcome from the IOM
Michael McGinnis, Institute of Medicine
Opening remarks and meeting overview
Joe Selby, Patient-Centered Outcomes Research Institute
Ralph Horwitz, GlaxoSmithKline
Meeting objectives
1. Explore the role of observational studies (OS) in the generation of evidence to guide clinical and
health policy decisions, with a focus on individual patient care, in a learning health system;
2. Consider concepts of OS design and analysis, emerging statistical methods, the use of OS to
supplement evidence from experimental methods, the identification of treatment heterogeneity, and
the provision of effectiveness estimates tailored for individual patients;
3. Engage colleagues from disciplines typically underrepresented in clinical evidence discussions;
4. Identify strategies for accelerating progress in the appropriate use of OS for evidence generation.
9:00 am Workshop stage-setting
• Session format
o Workshop overview and stage-setting
Steve Goodman, Stanford University
Q&A and open discussion
• Session questions:
o How do OS contribute to building valid evidence to support effective
decision making by patients and clinicians? When are their findings
useful, when are they not?
o What are the major challenges (study design, methodological, data
collection/management/analysis, cultural etc.) facing the field in the use
of OS data for decision making? Please include consideration of the
following issues: bias, methodological standards, publishing requirements.
o What can workshop participants expect from the following sessions?
9:45 am Engaging the issue of bias
Moderator: Michael Lauer, National Heart Lung and Blood Institute
• Session format
o Introduction to issue
Sebastian Schneeweiss, Harvard University
o Presentations:
- Instrumental variables and their sensitivity to unobserved biases
Dylan Small, University of Pennsylvania
- An empirical approach to measuring and calibrating for error in
observational analyses
Patrick Ryan, Johnson & Johnson
o Respondents and panel discussion:
- John Wong, Tufts University
- Joel Greenhouse, Carnegie Mellon University
Q&A and open discussion
• Session questions:
o What are the major bias-related concerns with the use of observational
study methods? What are the sources of bias?
o How much of these concerns relates to methods and how much to the
quality and availability of suitable data? What barriers have these concerns
created for the use of the results of observational studies to drive
decision-making?
o What are the most promising approaches to reduction of bias through
the use of statistical methods? Through study design (e.g., dealing with
issues of multiplicity)?
o What are the circumstances under which administrative (claims) data can
be used to assess treatment benefits? What data are needed from EHRs
to strengthen the value of administrative data?
o What methods are best to adjust for the changes in treatment and clinical
conditions among patients followed longitudinally?
o What are the implications of these promising approaches for the use of
observational study methods moving forward?
11:30 am Lunch
Participants will be asked to identify, at their lunch tables, what they think are the
most critical questions for PCOR in the topics covered by the workshop. These
topics will then be circulated to the moderators of the subsequent sessions.
12:30 pm Generalizing RCT results to broader populations
Moderator: Harold Sox, Dartmouth University
• Session format
o Introduction to issue
Robert Califf, Duke
o Presentations:
- Generalizing the right question
Miguel Hernan, Harvard University
- Using observational studies to determine RCT generalizability
Eloise Kaizar, Ohio State
o Respondents and panel discussion:
- William Weintraub, Christiana Medical Center
- Constantine Frangakis, Johns Hopkins University
Q&A and open discussion
• Session questions:
o What are the most cogent methodological and clinical considerations in
using observational study methods to test the external validity of findings
from RCTs?
o How do data collection, management, and analysis approaches impact
generalizability?
o What are the generalizability questions of greatest interest? Or, where
does the greatest doubt arise? (Age, concomitant illness, concomitant
treatment) What examples represent well established differences?
o What statistical methods are needed to generalize RCT results?
o Are the standards for causal inference from OS different when prior
RCTs have been performed? How does statistical methodology vary in
this case?
o What are the implications when treatment results for patients
not included in the RCT differ from the overall results reported in the
original RCT?
o What makes an observed difference in outcome credible? Finding the
RCT-shown effect on the narrower population? Replication in >1
environment? Confidence interval of the result? Size of the effect in the
RCT?
o Can subset analyses in the RCT, even if underpowered, be used to
support or rebut the OS finding?
2:15 pm Break
2:30 pm Detecting treatment-effect heterogeneity
Moderator: Richard Platt, Harvard Pilgrim Health Care Institute
• Session format
o Introduction to issue
David Kent, Tufts University
o Presentations:
- Comparative effectiveness of coronary artery bypass grafting and
percutaneous coronary intervention
Mark Hlatky, Stanford University
- Identification of effect heterogeneity using instrumental variables
Anirban Basu, University of Washington
o Respondents and panel discussion:
- Mary Charlson, Cornell University
- Mark Cullen, Stanford University
Q&A and open discussion
• Session questions:
o What is the potential for OS in assessing treatment response
heterogeneity and individual patient decision-making?
o What clinical and other data can be collected routinely to realize this
potential?
o How can longitudinal information on change in treatment categories and
clinical condition be used to assess variation in treatment response and
individual patient decision-making?
- What are the statistical methods for time-varying changes in
treatment (including co-therapies) and clinical condition?
o What are the best methods to form distinctive patient subgroups in
which to examine for heterogeneity of treatment response?
- What data elements are necessary to define these distinctive
patient subgroups?
o What are the best methods to assess heterogeneity in multi-dimensional
outcomes?
o How could further implementation of best practices in data collection,
management, and analysis impact the assessment of treatment response heterogeneity?
o What is needed in order for information about treatment response
heterogeneity to be validated and used in practice?
4:15 pm Summary and preview of next day
4:45 pm Reception
5:45 pm Adjourn
*********************************************
Day 2: Friday, April 26th
8:00 am Coffee and light breakfast available
8:30 am Welcome, brief agenda overview, summary of previous day
Welcome, framing of the meeting and agenda overview
9:00 am Predicting individual responses
Moderator: Ralph Horwitz, GSK
• Session format
o Introduction to issue
Burton Singer, University of Florida
o Presentations:
- Data-driven prediction models
Nicholas Tatonetti, Columbia University
- Individual prediction
Michael Kattan, Cleveland Clinic
o Respondents and panel discussion:
- Peter Bach, Sloan Kettering
- Mitchell Gail, National Cancer Institute
Q&A and open discussion
• Session questions:
o How can patient-level observational data be used to create predictive
models of treatment response in individual patients? What statistical
methodologies are needed?
o How can predictive analytic methods be used to study the interactions of
treatment with multiple patient characteristics?
o How should the clinical history (longitudinal information) for a given
patient be utilized in the creation of prediction rules for responses of that
patient to one or more candidate treatment regimens?
o What are effective methodologies for producing prediction rules to guide
the management of an individual patient based on their comparability to
results of RCTs, OS, and archived patient records?
o How can we blend predictive models, which can predict impact of
treatment choices, and causal modeling, that compare predictions under
different treatments?
10:45 am Break
11:00 am Conclusions and strategies going forward
Panel members will be charged with highlighting very specific next steps laid out in
the course of workshop presentations and discussions and/or suggesting some of
their own.
• Panel:
o Rob Califf, Duke University
o Cynthia Mulrow, University of Texas
o Jean Slutsky, Agency for Healthcare Research and Quality
o Steve Goodman, Stanford University
• Session questions:
o What are the major themes and conclusions from the workshop’s
presentations and discussions?
o How can these themes be translated into actionable strategies with
designated stakeholders?
o What are the critical next steps in terms of advancing analytic methods?
o What are the critical next steps in developing databases that will generate
evidence to guide clinical decision making?
o What are critical next steps in disseminating information on new methods
to increase their appropriate use?
12:15 pm Summary and next steps
Comments from the Chairs
Joe Selby, Patient-Centered Outcomes Research Institute
Ralph Horwitz, GlaxoSmithKline
Comments and thanks from the IOM
Michael McGinnis, Institute of Medicine
12:45 pm Adjourn
*******************************************
Planning Committee
Co–Chairs
Ralph Horwitz, GlaxoSmithKline
Joe Selby, Patient-Centered Outcomes Research Institute
Members
Anirban Basu, University of Washington
Troy Brennan, CVS/Caremark
Louis Jacques, Centers for Medicare & Medicaid Services
Steve Goodman, Stanford University
Jerry Kassirer, Tufts University
Michael Lauer, National Heart Lung and Blood Institute
David Madigan, Columbia University
Sharon-Lise Normand, Harvard University
Richard Platt, Harvard Pilgrim Health Care Institute
Robert Temple, Food and Drug Administration
Burton Singer, University of Florida
Jean Slutsky, Agency for Healthcare Research and Quality
Staff officer:
Claudia Grossmann
cgrossmann@nas.edu
202.334.3867
OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM
Workshop Planning Committee
Co–Chairs
Ralph I. Horwitz, MD
Senior Vice President, Clinical Sciences Evaluation
GlaxoSmithKline
Joe V. Selby, MD, MPH
Executive Director
PCORI
Members
Anirban Basu, MS, PhD
Associate Professor and Director
Health Economics and Outcomes Methodology
University of Washington
Troyen A. Brennan, MD, JD, MPH
Executive Vice President and Chief Medical Officer
CVS Caremark
Steven N. Goodman, MD, PhD
Associate Dean for Clinical & Translational Research
Stanford University School of Medicine
Louis B. Jacques, MD
Director
Coverage and Analysis Group
Centers for Medicare & Medicaid Services
Jerome P. Kassirer, MD
Distinguished Professor
Tufts University School of Medicine
Michael S. Lauer, MD, FACC, FAHA
Director
Division of Prevention and Population Sciences
National Heart, Lung, and Blood Institute
David Madigan, PhD
Chair of Statistics
Columbia University
Sharon-Lise T. Normand, PhD, MSc
Professor
Department of Biostatistics and Health Care Policy
Harvard Medical School
Richard Platt, MD, MS
Chair, Ambulatory Care and Prevention
Chair, Population Medicine
Harvard University
Burton H. Singer, PhD, MS
Professor
Emerging Pathogens Institute
University of Florida
Jean Slutsky, PA, MS
Director
Center for Outcomes and Evidence
Agency for Healthcare Research and Quality
Robert Temple, MD
Deputy Director for Clinical Science
Center for Drug Evaluation and Research
Food and Drug Administration
Current as of 12pm, April 24
OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM
April 25-26, 2013
Workshop Participants
Jill Abell, PhD, MPH
Senior Director, Clinical Effectiveness
and Safety
GlaxoSmithKline
Joseph Alper
Writer and Technology Analyst
LSN Consulting
Naomi Aronson
Executive Director
Blue Cross and Blue Shield Association
Peter Bach, MD, MAPP
Attending Physician
Department of Epidemiology & Biostatistics
Memorial Sloan-Kettering Cancer Center
Anirban Basu, MS, PhD
Associate Professor and Director
Program in Health Economics
and Outcomes Methodology
University of Washington
Lawrence Becker
Director, Benefits
Xerox Corporation
Marc L. Berger, MD
Vice President, Real World Data and Analytics
Pfizer Inc.
Robert M. Califf, MD
Vice Chancellor for Clinical Research
Duke University Medical Center
Mary E. Charlson, MD
Chief, Clinical Epidemiology and
Evaluative Sciences Research
Weill Cornell Medical College
Jennifer B. Christian, PharmD, MPH, PhD
Senior Director, Clinical Effectiveness
and Safety
GlaxoSmithKline
Michael L. Cohen, PhD
Senior Program Officer
Committee on National Statistics
Mark R. Cullen, MD
Professor of Medicine
Stanford School of Medicine
Steven R. Cummings, MD, FACP
Professor Emeritus, Department of Medicine
University of California, San Francisco
Robert W. Dubois, MD, PhD
Chief Science Officer
National Pharmaceutical Council
Rachael L. Fleurence, PhD
Acting Director, Accelerating PCOR
Methods Program
PCORI
Dean Follmann, PhD
Branch Chief-Associate Director for Biostatistics
National Institutes of Health
Constantine Frangakis, PhD
Professor, Department of Biostatistics
Johns Hopkins Bloomberg School
of Public Health
Mitchell H. Gail, MD, PhD
Senior Investigator
National Cancer Institute
Kathleen R. Gans-Brangs, PhD
Senior Director, Medical Affairs
AstraZeneca
Steven N. Goodman, MD, PhD
Associate Dean for Clinical and
Translational Research
Stanford University School of Medicine
Sheldon Greenfield, MD
Executive Co-Director, Health Policy
Research Institute
University of California, Irvine
Joel B. Greenhouse, PhD
Professor of Statistics
Carnegie Mellon University
Sean Hennessy, PharmD, PhD
Associate Professor of Epidemiology
University of Pennsylvania
Miguel Hernan, MD, DrPH, ScM, MPH
Professor of Epidemiology
Harvard University
Mark A. Hlatky, MD
Professor of Health Research & Policy,
Professor of Medicine
Stanford University
Ralph I. Horwitz, M.D.
Senior Vice President, Clinical
Science Evaluation
GlaxoSmithKline
Gail Hunt
President and CEO
National Alliance for Caregiving
Robert Jesse, MD, PhD
Principal Deputy Under Secretary for Health
Department of Veterans Affairs
Eloise E. Kaizar, PhD
Associate Professor
Department of Statistics
The Ohio State University
Jerome P. Kassirer, MD
Distinguished Professor
Tufts University School of Medicine
Michael Kattan, PhD
Quantitative Health Sciences Department Chair
Cleveland Clinic
David M. Kent, MD, MSc
Director, Clinical and Translational
Science Program
Tufts University Sackler School
of Graduate Biomedical Sciences
Michael S. Lauer, MD, FACC, FAHA
Director, Division of Prevention
and Population Sciences
National Heart, Lung, and Blood Institute
J. Michael McGinnis, MD, MPP, MA
Senior Scholar
Institute of Medicine
David O. Meltzer, PhD
Associate Professor
University of Chicago
Nancy E. Miller, PhD
Senior Science Policy Analyst
Office of Science Policy
National Institutes of Health
Sally Morton, PhD
Professor and Chair, Department of Biostatistics
Graduate School of Public Health
University of Pittsburgh
Cynthia D. Mulrow, MD, MSc
Senior Deputy Editor
Annals of Internal Medicine
Robin Newhouse
Chair and Professor
University of Maryland School of Nursing
Perry D. Nisen, MD, PhD
SVP, Science and Innovation
GlaxoSmithKline
Michael Pencina, PhD
Associate Professor
Boston University
Richard Platt, MD, MS
Chair, Ambulatory Care and Prevention
Chair, Population Medicine
Harvard University
James Robins, MD
Mitchell L. and Robin LaFoley Dong
Professor of Epidemiology
Harvard University
Patrick Ryan, PhD
Head of Epidemiology Analytics
Janssen Research and Development
Nancy Santanello, MD, MS
Vice President, Epidemiology
Merck
Richard L. Schilsky, MD, FASCO
Chief Medical Officer
American Society of Clinical Oncology
Sebastian Schneeweiss, MD
Associate Professor, Epidemiology
Division of Pharmacoepidemiology
and Pharmacoeconomics
Brigham and Women's Hospital
Michelle K. Schwalbe, PhD
Program Officer
Board on Mathematical Sciences
and Their Applications
National Research Council
Jodi Segal, MD, MPH
Director, Pharmacoepidemiology Program
The Johns Hopkins Medical Institutions
Joe V. Selby, MD, MPH
Executive Director
PCORI
Burton H. Singer, PhD, MS
Professor, Emerging Pathogens Institute
University of Florida
Jean Slutsky, PA, MS
Director, Center for Outcomes and Evidence
Agency for Healthcare Research and Quality
Dylan Small, PhD
Associate Professor of Statistics
University of Pennsylvania
Harold C. Sox, MD
Professor of Medicine (emeritus, active)
The Dartmouth Institute for Health Policy
and Clinical Practice
Dartmouth Geisel School of Medicine
Elizabeth A. Stuart
Associate Professor, Department of Biostatistics
Johns Hopkins Bloomberg School
of Public Health
Nicholas Tatonetti, PhD
Assistant Professor of Biomedical Informatics
Columbia University
Robert Temple, MD
Deputy Center Director for Clinical Science
Food and Drug Administration
Scott T. Weidman, PhD
Director, Board on Mathematical Sciences
and their Applications
National Research Council
William S. Weintraub, MD, FACC
John H. Ammon Chair of Cardiology
Christiana Care Health Services
Harlan Weisman
Managing Director
And-One Consulting, LLC
Ashley E. Wivel, MD, MSc
Senior Director in Clinical Effectiveness
and Safety
GlaxoSmithKline
John B. Wong, MD
Professor of Medicine
Tufts University Sackler School
of Graduate Biomedical Sciences
IOM Staff
Claudia Grossmann, PhD
Senior Program Officer
Diedtra Henderson
Program Officer
Elizabeth Johnston
Program Assistant
Valerie Rohrbach
Senior Program Assistant
Julia Sanders
Senior Program Assistant
Robert Saunders, PhD
Senior Program Officer
Barret Zimmermann
Program Assistant
----------------------------------
Engaging the Issue of Bias
CLINICAL TRIALS
ARTICLE
Clinical Trials 2012; 9: 48–55
Beyond the intention-to-treat in comparative
effectiveness research
Miguel A. Hernán (a,b) and Sonia Hernández-Díaz (a)
Background The intention-to-treat comparison is the primary, if not the only,
analytic approach of many randomized clinical trials.
Purpose To review the shortcomings of intention-to-treat analyses, and of ‘as
treated’ and ‘per protocol’ analyses as commonly implemented, with an emphasis
on problems that are especially relevant for comparative effectiveness research.
Methods and Results In placebo-controlled randomized clinical trials, intention-
to-treat analyses underestimate the treatment effect and are therefore nonconser-
vative for both safety trials and noninferiority trials. In randomized clinical trials with
an active comparator, intention-to-treat estimates can overestimate a treatment’s
effect in the presence of differential adherence. In either case, there is no guarantee
that an intention-to-treat analysis estimates the clinical effectiveness of treatment.
Inverse probability weighting, g-estimation, and instrumental variable estimation
can reduce the bias introduced by nonadherence and loss to follow-up in ‘as treated’
and ‘per protocol’ analyses.
Limitations These analyses require untestable assumptions, a dose-response model,
and time-varying data on confounders and adherence.
Conclusions We recommend that all randomized clinical trials with substantial lack
of adherence or loss to follow-up are analyzed using different methods. These
include an intention-to-treat analysis to estimate the effect of assigned treatment
and ‘as treated’ and ‘per protocol’ analyses to estimate the effect of treatment after
appropriate adjustment via inverse probability weighting or g-estimation. Clinical
Trials 2012; 9: 48–55. http://ctj.sagepub.com
Introduction
Randomized clinical trials (RCTs) are widely viewed
as a key tool for comparative effectiveness research
[1], and the intention-to-treat (ITT) comparison has
long been regarded as the preferred analytic
approach for many RCTs [2].
Indeed, the ITT, or ‘as randomized,’ analysis has
two crucial advantages over other common alterna-
tives – for example, an ‘as treated’ analysis. First, in
double-blind RCTs, an ITT comparison provides a
valid statistical test of the hypothesis of null effect of
treatment [3,4]. Second, in placebo-controlled trials,
an ITT comparison is regarded as conservative
because it underestimates the treatment effect
when participants do not fully adhere to their
assigned treatment.
Yet excessive reliance on the ITT approach is
problematic, as has been argued by others before us
[5]. In this paper, we review the problems of ITT
comparisons with an emphasis on those that are
especially relevant for comparative effectiveness
research. We also review the shortcomings of ‘as
treated’ and ‘per protocol’ analyses as commonly
implemented in RCTs and recommend the routine
use of analytic approaches that address some of
those shortcomings. Let us start by defining two
types of causal effects that can be estimated in RCTs.
(a) Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; (b) Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
Author for correspondence: Miguel Hernán, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA
E-mail: miguel_hernan@post.harvard.edu
© The Author(s), 2011. Reprints and permissions: http://www.sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/1740774511420743
The effect of assigned treatment
versus the effect of treatment
Consider a double-blind clinical trial in which par-
ticipants are randomly assigned to either active
treatment (Z = 1) or placebo (Z = 0) and are then
followed for 5 years or until they die (Y = 1 if they die
within 5 years, Y = 0 otherwise). An ITT analysis
would compare the 5-year risk of death in those
assigned to treatment with the 5-year risk of death in
those assigned to placebo. An ITT comparison
unbiasedly estimates the average causal effect of
treatment assignment Z on the outcome Y. For
brevity, we will refer to this as the effect of assigned
treatment.
Trial participants may not adhere to, or comply
with, the assigned treatment Z. Some of those
assigned to placebo may decide to take treatment,
and some of those assigned to active treatment may
decide not to take it. We use A to refer to the
treatment actually received. Thus, regardless of their
assigned treatment Z, some subjects will take treat-
ment (A ¼ 1) and others will not take it (A ¼ 0). The
use of ITT comparisons is sometimes criticized when
not all trial participants adhere to their assigned
treatment Z, that is, when Z is not equal to A for
every trial participant. For example, consider two
RCTs: in the first trial, half of the participants in
the Z = 1 group decide not to take treatment; in the
second trial, all participants assigned to Z = 1 decide
to take the treatment. An ITT comparison will
correctly estimate the effect of assigned treatment
Z in both trials, but the effects will be different even
if the two trials are otherwise identical. The direction
and magnitude of the effect of assigned treatment
depends on the adherence pattern.
Now suppose that, in each of the two trials with
different adherence, we could estimate the effect
that would have been observed if all participants had
fully adhered to the value of treatment A (1 or 0)
originally assigned to them. We will refer to such
effect as the average causal effect of treatment A on
the outcome Y or, for brevity, the effect of treatment.
The effect of treatment A appears to be an attractive
choice to summarize the findings of RCTs with
substantial nonadherence because it will be the
same in two trials that differ only in their adherence
pattern.
However, estimating the magnitude of the effect
of treatment A without bias requires assumptions
grounded on expert knowledge (see below). No
matter how sophisticated the statistical analysis,
the estimate of the effect of A will be biased if one
makes incorrect assumptions.
The effect of assigned treatment
may be misleading
An ITT comparison is simple and therefore very
attractive [4]. It bypasses the need for assumptions
regarding adherence and dose-response by focusing
on estimating the effect of assigned treatment Z
rather than the effect of treatment A. However, there
is a price to pay for this simplicity, as reviewed in this
section.
We start by considering placebo-controlled
double-blind RCTs. It is well known that if treatment
A has a null effect on the outcome, then both the
effect of assigned treatment Z and the effect of treat-
ment A will be null. This is a key advantage of the ITT
analysis: it correctly estimates the effect of treatment
A under the null, regardless of the adherence pattern.
It is also well known that if treatment A has a non-
null effect (that is, either increases or decreases the
risk of the outcome) and some participants do not
adhere to their assigned treatment, then the effect of
assigned treatment Z will be closer to the null than
the actual effect of treatment A [3]. This bias toward
the null is due to contamination of the treatment
groups: some subjects assigned to treatment (Z = 1)
may not take it (A = 0) whereas some subjects
assigned to placebo (Z = 0) may find a way to take
treatment (A = 1). As long as the proportion of
patients who end up taking treatment (A = 1) is
greater in the group assigned to treatment (Z = 1)
than in the group assigned to placebo (Z = 0), the
effect of assigned treatment Z will be in between the
effect of treatment A and the null value.
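To make the direction of this bias concrete, the following sketch (not part of the original article; all adherence and risk figures are illustrative assumptions) simulates a placebo-controlled trial with crossover in both arms and compares the ITT contrast with the true effect of treatment A.

```python
# Hypothetical sketch: the ITT contrast shrinks toward the null under nonadherence.
# All parameter values are assumptions for illustration, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                                   # participants per arm
risk_untreated, risk_treated = 0.10, 0.05     # assumed true 5-year risks

# Assumed adherence: 20% of the treatment arm never takes treatment,
# 10% of the placebo arm finds a way to take it.
z = np.repeat([1, 0], n)                      # assigned treatment
a = np.where(z == 1,
             rng.random(2 * n) > 0.20,        # treatment arm: takes it unless a nonadherer
             rng.random(2 * n) < 0.10).astype(int)  # placebo arm: crossover
y = rng.random(2 * n) < np.where(a == 1, risk_treated, risk_untreated)

itt_rd = y[z == 1].mean() - y[z == 0].mean()  # effect of assigned treatment Z
true_rd = risk_treated - risk_untreated       # effect of treatment A
print(f"true risk difference (A): {true_rd:+.3f}")
print(f"ITT risk difference (Z):  {itt_rd:+.3f}")  # closer to 0 than true_rd
```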
The practical effect of this bias varies depending
on the goal of the trial. Some placebo-controlled
RCTs are designed to quantify a treatment’s benefi-
cial effects – for example, a trial to determine
whether sildenafil reduces the risk of erectile dys-
function. An ITT analysis of these trials is said to be
‘conservative’ because the effect of assigned treat-
ment Z is biased toward the null. That is, if an ITT
analysis finds a beneficial effect for treatment assign-
ment Z, then the true beneficial effect of treatment A
must be even greater. The makers of treatment A
have a great incentive to design a high-quality study
with high levels of adherence. Otherwise, a small
beneficial effect of treatment might be missed by the
ITT analysis.
Other trials are designed to quantify a treatment’s
harmful effects – for example, a trial to determine
whether sildenafil increases the risk of cardiovascular
disease. An ITT analysis of these trials is anticonservative
precisely because the effect of assigned
treatment Z is biased toward the null. That is, if an
ITT analysis fails to find a toxic effect, there is no
guarantee that treatment A is safe. A trial designed to
quantify harm and whose protocol foresees only an
ITT analysis could be referred to as a ‘randomized
cynical trial.’
Now let us consider double-blind RCTs that com-
pare two active treatments. These trials are often
designed to show that a new treatment (A = 1) is not
inferior to a reference treatment (A = 0) in terms of
either benefits or harms. An example of a noninfer-
iority trial would be one that compares the reduction
in blood glucose between a new inhaled insulin and
regular injectable insulin. The protocol of the trial
would specify a noninferiority margin, that is, the
maximum average difference in blood glucose that is
considered equivalent (e.g., 10 mg/dL). Using an ITT
comparison, the new insulin (A = 1) will be declared
not inferior to classical insulin (A = 0) if the average
reduction in blood glucose in the group assigned to
the new treatment (Z = 1) is within 10 mg/dL of the
average reduction in blood glucose in the group
assigned to the reference treatment (Z = 0) plus/
minus random variability. Such ITT analysis may be
misleading in the presence of imperfect adherence.
To see this, consider the following scenario.
Scenario 1
The new treatment A = 1 is actually inferior to the
reference treatment A = 0, for example, the average
reduction in blood glucose is 10 mg/dL under treatment
A = 1 and 22 mg/dL under treatment A = 0. The
type and magnitude of adherence is equal in the two
groups, for example 30% of subjects in each group
decided not to take insulin. As a result, the average
reduction is, say, 7 mg/dL in the group assigned to
the new treatment (Z = 1) and 15 mg/dL in the group
assigned to the reference treatment (Z = 0). An ITT
analysis, which is biased toward the null in this
scenario, may incorrectly suggest that the new
treatment A = 1 is not inferior to the reference
treatment A = 0.
Other double-blind RCTs with an active compar-
ator are designed to show that a new treatment
(A = 1) is superior to the reference treatment (A = 0)
in terms of either benefits or harms. An example of a
superiority trial would be one that compares the risk
of heart disease between two antiretroviral regimes.
Using an ITT comparison, the new regimen (A = 1)
will be declared superior to the reference regime
(A = 0) if the heart disease risk is lower in the group
assigned to the new regime (Z = 1) than in the group
assigned to the reference regime (Z = 0) plus/minus
random variability. Again, such ITT analysis may be
misleading in the presence of imperfect adherence.
Consider the following scenario.
Scenario 2
The new treatment A = 1 is actually equivalent to the
reference treatment A = 0, for example, the 5-year
risk of heart disease is 3% under either treatment
A = 1 or treatment A = 0, and the risk in the absence
of either treatment is 1%. The type or magnitude of
adherence differs between the two groups, for example,
50% of subjects assigned to the new regime and
10% of those assigned to the reference regime
decided not to take their treatment because of
minor side effects. As a result, the risk is, say, 2% in
the group assigned to the new regime (Z = 1) and
2.8% in the group assigned to the reference regime
(Z = 0). An ITT analysis, which is biased away from
the null in this scenario, may incorrectly suggest that
treatment A = 1 is superior to treatment A = 0.
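The arithmetic behind Scenarios 1 and 2 can be checked directly. The sketch below reproduces the reported arm-level figures under one added assumption, not stated in the paper, that nonadherent participants experience the no-treatment outcome (no glucose reduction in Scenario 1; a 1% risk in Scenario 2).

```python
# Worked arithmetic for Scenarios 1 and 2, assuming nonadherers experience the
# no-treatment outcome (an assumption; the paper only reports the arm-level results).

# Scenario 1: noninferiority trial, 30% nonadherence in both arms.
reduction_new, reduction_ref, no_treatment = 10.0, 22.0, 0.0   # mg/dL
adherence = 0.70
z1 = adherence * reduction_new + (1 - adherence) * no_treatment   # ~7 mg/dL
z0 = adherence * reduction_ref + (1 - adherence) * no_treatment   # ~15 mg/dL (paper rounds to 15)
print(f"Scenario 1: Z=1 arm {z1:.1f} mg/dL, Z=0 arm {z0:.1f} mg/dL, "
      f"ITT difference {z0 - z1:.1f} mg/dL (true difference "
      f"{reduction_ref - reduction_new:.1f} mg/dL)")

# Scenario 2: superiority trial, risk 3% on either treatment, 1% untreated;
# 50% vs 10% nonadherence in the two arms.
risk_treated, risk_untreated = 0.03, 0.01
risk_z1 = 0.50 * risk_treated + 0.50 * risk_untreated   # 2.0%
risk_z0 = 0.90 * risk_treated + 0.10 * risk_untreated   # 2.8%
print(f"Scenario 2: Z=1 arm {risk_z1:.1%}, Z=0 arm {risk_z0:.1%} "
      "(true effect of A is null)")
```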
An ITT analysis of RCTs with an active comparator
may result in effect estimates that are biased toward
(Scenario 1) or away from (Scenario 2) the null. In
other words, the magnitude of the effect of assigned
treatment Z may be greater than or less than the
effect of treatment A. The direction of the bias
depends on the proportion of subjects that do not
adhere to treatment in each group, and on the
reasons for nonadherence.
Yet, a common justification for ITT comparisons is
the following: Adherence is not perfect in clinical
practice. Therefore, clinicians may be more inter-
ested in consistently estimating the effect of
assigned treatment Z, which already incorporates
the impact of nonadherence, than the effect of
treatment A in the absence of nonadherence. That
is, the effect of assigned treatment Z reflects a
treatment’s clinical effectiveness and therefore
should be privileged over the effect of treatment A.
In the next section, we summarize the reasons why
this is not necessarily true.
The effect of assigned treatment
is not the same as the effectiveness
of treatment
Effectiveness is usually defined as ‘how well a treat-
ment works in everyday practice,’ and efficacy as
‘how well a treatment works under perfect adherence
and highly controlled conditions.’ Thus, the effect of
assigned treatment Z in postapproval settings is
often equated with effectiveness, whereas the effect
of treatment Z in preapproval settings (which is close
to the effect of A when adherence is high) is often
equated with efficacy. There is, however, no guaran-
tee that the effect of assigned treatment Z matches
the treatment’s effectiveness in routine medical
practice. A discrepancy may arise for multiple rea-
sons, including differences in patient characteristics,
monitoring, or blinding, as we now briefly review.
The eligibility criteria for participants in RCTs are
shaped by methodologic and ethical considerations.
To maximize adherence to the protocol, many RCTs
exclude individuals with severe disease, comorbid-
ities, or polypharmacy. To minimize risks to vulner-
able populations, many RCTs exclude pregnant
women, children, or institutionalized populations.
As a consequence, the characteristics of participants
in an RCT may be, on average, different from those
of the individuals who will receive the treatment in
clinical practice. If the effect of the treatment under
study varies by those characteristics (e.g., treatment
is more effective for those using certain concomitant
treatments) then the effect of assigned treatment Z
in the trial will differ from the treatment’s effective-
ness in clinical practice.
Patients in RCTs are often more intensely moni-
tored than patients in clinical practice. This greater
intensity of monitoring may lead to earlier detection
of problems (i.e., toxicity, inadequate dosing) in
RCTs compared with clinical practice. Thus, a treat-
ment’s effectiveness may be greater in RCTs because
the earlier detection of problems results in more
timely therapeutic modifications, including modifi-
cations in treatment dosing, switching to less toxic
treatments, or addition of concomitant treatments.
Blinding is a useful approach to prevent bias from
differential ascertainment of the outcome [6]. There
is, however, an inherent contradiction in conduct-
ing a double-blind study while arguing that the goal
of the study is estimating the effectiveness in routine
medical practice. In real life, both patients and
doctors are aware of the assigned treatment. A true
effectiveness measure should incorporate the effects
of assignment awareness (e.g., behavioral changes)
that are eliminated in ITT comparisons of double-
blind RCTs.
Some RCTs, commonly referred to as pragmatic
trials [7–9], are specifically designed to guide decisions
in clinical practice. Compared with highly controlled
trials, pragmatic trials include less selected partici-
pants and are conducted under more realistic condi-
tions, which may result in lower adherence to the
assigned treatment. It is often argued that an ITT
analysis of pragmatic trials is particularly appropriate
to measure the treatment’s effectiveness, and
thus that pragmatic trials are the best design for
comparative effectiveness research. However, this
argument raises at least two concerns.
First, the effect of assigned treatment Z is influ-
enced by the adherence patterns observed in the
trial, regardless of whether the trial is a pragmatic
one. Compared with clinical practice, trial partici-
pants may have a greater adherence because they are
closely monitored (see above), or simply because
they are the selected group who received informed
consent and accepted to participate. Patients outside
the trial may have a greater adherence after they
learn, perhaps based on the trial’s findings, that
treatment is beneficial. Therefore, the effect of
assigned treatment estimated by an ITT analysis
may under- or overestimate the effectiveness of the
treatment.
Second, the effect of assigned treatment Z is
inadequate for patients who are interested in initi-
ating and fully adhering to a treatment A that has
been shown to be efficacious in previous RCTs. In
order to make the best informed decision, these
patients would like to know the effect of treatment A
rather than an effect of assigned treatment Z, which
is contaminated by other patients’ nonadherence
[5]. For example, to decide whether to use certain
contraception method, a couple may want to know
the failure rate if they use the method as indicated,
rather than the failure rate in a population that
included a substantial proportion of nonadherers.
Therefore, the effect of assigned treatment Z may be
an insufficient summary measure of the trial data,
even if it actually measures the treatment’s
effectiveness.
In summary, the effect of assigned treatment Z –
estimated via an ITT comparison – may not be a valid
measure of the effectiveness of treatment A in
clinical practice. And even if it were, effectiveness
is not always the most interesting effect measure.
These considerations, together with the inappropri-
ateness of ITT comparisons for safety and noninfer-
iority trials, make it necessary to expand the
reporting of results from RCTs beyond ITT analyses.
The next section reviews other analytic approaches
for data from RCTs.
Conventional ‘as treated’ and ‘per
protocol’ analyses
Two common attempts to estimate the effect of
treatment A are ‘as treated’ and ‘per protocol’ com-
parisons. Neither is generally valid.
An ‘as treated’ analysis classifies RCT participants
according to the treatment that they took (either
A = 1 or A = 0) rather than according to the treat-
ment that they were assigned to (either Z = 1 or
Z = 0). Then an 'as treated' analysis compares the risk
(or the mean) of the outcome Y among those who
took treatment (A = 1) with that among those who
did not take treatment (A = 0), regardless of their
treatment assignment Z. That is, an ‘as treated’
comparison ignores that the data come from an
RCT and rather treats them as coming from an
observational study. As a result, an ‘as treated’
comparison will be confounded if the reasons that
moved participants to take treatment were associ-
ated with prognostic factors. The causal diagram in
Figure 1 represents the confounding as a noncausal
association between A and Y when there exist
prognostic factors L that also affect the decision to
take treatment A (U is an unmeasured common cause
of L and Y). Confounding arises in an ‘as treated’
analysis when not all prognostic factors L are appro-
priately measured and adjusted for.
A ‘per protocol’ analysis – also referred to as an ‘on
treatment’ analysis – only includes individuals who
adhered to the clinical trial instructions as specified
in the study protocol. The subset of trial participants
included in a ‘per protocol’ analysis, referred to as
the per protocol population, includes only partici-
pants with A equal to Z: those who were assigned to
treatment (Z = 1) and took it (A = 1), and those who
were not assigned to treatment (Z = 0) and did not
take it (A = 0). A 'per protocol' analysis compares the
risk (or the mean) of the outcome Y among those
who were assigned to treatment (Z = 1) with that
among those who were not assigned to treatment
(Z = 0) in the per protocol population. That is, a 'per
protocol’ analysis is an ITT analysis in the per
protocol population. This contrast will be affected
by selection bias [10] if the reasons that moved
participants to adhere to their assigned treatment
were associated with prognostic factors L. The causal
diagram in Figure 2 includes S as an indicator of
selection into the ‘per protocol’ population. The
selection indicator S is fully determined by the values
of Z and A, that is, S ¼ 1 when A ¼ Z, and S ¼ 0
otherwise. The selection bias is a noncausal associ-
ation between Z and Y that arises when the analysis
is restricted to the ‘per protocol’ population (S ¼ 1)
and not all prognostic factors L are appropriately
measured and adjusted for.
As an example of biased ‘as treated’ and ‘per
protocol’ estimates of the effect of treatment A,
consider the following scenario.
Scenario 3
An RCT assigns men to either colonoscopy (Z = 1) or
no colonoscopy (Z = 0). Suppose that undergoing a
colonoscopy (A = 1) does not affect the 10-year risk
of death from colon cancer (Y) compared with not
undergoing a colonoscopy (A = 0), that is, the effect
of treatment A is null. Further suppose that, among
men assigned to Z = 1, those with family history of
colon cancer (L = 1) are more likely to adhere to their
assigned treatment and undergo the colonoscopy
(A = 1).
Even though A has a null effect, an 'as treated'
analysis will find that men undergoing colonoscopy
(A = 1) are more likely to die from colon cancer
because they include a greater proportion of men
with a predisposition to colon cancer than the others
(A = 0). This is the situation depicted in Figure 1.
Similarly, a 'per protocol' analysis will find a greater
risk of death from colon cancer in the group Z = 1
than in the group Z = 0 because the per protocol
restriction A = Z overloads the group assigned to
colonoscopy with men with a family history of colon
cancer. This is the situation depicted in Figure 2.
The confounding bias in the ‘as treated’ analysis
and the selection bias in the ‘per protocol’ analysis
can go in either direction – for example, suppose
that L represents healthy diet rather than family
history of colon cancer. In general, the direction of
the bias is hard to predict because it is possible that
the proportions of people with a family history,
healthy diet, and any other prognostic factor will
vary between the groups A = 1 and A = 0 condi-
tional on Z.
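A small simulation in the spirit of Scenario 3 (hypothetical numbers, not from the paper) shows the 'as treated' and 'per protocol' contrasts moving away from the null when family history L drives both adherence and risk, while the ITT contrast and an L-adjusted 'as treated' contrast stay near zero.

```python
# Hypothetical sketch of Scenario 3: colonoscopy has no effect on colon-cancer
# death, but family history (L) raises both adherence and risk.
# Parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
z = rng.integers(0, 2, n)                    # randomized assignment
L = rng.random(n) < 0.20                     # family history of colon cancer
# Adherence: with family history, men assigned to colonoscopy almost always get
# it; without, only about half do. A few unassigned men get one anyway.
p_a = np.where(z == 1, np.where(L, 0.95, 0.50), np.where(L, 0.20, 0.05))
a = rng.random(n) < p_a
risk = np.where(L, 0.06, 0.01)               # risk depends on L only: null effect of A
y = rng.random(n) < risk

itt = y[z == 1].mean() - y[z == 0].mean()                    # unbiased: ~0
as_treated = y[a].mean() - y[~a].mean()                      # confounded by L
pp = (z == a.astype(int))                                    # per protocol subset
per_protocol = y[pp & (z == 1)].mean() - y[pp & (z == 0)].mean()
# Stratifying on L recovers the null for the 'as treated' contrast.
adj = np.mean([y[a & (L == s)].mean() - y[~a & (L == s)].mean() for s in (0, 1)])
print(f"ITT: {itt:+.4f}  as treated: {as_treated:+.4f}  "
      f"per protocol: {per_protocol:+.4f}  L-adjusted as treated: {adj:+.4f}")
```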
In summary, ‘as treated’ and ‘per protocol’ anal-
yses transform RCTs into observational studies for all
practical purposes. The estimates from these analyses
Figure 1. Simplified causal diagram for a randomized clinical
trial with assigned treatment Z, received treatment A, and
outcome Y. U represents the unmeasured common causes of A
and Y. An ‘as treated’ analysis of the A-Y association will be
confounded unless all prognostic factors L are adjusted for.
Figure 2. Simplified causal diagram for a randomized clinical
trial with assigned treatment Z, received treatment A, and
outcome Y. U represents the unmeasured common causes of A
and Y, and S an indicator for selection into the ‘per protocol’
population. The Z-Y association in the ‘per protocol’ population
(a restriction represented by the box around S) will be affected
by selection bias unless all prognostic factors L are adjusted for.
can only be interpreted as the effect of treatment A if
the analysis is appropriately adjusted for the con-
founders L. If the intended analysis of the RCT is ‘as
treated’ or ‘per protocol,’ then the protocol of the
trial should describe the potential confounders and
how they will be measured, just like the protocol of
an observational study would do.
More general ‘as treated’ and ‘per
protocol’ analyses to estimate the
effect of treatment
So far we have made the simplifying assumption that
adherence is all or nothing. But in reality, RCT
participants may adhere to their assigned treatment
intermittently. For example, they may take their
assigned treatment for 2 months, discontinue it for
the next 3 months, and then resume it until the end
of the study. Or subjects may take treatment con-
stantly but at a lower dose than assigned. For
example, they may take only one pill per day when
they should take two. Treatment A is generally a
time-varying variable – each day you may take it or
not take it – rather than a time-fixed variable – you
either always take it or never take it during the
follow-up.
An ‘as treated’ analysis with a time-varying treat-
ment A usually involves some sort of dose-response
model. A ‘per protocol’ analysis with a time-varying
treatment A includes all RCT participants but censors
them if/when they deviate from their assigned
treatment. The censoring usually occurs at a fixed
time after nonadherence, say, 6 months. The per
protocol population in this variation refers to the
adherent person-time rather than to the adherent
persons.
Because previous sections were only concerned
with introducing some basic problems of ITT, ‘as
treated’ and ‘per protocol’ analyses, we considered A
as a time-fixed variable. However, this simplification
may be unrealistic and misleading in practice.
When treatment A is truly time-varying (i) the
effect of treatment needs to be redefined and (ii)
appropriate adjustment for the measured confoun-
ders L cannot generally be achieved by using con-
ventional methods such as stratification, regression,
or matching.
The definition of the average causal effect of a
time-fixed treatment involves the contrast between
two clinical regimes. For example, we defined the
causal effect of a time-fixed treatment as a contrast
between the average outcome that would be
observed if all participants took treatment A = 1
versus treatment A = 0. The two regimes are "taking
treatment A = 1" and "taking treatment A = 0". The
definition of the causal effect of a time-varying
treatment also involves a contrast between two
clinical regimes. For example, we can define the
causal effect of a time-varying treatment as a contrast
between the average outcome that would be
observed if all participants had continuous treat-
ment with A = 1 versus continuous treatment with
A = 0 during the entire follow-up. We sometimes
refer to this causal effect as the effect of continuous
treatment.
When the treatment is time-varying, so are the
confounders. For example, the probability of taking
antiretroviral therapy increases in the presence of
symptoms of HIV disease. Both therapy and con-
founders evolve together during the follow-up.
When the time-varying confounders are affected by
previous treatment – for example, antiretroviral
therapy use reduces the frequency of symptoms –
conventional methods cannot appropriately adjust
for the measured confounders [10]. Rather, inverse
probability (IP) weighting or g-estimation are gener-
ally needed for confounding adjustment in ‘as
treated’ and ‘per protocol’ analyses involving time-
varying treatments [11–13].
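As a schematic of what such an adjustment involves, the sketch below censors person-time at the first nonadherent period and weights the remaining person-time by the inverse probability of having remained adherent given a time-varying confounder. The data file, column names, and single-confounder model are hypothetical simplifications, not the authors' implementation.

```python
# Minimal sketch of an IP-weighted 'per protocol' analysis that censors
# person-time at nonadherence. The input file and column names are hypothetical;
# a real analysis would model adherence at every visit, use stabilized weights,
# and fit the adherence model only among person-time still uncensored.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# One row per participant-period: id, period, assigned arm z (0/1), time-varying
# confounder l, adherent (0/1: still following assigned treatment), outcome y.
df = pd.read_csv("trial_person_periods.csv")          # hypothetical file
df = df.sort_values(["id", "period"])

# 1. Probability of remaining adherent in each period given arm and confounder.
adh_model = LogisticRegression().fit(df[["z", "l", "period"]], df["adherent"])
df["p_adh"] = adh_model.predict_proba(df[["z", "l", "period"]])[:, 1]

# 2. Unstabilized weight: inverse of the cumulative probability of having
#    remained adherent through the current period.
df["w"] = 1.0 / df.groupby("id")["p_adh"].cumprod()

# 3. Keep only adherent person-time (censor after the first nonadherent period)
#    and compare the weighted outcome by assigned arm.
adherent_time = df[df.groupby("id")["adherent"].cummin() == 1]
for arm, g in adherent_time.groupby("z"):
    print(f"arm {arm}: weighted risk {np.average(g['y'], weights=g['w']):.3f}")
```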
Both IP weighting and g-estimation require that
time-varying confounders and time-varying treat-
ments are measured during the entire follow-up.
Thus, if planning to use these adjustment methods,
the protocol of the trial should describe the potential
confounders and how they will be measured.
Unfortunately, like in any observational study,
there is no guarantee that all confounders will be
identified and correctly measured, which may result
in biased estimates of the effect of continuous
treatment in ‘as treated’ and ‘per protocol’ analyses
involving time-varying treatments.
An alternative adjustment method is instrumen-
tal variable (IV) estimation, a particular form of
g-estimation that does not require measurement of
any confounders [14–17]. In double-blind RCTs, IV
estimation eliminates confounding for the effect of
continuous treatment A by exploiting the fact that
the initial treatment assignment Z was random.
Thus, if the time-varying treatment A is measured
and a correctly specified structural model used, IV
estimation adjusts for confounding without measur-
ing, or even knowing, the confounders.
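For the simpler all-or-nothing setting used earlier in the paper, the usual IV estimator divides the ITT contrast by the difference in treatment uptake between arms (the Wald estimator). The sketch below illustrates that idea with simulated data in which an unmeasured factor confounds the 'as treated' comparison; it is a generic illustration, not the structural-model g-estimation the authors describe for time-varying treatments.

```python
# Hypothetical sketch: IV (Wald) estimation of the effect of received treatment A
# using random assignment Z as the instrument, with confounded adherence.
# Parameter values are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
z = rng.integers(0, 2, n)                       # randomized assignment
u = rng.normal(size=n)                          # unmeasured prognostic factor
# Sicker patients (high u) are more likely to take treatment regardless of arm.
p_a = np.clip(0.15 + 0.70 * z + 0.10 * u, 0, 1)
a = (rng.random(n) < p_a).astype(int)
true_effect = -2.0                              # assumed effect of A on outcome Y
y = 10.0 + true_effect * a + 3.0 * u + rng.normal(size=n)

naive = y[a == 1].mean() - y[a == 0].mean()     # 'as treated', confounded by u
itt = y[z == 1].mean() - y[z == 0].mean()       # effect of assigned treatment
wald = itt / (a[z == 1].mean() - a[z == 0].mean())   # IV estimate of effect of A
print(f"as treated: {naive:+.2f}  ITT: {itt:+.2f}  "
      f"IV (Wald): {wald:+.2f}  truth: {true_effect:+.2f}")
```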
A detailed description of IP weighting,
g-estimation, and IV estimation is beyond the
scope of this paper. Toh and Hernán review
these methods for RCTs [18]. IP weighting and
g-estimation can also be used to estimate the
effect of treatment regimes that may be more clin-
ically relevant than the effect of continuous treat-
ment [19,20]. For example, it may be more
interesting to estimate the effect of treatment taken
continuously unless toxic effects or counterindica-
tions arise.
Discussion
An ITT analysis of RCTs is appealing for the same
reason it may be appalling: simplicity. As described
above, ITT estimates may be inadequate for the
assessment of comparative effectiveness or safety.
In the presence of nonadherence, the ITT effect is a
biased estimate of treatment’s effects such as the
effect of continuous treatment. This bias can be
corrected in an appropriately adjusted ‘as treated’
analysis via IP weighting, g-estimation, or IV estima-
tion. However, IP weighting and g-estimation
require untestable assumptions similar to those
made for causal inference from observational data.
IV estimation generally requires a dose-response
model and its validity is questionable for nonblinded
RCTs.
The ITT approach is also problematic if a large
proportion of participants drop out or are otherwise
lost to follow-up, or if the outcomes are incompletely
ascertained among those completing the study. In
these studies, an ITT comparison cannot be con-
ducted because the value of the outcome is missing
for some individuals. To circumvent this problem,
the ITT analysis is often replaced by a pseudo-ITT
analysis that is restricted to subjects with complete
data or in which the last observation is carried
forward. These pseudo-ITT analyses may be affected
by selection bias in either direction. Adjusting for
this bias is possible via IP weighting if information
on the time-varying determinants of loss to follow-
up is available, but again, the validity of the adjust-
ment relies on untestable assumptions about the
unmeasured variables [18].
RCTs with long follow-up periods, as expected in
many comparative effectiveness research settings,
are especially susceptible to bias due to nonadher-
ence and loss to follow-up. As these problems accu-
mulate over time, the RCT starts to resemble a
prospective observational study, and the ITT analysis
yields an increasingly biased estimate of the effect of
continuous treatment. Consider, for example, a
Women’s Health Initiative randomized trial that
assigned postmenopausal women to either estrogen
plus progestin hormone therapy or placebo [21].
About 40% of women had stopped taking at least
80% of their assigned treatment by the 6th year of
follow-up. The ITT hazard ratio of breast cancer was
1.25 (95% CI: 1.01, 1.54) for hormone therapy versus
placebo. The IP weighted hazard ratio of breast
cancer was 1.68 (1.24, 2.28) for 8 years of continuous
hormone therapy versus no hormone therapy [22].
These findings suggest that the effect of continuous
treatment was more than twofold greater than the
effect of assigned treatment. Of course, neither of
these estimates reflects the long-term effect of hor-
mone therapy in clinical practice (e.g., the adherence
to hormone therapy was much higher in the trial
than in the real world).
When analyzing data from RCTs, the question is
not whether assumptions are made but rather which
assumptions are made. In an RCT with incomplete
follow-up or outcome ascertainment, a pseudo-ITT
analysis assumes that the loss to follow-up occurs
completely at random whereas an IP weighted ITT
analysis makes less strong assumptions (e.g., loss to
follow-up occurs at random conditional on the
measured covariates). In an RCT with incomplete
adherence, an ITT analysis shifts the burden of
assessing the actual magnitude of the effect from
the data analysts to the clinicians and other decision
makers, who will need to make assumptions about
the potential bias introduced by lack of adherence.
Supplementing the ITT effects with ‘as treated’ or
‘per protocol’ effects can help decision makers [23],
but only if a reasonable attempt is made to appro-
priately adjust for confounding and selection bias.
In summary, we recommend that all RCTs with
substantial lack of adherence or loss to follow-up
be analyzed using different methods, including an
ITT analysis to estimate the effect of assigned treat-
ment, and appropriately adjusted ‘per protocol’ and
‘as treated’ analyses (i.e., via IP weighting or g-
estimation) to estimate the effect of received treat-
ment. Each approach has relative advantages and
disadvantages, and depends on a different combi-
nation of assumptions [18]. To implement this
recommendation, RCT protocols should include a
more sophisticated statistical analysis plan, as well as
plans to measure adherence and other postrando-
mization variables. This added complexity is neces-
sary to take full advantage of the substantial societal
resources that are invested in RCTs.
Acknowledgement
We thank Goodarz Danaei for his comments to an
earlier version of this manuscript.
Funding
This study was funded by National Institutes of
Health grants R01 HL080644-01 and R01 HD056940.
References
1. Luce BR, Kramer JM, Goodman SN, et al. Rethinking
randomized clinical trials for comparative effectiveness
research: the need for transformational change. Ann Intern
Med 2009; 151: 206–09.
2. Food and Drug Administration. International
Conference on Harmonisation; Guidance on Statistical
Principles for Clinical Trials. Federal Register 1998; 63:
49583–98.
3. Rosenberger WF, Lachin JM. Randomization in Clinical
Trials: Theory and Practice. Wiley-Interscience, New York,
NY, 2002.
4. Piantadosi S. Clinical Trials: A Methodologic Perspective
(2nd edn). Wiley-Interscience, Hoboken, NJ, 2005.
5. Sheiner LB, Rubin DB. Intention-to-treat analysis and the
goals of clinical trials. Clin Pharmacol Ther 1995; 57: 6–15.
6. Psaty BM, Prentice RL. Minimizing bias in randomized
trials: the importance of blinding. JAMA 2010; 304:
793–94.
7. McMahon AD. Study control, violators, inclusion criteria
and defining explanatory and pragmatic trials. Stat Med
2002; 21: 1365–76.
8. Schwartz D, Lellouch J. Explanatory and pragmatic
attitudes in therapeutical trials. J Chronic Dis 1967; 20:
637–48.
9. Tunis SR, Stryer DB, Clancy CM. Practical clinical trials:
increasing the value of clinical research for decision
making in clinical and health policy. JAMA 2003; 290:
1624–32.
10. Hernán MA, Hernández-Díaz S, Robins JM. A structural
approach to selection bias. Epidemiology 2004; 15: 615–25.
11. Robins JM. Correcting for non-compliance in randomized
trials using structural nested mean models. Commun Stat
1994; 23: 2379–412.
12. Robins JM. Correction for non-compliance in equivalence
trials. Stat Med 1998; 17: 269–302.
13. Robins JM, Finkelstein D. Correcting for non-
compliance and dependent censoring in an AIDS clinical
trial with inverse probability of censoring weighted
(IPCW) Log-rank tests. Biometrics 2000; 56: 779–88.
14. Hernán MA, Robins JM. Instruments for causal inference:
an epidemiologist’s dream? Epidemiology 2006; 17:
360–72.
15. Ten Have TR, Normand SL, Marcus SM, et al. Intent-to-
treat vs. non-intent-to-treat analyses under treatment
non-adherence in mental health randomized trials.
Psychiatr Ann 2008; 38: 772–83.
16. Cole SR, Chu H. Effect of acyclovir on herpetic ocular
recurrence using a structural nested model. Contemp Clin
Trials 2005; 26: 300–10.
17. Mark SD, Robins JM. A method for the analysis of
randomized trials with compliance information: an appli-
cation to the Multiple Risk Factor Intervention Trial. Contr
Clin Trials 1993; 14: 79–97.
18. Toh S, Hernán MA. Causal Inference from longitudinal
studies with baseline randomization. Int J Biostat 2008; 4:
Article 22.
19. Hernán MA, Lanoy E, Costagliola D, Robins JM.
Comparison of dynamic treatment regimes via inverse
probability weighting. Basic Clin Pharmacol Toxicol 2006;
98: 237–42.
20. Cain LE, Robins JM, Lanoy E, et al. When to start
treatment? a systematic approach to the comparison of
dynamic regimes using observational data. Int J Biostat
2006; 6: Article 18.
21. Writing group for the Women’s Health Initiative
Investigators. Risks and benefits of estrogen plus proges-
tin in healthy postmenopausal women: principal results
from the women’s health initiative randomized controlled
trial. JAMA 2002; 288: 321–33.
22. Toh S, Hernández-Díaz S, Logan R, et al. Estimating
absolute risks in the presence of nonadherence: an appli-
cation to a follow-up study with baseline randomization.
Epidemiology 2010; 21: 528–39.
23. Thorpe KE, Zwarenstein M, Oxman AD, et al. A prag-
matic-explanatory continuum indicator summary
(PRECIS): a tool to help trial designers. J Clin Epidemiol
2009; 62: 464–75.
VIEWPOINT
Prespecified Falsification End Points
Can They Validate True Observational Associations?
Vinay Prasad, MD
Anupam B. Jena, MD, PhD
AS OBSERVATIONAL STUDIES HAVE INCREASED IN NUM-
ber—fueled by a boom in electronic recordkeep-
ing and the ease with which observational analy-
ses of large databases can be performed—so too
have failures to confirm initial research findings.1 Several
solutions to the problem of incorrect observational results
have been suggested,1,2
emphasizing the importance of a rec-
ord not only of significant findings but of all analyses con-
ducted.2
An important and increasingly familiar type of observa-
tional study is the identification of rare adverse effects (de-
fined by organizations such as the Council for Interna-
tional Organizations of Medical Sciences as occurring
among fewer than 1 per 1000 individuals) from population
data. Examples of these studies include whether macrolide
antibiotics such as azithromycin are associated with higher
rates of sudden cardiac death3
; whether proton pump in-
hibitors (PPIs) are associated with higher rates of pneumo-
nia4
; or whether bisphosphonates are associated with an in-
creased risk of atypical (subtrochanteric) femur fractures.5
Rare adverse events, such as these examples, occur so in-
frequently that almost by definition they may not be iden-
tified in randomized controlled trials (RCTs). Postmarket-
ing data from thousands of patients are required to identify
such low-frequency events. In fact, the ability to conduct
postmarketing surveillance of large databases has been her-
alded as a vital step in ensuring the safe dissemination of
medical treatments after clinical trials (phase 4) for pre-
cisely this reason.
Few dispute the importance of observational studies for
capturing rare adverse events. For instance, in early stud-
ies of whether bisphosphonate use increases the rate of atypi-
cal femur fractures, pooled analysis of RCTs demonstrated
no elevated risk.6
However, these data were based on a lim-
ited sample of 14 000 patients with only 284 hip or femur
fractures and only 12 atypical fracture events over just more
than 3.5 years of follow-up. In contrast, later observational
studies addressing the same question were able to leverage
much larger and more comprehensive data. One analysis that
examined 205 466 women who took bisphosphonates for
an average of 4 years identified more than 10 000 hip or fe-
mur fractures and 716 atypical fractures.5
This analysis dem-
onstrated an increased risk of atypical fractures associated
with bisphosphonate use and was validated by another large
population-based study.
However, analyses in large data sets are not necessarily
correct simply because they are larger. Control groups might
not eliminate potential confounders, or many varying defi-
nitions of exposure to the agent may be tested (alternative
thresholds for dose or duration of a drug)—a form of mul-
tiple-hypothesis testing.2
Just as small, true signals can be
identified by these analyses, so too can small, erroneous as-
sociations. For instance, several observational studies have
found an association between use of PPIs and development
of pneumonia, and it is biologically plausible that elevated
gastric pH may engender bacterial colonization.4
However,
it is also possible that even after statistical adjustment for
known comorbid conditions, PPI users may have other un-
observed health characteristics (such as poor health lit-
eracy or adherence) that could increase their rates of pneu-
monia, apart from use of the drug. Alternatively, physicians
who are more likely to prescribe PPIs to their patients also
may be more likely to diagnose their patients with pneu-
monia in the appropriate clinical setting. Both mecha-
nisms would suggest that the observational association be-
tween PPI use and pneumonia is confounded. In light of the
increasing prevalence of such studies and their importance
in shaping clinical decisions, it is important to know that
the associations identified are true rather than spurious cor-
relations. Prespecified falsification hypotheses may pro-
vide an intuitive and useful safeguard when observational
data are used to find rare harms.
A falsification hypothesis is a claim, distinct from the one
being tested, that researchers believe is highly unlikely to
be causally related to the intervention in question.7
For in-
stance, a falsification hypothesis may be that PPI use in-
creases the rate of soft tissue infection or myocardial infarc-
tion. A confirmed falsification test—in this case, a positive
association between PPI use and risks of these conditions—
Author Affiliations: Medical Oncology Branch, National Cancer Institute, Na-
tional Institutes of Health, Bethesda, Maryland (Dr Prasad); Department of Health
Care Policy, Harvard Medical School, and Massachusetts General Hospital, Bos-
ton (Dr Jena); and National Bureau of Economic Research, Cambridge, Massa-
chusetts (Dr Jena).
Corresponding Author: Anupam B. Jena, MD, PhD, Department of Health Care
Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115 (jena
@hcp.med.harvard.edu).
would suggest that an association between PPI use and pneu-
monia initially suspected to be causal is perhaps con-
founded by unobserved patient or physician characteristics.
Ideally, several prespecified falsification hypotheses can be
tested and, if the predicted associations are found not to
exist, can support the main study as-
sociation of interest. In the case of PPIs, falsification analy-
ses have shown that many improbable conditions—chest
pain, urinary tract infections, osteoarthritis, rheumatoid ar-
thritis flares, and deep venous thrombosis—are also linked
to PPI use,4
making the claim of an increased risk of pneu-
monia related to use of the drug unlikely.
Another example of falsification analysis applied to ob-
servational associations involves the reported relationship
of social networks with the spread of complex phenomena
such as smoking, obesity, and depression. In social net-
work studies, persons with social ties are shown to be more
likely to gain or lose weight, or to start or stop smoking, at
similar time points than 2 random persons in the same group.
Several studies supported these claims; however, other stud-
ies have shown that even implausible factors—acne, height,
and headaches—may also exhibit “network effects.”8
Falsification analysis can be operationalized by asking in-
vestigators to specify implausible hypotheses up front and
then testing those claims using statistical methods similar
to those used in the primary analysis. Falsification could be
required both for studies that aim to show a rare harm of a
particular medical intervention as well as for studies that
aim to show deleterious interactions between medications.
For instance, in evaluating whether concomitant use of clopi-
dogrel and PPIs is associated with decreased effectiveness
of the former drug and worsens cardiovascular outcomes,
does the use of PPIs also implausibly diminish the effect of
antihypertensive agents or metformin?
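One way to picture the mechanics: fit the same adjusted model to the primary outcome and to each prespecified falsification outcome. The sketch below uses simulated data in which an unmeasured frailty drives both PPI prescribing and every outcome, so all three associations come out elevated; the outcome names, covariates, and effect sizes are hypothetical, not results from the studies cited above.

```python
# Minimal sketch of prespecified falsification testing on simulated data.
# 'frail' is an unmeasured confounder; no outcome truly depends on the drug.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 20000
frail = rng.normal(size=n)                        # unmeasured poor health
age = rng.normal(65, 10, n)                       # measured covariate
ppi = (rng.random(n) < 1 / (1 + np.exp(-(-1.0 + 0.8 * frail + 0.02 * (age - 65))))).astype(int)

def simulate_outcome():
    # No true drug effect: the coefficient on ppi is zero in every outcome model.
    logit = -3.0 + 0.0 * ppi + 0.7 * frail + 0.03 * (age - 65)
    return (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

outcomes = {
    "pneumonia (primary)": simulate_outcome(),
    "chest pain (falsification)": simulate_outcome(),
    "urinary tract infection (falsification)": simulate_outcome(),
}

X = sm.add_constant(np.column_stack([ppi, age]))  # adjust for measured age only
for name, y in outcomes.items():
    fit = sm.Logit(y, X).fit(disp=0)
    or_, (lo, hi) = np.exp(fit.params[1]), np.exp(fit.conf_int()[1])
    print(f"{name}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# Elevated 'associations' with the falsification outcomes warn that the primary
# association is likely confounded rather than causal.
```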
Prespecifying falsification end points and choosing them
appropriately is important for avoiding the problem of mul-
tiple hypothesis testing. For instance, if many falsification
hypotheses are tested to support a particular observational
association, a few falsification outcomes will pass the falsi-
fication test—ie, will not be associated with the drug or in-
tervention of interest—whereas other falsification tests may
fail. If the former are selectively reported, some associa-
tions may be mistakenly validated. This issue cannot be ad-
dressed by statistical testing for multiple hypotheses alone
because selective reporting may still occur. Instead, pre-
specifying falsification outcomes and choosing outcomes that
are common may mitigate concerns about post hoc data min-
ing. In the case of PPIs and risk of pneumonia, falsification
analyses used prevalent ambulatory complaints such as chest
pain, urinary tract infections, and osteoarthritis.4
Observational studies of rare effects of a drug may be fur-
ther validated by verification analyses that demonstrate the
presence of known adverse effects of a drug in the data set
being studied. For instance, an observational study suggest-
ing an unknown adverse effect of clopidogrel (for ex-
ample, seizures) should also be able to demonstrate the pres-
ence of known adverse effects such as gastrointestinal
hemorrhage associated with clopidogrel use. The inability
of a study to verify known adverse effects should raise ques-
tions about selection in the study population.
Although no published recommendations exist, standard-
ized falsification analyses with 3 to 4 prespecified or highly
prevalent disease outcomes may help to strengthen the va-
lidity of observational studies, as could inclusion of verifi-
cation analyses. Information on whether falsification and
validation end points were used in a study should be in-
cluded in a registry for observational studies that others have
suggested.2
Prespecified falsification hypotheses can improve the va-
lidity of studies finding rare harms when researchers can-
not determine answers to these questions from RCTs, either
because of limited sample sizes or limited follow-up. How-
ever, falsification analysis is not a perfect tool for validat-
ing the associations in observational studies, nor is it in-
tended to be. The absence of implausible falsification
hypotheses does not imply that the primary association of
interest is causal, nor does their presence guarantee that real
relations do not exist. However, when many false relation-
ships are present, caution is warranted in the interpreta-
tion of study findings.
Conflict of Interest Disclosures: The authors have completed and submitted the
ICMJE Form for Disclosure of Potential Conflicts of Interest and none were re-
ported.
REFERENCES
1. Thomas L, Peterson ED. The value of statistical analysis plans in observational
research: defining high-quality research from the start. JAMA. 2012;308(8):
773-774.
2. Ioannidis JP. The importance of potential studies that have not existed and reg-
istration of observational data sets. JAMA. 2012;308(6):575-576.
3. Ray WA, Murray KT, Hall K, Arbogast PG, Stein CM. Azithromycin and the
risk of cardiovascular death. N Engl J Med. 2012;366(20):1881-1890.
4. Jena AB, Sun E, Goldman DP. Confounding in the association of proton pump
inhibitor use with risk of community-acquired pneumonia [published online Sep-
tember 7, 2012]. J Gen Intern Med. doi:10.1007/s11606-012-2211-5.
5. Park-Wyllie LY, Mamdani MM, Juurlink DN, et al. Bisphosphonate use and the
risk of subtrochanteric or femoral shaft fractures in older women. JAMA. 2011;
305(8):783-789.
6. Black DM, Kelly MP, Genant HK, et al; Fracture Intervention Trial Steering
Committee; HORIZON Pivotal Fracture Trial Steering Committee. Bisphospho-
nates and fractures of the subtrochanteric or diaphyseal femur. N Engl J Med. 2010;
362(19):1761-1771.
7. Bertrand M, Duflo E, Mullainathan S. How much should we trust differences-
in-differences estimates? Q J Econ. 2004;119:249-275.
8. Cohen-Cole E, Fletcher JM. Detecting implausible social network effects in acne,
height, and headaches: longitudinal analysis. BMJ. 2008;337:a2533.
PERSPECTIVE
Orthogonal predictions: follow-up questions for suggestive data†
Alexander M. Walker MD, DrPH1,2*
1 World Health Information Science Consultants, LLC, Newton, MA 02466, USA
2 Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA
SUMMARY
When a biological hypothesis of causal effect can be inferred, the hypothesis can sometimes be tested in the selfsame database that gave rise to
the study data from which the hypothesis grew. Valid testing happens when the inferred biological hypothesis has scientific implications that
predict new relations between observations already recorded. Testing for the existence of the new relations is a valid assessment of the
biological hypothesis, so long as the newly predicted relations are not a logical correlate of the observations that stimulated the hypothesis in
the first place. These predictions that lead to valid tests might be called ‘orthogonal’ predictions in the data, and stand in marked contrast to
‘scrawny’ hypotheses with no biological content, which predict simply that the same data relations will be seen in a new database. The
Universal Data Warehouse will shortly render moot searches for new databases in which to test. Copyright © 2010 John Wiley & Sons, Ltd.
key words — databases; hypothesis testing; induction; inference
Received 2 October 2009; Accepted 13 January 2010
INTRODUCTION
In 2000, the Food and Drug Administration’s (FDA)
Manette Niu and her colleagues had found something
that might have been predicted by medicine, but not by
statistics.1
They were looking for infants who had
gotten into trouble after a dose of Wyeth’s RotaShield
vaccine in the Centers for Disease Control’s Vaccine
Adverse Event Reporting System.
A vaccine against rotavirus infection in infants,
RotaShield was already off the market.2
In the United
States, rotavirus causes diarrhea so severe that it can
lead to hospitalization. By contrast, the infection is
deadly in poor countries. The 1999 withdrawal had
arguably cost hundreds of thousands of lives of
children whose death from rotavirus-induced diarrhea
could have been avoided through widespread vaccina-
tion with RotaShield.3,4
The enormity of the con-
sequences of the withdrawal made it important that the
decision had been based at least on sound biology.
Wyeth suspended sales of RotaShield because the
vaccine appeared to cause intussusception, an infant
bowel disorder in which a portion of the colon slips
inside of itself. The range of manifestations of
intussusception varies enormously. It can resolve on
its own, with little more by way of signs than the baby’s
fussiness from abdominal pain. Sometimes tissue
damage causes bloody diarrhea. Sometimes the bowel
infarcts and must be removed, or the baby dies.
Dr Niu had used a powerful data-mining tool, Bill
DuMouchel’s Multi-Item Gamma Poisson Shrinker, to
sift through the Vaccine Adverse Event Reporting
System (VAERS) data, and she found that intussuscep-
tion was not alone in its association with RotaShield.5
So too were gastrointestinal hemorrhage, intestinal
obstruction, gastroenteritis, and abdominal pain.
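The Multi-Item Gamma Poisson Shrinker itself is an empirical Bayes method; as a simplified stand-in, the sketch below screens invented spontaneous-report counts with a proportional reporting ratio (PRR) to show how related gastrointestinal diagnoses can surface alongside intussusception. All counts and the comparator event are hypothetical, not VAERS data.

```python
# Simplified illustration of disproportionality screening in spontaneous-report
# counts (a proportional reporting ratio), standing in for the Multi-Item Gamma
# Poisson Shrinker described in the text; counts are invented for illustration.
import numpy as np

# reports mentioning the event: (with RotaShield, with all other vaccines)
reports = {
    "intussusception":        (40, 20),
    "GI hemorrhage":          (15, 30),
    "intestinal obstruction": (12, 18),
    "fever (comparator)":     (100, 9000),
}
total_rotashield = 1500       # total RotaShield reports (hypothetical)
total_other = 45000           # total reports for all other vaccines (hypothetical)

for event, (a, c) in reports.items():
    prr = (a / total_rotashield) / (c / total_other)
    # approximate 95% CI on the log scale
    se = np.sqrt(1/a - 1/total_rotashield + 1/c - 1/total_other)
    lo, hi = np.exp(np.log(prr) + np.array([-1.96, 1.96]) * se)
    print(f"{event}: PRR {prr:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```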
My argument here is that those correlations
represented independent tests of the biological
hypothesis that had already killed the vaccine. The
observations were sufficient to discriminate hypotheses
of biological causation from those of chance, though
competing (and testable) hypotheses of artifact may
have remained.
The biologic hypothesis was that if RotaShield had
caused intussusception, it was likely to have caused
pharmacoepidemiology and drug safety 2010; 19: 529–532
Published online 22 March 2010 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/pds.1929
* Correspondence to: A. M. Walker, World Health Information Science
Consultants, LLC, 275 Grove St., Suite 2-400, Newton, MA 02466, USA.
E-mail: Alec.Walker@WHISCON.com
† The author declared no conflict of interest.
cases that did not present as fully recognized instances
of the disease, but which nonetheless represented the
same pathology. Looking for these other conditions
was a test of the biologic hypothesis raised by the
occurrence of the severest cases. Like the original
observations, the test data resided in VAERS, but were
nonetheless independent, in that different physicians in
different places, acting more or less concurrently,
reported them about different patients.
INDUCTION AND TESTING
The key step in Niu’s activity was induction of a
biological hypothesis of cause from an observation of
association. Testing the biological hypothesis differs
fundamentally from testing the data-restatement that
‘There is an association between RotaShield and
intussusception.’ The latter, by itself a scrawny
hypothesis if you could call it a hypothesis at all,
might be examined in other environments, though
probably not in real time, since VAERS is a national
system and RotaShield had been marketed only in the
United States. Scrawny hypotheses have no meat to
them, that is they do no more than predict more of the
same, and even then only when the circumstances of
observation are identical. The biological hypothesis, by
contrast, was immediately testable through its implica-
tions in VAERS, and could produce a host of other
empiric tests.
From the perspective of Wyeth, the FDA, and the
Centers for Disease Control and Prevention (CDC), the
parties who had to act in that summer of crisis, only
biologic causation really mattered.
Biologic causation was not the only theory that
predicted reports of multiple related diseases in
association with RotaShield. Most of the reports came
in after the CDC had announced the association and
Wyeth had suspended distribution of RotaShield. Phys-
icians who did not know one another might have been
similarly sensitized to the idea that symptom com-
plexes compatible with intussusception should be
reported. Stimulated reporting is therefore another
theory that competes with biological causation to
account for the findings.
For the present discussion, the key point is not how
well the competing hypotheses (biological causation,
stimulated reporting, and chance) explain the newly
found data. The key is whether one can rationally look
at the non-intussusception diagnoses in VAERS to test
theories about the RotaShield-intussusception associ-
ation, and whether such looks ‘into the same data’ are
logically suspect.
Trudy Murphy and collaborators offered another
example of testing implications of the biological
hypothesis of causation in a subsequent case-control
study of RotaShield and intussusception.6
Looking at
any prior receipt of RotaShield, they found an
adjusted odds ratio of 2.2 (95% CI 1.5–3.3). Murphy’s
data also provided a test of the theory of biological
causation, no form of which would predict a uniform
distribution of cases over time after vaccination.
Indeed there were pronounced aggregations of cases
3–7 days following first and second immunization.
Interestingly, a theory of stimulated reporting would
not have produced time clustering, at least not without
secondary theories added on top, and so the Murphy
data weighed against the leading non-biologic theory
for the Niu observations.
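The time-clustering prediction can be checked with a simple goodness-of-fit test: under the non-biologic theories, onset days after immunization should look roughly uniform. The sketch below tests invented onset-day counts against a uniform expectation; the counts are illustrative, not Murphy's data.

```python
# Minimal sketch of testing the 'orthogonal' prediction of post-vaccination
# time clustering: compare observed onset days with a uniform distribution over
# the risk window. The onset-day counts below are invented for illustration.
import numpy as np
from scipy.stats import chisquare

# number of intussusception cases with onset on days 1..14 after immunization
observed = np.array([1, 2, 10, 14, 12, 9, 6, 2, 1, 1, 0, 1, 0, 1])
expected = np.full(observed.size, observed.sum() / observed.size)

stat, p = chisquare(observed, expected)
print(f"chi-square = {stat:.1f}, p = {p:.2g}")
# A small p-value indicates the kind of departure from uniformity that neither
# chance nor simple stimulated reporting would predict.
```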
ORTHOGONAL PREDICTIONS
Niu’s and Murphy’s findings share a common element.
In neither case did the original observation (case
reports of intussusception for Niu, or an association
between intussusception and ever-immunization for
Murphy) imply the follow-up observations (other
diagnoses and time-clusters) as a matter of logic, on
the null hypothesis. That is, neither set of follow-up
observations was predicted by the corresponding
scrawny hypothesis, since neither was simply a
restatement of the initiating finding. In this sense, I
propose that we call the predictions that Niu and
Murphy tested ‘orthogonal’ to the original observation.
In the very high-dimensional space of medical
observations, the predicted data are not simply a
rotation of the original findings.
Where did the orthogonal predictions come from?
The investigators stepped out of the data and into the
physical world. We do not know about the world
directly, but we can have theories about how it works,
and we can test those theories against what we see.
Reasoning about the nature of the relations that gave
rise to observed data, we can look for opportunities to
test the theories. With discipline, we can restrict our
‘predictions’ to relations that are genuinely new, and
yet implied by our theories.
SHOCKED, SHOCKED
‘I’m shocked, shocked to find that gambling is going on
in here!’ says Captain Renault in Casablanca, just
before he discreetly accepts his winnings and closes
down Rick's Café Américain to appease his Nazi
minders. Advocates for finding new data sources to test
hypotheses might feel kinship with the captain. While
sincerely believing in the importance of independent
replication, they find that they too examine different
dimensions of outcomes in suggestive data to evaluate
important hypotheses, particularly those hypotheses
that would require immediate action if true. This is
already the core of regulatory epidemiology, which
concerns itself with the best decision on the available
data. The necessity to act sometimes plays havoc with
prescriptions that cannot be implemented quickly.
Exploration of data in hand is not limited to public
health epidemiologists, regulators among them. In fact
most epidemiologists check causal hypotheses in the
data that generated them. Whenever observational
researchers see an important effect, they worry (or
should do) whether they have missed some confound-
ing factor. Confounding is a causal alternative
hypothesis for an observed association, and the
hypothesis of confounding often has testable implica-
tions in the data at hand. Will the crude effect disappear
when we control for age? It would be hard to describe
the search for confounders as anything other than
testing alternative causal hypotheses in the data that
gave rise to them.
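The "will the crude effect disappear when we control for age?" check is itself a small analysis in the data at hand. Below is a minimal sketch, with simulated data in which age alone produces a crude exposure-outcome association; the variables and coefficients are hypothetical.

```python
# Minimal sketch of checking a confounding hypothesis in the data that produced
# the association: compare a crude estimate with an age-adjusted one.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 10000
age = rng.normal(60, 12, n)
exposed = (rng.random(n) < 1 / (1 + np.exp(-0.08 * (age - 60)))).astype(int)
# Outcome depends on age but, in truth, not on exposure.
event = (rng.random(n) < 1 / (1 + np.exp(-(-2.5 + 0.07 * (age - 60))))).astype(int)

crude = sm.Logit(event, sm.add_constant(exposed)).fit(disp=0)
adjusted = sm.Logit(event, sm.add_constant(np.column_stack([exposed, age]))).fit(disp=0)
print("crude OR:", np.exp(crude.params[1]).round(2))
print("age-adjusted OR:", np.exp(adjusted.params[1]).round(2))
# The crude association shrinks toward the null once age is controlled,
# consistent with confounding rather than a causal effect of exposure.
```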
Far from public health, sciences in which there is
little opportunity for experiment, such as geology,
regularly test hypotheses in existing data. Ebel and
Grossman, for example, could ‘predict for the first
time’ (their words) events 65 million years ago, in a
headline-grabbing theory that explained a world-wide
layer of iridium at just the geological stratum that
coincided with the disappearance of the dinosaurs.7
There is nothing illegitimate in the exercise.
THE UNIVERSAL DATA WAREHOUSE
The question that motivated this Symposium, ‘One
Database or Two?’ was whether it is necessary to seek
out a new database to test theories derived from a
database at hand. Above I have argued that the issue is
not the separation of the databases, but rather the
independence of the test and hypothesis-generating
data. Clearly, two physically separate databases whose
information was independently derived by different
investigators working in different sources meet the
criterion of independence, but so do independently
derived domains of single databases. Fortunately, the
question may shortly be moot, because there will be in
the future only one database.
Let me explain.
In 1993, Philip Cole, a professor at the University of
Alabama at Birmingham, provided a radical solution to
the repeated critique that epidemiologists were finding
unanticipated relations in data, and that the researchers
were presuming to make statements about hypotheses
that had not been specified in advance. In ‘The
Hypothesis Generating Machine’, Cole announced the
creation of the HGM, a machine that had integrated data
on every agent, every means of exposure and every time
relation together with every disease. From these, the
HGM had formed every possible hypothesis about every
possible relationship.8
Never again would a hypothesis
be denigrated for having been newly inferred from data.
In the same elegant paper, Cole also likened the idea that
studies generate hypotheses to the once widely held view
that piles of rags generate mouse pups. People generate
hypotheses; inanimate studies do not.
With acknowledgment to Cole, couldn’t we imagine
a Universal Data Warehouse consisting of all data ever
recorded? Some twists of relativity theory might even
get us to postulate that the UDW could contain all
future data as well.9
Henceforward, all tests of all
hypotheses would occur by necessity in the UDW,
whether or not the investigator was aware that his or her
data were simply a view into the warehouse.
Researchers would evermore test and measure the
impact of hypotheses in the data that suggested them.
The new procedure of resorting to the UDW will not
constitute a departure from current practice, and may
result in more efficient discussion.
The reluctance of statisticians and philosophers to
test a hypothesis in the data that generated it makes
rigorous sense. I think that our disagreement, if there
was one, on the enjoyable morning of our Symposium
was definitional rather than scientific. In an earlier
era, when dedicated, expensive collection was the
only source of data, the sensible analysis plan
extracted everything to be learned the first time
through.
Overwhelmed by information from public and private
data streams, researchers now select out the pieces that
seem right to answer the questions they pose. The
answers raise new questions, different ones, and it makes
sense to pick out (from the same fire hose spurting facts)
new data that will help us make sense of what we think
we may have learned the first time through.
KEY POINTS
• When examination of complex data leads to a
biological hypothesis, that hypothesis may have
implications that are testable in the original data.
• The test data need to be independent of the
hypothesis-generating data.
• The 'Universal Data Warehouse' reminds us of
the futility of substituting data location for data
independence.
Recorded experience is the database through which
we observe, theorize, test, theorize, observe again, test
again, and so on for as long as we have stamina and
means. We certainly should have standards as to when
data test a theory, but the standard does not need to be
that the originating databases are different.
ACKNOWLEDGEMENTS
This paper owes much to many people, none of whom
should be held accountable for its shortcomings, as the author
did not always agree with his friends' good advice. The author
is indebted to his co-participants in the Symposium, Larry
Gould particularly, Patrick Ryan and Sebastian Schnee-
weiss, and the deft organizers, Susan Sacks and Nancy
Santanello for their valuable advice. He also thanks Phil
Cole, Ken Rothman and Paul Stang for their careful reading
and to-the-point commentary. There are no relevant finan-
cial considerations to disclose.
REFERENCES
1. Niu MT, Erwin DE, Braun MM. Data mining in the US Vaccine Adverse
Event Reporting System (VAERS): early detection of intussusception and
other events after rotavirus vaccination. Vaccine 2001; 19: 4627–4634.
2. Centers for Disease Control and Prevention (CDC). Suspension of
rotavirus vaccine after reports of intussusception—United States,
1999. MMWR Morb Mortal Wkly Rep 2004; 53(34): 786–789. Erratum
in: MMWR Morb Mortal Wkly Rep 2004; 53(37): 879.
3. World Health Organization. Report of the Meeting on Future Directions
for Rotavirus Vaccine Research in Developing Countries, Geneva, 9–11
February 2000, Geneva (Publication WHO/VB/00.23).
4. Linhares AC, Bresee JS. Rotavirus vaccines and vaccination in Latin
America. Rev Panam Salud Publica 2000; 8(5): 305–331.
5. DuMouchel W. Bayesian data mining in large frequency tables, with an
application to the FDA Spontaneous Reporting System. Am Stat 1999;
53: 177–190.
6. Murphy TV, Gargiullo PM, Massoudi MS et al. Intussusception among
infants given an oral rotavirus vaccine. N Engl J Med 2001; 344: 564–
572.
7. Ebel DS, Grossman L. Spinel-bearing spherules condensed from the
Chicxulub impact-vapor plume. Geology 2005; 33(4): 293–296.
8. Cole P. The hypothesis generating machine. Epidemiology 1993; 4(3):
271–273.
9. Rindler W. Essential Relativity (rev. 2nd edn). Springer Verlag: Berlin,
1977. See Section 2.4. ‘The Relativity of Simultaneity’ for a particularly
lucid presentation of this phenomenon. The warehouse does not of
course contain all future data, as we will restrict it to information
generated by and about humans.
Special Issue Paper
Received 4 November 2011, Accepted 28 August 2012 Published online in Wiley Online Library
(wileyonlinelibrary.com) DOI: 10.1002/sim.5620
Empirical assessment of methods for
risk identification in healthcare
data: results from the experiments
of the Observational Medical
Outcomes Partnership‡
Patrick B. Ryan,a,b,c*† David Madigan,b,d Paul E. Stang,a,b J. Marc
Overhage,b,e Judith A. Racoosinb,f and Abraham G. Hartzemab,g§
Background: Expanded availability of observational healthcare data (both administrative claims and electronic
health records) has prompted the development of statistical methods for identifying adverse events associated
with medical products, but the operating characteristics of these methods when applied to the real-world data
are unknown.
Methods: We studied the performance of eight analytic methods for estimating the strength of association
(relative risk, RR) and its associated standard error for 53 drug–adverse event outcome pairs, both positive and
negative controls. The methods were applied to a network of ten observational healthcare databases, comprising
over 130 million lives. Performance measures included sensitivity, specificity, and positive predictive value of
methods at RR thresholds achieving statistical significance of p < 0.05 or p < 0.001 and with absolute threshold
RR > 1.5, as well as threshold-free measures such as area under the receiver operating characteristic curve (AUC).
Results: Although no specific method demonstrated superior performance, the aggregate results provide
a benchmark and baseline expectation for risk identification method performance. At traditional levels of
statistical significance (RR > 1, p < 0.05), all methods have a false positive rate > 18%, with positive predictive
value < 38%. The best predictive model, high-dimensional propensity score, achieved an AUC = 0.77. At 50%
sensitivity, false positive rate ranged from 16% to 30%. At 10% false positive rate, sensitivity of the methods
ranged from 9% to 33%.
Conclusions: Systematic processes for risk identification can provide useful information to supplement an
overall safety assessment, but assessment of methods performance suggests a substantial chance of identifying
false positive associations. Copyright © 2012 John Wiley & Sons, Ltd.
Keywords: product surveillance, postmarketing; pharmacoepidemiology; epidemiologic methods; causality;
electronic health records; adverse drug reactions
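For orientation, here is a minimal sketch of how operating characteristics of the kind quoted above (AUC, sensitivity, false positive rate, positive predictive value) are computed once each drug-outcome control pair has an estimated RR and a ground-truth label; the RR values and labels below are invented, not OMOP results.

```python
# Minimal sketch of summarizing a risk-identification method's operating
# characteristics against positive and negative control pairs.
import numpy as np
from sklearn.metrics import roc_auc_score

# 1 = positive control (true adverse association), 0 = negative control
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
# hypothetical estimated relative risks for the same pairs
rr_hat = np.array([2.4, 1.8, 1.3, 3.1, 1.1, 0.9, 1.6, 1.0, 1.2, 0.8, 1.4, 1.05])

auc = roc_auc_score(labels, rr_hat)

flag = rr_hat > 1.5                    # declare a signal when estimated RR > 1.5
tp = np.sum(flag & (labels == 1)); fp = np.sum(flag & (labels == 0))
fn = np.sum(~flag & (labels == 1)); tn = np.sum(~flag & (labels == 0))
print(f"AUC = {auc:.2f}")
print(f"sensitivity = {tp/(tp+fn):.2f}, false positive rate = {fp/(fp+tn):.2f}, "
      f"PPV = {tp/(tp+fp):.2f}")
```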
1. Introduction
The U.S. Food and Drug Administration Amendments Act of 2007 required the establishment of an
‘active postmarket risk identification and analysis system’ with access to patient-level observational
data from 100 million lives by 2012 [1]. In this context, we define ‘risk identification’ as a systematic
aJohnson & Johnson Pharmaceutical Research and Development LLC, Titusville, NJ, U.S.A.
bObservational Medical Outcomes Partnership, Foundation for the National Institutes of Health, Bethesda, MD, U.S.A.
cUNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A.
dDepartment of Statistics, Columbia University, New York, NY, U.S.A.
eRegenstrief Institute and Indiana University School of Medicine, Indianapolis, IN, U.S.A.
fCenter for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, U.S.A.
gCollege of Pharmacy, University of Florida, Gainesville, FL, U.S.A.
*Correspondence to: Patrick B. Ryan, Johnson & Johnson, 1125 Trenton-Harbourton Road, PO Box 200, MS K304, Titusville,
NJ 08560, U.S.A.
†E-mail: ryan@omop.org
‡This article expresses the views of the authors and does not necessarily represent those of their affiliated organizations.
§At the time of this work, Dr. Hartzema was on sabbatical at the U.S. Food and Drug Administration.
and reproducible process to efficiently generate evidence to support the characterization of the potential
effects of medical products. This system applied to a network of observational healthcare databases
would provide another source of evidence to complement existing safety information contributed by
preclinical data, clinical trials, spontaneous adverse event reports, registries, and pharmacoepidemiology
evaluation studies. When used in conjunction with evidence of the benefits of the product and alternative
treatments, a more comprehensive understanding of the effects of medical products promises to inform
medical decision making. The practicing clinician has a critical role in both the generation of quality
data that can be used for these efforts and integration of the findings from safety assessments into routine
practice; both of which become increasingly important in the evolution of the electronic health record
and the creation of a ‘learning healthcare system’ [2].
The secondary use of observational healthcare databases (e.g., administrative claims and electronic
health records) has become the predominant resource in pharmacoepidemiology, health outcomes,
and health services research because it reflects ‘real-world’ experience. Unlike well-designed and
well-performed randomized clinical trials, the use of observational data requires special consideration
of potential biases that can distort the measurement of the true effect size. Researchers can choose
from a variety of analytic methods that attempt to control for these biases; however, the operating
characteristics of these methods and their potential utility within a risk identification system have not
been systematically studied.
The Observational Medical Outcomes Partnership (OMOP; http://omop.fnih.org) conducts
methodological research to support the development of a national risk identification and analysis system;
the details of which have been previously published [3]. The OMOP research plan consists of a series
of empirical assessments of the performance characteristics of a number of analysis methods conducted
across a network of observational data sources. This paper reports findings from a series of assessments
of risk identification methods to determine their ability to correctly identify ‘true’ drug–adverse event
outcome associations and drug–adverse outcome negative controls as ‘not associated’.
2. Methods
The OMOP established a network of ten data sources capturing the healthcare experience of 130 million
patients. The data network included administrative claims data (SDI Health, Humana Inc., and four
Thomson Reuters MarketScan® Research Databases reflecting commercial claims with and without
laboratory records, Medicare supplemental, and multistate Medicaid populations) and electronic health
records (Regenstrief Institute, Partners Healthcare System, GE Centricity, and Department of Veterans
Affairs Center for Medication Safety/Outcomes Research). Table I depicts the characteristics and popu-
lation sizes of each data source. The data sources in the OMOP were selected to reflect the diversity of
U.S. observational data [4]. This research program was approved or granted exemption by the Insti-
tutional Review Boards at each participating organization. All of these datasets were transformed
to a common data model, where data about drug exposure and condition occurrence were structured
in a consistent fashion and defined using the same controlled terminologies, to facilitate subsequent
analysis [5].
A total of 13 different analytic methods were implemented during the OMOP experiment. Complete
descriptions, references, and source code for each method are available at http://omop.fnih.org/Methods
Library; of those, eight report estimates of relative risk (RR) and its standard error. In this paper, we
examine these eight methods. Results for the remaining five methods are available upon request.
Each method had multiple parameter settings corresponding to various study design decisions,
including definition of time-at-risk, identification of outcomes based on first occurrence or all
occurrences of diagnosis codes, choice of comparator group, and specific confounding adjustment
strategy. The specific parameters for each method and the number of parameter combinations studied
for each method are shown in Table II.
The performance of the analytical methods was assessed on the basis of their ability to correctly
identify nine drug–outcome pairs that were classified as ‘positive controls’ and 44 drug–outcome pairs
classified as ‘negative controls’. Positive controls were true associations as determined by the listing
of the corresponding outcome as an adverse event in the drug product label along with prior published
observational database research suggesting an association; subsequently, these positive controls were
endorsed by expert panel consensus; negative controls lacked such evidence in their labeling and
published literature and were ruled out as having a positive association by the expert panel. Members of
the OMOP’s advisory boards and other participants [3] and literature references for the test cases [6]
Evidence based periodontology
 
Publication bias in service delivery research - Yen-Fu Chen
Publication bias in service delivery research - Yen-Fu ChenPublication bias in service delivery research - Yen-Fu Chen
Publication bias in service delivery research - Yen-Fu Chen
 

Similar a Observational Studies in a Learning Health System

systematic review and metaanalysis
systematic review and metaanalysis systematic review and metaanalysis
systematic review and metaanalysis DrSridevi NH
 
What Outcomes Matter in Cancer? A Literature Review
What Outcomes Matter in Cancer? A Literature ReviewWhat Outcomes Matter in Cancer? A Literature Review
What Outcomes Matter in Cancer? A Literature ReviewOffice of Health Economics
 
Newhouse arkansas 4-7-14(v2)
Newhouse arkansas 4-7-14(v2)Newhouse arkansas 4-7-14(v2)
Newhouse arkansas 4-7-14(v2)TRIuams
 
Research Gaps and Evidences in Perioperative nursing
Research Gaps and Evidences in Perioperative nursingResearch Gaps and Evidences in Perioperative nursing
Research Gaps and Evidences in Perioperative nursingRyan Michael Oducado
 
Knowledge transfer, and evidence informed health policy-minster's meeting
Knowledge transfer, and evidence informed health policy-minster's meetingKnowledge transfer, and evidence informed health policy-minster's meeting
Knowledge transfer, and evidence informed health policy-minster's meetingDr Ghaiath Hussein
 
EBP & Health Sciences Librarianship
EBP & Health Sciences LibrarianshipEBP & Health Sciences Librarianship
EBP & Health Sciences LibrarianshipLorie Kloda
 
Clinical trail-designchallenges-in-the-study-design-conduct-and-analysis-of-r...
Clinical trail-designchallenges-in-the-study-design-conduct-and-analysis-of-r...Clinical trail-designchallenges-in-the-study-design-conduct-and-analysis-of-r...
Clinical trail-designchallenges-in-the-study-design-conduct-and-analysis-of-r...PEPGRA Healthcare
 
Mona Nasser: Research waste when designing new research (role of funders)
Mona Nasser: Research waste when designing new research (role of funders)Mona Nasser: Research waste when designing new research (role of funders)
Mona Nasser: Research waste when designing new research (role of funders)Caroline Blaine
 
EVIDENCE BASED.ppt
EVIDENCE BASED.pptEVIDENCE BASED.ppt
EVIDENCE BASED.pptmalti19
 
Eblm pres final
Eblm pres finalEblm pres final
Eblm pres finalprasath172
 
How to plan a research study.pdf
How to plan a research study.pdfHow to plan a research study.pdf
How to plan a research study.pdfAbduElhabshy
 
240220-Critical Appraisal xxxxxxxxx.pptx
240220-Critical Appraisal xxxxxxxxx.pptx240220-Critical Appraisal xxxxxxxxx.pptx
240220-Critical Appraisal xxxxxxxxx.pptxMyThaoAiDoan
 
SLC CME- Evidence based medicine 07/27/2007
SLC CME- Evidence based medicine 07/27/2007SLC CME- Evidence based medicine 07/27/2007
SLC CME- Evidence based medicine 07/27/2007cddirks
 

Similar a Observational Studies in a Learning Health System (20)

systematic review and metaanalysis
systematic review and metaanalysis systematic review and metaanalysis
systematic review and metaanalysis
 
What Outcomes Matter in Cancer? A Literature Review
What Outcomes Matter in Cancer? A Literature ReviewWhat Outcomes Matter in Cancer? A Literature Review
What Outcomes Matter in Cancer? A Literature Review
 
Initial Medical Policy and Model Coverage Guidelines
Initial Medical Policy and Model Coverage GuidelinesInitial Medical Policy and Model Coverage Guidelines
Initial Medical Policy and Model Coverage Guidelines
 
Evidences
EvidencesEvidences
Evidences
 
Newhouse arkansas 4-7-14(v2)
Newhouse arkansas 4-7-14(v2)Newhouse arkansas 4-7-14(v2)
Newhouse arkansas 4-7-14(v2)
 
PCORI Methodology Committee Report
PCORI Methodology Committee ReportPCORI Methodology Committee Report
PCORI Methodology Committee Report
 
Research Gaps and Evidences in Perioperative nursing
Research Gaps and Evidences in Perioperative nursingResearch Gaps and Evidences in Perioperative nursing
Research Gaps and Evidences in Perioperative nursing
 
Knowledge transfer, and evidence informed health policy-minster's meeting
Knowledge transfer, and evidence informed health policy-minster's meetingKnowledge transfer, and evidence informed health policy-minster's meeting
Knowledge transfer, and evidence informed health policy-minster's meeting
 
Panel: PCORI- Claire Snyder
Panel: PCORI- Claire SnyderPanel: PCORI- Claire Snyder
Panel: PCORI- Claire Snyder
 
PCORI Methodology Committee Report
PCORI Methodology Committee ReportPCORI Methodology Committee Report
PCORI Methodology Committee Report
 
EBP & Health Sciences Librarianship
EBP & Health Sciences LibrarianshipEBP & Health Sciences Librarianship
EBP & Health Sciences Librarianship
 
Clinical trail-designchallenges-in-the-study-design-conduct-and-analysis-of-r...
Clinical trail-designchallenges-in-the-study-design-conduct-and-analysis-of-r...Clinical trail-designchallenges-in-the-study-design-conduct-and-analysis-of-r...
Clinical trail-designchallenges-in-the-study-design-conduct-and-analysis-of-r...
 
Setting Standards for Research Methods
Setting Standards for Research Methods Setting Standards for Research Methods
Setting Standards for Research Methods
 
Mona Nasser: Research waste when designing new research (role of funders)
Mona Nasser: Research waste when designing new research (role of funders)Mona Nasser: Research waste when designing new research (role of funders)
Mona Nasser: Research waste when designing new research (role of funders)
 
EVIDENCE BASED.ppt
EVIDENCE BASED.pptEVIDENCE BASED.ppt
EVIDENCE BASED.ppt
 
Eblm pres final
Eblm pres finalEblm pres final
Eblm pres final
 
How to plan a research study.pdf
How to plan a research study.pdfHow to plan a research study.pdf
How to plan a research study.pdf
 
240220-Critical Appraisal xxxxxxxxx.pptx
240220-Critical Appraisal xxxxxxxxx.pptx240220-Critical Appraisal xxxxxxxxx.pptx
240220-Critical Appraisal xxxxxxxxx.pptx
 
PCORI Methodology Workshop for Prioritizing Specific Research Topics
PCORI Methodology Workshop for Prioritizing Specific Research TopicsPCORI Methodology Workshop for Prioritizing Specific Research Topics
PCORI Methodology Workshop for Prioritizing Specific Research Topics
 
SLC CME- Evidence based medicine 07/27/2007
SLC CME- Evidence based medicine 07/27/2007SLC CME- Evidence based medicine 07/27/2007
SLC CME- Evidence based medicine 07/27/2007
 

Más de Patient-Centered Outcomes Research Institute

Más de Patient-Centered Outcomes Research Institute (20)

New Patient-Centered Study on Preventing Fall-Related Injuries in Older Adults
New Patient-Centered Study on Preventing Fall-Related Injuries in Older AdultsNew Patient-Centered Study on Preventing Fall-Related Injuries in Older Adults
New Patient-Centered Study on Preventing Fall-Related Injuries in Older Adults
 
From Research to Practice: New Models for Data-sharing and Collaboration to I...
From Research to Practice: New Models for Data-sharing and Collaboration to I...From Research to Practice: New Models for Data-sharing and Collaboration to I...
From Research to Practice: New Models for Data-sharing and Collaboration to I...
 
Advisory Panel on Improving Healthcare Systems Spring 2014 Meeting
Advisory Panel on Improving Healthcare Systems Spring 2014 MeetingAdvisory Panel on Improving Healthcare Systems Spring 2014 Meeting
Advisory Panel on Improving Healthcare Systems Spring 2014 Meeting
 
Advisory Panel on Clinical Trials Spring 2014 Meeting
Advisory Panel on Clinical Trials Spring 2014 MeetingAdvisory Panel on Clinical Trials Spring 2014 Meeting
Advisory Panel on Clinical Trials Spring 2014 Meeting
 
Advisory Panel on Advisory Panel on Assessment of Prevention, Diagnosis, and ...
Advisory Panel on Advisory Panel on Assessment of Prevention, Diagnosis, and ...Advisory Panel on Advisory Panel on Assessment of Prevention, Diagnosis, and ...
Advisory Panel on Advisory Panel on Assessment of Prevention, Diagnosis, and ...
 
Advisory Panel on Patient Engagement Spring 2014 Meeting: Day 1
Advisory Panel on Patient Engagement Spring 2014 Meeting: Day 1Advisory Panel on Patient Engagement Spring 2014 Meeting: Day 1
Advisory Panel on Patient Engagement Spring 2014 Meeting: Day 1
 
Advisory Panel on Patient Engagement Spring 2014 Meeting: Day 2
Advisory Panel on Patient Engagement Spring 2014 Meeting: Day 2Advisory Panel on Patient Engagement Spring 2014 Meeting: Day 2
Advisory Panel on Patient Engagement Spring 2014 Meeting: Day 2
 
Advisory Panel on Addressing Disparities Spring 2014 Meeting
Advisory Panel on Addressing Disparities Spring 2014 MeetingAdvisory Panel on Addressing Disparities Spring 2014 Meeting
Advisory Panel on Addressing Disparities Spring 2014 Meeting
 
Combined Meeting of the Spring 2014 Advisory Panels on Patient Engagement and...
Combined Meeting of the Spring 2014 Advisory Panels on Patient Engagement and...Combined Meeting of the Spring 2014 Advisory Panels on Patient Engagement and...
Combined Meeting of the Spring 2014 Advisory Panels on Patient Engagement and...
 
Advisory Panel on Rare Disease Spring 2014 Meeting
Advisory Panel on Rare Disease Spring 2014 MeetingAdvisory Panel on Rare Disease Spring 2014 Meeting
Advisory Panel on Rare Disease Spring 2014 Meeting
 
PCORnet: Building Evidence through Innovation and Collaboration
PCORnet: Building Evidence through Innovation and CollaborationPCORnet: Building Evidence through Innovation and Collaboration
PCORnet: Building Evidence through Innovation and Collaboration
 
PCORnet: Building Evidence through Innovation and Collaboration
PCORnet: Building Evidence through Innovation and CollaborationPCORnet: Building Evidence through Innovation and Collaboration
PCORnet: Building Evidence through Innovation and Collaboration
 
Patient-Powered Research Network Workshop
Patient-Powered Research Network WorkshopPatient-Powered Research Network Workshop
Patient-Powered Research Network Workshop
 
Patient-Powered Research Network Workshop
Patient-Powered Research Network WorkshopPatient-Powered Research Network Workshop
Patient-Powered Research Network Workshop
 
Seeking Input on Future PROMIS® Research: Educating Patients and Stakeholders...
Seeking Input on Future PROMIS® Research: Educating Patients and Stakeholders...Seeking Input on Future PROMIS® Research: Educating Patients and Stakeholders...
Seeking Input on Future PROMIS® Research: Educating Patients and Stakeholders...
 
Launching the Eugene Washington PCORI Engagement Awards Program
Launching the Eugene Washington PCORI Engagement Awards ProgramLaunching the Eugene Washington PCORI Engagement Awards Program
Launching the Eugene Washington PCORI Engagement Awards Program
 
Promising Practices of Meaningful Engagement in the Conduct of Research
Promising Practices of Meaningful Engagement in the Conduct of ResearchPromising Practices of Meaningful Engagement in the Conduct of Research
Promising Practices of Meaningful Engagement in the Conduct of Research
 
PCORI Merit Review: Learning from Patients, Scientists and other Stakeholders
PCORI Merit Review: Learning from Patients, Scientists and other StakeholdersPCORI Merit Review: Learning from Patients, Scientists and other Stakeholders
PCORI Merit Review: Learning from Patients, Scientists and other Stakeholders
 
Opening a Pipeline to Patient-Centered Research Proposals
Opening a Pipeline to Patient-Centered Research ProposalsOpening a Pipeline to Patient-Centered Research Proposals
Opening a Pipeline to Patient-Centered Research Proposals
 
Special Board of Governors Teleconference/Webinar
Special Board of Governors Teleconference/WebinarSpecial Board of Governors Teleconference/Webinar
Special Board of Governors Teleconference/Webinar
 

Último

Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbaisonalikaur4
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformKweku Zurek
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalorenarwatsonia7
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Serviceparulsinha
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...narwatsonia7
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photosnarwatsonia7
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...narwatsonia7
 
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...saminamagar
 
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original PhotosBook Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photosnarwatsonia7
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Modelssonalikaur4
 
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...narwatsonia7
 
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...narwatsonia7
 
Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024Gabriel Guevara MD
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...Miss joya
 
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceCollege Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceNehru place Escorts
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingNehru place Escorts
 
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment BookingHousewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Bookingnarwatsonia7
 

Último (20)

Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
 
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Servicesauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy Platform
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
 
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
 
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original PhotosBook Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
 
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
 
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
 
Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
 
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceCollege Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment BookingHousewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
 

Observational Studies in a Learning Health System

  • 7. OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM  An Institute of Medicine Workshop Sponsored by the Patient-Centered Outcomes Research Institute  A LEARNING HEALTH SYSTEM ACTIVITY IOM ROUNDTABLE ON VALUE & SCIENCE-DRIVEN HEALTH CARE APRIL 25-26, 2013 THE NATIONAL ACADEMY OF SCIENCES 2101 CONSTITUTION AVENUE, NW WASHINGTON, DC Day 1: Thursday, April 25th 8:00 am Coffee and light breakfast available 8:30 am Welcome, introductions and overview Welcome, framing of the meeting and agenda overview Welcome from the IOM Michael McGinnis, Institute of Medicine Opening remarks and meeting overview Joe Selby, Patient-Centered Outcomes Research Institute Ralph Horwitz, GlaxoSmithKline Meeting objectives 1. Explore the role of observational studies (OS) in the generation of evidence to guide clinical and health policy decisions, with a focus on individual patient care, in a learning health system; 2. Consider concepts of OS design and analysis, emerging statistical methods, use of OS’s to supplement evidence from experimental methods, identifying treatment heterogeneity, and providing effectiveness estimates tailored for individual patients; 3. Engage colleagues from disciplines typically underrepresented in clinical evidence discussions; 4. Identify strategies for accelerating progress in the appropriate use of OS for evidence generation.
  • 8. 2 9:00 am Workshop stage-setting  Session format o Workshop overview and stage-setting Steve Goodman, Stanford University Q&A and open discussion  Session questions: o How do OS contribute to building valid evidence to support effective decision making by patients and clinicians? When are their findings useful, when are they not? o What are the major challenges (study design, methodological, data collection/management/analysis, cultural etc.) facing the field in the use of OS data for decision making? Please include consideration of the following issues: bias, methodological standards, publishing requirements. o What can workshop participants expect from the following sessions? 9:45 am Engaging the issue of bias Moderator: Michael Lauer, National Heart Lung and Blood Institute  Session format o Introduction to issue Sebastian Schneeweiss, Harvard University o Presentations:  Instrumental variables and their sensitivity to unobserved biases Dylan Small, University of Pennsylvania  An empirical approach to measuring and calibrating for error in observational analyses Patrick Ryan, Johnson & Johnson o Respondents and panel discussion:  John Wong, Tufts University  Joel Greenhouse, Carnegie Mellon University Q&A and open discussion  Session questions: o What are the major bias-related concerns with the use of observational study methods? What are the sources of bias? o How many of these concerns relate to methods and how much to the quality and availability of suitable data? What barriers have these concerns created for the use of the results of observational studies to drive decision-making?
  • 9. 3 o What are the most promising approaches to reduction of bias through the use of statistical methods? Through study design (e.g., dealing with issues of multiplicity)? o What are the circumstances under which administrative (claims) data can be used to assess treatment benefits? What data are needed from EHRs to strengthen the value of administrative data? o What methods are best to adjust for the changes in treatment and clinical conditions among patients followed longitudinally? o What are the implications of these promising approaches for the use of observational study methods moving forward? 11:30 am Lunch Participants will be asked to identify, at their lunch tables, what they think are the most critical questions for PCOR in the topics covered by the workshop. These topics will then be circulated to the moderators of the subsequent sessions. 12:30 pm Generalizing RCT results to broader populations Moderator: Harold Sox, Dartmouth University - Session format o Introduction to issue Robert Califf, Duke o Presentations: - Generalizing the right question Miguel Hernan, Harvard University - Using observational studies to determine RCT generalizability Eloise Kaizar, Ohio State o Respondents and panel discussion: - William Weintraub, Christiana Medical Center - Constantine Frangakis, Johns Hopkins University Q&A and open discussion - Session questions: o What are the most cogent methodological and clinical considerations in using observational study methods to test the external validity of findings from RCTs? o How do data collection, management, and analysis approaches impact generalizability? o What are the generalizability questions of greatest interest? Or, where does the greatest doubt arise? (Age, concomitant illness, concomitant treatment) What examples represent well-established differences? o What statistical methods are needed to generalize RCT results?
  • 10. 4 o Are the standards for causal inference from OS different when prior RCTs have been performed? How does statistical methodology vary in this case? o What are the implications when treatment results for patients not included in the RCT differ from the overall results reported in the original RCT? o What makes an observed difference in outcome credible? Finding the RCT-shown effect on the narrower population? Replication in >1 environment? Confidence interval of the result? Size of the effect in the RCT? o Can subset analyses in the RCT, even if underpowered, be used to support or rebut the OS finding? 2:15 pm Break 2:30 pm Detecting treatment-effect heterogeneity Moderator: Richard Platt, Harvard Pilgrim Health Care Institute  Session format o Introduction to issue David Kent, Tufts University o Presentations:  Comparative effectiveness of coronary artery bypass grafting and percutaneous coronary intervention Mark Hlatky, Stanford University  Identification of effect heterogeneity using instrumental variables Anirban Basu, University of Washington o Respondents and panel discussion:  Mary Charlson, Cornell University  Mark Cullen, Stanford University Q&A and open discussion  Session questions: o What is the potential for OS in assessing treatment response heterogeneity and individual patient decision-making? o What clinical and other data can be collected routinely to affect this potential? o How can longitudinal information on change in treatment categories and clinical condition be used to assess variation in treatment response and individual patient decision-making?  What are the statistical methods for time varying changes in treatment (including co-therapies) and clinical condition
  • 11. 5 o What are the best methods to form distinctive patient subgroups in which to examine for heterogeneity of treatment response?  What data elements are necessary to define these distinctive patient subgroups? o What are the best methods to assess heterogeneity in multi-dimensional outcomes? o How could further implementation of best practices in data collection, management, and analysis impact treatment response heterogeneity? o What is needed in order for information about treatment response heterogeneity to be validated and used in practice? 4:15 pm Summary and preview of next day 4:45 pm Reception 5:45 pm Adjourn ********************************************* Day 2: Friday, April 26th 8:00 am Coffee and light breakfast available 8:30 am Welcome, brief agenda overview, summary of previous day Welcome, framing of the meeting and agenda overview 9:00 am Predicting individual responses Moderator: Ralph Horwitz, GSK  Session format o Introduction to issue Burton Singer, University of Florida o Presentations:  Data-driven prediction models Nicholas Tatonetti, Columbia University  Individual prediction Michael Kattan, Cleveland Clinic o Respondents and panel discussion:  Peter Bach, Sloan Kettering  Mitchell Gail, National Cancer Institute
  • 12. 6 Q&A and open discussion - Session questions: o How can patient-level observational data be used to create predictive models of treatment response in individual patients? What statistical methodologies are needed? o How can predictive analytic methods be used to study the interactions of treatment with multiple patient characteristics? o How should the clinical history (longitudinal information) for a given patient be utilized in the creation of prediction rules for responses of that patient to one or more candidate treatment regimens? o What are effective methodologies for producing prediction rules to guide the management of an individual patient based on their comparability to results of RCTs, OS, and archived patient records? o How can we blend predictive models, which predict the impact of treatment choices, with causal models, which compare predictions under different treatments? 10:45 am Break 11:00 am Conclusions and strategies going forward Panel members will be charged with highlighting very specific next steps laid out in the course of workshop presentations and discussions and/or suggesting some of their own. - Panel: o Rob Califf, Duke University o Cynthia Mulrow, University of Texas o Jean Slutsky, Agency for Healthcare Research and Quality o Steve Goodman, Stanford University - Session questions: o What are the major themes and conclusions from the workshop's presentations and discussions? o How can these themes be translated into actionable strategies with designated stakeholders? o What are the critical next steps in terms of advancing analytic methods? o What are the critical next steps in developing databases that will generate evidence to guide clinical decision making? o What are the critical next steps in disseminating information on new methods to increase their appropriate use?
  • 13. 7 12:15 pm Summary and next steps Comments from the Chairs Joe Selby, Patient-Centered Outcomes Research Institute Ralph Horwitz, GlaxoSmithKline Comments and thanks from the IOM Michael McGinnis, Institute of Medicine 12:45 pm Adjourn ******************************************* Planning Committee Co–Chairs Ralph Horwitz, GlaxoSmithKline Joe Selby, Patient-Centered Outcomes Research Institute Members Anirban Basu, University of Washington Troy Brennan, CVS/Caremark Louis Jacques, Centers for Medicare & Medicaid Services Steve Goodman, Stanford University Jerry Kassirer, Tufts University Michael Lauer, National Heart, Lung, and Blood Institute David Madigan, Columbia University Sharon-Lise Normand, Harvard University Richard Platt, Harvard Pilgrim Health Care Institute Robert Temple, Food and Drug Administration Burton Singer, University of Florida Jean Slutsky, Agency for Healthcare Research and Quality Staff officer: Claudia Grossmann, cgrossmann@nas.edu, 202.334.3867
  • 15. OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM Workshop Planning Committee Co–Chairs Ralph I. Horwitz, MD Senior Vice President, Clinical Sciences Evaluation GlaxoSmithKline Joe V. Selby, MD, MPH Executive Director PCORI Members Anirban Basu, MS, PhD Associate Professor and Director Health Economics and Outcomes Methodology University of Washington Troyen A. Brennan, MD, JD, MPH Executive Vice President and Chief Medical Officer CVS Caremark Steven N. Goodman, MD, PhD Associate Dean for Clinical & Translational Research Stanford University School of Medicine Louis B. Jacques, MD Director Coverage and Analysis Group Centers for Medicare & Medicaid Services Jerome P. Kassirer, MD Distinguished Professor Tufts University School of Medicine Michael S. Lauer, MD, FACC, FAHA Director Division of Prevention and Population Sciences National Heart, Lung, and Blood Institute David Madigan, PhD Chair of Statistics Columbia University Sharon-Lise T. Normand, PhD, MSc Professor Department of Biostatistics and Health Care Policy Harvard Medical School Richard Platt, MD, MS Chair, Ambulatory Care and Prevention Chair, Population Medicine Harvard University Burton H. Singer, PhD, MS Professor Emerging Pathogens Institute University of Florida Jean Slutsky, PA, MS Director Center for Outcomes and Evidence Agency for Healthcare Research and Quality Robert Temple, MD Deputy Director for Clinical Science Centers for Drug Evaluation and Research Food and Drug Administration
  • 17. Current as of 12pm, April 24 OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM April 25-26, 2013 Workshop Participants Jill Abell, PhD, MPH Senior Director, Clinical Effectiveness and Safety GlaxoSmithKline Joseph Alper Writer and Technology Analyst LSN Consulting Naomi Aronson Executive Director Blue Cross, Blue Shield Peter Bach, MD, MAPP Attending Physician Department of Epidemiology & Biostatistics Memorial Sloan-Kettering Cancer Center Anirban Basu, MS, PhD Associate Professor and Director Program in Health Economics and Outcomes Methodology University of Washington Lawrence Becker Director, Benefits Xerox Corporation Marc L. Berger, MD Vice President, Real World Data and Analytics Pfizer Inc. Robert M. Califf, MD Vice Chancellor for Clinical Research Duke University Medical Center Mary E. Charlson, MD Chief, Clinical Epidemiology and Evaluative Sciences Research Weill Cornell Medical College Jennifer B. Christian, PharmD, MPH, PhD Senior Director, Clinical Effectiveness and Safety GlaxoSmithKline Michael L. Cohen, PhD Senior Program Officer Committee on National Statistics Mark R. Cullen, MD Professor of Medicine Stanford School of Medicine Steven R. Cummings, MD, FACP Professor Emeritus, Department of Medicine University of California, San Francisco Robert W. Dubois, MD, PhD Chief Science Officer National Pharmaceutical Council Rachael L. Fleurence, PhD Acting Director, Accelerating PCOR Methods Program PCORI Dean Follmann, PhD Branch Chief-Associate Director for Biostatistics National Institutes of Health Constantine Frangakis, PhD Professor, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Mitchell H. Gail, MD, PhD Senior Investigator National Cancer Institute Kathleen R. Gans-Brangs, PhD Senior Director, Medical Affairs AstraZeneca
  • 18. Current as of 12pm, April 24 Steven N. Goodman, MD, PhD Associate Dean for Clinical and Translational Research Stanford University School of Medicine Sheldon Greenfield, MD Executive Co-Director, Health Policy Research Institute University of California, Irvine Joel B. Greenhouse, PhD Professor of Statistics Carnegie Mellon University Sean Hennessy, PharmD, PhD Associate Professor of Epidemiology University of Pennsylvania Miguel Hernan, MD, DrPH, ScM, MPH Professor of Epidemiology Harvard University Mark A. Hlatky, MD Professor of Health Research & Policy, Professor of Medicine Stanford University Ralph I. Horwitz, M.D. Senior Vice President, Clinical Science Evaluation GlaxoSmithKline Gail Hunt President and CEO National Alliance for Caregiving Robert Jesse, MD, PhD Principal Deputy Under Secretary for Health Department of Veterans Affairs Eloise E. Kaizar, PhD Associate Professor Department of Statistics The Ohio State University Jerome P. Kassirer, MD Distinguished Professor Tufts University School of Medicine Michael Kattan, PhD Quantitative Health Sciences Department Chair Cleveland Clinic David M. Kent, MD, MSc Director, Clinical and Translational Science Program Tufts University Sackler School of Graduate Biomedical Sciences Michael S. Lauer, MD, FACC, FAHA Director, Division of Prevention and Population Sciences National Heart, Lung, and Blood Institute J. Michael McGinnis, MD, MPP, MA Senior Scholar Institute of Medicine David O. Meltzer, PhD Associate Professor University of Chicago Nancy E. Miller, PhD Senior Science Policy Analyst Office of Science Policy National Institutes of Health Sally Morton, PhD Professor and Chair, Department of Biostatistics Graduate School of Public Health University of Pittsburgh Cynthia D. Mulrow, MD, MSc Senior Deputy Editor Annals of Internal Medicine Robin Newhouse Chair and Professor University of Maryland School of Nursing Perry D. Nisen, MD, PhD SVP, Science and Innovation GlaxoSmithKline Michael Pencina, PhD Associate Professor Boston University
  • 19. Current as of 12pm, April 24 Richard Platt, MD, MS Chair, Ambulatory Care and Prevention Chair, Population Medicine Harvard University James Robins, MD Mitchell L. and Robin LaFoley Dong Professor of Epidemiology Harvard University Patrick Ryan, PhD Head of Epidemiology Analytics Janssen Research and Development Nancy Santanello, MD, MS Vice President, Epidemiology Merck Richard L. Schilsky, MD, FASCO Chief Medical Officer American Society of Clinical Oncology Sebastian Schneeweiss, MD Associate Professor, Epidemiology Division of Pharmacoepidemiology and Pharmacoeconomics Brigham and Women's Hospital Michelle K. Schwalbe, PhD Program Officer Board on Mathematical Sciences and Their Applications National Research Council Jodi Segal, MD, MPH Director, Pharmacoepidemiology Program The Johns Hopkins Medical Institutions Joe V. Selby, MD, MPH Executive Director PCORI Burton H. Singer, PhD, MS Professor, Emerging Pathogens Institute University of Florida Jean Slutsky, PA, MS Director, Center for Outcomes and Evidence Agency for Healthcare Research and Quality Dylan Small, PhD Associate Professor of Statistics University of Pennsylvania Harold C. Sox, MD Professor of Medicine (emeritus, active) The Dartmouth Institute for Health Policy and Clinical Practice Dartmouth Geisel School of Medicine Elizabeth A. Stuart Associate Professor, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Nicholas Tatonetti, PhD Assistant Professor of Biomedical Informatics Columbia University Robert Temple, MD Deputy Center Director for Clinical Science Food and Drug Administration Scott T. Weidman, PhD Director, Board on Mathematical Sciences and their Applications National Research Council William S. Weintraub, MD, FACC John H. Ammon Chair of Cardiology Christiana Care Health Services Harlan Weisman Managing Director And-One Consulting, LLC Ashley E. Wivel, MD, MSc Senior Director in Clinical Effectiveness and Safety GlaxoSmithKline John B. Wong, MD Professor of Medicine Tufts University Sackler School of Graduate Biomedical Sciences
  • 20. Current as of 12pm, April 24 IOM Staff Claudia Grossmann, PhD Senior Program Officer Diedtra Henderson Program Officer Elizabeth Johnston Program Assistant Valerie Rohrbach Senior Program Assistant Julia Sanders Senior Program Assistant Robert Saunders, PhD Senior Program Officer Barret Zimmermann Program Assistant
  • 23. ARTICLE, Clinical Trials 2012; 9: 48–55
Beyond the intention-to-treat in comparative effectiveness research
Miguel A. Hernán (a,b) and Sonia Hernández-Díaz (a)
(a) Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; (b) Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA. Author for correspondence: Miguel Hernán, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA. E-mail: miguel_hernan@post.harvard.edu. © The Author(s), 2011. Reprints and permissions: http://www.sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/1740774511420743
Background: The intention-to-treat comparison is the primary, if not the only, analytic approach of many randomized clinical trials.
Purpose: To review the shortcomings of intention-to-treat analyses, and of 'as treated' and 'per protocol' analyses as commonly implemented, with an emphasis on problems that are especially relevant for comparative effectiveness research.
Methods and Results: In placebo-controlled randomized clinical trials, intention-to-treat analyses underestimate the treatment effect and are therefore nonconservative for both safety trials and noninferiority trials. In randomized clinical trials with an active comparator, intention-to-treat estimates can overestimate a treatment's effect in the presence of differential adherence. In either case, there is no guarantee that an intention-to-treat analysis estimates the clinical effectiveness of treatment. Inverse probability weighting, g-estimation, and instrumental variable estimation can reduce the bias introduced by nonadherence and loss to follow-up in 'as treated' and 'per protocol' analyses.
Limitations: These analyses require untestable assumptions, a dose-response model, and time-varying data on confounders and adherence.
Conclusions: We recommend that all randomized clinical trials with substantial lack of adherence or loss to follow-up be analyzed using different methods. These include an intention-to-treat analysis to estimate the effect of assigned treatment and 'as treated' and 'per protocol' analyses to estimate the effect of treatment after appropriate adjustment via inverse probability weighting or g-estimation.
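The abstract recommends inverse probability weighting as one way to adjust 'as treated' and 'per protocol' analyses for nonadherence. The following is a minimal sketch of that idea, written in Python only for illustration, and restricted to the simplest possible case of a single (point) treatment decision rather than the time-varying setting the authors describe. The data frame and its column names (z, a, y, x1, x2) are hypothetical, and the assumption that x1 and x2 capture the reasons for nonadherence is untestable, exactly as the Limitations paragraph warns.

```python
# Minimal sketch of an inverse-probability-weighted "per protocol" contrast for a
# point-treatment trial with nonadherence (illustration only; the paper's setting
# also involves time-varying adherence and confounders).
# Hypothetical columns: z = assigned arm (0/1), a = treatment received (0/1),
# y = binary outcome, x1/x2 = baseline covariates assumed to explain adherence.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_per_protocol_risk_difference(df, covariates=("x1", "x2")):
    """Risk difference among adherers, re-weighted so the adherent subset of each
    arm again resembles the full arm it was randomized to."""
    arm_risk = {}
    for arm in (0, 1):
        arm_df = df[df["z"] == arm]
        adhered = (arm_df["a"] == arm).astype(int)
        # Model the probability of adhering to the assigned arm given covariates.
        fit = LogisticRegression(max_iter=1000).fit(arm_df[list(covariates)], adhered)
        p_adhere = fit.predict_proba(arm_df[list(covariates)])[:, 1]
        kept = adhered.to_numpy() == 1
        weights = 1.0 / p_adhere[kept]
        # Weighted outcome risk among adherers in this arm.
        arm_risk[arm] = np.average(arm_df["y"].to_numpy()[kept], weights=weights)
    return arm_risk[1] - arm_risk[0]
```

If every determinant of adherence were captured by the covariates, the weighted contrast would estimate the effect of treatment rather than the effect of assignment; if not, it inherits the usual confounding of an unadjusted 'as treated' analysis.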
Introduction
Randomized clinical trials (RCTs) are widely viewed as a key tool for comparative effectiveness research [1], and the intention-to-treat (ITT) comparison has long been regarded as the preferred analytic approach for many RCTs [2]. Indeed, the ITT, or 'as randomized,' analysis has two crucial advantages over other common alternatives – for example, an 'as treated' analysis. First, in double-blind RCTs, an ITT comparison provides a valid statistical test of the hypothesis of null effect of treatment [3,4]. Second, in placebo-controlled trials, an ITT comparison is regarded as conservative because it underestimates the treatment effect when participants do not fully adhere to their assigned treatment. Yet excessive reliance on the ITT approach is problematic, as has been argued by others before us [5].
In this paper, we review the problems of ITT comparisons with an emphasis on those that are especially relevant for comparative effectiveness research. We also review the shortcomings of 'as treated' and 'per protocol' analyses as commonly implemented in RCTs and recommend the routine use of analytic approaches that address some of those shortcomings. Let us start by defining two types of causal effects that can be estimated in RCTs.
The effect of assigned treatment versus the effect of treatment
Consider a double-blind clinical trial in which participants are randomly assigned to either active treatment (Z = 1) or placebo (Z = 0) and are then followed for 5 years or until they die (Y = 1 if they die within 5 years, Y = 0 otherwise). An ITT analysis would compare the 5-year risk of death in those assigned to treatment with the 5-year risk of death in those assigned to placebo. An ITT comparison unbiasedly estimates the average causal effect of treatment assignment Z on the outcome Y. For brevity, we will refer to this as the effect of assigned treatment.
Trial participants may not adhere to, or comply with, the assigned treatment Z. Some of those assigned to placebo may decide to take treatment, and some of those assigned to active treatment may decide not to take it. We use A to refer to the treatment actually received. Thus, regardless of their assigned treatment Z, some subjects will take treatment (A = 1) and others will not take it (A = 0).
The use of ITT comparisons is sometimes criticized when not all trial participants adhere to their assigned treatment Z, that is, when Z is not equal to A for every trial participant. For example, consider two RCTs: in the first trial, half of the participants in the Z = 1 group decide not to take treatment; in the second trial, all participants assigned to Z = 1 decide to take the treatment. An ITT comparison will correctly estimate the effect of assigned treatment Z in both trials, but the effects will be different even if the two trials are otherwise identical. The direction and magnitude of the effect of assigned treatment depend on the adherence pattern.
Now suppose that, in each of the two trials with different adherence, we could estimate the effect that would have been observed if all participants had fully adhered to the value of treatment A (1 or 0) originally assigned to them. We will refer to such an effect as the average causal effect of treatment A on the outcome Y or, for brevity, the effect of treatment. The effect of treatment A appears to be an attractive choice to summarize the findings of RCTs with substantial nonadherence because it will be the same in two trials that differ only in their adherence pattern. However, estimating the magnitude of the effect of treatment A without bias requires assumptions grounded on expert knowledge (see below). No matter how sophisticated the statistical analysis, the estimate of the effect of A will be biased if one makes incorrect assumptions.
The effect of assigned treatment may be misleading
An ITT comparison is simple and therefore very attractive [4]. It bypasses the need for assumptions regarding adherence and dose-response by focusing on estimating the effect of assigned treatment Z rather than the effect of treatment A. However, there is a price to pay for this simplicity, as reviewed in this section. We start by considering placebo-controlled double-blind RCTs.
It is well known that if treatment A has a null effect on the outcome, then both the effect of assigned treatment Z and the effect of treatment A will be null. This is a key advantage of the ITT analysis: it correctly estimates the effect of treatment A under the null, regardless of the adherence pattern.
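The distinction between the two effects can be made concrete with a toy simulation (ours, not the authors'): sicker patients are made less likely to take the assigned active drug, a few placebo patients obtain it on their own, and the drug removes 10 percentage points of risk for everyone. The ITT contrast then estimates the effect of assignment Z, diluted by nonadherence, while a naive 'as treated' comparison is confounded by severity. Every parameter value below is invented for the example.

```python
# Toy simulation: effect of assignment Z (ITT) versus effect of treatment A,
# with adherence driven by an unmeasured severity variable. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
severity = rng.uniform(size=n)                      # unmeasured prognostic factor
z = rng.integers(0, 2, size=n)                      # randomized assignment
takes_if_assigned = rng.uniform(size=n) < (0.95 - 0.5 * severity)   # sicker -> less adherent
takes_if_placebo = rng.uniform(size=n) < 0.05       # small amount of crossover
a = np.where(z == 1, takes_if_assigned, takes_if_placebo).astype(int)

# Outcome risk rises with severity; treatment A removes 10 percentage points.
risk = 0.10 + 0.30 * severity - 0.10 * a
y = (rng.uniform(size=n) < risk).astype(int)

itt = y[z == 1].mean() - y[z == 0].mean()           # effect of assigned treatment Z
as_treated = y[a == 1].mean() - y[a == 0].mean()    # naive comparison, confounded by severity
print(f"ITT (effect of Z):  {itt:+.3f}")
print(f"Naive as-treated:   {as_treated:+.3f}")
print("True effect of A:   -0.100")
```

With these settings roughly 70% of the active arm and 5% of the placebo arm take the drug, so the ITT estimate is about two-thirds of the true 10-point reduction, while the as-treated contrast mixes the treatment effect with the prognostic difference between takers and non-takers.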
It is also well known that if treatment A has a non-null effect (that is, either increases or decreases the risk of the outcome) and some participants do not adhere to their assigned treatment, then the effect of assigned treatment Z will be closer to the null than the actual effect of treatment A [3]. This bias toward the null is due to contamination of the treatment groups: some subjects assigned to treatment (Z = 1) may not take it (A = 0) whereas some subjects assigned to placebo (Z = 0) may find a way to take treatment (A = 1). As long as the proportion of patients who end up taking treatment (A = 1) is greater in the group assigned to treatment (Z = 1) than in the group assigned to placebo (Z = 0), the effect of assigned treatment Z will be in between the effect of treatment A and the null value. The practical effect of this bias varies depending on the goal of the trial. Some placebo-controlled RCTs are designed to quantify a treatment’s beneficial effects – for example, a trial to determine whether sildenafil reduces the risk of erectile dysfunction. An ITT analysis of these trials is said to be ‘conservative’ because the effect of assigned treatment Z is biased toward the null. That is, if an ITT analysis finds a beneficial effect for treatment assignment Z, then the true beneficial effect of treatment A must be even greater. The makers of treatment A have a great incentive to design a high-quality study with high levels of adherence. Otherwise, a small beneficial effect of treatment might be missed by the ITT analysis. Other trials are designed to quantify a treatment’s harmful effects – for example, a trial to determine whether sildenafil increases the risk of cardiovascular disease. An ITT analysis of these trials is anticonservative precisely because the effect of assigned
  • 25. treatment Z is biased toward the null. That is, if an ITT analysis fails to find a toxic effect, there is no guarantee that treatment A is safe. A trial designed to quantify harm and whose protocol foresees only an ITT analysis could be referred to as a ‘randomized cynical trial.’ Now let us consider double-blind RCTs that com- pare two active treatments. These trials are often designed to show that a new treatment (A ¼ 1) is not inferior to a reference treatment (A ¼ 0) in terms of either benefits or harms. An example of a noninfer- iority trial would be one that compares the reduction in blood glucose between a new inhaled insulin and regular injectable insulin. The protocol of the trial would specify a noninferiority margin, that is, the maximum average difference in blood glucose that is considered equivalent (e.g., 10 mg/dL). Using an ITT comparison, the new insulin (A ¼ 1) will be declared not inferior to classical insulin (A ¼ 0) if the average reduction in blood glucose in the group assigned to the new treatment (Z ¼ 1) is within 10 mg/dL of the average reduction in blood glucose in the group assigned to the reference treatment (Z ¼ 0) plus/ minus random variability. Such ITT analysis may be misleading in the presence of imperfect adherence. To see this, consider the following scenario. Scenario 1 The new treatment A ¼ 1 is actually inferior to the reference treatment A ¼ 0, for example, the average reduction in blood glucose is 10 mg/dL under treat- ment A ¼ 1 and 22 mg/dL under treatment A ¼ 0. The type and magnitude of adherence is equal in the two groups, for example 30% of subjects in each group decided not to take insulin. As a result, the average reduction is, say, 7 mg/dL in the group assigned to the new treatment (Z ¼ 1) and 15 mg/dL in the group assigned to the reference treatment (Z ¼ 0). An ITT analysis, which is biased toward the null in this scenario, may incorrectly suggest that the new treatment A ¼ 1 is not inferior to the reference treatment A ¼ 0. Other double-blind RCTs with an active compar- ator are designed to show that a new treatment (A ¼ 1) is superior to the reference treatment (A ¼ 0) in terms of either benefits or harms. An example of a superiority trial would be one that compares the risk of heart disease between two antiretroviral regimes. Using an ITT comparison, the new regimen (A ¼ 1) will be declared superior to the reference regime (A ¼ 0) if the heart disease risk is lower in the group assigned to the new regime (Z ¼ 1) than in the group assigned to the reference regime (Z ¼ 0) plus/minus random variability. Again, such ITT analysis may be misleading in the presence of imperfect adherence. Consider the following scenario. Scenario 2 The new treatment A ¼ 1 is actually equivalent to the reference treatment A ¼ 0, for example, the 5-year risk of heart disease is 3% under either treatment A ¼ 1 or treatment A ¼ 0, and the risk in the absence of either treatment is 1%. The type or magnitude of adherence differs between the two groups, for exam- ple, 50% of subjects assigned to the new regime and 10% of those assigned to the reference regime decided not to take their treatment because of minor side effects. As a result, the risk is, say, 2% in the group assigned to the new regime (Z ¼ 1) and 2.8% in the group assigned to the reference regime (Z ¼ 0). An ITT analysis, which is biased away from the null in this scenario, may incorrectly suggest that treatment A ¼ 1 is superior to treatment A ¼ 0. 
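Scenario 2 can be reproduced directly in a small simulation (an editorial sketch, not from the article; the adherence proportions and risks are the hypothetical figures given above). Two truly equivalent regimes look different under an ITT comparison once adherence differs between arms.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000

z = rng.binomial(1, 0.5, n)            # 1 = assigned new regime, 0 = assigned reference regime
# Differential adherence: 50% of the new-regime arm and 10% of the reference arm stop treatment.
adheres = np.where(z == 1, rng.binomial(1, 0.5, n), rng.binomial(1, 0.9, n))
# Equivalent treatments: 5-year risk is 3% on either regime, 1% with no treatment.
risk = np.where(adheres == 1, 0.03, 0.01)
y = rng.binomial(1, risk)

itt_rd = y[z == 1].mean() - y[z == 0].mean()
print(f"Risk, assigned new regime:       {y[z == 1].mean():.4f}")   # ~0.020
print(f"Risk, assigned reference regime: {y[z == 0].mean():.4f}")   # ~0.028
print(f"ITT risk difference:             {itt_rd:.4f}")             # away from the true null
```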
An ITT analysis of RCTs with an active comparator may result in effect estimates that are biased toward (Scenario 1) or away from (Scenario 2) the null. In other words, the magnitude of the effect of assigned treatment Z may be greater than or less than the effect of treatment A. The direction of the bias depends on the proportion of subjects that do not adhere to treatment in each group, and on the reasons for nonadherence. Yet, a common justification for ITTcomparisons is the following: Adherence is not perfect in clinical practice. Therefore, clinicians may be more inter- ested in consistently estimating the effect of assigned treatment Z, which already incorporates the impact of nonadherence, than the effect of treatment A in the absence of nonadherence. That is, the effect of assigned treatment Z reflects a treatment’s clinical effectiveness and therefore should be privileged over the effect of treatment A. In the next section, we summarize the reasons why this is not necessarily true. The effect of assigned treatment is not the same as the effectiveness of treatment Effectiveness is usually defined as ‘how well a treat- ment works in everyday practice,’ and efficacy as ‘how well a treatment works under perfect adherence and highly controlled conditions.’ Thus, the effect of assigned treatment Z in postapproval settings is often equated with effectiveness, whereas the effect of treatment Z in preapproval settings (which is close to the effect of A when adherence is high) is often 50 MA Herna´n and S Herna´ndez-Dı´az Clinical Trials 2012; 9: 48–55 http://ctj.sagepub.com
  • 26. equated with efficacy. There is, however, no guaran- tee that the effect of assigned treatment Z matches the treatment’s effectiveness in routine medical practice. A discrepancy may arise for multiple rea- sons, including differences in patient characteristics, monitoring, or blinding, as we now briefly review. The eligibility criteria for participants in RCTs are shaped by methodologic and ethical considerations. To maximize adherence to the protocol, many RCTs exclude individuals with severe disease, comorbid- ities, or polypharmacy. To minimize risks to vulner- able populations, many RCTs exclude pregnant women, children, or institutionalized populations. As a consequence, the characteristics of participants in an RCT may be, on average, different from those of the individuals who will receive the treatment in clinical practice. If the effect of the treatment under study varies by those characteristics (e.g., treatment is more effective for those using certain concomitant treatments) then the effect of assigned treatment Z in the trial will differ from the treatment’s effective- ness in clinical practice. Patients in RCTs are often more intensely moni- tored than patients in clinical practice. This greater intensity of monitoring may lead to earlier detection of problems (i.e., toxicity, inadequate dosing) in RCTs compared with clinical practice. Thus, a treat- ment’s effectiveness may be greater in RCTs because the earlier detection of problems results in more timely therapeutic modifications, including modifi- cations in treatment dosing, switching to less toxic treatments, or addition of concomitant treatments. Blinding is a useful approach to prevent bias from differential ascertainment of the outcome [6]. There is, however, an inherent contradiction in conduct- ing a double-blind study while arguing that the goal of the study is estimating the effectiveness in routine medical practice. In real life, both patients and doctors are aware of the assigned treatment. A true effectiveness measure should incorporate the effects of assignment awareness (e.g., behavioral changes) that are eliminated in ITT comparisons of double- blind RCTs. Some RCTs, commonly referred to as pragmatic trials [7–9], are specifically designed to guide decisions in clinical practice. Compared with highly controlled trials, pragmatic trials include less selected partici- pants and are conducted under more realistic condi- tions, which may result in lower adherence to the assigned treatment. It is often argued that an ITT analysis of pragmatic trials is particularly appropriate to measure the treatment’s effectiveness, and thus that pragmatic trials are the best design for comparative effectiveness research. However, this argument raises at least two concerns. First, the effect of assigned treatment Z is influ- enced by the adherence patterns observed in the trial, regardless of whether the trial is a pragmatic one. Compared with clinical practice, trial partici- pants may have a greater adherence because they are closely monitored (see above), or simply because they are the selected group who received informed consent and accepted to participate. Patients outside the trial may have a greater adherence after they learn, perhaps based on the trial’s findings, that treatment is beneficial. Therefore, the effect of assigned treatment estimated by an ITT analysis may under- or overestimate the effectiveness of the treatment. 
Second, the effect of assigned treatment Z is inadequate for patients who are interested in initi- ating and fully adhering to a treatment A that has been shown to be efficacious in previous RCTs. In order to make the best informed decision, these patients would like to know the effect of treatment A rather than an effect of assigned treatment Z, which is contaminated by other patients’ nonadherence [5]. For example, to decide whether to use certain contraception method, a couple may want to know the failure rate if they use the method as indicated, rather than the failure rate in a population that included a substantial proportion of nonadherers. Therefore, the effect of assigned treatment Z may be an insufficient summary measure of the trial data, even if it actually measures the treatment’s effectiveness. In summary, the effect of assigned treatment Z – estimated via an ITT comparison – may not be a valid measure of the effectiveness of treatment A in clinical practice. And even if it were, effectiveness is not always the most interesting effect measure. These considerations, together with the inappropri- ateness of ITT comparisons for safety and noninfer- iority trials, make it necessary to expand the reporting of results from RCTs beyond ITT analyses. The next section reviews other analytic approaches for data from RCTs. Conventional ‘as treated’ and ‘per protocol’ analyses Two common attempts to estimate the effect of treatment A are ‘as treated’ and ‘per protocol’ com- parisons. Neither is generally valid. An ‘as treated’ analysis classifies RCT participants according to the treatment that they took (either A ¼ 1 or A ¼ 0) rather than according to the treat- ment that they were assigned to (either Z ¼ 1 or Z ¼ 0). Then an ‘as treated’ analysis compares the risk (or the mean) of the outcome Y among those who took treatment (A ¼ 1) with that among those who did not take treatment (A ¼ 0), regardless of their treatment assignment Z. That is, an ‘as treated’ comparison ignores that the data come from an Beyond the intention-to-treat 51 http://ctj.sagepub.com Clinical Trials 2012; 9: 48–55
  • 27. RCT and rather treats them as coming from an observational study. As a result, an ‘as treated’ comparison will be confounded if the reasons that moved participants to take treatment were associ- ated with prognostic factors. The causal diagram in Figure 1 represents the confounding as a noncausal association between A and Y when there exist prognostic factors L that also affect the decision to take treatment A (U is an unmeasured common cause of L and Y). Confounding arises in an ‘as treated’ analysis when not all prognostic factors L are appro- priately measured and adjusted for. A ‘per protocol’ analysis – also referred to as an ‘on treatment’ analysis – only includes individuals who adhered to the clinical trial instructions as specified in the study protocol. The subset of trial participants included in a ‘per protocol’ analysis, referred to as the per protocol population, includes only partici- pants with A equal to Z: those who were assigned to treatment (Z ¼ 1) and took it (A ¼ 1), and those who were not assigned to treatment (Z ¼ 0) and did not take it (A ¼ 0). A ‘per protocol’ analysis compares the risk (or the mean) of the outcome Y among those who were assigned to treatment (Z ¼ 1) with that among those who were not assigned to treatment (Z ¼ 0) in the per protocol population. That is, a ‘per protocol’ analysis is an ITT analysis in the per protocol population. This contrast will be affected by selection bias [10] if the reasons that moved participants to adhere to their assigned treatment were associated with prognostic factors L. The causal diagram in Figure 2 includes S as an indicator of selection into the ‘per protocol’ population. The selection indicator S is fully determined by the values of Z and A, that is, S ¼ 1 when A ¼ Z, and S ¼ 0 otherwise. The selection bias is a noncausal associ- ation between Z and Y that arises when the analysis is restricted to the ‘per protocol’ population (S ¼ 1) and not all prognostic factors L are appropriately measured and adjusted for. As an example of biased ‘as treated’ and ‘per protocol’ estimates of the effect of treatment A, consider the following scenario. Scenario 3 An RCT assigns men to either colonoscopy (Z ¼ 1) or no colonoscopy (Z ¼ 0). Suppose that undergoing a colonoscopy (A ¼ 1) does not affect the 10-year risk of death from colon cancer (Y) compared with not undergoing a colonoscopy (A ¼ 0), that is, the effect of treatment A is null. Further suppose that, among men assigned to Z ¼ 1, those with family history of colon cancer (L ¼ 1) are more likely to adhere to their assigned treatment and undergo the colonoscopy (A ¼ 1). Even though A has a null effect, an ‘as treated’ analysis will find that men undergoing colonoscopy (A ¼ 1) are more likely to die from colon cancer because they include a greater proportion of men with a predisposition to colon cancer than the others (A ¼ 0). This is the situation depicted in Figure 1. Similarly, a ‘per protocol’ analysis will find a greater risk of death from colon cancer in the group Z ¼ 1 than in the group Z ¼ 0 because the per protocol restriction A ¼ Z overloads the group assigned to colonoscopy with men with a family history of colon cancer. This is the situation depicted in Figure 2. The confounding bias in the ‘as treated’ analysis and the selection bias in the ‘per protocol’ analysis can go in either direction – for example, suppose that L represents healthy diet rather than family history of colon cancer. 
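Scenario 3 can also be made concrete with a simulation (editorial sketch with hypothetical parameters): the effect of colonoscopy on colon-cancer death is null, yet the crude ‘as treated’ and ‘per protocol’ contrasts are biased upward because family history L drives both adherence and the outcome, while the ITT contrast and the L-adjusted contrast are approximately null.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000_000

fam = rng.binomial(1, 0.2, n)                 # L = family history of colon cancer
z = rng.binomial(1, 0.5, n)                   # assigned: 1 = colonoscopy, 0 = none
# Among men assigned to colonoscopy, those with family history are more likely to adhere;
# in this simplified setup, men assigned to no colonoscopy never undergo one (A = 0).
p_take = np.where(z == 1, np.where(fam == 1, 0.9, 0.5), 0.0)
a = rng.binomial(1, p_take)
# Null effect of A: 10-year risk of colon-cancer death depends only on family history.
risk = np.where(fam == 1, 0.04, 0.01)
y = rng.binomial(1, risk)

itt = y[z == 1].mean() - y[z == 0].mean()                          # ~0: unbiased under the null
as_treated = y[a == 1].mean() - y[a == 0].mean()                   # > 0: confounded by L
pp = (z == a)                                                      # per protocol population
per_protocol = y[pp & (z == 1)].mean() - y[pp & (z == 0)].mean()   # > 0: selection bias
adjusted = np.mean([y[(a == 1) & (fam == v)].mean() - y[(a == 0) & (fam == v)].mean()
                    for v in (0, 1)])                              # ~0 after stratifying on L
print(f"ITT {itt:.4f}  as-treated {as_treated:.4f}  "
      f"per-protocol {per_protocol:.4f}  L-adjusted {adjusted:.4f}")
```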
In general, the direction of the bias is hard to predict because it is possible that the proportions of people with a family history, healthy diet, and any other prognostic factor will vary between the groups A ¼ 1 and A ¼ 0 condi- tional on Z. In summary, ‘as treated’ and ‘per protocol’ anal- yses transform RCTs into observational studies for all practical purposes. The estimates from these analyses Z L U A Y Figure 1. Simplified causal diagram for a randomized clinical trial with assigned treatment Z, received treatment A, and outcome Y. U represents the unmeasured common causes of A and Y. An ‘as treated’ analysis of the A-Y association will be confounded unless all prognostic factors L are adjusted for. Z L U S A Y Figure 2. Simplified causal diagram for a randomized clinical trial with assigned treatment Z, received treatment A, and outcome Y. U represents the unmeasured common causes of A and Y, and S an indicator for selection into the ‘per protocol’ population. The Z-Y association in the ‘per protocol’ population (a restriction represented by the box around S) will be affected by selection bias unless all prognostic factors L are adjusted for. 52 MA Herna´n and S Herna´ndez-Dı´az Clinical Trials 2012; 9: 48–55 http://ctj.sagepub.com
  • 28. can only be interpreted as the effect of treatment A if the analysis is appropriately adjusted for the con- founders L. If the intended analysis of the RCT is ‘as treated’ or ‘per protocol,’ then the protocol of the trial should describe the potential confounders and how they will be measured, just like the protocol of an observational study would do. More general ‘as treated’ and ‘per protocol’ analyses to estimate the effect of treatment So far we have made the simplifying assumption that adherence is all or nothing. But in reality, RCT participants may adhere to their assigned treatment intermittently. For example, they may take their assigned treatment for 2 months, discontinue it for the next 3 months, and then resume it until the end of the study. Or subjects may take treatment con- stantly but at a lower dose than assigned. For example, they may take only one pill per day when they should take two. Treatment A is generally a time-varying variable – each day you may take it or not take it – rather than a time-fixed variable – you either always take it or never take it during the follow-up. An ‘as treated’ analysis with a time-varying treat- ment A usually involves some sort of dose-response model. A ‘per protocol’ analysis with a time-varying treatment A includes all RCT participants but censors them if/when they deviate from their assigned treatment. The censoring usually occurs at a fixed time after nonadherence, say, 6 months. The per protocol population in this variation refers to the adherent person-time rather than to the adherent persons. Because previous sections were only concerned with introducing some basic problems of ITT, ‘as treated’ and ‘per protocol’ analyses, we considered A as a time-fixed variable. However, this simplification may be unrealistic and misleading in practice. When treatment A is truly time-varying (i) the effect of treatment needs to be redefined and (ii) appropriate adjustment for the measured confoun- ders L cannot generally be achieved by using con- ventional methods such as stratification, regression, or matching. The definition of the average causal effect of a time-fixed treatment involves the contrast between two clinical regimes. For example, we defined the causal effect of a time-fixed treatment as a contrast between the average outcome that would be observed if all participants took treatment A ¼ 1 versus treatment A ¼ 0. The two regimes are ‘‘taking treatment A ¼ 1’’ and ‘‘taking treatment A ¼ 0’’. The definition of the causal effect of a time-varying treatment also involves a contrast between two clinical regimes. For example, we can define the causal effect of a time-varying treatment as a contrast between the average outcome that would be observed if all participants had continuous treat- ment with A ¼ 1 versus continuous treatment with A ¼ 0 during the entire follow-up. We sometimes refer to this causal effect as the effect of continuous treatment. When the treatment is time-varying, so are the confounders. For example, the probability of taking antiretroviral therapy increases in the presence of symptoms of HIV disease. Both therapy and con- founders evolve together during the follow-up. When the time-varying confounders are affected by previous treatment – for example, antiretroviral therapy use reduces the frequency of symptoms – conventional methods cannot appropriately adjust for the measured confounders [10]. 
Rather, inverse probability (IP) weighting or g-estimation are gener- ally needed for confounding adjustment in ‘as treated’ and ‘per protocol’ analyses involving time- varying treatments [11–13]. Both IP weighting and g-estimation require that time-varying confounders and time-varying treat- ments are measured during the entire follow-up. Thus, if planning to use these adjustment methods, the protocol of the trial should describe the potential confounders and how they will be measured. Unfortunately, like in any observational study, there is no guarantee that all confounders will be identified and correctly measured, which may result in biased estimates of the effect of continuous treatment in ‘as treated’ and ‘per protocol’ analyses involving time-varying treatments. An alternative adjustment method is instrumen- tal variable (IV) estimation, a particular form of g-estimation that does not require measurement of any confounders [14–17]. In double-blind RCTs, IV estimation eliminates confounding for the effect of continuous treatment A by exploiting the fact that the initial treatment assignment Z was random. Thus, if the time-varying treatment A is measured and a correctly specified structural model used, IV estimation adjusts for confounding without measur- ing, or even knowing, the confounders. A detailed description of IP weighting, g-estimation, and IV estimation is beyond the scope of this paper. Toh and Herna´n review these methods for RCTs [18]. IP weighting and g-estimation can also be used to estimate the effect of treatment regimes that may be more clin- ically relevant than the effect of continuous treat- ment [19,20]. For example, it may be more interesting to estimate the effect of treatment taken continuously unless toxic effects or counterindica- tions arise. Beyond the intention-to-treat 53 http://ctj.sagepub.com Clinical Trials 2012; 9: 48–55
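For the simplest point-treatment case, the instrumental variable idea can be sketched with a Wald-type estimator that uses random assignment Z as the instrument. This is an editorial illustration with hypothetical numbers and a constant additive treatment effect; it is not the g-estimation of structural nested models for time-varying treatments that the authors describe, and its validity rests on the usual IV conditions (randomized Z, Z affects Y only through A).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000_000

u = rng.binomial(1, 0.3, n)                      # unmeasured prognostic factor
z = rng.binomial(1, 0.5, n)                      # randomized assignment (the instrument)
# Sicker patients (u = 1) are less likely to take treatment when assigned to it.
p_take = np.where(z == 1, np.where(u == 1, 0.4, 0.9), 0.05)
a = rng.binomial(1, p_take)
risk = 0.10 + 0.15 * u - 0.05 * a                # true effect of A: risk difference -0.05
y = rng.binomial(1, risk)

itt = y[z == 1].mean() - y[z == 0].mean()
as_treated = y[a == 1].mean() - y[a == 0].mean()             # confounded by u
wald_iv = itt / (a[z == 1].mean() - a[z == 0].mean())        # ITT scaled by the adherence difference
print(f"ITT {itt:.4f}   naive as-treated {as_treated:.4f}   IV (Wald) {wald_iv:.4f}")
# The naive as-treated contrast overstates the benefit; the Wald estimator
# recovers -0.05 without ever measuring u.
```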
  • 29. Discussion An ITT analysis of RCTs is appealing for the same reason it may be appalling: simplicity. As described above, ITT estimates may be inadequate for the assessment of comparative effectiveness or safety. In the presence of nonadherence, the ITT effect is a biased estimate of treatment’s effects such as the effect of continuous treatment. This bias can be corrected in an appropriately adjusted ‘as treated’ analysis via IP weighting, g-estimation, or IV estima- tion. However, IP weighting and g-estimation require untestable assumptions similar to those made for causal inference from observational data. IV estimation generally requires a dose-response model and its validity is questionable for nonblinded RCTs. The ITT approach is also problematic if a large proportion of participants drop out or are otherwise lost to follow-up, or if the outcomes are incompletely ascertained among those completing the study. In these studies, an ITT comparison cannot be con- ducted because the value of the outcome is missing for some individuals. To circumvent this problem, the ITT analysis is often replaced by a pseudo-ITT analysis that is restricted to subjects with complete data or in which the last observation is carried forward. These pseudo-ITT analyses may be affected by selection bias in either direction. Adjusting for this bias is possible via IP weighting if information on the time-varying determinants of loss to follow- up is available, but again, the validity of the adjust- ment relies on untestable assumptions about the unmeasured variables [18]. RCTs with long follow-up periods, as expected in many comparative effectiveness research settings, are especially susceptible to bias due to nonadher- ence and loss to follow-up. As these problems accu- mulate over time, the RCT starts to resemble a prospective observational study, and the ITT analysis yields an increasingly biased estimate of the effect of continuous treatment. Consider, for example, a Women’s Health Initiative randomized trial that assigned postmenopausal women to either estrogen plus progestin hormone therapy or placebo [21]. About 40% of women had stopped taking at least 80% of their assigned treatment by the 6th year of follow-up. The ITT hazard ratio of breast cancer was 1.25 (95% CI: 1.01, 1.54) for hormone therapy versus placebo. The IP weighted hazard ratio of breast cancer was 1.68 (1.24, 2.28) for 8 years of continuous hormone therapy versus no hormone therapy [22]. These findings suggest that the effect of continuous treatment was more than twofold greater than the effect of assigned treatment. Of course, neither of these estimates reflects the long-term effect of hor- mone therapy in clinical practice (e.g., the adherence to hormone therapy was much higher in the trial than in the real world). When analyzing data from RCTs, the question is not whether assumptions are made but rather which assumptions are made. In an RCT with incomplete follow-up or outcome ascertainment, a pseudo-ITT analysis assumes that the loss to follow-up occurs completely at random whereas an IP weighted ITT analysis makes less strong assumptions (e.g., loss to follow-up occurs at random conditional on the measured covariates). In an RCT with incomplete adherence, an ITT analysis shifts the burden of assessing the actual magnitude of the effect from the data analysts to the clinicians and other decision makers, who will need to make assumptions about the potential bias introduced by lack of adherence. 
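The contrast drawn above between a pseudo-ITT (complete-case) analysis and an IP-weighted ITT analysis under informative loss to follow-up can be illustrated as follows. This is an editorial sketch: the dropout mechanism and all parameters are hypothetical, and the weights are estimated from a single measured baseline covariate rather than the time-varying determinants a real analysis would require.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000_000

z = rng.binomial(1, 0.5, n)                       # randomized assignment
x = rng.binomial(1, 0.4, n)                       # measured prognostic covariate
risk = 0.30 - 0.10 * z + 0.20 * x                 # true ITT risk difference: -0.10
y = rng.binomial(1, risk)
# Loss to follow-up depends on assignment and on x; outcomes are missing for dropouts.
p_observed = 0.95 - 0.20 * x - 0.40 * z * x
observed = rng.binomial(1, p_observed) == 1

# Pseudo-ITT: complete-case analysis, biased when dropout is related to prognosis.
cc = y[observed & (z == 1)].mean() - y[observed & (z == 0)].mean()

# IP-weighted ITT: weight the observed by 1 / P(observed | Z, X), estimated from the data.
w = np.zeros(n)
for zv in (0, 1):
    for xv in (0, 1):
        cell = (z == zv) & (x == xv)
        w[cell & observed] = 1.0 / observed[cell].mean()

def weighted_risk(mask):
    return np.average(y[mask & observed], weights=w[mask & observed])

ipw = weighted_risk(z == 1) - weighted_risk(z == 0)
print(f"True ITT -0.100   complete-case {cc:.4f}   IP-weighted {ipw:.4f}")
```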
Supplementing the ITT effects with ‘as treated’ or ‘per protocol’ effects can help decision makers [23], but only if a reasonable attempt is made to appro- priately adjust for confounding and selection bias. In summary, we recommend that all RCTs with substantial lack of adherence or loss to follow-up be analyzed using different methods, including an ITT analysis to estimate the effect of assigned treat- ment, and appropriately adjusted ‘per protocol’ and ‘as treated’ analyses (i.e., via IP weighting or g- estimation) to estimate the effect of received treat- ment. Each approach has relative advantages and disadvantages, and depends on a different combi- nation of assumptions [18]. To implement this recommendation, RCT protocols should include a more sophisticated statistical analysis plan, as well as plans to measure adherence and other postrando- mization variables. This added complexity is neces- sary to take full advantage of the substantial societal resources that are invested in RCTs. Acknowledgement We thank Goodarz Danaei for his comments to an earlier version of this manuscript. Funding This study was funded by National Institutes of Health grants R01 HL080644-01 and R01 HD056940. References 1. Luce BR, Kramer JM, Goodman SN, et al. Rethinking randomized clinical trials for comparative effectiveness research: the need for transformational change. Ann Intern Med 2009; 151: 206–09. 2. Food and Drug Administration. International Conference on Harmonisation; Guidance on Statistical 54 MA Herna´n and S Herna´ndez-Dı´az Clinical Trials 2012; 9: 48–55 http://ctj.sagepub.com
  • 30. Principles for Clinical Trials. Federal Register 1998; 63: 49583–98. 3. Rosenberger WF, Lachin JM. Randomization in Clinical Trials: Theory and Practice. Wiley-Interscience, New York, NY, 2002. 4. Piantadosi S. Clinical Trials: A Methodologic Perspective (2nd edn). Wiley-Interscience, Hoboken, NJ, 2005. 5. Sheiner LB, Rubin DB. Intention-to-treat analysis and the goals of clinical trials. Clin Pharmacol Ther 1995; 57: 6–15. 6. Psaty BM, Prentice RL. Minimizing bias in randomized trials: the importance of blinding. JAMA 2010; 304: 793–94. 7. McMahon AD. Study control, violators, inclusion criteria and defining explanatory and pragmatic trials. Stat Med 2002; 21: 1365–76. 8. Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutical trials. J Chronic Dis 1967; 20: 637–48. 9. Tunis SR, Stryer DB, Clancy CM. Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA 2003; 290: 1624–32. 10. Herna´n MA, Herna´ndez-Dı´az S, Robins JM. A structural approach to selection bias. Epidemiology 2004; 15: 615–25. 11. Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Communin Stat 1994; 23: 2379–412. 12. Robins JM. Correction for non-compliance in equivalence trials. Stat Med 1998; 17: 269–302. 13. Robins JM, Finkelstein D. Correcting for non- compliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) Log-rank tests. Biometrics 2000; 56: 779–88. 14. Herna´n MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology 2006; 17: 360–72. 15. Ten Have TR, Normand SL, Marcus SM, et al. Intent-to- treat vs. non-intent-to-treat analyses under treatment non-adherence in mental health randomized trials. Psychiatr Ann 2008; 38: 772–83. 16. Cole SR, Chu H. Effect of acyclovir on herpetic ocular recurrence using a structural nested model. Contemp Clin Trials 2005; 26: 300–10. 17. Mark SD, Robins JM. A method for the analysis of randomized trials with compliance information: an appli- cation to the Multiple Risk Factor Intervention Trial. Contr Clin Trials 1993; 14: 79–97. 18. Toh S, Herna´n MA. Causal Inference from longitudinal studies with baseline randomization. Int J Biostat 2008; 4: Article 22. 19. Herna´n MA, Lanoy E, Costagliola D, Robins JM. Comparison of dynamic treatment regimes via inverse probability weighting. Basic Clin Pharmacol Toxicol 2006; 98: 237–42. 20. Cain LE, Robins JM, Lanoy E, et al. When to start treatment? a systematic approach to the comparison of dynamic regimes using observational data. Int J Biostat 2006; 6: Article 18. 21. Writing group for the Women’s Health Initiative Investigators. Risks and benefits of estrogen plus proges- tin in healthy postmenopausal women: principal results from the women’s health initiative randomized controlled trial. JAMA 2002; 288: 321–33. 22. Toh S, Herna´ndez-Dı´az S, Logan R, et al. Estimating absolute risks in the presence of nonadherence: an appli- cation to a follow-up study with baseline randomization. Epidemiology 2010; 21: 528–39. 23. Thorpe KE, Zwarenstein M, Oxman AD, et al. A prag- matic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. J Clin Epidemiol 2009; 62: 464–75. Beyond the intention-to-treat 55 http://ctj.sagepub.com Clinical Trials 2012; 9: 48–55
  • 31. VIEWPOINT Prespecified Falsification End Points Can They Validate True Observational Associations? Vinay Prasad, MD Anupam B. Jena, MD, PhD A S OBSERVATIONAL STUDIES HAVE INCREASED IN NUM- ber—fueled by a boom in electronic recordkeep- ing and the ease with which observational analy- ses of large databases can be performed—so too have failures to confirm initial research findings.1 Several solutions to the problem of incorrect observational results have been suggested,1,2 emphasizing the importance of a rec- ord not only of significant findings but of all analyses con- ducted.2 An important and increasingly familiar type of observa- tional study is the identification of rare adverse effects (de- fined by organizations such as the Council for Interna- tional Organizations and Medical Sciences as occurring among fewer than 1 per 1000 individuals) from population data. Examples of these studies include whether macrolide antibiotics such as azithromycin are associated with higher rates of sudden cardiac death3 ; whether proton pump in- hibitors (PPIs) are associated with higher rates of pneumo- nia4 ; or whether bisphosphonates are associated with an in- creased risk of atypical (subtrochanteric) femur fractures.5 Rare adverse events, such as these examples, occur so in- frequently that almost by definition they may not be iden- tified in randomized controlled trials (RCTs). Postmarket- ing data from thousands of patients are required to identify such low-frequency events. In fact, the ability to conduct postmarketing surveillance of large databases has been her- alded as a vital step in ensuring the safe dissemination of medical treatments after clinical trials (phase 4) for pre- cisely this reason. Few dispute the importance of observational studies for capturing rare adverse events. For instance, in early stud- ies of whether bisphosphonate use increases the rate of atypi- cal femur fractures, pooled analysis of RCTs demonstrated no elevated risk.6 However, these data were based on a lim- ited sample of 14 000 patients with only 284 hip or femur fractures and only 12 atypical fracture events over just more than 3.5 years of follow-up. In contrast, later observational studies addressing the same question were able to leverage much larger and more comprehensive data. One analysis that examined 205 466 women who took bisphosphonates for an average of 4 years identified more than 10 000 hip or fe- mur fractures and 716 atypical fractures.5 This analysis dem- onstrated an increased risk of atypical fractures associated with bisphosphonate use and was validated by another large population-based study. However, analyses in large data sets are not necessarily correct simply because they are larger. Control groups might not eliminate potential confounders, or many varying defi- nitions of exposure to the agent may be tested (alternative thresholds for dose or duration of a drug)—a form of mul- tiple-hypothesis testing.2 Just as small, true signals can be identified by these analyses, so too can small, erroneous as- sociations. 
For instance, several observational studies have found an association between use of PPIs and development of pneumonia, and it is biologically plausible that elevated gastric pH may engender bacterial colonization.4 However, it is also possible that even after statistical adjustment for known comorbid conditions, PPI users may have other un- observed health characteristics (such as poor health lit- eracy or adherence) that could increase their rates of pneu- monia, apart from use of the drug. Alternatively, physicians who are more likely to prescribe PPIs to their patients also may be more likely to diagnose their patients with pneu- monia in the appropriate clinical setting. Both mecha- nisms would suggest that the observational association be- tween PPI use and pneumonia is confounded. In light of the increasing prevalence of such studies and their importance in shaping clinical decisions, it is important to know that the associations identified are true rather than spurious cor- relations. Prespecified falsification hypotheses may pro- vide an intuitive and useful safeguard when observational data are used to find rare harms. A falsification hypothesis is a claim, distinct from the one being tested, that researchers believe is highly unlikely to be causally related to the intervention in question.7 For in- stance, a falsification hypothesis may be that PPI use in- creases the rate of soft tissue infection or myocardial infarc- tion. A confirmed falsification test—in this case, a positive association between PPI use and risks of these conditions— Author Affiliations: Medical Oncology Branch, National Cancer Institute, Na- tional Institutes of Health, Bethesda, Maryland (Dr Prasad); Department of Health Care Policy, Harvard Medical School, and Massachusetts General Hospital, Bos- ton (Dr Jena); and National Bureau of Economic Research, Cambridge, Massa- chusetts (Dr Jena). Corresponding Author: Anupam B. Jena, MD, PhD, Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115 (jena @hcp.med.harvard.edu). ©2013 American Medical Association. All rights reserved. JAMA, January 16, 2013—Vol 309, No. 3 241 Downloaded From: http://jama.jamanetwork.com/ by a National Academy of Sciences User on 02/11/2013
  • 32. would suggest that an association between PPI use and pneumonia initially suspected to be causal is perhaps confounded by unobserved patient or physician characteristics. Ideally, several prespecified false hypotheses can be tested and, if found not to exist, can support the main study association of interest. In the case of PPIs, falsification analyses have shown that many improbable conditions—chest pain, urinary tract infections, osteoarthritis, rheumatoid arthritis flares, and deep venous thrombosis—are also linked to PPI use,4 making the claim of an increased risk of pneumonia related to use of the drug unlikely. Another example of falsification analysis applied to observational associations involves the reported relationship of social networks with the spread of complex phenomena such as smoking, obesity, and depression. In social network studies, persons with social ties are shown to be more likely to gain or lose weight, or to start or stop smoking, at similar time points than 2 random persons in the same group. Several studies supported these claims; however, other studies have shown that even implausible factors—acne, height, and headaches—may also exhibit “network effects.”8 Falsification analysis can be operationalized by asking investigators to specify implausible hypotheses up front and then testing those claims using statistical methods similar to those used in the primary analysis. Falsification could be required both for studies that aim to show a rare harm of a particular medical intervention as well as for studies that aim to show deleterious interactions between medications. For instance, in evaluating whether concomitant use of clopidogrel and PPIs is associated with decreased effectiveness of the former drug and worsens cardiovascular outcomes, does the use of PPIs also implausibly diminish the effect of antihypertensive agents or metformin? Prespecifying falsification end points and choosing them appropriately is important for avoiding the problem of multiple hypothesis testing. For instance, if many falsification hypotheses are tested to support a particular observational association, a few falsification outcomes will pass the falsification test—ie, will not be associated with the drug or intervention of interest—whereas other falsification tests may fail. If the former are selectively reported, some associations may be mistakenly validated. This issue cannot be addressed by statistical testing for multiple hypotheses alone because selective reporting may still occur. Instead, prespecifying falsification outcomes and choosing outcomes that are common may mitigate concerns about post hoc data mining. In the case of PPIs and risk of pneumonia, falsification analyses used prevalent ambulatory complaints such as chest pain, urinary tract infections, and osteoarthritis.4 Observational studies of rare effects of a drug may be further validated by verification analyses that demonstrate the presence of known adverse effects of a drug in the data set being studied. For instance, an observational study suggesting an unknown adverse effect of clopidogrel (for example, seizures) should also be able to demonstrate the presence of known adverse effects such as gastrointestinal hemorrhage associated with clopidogrel use. The inability of a study to verify known adverse effects should raise questions about selection in the study population.
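The logic of a falsification analysis can be sketched in a few lines: run the same analysis used for the primary outcome on several prespecified, implausible outcomes and see whether they, too, appear associated with the exposure. The simulation below is an editorial illustration with hypothetical parameters (the outcome names simply echo the PPI example); an unmeasured patient characteristic drives both drug use and every outcome, so all crude relative risks come out similarly elevated.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

frail = rng.binomial(1, 0.3, n)                               # unmeasured patient characteristic
exposed = rng.binomial(1, np.where(frail == 1, 0.5, 0.2))     # drug use tracks frailty

# Primary outcome plus prespecified falsification outcomes, none truly caused by the drug:
# every outcome depends only on the unmeasured characteristic (baseline risk, risk if frail).
outcomes = {
    "pneumonia (primary)":  (0.010, 0.030),
    "chest pain":           (0.020, 0.050),
    "urinary tract infx":   (0.015, 0.040),
    "osteoarthritis flare": (0.010, 0.025),
}
print("outcome                   crude RR (exposed vs unexposed)")
for name, (p0, p1) in outcomes.items():
    y = rng.binomial(1, np.where(frail == 1, p1, p0))
    rr = y[exposed == 1].mean() / y[exposed == 0].mean()
    print(f"{name:25s} {rr:.2f}")
# Similarly elevated 'effects' on the implausible falsification outcomes signal residual
# confounding and argue against a causal reading of the primary association.
```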
Although no published recommendations exist, standard- ized falsification analyses with 3 to 4 prespecified or highly prevalent disease outcomes may help to strengthen the va- lidity of observational studies, as could inclusion of verifi- cation analyses. Information on whether falsification and validation end points were used in a study should be in- cluded in a registry for observational studies that others have suggested.2 Prespecified falsification hypotheses can improve the va- lidity of studies finding rare harms when researchers can- not determine answers to these questions from RCTs, either because of limited sample sizes or limited follow-up. How- ever, falsification analysis is not a perfect tool for validat- ing the associations in observational studies, nor is it in- tended to be. The absence of implausible falsification hypotheses does not imply that the primary association of interest is causal, nor does their presence guarantee that real relations do not exist. However, when many false relation- ships are present, caution is warranted in the interpreta- tion of study findings. Conflict of Interest Disclosures: The authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were re- ported. REFERENCES 1. Thomas L, Peterson ED. The value of statistical analysis plans in observational research: defining high-quality research from the start. JAMA. 2012;308(8): 773-774. 2. Ioannidis JP. The importance of potential studies that have not existed and reg- istration of observational data sets. JAMA. 2012;308(6):575-576. 3. Ray WA, Murray KT, Hall K, Arbogast PG, Stein CM. Azithromycin and the risk of cardiovascular death. N Engl J Med. 2012;366(20):1881-1890. 4. Jena AB, Sun E, Goldman DP. Confounding in the association of proton pump inhibitor use with risk of community-acquired pneumonia [published online Sep- tember 7, 2012]. J Gen Intern Med. doi:10.1007/s11606-012-2211-5. 5. Park-Wyllie LY, Mamdani MM, Juurlink DN, et al. Bisphosphonate use and the risk of subtrochanteric or femoral shaft fractures in older women. JAMA. 2011; 305(8):783-789. 6. Black DM, Kelly MP, Genant HK, et al; Fracture Intervention Trial Steering Committee; HORIZON Pivotal Fracture Trial Steering Committee. Bisphospho- nates and fractures of the subtrochanteric or diaphyseal femur. N Engl J Med. 2010; 362(19):1761-1771. 7. Bertrand M, Duflo E, Mullainathan S. How much should we trust differences- in-differences estimates? Q J Econ. 2004;119:249-275. 8. Cohen-Cole E, Fletcher JM. Detecting implausible social network effects in acne, height, and headaches: longitudinal analysis. BMJ. 2008;337:a2533. VIEWPOINT 242 JAMA, January 16, 2013—Vol 309, No. 3 ©2013 American Medical Association. All rights reserved. Downloaded From: http://jama.jamanetwork.com/ by a National Academy of Sciences User on 02/11/2013
  • 33. PERSPECTIVE Orthogonal predictions: follow-up questions for suggestive datay Alexander M. Walker MD, DrPH1,2* 1 World Health Information Science Consultants, LLC, Newton, MA 02466, USA 2 Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA SUMMARY When a biological hypothesis of causal effect can be inferred, the hypothesis can sometimes be tested in the selfsame database that gave rise to the study data from which the hypothesis grew. Valid testing happens when the inferred biological hypothesis has scientific implications that predict new relations between observations already recorded. Testing for the existence of the new relations is a valid assessment of the biological hypothesis, so long as the newly predicted relations are not a logical correlate of the observations that stimulated the hypothesis in the first place. These predictions that lead to valid tests might be called ‘orthogonal’ predictions in the data, and stand in marked contrast to ‘scrawny’ hypotheses with no biological content, which predict simply that the same data relations will be seen in a new database. The Universal Data Warehouse will shortly render moot searches for new databases in which to test. Copyright # 2010 John Wiley & Sons, Ltd. key words — databases; hypothesis testing; induction; inference Received 2 October 2009; Accepted 13 January 2010 INTRODUCTION In 2000, the Food and Drug Administration’s (FDA) Manette Niu and her colleagues had found something that might have been predicted by medicine, but not by statistics.1 They were looking for infants who had gotten into trouble after a dose of Wyeth’s RotaShield vaccine in the Centers for Disease Control’s Vaccine Adverse Event Reporting System. A vaccine against rotavirus infection in infants, RotaShield was already off the market.2 In the United States, rotavirus causes diarrhea so severe that it can lead to hospitalization. By contrast, the infection is deadly in poor countries. The 1999 withdrawal had arguably cost hundreds of thousands of lives of children whose death from rotavirus-induced diarrhea could have been avoided through widespread vaccina- tion with RotaShield.3,4 The enormity of the con- sequences of the withdrawal made it important that the decision had been based at least in sound biology. Wyeth suspended sales of RotaShield because the vaccine appeared to cause intussusception, an infant bowel disorder in which a portion of the colon slips inside of itself. The range of manifestations of intussusception varies enormously. It can resolve on its own, with little more by way of signs than the baby’s fussiness from abdominal pain. Sometimes tissue damage causes bloody diarrhea. Sometimes the bowel infarcts and must be removed, or the baby dies. Dr Niu had used a powerful data-mining tool, Bill DuMouchel’s Multi-Item Gamma Poisson Shrinker to sift through the Vaccine Adverse Event Reporting System (VAERS) data, and she found that intussuscep- tion was not alone in its association with RotaShield.5 So too were gastrointestinal hemorrhage, intestinal obstruction, gastroenteritis, and abdominal pain. My argument here is that those correlations represented independent tests of the biological hypothesis that had already killed the vaccine. The observations were sufficient to discriminate hypotheses of biological causation from those of chance, though competing (and testable) hypotheses of artifact may have remained. 
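A screen of the kind Niu ran can be sketched with far simpler arithmetic than DuMouchel's shrinkage method, using proportional reporting ratios (PRRs). The counts below are hypothetical and purely illustrative, not VAERS data; the point is only that the orthogonal prediction, that terms representing milder or partial presentations of the same pathology should also be disproportionately reported, is directly checkable in the same reporting database.

```python
# Hypothetical report counts (not VAERS data): for each adverse-event term,
# (reports mentioning the vaccine of interest, reports mentioning any other vaccine).
counts = {
    "intussusception":             (30,   40),
    "intestinal obstruction":      (12,   25),
    "gastrointestinal hemorrhage": (9,    30),
    "fever (comparison term)":     (200, 4000),
}
n_vaccine_total = 1_500        # all reports for the vaccine of interest
n_other_total = 60_000         # all reports for other vaccines

print("event                          PRR")
for event, (a, c) in counts.items():
    prr = (a / n_vaccine_total) / (c / n_other_total)   # proportional reporting ratio
    print(f"{event:30s} {prr:5.1f}")
# Disproportionate reporting of the related gastrointestinal terms, and not of the
# comparison term, is what the biological hypothesis predicts.
```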
The biologic hypothesis was that if RotaShield had caused intussusception, it was likely to have caused pharmacoepidemiology and drug safety 2010; 19: 529–532 Published online 22 March 2010 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/pds.1929 * Correspondence to: A. M. Walker, World Health Information Science Consultants, LLC, 275 Grove St., Suite 2-400, Newton, MA 02466, USA. E-mail: Alec.Walker@WHISCON.com y The author declared no conflict of interest. Copyright # 2010 John Wiley & Sons, Ltd.
  • 34. cases that did not present as fully recognized instances of the disease, but which nonetheless represented the same pathology. Looking for these other conditions was a test of the biologic hypothesis raised by the occurrence of the severest cases. Like the original observations, the test data resided in VAERS, but were nonetheless independent, in that different physicians in different places, acting more or less concurrently, reported them about different patients. INDUCTION AND TESTING The key step in Niu’s activity was induction of a biological hypothesis of cause from an observation of association. Testing the biological hypothesis differs fundamentally from testing the data-restatement that ‘There is an association between RotaShield and intussusception.’ The latter, by itself a scrawny hypothesis if you could call it a hypothesis at all, might be examined in other environments, though probably not in real time, since VAERS is a national system and RotaShield had been marketed only in the United States. Scrawny hypotheses have no meat to them, that is they do no more than predict more of the same, and even then only when the circumstances of observation are identical. The biological hypothesis, by contrast, was immediately testable through its implica- tions in VAERS, and could produce a host of other empiric tests. From the perspective of the Wyeth, the FDA and the Centers for Disease Control and Prevention (CDC), the parties who had to act in that summer of crisis, only biologic causation really mattered. Biologic causation was not the only theory that predicted reports of multiple related diseases in association with RotaShield. Most of the reports came in after the CDC had announced the association and Wyeth had suspended distribution of RotaTeq. Phys- icians who did not know one another might have been similarly sensitized to the idea that symptom com- plexes compatible with intussusception should be reported. Stimulated reporting is therefore another theory that competes with biological causation to account for the findings. For the present discussion, the key point is not how well the competing hypotheses (biological causation, stimulated reporting, and chance) explain the newly found data. The key is whether one can rationally look at the non-intussusception diagnoses in VAERS to test theories about the RotaShield-intussusception associ- ation, and whether such looks ‘into the same data’ are logically suspect. Trudy Murphy and collaborators offered another example of testing implications of the biological hypothesis of causation in a subsequent case-control study of RotaShield and intussusception.6 Looking at any prior receipt of RotaShield, they found an adjusted odds ratio of 2.2 (95% CI 1.5–3.3). Murphy’s data also provided a test of the theory of biological causation, no form of which would predict a uniform distribution of cases over time after vaccination. Indeed there were pronounced aggregations of cases 3–7 days following first and second immunization. Interestingly, a theory of stimulated reporting would not have produced time clustering, at least not without secondary theories added on top, and so the Murphy data weighed against the leading non-biologic theory for the Niu observations. ORTHOGONAL PREDICTIONS Niu’s and Murphy’s findings share a common element. 
In neither case did the original observation (case reports of intussusception for Niu, or an association between RotaShield in and ever-immunization for Murphy) imply the follow-up observations (other diagnoses and time-clusters) as a matter of logic, on the null hypothesis. That is, neither set of follow-up observations was predicted by the corresponding scrawny hypothesis, since neither was simply a restatement of the initiating finding. In this sense, I propose that we call the predictions that Niu and Murphy tested ‘orthogonal’ to the original observation. In the very high-dimensional space of medical observations, the predicted data are not simply a rotation of the original findings. Where did the orthogonal predictions come from? The investigators stepped out of the data and into the physical world. We do not know about the world directly, but we can have theories about how it works, and we can test those theories against what we see. Reasoning about the nature of the relations that gave rise to observed data, we can look for opportunities to test the theories. With discipline, we can restrict our ‘predictions’ to relations that are genuinely new, and yet implied by our theories. SHOCKED, SHOCKED ‘I’m shocked, shocked to find that gambling is going on in here!’ says Captain Renault in Casablanca, just before he discretely accepts his winnings and closes down Rick’s Cafe´ Ame´ricain to appease his Nazi minders. Advocates for finding new data sources to test hypotheses might feel kinship with the captain. While Copyright # 2010 John Wiley & Sons, Ltd. Pharmacoepidemiology and Drug Safety, 2010; 19: 529–532 DOI: 10.1002/pds 530 a. m. walker
  • 35. sincerely believing in the importance of independent replication, they find that they too examine different dimensions of outcomes in suggestive data to evaluate important hypotheses, particularly those hypotheses that would require immediate action if true. This is already the core of regulatory epidemiology, which concerns itself with the best decision on the available data. The necessity to act sometimes plays havoc with prescriptions that cannot be implemented quickly. Exploration of data in hand is not limited to public health epidemiologists, regulators among them. In fact most epidemiologists check causal hypotheses in the data that generated them. Whenever observational researchers see an important effect, they worry (or should do) whether they have missed some confound- ing factor. Confounding is a causal alternative hypothesis for an observed association, and the hypothesis of confounding often has testable implica- tions in the data at hand. Will the crude effect disappear when we control for age? It would be hard to describe the search for confounders as anything other than testing alternative causal hypotheses in the data that gave rise to them. Far from public health, sciences in which there is little opportunity for experiment, such as geology, regularly test hypotheses in existing data. Ebel and Grossman, for example, could ‘predict for the first time’ (their words) events 65 million years ago, in a headline-grabbing theory that explained a world-wide layer of iridium at just the geological stratum that coincided with the disappearance of the dinosaurs.7 There is nothing illegitimate in the exercise. THE UNIVERSAL DATA WAREHOUSE The question that motivated this Symposium, ‘One Database or Two?’ was whether it is necessary to seek out a new database to test theories derived from a database at hand. Above I have argued that the issue is not the separation of the databases, but rather the independence of the test and hypothesis-generating data. Clearly, two physically separate databases whose information was independently derived by different investigators working in different sources meet the criterion of independence, but so do independently derived domains of single databases. Fortunately, the question may shortly be moot, because there will be in the future only one database. Let me explain In 1993, Philip Cole, a professor at the University of Alabama at Birmingham, provided a radical solution to the repeated critique that epidemiologists were finding unanticipated relations in data, and that the researchers were presuming to make statements about hypotheses that had not been specified in advance. In ‘The Hypothesis Generating Machine’, Cole announced the creation of the HGM, a machine that had integrated data on every agent, every means of exposure and every time relation together with every disease. From these, the HGM had formed every possible hypothesis about every possible relationship.8 Never again would a hypothesis be denigrated for having been newly inferred from data. In the same elegant paper, Cole also likened the idea that studies generate hypotheses to the once widely held view that piles of rags generate mouse pups. People generate hypotheses; inanimate studies do not. With acknowledgment to Cole, couldn’t we imagine a Universal Data Warehouse consisting of all data ever recorded? 
Some twists of relativity theory might even get us to postulate that the UDW could contain all future data as well.9 Henceforward, all tests of all hypotheses would occur by necessity in the UDW, whether or not the investigator was aware that his or her data were simply a view into the warehouse. Researchers would evermore test and measure the impact of hypotheses in the data that suggested them. The new procedure of resorting to the UDW will not constitute a departure from current practice, and may result in more efficient discussion. The reluctance of statisticians and philosophers to test a hypothesis in the data that generated it makes rigorous sense. I think that our disagreement, if there was one, on the enjoyable morning of our Symposium was definitional rather than scientific. In an earlier era, when dedicated, expensive collection was the only source of data, the sensible analysis plan extracted everything to be learned the first time through. Overwhelmed by information from public and private data streams, researchers now select out the pieces that seem right to answer the questions they pose. The KEY POINTS When examination of complex data leads to a biological hypothesis, that hypothesis may have implications that are testable in the original data. The test data need to be independent of the hypothesis-generating data. The ‘‘Universal Data Warehouse’’ reminds us of the futility of substituting data location for data independence. Copyright # 2010 John Wiley Sons, Ltd. Pharmacoepidemiology and Drug Safety, 2010; 19: 529–532 DOI: 10.1002/pds orthogonal predictions 531
• 36. The answers raise new questions, different ones, and it makes sense to pick out (from the same fire hose spurting facts) new data that will help us make sense of what we think we may have learned the first time through. Recorded experience is the database through which we observe, theorize, test, theorize, observe again, test again, and so on for as long as we have stamina and means. We certainly should have standards as to when data test a theory, but the standard does not need to be that the originating databases are different.
ACKNOWLEDGEMENTS
This paper owes much to many people, none of whom should be held accountable for its shortcomings, as the author did not always agree with his friends' good advice. The author is indebted to his co-participants in the Symposium, Larry Gould particularly, Patrick Ryan and Sebastian Schneeweiss, and the deft organizers, Susan Sacks and Nancy Santanello, for their valuable advice. He also thanks Phil Cole, Ken Rothman and Paul Stang for their careful reading and to-the-point commentary. There are no relevant financial considerations to disclose.
REFERENCES
1. Niu MT, Erwin DE, Braun MM. Data mining in the US Vaccine Adverse Event Reporting System (VAERS): early detection of intussusception and other events after rotavirus vaccination. Vaccine 2001; 19: 4627–4634.
2. Centers for Disease Control and Prevention (CDC). Suspension of rotavirus vaccine after reports of intussusception—United States, 1999. MMWR Morb Mortal Wkly Rep 2004; 53(34): 786–789. Erratum in: MMWR Morb Mortal Wkly Rep 2004; 53(37): 879.
3. World Health Organization. Report of the Meeting on Future Directions for Rotavirus Vaccine Research in Developing Countries, Geneva, 9–11 February 2000. Geneva (Publication WHO/V&B/00.23).
4. Linhares AC, Bresee JS. Rotavirus vaccines and vaccination in Latin America. Rev Panam Salud Publica 2000; 8(5): 305–331.
5. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System. Am Stat 1999; 53: 177–190.
6. Murphy TV, Gargiullo PM, Massoudi MS, et al. Intussusception among infants given an oral rotavirus vaccine. N Engl J Med 2001; 344: 564–572.
7. Ebel DS, Grossman L. Spinel-bearing spherules condensed from the Chicxulub impact-vapor plume. Geology 2005; 33(4): 293–296.
8. Cole P. The hypothesis generating machine. Epidemiology 1993; 4(3): 271–273.
9. Rindler W. Essential Relativity (rev. 2nd edn). Springer-Verlag: Berlin, 1977. See Section 2.4, 'The Relativity of Simultaneity', for a particularly lucid presentation of this phenomenon. The warehouse does not of course contain all future data, as we will restrict it to information generated by and about humans.
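The confounding check described in the article above, asking whether a crude effect disappears once the analysis controls for age, can be made concrete with a short sketch. This is a minimal illustration on simulated data; the variable names, the data-generating assumptions, and the logistic-regression adjustment are choices made for the example, not anything prescribed by the article.

```python
# Minimal sketch (assumed, simulated data): test the confounding-by-age
# alternative hypothesis in the same data that produced the crude association.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20_000
age = rng.normal(60, 10, n)                                    # shared cause (confounder)
exposed = rng.binomial(1, 1 / (1 + np.exp(-(age - 60) / 10)))  # exposure driven by age
event = rng.binomial(1, 1 / (1 + np.exp(-(-4 + 0.05 * age))))  # outcome driven by age only

df = pd.DataFrame({"age": age, "exposed": exposed, "event": event})

# Crude association: exposure looks related to the outcome.
crude = sm.Logit(df["event"], sm.add_constant(df[["exposed"]])).fit(disp=0)

# The testable implication: does the effect disappear when we control for age?
adjusted = sm.Logit(df["event"], sm.add_constant(df[["exposed", "age"]])).fit(disp=0)

print("crude OR:   ", round(float(np.exp(crude.params["exposed"])), 2))
print("adjusted OR:", round(float(np.exp(adjusted.params["exposed"])), 2))  # near 1 if age explains it
```

In this simulated setup the crude odds ratio exceeds 1 only because age drives both exposure and outcome, so the age-adjusted estimate falls back toward the null; the point is simply that the alternative causal hypothesis of confounding is testable in the data at hand.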
• 37. Special Issue Paper
Received 4 November 2011, Accepted 28 August 2012. Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.5620
Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership‡
Patrick B. Ryan,a,b,c*† David Madigan,b,d Paul E. Stang,a,b J. Marc Overhage,b,e Judith A. Racoosin,b,f and Abraham G. Hartzema,b,g§
Background: Expanded availability of observational healthcare data (both administrative claims and electronic health records) has prompted the development of statistical methods for identifying adverse events associated with medical products, but the operating characteristics of these methods when applied to real-world data are unknown.
Methods: We studied the performance of eight analytic methods for estimating the strength of association (relative risk, RR) and its associated standard error for 53 drug–adverse event outcome pairs, both positive and negative controls. The methods were applied to a network of ten observational healthcare databases, comprising over 130 million lives. Performance measures included sensitivity, specificity, and positive predictive value of the methods at RR thresholds achieving statistical significance of p < 0.05 or p < 0.001 and at an absolute threshold of RR > 1.5, as well as threshold-free measures such as the area under the receiver operating characteristic curve (AUC).
Results: Although no specific method demonstrated superior performance, the aggregate results provide a benchmark and baseline expectation for risk identification method performance. At traditional levels of statistical significance (RR > 1, p < 0.05), all methods have a false positive rate > 18%, with positive predictive value < 38%. The best predictive model, the high-dimensional propensity score, achieved an AUC = 0.77. At 50% sensitivity, the false positive rate ranged from 16% to 30%. At a 10% false positive rate, sensitivity of the methods ranged from 9% to 33%. (A brief illustrative sketch of these operating-characteristic summaries follows the author notes below.)
Conclusions: Systematic processes for risk identification can provide useful information to supplement an overall safety assessment, but assessment of method performance suggests a substantial chance of identifying false positive associations. Copyright © 2012 John Wiley & Sons, Ltd.
Keywords: product surveillance, postmarketing; pharmacoepidemiology; epidemiologic methods; causality; electronic health records; adverse drug reactions
aJohnson & Johnson Pharmaceutical Research and Development LLC, Titusville, NJ, U.S.A.
bObservational Medical Outcomes Partnership, Foundation for the National Institutes of Health, Bethesda, MD, U.S.A.
cUNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A.
dDepartment of Statistics, Columbia University, New York, NY, U.S.A.
eRegenstrief Institute and Indiana University School of Medicine, Indianapolis, IN, U.S.A.
fCenter for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, U.S.A.
gCollege of Pharmacy, University of Florida, Gainesville, FL, U.S.A.
*Correspondence to: Patrick B. Ryan, Johnson & Johnson, 1125 Trenton-Harbourton Road, PO Box 200, MS K304, Titusville, NJ 08560, U.S.A.
†E-mail: ryan@omop.org
‡This article expresses the views of the authors and does not necessarily represent those of their affiliated organizations.
§At the time of this work, Dr. Hartzema was on sabbatical at the U.S. Food and Drug Administration.
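As a reading aid, here is a minimal sketch of the kind of operating-characteristic summary reported in the abstract above: given one method's relative-risk estimates for labeled positive- and negative-control drug–outcome pairs, compute the AUC and the sensitivity at a fixed false positive rate. The control labels, the RR values, and the use of scikit-learn are illustrative assumptions for this sketch, not part of the OMOP experiment itself.

```python
# Hedged sketch: threshold-free (AUC) and threshold-based performance summaries
# for a single method, using made-up control labels and RR estimates.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

is_positive_control = np.array([1, 1, 1, 0, 0, 0, 0, 0])           # 1 = expert-endorsed association
estimated_rr        = np.array([2.4, 1.8, 1.2, 1.6, 1.1, 0.9, 1.0, 1.3])  # method's RR per pair

auc = roc_auc_score(is_positive_control, estimated_rr)             # threshold-free discrimination

fpr, tpr, thresholds = roc_curve(is_positive_control, estimated_rr)
sens_at_10pct_fpr = tpr[fpr <= 0.10].max()                          # sensitivity at a 10% false positive rate

print(f"AUC = {auc:.2f}, sensitivity at 10% FPR = {sens_at_10pct_fpr:.2f}")
```

Repeating this calculation per method and per parameter setting, across many control pairs, is the general shape of the comparison summarized in the abstract.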
• 38. 1. Introduction
The U.S. Food and Drug Administration Amendments Act of 2007 required the establishment of an 'active postmarket risk identification and analysis system' with access to patient-level observational data from 100 million lives by 2012 [1]. In this context, we define 'risk identification' as a systematic and reproducible process to efficiently generate evidence to support the characterization of the potential effects of medical products. This system, applied to a network of observational healthcare databases, would provide another source of evidence to complement existing safety information contributed by preclinical data, clinical trials, spontaneous adverse event reports, registries, and pharmacoepidemiology evaluation studies. When used in conjunction with evidence of the benefits of the product and alternative treatments, a more comprehensive understanding of the effects of medical products promises to inform medical decision making. The practicing clinician has a critical role in both the generation of quality data that can be used for these efforts and the integration of findings from safety assessments into routine practice, both of which become increasingly important in the evolution of the electronic health record and the creation of a 'learning healthcare system' [2].
The secondary use of observational healthcare databases (e.g., administrative claims and electronic health records) has become the predominant resource in pharmacoepidemiology, health outcomes, and health services research because it reflects 'real-world' experience. Unlike well-designed and well-performed randomized clinical trials, the use of observational data requires special consideration of potential biases that can distort the measurement of the true effect size. Researchers can choose from a variety of analytic methods that attempt to control for these biases; however, the operating characteristics of these methods and their potential utility within a risk identification system have not been systematically studied (a simplified sketch of one such adjustment approach appears after the description of the data network below). The Observational Medical Outcomes Partnership (OMOP; http://omop.fnih.org) conducts methodological research to support the development of a national risk identification and analysis system, the details of which have been previously published [3]. The OMOP research plan consists of a series of empirical assessments of the performance characteristics of a number of analysis methods conducted across a network of observational data sources. This paper reports findings from a series of assessments of risk identification methods to determine their ability to correctly identify 'true' drug–adverse event outcome associations and drug–adverse outcome negative controls as 'not associated'.
2. Methods
The OMOP established a network of ten data sources capturing the healthcare experience of 130 million patients. The data network included administrative claims data (SDI Health, Humana Inc., and four Thomson Reuters MarketScan® Research Databases reflecting commercial claims with and without laboratory records, Medicare supplemental, and multistate Medicaid populations) and electronic health records (Regenstrief Institute, Partners Healthcare System, GE Centricity, and Department of Veterans Affairs Center for Medication Safety/Outcomes Research). Table I depicts the characteristics and population sizes of each data source. The data sources in the OMOP were selected to reflect the diversity of U.S. observational data [4]. This research program was approved or granted exemption by the Institutional Review Boards at each participating organization.
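Among the bias-control strategies mentioned in the Introduction, propensity-score methods figure prominently (the abstract's best-performing method was a high-dimensional propensity score). The sketch below is a deliberately simplified, low-dimensional stand-in for that idea, propensity-score stratification; the column names and the weighting scheme are assumptions made for illustration and do not reproduce OMOP's implementation.

```python
# Hedged sketch of propensity-score stratification: model the probability of
# exposure from covariates, stratify on it, and compare risks within strata.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def ps_stratified_rr(df: pd.DataFrame, covariates: list, n_strata: int = 5) -> float:
    """Stratum-weighted risk ratio; expects binary 'exposed' and 'event' columns."""
    X = sm.add_constant(df[covariates])
    ps_model = sm.Logit(df["exposed"], X).fit(disp=0)   # propensity model
    df = df.assign(ps=ps_model.predict(X))
    df["stratum"] = pd.qcut(df["ps"], q=n_strata, labels=False, duplicates="drop")

    stratum_rrs, weights = [], []
    for _, s in df.groupby("stratum"):
        risk_exposed = s.loc[s["exposed"] == 1, "event"].mean()
        risk_unexposed = s.loc[s["exposed"] == 0, "event"].mean()
        if risk_unexposed > 0 and not np.isnan(risk_exposed):
            stratum_rrs.append(risk_exposed / risk_unexposed)
            weights.append(len(s))

    # Size-weighted summary of the stratum-specific risk ratios (a crude overall estimate).
    return float(np.average(stratum_rrs, weights=weights))
```

In the OMOP experiment, each such analytic choice (propensity adjustment, comparator selection, time-at-risk definition, and so on) is one of the parameter settings evaluated against the positive and negative controls described below.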
All of these datasets were transformed to a common data model, where data about drug exposure and condition occurrence were structured in a consistent fashion and defined using the same controlled terminologies, to facilitate subsequent analysis [5]. A total of 13 different analytic methods were implemented during the OMOP experiment. Complete descriptions, references, and source code for each method are available at http://omop.fnih.org/MethodsLibrary; of those, eight report estimates of relative risk (RR) and its standard error. In this paper, we examine these eight methods. Results for the remaining five methods are available upon request. Each method had multiple parameter settings corresponding to various study design decisions, including the definition of time-at-risk, identification of outcomes based on first occurrence or all occurrences of diagnosis codes, choice of comparator group, and specific confounding adjustment strategy. The specific parameters for each method and the number of parameter combinations studied for each method are shown in Table II.
The performance of the analytical methods was assessed on the basis of their ability to correctly identify nine drug–outcome pairs that were classified as 'positive controls' and 44 drug–outcome pairs classified as 'negative controls'. Positive controls were true associations, as determined by the listing of the corresponding outcome as an adverse event in the drug product label along with prior published observational database research suggesting an association; these positive controls were subsequently endorsed by expert panel consensus. Negative controls lacked such evidence in their labeling and published literature and were ruled out as having a positive association by the expert panel. Members of the OMOP's advisory boards and other participants [3] and literature references for the test cases [6]