Observational Research designs: detailed description

Research Designs: Observational
Professor Tarek Tawfik Amin
Epidemiology and Public Health, Faculty of Medicine, Cairo University
Geneva Foundation for Medical Education and Training
Asian Pacific Organization for Cancer Prevention
amin55@myway.com dramin55@gmail.com
Basic Research Competency Program for Research Coordinators
August 2015, MEDC, Faculty Of Medicine, Cairo University, Cairo, Egypt.

Objectives
By the end of the day, research coordinators will be able to:
1- Recognize the different types of research designs in relation
to the research question.
2- Recognize the indications, structure and reporting of
observational research designs.
3- Identify the advantages and disadvantages of observational
research designs.
4- Critique the design components using available literature.
5/7/2016 Professor Tarek Tawfik Amin

Methodology
1- Interactive lecturing
2- Group and individual work
3- Discussion and brainstorming
4- Assignments

Contents and plan
Session title | Duration | Methods / Activities
Pre-test: select the appropriate design; Research elements (revisited) | 9:00-9:30 | Group work
Research designs: types and indications | 9:30-10:15 | Interactive; Individual/group work; Discussion
Descriptive designs; Cross-sectional design | 10:15-11:00 | Interactive; Criticize the structure of literature; Activity 1
Break 1 | 11:00-11:30 |
Cross-sectional; Case-control designs | | Individual/group work; Activity 2
Break 2 | 12:30-1:00 |
Case-control design | 1:00-1:30 | Interactive; Activity 3
Break 3 | 1:30-1:45 |
Cohort design; Post-test; Recap questions | | Activity 4

Research process “The 8 steps model” (Kumar 2005)
Phases: What; How; Conducting of the study
Steps:
1- Formulating a research question
2- Research design
3- Instruments for data collection
4- Selecting a sample
5- Research protocol writing
6- Data collection
7- Data processing
8- Research report
Supporting topics:
o FINER
o Variables and hypotheses: definition and typology
o Literature review
o Research design: functions
o Study designs
o Methods and tools of data collection
o Validity and reliability of the research tool
o Field test of the tools
o Sampling theory and designs
o Contents of research proposal
o Editing
o Code book
o Coding
o Methods of data processing: computing and statistics
o Principles of scientific writing
o Dissemination including publication

Definition of a research design
A traditional research design is a blueprint or detailed plan for how a research study is to be completed:
o Operationalizing variables so they can be measured,
o Selecting a sample of interest to study,
o Collecting data as a basis for testing hypotheses, and
o Analyzing the results.
(Thyer 1993)

Types of study design (I)
Classification base and corresponding study designs (Kumar 2005):
o Number of contacts:
  - One: cross-sectional studies
  - Two: before-and-after studies
  - Three or more: longitudinal studies
o Reference period:
  - Retrospective
  - Prospective
  - Retrospective-prospective
o Nature of investigation:
  - Experimental
  - Pre-experimental
  - Semi-experimental

Research designs (II): Clinical perspective
Did the investigator assign exposure “intervention”?
o Yes: Experimental study. Random allocation?
  - Yes: Randomized Controlled Trial (RCT)
  - No: Non-Randomized Controlled Trial
o No: Observational study. Comparison group?
  - Yes: Analytical study. Direction?
    Exposure → outcome: Cohort study
    Exposure ← outcome: Case-control study
    Exposure and outcome at the same time: Cross-sectional study
  - No: Descriptive study (case report, case series, ecological)
Random allocation: the research participants are assigned by chance, rather than by choice, to either the experimental group or the control group.

Types of Experimental Research
1- True-Experimental Design: randomization + controlled
2- Quasi-Experimental design
3- Pre-Experimental Design
True-experimental designs:
o Pretest/posttest control group design: at least 2 randomly assigned groups; both pretested for the dependent variable.
o Posttest-only control group design: there is no pretest.
o Solomon four-group design: random assignment of participants to one of four groups. Two groups are pretested and two are not; one pretested group and one un-pretested group receive the experimental treatment; all four groups are post-tested.
Single-variable designs
Factorial designs
Statistical designs include:
o Randomized block design
o Latin square
o Factorial design

Types of Experimental Research
1- True-Experimental Design: randomization + controlled
2- Quasi-Experimental design: group assignment is not randomized, intervention may be
3- Pre-Experimental Design
Quasi-experimental designs:
o Non-equivalent control group: two or more existing groups are pretested, administered treatment, and post-tested. Participants’ assignment to groups is not random; assignment of treatments to groups is random.
o Time series: one group is repeatedly pretested, administered treatment, then repeatedly post-tested.
o Multiple time series
Single-variable designs
Factorial designs

Types of Experimental Research
1- True-Experimental Design: randomization + controlled
2- Quasi-Experimental design: group assignment is not randomized, intervention may be
3- Pre-Experimental Design: not controlled, non-randomized
Pre-experimental designs:
o One-shot case study: one group is exposed to one treatment, then given a posttest.
o One-group pretest/posttest: one group is pretested, exposed to one treatment, then post-tested.
o Static group comparison: at least 2 groups (equal size); one group is pretested, exposed to one treatment, then post-tested. The control group is only observed, to show performance without the treatment.
Single-variable designs
Factorial designs

Phases and indications of most commonly used study designs

Cross-sectional
  Timing: cross-sectional. Form: observational.
  Present time: collect all information.
  Typical uses: prevalence estimates; reference range; current health status.
Repeated cross-sectional
  Timing: cross-sectional. Form: observational.
  Collect all information at each time point.
  Typical uses: changes over time.
Cohort
  Timing: longitudinal (prospective). Form: observational.
  Present time: define cohort and assess risk factors; then follow. Future time: observe outcome.
  Typical uses: prognosis and natural history; etiology.
Case-control
  Timing: longitudinal (retrospective). Form: observational.
  Present time: define cases and controls (outcome); then trace back. Past time: assess risk factors.
  Typical uses: etiology, particularly for rare diseases.
C.T
  Timing: longitudinal (prospective). Form: experimental.
  Present time: apply intervention; then follow. Future time: observe outcome.
  Typical uses: clinical trials to assess therapy; trials to assess preventive measures; lab. experiments.

Observational research designs

Table 1. Overview of Observational Study Designs

Ecological studies
  Elements: utilizes population-level data, not data on individuals. Example: U.S. states’ rates of coronary heart disease mortality and per capita cigarette sales.
  Common data summaries: correlational studies.
  Problems encountered: the major disadvantage is the ecological fallacy.
Case reports
  Elements: detailed description of a single, typically new or atypical, individual case. Example: a report of abdominal aortic aneurysm presenting as transient hemiplegia.
  Common data summaries: show data; few simple descriptors and graphs.
  Problems encountered: caseness.
Case series
  Elements: simple detailed description of a series of typically new or atypical individual cases. Example: the report of Pneumocystis pneumonia in previously healthy homosexual men, a case series that led to the discovery of what is now known as HIV/AIDS.
  Common data summaries: show data; sometimes means, medians, range and graphs.
  Problems encountered: caseness.

Table 2. Overview of Observational Study Designs

Cross-sectional survey (prevalence, incidence)
  Elements:
  - Single time point: define a population of individuals at a specific point in time.
  - Observations are made on individuals to describe the presence of diseases or other characteristics.
  Common data summaries: means, standard deviation (SD), %, regression, odds ratios (OR), risk ratios (RR), risk differences, attributable risk, graphs.
  Problems encountered: unsuitable for rare diseases or diseases of short duration.
Case-control studies
  Elements:
  - Typically retrospective; good for rare diseases.
  - Cases and non-cases (controls) are defined, then historical information on risk factors or exposures is collected.
  Common data summaries: means, SD, proportions, regression, OR, graphs.
  Problems encountered: several sources of bias; selection of controls; matching.
Longitudinal cohort studies
  Elements:
  - Prospective longitudinal: concurrent or non-concurrent.
  - Suitable for rare exposures.
  - Large sample sizes are needed to assess rare diseases.
  - Exposure information is collected at the time of exposure, and disease information develops over time.
  Common data summaries: means, SD, %, regression, OR, RR, risk differences, attributable risk, graphs.
  Problems encountered: cohort selection; biases.
Nested case-control
  Elements:
  - Nested inside some large population-based cohort studies.
  Common data summaries: means, SD, %, regression, OR, RR, risk differences, attributable risk, graphs.

Descriptive Studies
o Deal with individuals: case report, case-series report
o Relate to the population: ecological correlational studies

I- Case Report
o The least publishable units.
o Reports unusual disease or association prompting
further investigations with more rigorous study
design.
Example: benign hepatocellular adenoma and high-dose contraceptive pills.
o Not all case reports deal with serious health threats; some simply enliven the generally drab medical literature.

What is the most probable diagnosis?

II-Case-series report
- Report aggregates individual cases in one report.
- A clue to an epidemic: the appearance of several
similar cases heralds an epidemic.
Example: a cluster of homosexual men in Los Angeles with a similar
syndrome alerted the medical community of HIV/AIDS epidemic in
North America.
- A stronger trigger for further investigation than a single case report.
- Can constitute the case group for a case-control
study.

Remarks on case report and case series
- Simple description (yet comprehensive and detailed
allowing recognition of similar cases by the readers)
of the clinical data from a very well-defined group of
individuals without reference to a comparison group.
- Motivated by the intent to describe a new clinical
phenomenon.
- Many times these studies lead nowhere,
- Few times they are the discovery of a new disease.

Caseness in case series
• The inclusion criteria (caseness) must be applied equally to all patients in the series: “caseness” needs to be the same across all patients.
• The definition of a non-case, the variables of interest to diagnosis, outcomes, safety, and other information need to be consistent across patients.
• Observations: the method should be as reliable and reproducible as possible.
• Many times raw data and data summaries are presented.

Case report and case series: advantages and disadvantages
Advantages
1- Useful in forming hypotheses, planning natural history studies, and describing clinical experience.
2- Provide clues to emerging conditions or diseases.
3- Easy and inexpensive to do in clinical settings.
Disadvantages
1- The selection of cases/patients is biased, making generalization of results difficult (we usually report the sickest persons). A case series by necessity involves people or items that have come to medical attention for one reason or another, or perhaps involves only the most severe cases or only those who have access to medical care.
2- The results are of no significance unless reproduced by further studies.

Reporting Descriptive Studies
The Descriptive Pentad
Descriptive studies are ‘the first toe in the water’.
They are concerned with, and designed only to describe, the existing distribution of variables without regard to causal or other hypotheses.
A good descriptive study should answer five basic ‘Ws’.

The Five Ws
Ws: Components
o Who has the disease? Age, sex, and other characteristics.
o What is the condition or disease being studied? A clear, specific, and measurable case definition is essential.
o Why did the condition or disease arise? Descriptive studies often provide clues about cause that can be pursued with more sophisticated research designs.
o When is the condition common or rare? Time provides important clues about health events.
o Where does or does not the disease or condition arise? Geography has a huge effect on health.
So what? The implicit “What next?”

Activity 1
Caseness in case report and case series
Reporting of data

III-Ecological Correlational Studies
- Look for associations between exposures and outcomes in the
population rather than in individuals.
- Can be a convenient initial search for hypotheses as the data are
already collected.
- Correlation coefficient r, which indicates how linear the relation between exposure and outcome is.
o The mortality of coronary heart disease correlates with per capita sales of cigarettes.
o Inverse correlation between access to safe abortion and maternal mortality rate.
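
As a rough illustration of the correlation coefficient r mentioned above, the following Python sketch computes Pearson's r from population-level figures. The function name `pearson_r` and all the numbers are hypothetical, invented for illustration only; they are not real sales or mortality data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one value per population (e.g., per state), not per person
cigarette_sales_per_capita = [1200, 1500, 1800, 2100, 2400]
chd_deaths_per_100k = [150, 170, 200, 210, 240]

r = pearson_r(cigarette_sales_per_capita, chd_deaths_per_100k)
print(f"r = {r:.2f}")
```

A value of r close to +1 or -1 only says that the populations line up along a straight line; it says nothing about whether the individuals who smoked are the ones who died, which is the ecological fallacy.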

Ecological study: consumption of dietary fat and fast food in a certain community → high mortality from coronary heart disease (high incidence of MI).

Ecological Correlational Studies
The two major limitations of this type of study are:
- The inability to link exposure to outcome in individuals.
- The difficulty of controlling for confounders.
Example: death rates from coronary heart disease are positively correlated with the number of color television sets per capita????

Activity 2
• Reporting and limitations of ecological
study.

Analytical Research Design
1- SINGLE TIME POINT STUDIES: CROSS-SECTIONAL STUDIES, PREVALENCE SURVEYS, AND INCIDENCE STUDIES

Definition
• Designs make observations about the
presence of diseases, conditions, or health-
related characteristics in a defined
population at a specific point in time.
• The general design involves
(1) defining the population under study,
(2) deriving a sample of that population, and
(3) defining the characteristics being studied

Prevalence studies
1- A prevalence rate = No. of persons or items in a population with a disease or condition at a given time / No. at risk for the condition at that time.
2- Used to characterize a disease and its spectrum of manifestation.
Incidence studies
1- Incidence studies identify new cases of a condition over a set period of time.
2- Ensure that all information needed for sound modeling is included in the statistical analyses before drawing any conclusions.
Remarks
1- Most important is to define what is being studied. It can be difficult to determine if a condition is truly absent, e.g. atherosclerosis in subtle cases (needs autopsy or imaging).
2- The definition should be standardized, reproducible, and feasible to apply on a large scale.

Examples: Health and Nutrition Examination Survey (HNES), and censuses.
Both exposure and outcome are identified at one point in time.
Particularly useful for estimating the point prevalence/incidence of a condition in the population:
Point prevalence = (Number with the disease at a single time point) / (Total number studied at the same time point)
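
The point prevalence formula above can be sketched in a few lines of Python. The function name `point_prevalence` and the survey counts are hypothetical, chosen only to show the arithmetic.

```python
def point_prevalence(n_with_disease, n_studied):
    """Number with the disease divided by total number studied, same time point."""
    if n_studied <= 0:
        raise ValueError("n_studied must be positive")
    if not 0 <= n_with_disease <= n_studied:
        raise ValueError("n_with_disease must lie between 0 and n_studied")
    return n_with_disease / n_studied


# Hypothetical survey: 120 people with the condition among 4,000 examined
prevalence = point_prevalence(120, 4000)
print(f"Point prevalence = {prevalence:.3f} ({prevalence * 1000:.0f} per 1,000)")
```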

Design of Cross-Sectional Study
Begin with: a defined population.
Gather data on exposure and disease.
End with four possible groups:
o Diseased, have risk
o Not diseased, have risk
o Diseased, no risk
o Not diseased, no risk

Cross-sectional designs
Advantages
1- Population-based: avoid potential biases of case series and case reports (more representative of the general population).
2- Of short duration.
3- Addressed to specific populations of interest.
4- Examine a variety of exposures and outcomes simultaneously.
5- Relatively inexpensive for common diseases, but not for rare conditions. Conditions with a prevalence of 1 in 1,000 or 1 in 10,000 require very large samples and are probably not feasible for population-based cross-sectional studies.
Disadvantages
1- Unsuitable for rare diseases or for diseases of short duration (flu).
2- High refusal rates make accurate prevalence estimates impossible (the difference between participants and non-participants could be a source of bias). Epidemiologists and survey statisticians tend to become uncomfortable with participation rates below 80%.
3- Only association can be inferred, “not causation”.
4- Temporal sequence is difficult to ascertain (“exposure-outcome sequence”).
5- Trend over time cannot be identified (“change of magnitude/pattern over time”).

If a cross-sectional survey demonstrates an association
between low cognitive function and temporal lobe size by
cerebral magnetic resonance imaging (MRI), one cannot
determine from those data alone whether a small temporal
lobe led to cognitive decline or the cognitive decline caused
temporal lobe atrophy or, indeed, whether some third
factor caused them both
Temporal association

Remember
Attempting to find 300 cases of a disease with an expected prevalence of 1 in 10,000 would need the full participation of 3 million subjects.
That is why a case-control design is superior in this case if the objective is to assess (risk) factors associated with that disease.
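
The arithmetic behind the 3 million figure can be checked with a short Python sketch. The function name `subjects_needed` and the optional `participation` parameter are assumptions added for illustration; the slide itself assumes full participation.

```python
from fractions import Fraction
from math import ceil


def subjects_needed(target_cases, prevalence, participation=Fraction(1)):
    """Subjects to approach so that, on average, target_cases are found.

    Exact fractions are used so that 1 in 10,000 does not pick up
    floating-point rounding error.
    """
    expected_yield = Fraction(prevalence) * Fraction(participation)
    return ceil(Fraction(target_cases) / expected_yield)


# 300 cases at a prevalence of 1 in 10,000, full participation
print(subjects_needed(300, Fraction(1, 10_000)))  # 3000000

# Same target with a hypothetical 80% participation rate
print(subjects_needed(300, Fraction(1, 10_000), Fraction(8, 10)))  # 3750000
```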

Repeated cross-sectional
studies
Studies that may be carried out at different time points to
assess trends over time.
These studies involve different groups of individuals at
each time point.
It can be difficult to assess whether apparent changes over time simply reflect differences in the groups included in the study rather than in the condition itself.

Repeated cross-sectional study design
Study population → data collection, repeated at each time point (four shown), with an interval between successive collections.
Disadvantages:
1. Maturation effect ‘maturation of responses in young subjects’.
2. Reactive effect ‘instrument educates the respondents’.
3. Regression towards the mean ‘shift of extreme attitudes and behavior towards the average’.
4. Conditioning effect ‘repeated contact with the same persons’.

Cross-sectional designs usefulness
Monitor health of the population: epidemic syphilis in
USSR, international epidemic of multiple births/
prematurity, caused by assisted reproductive
technologies.
Health services: Laparoscopy, introduction of Anti
HIV/AIDS therapy.
Development of hypotheses: retrolental fibroplasia, and painted radium dial watches.
Trend analysis.
Planning
Clues about
cause

Overstepping of the data
Post hoc inference:
a temporal association is incorrectly inferred
to be a causal one.
Intake of 6 cups of coffee /day is
associated with lower risk of colonic
cancer!!!!
The role of the media,
The damage in the control efforts,
Damage to the public health.

Activity 3
• The definition of
population/representativeness
• Prevalence reporting
• Conclusion

Case-control design

Case-Control Studies: concept
- A case-control study compares the characteristics of a group of patients with a particular disease (the cases) to a group of individuals without the disease (the controls), to see whether any factors occurred more or less frequently in the cases than in the controls.
- Gives no information on the prevalence or incidence of disease.
- Gives clues as to which factors elevate or reduce the risk of disease.

Case-control design:
General description
1- Involves choosing study subjects who meet a “case” definition and subjects who are not “cases.”
2- Has a longitudinal or temporal aspect to the data that cross-sectional studies lack.
3- Case-control studies are typically retrospective.
4- Examines possible associations between the disease and hypothesized risk factors.
5- Useful for studying potential etiologies of rare diseases.

Examples of Topics Investigated with Case-control Studies
Exposure | Outcome
Cat ownership in childhood | Schizophrenia, schizoaffective, or bipolar disorder
Body mass index | Pancreatic cancer
Physical disability | Earthquake mortality
Hiatus hernia | Reflux oesophagitis
Hair dyes | Connective tissue disorders
History of shingles | Systemic lupus erythematosus
Pig farming | Nipah virus infection
Ghee applied to umbilical cord | Neonatal tetanus
Pickled vegetables | Esophageal cancer
Digital rectal examination | Metastatic prostate cancer
Statins for lipid lowering | Dementia
Paracetamol use | Ovarian cancer
Phyto-estrogens | Breast cancer prevention
Male condom use | Genital warts
Physical activity | Ovarian cancer
Sigmoidoscopy screening | Colon cancer
Influenza vaccination | Recurrent myocardial infarction prevention

Major problem
• Looking “backward in time” can be difficult
and prone to serious biases.
• Forced to rely on subjects’ memories,
hospital records, or other non-standard
sources for information on past exposures.
• Many of the biases to which case-control
studies are prone occur during this data
collection step.

Basic structure of case-control design
Starting point (present time): from the population (diseased and disease-free), a sample is drawn of diseased (cases) and disease-free (controls).
Trace back to past time:
o Cases: exposed to factor (a); unexposed to factor (b)
o Controls: exposed to factor (c); unexposed to factor (d)
The odds (“chance”) of exposure is calculated and compared between both groups.

Minimizing biases in Case-Control
1- Cases should be selected to be representative of all patients who develop the disease.
This is difficult when using a hospital series, because patients treated at a tertiary referral center usually differ from those treated at smaller hospitals or those who do not seek care at all.

2- Controls should be representative of the general healthy population who do not develop the disease.
- Select a random population sample and exclude the rare cases of disease it might include. OR
- Use multiple control groups. For hospitalized patients, select controls from patients hospitalized for other conditions.
- A neighborhood or other control group may be added to the study to help analyze some of these potential biases.

3- Information should be collected from cases and controls in the same way.
- This is difficult, particularly if case status is known or obvious to the interviewer.
- Interviewers and data collectors who are aware of the study’s hypothesis may be more prone to seek exposure information from cases than from controls.
- Interviewers and data collectors must be trained to ask questions and follow up positive or negative responses regardless of case status.

The dilemma of recall
Human beings interviewed may not recall
information in the same way.
If our child is ill, we likely have thought quite a bit about
what potential exposures may have occurred. Why is the child
sick? If our child is not sick, we may have stopped thinking
about information and not recall it as well.
These are examples of recall bias.

Calculate the difference in Odds for
the included exposures for comparison.

Selection of Cases
Incident cases: patients who are recruited at the time of diagnosis.
1. Less recall bias
2. Less altered behavior
3. But we have to wait for new cases to be diagnosed
Prevalent cases: patients who were already diagnosed before entering the study.
1. Recall bias
2. Altered behavior
3. Risk factors may be related more to survival

Selection of Cases
Sources: hospital patients; patients in physicians’ practices; clinic patients.
Problems:
* Single or multiple hospitals: some hospitals have a greater aggregation of certain risk factors than others.
* Tertiary health care facility: there is a tendency to select severely ill cases, so any risk factors identified may be found only in these severe forms of the disease.

Matching
The process of selecting controls so that they are similar
to the cases in certain characteristics, such as age, race,
sex, socioeconomic status, and occupation.
The aim is to nullify differences in characteristics or exposures other than the one targeted for study.

Matching: Indications
• Controls may be matched to cases for
age, sex, or specific risk factors (e.g.,
smoking) if these are:
1- Known to be related to disease,
2- Can accurately be measured in everyone,
and
3- The intent is to identify additional potential
etiologic factors.
• The objective is to prevent confounding of an association of interest by this factor.

Tip
- Unless one is certain that a given factor is
related to disease etiology, it is probably better
not to match on it so that it can be examined in
analysis.
- If more than one control group is used, one
group might be matched and another
unmatched.

Selection of Controls
Non-hospitalized persons (community-based)
o Sources: probability sample; school rosters; selective service list; insurance company list.
o Neighborhood controls: door-to-door approach or random digit dialing (socio-economic, cultural).
o Best-friend control: similarity in demographic characteristics (lifestyle pattern).
o Spouse or sibling controls: a sibling control may provide some control over genetic differences between cases and controls.
Hospitalized persons
o Captive population.
o They represent a sample of an ill population; hospital patients differ from people in the community.
o Use a sample of all other patients admitted, or select specific other diagnoses?

Types of Matching
Individual Matching (matched pairs)
- For every case included, an identical matched control should be selected: for a 45-year-old white female case, we seek a 45-year-old white female control.
- Used in hospital-based case-control studies.
Group Matching (frequency)
- Selection of controls: the proportion of controls with a certain characteristic is identical to the proportion of cases; if 25% of cases are married, then 25% of controls are married.
- All cases should be selected first, and then the proportions are calculated.
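
As a minimal sketch of group (frequency) matching, the Python snippet below first tallies a characteristic among the selected cases and then works out how many controls are needed in each category. The function name `control_quotas`, the 100 hypothetical cases, and the target of 200 controls are all assumptions made for illustration.

```python
from collections import Counter


def control_quotas(case_characteristics, n_controls):
    """Controls to recruit per category so proportions mirror the cases.

    Quotas are rounded to whole persons, so in awkward cases they may not
    sum exactly to n_controls.
    """
    counts = Counter(case_characteristics)
    n_cases = len(case_characteristics)
    return {
        category: round(n_controls * count / n_cases)
        for category, count in counts.items()
    }


# Hypothetical case group: 25% married, 75% not married
cases = ["married"] * 25 + ["not married"] * 75
quotas = control_quotas(cases, n_controls=200)
print(quotas)  # {'married': 50, 'not married': 150}
```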

Problems with Matching
Practical problems
- When matching on too many characteristics, it becomes very difficult or impossible to identify an appropriate control.
Example: a 48-year-old black female, married, has 4 children, lives in zip code 21209, and works in a photo-processing plant. Find her control?
Conceptual problems
- Once we have matched controls to cases on a given characteristic, we cannot study that characteristic.
Example: marital status and breast cancer. If matching is done on marital status, we are not able to study that factor. Why? Matching ensures the same prevalence of that characteristic in both cases and controls.
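
The conceptual problem can be made concrete with a small Python sketch: once the prevalence of the matched characteristic is forced to be equal in cases and controls, its odds ratio is 1 by construction. The function names and the 40% figure for unmatched controls are hypothetical.

```python
def odds(proportion):
    """Convert a proportion (strictly between 0 and 1) to odds."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    return proportion / (1 - proportion)


def exposure_odds_ratio(p_cases, p_controls):
    """Odds ratio comparing how common a characteristic is in cases vs controls."""
    return odds(p_cases) / odds(p_controls)


# Matched on marital status: 25% married in both groups
print(exposure_odds_ratio(0.25, 0.25))  # 1.0, so no association can be detected

# Hypothetical unmatched controls in which 40% are married
print(round(exposure_odds_ratio(0.25, 0.40), 2))
```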

Uses of Multiple Controls
In case-control studies we usually use more
than one control per case to increase the
power of the study.

1- Multiple controls of the same type.
The power of the study increases as more controls are included for each case, up to 4 controls per case.
Why not keep the ratio of controls to cases 1:1 and just increase the number of cases?
1. For rare diseases (cancer, connective tissue disorders) the number of cases is limited.
2. The limited time frame of the study does not allow inclusion of more cases.
3. Multi-centric collaboration may be absent.
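
Why gains level off at around 4 controls per case can be sketched with a commonly quoted approximation that is not taken from these slides: relative to having unlimited controls, the statistical efficiency of using k controls per case is roughly k/(k+1). The function name `relative_efficiency` is hypothetical, and the formula is an assumption used only to illustrate diminishing returns.

```python
def relative_efficiency(k):
    """Approximate efficiency of k controls per case vs unlimited controls.

    Uses the rule-of-thumb k / (k + 1); this is an assumed approximation,
    not a formula given in the lecture.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k / (k + 1)


for k in range(1, 7):
    print(f"{k} control(s) per case: {relative_efficiency(k):.2f}")
```

Under this approximation the step from 1 to 4 controls moves efficiency from 0.50 to 0.80, while each further control adds only a few hundredths.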

2- Multiple Controls of Different Types
The use of hospital and neighborhood controls:
- To assess the level of exposure among the different control groups in relation to the cases.
- Cases are compared with hospital controls, then with neighborhood controls, to assess any discrepancy in the level of exposure; if one is present, the reason should be sought.

Reporting case-control
Presentation of Findings: Case-control design
Presence of disease
Exposure | Number with disease (cases) | Number without disease (controls) | Total
Present | a | b | a+b
Absent | c | d | c+d
Total | a+c | b+d | N
Exposure among cases = a/(a+c)
Odds of exposure among cases = [a/(a+c)] / [1 - a/(a+c)] = a/c
Exposure among controls = b/(b+d)
Odds of exposure among controls = [b/(b+d)] / [1 - b/(b+d)] = b/d
Odds ratio = (a/c) / (b/d) = ad/bc
Tests: Chi-square, t-test or equivalent, and odds ratio.
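
The odds calculations above can be sketched in Python, using the same cell layout as the table (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls). The function name `odds_ratio` and the counts are hypothetical. The 95% confidence interval uses the standard log-based (Woolf) approximation, which is an addition for illustration and not part of the slide.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate confidence interval.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this sketch")
    estimate = (a * d) / (b * c)  # same as (a/c) / (b/d)
    # Standard error of ln(OR), Woolf's method (assumed, not from the slide)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed)
estimate, lower, upper = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 suggests the exposure is more (or less) common among cases than chance alone would explain, subject to the biases discussed for this design.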

Biases: selection and recruitment
• Volunteer bias: “healthy volunteer” effect.
• Prevalence or incidence bias (survival
bias): (Myocardial infarction studies).
• Membership bias: “healthy worker” or “healthy
migrant” effect.

Biases: The data collection phase
• Diagnostic suspicion bias: knowledge of a subject’s prior exposure (e.g., hormone replacement therapy) influences both the intensity and the outcome of the diagnostic process (e.g., screening for endometrial cancer).
• Exposure suspicion bias: knowledge of a subject’s disease status (e.g., presence of mesothelioma) influences both the intensity and the outcome of a search for exposure (e.g., to asbestos).
• Family information bias: e.g., a rare familial condition that is never mentioned until a family member begins to demonstrate some of the same symptoms.

Advantages
1- The only practical way to study the etiology of rare diseases.
2- Fewer subjects are needed. Schlesselman estimated that a cohort study of a condition occurring at a rate of 8 cases/1,000 would require observation of 3,889 exposed and 3,889 unexposed subjects to detect a two-fold increase in risk, whereas a case-control study would require only 188 cases and 188 controls. If the rate were lower, at 2 cases/1,000, cohorts of 15,700 exposed and 15,700 unexposed subjects would be needed to detect a two-fold increased risk; a case-control study would still require only 188 cases and 188 controls.
3- Multiple etiologic factors can be studied simultaneously. If cases are representative of all the cases, controls are representative of persons without the disease, and data are collected similarly in cases and controls, the associations and risk estimates are consistent with other types of studies.
Disadvantages
1- Do not estimate incidence or prevalence.
2- Relative risk is only indirectly measured by the odds ratio.
3- Selection, recall, and other biases.
4- Associations found must be examined for biologic plausibility and consistency with estimates from other study designs before causality can be inferred.
5- It is difficult to study exposures that are rare in the overall population.
6- Temporal relationships between exposure and disease are difficult to document.

Activity 4
• Control selection
• Matching and why?
• Biases and limitations

Cohort study (marching towards outcomes)
The term cohort has military, not medical
roots.
A cohort was a 300-600-man unit in the
Roman army, ten cohorts formed a legion.
A cohort study consists of bands or groups of
persons marching forward in time from an
exposure to one or more outcomes.

Cohort: general
• Examines the association between a particular exposure (risk factor) and the subsequent development of disease.
• “Prospective”: exposure (risk factor) information is collected first, and then disease outcomes accrue over time.
• Exposed and non-exposed groups are compared for their rates (incidence) of disease.

Basic structure of cohort study
Starting point (present time): from the population (diseased and disease-free), a disease-free sample is drawn and classified as exposed or unexposed to the factor.
Follow to future time:
o Exposed to factor: develop disease (a); disease-free (b)
o Unexposed to factor: develop disease (c); disease-free (d)
The incidence of disease in each group is compared, and the relative risk is calculated for the exposure.

What To Look For In Cohort Studies
o Who is at risk? All participants in a cohort study must be at risk of developing the outcome.
o Who is exposed? A clear, unambiguous definition of exposure at the outset is required (sometimes quantifying the exposure by degrees, rather than yes/no).
o Who is an appropriate control? The unexposed should be similar to the exposed in all aspects except for the exposure.
o Have outcomes been assessed equally? Outcomes must be defined in advance; they should be clear, measurable and specific.

Design of Cohort
First select | Then follow to see whether: Disease develops | Disease does not develop | Totals | Incidence rate of disease
Exposed | a | b | a+b | a/(a+b)
Not exposed | c | d | c+d | c/(c+d)
The relative risk or risk ratio (RR) is the risk in those with a characteristic (exposed), a/(a+b), divided by the risk in those without the characteristic (non-exposed), c/(c+d).
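
The relative risk calculation can be sketched in Python using the cohort table's cell labels (a and b for the exposed who do and do not develop disease, c and d for the non-exposed). The function name `relative_risk` and the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Incidence in exposed, incidence in non-exposed, and their ratio (RR).

    a: exposed, disease develops        b: exposed, disease does not develop
    c: not exposed, disease develops    d: not exposed, disease does not develop
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each exposure group must contain at least one person")
    if c == 0:
        raise ValueError("RR is undefined when no unexposed person develops disease")
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 1,000 exposed and 10 of 1,000 non-exposed develop disease
risk_e, risk_u, rr = relative_risk(a=30, b=970, c=10, d=990)
print(f"Risk exposed = {risk_e:.3f}, risk non-exposed = {risk_u:.3f}, RR = {rr:.1f}")
```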

Selection of Study Population
Comparison of outcomes in an exposed group and a non-exposed group (or a group with a certain characteristic and a group without).
Two approaches:
1- Create a study population by selecting groups for inclusion on the basis of whether or not they were exposed (e.g., occupationally exposed cohorts).
2- Select a defined population before any of its members become exposed or before their exposures are identified; selection is by a factor not related to exposure (e.g., residence). Take histories or perform tests, and then separate into exposed and non-exposed.
In both cases we wait for the outcome.

Data collection in cohort:
forwards and backwards
A cohort study follows two or more groups from exposure to outcome.
It compares the experience of a group exposed to some factor with another group not exposed to that factor.
The frequency of the outcome in each group gives the evidence of association between exposure and outcome.

Types of Cohort Studies: Concurrent (Prospective)
Uses a defined population (e.g., smoking and lung cancer in a population of elementary school children).
[Figure: time frame for a hypothetical concurrent cohort study begun in 2000]
2000: Defined population (non-randomized)
2010: Exposed (smoke) / Non-exposed (non-smoker)
2020: In each group: disease / no disease

Types of Cohort Studies: Retrospective (Historical)
Uses a defined population (e.g., an old roster of elementary school children is found).
[Figure: time frame for a hypothetical retrospective cohort study begun in 2000]
1980: Defined population (non-randomized)
1990: Exposed (smoke) / Non-exposed (non-smoker)
2000: In each group: disease / no disease
Exposure assessment: surveyed for smoking habit.

Historical cohort
• Plassman and colleagues used World War II medical records to identify persons with severe head trauma and controls without head trauma, and then evaluated survivors 50 years later for dementia.
• Early-life head trauma was related to dementia in old age.
• How is this different from a case-control study? The sample in the Plassman study was chosen on the basis of exposure (head trauma), not outcome.

Advantages of Cohort Design
I. The temporal sequence between the putative cause and the outcome is usually clear.
II. Investigation of multiple outcomes.
III. Study of rare exposures.
IV. Reduced risk of survival bias.
V. Calculation of incidence rates, relative risks, and confidence
intervals.
VI. Survival rates, survival curves and hazard ratios.

Potential Biases in Cohort Studies
1) Bias in assessment of the outcome (blinding or masking is used to avoid it).
2) Information bias (particularly in historical or retrospective cohorts).
3) Bias from non-response and losses to follow-up (attrition). When losses to follow-up exceed 10%, and especially 20%, of the people in a study, questions are raised about the correctness and generalizability of the results.
4) Analytic bias (blinding is needed).
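The 10% and 20% attrition thresholds mentioned above can be checked with a few lines of Python. This is only a sketch: the function name and the wording of the returned labels are assumptions made for illustration, and the enrolment figures are hypothetical.

```python
def attrition_level(enrolled, lost_to_follow_up):
    """Return the proportion lost to follow-up and a rough flag.

    Thresholds follow the rule of thumb in the slide: more than 10%
    raises questions, more than 20% raises serious questions.
    The label strings are illustrative, not standard terminology.
    """
    if enrolled <= 0 or not 0 <= lost_to_follow_up <= enrolled:
        raise ValueError("counts must satisfy 0 <= lost <= enrolled, enrolled > 0")
    proportion = lost_to_follow_up / enrolled
    if proportion > 0.20:
        level = "serious concern"
    elif proportion > 0.10:
        level = "questionable"
    else:
        level = "acceptable"
    return proportion, level


# Hypothetical cohort: 1000 enrolled, 150 lost to follow-up
print(attrition_level(1000, 150))  # (0.15, 'questionable')
```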

Problem in assessment
• All study variables (disease outcomes) should be determined before the study’s inception and held constant during the course of the study.
• This is difficult to do in a very long-term study because diagnostic approaches and techniques evolve over time.
- Criteria for myocardial infarction in 1948, for example, when the Framingham Heart Study began, were very different from what they are now.
- Cranial imaging has improved the detection of stroke, substantially increasing its measured incidence and reducing its case fatality owing to increased detection of very mild cases.
- Outcome variables vary in degree of confidence (“hardness”) and reproducibility. Only death is considered to be a “hard” outcome because it is an unambiguous state. All other variables, even supposedly objective laboratory values, may have different levels of subjectivity or measurement error.

When Is a Cohort Study Warranted?
A. Good evidence suggests an association of a disease with a certain exposure.
B. Attrition can be minimized.
C. Interval between exposure and development of
outcome is relatively short.

Reporting of Cohort Studies
• The first table provides demographic and other prognostic factors for both groups, with hypothesis testing (P value) to show the likelihood that observed differences could be due to chance.
• For dichotomous outcome measures (sick/well), provide raw data sufficient for the reader to confirm the results.
• For cumulative incidence, calculate the proportion who develop the outcome during the specified study interval.
• For incidence rates, the value is expressed per unit of time.
• The relative risks and their confidence intervals should be provided.
Use of P values should not replace interval estimation (relative risk with confidence interval).
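The measures listed above can be sketched in Python as follows. The confidence interval uses a common large-sample approximation on the log scale, where the standard error of ln(RR) is sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)); the slides do not specify a method, so this choice, the function names, and the counts are all assumptions for illustration.

```python
import math


def cumulative_incidence(new_cases, at_risk_at_start):
    """Proportion who develop the outcome during the study interval."""
    return new_cases / at_risk_at_start


def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time (e.g., per person-year)."""
    return new_cases / person_time


def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk with an approximate 95% confidence interval.

    Cells follow the cohort 2x2 table: a, b = exposed with/without disease;
    c, d = non-exposed with/without disease. Assumes a > 0 and c > 0.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical data, for illustration only
print(cumulative_incidence(30, 100))   # 0.3 over the study interval
print(incidence_rate(30, 450))         # cases per person-year
rr, lower, upper = relative_risk_ci(a=30, b=70, c=10, d=90)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Reporting the interval alongside the point estimate lets the reader judge the precision of the RR, which a P value alone does not convey.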

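Nested Case-Control Studies
[Figure: nested case-control design, over time]
Population (cohort): initial data and/or specimens obtained.
Over time, some members develop disease and others do not develop disease.
Those who develop disease: cases.
A subgroup of those who do not develop disease: selected as controls.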

Advantages of Nested Case-Control Design
- Interviews are performed at the beginning of the study (baseline), so the data are obtained before any disease has developed and the problem of possible recall bias is eliminated.
- When abnormalities in biologic characteristics are found in specimens obtained years before the development of clinical disease, they are more likely to represent risk factors or pre-morbid characteristics than a manifestation of early, sub-clinical disease.
- A temporal association can be established, which cannot be concluded from the ordinary case-control design.
- More economical to conduct.

How to Choose the Study Design?
Study design             Selection of subjects by status   Information collected on exposure   Information collected on disease
Cross-sectional          No                                Current                             Current
Case-control             Disease                           Past                                Current
Cohort, prospective      Exposure                          Current                             Future
Cohort, retrospective    Exposure                          Past                                Current

How to Choose the Study Design? (cont.)
Options              Case-control   Concurrent cohort   Retrospective cohort
Study time           Short          Long                Short
Cost                 Low            High                Low
Rare diseases        Yes            No                  No
Sample size          Small          Large               Large
Loss to follow-up    No             Yes                 Yes
Incidence            No             Yes                 Yes
Relative risk        Approx.        Yes                 Yes
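The "Approx." entry in the table reflects that a case-control study cannot measure incidence, so the relative risk is usually approximated by the odds ratio, ad/bc, which is close to the RR when the disease is rare. The sketch below compares the two on one hypothetical table with a rare outcome; the function names and counts are assumptions for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (cross-product ratio) from a 2x2 table: ad / bc."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk ratio from the same table: [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical rare outcome: 30 of 10,000 exposed and 10 of 10,000 non-exposed
a, b, c, d = 30, 9970, 10, 9990
print(round(relative_risk(a, b, c, d), 3))
print(round(odds_ratio(a, b, c, d), 3))
```

With a common outcome the two measures diverge, so the approximation should be used with that caveat in mind.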

Group Activity 5 and post-test
Type of cohort design
What precautions to lower
attrition?
Assessment of the outcome
Limitations
Post-test: selection of appropriate research design

RECAP quiz

References
1. Altman DG. Practical Statistics for Medical Research. Boca Raton, FL: Chapman & Hall; 1991.
2. Agresti A. Categorical Data Analysis. 2nd ed. Hoboken, NJ: Wiley; 2002.
3. Gordis L. Epidemiology. Philadelphia, PA: Harcourt Brace & Company; 1996.
4. Hulley SB, Cummings SR, Browner WS, Grady D, Hearst N, Newman TB. Designing Clinical Research. 2nd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2001.
5. Elwood M. Critical Appraisals of Epidemiological Studies and Clinical Trials. 2nd ed. Great Britain: Oxford University Press; 1998.
6. Fleiss JL. The Design and Analysis of Clinical Experiments. New York: Wiley; 1999.
7. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med 2007;(10):4.
8. Friedman GD. Cigarette smoking and geographic variation in coronary heart disease mortality in the United States. J Chronic Dis 1967;20:769-79.
9. Joo JB, Cummings AJ. Acute thoracoabdominal aortic dissection presenting as painless, transient paralysis of the lower extremities: a case report. Emerg Med 2000;19:333-7.
10. CDC. Pneumocystis pneumonia, Los Angeles. MMWR 1981;30:250-2.
11. Hedley AA, Ogden CL, Johnson CL, Carroll MD, Curtin LR, Flegal KM. Prevalence of overweight and obesity among US children, adolescents, and adults, 1999-2002. JAMA 2004;291:2847-50.
12. Klungel OH, Kaplan RC, Heckbert SR, Smith NL, Lemaitre RN, Longstreth Jr WT, et al. Control of blood pressure and risk of stroke among pharmacologically treated hypertensive patients. Stroke 2000;31:420-4.
13. Flegal KM, Graubard BI, Williamson DF, Gail MH. Cause-specific excess deaths associated with underweight, overweight, and obesity. JAMA 2007;298:2028-37.
14. Lilienfeld AM, Lilienfeld DE. Foundations of Epidemiology. 3rd ed. New York: Oxford University Press, Inc; 1980.
15. Strong JP, Malcom GT, McMahon CA, Tracy RE, Newman WP, Herderick EE, Cornhill JF, for the Pathobiological Determinants of Atherosclerosis in Youth Research Group. Prevalence and extent of atherosclerosis in adolescents and young adults: implications for prevention from the Pathobiological Determinants of Atherosclerosis in Youth Study. JAMA 1999;281:727-35.

16. Newman AB, Naydeck B, Sutton-Tyrrell K, Edmundowicz D, Gottdiener J, Kuller LH. Coronary artery calcification in older adults with minimal clinical or subclinical cardiovascular disease. J Am Geriatr Soc 2000;48:256-63.
17. Schlesselman JJ. Case-Control Studies: Design, Conduct, and Analysis. New York: Oxford University Press, Inc; 1982.
18. Sackett DL. Bias in analytic research. J Chronic Dis 1979;2:51-63.
19. Herbst AL, Ulfelder H, Poskaner DC. Adenocarcinoma of the vagina: association of maternal stilbesterol therapy with tumor appearance in young women. N Engl J Med 1974;284:878-81.
20. Schlesselman JJ. Case-Control Studies: Design, Conduct, and Analysis. New York: Oxford University Press, Inc.; 1982. p. 17-19.
21. Plassman BL, Havlik RJ, Steffens DC, Helms MJ, Newman TN, Drosdick D, et al. Documented head injury in early adulthood and risk of Alzheimer’s disease and other dementias. Neurology 2000;55:1158-66.
22. Doll R, Hill AB. The mortality of doctors in relation to their smoking habits: a preliminary report. Br Med J 1954;228(i):1451-5.
23. Doll R, Peto R, Boreham J, Sutherland I. Mortality in relation to smoking: 50 years observations on male British doctors. Br Med J 2004;328:1519-33.
24. Brackbill RM, Hadler JL, DiGrande L, Ekenga CC, Farfel MR, Friedman S, et al. Asthma and posttraumatic stress symptoms 5 to 6 years following exposure to the World Trade Center terrorist attack. JAMA 2009;302:502-16.
25. Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics 1988;44:1049-60.
26. Ridker PM, Hennekens CH, Miletich JP. G20210A mutation in prothrombin gene and risk of myocardial infarction, stroke, and venous thrombosis in a large cohort of US men. Circulation 1999;99:999-1004.
27. Roest M, van der Schouw YT, de Valk B, Marx JJM, Tempelman MJ, de Groot PG, Sixma JJ, Banga JD. Heterozygosity for a hereditary hemochromatosis gene is associated with cardiovascular death in women. Circulation 1999;100:268-73.
28. Barlow WE, Ichikawa L, Rosner D, Izumi A. Analysis of case-cohort designs. J Clin Epidemiol 1999;52:1165-72.
29. Laurion JP. Troponin I: an update on clinical utility and method standardization. Ann Clin Lab Sci 2000;30:412-21.
30. Schulman KA, Berlin JA, Harless W, Kerner JF, Sistrunk S, Gersh BJ, et al. The effect of race and sex on physicians’ recommendations for cardiac catheterization. N Engl J Med 1999;340:618-26.
31. Schwartz LM, Woloshin S, Welch HG. Misunderstandings about the effects of race and sex on physicians’ referrals for cardiac catheterization. N Engl J Med 1999;341:279-83.

32. Women’s Health Initiative Study Group. Design of the Women’s Health Initiative clinical trial and observational study. Controlled Clin Trials 1998;19:61-109.
33. Prentice RL, Sheppard L. Dietary fat and cancer: consistency of the epidemiologic data, and disease prevention that may follow from a practical reduction in fat consumption. Cancer Causes Control 1990;1:81-97.
34. Prentice RL, Pettinger M, Anderson GL. Statistical issues arising in the Women’s Health Initiative. Stat Med 2005;61:899-910.
35. Women’s Health Initiative Investigators. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med 2003;349:523-34.
36. Prentice RL. Observational studies, clinical trials, and the Women’s Health Initiative. Lifetime Data Anal 2007;13:449-62.
37. Prentice RL, Langer RD, Stefanick ML, Howard BV, Pettinger M, Anderson GL, et al. Combined analysis of Women’s Health Initiative observational and clinical trial data on postmenopausal hormone treatment and cardiovascular disease. Am J Epidemiol 2006;163:589-99.
38. Prentice RL, Langer RD, Stefanick ML, Howard BV, Pettinger M, Anderson GL, et al. Combined postmenopausal hormone therapy and cardiovascular disease: toward resolving the discrepancy between observational studies and the Women’s Health Initiative clinical trial. Am J Epidemiol 2005;163:404-14.

Thank you
