This document discusses sample size estimation and determination. It begins by defining what a sample is and why sample size is important. It describes factors that affect sample size, such as desired level of accuracy and precision. Several methods for calculating sample size are presented, including formulas for cross-sectional, case-control, and comparative studies using both qualitative and quantitative variables. Considerations like power, effect size, and study design are discussed. Examples are provided to demonstrate how to use formulas and tables to estimate sample size for different study designs.
2. CONTENT
What is a sample?
Sample size determination
How large a sample do I need?
What are the methods of determining it?
What are the factors that affect it?
How do you determine it?
How do you use it?
Our take home
3. WHAT IS A SAMPLE?
This is the sub-population studied in order to make an inference about a reference population.
In a census, the sample size is equal to the population size.
However, in research, because of time and budget constraints, a representative sample is normally used.
The larger the sample size, the more precise the findings of a study.
4. SAMPLE SIZE DETERMINATION
Sample size determination is the mathematical estimation of the number of subjects/units to be
included in a study.
When a representative sample is taken from a population, the findings are generalized to the
population.
Optimum sample size determination is required for the following reasons:
To allow for appropriate analysis
To provide the desired level of accuracy
To ensure the validity of significance tests.
5. HOW BIG A SAMPLE DO YOU NEED?
If the sample is too small:
Even a well-conducted study may fail to answer its research question.
It may fail to detect important effects or associations.
It may estimate these effects or associations imprecisely.
If the sample size is too large:
The study will be difficult and costly.
Time constraints arise.
It involves extra patients unnecessarily.
Accuracy may be lost.
Hence, the optimum sample size must be determined before the study begins.
6. REMEMBER
Random error: errors that occur by chance. Sources include sampling variability, subject-to-subject
differences and measurement errors. It can be reduced by averaging, increasing the sample
size and repeating the experiment.
Systematic error (bias): deviations that are not due to chance alone. Several factors,
e.g. patient selection criteria, may contribute. It can be reduced by good study design and conduct
of the experiment. A strong bias can yield an untrue estimate.
Precision: the degree to which a variable has the same value when measured several
times. It is a function of random error.
Accuracy: the degree to which a variable actually represents the true value. It is a function of
systematic error.
8. REMEMBER
Null hypothesis: the statement that there is no difference among groups or no association
between the predictor and the outcome variable. This is the hypothesis that is tested.
Alternative hypothesis: it contradicts the null hypothesis. The alternative hypothesis cannot be
tested directly; it is accepted by exclusion if the test of significance rejects the null hypothesis.
There are two types: one-tailed (one-sided) and two-tailed (two-sided).
9. REMEMBER
Since our decision is based on the sample we chose from the population, there is a possibility
that we make a wrong decision.
Type I (alpha; α) error: it occurs if an investigator rejects a null hypothesis that is actually
true in the population. The probability of making a type I error is called the level of
significance; it is commonly set at 0.05 (5%). Sample size is inversely related to the type I
error: a smaller α requires a larger sample.
Type II (beta; β) error: it occurs if the investigator fails to reject a null hypothesis that is
actually false in the population. A type II error is frequently due to a small sample size. The
exact probability of a type II error is generally unknown.
Power (1-β): the probability that the test will correctly identify a true difference, effect or
association. The required sample size increases with the desired power of the study. A
well-designed trial should have a power of at least 0.8 (80%).
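The Z values quoted throughout this document (1.96, 2.58, 0.84, 1.28) are quantiles of the standard normal distribution. As a minimal Python sketch (the function names are our own; `statistics.NormalDist` needs Python 3.8 or later), they can be recovered from α and the desired power:

```python
from statistics import NormalDist


def z_alpha_two_sided(alpha: float) -> float:
    """Z1-α/2: standard normal deviate for a two-sided test at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_beta(power: float) -> float:
    """Zβ: standard normal deviate for the desired power (1 - beta)."""
    return NormalDist().inv_cdf(power)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: Z = {z_alpha_two_sided(alpha):.2f}")
for power in (0.80, 0.90):
    print(f"power = {power:.0%}: Z = {z_beta(power):.2f}")
```

Rounded to two decimals this prints 1.96, 2.58, 0.84 and 1.28.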
10. POSSIBLE RESULTS OF ANY HYPOTHESIS TESTING
Reality (null hypothesis) | Researcher's decision: accepted | Researcher's decision: rejected
True | Correct | Type I error (alpha)
False | Type II error (beta) | Correct (power = 1 - beta)
11. REMEMBER
Effect size: it represents the difference that would be of clinical or biological
significance. A large sample is needed to detect a small difference; thus, the sample size
is inversely related to the effect size. The bigger the effect in the population, the easier it
is to detect and the smaller the type II error for a given sample size.
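The inverse relation can be made concrete with the comparison-of-means formula used for case-control studies later in this document (equal groups, SD = 1 kg, 5% significance, 80% power). In this minimal Python sketch the effect sizes are illustrative values, not taken from any study, and the function name is our own:

```python
import math


def n_per_group_means(effect: float, sd: float = 1.0,
                      z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Per-group size for comparing two means with equal groups, rounded up."""
    return math.ceil(2 * sd ** 2 * (z_alpha + z_beta) ** 2 / effect ** 2)


# Illustrative effect sizes (mean differences) in the same units as sd (kg).
for effect in (0.5, 0.25, 0.125):
    print(effect, n_per_group_means(effect))
```

Each halving of the effect size roughly quadruples the per-group sample (63, 251, 1004), because the effect size enters the formula squared in the denominator.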
Design effect: in cluster sampling, the sample size depends on the number of clusters and the
variance between and within the clusters included in the study. This is expressed as
a constant called the 'design effect', often between 1.0 and 2.0, determined from
previous studies. The sample size calculated for simple random sampling is multiplied by the
design effect to obtain the sample size for the cluster sample.
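Applying a design effect is a single multiplication followed by rounding up. A minimal Python sketch; the function name and input values are hypothetical:

```python
import math


def cluster_sample_size(srs_n: float, design_effect: float) -> int:
    """Inflate a simple-random-sampling size by the design effect, rounding up."""
    return math.ceil(srs_n * design_effect)


# Hypothetical: 196 subjects needed under simple random sampling, design effect 1.5.
print(cluster_sample_size(196, 1.5))  # 294
```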
12. REMEMBER
Odds ratio (OR): a measure of effect size describing the strength of association
or non-independence between two binary variables.
Relative risk (RR): the ratio of the probability of the event occurring in the
exposed group to the probability of it occurring in a non-exposed group.
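Both measures come from a 2 × 2 table of exposure against outcome. A minimal Python sketch with hypothetical counts (the cell labels a to d are our own convention):

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a*d)/(b*c).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    return (a * d) / (b * c)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = risk in exposed / risk in unexposed = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the outcome.
a, b, c, d = 30, 70, 10, 90
print(f"OR = {odds_ratio(a, b, c, d):.2f}")     # OR = 3.86
print(f"RR = {relative_risk(a, b, c, d):.2f}")  # RR = 3.00
```

In a case-control study only the OR can be estimated directly, because subjects are sampled by outcome status; when the outcome is rare, the OR approximates the RR.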
13. PRACTICAL ISSUES IN DETERMINING SAMPLE SIZES
Importance of the research issue: if the results of the survey research are very
critical, the sample size should be increased.
Heterogeneity of the population: the higher the standard deviation, the larger the
sample size required.
Funding: quite often, budgetary constraints limit the sample size for the study.
Number of sub-groups to analyse: if multiple sub-groups in a population are going to be
analysed, the sample size should be increased to ensure adequate numbers in each sub-group.
Sample size determination can be addressed at two stages:
during the planning stage, while designing the study; or
through post-hoc power analysis, to explain the results if a study did not find any
significant effects.
14. APPROACH FOR ESTIMATING SAMPLE SIZE/POWER ANALYSIS
Approaches for estimating sample size and performing power analysis depend primarily on:
The study design: such as case-control design, cohort design, cross-sectional studies,
clinical trials, diagnostic test studies, etc.
The main outcome measure of the study: such as the odds ratio in case-control studies and
the relative risk in cohort studies.
Statistical inference from the study results: this requires specifying the estimate and its
corresponding confidence interval, and the test of significance used for hypothesis testing
(e.g. chi-square test, t-test, F-test, etc.).
15. PROCEDURE FOR CALCULATING SAMPLE SIZE.
There are four procedures that could be used for calculating sample size:
1. Use of formulae
2. Ready-made tables
3. Nomograms
4. Computer software
16. USE OF FORMULAE FOR SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS IN MEDICAL RESEARCH
17. SAMPLE SIZE CALCULATION FOR CROSS-SECTIONAL STUDIES/SURVEYS
Cross-sectional studies or surveys are done to estimate a population parameter, such as the
prevalence of a disease in a community or the average value of a quantitative variable in a
population.
The sample size formulae for qualitative and quantitative variables are different.
19. DESCRIPTIVE STUDY: FOR QUALITATIVE VARIABLE:
WHEN PROPORTION IS THE PARAMETER OF OUR STUDY
Sample size: $n = \dfrac{Z_{1-\alpha/2}^{2}\, p(1-p)}{d^{2}}$
Where
Z1-α/2= standard normal deviate; (at 5% type I error (P<0.05) it is 1.96 and
at 1% type I error (P<0.01) it is 2.58).
p=expected proportion in population based on previous studies or pilot
studies.
d= absolute error or precision – has to be decided by researcher.
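The formula translates directly into code. A minimal Python sketch (the function name is our own) that rounds up to the next whole subject, as is usual for sample sizes:

```python
import math


def n_for_proportion(p: float, d: float, z: float = 1.96) -> int:
    """Subjects needed to estimate a proportion p to within +/- d (absolute).

    n = z^2 * p * (1 - p) / d^2, rounded up. z = 1.96 corresponds to 5% type I error.
    """
    if not 0 < p < 1 or d <= 0:
        raise ValueError("need 0 < p < 1 and d > 0")
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


print(n_for_proportion(p=0.15, d=0.05))  # 196
print(n_for_proportion(p=0.50, d=0.05))  # 385
```

When no prior estimate of p is available, p = 0.5 maximises p(1 - p) and therefore gives the most conservative (largest) sample size.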
20. DESCRIPTIVE STUDY: FOR QUALITATIVE VARIABLE:
WHEN PROPORTION IS THE PARAMETER OF OUR STUDY
Example:
A researcher wants to estimate the proportion of patients in the paediatric age group
who have hypertension in a city. According to previously published studies, the proportion
of hypertensives is not expected to exceed 15%. The researcher wants to calculate the
sample size with a precision (absolute error) of 5% and a type I error of 5%.
ANSWER:
p = 0.15, Z1-α/2 = 1.96 for α at 5%, d = 0.05
Sample size = $1.96^{2} \times 0.15 \times (1-0.15) / 0.05^{2} \approx 195.9$
The least number of subjects to be selected = 196.
21. DESCRIPTIVE STUDY: FOR QUANTITATIVE VARIABLE:
WHEN AVERAGE IS THE PARAMETER OF OUR STUDY
Sample size: $n = \dfrac{Z_{1-\alpha/2}^{2}\, SD^{2}}{d^{2}}$
Where
Z1-α/2= standard normal deviate; (at 5% type I error (P<0.05) it is 1.96 and
at 1% type I error (P<0.01) it is 2.58).
SD =standard deviation of the variable based on previous studies or pilot
studies.
d= absolute error or precision – has to be decided by researcher.
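The same kind of sketch for a mean, again with a function name of our own. Because it always rounds up (so that the precision target is met), it can return one more subject than a result rounded to the nearest whole number:

```python
import math


def n_for_mean(sd: float, d: float, z: float = 1.96) -> int:
    """Subjects needed to estimate a mean to within +/- d, given the SD.

    n = z^2 * SD^2 / d^2, rounded up. sd and d must be in the same units.
    """
    if sd <= 0 or d <= 0:
        raise ValueError("sd and d must be positive")
    return math.ceil((z * sd / d) ** 2)


print(n_for_mean(sd=25, d=5))  # 97 (96.04 rounded up)
```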
22. DESCRIPTIVE STUDY: FOR QUANTITATIVE VARIABLE:
WHEN AVERAGE IS THE PARAMETER OF OUR STUDY
Example:
A researcher is interested in knowing the average systolic blood pressure in the paediatric
age group in a city, at 5% type I error and a precision of 5 mm Hg on either side of the
mean systolic BP (±5 mm Hg). The standard deviation, based on previously done studies,
is 25 mm Hg.
ANSWER:
SD = 25 mm Hg, Z1-α/2 = 1.96 for α at 5%, d = 5 mm Hg
Sample size = $1.96^{2} \times 25^{2} / 5^{2} = 96.04$
The least number of subjects to be selected = 97 (96.04 rounded up, so that the precision target is met).
23. USE OF FORMULAE FOR SAMPLE SIZE CALCULATION FOR COMPARISON GROUPS/INDEPENDENT CASE-CONTROL STUDIES
In case-control studies, cases (the group with the disease/condition under consideration)
are compared with controls (the group without the disease/condition under consideration)
regarding exposure to the risk factor in question.
24. USE OF FORMULAE FOR SAMPLE SIZE
CALCULATION FOR CASE-CONTROL STUDIES:
FOR QUALITATIVE VARIABLE
1. Suppose a researcher wants to see the link between childhood sexual abuse and
psychiatric disorders in adulthood.
2. He will take a sample of adults with psychiatric disorders and another sample of normal
adults with no psychiatric disorder.
3. He will then go back retrospectively to look for a history of childhood sexual abuse in
both groups.
4. Here, exposure to childhood sexual abuse is a qualitative variable, hence this formula
will be used:
25. USE OF FORMULAE FOR SAMPLE SIZE
CALCULATION FOR CASE-CONTROL STUDIES:
FOR QUALITATIVE VARIABLE
Sample size: $n = \dfrac{r+1}{r} \cdot \dfrac{p^{*}(1-p^{*})\,(Z_{\beta} + Z_{1-\alpha/2})^{2}}{(p_1 - p_2)^{2}}$
Where
r = ratio of controls to cases, 1 for equal number of cases and controls
p*= average proportion exposed = (proportion in cases + proportion in controls)/2
Zβ = standard normal variate for power; for 80% power it is 0.84 and for 90% power it is 1.28.
Z1-α/2= standard normal deviate; (at 5% type I error (P<0.05) it is 1.96 and at 1% type I error
(P<0.01) it is 2.58).
p1-p2 = effect size; difference in proportions expected based on previous studies. p1 is the
proportion in cases and p2 is the proportion in controls.
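A minimal Python sketch of this formula (the function name is our own). It returns the number of cases, rounded up; the number of controls is r times that. The Z defaults correspond to 5% two-sided significance and 80% power:

```python
import math


def n_cases_proportions(p1: float, p2: float, r: float = 1.0,
                        z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Number of cases for a case-control study with a binary exposure.

    p1, p2: expected proportions exposed among cases and controls.
    r: controls per case (controls needed = r * cases).
    p* is the simple average of p1 and p2, as defined in the text.
    """
    p_star = (p1 + p2) / 2
    n = ((r + 1) / r) * p_star * (1 - p_star) * (z_beta + z_alpha) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)


print(n_cases_proportions(p1=0.35, p2=0.20))       # 139 cases (and 139 controls)
print(n_cases_proportions(p1=0.35, p2=0.20, r=2))  # 105 cases (and 210 controls)
```

Raising r to 2 (keeping p* as the simple average, as the text defines it) reduces the cases needed but increases the total number of subjects.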
26. USE OF FORMULAE FOR SAMPLE SIZE
CALCULATION FOR CASE-CONTROL STUDIES:
FOR QUALITATIVE VARIABLE
Example
If the researcher wants to calculate the sample size for the above-mentioned case-control
study, fixing the power of the study at 80%, assuming the expected proportions exposed are
0.35 and 0.20 for cases and controls respectively, and wanting equal numbers of cases and
controls, then the sample size per group will be:
Answer
r = 1, p* = 0.275, Zβ = 0.84, Z1-α/2 = 1.96
Sample size = $2(0.275)(0.725)(0.84+1.96)^{2} / (0.35-0.20)^{2} \approx 138.9$
Sample size = 139
So, the researcher has to recruit at least 139 cases and 139 controls.
27. USE OF FORMULAE FOR SAMPLE SIZE
CALCULATION FOR CASE CONTROL STUDIES:
FOR QUANTITATIVE VARIABLE
1. Suppose a researcher wants to see the association between birth weight and diabetes
in adulthood.
2. He will take a sample of adults with diabetes and another sample of normal adults with
no diabetes.
3. He will then go back retrospectively to look at data on their birth weight.
4. Here, birth weight is a quantitative variable, hence this formula will be used:
28. USE OF FORMULAE FOR SAMPLE SIZE
CALCULATION FOR CASE CONTROL STUDIES:
FOR QUANTITATIVE VARIABLE
Sample size: $n = \dfrac{r+1}{r} \cdot \dfrac{SD^{2}\,(Z_{\beta} + Z_{1-\alpha/2})^{2}}{d^{2}}$
Where
r = ratio of controls to cases, 1 for equal number of cases and controls
SD = standard deviation; based on previous studies.
Zβ = standard normal variate for power; for 80% power it is 0.84 and for 90% power it is
1.28.
Z1-α/2 = standard normal deviate; (at 5% type I error (P<0.05) it is 1.96 and at 1% type I
error (P<0.01) it is 2.58).
d = expected mean difference between cases and controls, based on previous studies.
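A corresponding Python sketch (the function name is our own); SD and d must be in the same units, e.g. both in kg:

```python
import math


def n_cases_means(sd: float, d: float, r: float = 1.0,
                  z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Number of cases for a case-control study with a quantitative variable.

    sd: standard deviation; d: expected mean difference (same units as sd).
    r: controls per case (controls needed = r * cases).
    """
    n = ((r + 1) / r) * sd ** 2 * (z_beta + z_alpha) ** 2 / d ** 2
    return math.ceil(n)


print(n_cases_means(sd=1.0, d=0.250))               # 251 (80% power)
print(n_cases_means(sd=1.0, d=0.250, z_beta=1.28))  # 336 (90% power)
```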
29. USE OF FORMULAE FOR SAMPLE SIZE
CALCULATION FOR CASE CONTROL STUDIES:
FOR QUANTITATIVE VARIABLE
Example
If the researcher wants to calculate the sample size for the above-mentioned case-control
study, fixing the power of the study at 80%, assuming the difference in mean birth weight
between cases and controls is 250 g and the SD is 1 kg, and wanting equal numbers of cases
and controls, then the sample size per group will be:
Answer
r = 1, SD = 1 kg, Zβ = 0.84, Z1-α/2 = 1.96, d = 0.250 kg
Sample size = $2(1)^{2}(0.84+1.96)^{2} / (0.250)^{2} \approx 250.9$
Sample size = 251
So, the researcher has to recruit at least 251 cases and 251 controls.
30. HOW TO USE SAMPLE SIZE FORMULAE
Steps:
1. Identify the major study variables.
2. Determine the types of estimates, such as mean or
proportions.
3. Select the appropriate study design, primary outcome measure and level of
statistical significance.
4. Indicate what you expect the population values to be.
5. Decide on a desired level of confidence in the estimate.
6. Decide on a tolerable range of error in the estimate.
7. Use the appropriate formula to calculate the sample size.
32. USE OF READYMADE TABLE FOR SAMPLE
SIZE CALCULATION
How large a sample of patients should be followed up if an investigator wishes
to estimate the incidence rate of a disease to within 10% of its true value
with 95% confidence?
The table shows that for a relative precision e = 0.10 and a confidence level of
95%, a sample size of 385 would be needed.
33. USE OF READYMADE TABLE FOR SAMPLE
SIZE CALCULATION
This table can be used to calculate the sample size when the relative precision
and confidence level are changed; e.g. if the level of confidence is reduced to
90%, the sample size would be 271.
Such tables giving ready-made sample sizes are available for different designs
and situations.
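The table itself is not reproduced here. As an assumption on our part (the text does not give the formula behind the table), the relation n = (Z1-α/2 / e)², with e the relative precision, reproduces both figures quoted above. A minimal Python sketch with a hypothetical function name:

```python
import math
from statistics import NormalDist


def n_incidence_relative(e: float, confidence: float = 0.95) -> int:
    """Assumed relation n = (Z1-α/2 / e)^2, rounded up.

    e: relative precision (0.10 means within 10% of the true rate).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / e) ** 2)


print(n_incidence_relative(0.10, 0.95))  # 385
print(n_incidence_relative(0.10, 0.90))  # 271
```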
35. USE OF NOMOGRAM FOR SAMPLE SIZE CALCULATION:
QUALITATIVE VARIABLE
To use a nomogram to calculate the sample size, one needs to specify the study
and control groups.
The researcher should then decide the effect size that is clinically important to
detect. This should be expressed as the % change in the response rate of the
study group compared with that of the control group.
Example
If 40% of patients treated with standard therapy are cured and one wants to know
whether a new drug can cure 50%, one is looking for a 25% increase in cure rate
((50% - 40%)/40% = 25%).
From the nomogram, the number of subjects needed in each study group = 400.
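A nomogram is read by eye, so its output is approximate. As a rough cross-check only, here is a Python sketch applying the equal-allocation comparison-of-proportions formula given for case-control studies in this document (r = 1) to the two cure rates; the 5% significance level and 80% power are our assumptions, since the example does not state them, and the function names are our own:

```python
import math


def relative_change(control_rate: float, study_rate: float) -> float:
    """Effect size as a relative change in response rate."""
    return (study_rate - control_rate) / control_rate


def n_per_group(p1: float, p2: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Equal groups: 2 * p*(1 - p*) * (z_alpha + z_beta)^2 / (p1 - p2)^2, rounded up."""
    p_star = (p1 + p2) / 2
    return math.ceil(2 * p_star * (1 - p_star) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2)


print(f"{relative_change(0.40, 0.50):.0%}")  # 25%
print(n_per_group(0.40, 0.50))               # 389
```

The formula gives about 389 per group, of the same order as the nomogram reading.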
36. USE OF NOMOGRAM FOR SAMPLE SIZE CALCULATION:
QUANTITATIVE VARIABLE
The first step in calculating a sample size for comparing means using a nomogram is to
take the difference in means (here, mean arterial BP) and the corresponding standard
deviation to calculate the STANDARDIZED DIFFERENCE (= mean difference/standard
deviation).
Example
Mean arterial pressure was 95 and 81 mmHg in the groups treated with early goal-directed
and traditional therapy, respectively, corresponding to a difference of 14 mmHg. The
standard deviation was 18 mmHg, so the standardized difference = 14/18 = 0.78.
The total sample size for a trial that is capable of detecting a 0.78
standardized difference with 80% power using a cutoff for statistical
significance of 0.05 is approximately 52; in other words, 26
participants would be required in each arm of the study.
If the cutoff for statistical significance were 0.01 rather than 0.05
then a total of approximately 74 participants (37 in each arm) would
be required.
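The standardized difference can also be plugged into the comparison-of-means formula used for case-control studies in this document, with SD = 1 because the difference is already in SD units. A Python sketch for cross-checking the nomogram readings (the function names are our own):

```python
import math


def standardized_difference(mean_diff: float, sd: float) -> float:
    return mean_diff / sd


def n_per_arm(delta: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Per-arm size for two equal arms: 2 * (z_alpha + z_beta)^2 / delta^2, rounded up."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / delta ** 2)


delta = standardized_difference(95 - 81, 18)
print(round(delta, 2))                 # 0.78
print(n_per_arm(delta))                # 26 per arm at 0.05
print(n_per_arm(delta, z_alpha=2.58))  # 39 per arm at 0.01
```

At the 0.05 level this matches the 26 per arm read from the nomogram; at 0.01 it gives 39 per arm, a little above the nomogram reading of 37, since nomogram values are approximate.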
37. USE OF COMPUTER SOFTWARE FOR SAMPLE
SIZE CALCULATION & POWER ANALYSIS
The following software can be used for calculating sample size and power:
Epi Info
nQuery
Power & Precision
Sample
STATA
SPSS
38. FINALLY; OUR TAKE HOME
Sample size determination is one of the most essential components of every
research study.
The larger the sample size, the higher the degree of precision, but this is
limited by the availability of resources.
It can be determined using formulae, ready-made tables, nomograms or computer
software.