Research method ch08 statistical methods 2 anova
1. 1
Research Methods in Health
Chapter 8. Statistical Methods 2 ANOVA
Young Moon Chae, Ph.D.
Graduate School of Public Health
Yonsei University, Korea
ymchae@yuhs.ac
2. 2
Table of Contents
• One way ANOVA
• Multiple comparison
• Repeated measure ANOVA
• ANCOVA
4. 4
ANOVA
When to use it
• Analysis of variance (ANOVA) is the most commonly used technique for comparing
the means of groups of measurement data. There are lots of different experimental
designs that can be analyzed with different kinds of ANOVA
• In a one-way ANOVA (also known as a single-classification ANOVA), there is one
measurement variable and one nominal variable.
Null hypothesis
• The statistical null hypothesis is that the means of the measurement variable are the
same for the different categories of data; the alternative hypothesis is that they are
not all the same.
5. 5
Rationale for ANOVA (1)
• We have at least 3 means to test, e.g., H0: μ1 = μ2 = μ3.
• Could take them 2 at a time, but really want to test all 3 (or more) at once.
• Instead of using a mean difference, we can use the variance of the group
means about the grand mean over all groups.
• Logic is just the same as for the t-test. Compare the observed variance
among means (observed difference in means in the t-test) to what we would
expect to get by chance.
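The comparison described above (variance among group means versus chance variation within groups) can be sketched in a few lines. This is a minimal illustration, not the deck's own computation: the function name `one_way_anova_f` and the three small groups are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F ratio: MS between groups / MS within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of the group means about the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations about their own group mean (chance variation)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: three groups of three observations each
example_groups = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
f_ratio = one_way_anova_f(example_groups)
print(f_ratio)
```

A large F ratio means the group means vary more than chance variation within groups would suggest; the ratio is then compared with a critical value of F on (k-1, N-k) degrees of freedom.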
6. 6
ANOVA Assumptions
• Data in each group are a random sample from some population.
• Observations within groups are independent.
• Samples are independent.
• Underlying populations normally distributed.
• Underlying populations have the same variance. This can be formally tested
with Bartlett’s test.
• What might happen (why would it be a problem) if the assumption of
{normality, equality of error, independence of error} turned out to be false?
-> Use non-parametric statistics or use data transformation
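As a rough illustration of the equal-variance check, the sketch below computes Bartlett's test statistic by hand using only the standard library. The function name `bartlett_statistic` and the sample data are hypothetical. The statistic is referred to a chi-square distribution with k-1 degrees of freedom (the .05 critical value for 2 df is 5.991).

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's test statistic for equality of variances across groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances (n - 1 divisor)

    # Pooled variance, weighted by each group's degrees of freedom
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)

    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical data: equal spreads, then clearly different spreads
equal_spread = bartlett_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
unequal_spread = bartlett_statistic([[1, 2, 3], [2, 4, 6], [3, 6, 9]])
print(equal_spread, unequal_spread)
```

With groups this small the test has little power, so even the visibly different spreads in the second example stay below the 5.991 cutoff.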
7. 7
Multiple comparisons
• When we carry out an ANOVA on k treatments, we test
H0 : μ1 = · · · = μk versus Ha : H0 is false
• Assume we reject the null hypothesis, i.e. we have some evidence that not
all treatment means are equal. Then we could for example be interested in
which ones are the same, and which ones differ.
• For this, we might have to carry out some more hypothesis tests.
• This procedure is referred to as multiple comparisons.
8. 8
Types of multiple comparisons
• There are two different types of multiple comparisons procedures:
• Sometimes we already know in advance what questions we want to answer.
Those comparisons are called planned (or a priori) comparisons.
• Sometimes we do not know in advance what questions we want to answer,
and the decision about which group means to compare depends on the
ANOVA outcome. Those comparisons are called unplanned
(or a posteriori) comparisons.
-Planned comparisons: adjust for just those tests that are planned.
-Unplanned comparisons: adjust for all possible comparisons.
9. 9
Independence of planned comparison
\[ \sum_j c_{1j} c_{2j} = 0 \]
Comparison   A1     A2     A3     A4
1           -1/3    1     -1/3   -1/3
2           -1/2    0     -1/2    1
3            1/2    1/2   -1/2   -1/2
\[ \sum_j c_{1j} c_{2j} = (-1/3)(-1/2) + (1)(0) + (-1/3)(-1/2) + (-1/3)(1) = 0 \]
\[ \sum_j c_{1j} c_{3j} = (-1/3)(1/2) + (1)(1/2) + (-1/3)(-1/2) + (-1/3)(-1/2) = 4/6 = 2/3 \]
One and two are orthogonal; one and three are not.
There are J-1 orthogonal comparisons. Use only what you need.
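The orthogonality check for the three comparisons in the table can be reproduced with exact fractions. This is a small sketch; the helper name `cross_product_sum` is an illustrative choice, not part of the original slides.

```python
from fractions import Fraction as Fr

# Contrast coefficients for groups A1..A4, copied from the table
c1 = [Fr(-1, 3), Fr(1), Fr(-1, 3), Fr(-1, 3)]
c2 = [Fr(-1, 2), Fr(0), Fr(-1, 2), Fr(1)]
c3 = [Fr(1, 2), Fr(1, 2), Fr(-1, 2), Fr(-1, 2)]

def cross_product_sum(a, b):
    """Sum over groups of the products of two contrasts' coefficients."""
    return sum(x * y for x, y in zip(a, b))

def is_orthogonal(a, b):
    """Two contrasts are orthogonal when the cross-product sum is zero."""
    return cross_product_sum(a, b) == 0

print(cross_product_sum(c1, c2), is_orthogonal(c1, c2))
print(cross_product_sum(c1, c3), is_orthogonal(c1, c3))
```

Using `Fraction` avoids floating-point round-off, so the zero test is exact.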
10. 10
Example 1
• We previously investigated whether the mean blood coagulation times for
animals receiving different diets (A, B, C or D) were the same.
• Imagine A is the standard diet, and we wish to compare each of diets B, C,
D to diet A.
→ planned comparisons!
• After inspecting the treatment means, we find that A and D look similar, and
B and C look similar, but A and D are quite different from B and C. We
might want to formally test the hypothesis
→ unplanned comparisons!
11. 11
Example 2
• A plant physiologist recorded the length of pea sections grown in tissue
culture with auxin present. The purpose of the experiment was to
investigate the effects of various sugars on growth. Four different treatments
were used, plus one control (no sugar):
No sugar
2% glucose
2% fructose
1% glucose + 1% fructose
2% sucrose
• The investigator wants to answer three specific questions:
- Does the addition of sugars have an effect on the lengths of the pea sections?
- Are there differences between the pure sugar treatments and the mixed sugar
treatment?
- Are there differences among the pure sugar treatments?
→ planned comparisons!
13. 13
Bonferroni Correction
• Suppose we have 10 treatment groups, and so 45 pairs.
• If we perform 45 t-tests at the significance level α = 0.05, we’d expect to
reject 5% × 45 ≈ 2 of them, even if all of the means were the same.
• Let α = Pr(reject at least one pairwise test | all μ’s the same)
≤ (no. tests) × Pr(reject test #1 | μ’s the same)
• The Bonferroni correction:
Use α′ = α/(no. tests) as the significance level for each test.
• For example, with 10 groups and so 45 pairwise tests,
we’d use α′ = 0.05 / 45 ≈ 0.0011 for each test.
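The arithmetic behind the correction is short enough to check directly. The sketch below assumes all pairwise comparisons among the groups are tested; the function name `bonferroni_alpha` is illustrative.

```python
from math import comb

def bonferroni_alpha(n_groups, alpha=0.05):
    """Return (number of pairwise tests, per-test significance level)."""
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return n_tests, alpha / n_tests

n_tests, alpha_per_test = bonferroni_alpha(10)
expected_false_rejections = 0.05 * n_tests  # with no correction, all means equal

print(n_tests)                    # number of pairwise tests
print(round(alpha_per_test, 4))   # corrected per-test level
print(expected_false_rejections)  # about 2 false rejections without correction
```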
14. 14
Post Hoc Tests
• Given a significant F, where are the mean differences?
• Often do not have planned comparisons.
• Usually compare pairs of means.
• There are many methods of post hoc (after the fact) tests.
15. 15
Scheffé
• Can use for any contrast. Follows same calculations, but uses different
critical values.
• Instead of comparing the test statistic to a critical value of t, use:
\[ t = \frac{\hat{\psi}}{\sqrt{\text{est. var}(\hat{\psi})}} \]
\[ S = \sqrt{(J-1) F_{\alpha}} \]
Where the F comes from the overall F test (J-1 and N-J df).
16. 16
Scheffé (2)
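Source          SS    df   MS   F
Cells (A1-A4)   219   3    73   12.17
Error           72    12   6
Total           291   15
\[ \hat{\psi}_1 = (.5)(18) + (.5)(25) - (.5)(28) - (.5)(22) = -3.5 \]
\[ \text{est. Var}(\hat{\psi}) = 6 \cdot \frac{.5^2 + .5^2 + (-.5)^2 + (-.5)^2}{4} = 3/2 \]
\[ t = \frac{\hat{\psi}}{\sqrt{\text{est. var}(\hat{\psi})}} = \frac{-3.5}{1.2247} = -2.86 \]
\[ F_{\alpha}(.05, 3, 12) = 3.49 \]
\[ S = \sqrt{(J-1) F_{\alpha}} = \sqrt{(4-1)(3.49)} = 3.24 \]
(Data from earlier problem.)
The comparison is not significant because |-2.86| < 3.24.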
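The Scheffé calculation can be re-traced step by step in code. This sketch takes the cell means, MS error, and critical F from the slide as given inputs; the per-cell sample size n = 4 is inferred from the total df of 15 with four cells, and the variable names are illustrative.

```python
import math

cell_means = [18, 25, 28, 22]         # A1..A4, as used in the contrast
coefficients = [0.5, 0.5, -0.5, -0.5]
ms_error = 6.0                        # from the ANOVA table
n_per_cell = 4                        # inferred: N = 16 observations, 4 cells
n_groups = 4                          # J
f_critical = 3.49                     # F(.05; 3, 12), taken from the slide

# Contrast estimate and its estimated variance
psi_hat = sum(c * m for c, m in zip(coefficients, cell_means))
var_psi = ms_error * sum(c ** 2 for c in coefficients) / n_per_cell

t_stat = psi_hat / math.sqrt(var_psi)
scheffe_s = math.sqrt((n_groups - 1) * f_critical)
significant = abs(t_stat) > scheffe_s

print(psi_hat, var_psi, round(t_stat, 2), round(scheffe_s, 2), significant)
```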
17. 17
Paired comparisons
• Newman-Keuls and Tukey HSD depend on q, the studentized range
statistic.
• Suppose we have J independent sample means and we find the
largest and the smallest.
\[ q = \frac{\bar{y}_{max} - \bar{y}_{min}}{\sqrt{MS_{error} / n}} \]
MS error comes from the ANOVA we did to get
the J means. The n refers to sample size per
cell. If two cells are unequal, use
2n1n2/(n1+n2).
The sampling distribution of q depends on k, the number of means
covered by the range (max-min), and on v, the degrees of freedom
for MSerror.
18. 18
Tukey HSD
HSD = honestly significant difference.
For HSD, use k = J, the number of groups in the study. Choose
alpha, and find the df for error. Look up the value qα. Then find
the value:
\[ HSD = q_{\alpha} \sqrt{\frac{MS_{error}}{n}} \]
Compare HSD to the absolute value of the difference between all
pairs of means. Any difference larger than HSD is significant.
19. 19
HSD 2
Grp ->  1    2    3    4    5
M   ->  63   82   80   77   70
Source  SS       df   MS      F      p
Grps    2942.4   4    735.6   4.13   <.05
Error   9801.0   55   178.2
K = 5 groups; n = 12 per group; v = 55 df. Tabled value of q with alpha = .05 is 3.98.
\[ HSD = q_{\alpha} \sqrt{\frac{MS_{error}}{n}} = 3.98 \sqrt{\frac{178.2}{12}} = 15.34 \]
Absolute differences between pairs of means (groups ordered by mean; * = larger than HSD):
Group   M     1    5    4    3     2
1       63    0    7    14   17*   19*
5       70         0    7    10    12
4       77              0    3     5
3       80                   0     2
2       82                         0
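The HSD example can be checked with a short script. The tabled q value of 3.98 is taken from the slide as a given input rather than computed, and the variable names are illustrative.

```python
import math
from itertools import combinations

group_means = {1: 63, 2: 82, 3: 80, 4: 77, 5: 70}
q_alpha = 3.98     # tabled studentized range value from the slide
ms_error = 178.2   # from the ANOVA table
n_per_group = 12

hsd = q_alpha * math.sqrt(ms_error / n_per_group)

# Compare every pair of means against the HSD threshold
significant_pairs = [
    (a, b)
    for a, b in combinations(group_means, 2)
    if abs(group_means[a] - group_means[b]) > hsd
]

print(round(hsd, 2))
print(significant_pairs)
```

Only the differences between group 1 and groups 2 and 3 exceed the threshold, matching the two starred entries in the table.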
20. 20
Comparing Post Hoc Tests
• The Newman-Keuls found 3 significant differences in our example. The
HSD found 2 differences.
• If we had used the Bonferroni approach, we would have found an
interval of 15.91 required for significance (and therefore the same two
significant differences as HSD). Thus, power descends from the Newman-Keuls
to the HSD to the Bonferroni.
• The type I error rates run in the opposite order: lowest for Bonferroni, then
HSD, and finally Newman-Keuls. Do you want to be liberal or
conservative in your choice of tests? Type I error vs Power.
21. 21
Repeated Measures ANOVA
When to Use Repeated Measures ANOVA
• Repeated measures ANOVA is used when all members of a random sample are
measured under a number of different conditions. As the sample is exposed to each
condition in turn, the measurement of the dependent variable is repeated. Using a
standard ANOVA in this case is not appropriate because the data violate the ANOVA
assumption of independence.
• This approach is used for several reasons.
- Some research hypotheses require repeated measures. Longitudinal research, for
example, measures each sample member at each of several ages. In this case, age
would be a repeated factor.
- When sample members are difficult to recruit, repeated measures designs are
economical because each member is measured under all conditions.
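To illustrate why repeated measurements need their own analysis, the sketch below partitions the total variation for a single within-subjects factor into subject, trial, and error components, so that stable differences between subjects are removed from the error term. It is a simplified, hypothetical example (four subjects, three trials, made-up scores, no between-subjects factors), and the function name `repeated_measures_f` is illustrative.

```python
def repeated_measures_f(data):
    """One within-subjects factor; data[i][j] = score of subject i on trial j."""
    n = len(data)      # subjects
    k = len(data[0])   # trials
    grand = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    trial_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    # What remains after removing subject and trial effects
    ss_error = ss_total - ss_subjects - ss_trials

    df_trials = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_trials / df_trials) / (ss_error / df_error)
    return {"ss_subjects": ss_subjects, "ss_trials": ss_trials,
            "ss_error": ss_error, "df": (df_trials, df_error), "F": f_ratio}

# Hypothetical scores: rows are subjects, columns are trials
scores = [[1, 2, 3], [2, 2, 5], [3, 5, 4], [2, 3, 4]]
result = repeated_measures_f(scores)
print(result)
```

Because the subject sum of squares is taken out of the error term, the F ratio for trials is larger than it would be if the same scores were treated as independent groups.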
22. 22
Statistical Terminology Used in this Document
• A sample member is called a subject.
• When a dependent variable is measured repeatedly for all sample members
across a set of conditions, this set of conditions is called a within-subjects
factor. The conditions that constitute this type of factor are called trials.
• When a dependent variable is measured on independent groups of sample
members, where each group is exposed to a different condition, the set of
conditions is called a between-subjects factor. The conditions that
constitute this factor type are called groups.
• When an analysis has both within-subjects factors and between-subjects
factors, it is called a repeated measures ANOVA with between-subjects
factors.
23. 23
Example
• Suppose that, as a health researcher, you want to examine the impact of dietary
habit and exercise on pulse rate. To investigate these issues, you collect a sample
of individuals and group them according to their dietary preferences: meat eaters
and vegetarians. You then divide each diet category into three groups, randomly
assigning each group to one of three types of exercise: aerobic stair climbing,
racquetball, and weight training. So far, then, your design has two between-
subjects (grouping) factors: dietary preference and exercise type.
• Suppose that, in addition to these between-subjects factors, you want to include a
single within-subjects factor in the analysis. Each subject's pulse rate will be
measured at three levels of exertion: after warm-up exercises, after jogging, and
after running. Thus, intensity (of exertion) is the within-subjects factor in this
design. The order of these three measurements will be randomly assigned for each
subject.
24. 24
Research Questions :
Within-Subjects Main Effect
• Does intensity influence pulse rate? (Does mean pulse rate change across the trials
for intensity?) This is the test for a within-subjects main effect of intensity.
Between-Subjects Main Effects
• Does dietary preference influence pulse rate? (Do vegetarians have different mean
pulse rates than meat eaters?) This is the test for a between-subjects main effect of
dietary preference.
• Does exercise type influence pulse rate? (Are there differences in mean pulse rates
between stair climbers, racquetball players, and weight trainers?) This is the test for a
between-subjects main effect of exercise type.
Between-Subjects Interaction Effect
• Does the influence of exercise type on pulse rate depend on dietary preference?
(Does the pattern of differences between mean pulse rates for exercise-type groups
change for each dietary-preference group?) This is the test for a between-subjects
interaction of exercise type by dietary preference.
25. 25
Results
• Diet: With a p value less than .0001, you have a statistically significant effect. You
can therefore conclude that a statistically significant difference exists between
vegetarians and meat eaters on their overall pulse rates. In other words, there is
a main effect for diet. The cell means (not shown here) show that meat eaters
experience higher pulse rates than vegetarians.
• Exercise type: It is non-significant: F(2, 144) = .31, p=.7341. Thus, you can conclude
that the type of exercise has no statistically significant effect on overall mean
pulse rates.
• The test of the DIET BY EXERTYPE interaction also shows a non-significant result
(F(2, 144) = .52, p=.594). This suggests that dietary preferences and type of
exercise do not combine to influence the overall average pulse rate.
• When an interaction effect is significant, the pattern of cell means must be examined
to determine the meaning not only of the interaction, but also the meaning of any
main effects involved in the interaction.
27. 27
ANCOVA (cont.)
When to use it
• The purpose of ANCOVA is to compare two or more linear regression lines.
It is a way of comparing the Y variable among groups while statistically
controlling for variation in Y caused by variation in the X variable.
Null hypotheses
• Two null hypotheses are tested in an ANCOVA. The first is that the slopes
of the regression lines are all the same. If this hypothesis is not rejected, the
second null hypothesis is tested: that the Y-intercepts of the regression lines
are all the same.
• Although the most common use of ANCOVA is for comparing two
regression lines, it is possible to compare three or more regressions. If their
slopes are all the same, it is then possible to do planned or unplanned
comparisons of Y-intercepts, similar to the planned or unplanned
comparisons of means in an ANOVA
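The two null hypotheses can be written out explicitly. The notation below is an assumption introduced for illustration (group i = 1..k, observation j within group i, intercept a_i, slope b_i); it is not taken from the slides.

```latex
% Separate regression line for each group i
Y_{ij} = a_i + b_i X_{ij} + \varepsilon_{ij}

% First null hypothesis: all slopes are equal
H_0^{(1)}: b_1 = b_2 = \cdots = b_k

% If H_0^{(1)} is not rejected, fit a common slope b
% and test whether the Y-intercepts are equal
Y_{ij} = a_i + b X_{ij} + \varepsilon_{ij}, \qquad
H_0^{(2)}: a_1 = a_2 = \cdots = a_k
```

With a common slope, differences among the a_i are differences in Y between groups at any fixed value of X, which is what "statistically controlling for X" means here.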
28. 28
ANCOVA (GLM): Example
The General Linear Model (GLM) approach is used to perform an ANCOVA to
determine whether MCAT scores are significantly different among
medical students who had different types of undergraduate majors,
when adjusted for year of matriculation.
29. 29
ANCOVA (GLM): cont.
• Dependent variable
- nmtot1: MCAT total (most recent)
• Fixed factor
- bmaj2: Undergraduate major
1 = Biology/Chemistry
2 = Other science/health
3 = Other
• Covariate
- matyr: Year of matriculation