Analysis of Variance (ANOVA)

Avjinder Singh Kaler and Kristi Mai

• Estimating a Population Variance/Standard Deviation
  • $\chi^2$ (Chi-Square) Distribution
• Comparing Variation in Two Samples
  • F Distribution
• One-Way Analysis of Variance (ANOVA)
• Multiple Comparison Tests
  • Tukey Test
• Two-Way Analysis of Variance (ANOVA)

Main Ideas:
• The sample variance is the best point estimate of the population
variance and the sample standard deviation is typically used to
estimate the population standard deviation
• We can use a sample variance to construct a C.I. to estimate the true
value of a population variance and we can also use a sample
standard deviation to construct a C.I. to estimate the true value of a
population standard deviation
• We can also test claims about a population variance or standard
deviation

• If a population has a normal distribution, then the following formula describes the $\chi^2$ distribution: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
• This is a chi-square score and is a measure of relative standing
• We NEED degrees of freedom for the $\chi^2$ distribution
  • $df = n - 1$ (in this situation)
  • Although this value for degrees of freedom is common, $df$ is NOT always $n - 1$
• Properties of the Chi-Square Distribution:
  • The $\chi^2$ distribution is NOT symmetric like the t-distribution or the Normal distribution
    • Note: Because the distribution is NOT symmetric, the C.I. will NOT be $s^2 \pm E$
  • The values of $\chi^2$ can be ≥ 0 but cannot be negative
  • The $\chi^2$ distribution is different for different degrees of freedom
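As a rough sketch of why the interval is not $s^2 \pm E$, the Python below computes the chi-square score and a confidence interval for $\sigma^2$ of the form $(n-1)s^2/\chi^2_R < \sigma^2 < (n-1)s^2/\chi^2_L$. The sample values (n = 10, s² = 4.0) are hypothetical, and the critical values 2.700 and 19.023 are the usual table values for df = 9 at 95% confidence, typed in by hand rather than computed.

```python
import math

def chi_square_score(n, s2, sigma2):
    """Chi-square score (n - 1) * s^2 / sigma^2 for a sample of size n."""
    return (n - 1) * s2 / sigma2

def variance_interval(n, s2, chi2_left, chi2_right):
    """Confidence interval for sigma^2 from left/right chi-square critical values."""
    lower = (n - 1) * s2 / chi2_right  # the larger critical value gives the lower limit
    upper = (n - 1) * s2 / chi2_left   # the smaller critical value gives the upper limit
    return lower, upper

# Hypothetical sample: n = 10, s^2 = 4.0.
# Table critical values for df = 9 and 95% confidence (assumed, not computed here).
n, s2 = 10, 4.0
low, high = variance_interval(n, s2, chi2_left=2.700, chi2_right=19.023)
sd_low, sd_high = math.sqrt(low), math.sqrt(high)  # interval for sigma
print(f"variance: ({low:.3f}, {high:.3f})   std dev: ({sd_low:.3f}, {sd_high:.3f})")
```

The point estimate s² = 4.0 sits much closer to the lower limit than to the upper one, which is the asymmetry described above.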

Main Ideas:
• The sample variance is the best point estimate of the population
variance and the sample standard deviation is typically used to
estimate the population standard deviation
• We can use two sample variances to test claims about the
difference between two population variances

• If two populations are normally distributed with equal variances (i.e. $\sigma_1^2 = \sigma_2^2$), then
the following formula describes the F distribution: $F = \frac{s_1^2}{s_2^2}$
• This is an F-score and is a measure of relative standing
• Notice that this distribution compares the two variations in the form of a ratio
• We NEED two different degrees of freedom for the F distribution
• In this particular situation, we have:
• Numerator 𝑑𝑓 = 𝑛1 − 1
• Denominator 𝑑𝑓 = 𝑛2 − 1
• Properties of the F Distribution:
• The F distribution is NOT symmetric like the t-distribution or the Normal distribution
• The values of F can be ≥ 0 but cannot be negative
• The F distribution is different for different degrees of freedom and depends on TWO
different degrees of freedom
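A minimal sketch of the F-score and its two degrees of freedom, using Python's standard `statistics` module. The two samples are made-up numbers for illustration only; looking up a p-value or critical value from the F distribution is left to a table or statistical software.

```python
from statistics import variance

def f_score(sample1, sample2):
    """Return (F, numerator df, denominator df) with F = s1^2 / s2^2."""
    s1_sq = variance(sample1)  # sample variance (divisor n1 - 1)
    s2_sq = variance(sample2)  # sample variance (divisor n2 - 1)
    return s1_sq / s2_sq, len(sample1) - 1, len(sample2) - 1

# Hypothetical samples, not taken from the slides.
sample_a = [2, 4, 6, 8, 10]
sample_b = [3, 4, 5, 6, 7]

F, df_num, df_den = f_score(sample_a, sample_b)
print(f"F = {F:.2f}, numerator df = {df_num}, denominator df = {df_den}")
```

Because F is a ratio of two variances, it can never be negative, and a value near 1 is what we expect when the two population variances are equal.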

Main Ideas:
• We can actually extend the hypothesis testing foundation that we
already have to test the claim that three or more population means
are all equal
  • i.e. $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs. $H_1$: at least one mean differs
• We can test this null hypothesis by analyzing sample variances
• This test (a One-Way ANOVA) is appropriate when we wish to
compare three or more population means within a set of quantitative
data that is categorized according to one treatment (or factor)
  • Treatment (factor) – a characteristic allowing us to distinguish between the
different populations of interest
• We CANNOT simply test two samples at a time, because every additional
pairwise test raises the overall chance of a type I error (the multiple testing problem)
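To see why testing two samples at a time is a problem, here is a rough Python illustration of how the chance of at least one false rejection grows with the number of pairwise tests. It assumes the pairwise tests are independent, which is a simplification (tests that share samples are not independent), so the numbers are only indicative.

```python
from math import comb

def pairwise_tests(k):
    """Number of two-at-a-time comparisons among k means."""
    return comb(k, 2)

def familywise_error(k, alpha=0.05):
    """Chance of at least one type I error across all pairs (independence assumed)."""
    m = pairwise_tests(k)
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    print(f"k = {k}: {pairwise_tests(k)} tests, "
          f"overall error rate about {familywise_error(k):.3f}")
```

Even with only three means, three separate tests at α = 0.05 give an overall error rate well above 0.05 under this simplification.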

Requirements:
• The populations have distributions that are approximately
normal
  • Loose requirement – only a problem if a population is very far from normal
• The populations have the same variance $\sigma^2$
  • Loose requirement – the ratio of variances can be as large as 9:1
• The samples are SRS of quantitative data
• The samples are independent of each other
• The different samples are from populations that are categorized in
only one way

Test Statistic:
• $F = \frac{MS(\text{Treatment})}{MS(\text{Error})} \approx \frac{\text{variance between samples}}{\text{variance within samples}}$
• Note: p-values and critical values are from the F distribution
• The F test statistic is very sensitive to sample means, even though it is based on variance
• Degrees of Freedom
  • Equal Sample Sizes
    • Numerator $df = k - 1$
    • Denominator $df = k(n - 1)$
  • Unequal Sample Sizes
    • Numerator $df = k - 1$
    • Denominator $df = N - k$
  • Notation:
    • $k$: number of samples
    • $n$: number of values in each sample (i.e. sample size)
    • $N$: total number of values in all samples combined
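The test statistic and degrees of freedom above can be sketched in plain Python as follows. The function name and the three small samples are hypothetical; the sums of squares use the standard definitions (treatment: sample means around the grand mean, error: values around their own sample mean). The p-value would still come from the F distribution via a table or software.

```python
def one_way_anova_f(groups):
    """Return (F, numerator df, denominator df) for a list of samples."""
    k = len(groups)                      # number of samples
    N = sum(len(g) for g in groups)      # total number of values
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # Variation between samples: sample means around the grand mean.
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                       for g, m in zip(groups, means))
    # Variation within samples: values around their own sample mean.
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_num, df_den = k - 1, N - k
    f_stat = (ss_treatment / df_num) / (ss_error / df_den)  # MS(Treatment) / MS(Error)
    return f_stat, df_num, df_den

# Hypothetical samples for illustration only.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_num, df_den = one_way_anova_f(samples)
print(f"F = {f_stat:.2f} with df = ({df_num}, {df_den})")
```

With equal sample sizes, N − k equals k(n − 1), so the single formula N − k covers both cases listed above.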

Notice the One-Way ANOVA is an F test – like comparing variances.
Specifically, it is a right-tailed F Test.
Conclusion Cautions:
• Rejecting the null hypothesis does NOT tell us that all of the means are different!
• In fact, rejecting the null hypothesis cannot tell us which mean(s) is(are)
different

Use the performance IQ scores listed in Table 12-1 and a significance level
of α = 0.05 to test the claim that the three samples come from
populations with means that are all equal.

Here are summary statistics from the collected data:

Requirement Check:
1. The three samples appear to come from populations that are
approximately normal.
2. The three samples have standard deviations that are not dramatically
different.
3. We can treat the samples as simple random samples.
4. The samples are independent of each other and the IQ scores are not
matched in any way.
5. The three samples are categorized according to a single factor: low
lead, medium lead, and high lead.

The hypotheses are:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least one of the means is different from the others.
The significance level is α = 0.05.

From StatCrunch results, the p-value is 0.020 when rounded.
Because the p-value is less than the significance level of α = 0.05, we
reject the null hypothesis.
There is sufficient evidence to conclude that the three samples do not all
come from populations with equal means; at least one mean differs.
We cannot conclude formally that any particular mean is different from
the others, but it appears that greater blood lead levels are associated
with lower performance IQ scores.

Larger values of the test statistic result in smaller P-values, so the ANOVA
test is right-tailed.
Assuming that the populations have the same variance σ2 (as required
for the test), the F test statistic is the ratio of these two estimates of σ2:
1) variation between samples (based on variation among sample
means)
2) variation within samples (based on the sample variances)
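When all samples have the same size n, the two estimates of σ² described above can be written as n times the variance of the sample means (between) and the average of the sample variances (within). The Python below is a small sketch of that equal-sample-size shortcut on hypothetical data; the function name is an assumption made for illustration.

```python
from statistics import mean, variance

def two_variance_estimates(groups):
    """Return (between, within, F) for samples that all have the same size."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this shortcut assumes equal sample sizes")
    sample_means = [mean(g) for g in groups]
    between = n * variance(sample_means)           # based on variation among sample means
    within = mean([variance(g) for g in groups])   # based on the sample variances
    return between, within, between / within

# Hypothetical samples for illustration only.
between, within, ratio = two_variance_estimates([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"between = {between:.2f}, within = {within:.2f}, F = {ratio:.2f}")
```

If the population means really are equal, both quantities estimate the same σ² and the ratio should be near 1; widely separated sample means inflate only the numerator, which is why large F values count against the null hypothesis.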

Multiple Comparison Tests – these tests should be used to identify where the
difference(s) in the means lie if the null hypothesis in the One-Way ANOVA is
rejected. Multiple comparison tests compare pairs of means to identify which means
differ, while adjusting for the multiple testing problem (each additional pairwise
test raises the chance of at least one false rejection) so that the overall
significance level stays controlled
• Examples: Duncan, SNK, Scheffe, Dunnett, LSD, Bonferroni, and Tukey Tests
In this course we will utilize the Tukey Test!
• The Tukey Test provides associated p-values for the comparison of each pair of
means
• This test will allow you to identify whether any two of the 𝑘 means differ
• The Null Hypothesis in this test assumes the equality of the two means being
compared

• The average MPG for 2000-2010 vehicles from four car manufacturers is
compared. We would like to see if there is a difference in average MPG.
(Notice that we are testing to see if there is a difference in the means of a
quantitative variable, MPG, across the four levels of one factor/treatment, the
manufacturer)
• After deeming a One-Way ANOVA appropriate for this research question
and checking the requirements for this statistical procedure, we find that
at least one of the mean MPGs differs due to a low p-value (for instance,
0.0003). (Refer back to the One-Way ANOVA section and the hypotheses
for this test)

• Since the One-Way ANOVA revealed a difference but cannot tell us,
specifically, where the difference in mean MPGs is, we decide to
perform a Tukey Test to answer our research question in full. Where is the
difference? Which manufacturer has a higher/lower mean MPG?
• The Tukey Test compares all 𝑘 means, two at a time. (Note: This can be
done here because a Tukey Test does control the overall significance
level for pairwise comparisons)
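As a hedged sketch of what "two at a time" looks like numerically, the Python below computes a studentized range statistic q = |mean_i − mean_j| / sqrt(MS(Error)/n) for every pair of groups, which is the usual form for equal sample sizes. The data are hypothetical (not the MPG data), and the sketch stops at q: turning q into a p-value needs the studentized range distribution, which statistical software provides.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def tukey_q_statistics(groups):
    """Return {(i, j): q} for every pair of equal-sized groups."""
    k, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    means = [mean(g) for g in groups]
    # MS(Error) from the one-way ANOVA: pooled within-sample variation.
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_error = ss_error / (k * n - k)
    standard_error = sqrt(ms_error / n)
    return {(i, j): abs(means[i] - means[j]) / standard_error
            for i, j in combinations(range(k), 2)}

# Hypothetical samples for illustration only.
q_values = tukey_q_statistics([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for pair, q in q_values.items():
    print(f"groups {pair}: q = {q:.3f}")
```

Larger q values point to the pairs whose means are furthest apart relative to the within-sample variation.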

We introduce the method of two-way analysis of variance, which is
used with data partitioned into categories according to two factors.
The methods of this section require that we begin by testing for an
interaction between the two factors.
Then we test whether the row or column factors have effects.

Main Ideas:
• We can actually extend the One-Way ANOVA to test the claim that three or
more population means are all equal when the data is categorized in TWO
ways (not just one)
• This test (a Two-Way ANOVA) is appropriate when we wish to compare three
or more population means within a set of quantitative data that is
categorized according to two treatments (or factors)
  • We CANNOT simply test the effect of the two factors by utilizing two One-Way ANOVAs
because the One-Way ANOVA test would ignore the possible interaction between the
two factors involved

Interaction – there is an interaction between two factors if the effect of
one of the factors changes for different categories of the other factor (like
a combination effect or a synergy effect)
• Interaction plots can be used to visually assess if an interaction effect is
present
• We must test for an interaction effect first!

1. For each cell, the populations have distributions that are
approximately normal
2. The populations have the same variance 𝜎2
3. The samples are SRS of quantitative data
4. The samples are independent of each other
5. The samples are from populations that are categorized in two ways
6. All of the cells have the same number of sample values (i.e. a
balanced design)
• Not a general requirement for Two-Way ANOVA but we won’t have
unbalanced designs

* Notice that there are up to three tests being performed during a Two-Way
ANOVA *
First Test: The Test for an Interaction Effect
• Hypotheses:
  • $H_0$: No Interaction Effect
  • $H_1$: There is an Interaction Effect
• Test Statistic:
  • $F = \frac{MS(\text{Interaction})}{MS(\text{Error})}$
  • Note: p-values and critical values are from the $F$ distribution
• Conclusion:
  • If the null hypothesis of ‘No Interaction Effect’ is rejected, then there is a significant
interaction effect and we CANNOT proceed to test for main effects. So, if we reject
$H_0$, we must STOP.

Second and Third Tests: The Test for Main (Row/Column Factor) Effects
• Hypotheses:
  • $H_0$: No Row/Column Factor Effect
  • $H_1$: There is a Row/Column Factor Effect
• Test Statistic:
  • $F = \frac{MS(\text{Row})}{MS(\text{Error})}$ and $F = \frac{MS(\text{Column})}{MS(\text{Error})}$
  • Note: p-values and critical values are from the $F$ distribution
Notice the Two-Way ANOVA is a two-step procedure that performs one to three
separate $F$ tests
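The three F statistics can be sketched in plain Python for a balanced design (same number of values in every cell), using the standard sums of squares for rows, columns, interaction, and error. The function name and the tiny 2 × 2 data set are hypothetical; p-values would still be read from the F distribution.

```python
def two_way_anova_f(cells):
    """cells[i][j] is the list of values for row level i and column level j."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(c) / n for c in row] for row in cells]
    row_mean = [sum(r) / b for r in cell_mean]
    col_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_mean) / a

    ss_row = b * n * sum((m - grand) ** 2 for m in row_mean)
    ss_col = a * n * sum((m - grand) ** 2 for m in col_mean)
    ss_int = n * sum((cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                     for i in range(a) for j in range(b))
    ss_err = sum((x - cell_mean[i][j]) ** 2
                 for i in range(a) for j in range(b) for x in cells[i][j])

    ms_err = ss_err / (a * b * (n - 1))  # MS(Error)
    return {
        "interaction": (ss_int / ((a - 1) * (b - 1))) / ms_err,
        "row": (ss_row / (a - 1)) / ms_err,
        "column": (ss_col / (b - 1)) / ms_err,
    }

# Hypothetical balanced data: 2 row levels x 2 column levels x 2 values per cell.
f_values = two_way_anova_f([[[1, 3], [5, 7]], [[2, 4], [10, 12]]])
print(f_values)
```

All three statistics share the same denominator, MS(Error), which is why the design needs more than one value per cell for this version of the test.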

The data in the table are categorized
with two factors:
1. Gender: Male or Female
2. Blood Lead Level: Low, Medium, or
High
The subcategories are called cells, and
the response variable is IQ score.

Let’s explore the IQ data in the table by calculating the mean for each
cell and constructing an interaction graph.

• An interaction effect is suggested if the line segments are far from
being parallel.
• No interaction effect is suggested if the line segments are
approximately parallel.
• For the IQ scores, it appears there is an interaction effect:
  • Females with high lead exposure appear to have lower IQ scores, while
males with high lead exposure appear to have higher IQ scores.
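A numerical stand-in for the interaction graph, as a sketch: compute the mean of each cell, then look at the gap between the two row profiles at each column level. Roughly constant gaps correspond to parallel line segments. The scores below are made up for illustration and are not the values from the table.

```python
from statistics import mean

def cell_means(cells):
    """Mean of each cell; cells[i][j] holds the values for row i, column j."""
    return [[mean(c) for c in row] for row in cells]

def profile_gaps(means):
    """Gap between the two rows' means at each column level (two rows assumed)."""
    return [means[0][j] - means[1][j] for j in range(len(means[0]))]

# Hypothetical scores: rows = two groups, columns = low / medium / high level.
scores = [
    [[90, 100], [95, 105], [100, 110]],
    [[92, 102], [90, 100], [80, 90]],
]
means = cell_means(scores)
gaps = profile_gaps(means)
print("cell means:", means)
print("gaps between row profiles:", gaps)
```

Here the gaps change a lot from one column to the next, so the plotted segments would be far from parallel. A plot or gap list only suggests an interaction; the F test decides whether it is significant.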

Step 1: Interaction Effect – test the null hypothesis that there is no
interaction
Step 2: Row/Column Effects – if we conclude there is no interaction
effect, proceed with these two hypothesis tests
• Row Factor: no effects from row
• Column Factor: no effects from column
All tests use the 𝐹 distribution.
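The two-step order can be written as a small decision helper. This is only a sketch: the function name is hypothetical, the p-values are assumed to come from software output, and the example call uses the p-values reported for the IQ example in this section (0.655, 0.791, 0.906).

```python
def two_way_decisions(p_interaction, p_row, p_column, alpha=0.05):
    """Apply the two-step order: interaction first, then row/column effects."""
    if p_interaction <= alpha:
        # Significant interaction: stop without testing row/column effects.
        return ["Interaction: reject H0 (interaction effect); stop here"]
    decisions = ["Interaction: fail to reject H0 (no interaction effect)"]
    for name, p in (("Row", p_row), ("Column", p_column)):
        verdict = "reject H0" if p <= alpha else "fail to reject H0"
        decisions.append(f"{name} factor: {verdict}")
    return decisions

example = two_way_decisions(0.655, 0.791, 0.906)
for line in example:
    print(line)
```

When the interaction test rejects, the helper returns a single line, mirroring the rule that we must stop before testing main effects.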

Given the performance IQ scores in the table at the beginning of this
section, use two-way ANOVA to test for an interaction effect, an effect
from the row factor of gender, and an effect from the column factor of
blood lead level.
Use a 0.05 level of significance.

Requirement Check:
1. For each cell, the sample values appear to be from a normally distributed
population.
2. The variances of the cells are 95.3, 146.7, 130.8, 812.7, 142.3, and 143.8, which
are considerably different from each other. We might have some
reservations that the population variances are equal – but for the purposes of
this example, we will assume the requirement is met.
3. The samples are simple random samples.
4. The samples are independent of each other; the subjects are not matched in
any way.
5. The sample values are categorized in two ways (gender and blood lead
level).
6. All the cells have the same number (five) of sample values.

The StatCrunch output is displayed below:

Step 1: Test that there is no interaction between the two factors.
The test statistic is F = 0.43 and the P-value is 0.655, so we fail to reject the
null hypothesis.
It does not appear that the performance IQ scores are affected by an
interaction between gender and blood lead level.
There does not appear to be an interaction effect, so we proceed to test
for row and column effects.

Step 2: We now test:
$H_0$: No Row Factor (gender) Effect
$H_1$: There is a Row Factor (gender) Effect
For the row factor, F = 0.07 and the P-value is 0.791. We fail to reject the null
hypothesis; there is no evidence that IQ scores are affected by the gender of
the subject.
$H_0$: No Column Factor (blood lead level) Effect
$H_1$: There is a Column Factor (blood lead level) Effect
For the column factor, F = 0.10 and the P-value is 0.906. We fail to reject the null
hypothesis; there is no evidence that IQ scores are affected by the level of lead
exposure.

Interpretation:
Based on the sample data, we conclude that IQ scores do not appear
to be affected by gender or blood lead level.
Caution:
• Two-way analysis of variance is not one-way analysis of variance done twice.
• Be sure to test for an interaction between the two factors.

To better understand the method of two-way analysis of variance, let’s
repeat Example 1 after adding 30 points to each of the performance IQ
scores of the females only. That is, in Table 12-3, add 30 points to each
of the listed scores for females.

Step 1:
• Interaction Effect: The display shows a p-value of 0.655 for an interaction
effect. Because that p-value is not less than or equal to 0.05, we fail to
reject the null hypothesis of no interaction effect. There does not appear to
be an interaction effect.

Step 2:
• Row Effect: The display shows a p-value less than 0.0001 for the row variable of
gender, so we reject the null hypothesis of no effect from the factor of gender. In
this case, the gender of the subject does appear to have an effect on
performance IQ scores.
• Column Effect: The display shows a p-value of 0.906 for the column variable of
blood lead level, so we fail to reject the null hypothesis of no effect from the
factor of blood lead level. The blood lead level does not appear to have an
effect on performance IQ scores.

Interpretation:
By adding 30 points to each score of the female subjects, we do
conclude that there is an effect due to the gender of the subject, but
there is no apparent effect from an interaction or from the blood lead
level.

If our sample data consist of only one observation per cell, there is no
variation within individual cells and sample variances cannot be
calculated for individual cells.
If it seems reasonable to assume there is no interaction between the two
factors, make that assumption and test separately:
$H_0$: No Row/Column Factor Effect
$H_1$: There is a Row/Column Factor Effect
(The mechanics of the tests are the same as presented earlier.)
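With one observation per cell, a common approach (sketched here under the no-interaction assumption stated above) is to use the leftover residual variation as the error term, with (rows − 1)(columns − 1) degrees of freedom. The function name and the 2 × 3 table are hypothetical and are not the first entries from Table 12-3.

```python
def two_way_no_replication(table):
    """table[i][j] is the single value for row level i and column level j."""
    a, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (a * b)
    row_mean = [sum(row) / b for row in table]
    col_mean = [sum(table[i][j] for i in range(a)) / a for j in range(b)]

    ss_row = b * sum((m - grand) ** 2 for m in row_mean)
    ss_col = a * sum((m - grand) ** 2 for m in col_mean)
    # No within-cell variation exists, so the residual serves as the error term.
    ss_err = sum((table[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                 for i in range(a) for j in range(b))
    ms_err = ss_err / ((a - 1) * (b - 1))
    return {
        "row": (ss_row / (a - 1)) / ms_err,
        "column": (ss_col / (b - 1)) / ms_err,
    }

# Hypothetical 2 x 3 table with one value per cell.
f_single = two_way_no_replication([[1, 2, 6], [3, 6, 6]])
print(f_single)
```

Only two F statistics come out, one per factor, because there is nothing left over to estimate an interaction separately.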

If we use only the first entry from each cell in Table 12-3, we get the
StatCrunch results shown below. Use a 0.05 significance level to test for an
effect from the row factor of gender and also test for an effect from the
column factor of blood lead level. Assume that there is no effect from an
interaction between gender and blood lead level.

• Row Factor:
• We first use the results from the StatCrunch display to test the null hypothesis of no
effect from the row factor of gender (male, female). The test statistic (0.02) is not
significant, because the corresponding p-value is 0.901. We fail to reject the null
hypothesis. It appears that performance IQ scores are not affected by the gender
of the subject.
• Column Factor:
• We now use the StatCrunch display to test the null hypothesis of no effect from the
column factor of blood lead level (low, medium, high). The test statistic (1.16) is not
significant because the corresponding p-value is 0.463. We fail to reject the null
hypothesis, so it appears that the performance IQ scores are not affected by the
blood lead level.

• Complete Practice Problems 7
