ANALYSIS OF VARIANCE
(ANOVA)
Avjinder Singh Kaler and Kristi Mai
 Estimating a Population Variance/Standard Deviation
• 𝜒2 (Chi-Square) Distribution
 Comparing Variation in Two Samples
• F Distribution
 One-Way Analysis of Variance (ANOVA)
 Multiple Comparison Tests
• Tukey Test
 Two-Way Analysis of Variance (ANOVA)
Main Ideas:
• The sample variance is the best point estimate of the population
variance and the sample standard deviation is typically used to
estimate the population standard deviation
• We can use a sample variance to construct a C.I. to estimate the true
value of a population variance and we can also use a sample
standard deviation to construct a C.I. to estimate the true value of a
population standard deviation
• We can also test claims about a population variance or standard
deviation
 If a population has a normal distribution, then the following formula describes
the 𝜒2 distribution: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
 This is a Chi-Square score and is a measure of relative standing
 We NEED degrees of freedom for the 𝜒2 distribution
• 𝑑𝑓 = 𝑛 − 1 (in this situation)
• Although this value for degrees of freedom is common, 𝑑𝑓 are NOT always 𝑛 − 1
 Properties of the Chi-Square Distribution:
• The 𝜒2 distribution is NOT symmetric like the t-distribution or the Normal distribution
 Note: Because the distribution is NOT symmetric, the C.I. will NOT be 𝑠2 ± 𝐸
• The values of 𝜒2 can be ≥ 0 but cannot be negative
• The 𝜒2 distribution is different for different degrees of freedom
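Because the 𝜒2 distribution is not symmetric, a confidence interval for 𝜎2 uses two separate critical values instead of 𝑠2 ± 𝐸. The following is a minimal Python sketch, assuming SciPy is available. The function name `variance_ci` and the sample values (n = 25, 𝑠2 = 16.0) are made up for illustration and are not data from these slides.

```python
from scipy.stats import chi2

def variance_ci(s2, n, confidence=0.95):
    """Confidence interval for a population variance (normal population assumed)."""
    df = n - 1
    alpha = 1 - confidence
    chi2_right = chi2.ppf(1 - alpha / 2, df)  # larger critical value
    chi2_left = chi2.ppf(alpha / 2, df)       # smaller critical value
    # Dividing by the larger critical value gives the lower limit, and vice versa.
    return df * s2 / chi2_right, df * s2 / chi2_left

# Hypothetical sample: n = 25 values with sample variance s^2 = 16.0
lower, upper = variance_ci(16.0, 25)
print(f"95% CI for the variance: ({lower:.2f}, {upper:.2f})")
print(f"95% CI for the standard deviation: ({lower ** 0.5:.2f}, {upper ** 0.5:.2f})")
```

Taking square roots of the two limits gives the corresponding interval for 𝜎. Note that the point estimate 𝑠2 does not sit in the middle of the interval.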
Main Ideas:
 The sample variance is the best point estimate of the population
variance and the sample standard deviation is typically used to
estimate the population standard deviation
 We can use two sample variances to test claims about the
difference between two population variances
• If two populations are normally distributed with equal variances (i.e. $\sigma_1^2 = \sigma_2^2$), then
the following formula describes the F distribution: $F = \frac{s_1^2}{s_2^2}$
• This is an F-score and is a measure of relative standing
• Notice that this distribution compares the two variations in the form of a ratio
• We NEED two different degrees of freedom for the F distribution
• In this particular situation, we have:
• Numerator 𝑑𝑓 = 𝑛1 − 1
• Denominator 𝑑𝑓 = 𝑛2 − 1
• Properties of the F Distribution:
• The F distribution is NOT symmetric like the t-distribution or the Normal distribution
• The values of F can be ≥ 0 but cannot be negative
• The F distribution is different for different degrees of freedom and depends on TWO
different degrees of freedom
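The F-score and its two degrees of freedom can be computed directly from two sample variances. The following is a rough Python sketch, assuming SciPy is available. The helper name `variance_ratio_test` and the input values are hypothetical, and doubling the smaller tail area is one common way (assumed here) to get a two-sided p-value.

```python
from scipy.stats import f

def variance_ratio_test(s1_sq, n1, s2_sq, n2):
    """Two-sided F test of H0: sigma1^2 = sigma2^2 for two normal populations."""
    F = s1_sq / s2_sq
    df_num, df_den = n1 - 1, n2 - 1
    right_tail = f.sf(F, df_num, df_den)           # area to the right of the observed F
    p_value = 2 * min(right_tail, 1 - right_tail)  # double the smaller tail
    return F, df_num, df_den, p_value

# Hypothetical summary statistics: s1^2 = 30 (n1 = 16), s2^2 = 12 (n2 = 21)
F, df_num, df_den, p_value = variance_ratio_test(30.0, 16, 12.0, 21)
print(f"F = {F:.2f}, df = ({df_num}, {df_den}), p-value = {p_value:.3f}")
```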
Main Ideas:
 We can actually extend the hypothesis testing foundation that we
already have to test the claim that three or more population means
are all equal
• i.e. $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs. $H_1$: at least one mean differs
 We can test this null hypothesis by analyzing sample variances
 This test (a One-Way ANOVA) is appropriate when we wish to
compare three or more population means within a set of quantitative
data that is categorized according to one treatment (or factor)
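• Treatment (factor) – a characteristic allowing us to distinguish between the
different populations of interest
 We CANNOT simply test two samples at a time, because running many pairwise
tests inflates the overall chance of a Type I error (the multiple testing problem)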
Requirements:
 The populations have distributions that are approximately
normal
• Loose requirement – only a problem if a population is very far from normal
 The populations have the same variance 𝜎2
• Loose requirement – when sample sizes are equal or nearly equal, the largest
variance can be up to about 9 times the smallest (a 9:1 ratio)
 The samples are SRS of quantitative data
 The samples are independent of each other
 The different samples are from populations that are categorized in
only one way
Test Statistic:
 $F = \frac{MS(\text{Treatment})}{MS(\text{Error})} \approx \frac{\text{variance between samples}}{\text{variance within samples}}$
 Note: p-values and critical values are from the F distribution
 The F test statistic is very sensitive to sample means, even though it is based on variance
 Degrees of Freedom
• Equal Sample Sizes
 Numerator 𝑑𝑓 = 𝑘 − 1
 Denominator 𝑑𝑓 = 𝑘(𝑛 − 1)
• Unequal Sample Sizes
 Numerator 𝑑𝑓 = 𝑘 − 1
 Denominator 𝑑𝑓 = 𝑁 − 𝑘
• Notation:
 𝑘: number of samples
 𝑛: number of values in each sample (i.e. sample size)
 𝑁: total number of values in all samples combined
Notice the One-Way ANOVA is an F test – like comparing variances.
Specifically, it is a right-tailed F Test.
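The test statistic and degrees of freedom above can be worked out step by step. The following is a minimal Python sketch, assuming SciPy is available. The three samples are made-up illustration values (not Table 12-1), and `one_way_anova` is a hypothetical helper name. SciPy's own `f_oneway` is called only as a cross-check.

```python
from scipy.stats import f, f_oneway

def one_way_anova(*samples):
    """Return F, numerator df, denominator df, and the right-tailed p-value."""
    k = len(samples)                      # number of samples
    N = sum(len(s) for s in samples)      # total number of values
    grand_mean = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]
    # Variation between samples (treatment) and within samples (error)
    ss_treatment = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_error = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_num, df_den = k - 1, N - k         # unequal sample sizes are allowed
    F = (ss_treatment / df_num) / (ss_error / df_den)
    return F, df_num, df_den, f.sf(F, df_num, df_den)

# Hypothetical samples of unequal size
a = [85, 90, 107, 85, 100]
b = [78, 97, 107, 80, 90, 83]
c = [93, 100, 97, 79, 97]

F, df_num, df_den, p_value = one_way_anova(a, b, c)
reference = f_oneway(a, b, c)
print(f"by hand: F = {F:.4f}, df = ({df_num}, {df_den}), p = {p_value:.4f}")
print(f"SciPy:   F = {reference.statistic:.4f}, p = {reference.pvalue:.4f}")
```

The p-value is the area to the right of F, which is why the test is right-tailed.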
Conclusion Cautions:
• Rejecting the null hypothesis does NOT tell us that all of the means are different!
• In fact, rejecting the null hypothesis cannot tell us which mean(s) is(are)
different
Use the performance IQ scores listed in Table 12-1 and a significance level
of α = 0.05 to test the claim that the three samples come from populations
with means that are all equal.
Here are summary statistics from the collected data:
Requirement Check:
1. The three samples appear to come from populations that are
approximately normal.
2. The three samples have standard deviations that are not dramatically
different.
3. We can treat the samples as simple random samples.
4. The samples are independent of each other and the IQ scores are not
matched in any way.
5. The three samples are categorized according to a single factor: low
lead, medium lead, and high lead.
The hypotheses are:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least one of the means is different from the others.
The significance level is α = 0.05.
From StatCrunch results, the p-value is 0.020 when rounded.
Because the P-value is less than the significance level of α = 0.05, we
can reject the null hypothesis.
There is sufficient evidence to conclude that the three samples come from
populations whose means are not all equal.
We cannot conclude formally that any particular mean is different from
the others, but it appears that greater blood lead levels are associated
with lower performance IQ scores.
Larger values of the test statistic result in smaller P-values, so the ANOVA
test is right-tailed.
Assuming that the populations have the same variance σ2 (as required
for the test), the F test statistic is the ratio of these two estimates of σ2:
1) variation between samples (based on variation among sample
means)
2) variation within samples (based on the sample variances)
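When all samples have the same size 𝑛, the two estimates of σ2 listed above reduce to 𝑛 times the variance of the sample means (between) and the average of the sample variances (within). The following is a small Python sketch using only the standard library. The data and the helper name `f_ratio_equal_n` are made up for illustration.

```python
from statistics import mean, variance

def f_ratio_equal_n(samples):
    """F statistic for k samples that all have the same size n."""
    n = len(samples[0])
    between = n * variance([mean(s) for s in samples])  # based on the sample means
    within = mean(variance(s) for s in samples)         # based on the sample variances
    return between / within

# Hypothetical samples, four values each
samples = [[7, 3, 6, 6], [6, 5, 5, 8], [4, 7, 6, 7]]
F_original = f_ratio_equal_n(samples)

# Add 10 to every value in the first sample: its variance is unchanged,
# but its mean moves away from the other two.
shifted = [[x + 10 for x in samples[0]]] + samples[1:]
F_shifted = f_ratio_equal_n(shifted)

print(f"F (original) = {F_original:.3f}")
print(f"F (one mean shifted) = {F_shifted:.3f}")
```

Shifting one sample leaves the within-sample estimate untouched and changes only the between-sample estimate, which illustrates how sensitive F is to the sample means.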
Multiple Comparison Tests – use these tests to identify where the
difference(s) in the means lie once the null hypothesis of the One-Way ANOVA
has been rejected. They compare the means in pairs while adjusting for the
multiple testing problem, so that the overall significance level stays at the
chosen α
• Examples: Duncan, SNK, Scheffe, Dunnett, LSD, Bonferroni, and Tukey Tests
In this course we will utilize the Tukey Test!
• The Tukey Test provides associated p-values for the comparison of each pair of
means
This test lets you identify whether any two of the 𝑘 means differ
The Null Hypothesis in this test assumes the equality of the two means being
compared
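The multiple testing problem that these procedures adjust for can be quantified with a rough calculation. The sketch below (standard-library Python) counts the pairwise comparisons among 𝑘 means and computes the chance of at least one false rejection if each pair were tested separately at α = 0.05. Treating the tests as independent is a simplifying assumption made here for illustration, and `familywise_error` is a hypothetical helper name.

```python
from math import comb

def familywise_error(k, alpha=0.05):
    """Number of pairwise tests among k means and the chance of >= 1 false rejection."""
    m = comb(k, 2)                      # k(k - 1)/2 pairs
    return m, 1 - (1 - alpha) ** m      # assumes independent tests

for k in (3, 4, 5):
    m, fwer = familywise_error(k)
    print(f"k = {k}: {m} pairwise tests, overall error rate about {fwer:.3f}")
```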
• The average MPG of 2000-2010 vehicles is compared across four car
manufacturers. We would like to see if there is a difference in average MPG.
(Notice that we are testing for a difference in the means of a quantitative
variable, MPG, across the four levels of a single factor/treatment, the
manufacturer)
• After deeming a One-Way ANOVA appropriate for this research question
and checking the requirements for this statistical procedure, we find that
at least one of the mean MPGs differs due to a low p-value (for instance,
0.0003). (Refer back to the One-Way ANOVA section and the hypotheses
for this test)
• Since the One-Way ANOVA revealed a difference but cannot tell us,
specifically, where the difference in mean MPGs is, we decide to
perform a Tukey Test to answer our research question in full. Where is the
difference? Which manufacturer has a higher/lower mean MPG?
• The Tukey Test compares all 𝑘 means, two at a time. (Note: This can be
done here because a Tukey Test does control the overall significance
level for pairwise comparisons)
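A scenario like the one above could be run as follows. This is a sketch only: it assumes SciPy 1.8 or later (which provides `scipy.stats.tukey_hsd`), and the MPG values and manufacturer labels are invented for illustration, not the data behind the p-value quoted above.

```python
from itertools import combinations
from scipy.stats import tukey_hsd

# Hypothetical MPG samples for four manufacturers
mpg = {
    "Maker A": [24.1, 25.3, 23.8, 26.0, 24.7],
    "Maker B": [27.9, 28.4, 29.1, 27.2, 28.8],
    "Maker C": [24.5, 23.9, 25.1, 24.8, 25.6],
    "Maker D": [30.2, 31.0, 29.5, 30.8, 31.4],
}
names = list(mpg)
result = tukey_hsd(*mpg.values())

# result.pvalue[i, j] is the adjusted p-value for comparing group i with group j
for i, j in combinations(range(len(names)), 2):
    p = result.pvalue[i, j]
    verdict = "means differ" if p < 0.05 else "no significant difference"
    print(f"{names[i]} vs {names[j]}: p = {p:.4f} ({verdict})")
```

Each printed line corresponds to one pairwise null hypothesis of equal means, so the output shows where the differences lie.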
We introduce the method of two-way analysis of variance, which is
used with data partitioned into categories according to two factors.
The methods of this section require that we begin by testing for an
interaction between the two factors.
Then we test whether the row or column factors have effects.
Main Ideas:
• We can actually extend the One-Way ANOVA to test the claim that three or
more population means are all equal when the data is categorized in TWO
ways (not just one)
• This test (a Two-Way ANOVA) is appropriate when we wish to compare three
or more population means within a set of quantitative data that is
categorized according to two treatments (or factors)
 We CANNOT simply test the effect of the two factors by utilizing two One-Way ANOVAs
because the One-Way ANOVA test would ignore the possible interaction between the
two factors involved
Interaction – there is an interaction between two factors if the effect of
one of the factors changes for different categories of the other factor (like
a combination effect or a synergy effect)
• Interaction plots can be used to visually assess if an interaction effect is
present
• We must test for an interaction effect first!
Requirements:
1. For each cell, the populations have distributions that are
approximately normal
2. The populations have the same variance 𝜎2
3. The samples are SRS of quantitative data
4. The samples are independent of each other
5. The samples are from populations that are categorized in two ways
6. All of the cells have the same number of sample values (i.e. a
balanced design)
• Not a general requirement for Two-Way ANOVA but we won’t have
unbalanced designs
* Notice that there are up to three tests being performed during a Two-Way
ANOVA *
First Test: The Test for an Interaction Effect
 Hypotheses:
• $H_0$: No Interaction Effect
• $H_1$: There is an Interaction Effect
 Test Statistic:
• $F = \frac{MS(\text{Interaction})}{MS(\text{Error})}$
*Note: P-Values and Critical Values are from the 𝐹 Distribution
 Conclusion:
• If the null hypothesis of ‘No Interaction Effect’ is rejected, then there is a significant
interaction effect and we CANNOT proceed to test for main effects. So, if we reject
𝐻0, we must STOP.
Second and Third Tests: The Test for main (Row/Column Factor) Effects
 Hypotheses:
• $H_0$: No Row/Column Factor Effect
• $H_1$: There is a Row/Column Factor Effect
 Test Statistic:
• $F = \frac{MS(\text{Row})}{MS(\text{Error})}$ and $F = \frac{MS(\text{Column})}{MS(\text{Error})}$
• Note: P-Values and Critical Values are from the 𝐹 Distribution
Notice the Two-Way ANOVA is a two-step procedure that performs one to three
separate 𝐹 tests
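The three F statistics can be computed from the cell data in a balanced design. The following is a minimal Python sketch, assuming SciPy is available for the p-values. The helper name `two_way_anova` is hypothetical, and the small data set is constructed so that the cell means are exactly additive, which makes the interaction F equal to 0.

```python
from statistics import mean
from scipy.stats import f

def two_way_anova(cells):
    """cells[i][j] lists the observations for row level i and column level j.
    Assumes a balanced design (same number of values in every cell)."""
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    row_means = [mean(x for cell in row for x in cell) for row in cells]
    col_means = [mean(x for row in cells for x in row[j]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_row = c * n * sum((m - grand) ** 2 for m in row_means)
    ss_col = r * n * sum((m - grand) ** 2 for m in col_means)
    ss_int = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                     for i in range(r) for j in range(c))
    ss_err = sum((x - cell_means[i][j]) ** 2
                 for i in range(r) for j in range(c) for x in cells[i][j])

    df = {"interaction": (r - 1) * (c - 1), "row": r - 1, "column": c - 1}
    df_err = r * c * (n - 1)
    ms_err = ss_err / df_err
    table = {}
    for name, ss in (("interaction", ss_int), ("row", ss_row), ("column", ss_col)):
        F = (ss / df[name]) / ms_err
        table[name] = (F, f.sf(F, df[name], df_err))  # (F statistic, p-value)
    return table

# Hypothetical 2 x 3 layout with two observations per cell
cells = [
    [[10, 12], [14, 16], [18, 20]],
    [[11, 13], [15, 17], [19, 21]],
]
table = two_way_anova(cells)
for name, (F, p) in table.items():
    print(f"{name}: F = {F:.2f}, p-value = {p:.4f}")
```

The interaction entry is listed first because that test is carried out first.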
The data in the table are categorized
with two factors:
1. Gender: Male or Female
2. Blood Lead Level: Low, Medium, or
High
The subcategories are called cells, and
the response variable is IQ score.
Let’s explore the IQ data in the table by calculating the mean for each
cell and constructing an interaction graph.
 An interaction effect is suggested if the line segments are far from
being parallel.
 No interaction effect is suggested if the line segments are
approximately parallel.
 For the IQ scores, the graph suggests an interaction effect:
• Females with high lead exposure appear to have lower IQ scores, while
males with high lead exposure appear to have higher IQ scores.
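The "parallel line segments" idea can also be checked numerically: if the segments are parallel, the gap between the two rows is about the same in every column. The sketch below uses standard-library Python and invented scores (NOT the values in the slide's table). It is only an informal look and does not replace the formal test for interaction.

```python
from statistics import mean

# Hypothetical scores: cells[row][column] -> observations
cells = {
    "Male":   {"Low": [92, 85, 96], "Medium": [88, 90, 86], "High": [95, 99, 94]},
    "Female": {"Low": [94, 89, 93], "Medium": [87, 91, 89], "High": [80, 78, 85]},
}

# Cell means are the points plotted on an interaction graph
cell_means = {row: {col: mean(obs) for col, obs in cols.items()}
              for row, cols in cells.items()}

# Gap between the two lines at each column level
gaps = {col: cell_means["Male"][col] - cell_means["Female"][col]
        for col in cell_means["Male"]}
spread = max(gaps.values()) - min(gaps.values())

print("Male minus Female, by lead level:", gaps)
print("Spread of the gaps:", spread, "(near 0 would suggest parallel segments)")
```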
Step 1: Interaction Effect – test the null hypothesis that there is no
interaction
Step 2: Row/Column Effects – if we conclude there is no interaction
effect, proceed with these two hypothesis tests
• Row Factor: no effects from row
• Column Factor: no effects from column
All tests use the 𝐹 distribution.
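The two-step logic can be written as a short decision routine. This is a sketch in standard-library Python. The function name `two_way_decision` is hypothetical, and the p-values in the first call are the ones reported in the worked example in these slides (0.655, 0.791, 0.906).

```python
def two_way_decision(p_interaction, p_row, p_column, alpha=0.05):
    """Apply the two-step procedure and return the conclusions as text."""
    # Step 1: interaction comes first; a significant interaction means we stop.
    if p_interaction <= alpha:
        return ["Reject H0 of no interaction: stop, do not test row/column effects."]
    findings = ["Fail to reject H0 of no interaction: go on to row/column tests."]
    # Step 2: test each main effect separately.
    for label, p in (("row", p_row), ("column", p_column)):
        if p <= alpha:
            findings.append(f"Reject H0 of no {label} factor effect.")
        else:
            findings.append(f"Fail to reject H0 of no {label} factor effect.")
    return findings

for line in two_way_decision(0.655, 0.791, 0.906):
    print(line)
```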
Given the performance IQ scores in the table at the beginning of this
section, use two-way ANOVA to test for an interaction effect, an effect
from the row factor of gender, and an effect from the column factor of
blood lead level.
Use a 0.05 level of significance.
Requirement Check:
1. For each cell, the sample values appear to be from a normally distributed
population.
2. The variances of the cells are 95.3, 146.7, 130.8, 812.7, 142.3, and 143.8, which
are considerably different from each other. We might have some
reservations that the population variances are equal – but for the purposes of
this example, we will assume the requirement is met.
3. The samples are simple random samples.
4. The samples are independent of each other; the subjects are not matched in
any way.
5. The sample values are categorized in two ways (gender and blood lead
level).
6. All the cells have the same number (five) of sample values.
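The reservation in item 2 can be made concrete by comparing the largest and smallest cell variances listed there. A short Python sketch follows. Applying the 9:1 rule of thumb from the One-Way ANOVA requirements to the cells of a two-way design is an assumption made here for illustration.

```python
# Cell variances quoted in requirement 2
cell_variances = [95.3, 146.7, 130.8, 812.7, 142.3, 143.8]

ratio = max(cell_variances) / min(cell_variances)
print(f"largest / smallest variance = {ratio:.2f}")
if ratio <= 9:
    print("Within the 9:1 rule of thumb, but only barely.")
else:
    print("Exceeds the 9:1 rule of thumb.")
```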
The StatCrunch output is displayed below:
Step 1: Test that there is no interaction between the two factors.
The test statistic is F = 0.43 and the P-value is 0.655, so we fail to reject the
null hypothesis.
It does not appear that the performance IQ scores are affected by an
interaction between gender and blood lead level.
There does not appear to be an interaction effect, so we proceed to test
for row and column effects.
Step 2: We now test:
$H_0$: No Row Factor (gender) Effect
$H_1$: There is a Row Factor (gender) Effect
For the row factor, F = 0.07 and the P-value is 0.791. We fail to reject the null
hypothesis; there is not sufficient evidence that IQ scores are affected by the
gender of the subject.
$H_0$: No Column Factor (blood lead level) Effect
$H_1$: There is a Column Factor (blood lead level) Effect
For the column factor, F = 0.10 and the P-value is 0.906. We fail to reject the null
hypothesis; there is not sufficient evidence that IQ scores are affected by the
level of lead exposure.
Interpretation:
Based on the sample data, we conclude that IQ scores do not appear
to be affected by gender or blood lead level.
Caution:
• Two-way analysis of variance is not one-way analysis of variance done twice.
• Be sure to test for an interaction between the two factors.
To better understand the method of two-way analysis of variance, let’s
repeat Example 1 after adding 30 points to each of the performance IQ
scores of the females only. That is, in Table 12-3, add 30 points to each
of the listed scores for females.
Step 1:
• Interaction Effect: The display shows a p-value of 0.655 for an interaction
effect. Because that p-value is not less than or equal to 0.05, we fail to
reject the null hypothesis of no interaction effect. There does not appear to
be an interaction effect.
Step 2:
• Row Effect: The display shows a p-value less than 0.0001 for the row variable of
gender, so we reject the null hypothesis of no effect from the factor of gender. In
this case, the gender of the subject does appear to have an effect on
performance IQ scores.
• Column Effect: The display shows a p-value of 0.906 for the column variable of
blood lead level, so we fail to reject the null hypothesis of no effect from the
factor of blood lead level. The blood lead level does not appear to have an
effect on performance IQ scores.
Interpretation:
After adding 30 points to each score of the female subjects, we
conclude that there is an effect due to the gender of the subject, but
there is no apparent effect from an interaction or from the blood lead
level.
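The pattern in this example (only the row test changes) follows from the arithmetic: adding a constant to every value in one row leaves the interaction, column, and error variation untouched. The sketch below checks this in standard-library Python. The scores are invented (not Table 12-3), and `f_statistics` is a hypothetical helper that returns only the F statistics for a balanced design.

```python
from statistics import mean

def f_statistics(cells):
    """F statistics for a balanced two-way layout; cells[i][j] lists the observations."""
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    cell = [[mean(obs) for obs in row] for row in cells]
    row_m = [mean(row) for row in cell]          # balanced: mean of the cell means
    col_m = [mean(cell[i][j] for i in range(r)) for j in range(c)]
    grand = mean(row_m)
    ms_err = sum((x - cell[i][j]) ** 2
                 for i in range(r) for j in range(c) for x in cells[i][j]) / (r * c * (n - 1))
    ms_row = c * n * sum((m - grand) ** 2 for m in row_m) / (r - 1)
    ms_col = r * n * sum((m - grand) ** 2 for m in col_m) / (c - 1)
    ms_int = n * sum((cell[i][j] - row_m[i] - col_m[j] + grand) ** 2
                     for i in range(r) for j in range(c)) / ((r - 1) * (c - 1))
    return {"interaction": ms_int / ms_err, "row": ms_row / ms_err, "column": ms_col / ms_err}

# Hypothetical scores: row 0 and row 1 are the two levels of the row factor
scores = [
    [[92, 85, 96], [88, 90, 86], [95, 99, 94]],
    [[94, 89, 93], [87, 91, 89], [80, 78, 85]],
]
# Add 30 points to every value in row 1 only
shifted = [scores[0], [[x + 30 for x in obs] for obs in scores[1]]]

before, after = f_statistics(scores), f_statistics(shifted)
for name in before:
    print(f"{name}: F before = {before[name]:.3f}, F after = {after[name]:.3f}")
```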
If our sample data consist of only one observation per cell, there is no
variation within individual cells and sample variances cannot be
calculated for individual cells.
If it seems reasonable to assume there is no interaction between the two
factors, make that assumption and test separately:
$H_0$: No Row/Column Factor Effect
$H_1$: There is a Row/Column Factor Effect
(The mechanics of the tests are the same as presented earlier.)
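With one observation per cell, the variation left over after removing the row and column effects takes the place of the within-cell error. The following standard-library Python sketch relies on that assumption of no interaction. The table values and the helper name `two_way_no_replication` are made up for illustration, and the error degrees of freedom (𝑟 − 1)(𝑐 − 1) is the usual choice for this design.

```python
from statistics import mean

def two_way_no_replication(table):
    """table[i][j] is the single observation for row level i and column level j.
    Returns (F, numerator df, denominator df) for the row and column factors."""
    r, c = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(table[i][j] for i in range(r)) for j in range(c)]

    ss_row = c * sum((m - grand) ** 2 for m in row_means)
    ss_col = r * sum((m - grand) ** 2 for m in col_means)
    # Residual variation after removing row and column effects serves as the error term
    ss_err = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(r) for j in range(c))

    df_row, df_col, df_err = r - 1, c - 1, (r - 1) * (c - 1)
    ms_err = ss_err / df_err
    return {
        "row": ((ss_row / df_row) / ms_err, df_row, df_err),
        "column": ((ss_col / df_col) / ms_err, df_col, df_err),
    }

# Hypothetical 2 x 3 table with one value per cell
results = two_way_no_replication([[10, 14, 15], [12, 13, 20]])
for factor, (F, df_num, df_den) in results.items():
    print(f"{factor} factor: F = {F:.3f} with df = ({df_num}, {df_den})")
```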
If we use only the first entry from each cell in Table 12-3, we get the
StatCrunch results shown below. Use a 0.05 significance level to test for an
effect from the row factor of gender and also test for an effect from the
column factor of blood lead level. Assume that there is no effect from an
interaction between gender and blood lead level.
• Row Factor:
• We first use the results from StatCrunch display to test the null hypothesis of no
effects from the row factor of gender (male, female). This test statistic (0.02) is not
significant, because the corresponding p-value is 0.901. We fail to reject the null
hypothesis. It appears that performance IQ scores are not affected by the gender
of the subject.
• Column Factor:
• We now use the StatCrunch display to test the null hypothesis of no effect from the
column factor of blood lead level (low, medium, high). The test statistic (1.16) is not
significant because the corresponding p-value is 0.463. We fail to reject the null
hypothesis, so it appears that the performance IQ scores are not affected by the
blood lead level.
• Complete Practice Problems 7

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 

Analysis of Variance (ANOVA)

  • 1. ANALYSIS OF VARIANCE (ANOVA) Avjinder Singh Kaler and Kristi Mai
  • 2. ▪ Estimating a Population Variance/Standard Deviation • $\chi^2$ (Chi-Square) Distribution ▪ Comparing Variation in Two Samples • F Distribution ▪ One-Way Analysis of Variance (ANOVA) ▪ Multiple Comparison Tests • Tukey Test ▪ Two-Way Analysis of Variance (ANOVA)
  • 3. Main Ideas: • The sample variance is the best point estimate of the population variance and the sample standard deviation is typically used to estimate the population standard deviation • We can use a sample variance to construct a C.I. to estimate the true value of a population variance and we can also use a sample standard deviation to construct a C.I. to estimate the true value of a population standard deviation • We can also test claims about a population variance or standard deviation
  • 4. ▪ If a population has a normal distribution, then the following formula describes the $\chi^2$ distribution: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ ▪ This is a Chi-Square score and is a measure of relative standing ▪ We NEED degrees of freedom for the $\chi^2$ distribution • $df = n - 1$ (in this situation) • Although this value for degrees of freedom is common, $df$ are NOT always $n - 1$ ▪ Properties of the Chi-Square Distribution: • The $\chi^2$ distribution is NOT symmetric, unlike the t-distribution or the Normal distribution ▪ Note: Because the distribution is NOT symmetric, the C.I. will NOT be $s^2 \pm E$ • The values of $\chi^2$ can be zero or positive but cannot be negative • The $\chi^2$ distribution is different for different degrees of freedom
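Because the $\chi^2$ distribution is not symmetric, the interval has to be built from two separate critical values rather than a single margin of error. A sketch of the standard form, where $\chi^2_L$ and $\chi^2_R$ (notation assumed here, not taken from the slides) are the left and right critical values with $n - 1$ degrees of freedom:

```latex
\frac{(n-1)\,s^2}{\chi^2_R} \;<\; \sigma^2 \;<\; \frac{(n-1)\,s^2}{\chi^2_L}
\qquad\text{and}\qquad
\sqrt{\frac{(n-1)\,s^2}{\chi^2_R}} \;<\; \sigma \;<\; \sqrt{\frac{(n-1)\,s^2}{\chi^2_L}}
```

Both intervals follow from solving $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ for $\sigma^2$ at each critical value.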
  • 5.
  • 6. Main Ideas: ▪ The sample variance is the best point estimate of the population variance and the sample standard deviation is typically used to estimate the population standard deviation ▪ We can use two sample variances to test claims about the difference between two population variances
  • 7. • If two populations are normally distributed with equal variances (i.e. $\sigma_1^2 = \sigma_2^2$), then the following formula describes the F distribution: $F = \frac{s_1^2}{s_2^2}$ • This is an F-score and is a measure of relative standing • Notice that this distribution compares the two variations in the form of a ratio • We NEED two different degrees of freedom for the F distribution • In this particular situation, we have: • Numerator $df = n_1 - 1$ • Denominator $df = n_2 - 1$ • Properties of the F Distribution: • The F distribution is NOT symmetric, unlike the t-distribution or the Normal distribution • The values of F can be zero or positive but cannot be negative • The F distribution is different for different degrees of freedom and depends on TWO different degrees of freedom
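As a minimal sketch of the ratio above, the following computes $F = s_1^2 / s_2^2$ together with its two degrees of freedom. The function name `f_ratio` and the data are hypothetical, chosen only for illustration; the p-value would still come from the F distribution via statistical software.

```python
from statistics import variance

def f_ratio(sample1, sample2):
    """Return (F, numerator df, denominator df) for F = s1^2 / s2^2."""
    s1_sq = variance(sample1)  # sample variance (divides by n1 - 1)
    s2_sq = variance(sample2)  # sample variance (divides by n2 - 1)
    return s1_sq / s2_sq, len(sample1) - 1, len(sample2) - 1

# Hypothetical data, not from the slides
a = [2.0, 4.0, 4.0, 6.0, 9.0]
b = [3.0, 4.0, 5.0, 6.0, 7.0]
F, df_num, df_den = f_ratio(a, b)
print(F, df_num, df_den)
```

Here the sample variances are 7 and 2.5, so the ratio is 2.8 with 4 and 4 degrees of freedom.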
  • 8.
  • 9. Main Ideas: ▪ We can actually extend the hypothesis testing foundation that we already have to test the claim that three or more population means are all equal • i.e. $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs. $H_1$: at least one mean differs ▪ We can test this null hypothesis by analyzing sample variances ▪ This test (a One-Way ANOVA) is appropriate when we wish to compare three or more population means within a set of quantitative data that is categorized according to one treatment (or factor) • Treatment (factor): a characteristic allowing us to distinguish between the different populations of interest ▪ We CANNOT simply test two samples at a time
  • 10. Requirements: ▪ The different populations have distributions that are approximately normal • Loose requirement: only a problem if a population is very far from normal ▪ The populations have the same variance $\sigma^2$ • Loose requirement: the ratio of the largest variance to the smallest can be as large as 9:1 ▪ The samples are SRS of quantitative data ▪ The samples are independent of each other ▪ The different samples are from populations that are categorized in only one way
  • 11. Test Statistic: ▪ $F = \frac{MS(\text{Treatment})}{MS(\text{Error})} \approx \frac{\text{variance between samples}}{\text{variance within samples}}$ ▪ Note: p-values and critical values are from the F distribution ▪ The F test statistic is very sensitive to sample means, even though it is based on variance ▪ Degrees of Freedom • Equal Sample Sizes ▪ Numerator $df = k - 1$ ▪ Denominator $df = k(n - 1)$ • Unequal Sample Sizes ▪ Numerator $df = k - 1$ ▪ Denominator $df = N - k$ • Notation: ▪ $k$: number of samples ▪ $n$: number of values in each sample (i.e. sample size) ▪ $N$: total number of values in all samples combined
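The test statistic above can be sketched from scratch as follows. This is an illustrative implementation, not the StatCrunch procedure used in the slides: the function name `one_way_anova_f` and the data are hypothetical, and it uses the general denominator $df = N - k$, which equals $k(n - 1)$ when all samples have size $n$.

```python
from statistics import mean

def one_way_anova_f(samples):
    """Return (F, numerator df, denominator df) for a one-way ANOVA."""
    k = len(samples)                    # number of samples
    N = sum(len(s) for s in samples)    # total number of values
    grand_mean = sum(sum(s) for s in samples) / N
    # Variation between samples: sample means around the grand mean
    ss_treatment = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation within samples: values around their own sample mean
    ss_error = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_num, df_den = k - 1, N - k
    f_stat = (ss_treatment / df_num) / (ss_error / df_den)
    return f_stat, df_num, df_den

# Hypothetical data, not from the slides
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_num, df_den = one_way_anova_f(groups)
print(f_stat, df_num, df_den)
```

For these made-up groups, MS(Treatment) is 13 and MS(Error) is 1, so F is 13 with 2 and 6 degrees of freedom.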
  • 12. Notice the One-Way ANOVA is an F test – like comparing variances. Specifically, it is a right-tailed F Test. Conclusion Cautions: • Rejecting the null hypothesis does NOT tell us that all of the means are different! • In fact, rejecting the null hypothesis cannot tell us which mean(s) is(are) different
  • 13. Use the performance IQ scores listed in Table 12-1 and a significance level of α = 0.05 to test the claim that the three samples come from populations with means that are all equal.
  • 14. Here are summary statistics from the collected data:
  • 15. Requirement Check: 1. The three samples appear to come from populations that are approximately normal. 2. The three samples have standard deviations that are not dramatically different. 3. We can treat the samples as simple random samples. 4. The samples are independent of each other and the IQ scores are not matched in any way. 5. The three samples are categorized according to a single factor: low lead, medium lead, and high lead.
  • 16. The hypotheses are: $H_0: \mu_1 = \mu_2 = \mu_3$ $H_1$: At least one of the means is different from the others. The significance level is α = 0.05.
  • 17.
  • 18. From StatCrunch results, the P-value is 0.020 when rounded. Because the P-value is less than the significance level of α = 0.05, we can reject the null hypothesis. There is sufficient evidence that the three samples come from populations with means that are not all equal. We cannot conclude formally that any particular mean is different from the others, but it appears that greater blood lead levels are associated with lower performance IQ scores.
  • 19. Larger values of the test statistic result in smaller P-values, so the ANOVA test is right-tailed. Assuming that the populations have the same variance $\sigma^2$ (as required for the test), the F test statistic is the ratio of these two estimates of $\sigma^2$: 1) variation between samples (based on variation among sample means) 2) variation within samples (based on the sample variances)
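For the special case of equal sample sizes $n$, the two estimates of $\sigma^2$ can be written out explicitly. In the sketch below, $s_{\bar{x}}^2$ (the variance of the $k$ sample means) and $s_p^2$ (the mean of the $k$ sample variances) are notation assumed here rather than taken from the slides:

```latex
F = \frac{\text{variance between samples}}{\text{variance within samples}}
  = \frac{n\, s_{\bar{x}}^2}{s_p^2},
\qquad
s_p^2 = \frac{s_1^2 + s_2^2 + \cdots + s_k^2}{k}
```

Sample means that are far apart inflate the numerator but leave the denominator unchanged, which is why large values of $F$ count as evidence against equal population means.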
  • 20.
  • 21. Multiple Comparison Tests: these tests should be used to identify where the difference(s) in the means lie if the null hypothesis in the One-Way ANOVA is rejected. Multiple comparison tests use pairs of means to identify which means are different while still accounting for the multiple testing problem mentioned previously, by making adjustments to ensure an adequate significance level • Examples: Duncan, SNK, Scheffe, Dunnett, LSD, Bonferroni, and Tukey Tests. In this course we will utilize the Tukey Test! • The Tukey Test provides associated p-values for the comparison of each pair of means. This test will allow you to identify whether any two of the $k$ means differ. The Null Hypothesis in this test assumes the equality of the two means being compared
  • 22. • The average MPG for 2000-2010 vehicles from four car manufacturers are compared. We would like to see if there is a difference in average MPG. (Notice that we are testing to see if there is a difference in the means of a quantitative variable, MPG, across the four levels of a single factor/treatment, the manufacturer) • After deeming a One-Way ANOVA appropriate for this research question and checking the requirements for this statistical procedure, we find that at least one of the mean MPGs differs due to a low p-value (for instance, 0.0003). (Refer back to the One-Way ANOVA section and the hypotheses for this test)
  • 23. • Since the One-Way ANOVA revealed a difference but cannot tell us, specifically, where the difference in mean MPGs is, we decide to perform a Tukey Test to answer our research question in full. Where is the difference? Which manufacturer has a higher/lower mean MPG? • The Tukey Test compares all 𝑘 means, two at a time. (Note: This can be done here because a Tukey Test does control the overall significance level for pairwise comparisons)
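To see what "all $k$ means, two at a time" amounts to for the four manufacturers, the short sketch below enumerates the pairs that would be compared. It does not perform the Tukey Test itself; the labels A to D are hypothetical placeholders, since the slides do not name the manufacturers here.

```python
from itertools import combinations

# Hypothetical labels for the four manufacturers
manufacturers = ["A", "B", "C", "D"]

# Every unordered pair of means that a pairwise procedure compares
pairs = list(combinations(manufacturers, 2))
for first, second in pairs:
    print(f"H0: mean MPG of {first} = mean MPG of {second}")

print(len(pairs))  # k(k - 1) / 2 comparisons
```

With $k = 4$ there are six pairwise comparisons, which is why the overall significance level has to be controlled rather than running six unadjusted two-sample tests.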
  • 24.
  • 25. We introduce the method of two-way analysis of variance, which is used with data partitioned into categories according to two factors. The methods of this section require that we begin by testing for an interaction between the two factors. Then we test whether the row or column factors have effects.
  • 26. Main Ideas: • We can actually extend the One-Way ANOVA to test the claim that three or more population means are all equal when the data is categorized in TWO ways (not just one) • This test (a Two-Way ANOVA) is appropriate when we wish to compare three or more population means within a set of quantitative data that is categorized according to two treatments (or factors) ▪ We CANNOT simply test the effect of the two factors by utilizing two One-Way ANOVAs because the One-Way ANOVA test would ignore the possible interaction between the two factors involved
  • 27. Interaction – there is an interaction between two factors if the effect of one of the factors changes for different categories of the other factor (like a combination effect or a synergy effect) • Interaction plots can be used to visually assess if an interaction effect is present • We must test for an interaction effect first!
  • 28. 1. For each cell, the populations have distributions that are approximately normal 2. The populations have the same variance $\sigma^2$ 3. The samples are SRS of quantitative data 4. The samples are independent of each other 5. The samples are from populations that are categorized in two ways 6. All of the cells have the same number of sample values (i.e. a balanced design) • Not a general requirement for Two-Way ANOVA but we won’t have unbalanced designs
  • 29. * Notice that there are up to three tests being performed during a Two-Way ANOVA * First Test: The Test for an Interaction Effect ▪ Hypotheses: • $H_0$: No Interaction Effect $H_1$: There is an Interaction Effect ▪ Test Statistic: • $F = \frac{MS(\text{Interaction})}{MS(\text{Error})}$ *Note: P-Values and Critical Values are from the $F$ Distribution ▪ Conclusion: • If the null hypothesis of ‘No Interaction Effect’ is rejected, then there is a significant interaction effect and we CANNOT proceed to test for main effects. So, if we reject $H_0$, we must STOP.
  • 30. Second and Third Tests: The Test for main (Row/Column Factor) Effects ▪ Hypotheses: • $H_0$: No Row/Column Factor Effect $H_1$: There is a Row/Column Factor Effect ▪ Test Statistic: • $F = \frac{MS(\text{Row})}{MS(\text{Error})}$ and $F = \frac{MS(\text{Column})}{MS(\text{Error})}$ • Note: P-Values and Critical Values are from the $F$ Distribution. Notice the Two-Way ANOVA is a two-step procedure that performs one to three separate $F$ tests
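The three F statistics can be sketched for a balanced design as shown below. This is an illustrative stdlib-only implementation under stated assumptions, not the StatCrunch computation from the slides: the function name `two_way_anova`, the nested-list layout `cells[row][column]`, and the data are all hypothetical.

```python
from statistics import mean

def two_way_anova(cells):
    """Return the three F statistics for a balanced two-way design.

    cells[i][j] is the list of n observations for row level i, column level j.
    """
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    row_means = [mean(x for cell in row for x in cell) for row in cells]
    col_means = [mean(x for row in cells for x in row[j]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_row = c * n * sum((m - grand) ** 2 for m in row_means)
    ss_col = r * n * sum((m - grand) ** 2 for m in col_means)
    # Interaction: what is left in each cell mean after row and column effects
    ss_int = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    # Error: variation of the observations inside each cell
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(r) for j in range(c) for x in cells[i][j]
    )
    ms_err = ss_err / (r * c * (n - 1))
    return {
        "interaction": (ss_int / ((r - 1) * (c - 1))) / ms_err,
        "row": (ss_row / (r - 1)) / ms_err,
        "column": (ss_col / (c - 1)) / ms_err,
    }

# Hypothetical 2 x 2 design with n = 2 observations per cell
cells = [[[1.0, 3.0], [5.0, 7.0]],
         [[2.0, 4.0], [6.0, 8.0]]]
f_stats = two_way_anova(cells)
print(f_stats)
```

In this made-up data the cell means are perfectly additive, so the interaction F is 0, while the row and column F statistics are 1 and 16. In practice the interaction statistic would be checked against the F distribution first, and the row and column tests would be read only if the interaction is not significant.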
  • 31. The data in the table are categorized with two factors: 1. Gender: Male or Female 2. Blood Lead Level: Low, Medium, or High The subcategories are called cells, and the response variable is IQ score.
  • 32. Let’s explore the IQ data in the table by calculating the mean for each cell and constructing an interaction graph.
  • 33. ▪ An interaction effect is suggested if the line segments are far from being parallel. ▪ No interaction effect is suggested if the line segments are approximately parallel. ▪ For the IQ scores, it appears there is an interaction effect: • Females with high lead exposure appear to have lower IQ scores, while males with high lead exposure appear to have high IQ scores.
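The "parallel line segments" check can be approximated numerically: if the gap between the two rows' cell means is about the same at every column level, the lines in the interaction graph are roughly parallel. The sketch below uses hypothetical data and labels (`row1`, `row2`, `low`, `high`), not the IQ scores from the table.

```python
from statistics import mean

# Hypothetical cell data keyed by (row level, column level)
cells = {
    ("row1", "low"): [10.0, 12.0],
    ("row1", "high"): [14.0, 16.0],
    ("row2", "low"): [11.0, 13.0],
    ("row2", "high"): [9.0, 11.0],
}

# Mean of each cell: these are the points plotted in an interaction graph
cell_means = {key: mean(values) for key, values in cells.items()}

# Vertical gap between the two row lines at each column level
gaps = {
    col: cell_means[("row1", col)] - cell_means[("row2", col)]
    for col in ("low", "high")
}
print(cell_means)
print(gaps)
```

Here the gap changes from -1 at the low level to 5 at the high level, so the lines cross and an interaction is suggested. A plot or a gap comparison only suggests an interaction; the F test decides whether it is significant.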
  • 34. Step 1: Interaction Effect – test the null hypothesis that there is no interaction Step 2: Row/Column Effects – if we conclude there is no interaction effect, proceed with these two hypothesis tests • Row Factor: no effects from row • Column Factor: no effects from column All tests use the 𝐹 distribution.
  • 35.
  • 36.
  • 37. Given the performance IQ scores in the table at the beginning of this section, use two-way ANOVA to test for an interaction effect, an effect from the row factor of gender, and an effect from the column factor of blood lead level. Use a 0.05 level of significance.
  • 38. Requirement Check: 1. For each cell, the sample values appear to be from a normally distributed population. 2. The variances of the cells are 95.3, 146.7, 130.8, 812.7, 142.3, and 143.8, which are considerably different from each other. We might have some reservations that the population variances are equal – but for the purposes of this example, we will assume the requirement is met. 3. The samples are simple random samples. 4. The samples are independent of each other; the subjects are not matched in any way. 5. The sample values are categorized in two ways (gender and blood lead level). 6. All the cells have the same number (five) of sample values.
  • 39. The StatCrunch output is displayed below:
  • 40. Step 1: Test that there is no interaction between the two factors. The test statistic is F = 0.43 and the P-value is 0.655, so we fail to reject the null hypothesis. It does not appear that the performance IQ scores are affected by an interaction between gender and blood lead level. There does not appear to be an interaction effect, so we proceed to test for row and column effects.
  • 41. Step 2: We now test: $H_0$: No Row Factor (gender) Effect $H_1$: There is a Row Factor (gender) Effect. For the row factor, F = 0.07 and the P-value is 0.791. We fail to reject the null hypothesis; there is no evidence that IQ scores are affected by the gender of the subject. $H_0$: No Column Factor (blood lead level) Effect $H_1$: There is a Column Factor (blood lead level) Effect. For the column factor, F = 0.10 and the P-value is 0.906. We fail to reject the null hypothesis; there is no evidence that IQ scores are affected by the level of lead exposure.
  • 42. Interpretation: Based on the sample data, we conclude that IQ scores do not appear to be affected by gender or blood lead level. Caution: • Two-way analysis of variance is not one-way analysis of variance done twice. • Be sure to test for an interaction between the two factors.
  • 43. To better understand the method of two-way analysis of variance, let’s repeat Example 1 after adding 30 points to each of the performance IQ scores of the females only. That is, in Table 12-3, add 30 points to each of the listed scores for females.
  • 44. Step 1: • Interaction Effect: The display shows a p-value of 0.655 for an interaction effect. Because that p-value is not less than or equal to 0.05, we fail to reject the null hypothesis of no interaction effect. There does not appear to be an interaction effect.
  • 45. Step 2: • Row Effect: The display shows a p-value less than 0.0001 for the row variable of gender, so we reject the null hypothesis of no effect from the factor of gender. In this case, the gender of the subject does appear to have an effect on performance IQ scores. • Column Effect: The display shows a p-value of 0.906 for the column variable of blood lead level, so we fail to reject the null hypothesis of no effect from the factor of blood lead level. The blood lead level does not appear to have an effect on performance IQ scores.
  • 46. Interpretation: By adding 30 points to each score of the female subjects, we do conclude that there is an effect due to the gender of the subject, but there is no apparent effect from an interaction or from the blood lead level.
  • 47. If our sample data consist of only one observation per cell, there is no variation within individual cells and sample variances cannot be calculated for individual cells. If it seems reasonable to assume there is no interaction between the two factors, make that assumption and test separately: $H_0$: No Row/Column Factor Effect $H_1$: There is a Row/Column Factor Effect (The mechanics of the tests are the same as presented earlier.)
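One way to see why the no-interaction assumption is needed is to count degrees of freedom. In the sketch below, $r$ and $c$ (symbols assumed here, not used in the slides) are the numbers of row and column levels, with one observation per cell:

```latex
\underbrace{rc - 1}_{\text{total}}
  = \underbrace{(r - 1)}_{\text{row}}
  + \underbrace{(c - 1)}_{\text{column}}
  + \underbrace{(r - 1)(c - 1)}_{\text{error}}
```

With a single value in each cell there are no within-cell degrees of freedom, so the $(r - 1)(c - 1)$ term that would otherwise belong to the interaction serves as the error term. For a layout with 2 row levels and 3 column levels, this gives $5 = 1 + 2 + 2$.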
  • 48. If we use only the first entry from each cell in Table 12-3, we get the StatCrunch results shown below. Use a 0.05 significance level to test for an effect from the row factor of gender and also test for an effect from the column factor of blood lead level. Assume that there is no effect from an interaction between gender and blood lead level.
  • 49. • Row Factor: • We first use the results from StatCrunch display to test the null hypothesis of no effects from the row factor of gender (male, female). This test statistic (0.02) is not significant, because the corresponding p-value is 0.901. We fail to reject the null hypothesis. It appears that performance IQ scores are not affected by the gender of the subject. • Column Factor: • We now use the StatCrunch display to test the null hypothesis of no effect from the column factor of blood lead level (low, medium, high). The test statistic (1.16) is not significant because the corresponding p-value is 0.463. We fail to reject the null hypothesis, so it appears that the performance IQ scores are not affected by the blood lead level.
  • 50. • Complete Practice Problems 7