2. Introduction
Consider the following examples:
• Effectiveness of different promotional devices in terms of sales
• Quality of a product produced by different manufacturers in terms of an attribute
• Production volume in different shifts in a factory
• Yield from plots of land due to varieties of seeds, fertilizers, and cultivation methods
In situations like these, where more than two samples are compared, repeated
t-tests on pairs of samples are not advisable. When many independent tests
are carried out pairwise, the probability that all of the combined conclusions
are correct falls sharply, because the chance of at least one false rejection
(Type I error) grows with the number of tests.
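The effect described above can be illustrated with a short sketch. It assumes the pairwise tests are independent and that each is run at the same significance level; the function name `familywise_error_rate` and the 0.05 level are illustrative choices, not part of the original slides.

```python
from math import comb


def familywise_error_rate(alpha, num_tests):
    """Probability of at least one false rejection across independent tests.

    Assumes every test is independent and uses the same level alpha.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


# Number of pairwise t-tests needed to compare every pair of groups,
# and the resulting chance of at least one Type I error at alpha = 0.05.
for groups in (3, 4, 5):
    pairs = comb(groups, 2)
    rate = familywise_error_rate(0.05, pairs)
    print(f"{groups} groups -> {pairs} tests -> error rate {rate:.3f}")
```

With five groups, ten pairwise tests are needed, and under these assumptions the chance of at least one false rejection is about 0.40 rather than 0.05.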
3. Analysis of Variance (ANOVA)
Under certain assumptions, a method known as analysis of
variance (ANOVA), developed by R. A. Fisher, is used to test the
significance of the difference between several population means.
ANOVA is a statistical method that compares the means (or
averages) of different groups by analysing the variation between
and within those groups. It is used in a wide range of scenarios to
determine whether there is any difference between the group means.
4. Basic terms related to ANOVA
The following are a few terms that will be used in the discussion of analysis of
variance:
• A sampling plan or experimental design is the way that a sample is selected from
the population under study; it determines the amount of information in the
sample.
• An experimental unit is the object on which a measurement (or measurements) is
taken. Any experimental condition imposed on an experimental unit has an
effect on the response.
• A factor or criterion is an independent variable whose values are controlled and
varied by the researcher.
• A level is the intensity setting of a factor.
• A treatment or population is a specific combination of factor levels.
• The response is the dependent variable being measured by the researcher.
5. Example 1
A tyre manufacturing company plans to conduct a tyre-quality study
in which quality is the independent variable (the factor or criterion)
and the treatment levels or classifications are low, medium and high
quality.
The dependent (or response) variable might be the number of
kilometers driven before the tyre is rejected for use.
A study of daily sales volumes may be undertaken using a completely
randomized design with demographic setting as the independent
variable. The treatment levels or classifications would be inner-city
stores, stores in metro-cities, stores in state capitals, stores in small
towns, etc. The dependent variable would be sales in rupees.
6. Example 2
• For production volume in three shifts in a factory, there are two
variables: days of the week and the volume of production in each shift.
• If one of the objectives is to determine whether mean production volume
is the same on all days of the week, then the dependent (or response)
variable of interest is the mean production volume.
• The variables that are related to a response variable are called factors;
here, day of the week is the independent variable (factor). The value
assumed by a factor in an experiment is called a level.
• The combinations of levels of the factors for which the response will be
observed are called treatments, here the days of the week. These treatments
define the populations or samples, which are differentiated in terms of
production volume and which we may need to compare with each other.
8. Experimental Design
• Completely Randomized Design (One-way ANOVA)
• Randomized Block Design (Two-way without replication)
• Latin Square Design (Two-way with replication, e.g., 2x2, 4x4)
• Factorial Design (Two-way with replication, e.g., 2x3, 3x2, etc.)
9. Assumptions of Analysis of Variance
• Each population under study is normally distributed with a
mean µj (the means may or may not be equal) and with a common
variance σ².
• Each sample is drawn randomly and is independent of other
samples.
10. Analysis of Variance (ANOVA)
The first step in the analysis of variance is to partition the total variation
in the sample data into the following two component variations in such
a way that it is possible to estimate the contribution of factors that may
cause variation.
• The amount of variation among the sample means, that is, the variation
attributable to the difference among sample means. This variation is
either on account of a difference in treatment or due to the element of
chance. It is denoted by SSC or SSTR.
• The amount of variation within the sample observations. This
variation is considered to be due to chance causes or experimental
(random) errors. The variation in the values of the elements within a
sample due to chance is called the error variation and is denoted by SSE.
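The partition described above can be written out as a worked identity. The notation is an assumption for illustration and does not come from the slides: x_ij is the i-th observation in sample j, n_j is the size of sample j, r is the number of samples, x̄_j is the mean of sample j, and x̿ is the grand mean of all observations.

```latex
\begin{align}
\mathrm{SST}  &= \sum_{j=1}^{r} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{\bar{x}} \right)^2 \\
\mathrm{SSTR} &= \sum_{j=1}^{r} n_j \left( \bar{x}_j - \bar{\bar{x}} \right)^2 \\
\mathrm{SSE}  &= \sum_{j=1}^{r} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{x}_j \right)^2 \\
\mathrm{SST}  &= \mathrm{SSTR} + \mathrm{SSE}
\end{align}
```

The last line is the partition itself: the total variation equals the variation among sample means plus the variation within samples.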
11. One-way ANOVA
A one-way ANOVA evaluates the impact of a single factor on a
single response variable. It tests whether all the samples come
from populations with the same mean. The one-way ANOVA is used
to determine whether there are any statistically significant
differences between the means of three or more independent
(unrelated) groups.
Total variation is split into two components:
• Variation between (or among) sample means (also called sum of squares for treatments)
• Variation within the sample values (also called sum of squares for errors)
12. One-way ANOVA: Hypothesis
H0: µ1 = µ2 = . . . = µr ← Null hypothesis
H1: Not all µj are equal (j = 1, 2, . . ., r) ← Alternative hypothesis
13. One-way ANOVA table
Source of Variation            Sum of Squares   Degrees of freedom   Mean Sum of Squares   Test Statistic or F-value
Between samples (Treatments)   SSTR             r – 1                MSTR                  F = MSTR/MSE
Within samples (error)         SSE              n – r                MSE
Total                          SST              n – 1
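If Fcal < Ftable, accept the null hypothesis H0; otherwise, reject H0 and conclude that not all population means are equal. Here Fcal is the computed F-value and Ftable is the critical value of F at the chosen level of significance.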
14. Example
As head of the department of a consumer research organization, you
are responsible for testing and comparing the lifetimes of four
brands of electric bulbs. Suppose you test the lifetime of three
electric bulbs of each of the four brands. The data are shown below,
each entry representing the lifetime of an electric bulb, measured in
hundreds of hours:
Brand
A B C D
20 25 24 23
19 23 20 20
21 21 20 20
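The computation for this example can be sketched in plain Python as follows. The function name `one_way_anova` and the variable names are illustrative, not from the slides; the data are the twelve lifetimes in the table above. SSTR denotes the between-samples sum of squares and SSE the within-samples sum of squares.

```python
def one_way_anova(samples):
    """Return (SSTR, SSE, SST, F) for a list of samples, one list per treatment."""
    n = sum(len(s) for s in samples)   # total number of observations
    r = len(samples)                   # number of samples (treatments)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]

    # Variation among sample means (between samples)
    sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Variation within samples (error)
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    # Total variation
    sst = sum((x - grand_mean) ** 2 for s in samples for x in s)

    mstr = sstr / (r - 1)   # mean square between samples, r - 1 degrees of freedom
    mse = sse / (n - r)     # mean square within samples, n - r degrees of freedom
    return sstr, sse, sst, mstr / mse


# Lifetimes in hundreds of hours, one list per brand
bulbs = {
    "A": [20, 19, 21],
    "B": [25, 23, 21],
    "C": [24, 20, 20],
    "D": [23, 20, 20],
}
sstr, sse, sst, f_value = one_way_anova(list(bulbs.values()))
print(f"SSTR={sstr:.2f}  SSE={sse:.2f}  SST={sst:.2f}  F={f_value:.2f}")
```

For these data the sketch gives SSTR = 14, SSE ≈ 26.67, SST ≈ 40.67 and F = 1.4, with 3 and 8 degrees of freedom. Standard F tables give a critical value of about 4.07 at the 5 per cent level for these degrees of freedom, so the computed F is smaller and the null hypothesis of equal mean lifetimes is not rejected.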
15. Two-way ANOVA
Total variation (SST) is first split into:
• Variation between samples (or groups), SSC
• Variation within samples (or groups) due to error, SSE
The within-samples variation (SSE) is then split further into:
• Unwanted variation due to difference between block means, i.e. sum of squares for rows (blocking), SSR
• New variation due to random error, i.e. the new sum of squares of error (SSE)
16. Two-way ANOVA table
Source of Variation   Sum of Squares   Degrees of freedom   Mean Sum of Squares   Test Statistic or F-value
Between columns       SSC              c – 1                MSC                   Ftreatment = MSC/MSE
Between rows          SSR              r – 1                MSR                   Fblocks = MSR/MSE
Residual error        SSE              (c – 1)(r – 1)       MSE
Total                 SST              n – 1
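The two-way table (without replication) can be sketched in plain Python as follows. The function name `two_way_anova` and the data in `hypothetical_table` are invented purely for illustration and do not come from the slides. Rows are treated as blocks and columns as treatments, with one observation per cell.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication for a table of r rows and c columns."""
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ssc = r * sum((m - grand_mean) ** 2 for m in col_means)   # between columns
    ssr = c * sum((m - grand_mean) ** 2 for m in row_means)   # between rows (blocks)
    sst = sum((x - grand_mean) ** 2 for row in table for x in row)
    # Residual error: what remains after removing row and column effects
    sse = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
        for i in range(r)
        for j in range(c)
    )

    msc = ssc / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {
        "SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst,
        "F_treatment": msc / mse, "F_blocks": msr / mse,
    }


# Hypothetical data: 4 blocks (rows) by 3 treatments (columns)
hypothetical_table = [
    [10, 12, 14],
    [11, 15, 16],
    [9, 12, 13],
    [14, 17, 19],
]
result = two_way_anova(hypothetical_table)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Each F-value is then compared with the tabulated F for its own degrees of freedom: (c – 1, (c – 1)(r – 1)) for treatments and (r – 1, (c – 1)(r – 1)) for blocks.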