Degrees Of Freedom Assignment No 3

ASSIGNMENT NO. 3

Submitted to: Mr. Rizwan Ahmed
Instructor (Statistical Inference)

Submitted by: Abdul Saboor Zaman (10901)
Faraz Ahmed Khan (10903)
Sanaullah Wafa (10910)

Statistical Inference (STA 404)

Masters of Business Administration (Industrial Management)

Institute of Business Management

DEGREES OF FREEDOM
Many elementary statistics textbooks introduce this concept in terms of the numbers that
are "free to vary" (Howell, 1992; Jaccard & Becker, 1990). Some statistics textbooks just
give the degrees of freedom of various distributions (e.g. Moore & McCabe, 1989; Agresti
& Finlay, 1986). Johnson (1992) simply said that degrees of freedom is the "index number"
for identifying which distribution is used. Some definitions given by statistics instructors
can be as obscure as "a mathematical property of a distribution related to the number of
values in a sample that can be freely specified once you know something about the sample."
(Flatto, 1996). These explanations do not clearly convey the purpose of degrees of
freedom, and even advanced statistics textbooks do not discuss the concept in detail.
As a result, many advanced statistics students and experienced researchers have only a
vague idea of what degrees of freedom are.

DEFINITIONS

Various definitions of degrees of freedom have been developed over time. Those found in
dictionary references are as follows.

Daintith and Rennie (2005, p. 60)

They define degrees of freedom as the number of independent parameters that are needed
to specify the configuration of a system.

Schwartzman (1994, p. 96)

In mathematics the term degrees of freedom refers to the number of independent variables
involved in a statistic.

Mayhew (2004)

Mayhew defines it as ‘A number which in some way represents the size of the sample or
samples used in a statistical test. In some cases, it is the sample size, in others it is a value
which has to be calculated. Each test has its specific calculation, and the correct value for
each test must be calculated before the result of the test can be checked for statistical
significance.’

Upton and Cook (2002)

They describe degrees of freedom as ‘a parameter that appears in some probability
distributions used in statistical inference, particularly the t distribution, the chi‐squared
distribution, and the F distribution’ and note that ‘the phrase “degrees of freedom” was
introduced by Sir Ronald Fisher in 1922’, without mentioning its purpose. This is
followed by several formulae for computing degrees of freedom, without any explanation of
how the formulae are derived.

Clapham (1996, pp. 65–66)

Clapham states that the number of degrees of freedom is ‘a positive integer normally
equivalent to the number of independent observations in a sample, minus the number of
population parameters to be estimated from the sample’.

Kotz and Johnson (1982, pp. 293–294)

They note that although the number of degrees of freedom is usually a positive integer,
fractional values occur in some approximations, and one can, for example, have a
non‐central chi‐squared distribution with zero degrees of freedom, obtained by setting the
degrees of freedom parameter to zero.

Everett (2002, p. 111)

Everett offers a somewhat clearer definition. After describing degrees of freedom as ‘an
elusive concept’, he explains: ‘essentially the term means the number of independent units
of information in a sample relevant to the estimation of a parameter or the calculation of a
statistic. For example, in a 2 x 2 contingency table with a given set of marginal totals, only
one of the four cell frequencies is free and the table has therefore a single degree of
freedom’.

Glenn and Littler (1984, p. 46)

Glenn and Littler combine both independence and sample size: ‘In statistics it is the number
of independent items of information given by the data; that is, the total number of items
less the number of relevant summary statistics or restraints. Thus a set of independent
results x₁, x₂, …, xₙ has n degrees of freedom, but n – 1 if the mean x̄ is known, since any
one of the xᵢ is now dependent on the sum of the others. Note that a sample of size n retains
n degrees of freedom if the population mean μ is known, since this does not determine xᵢ for
i = 1, …, n if the other (n – 1) values are known. The concept is of importance in statistical
inference since it defines the effective size of a sample.’

As these definitions show, degrees of freedom are often discussed vaguely, inconsistently
and even mysteriously.

EXAMPLES

Examples are one of the best ways to build understanding. If instructors used everyday
examples of degrees of freedom to explain them, the mystery surrounding the term would
quickly disappear. Some everyday examples of degrees of freedom are given below:

Example 1:
Suppose three hours, say 1:00 p.m. to 4:00 p.m., are allocated to three different tasks:
eating, reading and napping. Once time has been allocated to two of the tasks, the time for
the third is fixed automatically. The constraint in this example is the total time. The
degrees of freedom for this example are 2, because only two tasks can be allocated freely
while the third is determined automatically.
Example 2:
A person has 5 different flavored candies to divide among 3 friends: two friends receive
2 candies each and the third friend receives 1. Once four candies have been handed out,
the person has no option other than giving the fifth candy to the third friend. The
degrees of freedom for this example are therefore 4.
Example 3:
A plot of 360 yards has to be divided equally among three brothers, and the eldest
brother insists on having the corner plot. With the corner plot assigned to him, the only
remaining choice is which of the other two plots goes to the second brother; the third
brother then automatically gets the plot that is left. Since only one choice is free, there
is 1 degree of freedom.

Everyday examples like these can make the concept of degrees of freedom much easier for
students to understand.

STATISTICAL APPLICATIONS OF DEGREES OF FREEDOM

Degrees of freedom are used in the following statistical applications:

Sample Variance
ANOVA & Regression
Chi‐Squared Test of Independence
Chi‐Squared Goodness of Fit Test

Sample Variance

The sample variance is calculated by dividing the sum of squared deviations from the mean
by the degrees of freedom (df = n − 1) rather than by the sample size (n).

Through the sample variance, degrees of freedom enter the following statistical procedures:

1. t-test (small samples from one or two normally distributed populations)
2. F-ratio (ratio of two population variances)

These topics come early in inferential statistics, so the meaning and purpose of degrees of
freedom should also be discussed and emphasized early.

The degrees of freedom for the sample variance can be understood through the following
example:

Consider a sample of n = 5 observations.

With no other information provided, there are no restrictions: each of the 5 values can be
chosen freely, discarded or replaced by another value, i.e. the degrees of freedom are 5.

To calculate the sample variance, however, we also need the sample mean:

$$\bar{x} = \frac{\sum_{j=1}^{n} x_j}{n}$$

Suppose, for instance, that

$$\bar{x} = 10$$

Now the five observations can no longer all be adjusted freely, because their sum must
equal

$$n\bar{x} = 5 \times 10 = 50$$

This constraint fixes the sum of the sample observations at exactly 50: once any four values
are chosen, the fifth is determined. Consequently the degrees of freedom here are

df = n − 1 = 5 − 1 = 4

The effective sample size is therefore reduced to 4.
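The counting argument above can be checked numerically. The Python sketch below uses a hypothetical sample of five values whose mean is fixed at 10: four values are picked arbitrarily, the fifth is forced by the required sum of 50, and the variance divides the sum of squared deviations by df = n − 1. The helper name `sample_variance` is illustrative only; the standard library's `statistics.variance` also divides by n − 1 and is used here as a cross-check.

```python
from statistics import mean, variance


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by df = n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)
    return ss / (n - 1)


# Hypothetical sample of n = 5 with the sample mean fixed at 10, as in the text.
n, x_bar = 5, 10.0
free_values = [7.0, 12.0, 9.0, 13.0]    # four values chosen freely
forced = n * x_bar - sum(free_values)   # the fifth must bring the sum to 50
sample = free_values + [forced]

print(forced)                   # 9.0
print(mean(sample))             # 10.0
print(sample_variance(sample))  # 6.0 (24 divided by df = 4)
print(variance(sample))         # 6.0, statistics.variance also uses n - 1
```

Whatever four values are chosen, the fifth is always the one that restores the sum of 50, which is exactly the lost degree of freedom.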

For a t-test of two normally distributed populations, the pooled variance is calculated with
degrees of freedom equal to

(n1 − 1) + (n2 − 1) = n1 + n2 − 2

This is because each sample loses one degree of freedom to its own sample mean: n1 − 1
values in the first sample and n2 − 1 values in the second can vary freely, while the last
value in each sample is fixed by that sample's mean.
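A minimal sketch of the pooled variance, assuming two small hypothetical samples `a` and `b`; the function name `pooled_variance` is illustrative. It divides the combined within-sample sum of squares by n1 + n2 − 2 and cross-checks the result against the df-weighted average of the two sample variances.

```python
from statistics import variance


def pooled_variance(sample1, sample2):
    """Pooled variance for a two-sample t-test, using df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    # Each sample loses one degree of freedom to its own sample mean.
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    df = (n1 - 1) + (n2 - 1)
    return (ss1 + ss2) / df, df


a = [4.0, 6.0, 8.0]  # hypothetical sample 1, n1 = 3
b = [1.0, 3.0]       # hypothetical sample 2, n2 = 2
sp2, df = pooled_variance(a, b)
print(df)             # 3
print(round(sp2, 4))  # 3.3333

# The same value, written as a df-weighted average of the two sample variances.
weighted = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
```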

When the F-ratio is used to compare two population variances with samples of sizes n1 and
n2, the degrees of freedom are:

Degrees of freedom for the numerator:

df_n = n1 − 1

Degrees of freedom for the denominator:

df_d = n2 − 1
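As an illustration, the F-ratio and its two degrees-of-freedom parameters can be computed as below. This is a sketch with hypothetical samples and a hypothetical helper `f_ratio`; it computes only the statistic and its df, so judging significance would still require an F table or a statistics library.

```python
from statistics import variance


def f_ratio(sample1, sample2):
    """Ratio of the two sample variances, with numerator and denominator df."""
    f = variance(sample1) / variance(sample2)
    df_n = len(sample1) - 1  # numerator degrees of freedom
    df_d = len(sample2) - 1  # denominator degrees of freedom
    return f, df_n, df_d


a = [4.0, 6.0, 8.0]       # hypothetical sample, n1 = 3, s1^2 = 4
b = [1.0, 3.0, 5.0, 7.0]  # hypothetical sample, n2 = 4, s2^2 = 20/3
f, df_n, df_d = f_ratio(a, b)
print(df_n, df_d)   # 2 3
print(round(f, 4))  # 0.6
```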

Emphasizing degrees of freedom early pays off throughout the course.

ANOVA & Regression

Degrees of freedom play a prominent role in the analysis of variance (ANOVA) because they
represent effective sample size. For a total of n observations, the overall variance is the
total sum of squares divided by the total degrees of freedom, i.e.

$$\sigma^2 = \frac{SST}{n - 1}$$

For k treatment categories, the sum of squares due to treatments is given by

$$SSTR = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$

where $n_i$ = number of observations in the ith treatment category

$\bar{x}_i$ = mean of the ith treatment category

$\bar{x}$ = overall mean of all n observations

The total sum of squares is partitioned as

SST = SSTR + SSE

where SST = total sum of squares

SSTR = sum of squares due to treatments

SSE = sum of squares due to error

Once the value of SSTR is known, the final term in the sum, $n_k(\bar{x}_k - \bar{x})^2$, is
determined by the value of SSTR and the preceding k − 1 terms; hence the degrees of freedom
for SSTR are

df = k − 1

SSE, the sum of squares due to error, is based on squared deviations within each of the k
categories, so the mean square due to error (MSE) has degrees of freedom

df = (n1 − 1) + (n2 − 1) + (n3 − 1) + ... + (nk − 1) = n − k

Since SSTR and SSE add up to the total sum of squares SST, their degrees of freedom add up
to the same total we found for the sample variance:

df = ( n − k ) + ( k − 1) = n − 1
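The ANOVA bookkeeping can be sketched as follows, using hypothetical data with k = 3 treatment categories and n = 8 observations; `one_way_anova` is an illustrative name, not a library routine. It computes SSTR, SSE and SST from the formulas above and pairs each with its degrees of freedom, so both SST = SSTR + SSE and (k − 1) + (n − k) = n − 1 can be checked.

```python
def one_way_anova(groups):
    """Partition SST into SSTR + SSE and pair each with its degrees of freedom."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]

    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for x in all_obs)

    return {
        "SSTR": (sstr, k - 1),  # treatments: df = k - 1
        "SSE": (sse, n - k),    # error: df = n - k
        "SST": (sst, n - 1),    # total: df = n - 1
    }


# Hypothetical data: k = 3 treatment categories, n = 8 observations.
table = one_way_anova([[4, 6, 8], [9, 11], [1, 2, 3]])
for name, (ss, df) in table.items():
    print(name, ss, df)
# SSTR 78.0 2
# SSE 12.0 5
# SST 90.0 7
```

Dividing each sum of squares by its degrees of freedom gives the mean squares used in ANOVA; here MSTR = 78 / 2 = 39 and MSE = 12 / 5 = 2.4.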

The same concept reappears in multiple linear regression. Let

k = number of regression coefficients, including the constant term

The coefficient of determination is

$$R^2 = 1 - \frac{SSE}{SST}$$

Here, too, each sum of squares can be averaged over its effective sample size (degrees of
freedom):

SSE over df = n − k, and

SST over df = n − 1
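Averaging SSE and SST over their degrees of freedom in this way turns R² into what is commonly called the adjusted R². The sketch below assumes a simple linear regression (k = 2 coefficients: intercept and slope) fitted by ordinary least squares to hypothetical data; `r_squared` is an illustrative helper name.

```python
def r_squared(y, fitted, k):
    """Return R^2 = 1 - SSE/SST and its degrees-of-freedom adjusted version.

    k is the number of regression coefficients, including the constant term.
    """
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    # Average each sum of squares over its degrees of freedom.
    r2_adj = 1 - (sse / (n - k)) / (sst / (n - 1))
    return r2, r2_adj


# Hypothetical data, fitted by least squares with an intercept and one slope (k = 2).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

r2, r2_adj = r_squared(y, fitted, k=2)
print(round(r2, 4), round(r2_adj, 4))  # 0.6 0.4667
```

Because n − k is smaller than n − 1, the adjusted value is lower than R² whenever SSE is positive, which penalizes adding coefficients that do little to reduce SSE.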

Chi‐Squared Test of Independence

For the chi-squared test of independence, we use contingency tables to record the observed
and expected frequencies, f_o and f_e respectively.

A contingency table with r rows and c columns contains r × c cells. These cells can be filled
at random with samples drawn from their respective populations, as shown in this 2 × 3
contingency table:

        C1        C2        C3
R1      (R1,C1)   (R1,C2)   (R1,C3)
R2      (R2,C1)   (R2,C2)   (R2,C3)

With no further information every cell is free, so the number of effective samples here is

df = r × c, i.e. df = 6

Now suppose the marginal totals, i.e. the row sums and the column sums, are added:

        C1        C2        C3        Total
R1      (R1,C1)   (R1,C2)   (R1,C3)   100
R2      (R2,C1)   (R2,C2)   (R2,C3)   80
Total   20        70        90        180

If only one frequency is filled in, say (R1,C1) = 16, then the column total forces
(R2,C1) = 20 − 16 = 4, leaving no choice for that cell. If one other frequency is known, say
(R1,C2) = 44, the remaining cells are determined automatically:

        C1    C2    C3    Total
R1      16    44    40    100
R2      4     26    50    80
Total   20    70    90    180

Therefore, for a 6-cell contingency table with r = 2 and c = 3, only 2 non-redundant values
(i.e. df = 2) are needed to determine all the others. The same rule holds for larger
contingency tables, which gives the general formula for degrees of freedom in the test of
independence:

df = ( r − 1) × (c − 1)

Working out which cells are redundant and which are non-redundant in larger tables, and how
the redundant cells are calculated, leads to the same general result.
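The redundancy argument can be written as a small routine: given the (r − 1) × (c − 1) free cells and the marginal totals, every other cell is forced. This is a sketch with a hypothetical function `complete_table`; it does not check that the forced cells come out non-negative, which a real frequency table would require.

```python
def complete_table(free, row_totals, col_totals):
    """Fill an r x c table from its (r-1) x (c-1) free cells and its marginal totals."""
    r, c = len(row_totals), len(col_totals)
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must share the same grand total")
    if len(free) != r - 1 or any(len(row) != c - 1 for row in free):
        raise ValueError("need exactly (r - 1) x (c - 1) free cells")

    table = [[0] * c for _ in range(r)]
    for i in range(r - 1):
        table[i][: c - 1] = free[i]
        table[i][c - 1] = row_totals[i] - sum(free[i])  # forced by the row total
    for j in range(c):
        # Every cell in the last row is forced by its column total.
        table[r - 1][j] = col_totals[j] - sum(table[i][j] for i in range(r - 1))
    return table


# The 2 x 3 example from the text: df = (2 - 1) * (3 - 1) = 2 free cells.
filled = complete_table([[16, 44]], row_totals=[100, 80], col_totals=[20, 70, 90])
for row in filled:
    print(row)
# [16, 44, 40]
# [4, 26, 50]
```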

Chi‐Squared Goodness of Fit Test

Case I

Consider observations drawn from a right-skewed, Poisson-distributed population, with:

Sample size = n = 100

Population mean = μ = 1.4

For simplicity, we also assume that no observation exceeds 5.

The sample data can then be divided into six categories (k = 6), with the random variable
taking the values

X = 0, 1, 2, 3, 4, 5

Because the frequencies of these 6 categories must total 100, only 5 of them can vary
freely; the remaining one is then determined. So the chi-squared statistic in this case has

df = k − 1 = 6 − 1 = 5

Case II

Now suppose that the population mean (μ) is unknown. It can be estimated from the sample
data instead, but doing so fixes the sample mean (x̄ = 1.40).

This gives two constraints. The first is that the frequencies sum to 100, i.e.

f1 + f2 + f3 + f4 + f5 + f6 = 100

and the second is that the observations sum to n x̄ = 100 × 1.40 = 140, i.e.

0f1 + 1f2 + 2f3 + 3f4 + 4f5 + 5f6 = 140

We now have two equations in six unknowns, which leaves 4 degrees of freedom:

df = k − 2 = 6 − 2 = 4

That is, only 4 frequencies are free to vary, while the other two are fixed by the
constraints.
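Both cases can be sketched in Python. `gof_df` encodes the counting rule the two cases illustrate (one degree of freedom lost to the fixed total, plus one for each parameter estimated from the sample), and `solve_forced_frequencies` shows that, once four frequencies are chosen, the two constraints of Case II fix the remaining two. Both function names and the chosen frequencies 23, 13, 4 and 1 are hypothetical.

```python
def gof_df(k, estimated_params=0):
    """Degrees of freedom for a chi-squared goodness-of-fit test with k categories."""
    return k - 1 - estimated_params


def solve_forced_frequencies(f3, f4, f5, f6, n=100, total=140):
    """Given freely chosen frequencies for X = 2..5, the constraints
    f1 + ... + f6 = n and 0*f1 + 1*f2 + 2*f3 + 3*f4 + 4*f5 + 5*f6 = total
    determine f2 first and then f1."""
    f2 = total - (2 * f3 + 3 * f4 + 4 * f5 + 5 * f6)
    f1 = n - (f2 + f3 + f4 + f5 + f6)
    return f1, f2


print(gof_df(6))                      # 5 (Case I: mu known)
print(gof_df(6, estimated_params=1))  # 4 (Case II: mu estimated by the sample mean)

# Hypothetical free choices for the frequencies of X = 2, 3, 4, 5.
f1, f2 = solve_forced_frequencies(23, 13, 4, 1)
print(f1, f2)  # 25 34
```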

This example shows that, in the chi-squared goodness of fit test, the degrees of freedom
depend on how many population parameters must be estimated from the sample: each
estimated parameter adds one more constraint and removes one more degree of freedom.

IN A NUTSHELL
Degrees of freedom are an ever-present concept in statistics, but the concept is rarely
defined properly for students, and few statisticians fully understand it themselves. For
students who already suffer from math phobia and math anxiety, it becomes yet another
unexplained and undefined obstacle. This confusion can be reduced through sufficient
discussion and effective exercises. Instructors are responsible for devising an approach
suited to their particular class that gives students at least a brief introduction to the topic.
