Unit-4
Tests of Significance
Once sample data has been gathered through an observational study or experiment, statistical
inference allows analysts to assess evidence in favor of some claim about the population from
which the sample has been drawn. The methods of inference used to support or reject claims
based on sample data are known as tests of significance.
Every test of significance begins with a null hypothesis H0. H0 represents a theory that has been
put forward, either because it is believed to be true or because it is to be used as a basis for
argument, but has not been proved.
For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is
not better, on average, than the current drug. We would write H0: there is no difference
between the two drugs on average.
The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to
establish.
For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new
drug has a different effect, on average, compared to that of the current drug. We would write
Ha: the two drugs have different effects, on average. The alternative hypothesis might also be
that the new drug is better, on average, than the current drug. In this case we would write Ha:
the new drug is better than the current drug, on average.
The final conclusion once the test has been carried out is always given in terms of the null
hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude "reject
Ha", or even "accept Ha".
If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is
true; it only suggests that there is not sufficient evidence against H0 in favor of Ha. Rejecting the
null hypothesis, on the other hand, suggests that the alternative hypothesis may be true.
(Definitions taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Hypotheses are always stated in terms of a population parameter, such as the mean $\mu$. An
alternative hypothesis may be one-sided or two-sided. A one-sided hypothesis claims that a
parameter is either larger or smaller than the value given by the null hypothesis. A two-sided
hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis;
the direction does not matter.
Hypotheses for a one-sided test for a population mean take the following form:
$$H_0: \mu = k,\quad H_a: \mu > k \qquad \text{or} \qquad H_0: \mu = k,\quad H_a: \mu < k$$
Hypotheses for a two-sided test for a population mean take the following form:
$$H_0: \mu = k,\quad H_a: \mu \neq k$$
A confidence interval gives an estimated range of values which is likely to include an unknown
population parameter, the estimated range being calculated from a given set of sample data.
(Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Example
Suppose a test has been given to all high school students in a certain state. The mean test score
for the entire state is 70, with standard deviation equal to 10. Members of the school board
suspect that female students have a higher mean score on the test than male students, because
the mean score $\bar{x}$ from a random sample of 64 female students is equal to 73. Does this provide
strong evidence that the overall mean for female students is higher?
The null hypothesis H0 claims that there is no difference between the mean score for female
students and the mean for the entire population, so that $\mu = 70$. The alternative hypothesis
claims that the mean for female students is higher than the entire student population's mean,
so that $\mu > 70$.
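The example stops at the hypotheses. As a minimal sketch of the remaining arithmetic, the Python snippet below computes the one-sample z statistic and the right-tail probability. It assumes, as the example implies, that the statewide standard deviation of 10 also applies to female students and can be treated as known; the function names are hypothetical.

```python
import math


def z_statistic(x_bar, mu0, sigma, n):
    """One-sample z statistic: (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def upper_tail_p(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Figures from the example: state mean 70, sd 10 (assumed known), 64 female students averaging 73.
z = z_statistic(73, 70, 10, 64)
p = upper_tail_p(z)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

A small right-tail probability means a sample mean of 73 would be unusual if the female mean were really 70, which is evidence for Ha.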
Types of errors:-
There are two types of error in testing a hypothesis.
When a statistical hypothesis is tested, four possibilities arise:
1. The hypothesis is true but our test rejects it. (Type I error)
2. The hypothesis is false but our test accepts it. (Type II error)
3. The hypothesis is true and our test accepts it. (Correct decision)
4. The hypothesis is false and our test rejects it. (Correct decision)
The first two possibilities lead to errors.
In a statistical hypothesis testing experiment, a Type I error is committed by rejecting the null
hypothesis when it is true. The probability of committing a Type I error is denoted by $\alpha$
(pronounced alpha), where
$$\alpha = P(\text{Type I error}) = P(\text{rejecting } H_0 \mid H_0 \text{ is true})$$
On the other hand, a Type II error is committed by not rejecting (i.e. accepting) the null
hypothesis when it is false. The probability of committing a Type II error is denoted by $\beta$
(pronounced beta), where
$$\beta = P(\text{Type II error}) = P(\text{not rejecting, i.e. accepting, } H_0 \mid H_0 \text{ is false})$$
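To make $\alpha$ concrete, the following Python sketch estimates the Type I error rate by simulation. It assumes a normal population whose mean really equals the hypothesized value, and borrows the figures 70, 10 and 64 from the test-score example purely for illustration; the right-tailed critical value 1.645 corresponds to $\alpha = 0.05$. The function name is hypothetical.

```python
import random


def type_one_error_rate(mu0=70, sigma=10, n=64, z_crit=1.645, trials=10000, seed=1):
    """Estimate P(reject H0 | H0 true) for a right-tailed z-test by simulation."""
    rng = random.Random(seed)            # seeded so the result is reproducible
    se = sigma / n ** 0.5                # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        # H0 is true here: every sample comes from a population with mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if z > z_crit:                   # the test rejects a true H0: a Type I error
            rejections += 1
    return rejections / trials


print(type_one_error_rate())  # should land close to alpha = 0.05
```

Lowering the critical value would raise this rate; the significance level is exactly the Type I error probability the analyst agrees to tolerate.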
The distinction between these two types of error can be illustrated by an example.
Assume that the difference between the two population means is actually zero. If our test of
significance, when applied to the sample means, gives a significant result, we make a Type I error.
On the other hand, suppose there is a true difference between the two population means. If
our test of significance then leads to the judgment "not significant", we commit a Type II error. We
thus find ourselves in the situation described by the following table:
Decision            H0 is true          H0 is false
Reject H0           Type I error        Correct decision
Do not reject H0    Correct decision    Type II error
Hypothesis test
As we know, sometimes we cannot survey or test all persons or objects; therefore, we have to
take a sample. From the analysis of the sample data, we can draw conclusions about the
population. Some questions that one may want to answer are:
1. Are unmarried workers more likely to be absent from work than married workers?
2. In Fall 1996, did students in Math 163-01 score the same on the exam as students in
Math 163-02?
3. Is there any difference between the strengths of steel wire produced by the XY
Company and Bob's Wire Company?
4. A hospital spokesperson claims that the average daily room charge for a specific
procedure is $622. Can we reject this claim?
Hypothesis testing is a procedure, based on sample evidence and probability theory, used to
determine whether the hypothesis is a reasonable statement and should not be rejected, or is
unreasonable and should be rejected.
Hypothesis test:- A statistical hypothesis test is a method of statistical inference used for
testing a statistical hypothesis. A test result is called statistically significant if it is judged
unlikely to have occurred by chance alone, according to a threshold probability: the
significance level.
Steps in the hypothesis testing procedure
1. State the null hypothesis and the alternate hypothesis.
Null Hypothesis: a statement about the value of a population parameter.
Alternate Hypothesis: a statement that is accepted if the sample evidence is sufficient to show
that the null hypothesis is false.
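2. Select the appropriate test statistic and level of significance. When testing a hypothesis about a
proportion, we use the z-statistic or z-test and the formula
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
where $\hat{p}$ is the sample proportion, $p$ is the hypothesized population proportion, $q = 1 - p$
and $n$ is the sample size.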
When testing a hypothesis of a mean, we use the z-statistic or we use the t-statistic according
to the following conditions.
If the population standard deviation, $\sigma$, is known and either the data is normally distributed or
the sample size $n > 30$, we use the normal distribution (z-statistic).
When the population standard deviation, $\sigma$, is unknown and either the data is normally
distributed or the sample size is greater than 30 ($n > 30$), we use the t-distribution (t-statistic).
A traditional guideline for choosing the level of significance is as follows: (a) the 0.10 level for
political polling, (b) the 0.05 level for consumer research projects, and (c) the 0.01 level for
quality assurance work.
3. State the decision rules. The decision rules state the conditions under which the null
hypothesis will be accepted or rejected. The critical value for the test statistic is determined by
the level of significance. The critical value is the value that divides the non-rejection region from
the rejection region.
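4. Compute the appropriate test statistic and make the decision. When we use the z-statistic,
we use the formula
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
When we use the t-statistic, we use the formula
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
where $\sigma$ is the population standard deviation and $s$ is the sample standard deviation.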
Compare the computed test statistic with critical value. If the computed value is within the
rejection region(s), we reject the null hypothesis; otherwise, we do not reject the null
hypothesis.
5. Interpret the decision. Based on the decision in Step 4, we state a conclusion in the context
of the original problem.
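Steps 3 and 4 can be sketched in Python with the standard library's `statistics.NormalDist` (Python 3.8 or later), which replaces the normal-table lookup. The helper names `z_critical` and `reject_h0` are hypothetical, chosen for this illustration; the loop prints the one-tailed and two-tailed critical values for the three traditional significance levels listed in step 2.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_critical(alpha, tail):
    """Critical z for significance level alpha.

    tail is 'left' (Ha: mu < k), 'right' (Ha: mu > k) or 'two' (Ha: mu != k);
    for 'two' the positive cut-off is returned and H0 is rejected if |z| exceeds it.
    """
    if tail == "left":
        return STANDARD_NORMAL.inv_cdf(alpha)
    if tail == "right":
        return STANDARD_NORMAL.inv_cdf(1 - alpha)
    if tail == "two":
        return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'left', 'right' or 'two'")


def reject_h0(z, alpha, tail):
    """Decision rule: True if the computed z falls in the rejection region."""
    cut = z_critical(alpha, tail)
    if tail == "left":
        return z < cut
    if tail == "right":
        return z > cut
    return abs(z) > cut


for alpha in (0.10, 0.05, 0.01):
    one = z_critical(alpha, "right")
    two = z_critical(alpha, "two")
    print(f"alpha = {alpha}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

Keeping the critical value and the decision in separate functions mirrors steps 3 and 4: the rule is fixed before the sample statistic is computed.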
• The average test score for an entire school is 75 with a standard deviation of 10. What is
the probability that a random sample of 5 students scored above 80 on average?
Conditions for using t-test:
1. $\sigma$ is unknown
2. $n < 30$
Here $\mu = 75$, $\sigma = 10$, $n = 5$, $\bar{x} = 80$.
The first condition is not satisfied, so in this problem we will use the z-test.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{80 - 75}{10/\sqrt{5}} = \frac{5}{10/2.236} = \frac{5}{4.472} = 1.118$$
• The average test score for an entire school is 75. The standard deviation of a random
sample is 40. What is the probability that a random sample of 10 students scored above
80 on average?
Conditions for using t-test:
1. $\sigma$ is unknown
2. $n < 30$
Here $\mu = 75$, $s = 40$, $n = 10$, $\bar{x} = 80$.
Only the sample standard deviation is given, so $\sigma$ is unknown, and $n = 10 < 30$. Both
conditions are satisfied, so in this problem we will use the t-test.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
• The average test score for an entire school is 75. The standard deviation of a random
sample of 9 students is 10. What is the probability that the average test score for the sample
is above 80?
Conditions for using t-test:
1. $\sigma$ is unknown
2. $n < 30$
Here $\mu = 75$, $s = 10$, $n = 9$, $\bar{x} = 80$.
Here both conditions for the t-test are satisfied, so we will use the t-test.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
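For the first and third examples above, a small Python sketch can apply this unit's rule for choosing between z and t and evaluate the statistic. The function names are hypothetical. For the z case it also answers the question as posed, assuming the sample mean is normally distributed; the t case needs a t table (or a statistics library), so only the statistic is computed there.

```python
import math


def choose_test(sigma_known, n):
    """Rule used in this unit: t-test only when sigma is unknown AND n < 30."""
    return "t" if (not sigma_known and n < 30) else "z"


def mean_statistic(x_bar, mu, sd, n):
    """(x_bar - mu) / (sd / sqrt(n)); sd is sigma for a z-test and s for a t-test."""
    return (x_bar - mu) / (sd / math.sqrt(n))


def upper_tail_normal(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# (sigma_known, x_bar, mu, sd, n) for the first and third examples
cases = [(True, 80, 75, 10, 5), (False, 80, 75, 10, 9)]
for sigma_known, x_bar, mu, sd, n in cases:
    kind = choose_test(sigma_known, n)
    stat = mean_statistic(x_bar, mu, sd, n)
    print(f"n = {n}: {kind}-test, statistic = {stat:.3f}")
    if kind == "z":
        print(f"  P(sample mean > {x_bar}) = {upper_tail_normal(stat):.4f}")
```

The same formula serves both tests; only the standard deviation plugged in and the reference distribution change.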
Example:-
The average score of all sixth graders in school District A on a math aptitude exam is 75 with a
standard deviation of 8.1. A random sample of 100 students in one school was taken. The
mean score of these 100 students was 71. Does this indicate that the students of this school are
significantly less skilled in their mathematical abilities than the average student in the district?
(Use a 5% level of significance.)
Solution:-
Here
Mean = $\mu = 75$, Standard deviation = $\sigma = 8.1$, $n = 100$, $\bar{x} = 71$
Conditions for using t-test:
1. $\sigma$ is unknown
2. $n < 30$
Since $\sigma$ is known and $n > 30$, we use the z-test, which is based on the normal curve or normal
distribution.
Step 1:-
State the null hypothesis (contains =, ≥, or ≤) and the alternate hypothesis (usually contains "not").
Think of the statement "Does this indicate that the students of this school are significantly less
skilled in their mathematical abilities than the average student in the district?" From
"...students of this school are significantly less skilled...", we write the alternate hypothesis
as $H_1: \mu < 75$. Hence
$$H_0: \mu \ge 75 \qquad H_1: \mu < 75$$
Step 2:- Select a level of significance. This is stated in the problem as 5%, or $\alpha = 0.05$.
Step 3:- Identify the statistical test to use. Use the z-test because $\sigma$ is known and the sample
($n = 100$) is a large sample ($n > 30$).
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Recall that in the normal curve, Z = 0 corresponds to the mean. Z = 1, 2, 3 represent 1, 2, and 3
standard deviations above the mean; the negative values are below the mean.
Step 4:- Formulate a decision rule.
Since the alternate hypothesis states $\mu < 75$, this is a one-tailed test to the left. For $\alpha = 0.05$, we
find the Z in the normal curve table that gives a probability of 0.05 to the left of Z. This is the
negative of the z value (critical value) corresponding to a table value of
0.5 - 0.05 = 0.45, which gives $Z = -1.645$.
Because 0.4500 is exactly halfway between the table entries 0.4495 (z = 1.64) and 0.4505 (z = 1.65),
we take the value halfway between 1.640 and 1.650 to get z = 1.645. Since 71 is to the left of 75,
the critical value is negative, $z = -1.645$. That is, $P(Z < -1.645) = 0.05$.
Thus, we reject the null hypothesis if $z < -1.645$, and accept the alternate hypothesis that the
students in the school sampled are less skilled in math aptitude than those in District A.
Step 5:- Take a sample; arrive at a decision.
The sample of 100 students has been tested, and their mean score was found to be 71. Using
the statistical test (z-test) identified in Step 3, compute the test statistic by the formula from
Step 3:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{71 - 75}{8.1/\sqrt{100}} = -4.938$$
Since the computed $z = -4.938 < -1.645$ (the critical z value), we reject the null hypothesis
that the students in the school are not less skilled in mathematical ability. Thus, we conclude
that the sixth graders in the school are less skilled in mathematical ability than the sixth graders
in District A.
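The five steps for this example can be checked in Python. This is a sketch using `statistics.NormalDist` (Python 3.8 or later): `inv_cdf(0.05)` replaces the table lookup for the left-tail critical value, and the tail probability of the observed z is printed as an extra that the solution above does not use.

```python
import math
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 75, 8.1, 100, 71, 0.05

z_crit = NormalDist().inv_cdf(alpha)          # left-tail critical value (about -1.645)
z = (x_bar - mu0) / (sigma / math.sqrt(n))    # (71 - 75) / 0.81
tail_prob = NormalDist().cdf(z)               # P(Z < z) if H0 were true

print(f"critical z = {z_crit:.3f}, computed z = {z:.3f}, P(Z < z) = {tail_prob:.1e}")
print("reject H0" if z < z_crit else "do not reject H0")
```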
The following problem is presented for students to work:
A sample of 250 married workers showed 22 missed more than 5 days last year for any reason.
A sample of 300 unmarried workers showed 35 missed more than 5 days. Use the 5% level of
significance to test and answer the question: Are unmarried workers more likely to be absent
from work than married workers?
Test of significance for Large samples:-
If the size of the sample exceeds 30, we use tests of significance for large samples.
The assumptions made while dealing with problems relating to large samples are:
a) The random sampling distribution of a statistic is approximately normal, and
b) Values given by the samples are sufficiently close to the population values and
can be used in their place for calculating the standard error of the estimate.
Standard error of Mean:-
a) When the standard deviation of the population is known
$$\text{S.E.}_{\bar{X}} = \frac{\sigma_p}{\sqrt{n}}$$
where $\text{S.E.}_{\bar{X}}$ refers to the standard error of the mean,
$\sigma_p$ = standard deviation of the population,
$n$ = number of observations in the sample.
b) When the standard deviation of the population is not known, we have to use the standard
deviation of the sample in calculating the standard error of the mean.
The formula for calculating the standard error is
$$\text{S.E.}_{\bar{X}} = \frac{\sigma(\text{sample})}{\sqrt{n}}$$
where $\sigma$ denotes the standard deviation of the sample.
Note: If the standard deviations of both the sample and the population are available, then the
standard deviation of the population is preferred in calculating the standard error of the mean.
Example:- Calculate the standard error of the mean from the following data showing the amount
paid by 100 firms in Calcutta on the occasion of Durga Puja:
Mid value (Rs.)   39   49   59   69   79   89   99
No. of firms       2    3   11   20   32   25    7
Solution:-
$$\text{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
CALCULATION OF STANDARD DEVIATION
Mid value (m)   f          d = (m - 69)/10   fd          fd²
39              2          -3                -6          18
49              3          -2                -6          12
59              11         -1                -11         11
69              20          0                 0          0
79              32         +1                +32         32
89              25         +2                +50         100
99              7          +3                +21         63
                N = 100                      Σfd = 80    Σfd² = 236
$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{236}{100} - \left(\frac{80}{100}\right)^2} \times 10 = \sqrt{2.36 - 0.64} \times 10 = 1.311 \times 10 = 13.11$$
where $i = 10$ is the class interval.
$$\text{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{13.11}{\sqrt{100}} = \frac{13.11}{10} = 1.311$$
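The step-deviation calculation above can be reproduced in a few lines of Python. This sketch uses the same assumed mean A = 69 and class interval i = 10 as the table, and divides by N (not N - 1), matching the formula used for σ.

```python
import math

mid_values = [39, 49, 59, 69, 79, 89, 99]
firms = [2, 3, 11, 20, 32, 25, 7]
A, i = 69, 10  # assumed mean and class interval used in the table

N = sum(firms)
d = [(m - A) / i for m in mid_values]                      # step deviations
sum_fd = sum(f * dev for f, dev in zip(firms, d))
sum_fd2 = sum(f * dev ** 2 for f, dev in zip(firms, d))

sigma = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * i      # grouped-data standard deviation
se_mean = sigma / math.sqrt(N)                              # standard error of the mean
print(f"N = {N}, sum fd = {sum_fd:g}, sum fd^2 = {sum_fd2:g}")
print(f"sigma = {sigma:.2f}, S.E. of mean = {se_mean:.3f}")
```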
Two-tailed test for the Difference between the Means of two Samples:-
i. If two independent random samples with $n_1$ and $n_2$ items (both sample sizes
greater than 30) respectively are drawn from the same population with standard
deviation $\sigma$, the standard error of the difference between the sample means is given
by the formula:
$$\text{S.E. of the difference between sample means} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
If $\sigma$ is unknown, the standard deviation of the combined sample must be substituted.
ii. If two random samples with $\bar{X}_1, \sigma_1, n_1$ and $\bar{X}_2, \sigma_2, n_2$ respectively are drawn from
different populations, then the S.E. of the difference between the means is given by
the formula:
$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
When $\sigma_1$ and $\sigma_2$ are unknown,
$$\text{S.E. of the difference between the means} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $s_1$ and $s_2$ represent the standard deviations of the two samples.
The null hypothesis to be tested is that there is no significant difference between the means of
the two populations from which the samples are drawn, i.e.,
$$H_0: \mu_1 = \mu_2 \quad \text{(null hypothesis: there is no difference)}$$
$$H_a: \mu_1 \neq \mu_2 \quad \text{(alternative hypothesis: a difference exists)}$$
Example-1:-
Intelligence test on two groups of boys and girls gave the following results:
Mean S.D N
Girls 75 15 150
Boys 70 20 250
Is there a significant difference in the mean scores obtained by boys and girls?
Solution:-
Let us take the hypothesis that there is no significant difference in the mean scores obtained by
boys and girls.
$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$\sigma_1 = 15$, $\sigma_2 = 20$, $n_1 = 150$, $n_2 = 250$
Substituting these values,
$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{(15)^2}{150} + \frac{(20)^2}{250}} = \sqrt{1.5 + 1.6} = 1.761$$
$$\frac{\text{Difference}}{\text{S.E.}} = \frac{75 - 70}{1.761} = 2.84$$
Since the difference is more than 2.58 S.E. (1% level of significance), the hypothesis is rejected.
There seems to be a significant difference in the mean scores obtained by boys and girls.
Example-2:-
A man buys 50 electric bulbs of "Philips" and 50 electric bulbs of "HMT". He finds that "Philips"
bulbs give an average life of 1500 hours with a standard deviation of 60 hours and "HMT" bulbs
give an average life of 1512 hours with a standard deviation of 80 hours. Is there a significant
difference in the mean life of the two makes of bulbs?
Solution:-
Let us take the hypothesis that there is no significant difference in the mean life of the two
makes of bulbs. Calculating the standard error of the difference of means:
$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$\sigma_1 = 60$, $n_1 = 50$, $\sigma_2 = 80$, $n_2 = 50$
Substituting these values,
$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{(60)^2}{50} + \frac{(80)^2}{50}} = \sqrt{\frac{3600 + 6400}{50}} = \sqrt{200} = 14.14$$
Observed difference between the means = 1512 - 1500 = 12
$$\frac{\text{Difference}}{\text{S.E.}} = \frac{12}{14.14} = 0.849$$
Since the difference is less than 2.58 S.E. (1% level of significance), it could have arisen due to
sampling fluctuation. Hence the difference in the mean life of the two makes is not significant.
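Both large-sample examples use the same formula, so a single Python helper covers them. The function name `z_difference` is hypothetical; 2.58 is the 1% two-tailed cut-off used in the text.

```python
import math


def z_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z for two means: (mean1 - mean2) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se, se


z1, se1 = z_difference(75, 15, 150, 70, 20, 250)     # Example-1: girls vs boys
z2, se2 = z_difference(1512, 80, 50, 1500, 60, 50)   # Example-2: HMT vs Philips bulbs

for name, z, se in (("girls vs boys", z1, se1), ("HMT vs Philips", z2, se2)):
    verdict = "significant" if abs(z) > 2.58 else "not significant"
    print(f"{name}: S.E. = {se:.3f}, z = {z:.2f}, {verdict} at the 1% level")
```

Ordering the samples the other way only flips the sign of z, which is why the two-tailed comparison uses its absolute value.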
Test of significance for small samples:-
When the sample size is small (less than 30), the tests for large samples do not work well, so
special tests, such as the t-test and the F-test, are used for small samples.
Student t-distribution
Theoretical work on the t-distribution was done by W.S. Gosset (1876-1937), who published it in 1908. Gosset
was employed by Guinness & Son, a Dublin brewery in Ireland, which did not permit employees
to publish research findings under their own names. So Gosset adopted the pen name
"Student" and published his findings under this name. Therefore, the t-distribution is commonly
called Student's t-distribution.
The t-distribution is used when the sample size is 30 or less and the population standard
deviation is unknown. The t-statistic is defined as:
$$t = \frac{\bar{x} - \mu}{s} \times \sqrt{n}$$
where
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Test the significance of the mean of a Random Sample:-
In determining whether the mean of a sample drawn from a normal distribution deviates
significantly from a stated value (the hypothetical value of the population mean), when the
variance of the population is unknown, we calculate the statistic:
$$t = \frac{\bar{x} - \mu}{s} \times \sqrt{n}$$
where
$\bar{x}$ = the mean of the sample
$\mu$ = the actual or hypothetical mean of the population
$n$ = the sample size
$s$ = the standard deviation of the sample
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}$$
where $d$ = deviation from the assumed mean.
If the calculated value of $|t|$ exceeds $t_{0.05}$, we say that the difference between $\bar{x}$ and $\mu$ is
significant at the 5% level; if it exceeds $t_{0.01}$, the difference is said to be significant at the 1% level. If
$|t| < t_{0.05}$, we conclude that the difference between $\bar{x}$ and $\mu$ is not significant, and hence the
sample might have been drawn from a population with mean $\mu$.
Fiducial limits of population Mean:-
Assuming that the sample is a random sample from a normal population of unknown mean, the
95% fiducial limits of the population mean ($\mu$) are:
$$\bar{x} \pm \frac{s}{\sqrt{n}}\, t_{0.05}$$
and the 99% limits are
$$\bar{x} \pm \frac{s}{\sqrt{n}}\, t_{0.01}$$
Example:- The manufacturer of a certain make of electric bulbs claims that his bulbs have a
mean life of 25 months with a standard deviation of 5 months. A random sample of 6 such
bulbs gave the following values (life in months): 24, 26, 30, 20, 20, 18.
Can you regard the producer's claim to be valid at the 1% level of significance? (Given that the
table values of the appropriate test statistic at the said level are 4.032, 3.707 and 3.499 for 5, 6
and 7 degrees of freedom respectively.)
Solution:- Let us take the hypothesis that there is no significant difference between the mean life of
bulbs in the sample and that of the population. Applying the t-test:
$$t = \frac{\bar{x} - \mu}{s} \times \sqrt{n}$$
CALCULATION OF $\bar{x}$ AND $s$
x      X = (x - x̄)    X²
24     +1             1
26     +3             9
30     +7             49
20     -3             9
20     -3             9
18     -5             25
Σx = 138              ΣX² = 102
$$\bar{x} = \frac{\sum x}{n} = \frac{138}{6} = 23$$
$$s = \sqrt{\frac{\sum X^2}{n - 1}} = \sqrt{\frac{102}{5}} = \sqrt{20.4} = 4.517$$
$$|t| = \frac{|\bar{x} - \mu|}{s} \times \sqrt{n} = \frac{|23 - 25|}{4.517} \times \sqrt{6} = \frac{2 \times 2.449}{4.517} = 1.084$$
$v = n - 1 = 6 - 1 = 5$. For $v = 5$, $t_{0.01} = 4.032$.
The calculated value of t is less than the tabulated value, so the hypothesis is accepted. Hence
the producer's claim is valid at the 1% level of significance.
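A Python check of this example, using only the standard library. `statistics.stdev` divides by n - 1, which matches the formula for s used above; the 4.032 cut-off is the table value given in the question, not computed here.

```python
import math
import statistics

lives = [24, 26, 30, 20, 20, 18]    # months
mu0 = 25                            # manufacturer's claimed mean life
t_table = 4.032                     # table value from the question (5 d.f., 1% level)

n = len(lives)
x_bar = statistics.mean(lives)
s = statistics.stdev(lives)         # sample standard deviation with n - 1 in the denominator
t = (x_bar - mu0) / s * math.sqrt(n)

print(f"x_bar = {x_bar}, s = {s:.3f}, |t| = {abs(t):.3f}, d.f. = {n - 1}")
print("claim not rejected" if abs(t) < t_table else "claim rejected")
```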
Example:- A random sample of size 16 has 53 as mean. The sum of the squares of the deviations
taken from the mean is 135. Can this sample be regarded as taken from a population having
56 as mean? Obtain 95% and 99% confidence limits of the mean of the population. (For v = 15,
$t_{0.05} = 2.13$; for v = 15, $t_{0.01} = 2.95$.)
Solution:-
Let us take the hypothesis that there is no significant difference between the sample mean and the
hypothetical population mean. Applying the t-test:
$$t = \frac{\bar{x} - \mu}{s} \times \sqrt{n}$$
$\bar{x} = 53$, $\mu = 56$, $n = 16$, $\sum (x - \bar{x})^2 = 135$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{135}{15}} = 3$$
$$|t| = \frac{|53 - 56|}{3} \times \sqrt{16} = \frac{3 \times 4}{3} = 4$$
$v = 16 - 1 = 15$. For $v = 15$, $t_{0.05} = 2.13$.
The calculated value of t is more than the tabulated value, so the hypothesis is rejected. Hence,
the sample has not come from the population having 56 as mean.
95% confidence limits of the population mean:
$$\bar{x} \pm \frac{s}{\sqrt{n}}\, t_{0.05} = 53 \pm \frac{3}{\sqrt{16}} \times 2.13 = 53 \pm \frac{3}{4} \times 2.13 = 53 \pm 1.6 = 51.4 \text{ to } 54.6$$
99% confidence limits of the population mean:
$$\bar{x} \pm \frac{s}{\sqrt{n}}\, t_{0.01} = 53 \pm \frac{3}{\sqrt{16}} \times 2.95 = 53 \pm \frac{3}{4} \times 2.95 = 53 \pm 2.212 = 50.788 \text{ to } 55.212$$
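When only summary figures are given, as here, the t statistic and both sets of confidence limits follow directly. A minimal Python sketch, with the table values 2.13 and 2.95 taken from the question; the helper `limits` is hypothetical. The text rounds the 95% half-width 1.5975 to 1.6, so its limits differ slightly in the third decimal.

```python
import math

n, x_bar, mu0 = 16, 53, 56
ss = 135                      # sum of squared deviations from the sample mean
t_05, t_01 = 2.13, 2.95       # table values for v = 15 quoted in the question

s = math.sqrt(ss / (n - 1))                  # 3.0
t = abs(x_bar - mu0) / s * math.sqrt(n)      # 4.0


def limits(t_crit):
    """Confidence limits x_bar +/- (s / sqrt(n)) * t_crit."""
    half_width = s / math.sqrt(n) * t_crit
    return x_bar - half_width, x_bar + half_width


print(f"s = {s}, t = {t}, reject H0 at 5%: {t > t_05}")
print("95%% limits: %.4f to %.4f" % limits(t_05))
print("99%% limits: %.4f to %.4f" % limits(t_01))
```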
Testing difference between means of two samples (Independent Samples):-
Given two independent random samples of sizes $n_1$ and $n_2$ with means $\bar{x}_1$ and $\bar{x}_2$ and
standard deviations $s_1$ and $s_2$, we may be interested in testing the hypothesis that the samples
come from the same normal population. To carry out the test, we calculate the statistic as follows:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
where $\bar{x}_1$ = mean of the first sample
$\bar{x}_2$ = mean of the second sample
$n_1$ = number of observations in the first sample
$n_2$ = number of observations in the second sample
$s$ = combined standard deviation.
The value of $s$ is calculated by the following formula:
$$s = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}}$$
When the actual means are fractions, the deviations should be taken from assumed
means. In such a case the combined standard deviation is obtained by applying the following
formula:
$$s = \sqrt{\frac{\sum (x_1 - A_1)^2 + \sum (x_2 - A_2)^2 - n_1(\bar{x}_1 - A_1)^2 - n_2(\bar{x}_2 - A_2)^2}{n_1 + n_2 - 2}}$$
$A_1$ = assumed mean of the first sample
$A_2$ = assumed mean of the second sample
$\bar{x}_1$ = actual mean of the first sample
$\bar{x}_2$ = actual mean of the second sample
The number of degrees of freedom = $n_1 + n_2 - 2$.
When we are given the number of observations and the standard deviations of the two
samples, the pooled estimate of standard deviation can be obtained as follows:
$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
If the calculated value of $t$ is greater than $t_{0.05}$ ($t_{0.01}$), the difference between the sample means is
said to be significant at the 5% (1%) level of significance; otherwise the data are said to be
consistent with the hypothesis.
Example:- Two types of drugs are used on 5 and 7 patients, respectively, for reducing their weight.
Drug A was imported and drug B was indigenous. The decreases in weight after using the
drugs for six months were as follows:
Drug A 10 12 13 11 14
Drug B 8 9 12 14 15 10 9
Solution:- Let us take the hypothesis that there is no significant difference in the
efficacy of the two drugs. Applying the t-test:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
x1    (x1 - x̄1)   (x1 - x̄1)²    x2    (x2 - x̄2)   (x2 - x̄2)²
10    -2          4             8     -3          9
12     0          0             9     -2          4
13    +1          1             12    +1          1
11    -1          1             14    +3          9
14    +2          4             15    +4          16
                                10    -1          1
                                9     -2          4
Σx1 = 60          Σ(x1 - x̄1)² = 10    Σx2 = 77    Σ(x2 - x̄2)² = 44
$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{60}{5} = 12; \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{77}{7} = 11$$
$$s = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{10 + 44}{5 + 7 - 2}} = \sqrt{\frac{54}{10}} = 2.324$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}} = \frac{12 - 11}{2.324} \times \sqrt{\frac{5 \times 7}{5 + 7}} = \frac{1.708}{2.324} = 0.735$$
$v = n_1 + n_2 - 2 = 5 + 7 - 2 = 10$. For $v = 10$, $t_{0.05} = 2.228$.
Since the calculated value of t is less than the table value, the hypothesis is accepted. Hence, there is
no significant difference in the efficacy of the two drugs. Since drug B is indigenous and there is no difference
in the efficacy of the imported and indigenous drugs, we should buy the indigenous drug B.
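A Python sketch of the pooled two-sample t statistic for the drug data, using the sum-of-squares form of s given above. The helper name `pooled_t` is hypothetical; the comparison with the table value 2.228 is done by hand, as in the text.

```python
import math
import statistics

drug_a = [10, 12, 13, 11, 14]
drug_b = [8, 9, 12, 14, 15, 10, 9]


def pooled_t(x1, x2):
    """Two-sample t using s = sqrt((SS1 + SS2) / (n1 + n2 - 2))."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.mean(x1), statistics.mean(x2)
    ss1 = sum((v - m1) ** 2 for v in x1)     # sum of squared deviations, sample 1
    ss2 = sum((v - m2) ** 2 for v in x2)     # sum of squared deviations, sample 2
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, s, n1 + n2 - 2


t, s, df = pooled_t(drug_a, drug_b)
print(f"s = {s:.3f}, t = {t:.3f}, d.f. = {df}")
```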
Testing Difference between Means of two samples (Dependent Samples or Matched Paired
Samples):-
Two samples are said to be dependent when the elements in one sample are related to those in
the other in some significant or meaningful manner. In fact, the two samples may consist of pairs
of observations made on the same objects or individuals or, more generally, on the same selected
population elements. The t-test based on the paired observations is defined by the following
formula:
$$t = \frac{\bar{d} - 0}{s} \times \sqrt{n} \quad \text{or} \quad t = \frac{\bar{d}\sqrt{n}}{s}$$
where $\bar{d}$ = the mean of the differences
$s$ = the standard deviation of the differences.
The value of $s$ is calculated as follows:
$$s = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}}$$
It should be noted that $t$ is based on $n - 1$ degrees of freedom.
Example:-
To verify whether a course in accounting improved performance, a similar test was given to 12
participants both before and after the course. The original marks, recorded in alphabetical order,
were 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and 72. After the course, the marks in the same
order were 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60 and 78. Was the course useful?
Solution:-
Let us take the hypothesis that there is no significant difference in the marks obtained before
and after the course, i.e. the course has not been useful.
Applying the t-test (difference formula):
$$t = \frac{\bar{d}\sqrt{n}}{s}$$
Participant   Before (1st Test)   After (2nd Test)   d = 2nd - 1st   d²
A             44                  53                 +9              81
B             40                  38                 -2              4
C             61                  69                 +8              64
D             52                  57                 +5              25
E             32                  46                 +14             196
F             44                  39                 -5              25
G             70                  73                 +3              9
H             41                  48                 +7              49
I             67                  73                 +6              36
J             72                  74                 +2              4
K             53                  60                 +7              49
L             72                  78                 +6              36
                                                     Σd = 60         Σd² = 578
$$\bar{d} = \frac{\sum d}{n} = \frac{60}{12} = 5$$
$$s = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{578 - 12(5)^2}{12 - 1}} = \sqrt{\frac{278}{11}} = 5.03$$
$$t = \frac{\bar{d}\sqrt{n}}{s} = \frac{5 \times \sqrt{12}}{5.03} = \frac{5 \times 3.464}{5.03} = 3.443$$
$v = n - 1 = 12 - 1 = 11$; for $v = 11$, $t_{0.05} = 2.201$.
The calculated value of t is greater than the tabulated value, so the hypothesis is rejected.
Hence the course has been useful.
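The paired computation can be checked with a short Python sketch. Differences are taken as second test minus first test, as in the table, and `statistics.stdev` uses n - 1 in the denominator, matching the formula for s.

```python
import math
import statistics

before = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]
after = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]

d = [a - b for a, b in zip(after, before)]   # 2nd test minus 1st test, per participant
n = len(d)
d_bar = statistics.mean(d)
s = statistics.stdev(d)                      # standard deviation of the differences (n - 1)
t = d_bar * math.sqrt(n) / s

print(f"sum d = {sum(d)}, sum d^2 = {sum(x * x for x in d)}")
print(f"d_bar = {d_bar}, s = {s:.3f}, t = {t:.3f}, d.f. = {n - 1}")
```

Working with the differences reduces the two dependent samples to a single sample, which is why only n - 1 degrees of freedom remain.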
The F-test or the variance ratio test:-
The F-test is named in honor of the great statistician R.A. Fisher. The object of the F-test is
to find out whether two independent estimates of population variance differ significantly, or
whether the two samples may be regarded as drawn from normal populations having the
same variance. For carrying out the test of significance, we calculate the ratio F, defined
as
$$F = \frac{S_1^2}{S_2^2}, \quad \text{where } S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} \text{ and } S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$
It should be noted that $S_1^2$ is always the larger estimate of variance, i.e. $S_1^2 > S_2^2$:
$$F = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$
$v_1 = n_1 - 1$ and $v_2 = n_2 - 1$, where
$v_1$ = degrees of freedom of the sample having the larger variance
$v_2$ = degrees of freedom of the sample having the smaller variance
The calculated value of F is compared with the tabulated value for $v_1$ and $v_2$ at the 5% or 1% level
of significance. If the calculated value of F is greater than the tabulated value, then the F ratio is
considered significant and the null hypothesis is rejected. On the other hand, if the calculated value
of F is less than the tabulated value, then the null hypothesis is accepted and it is inferred that
both samples have come from populations having the same variance.
Since the F-test is based on the ratio of two variances, it is also called the Variance Ratio Test.
Example:-
Two random samples were drawn from two normal populations and their values are:
A   66 67 75 76 82 84 88 90 92
B   64 66 74 78 82 85 87 92 93 95 97
Test whether the two populations have the same variance at the 5% level of significance
(F = 3.36 at the 5% level for $v_1 = 10$ and $v_2 = 8$).
Solution:-
Let us take the hypothesis that the two populations have the same variance. Applying the F-test:
$$F = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$
A (X1)   x1 = X1 - X̄1   x1²    B (X2)   x2 = X2 - X̄2   x2²
66       -14            196    64       -19            361
67       -13            169    66       -17            289
75       -5             25     74       -9             81
76       -4             16     78       -5             25
82       +2             4      82       -1             1
84       +4             16     85       +2             4
88       +8             64     87       +4             16
90       +10            100    92       +9             81
92       +12            144    93       +10            100
                               95       +12            144
                               97       +14            196
ΣX1 = 720   Σx1 = 0   Σx1² = 734      ΣX2 = 913   Σx2 = 0   Σx2² = 1298
$$\bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{720}{9} = 80; \qquad \bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{913}{11} = 83$$
$$S_1^2 = \frac{\sum x_1^2}{n_1 - 1} = \frac{734}{9 - 1} = 91.75; \qquad S_2^2 = \frac{\sum x_2^2}{n_2 - 1} = \frac{1298}{11 - 1} = 129.8$$
Sample B gives the larger estimate of variance, so, following the rule above, it goes in the numerator:
$$F = \frac{S_2^2}{S_1^2} = \frac{129.8}{91.75} = 1.415$$
For $v_1 = 10$ (sample B) and $v_2 = 8$ (sample A), $F_{0.05} = 3.36$.
The calculated value of F is less than the tabulated value, so the hypothesis is accepted. Hence
it may be concluded that the two populations have the same variance.
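A Python sketch of the variance ratio for the two samples. `statistics.variance` divides the sum of squared deviations by n - 1, as in the definitions of $S_1^2$ and $S_2^2$; the code places the larger estimate in the numerator and reports the matching degrees of freedom.

```python
import statistics

a = [66, 67, 75, 76, 82, 84, 88, 90, 92]
b = [64, 66, 74, 78, 82, 85, 87, 92, 93, 95, 97]

var_a = statistics.variance(a)   # sum of squared deviations / (n - 1)
var_b = statistics.variance(b)

# The larger estimate goes in the numerator; its sample supplies v1.
if var_a >= var_b:
    F, v1, v2 = var_a / var_b, len(a) - 1, len(b) - 1
else:
    F, v1, v2 = var_b / var_a, len(b) - 1, len(a) - 1

print(f"S_A^2 = {var_a:.2f}, S_B^2 = {var_b:.2f}, F = {F:.3f} with v1 = {v1}, v2 = {v2}")
```

Putting the larger variance on top keeps F at or above 1, so only the upper tail of the F table is needed.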
Chi-Square Test:-
The $\chi^2$ test (pronounced Chi-Square test) is one of the simplest and most widely used non-parametric
statistical tests. The symbol $\chi$ is the Greek letter chi. The $\chi^2$ test was first
used by Karl Pearson in the year 1900. The quantity $\chi^2$ describes the magnitude of the
discrepancy between theory and observation. It is defined as:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ refers to the observed frequencies and $E$ refers to the expected frequencies.
Example:- In an anti-malarial campaign in a certain area, quinine was administered to 812
persons out of a total population of 3248. The number of fever cases is shown below:
Treatment     Fever   No fever   Total
Quinine        20      792        812
No quinine    220     2216       2436
Total         240     3008       3248
Discuss the usefulness of quinine in checking malaria.
Solution:- Let us take the hypothesis that quinine is not effective in checking malaria.
Applying the $\chi^2$ test:
$$E_{11} = \text{Expectation of } (AB) = \frac{(A) \times (B)}{N} = \frac{240 \times 812}{3248} = 60$$
Thus the expected frequency corresponding to the first row and first column is 60. Similarly,
$$E_{12} = \frac{3008 \times 812}{3248} = 752$$
$$E_{21} = \frac{240 \times 2436}{3248} = 180$$
$$E_{22} = \frac{3008 \times 2436}{3248} = 2256$$
The table of expected frequencies is:
60     752     812
180    2256    2436
240    3008    3248
O      E      (O - E)²   (O - E)²/E
20     60     1600       26.667
220    180    1600       8.889
792    752    1600       2.128
2216   2256   1600       0.709
                         Σ(O - E)²/E = 38.393
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 38.393$$
$$v = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$
For $v = 1$, $\chi^2_{0.05} = 3.84$.
The calculated value of $\chi^2$ is greater than the tabulated value, so the hypothesis is rejected.
Hence quinine is useful in checking malaria.
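The expected frequencies and the $\chi^2$ statistic can be computed for any r × c table of observed counts. The Python function below is a hypothetical helper written for this illustration, not a library API. Without intermediate rounding the sum comes to about 38.392; the table above adds terms already rounded to three decimals, which gives 38.393.

```python
def chi_square_statistic(observed):
    """Pearson chi-square for an r x c table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # (A) x (B) / N
            chi2 += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)       # (r - 1)(c - 1)
    return chi2, dof


# Rows: quinine, no quinine; columns: fever, no fever
chi2, dof = chi_square_statistic([[20, 792], [220, 2216]])
print(f"chi-square = {chi2:.3f} with {dof} d.f.; compare with 3.84 at the 5% level")
```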
Yates Correction
The Yates correction is a correction made to account for the fact that both Pearson's chi-square
test and McNemar's chi-square test are biased upwards for a 2 × 2 contingency table. An
upward bias gives a larger result than it should, so the Yates correction is usually
recommended, especially if an expected cell frequency is below 5.
Calculating the Yates Correction
In Yates correction, 0.5 is subtracted from the absolute difference between the observed
and expected frequencies. It is just the $\chi^2$ formula with the 0.5 subtraction:
$$\chi^2_{\text{Yates}} = \sum \frac{(|O - E| - 0.5)^2}{E}$$
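A sketch of the corrected statistic in Python, applied to the quinine table from the previous example purely to show the arithmetic. Every expected frequency there is well above 5, so by the guidance above the correction would not normally be required; the function name is hypothetical.

```python
def yates_chi_square(observed):
    """Chi-square with Yates' continuity correction: sum of (|O - E| - 0.5)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            total += (abs(o - e) - 0.5) ** 2 / e   # 0.5 subtracted from |O - E|
    return total


print(round(yates_chi_square([[20, 792], [220, 2216]]), 3))
```

With counts this large, the correction lowers the statistic only slightly (from about 38.39 to about 37.44), and the conclusion is unchanged.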
Arguments for why the Yates Correction should not be used
Although some people recommend that you use the correction only if your expected cell
frequency is below 5, others recommend that you don't use it at all. A large body of research
has found that the correction is too strict. Several researchers, including Yates, have used
known statistical data to test whether the correction works. If you are using a statistical program
like SPSS to calculate the chi-square statistic for a contingency table, the program may report or
apply the correction automatically for 2 × 2 tables. However, knowing that the correction may be
too strict allows you to make a judgment call on your data.