3. ABSTRACT
Hypothesis testing is about population means
and proportions. A sample mean or proportion,
obtained from a single sample, is compared with the
hypothesized parameter and a decision is made as to
whether or not to reject the hypothesis.
However, it is more important to obtain a good
understanding of fundamental ideas than to be
overly concerned with practical applications.
4. Introduction
Statistics, first of all, is not a method by which one can prove
almost anything one wants to prove. Hypothesis testing is a
part of this vast field, where a “hypothesis” means a
statement to be tested empirically. Hypothesis testing
is the most important technique in statistical inference.
Hypothesis tests are widely used in business and industry
for making decisions. In attempting to reach decisions, it is
useful to make assumptions or guesses about the
populations involved.
5. Cont…
Such assumptions, which may or may not be true, are called
“statistical hypotheses”. A hypothesis is made about the
value of some parameter, but the only facts available to
estimate the true parameter are those provided by a
sample. If the sample statistic differs from the hypothesis
made about the population parameter, a decision must be
made as to whether or not this difference is significant. If it
is, the hypothesis is rejected; if not, it must be accepted.
Hence the term “tests of hypothesis”.
6. Definition
The single sample test is one part of hypothesis testing.
The single sample test is used to determine whether a
sample comes from a population with a specific mean.
This population mean is not always known, but is
sometimes hypothesized.
Two-sample tests, by contrast, use measurements made on two
different sets of items. When we conduct a hypothesis
test using two random samples, we must choose the
appropriate test based on whether the samples are dependent or
independent.
7. Cont..
For example, you want to show that a new teaching
method for pupils struggling to learn English grammar
can improve their grammar skills to the national
average. The sample would be pupils who received the
new teaching method and the population mean would
be the national average score.
Alternatively, suppose the doctors who work in Accident
and Emergency (A&E) departments are said to work 100 hours per
week despite the dangers (e.g. tiredness) of working such
long hours. We could sample 1,000 doctors in emergency
departments and test whether their mean weekly hours differ from 100 hours.
8. Central Limit Theorem
The central limit theorem states that the sampling
distribution of the mean of independent, identically distributed random
variables with finite variance will be normal or nearly normal if the sample
size is large enough. How large is "large enough"?
The answer depends on two factors:
1. Requirement for accuracy. The more closely the
sampling distribution needs to resemble a normal
distribution, the more sample points will be required.
9. Cont…
2. The shape of the underlying population. The more
closely the original population resembles a normal
distribution, the fewer sample points will be required.
In practice, some statisticians say that a sample size of
30 is large enough when the population distribution is
roughly bell-shaped. Others recommend a sample size
of at least 40. But if the original population is
distinctly not normal (is badly skewed, has multiple
peaks, and/or has outliers), researchers like the sample
size to be even larger.
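To see how population shape and sample size interact, here is a small simulation sketch in Python. It assumes a hypothetical, badly skewed population (an exponential distribution with mean 1 and standard deviation 1, whose skewness is 2); the helper names `sample_means`, `skewness` and `draw_exponential` are illustrative, not from any library. As n grows, the skewness of the sample means should shrink toward 0 (roughly like 2/√n) and their spread should shrink like σ/√n.

```python
import random
import statistics


def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from a sample of size n."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def skewness(values):
    """Moment-based skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


def draw_exponential(rng):
    """Hypothetical skewed population: exponential with mean 1 and sd 1."""
    return rng.expovariate(1.0)


rng = random.Random(42)  # fixed seed so the run is reproducible
results = {}
for n in (5, 30, 100):
    means = sample_means(draw_exponential, n, reps=4000, rng=rng)
    results[n] = (skewness(means), statistics.stdev(means))
    print(f"n = {n:3d}: skewness of means = {results[n][0]:.2f}, "
          f"sd of means = {results[n][1]:.3f} (sigma/sqrt(n) = {1 / n ** 0.5:.3f})")
```

A more skewed population (or a stricter need for normality) would push the required n higher, which is the trade-off described in the two factors above.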
10. Cont…
The central limit theorem is a theory of statistical regularity: under general conditions,
the average of data observed over time tends to be
normally distributed. Its usefulness lies in
its generality: whatever the distribution of a variable (provided its variance is finite),
the sum of its values will be approximately normally distributed if
enough independent measurements are taken. It is closely related to the
law of large numbers. A general form of the theorem was proved by the Russian
mathematician Alexander
Mikhailovich Lyapunov (1857-1918), drawing upon the
work of the French mathematician
Pierre Simon Laplace (1749-1827).
11. Cont…
According to the central limit theorem, the mean of a
sample of data will tend to be closer to the mean of the overall
population in question as the sample size increases,
regardless of the actual distribution of
the data, whether it is normal or non-normal.
Example:
If an investor is looking to analyze the overall return
for a stock index made up of 1,000 stocks, he can
take random samples of stocks from the index to get an
estimate for the return of the total index.
12. Cont…
The samples must be random, and at least 30 stocks
must be evaluated in each sample for the central limit
theorem to apply. Random samples ensure that a broad
range of stocks across industries and sectors is
represented in the sample. Stocks previously selected
must also be replaced for selection in other samples to
avoid bias. The average returns from these samples
approximate the return for the whole index and are
approximately normally distributed.
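The stock-index example can be mimicked with a hypothetical index. The sketch below assumes 1,000 made-up, right-skewed annual returns (not real market data), draws 500 random samples of 30 stocks each, and lets every sample draw from the full index, so stocks selected earlier are available again in later samples (the replacement described above).

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical index: 1,000 annual returns in percent, deliberately right-skewed.
index_returns = [10 * rng.lognormvariate(0.0, 0.6) - 8 for _ in range(1000)]
index_mean = statistics.fmean(index_returns)

# 500 random samples of 30 distinct stocks each. Every sample is drawn from the
# whole index, so a stock chosen in one sample can be chosen again in another.
sample_means = [statistics.fmean(rng.sample(index_returns, 30)) for _ in range(500)]

estimate = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
print(f"index mean return:      {index_mean:.2f}%")
print(f"mean of sample means:   {estimate:.2f}%")
print(f"sd of the sample means: {spread:.2f} "
      f"(sigma/sqrt(30) = {statistics.pstdev(index_returns) / 30 ** 0.5:.2f})")
```

The mean of the sample means should land close to the index mean even though the individual returns are skewed.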
13. USAGE EXAMPLES: 1
1) The central limit theorem was especially useful in
increasing our understanding of the statistical modeling
position we held in our project.
2) If you want to try to predict the future for a product, you
can use the central limit theorem to get a good baseline.
3) Using the central limit theorem will allow you to
break down your company’s finances and find out just how
well you are doing.
4) This gives you the ability to measure how much the
means of various samples will vary without having to take
other samples to compare with.
14. Formula of the Central Limit Theorem:
The central limit theorem states that if we know the mean and
standard deviation of a particular population and we
take sufficiently large samples from the population, then the
mean of the sampling distribution of the sample mean is the same as
the mean of the population, and the standard deviation of the sample
mean (the standard error) is equal to the standard deviation of the
population divided by the square root of the sample size. The central
limit theorem is applicable for a sufficiently large sample size
(n ≥ 30). The formula for the central limit theorem can be stated as follows:
15. Cont…
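$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Where, µ = population mean,
σ = population standard deviation,
$\mu_{\bar{x}}$ = mean of the sample mean,
$\sigma_{\bar{x}}$ = standard deviation of the sample mean,
n = sample size.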
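The two formulas can be written as a short Python function. The name `sampling_distribution_of_mean` is hypothetical; it simply returns µ and σ/√n, and the loop reuses the numbers from the two solved examples.

```python
import math


def sampling_distribution_of_mean(mu, sigma, n):
    """Mean and standard deviation (standard error) of the sample mean under the CLT."""
    return mu, sigma / math.sqrt(n)


for mu, sigma, n in [(70, 15, 50), (10, 3, 60)]:
    mean, se = sampling_distribution_of_mean(mu, sigma, n)
    print(f"mu = {mu}, sigma = {sigma}, n = {n}: mean = {mean}, sd = {se:.3f}")
```

This prints a standard deviation of 2.121 for the first example and 0.387 for the second, while the mean is unchanged.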
Solved Examples:
Question no. 1: The record of weights of the male population
follows a normal distribution. Its mean and standard
deviation are 70 kg and 15 kg respectively. If a researcher
considers the records of 50 males, then what would be the
mean and standard deviation of the chosen sample?
16. Cont…
Solution:
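Mean of the population µ = 70 kg.
Standard deviation of the population σ = 15 kg.
Sample size n = 50
Mean of the sample is given by:
$\mu_{\bar{x}} = \mu = 70$ kg.
Standard deviation (sd) of the sample mean is given by:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{50}} = \frac{15}{7.071} = 2.121 \approx 2.1$ kg (Ans).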
17. Question 2:
At a coastal area the number of crabs caught per day is recorded. The
average is 10 and the standard deviation (σ) is 3. If the records of 60 days are
chosen randomly, estimate the mean and standard deviation of the chosen
sample.
Solution:
Mean of population µ = 10
Standard deviation of population σ = 3
Sample size n = 60
Mean of the sample is given by $\mu_{\bar{x}} = \mu = 10$
Standard deviation of the sample mean is given by
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{60}} = \frac{3}{7.746} = 0.387$
Decision: The mean of the chosen sample is 10 and its
standard deviation is 0.387
(approximately).
18. HYPOTHESIS TESTING FOR SMALL SAMPLE
AND POPULATION STANDARD DEVIATION UNKNOWN
When using a test statistic for one population mean,
we must use the t-distribution instead of the
z-distribution when the population standard deviation
is not known and we have to estimate it using the
sample standard deviation. This matters most when the
sample size is small (below 30 or so). In that case we
have less reliable information on which to base our
conclusions, so we pay a penalty for this by
using the t-distribution, which has more variability in
the tails than the z-distribution has.
19. REQUIREMENTS: SMALL SAMPLE TEST OF
HYPOTHESIS ABOUT POPULATION MEAN
1. A random sample is selected from the target
population.
2. The population has a relative frequency distribution
that is approximately normal.
Small sample test of hypothesis about µ
(test statistic $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with n − 1 degrees of freedom):
Two-tailed:   H0: µ = µ0   Ha: µ ≠ µ0
Left-tailed:  H0: µ = µ0   Ha: µ < µ0
Right-tailed: H0: µ = µ0   Ha: µ > µ0
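The three alternatives translate into three rejection rules. In this hypothetical helper, `critical` is the positive table value (t or z) already looked up for the chosen α, using α/2 per tail for the two-tailed case and α in a single tail otherwise; the function only encodes the comparison.

```python
def reject_h0(statistic, critical, tail):
    """Return True if H0: mu = mu0 should be rejected.

    statistic -- the calculated t (or z) value
    critical  -- the positive table value for the chosen significance level
    tail      -- "two" (Ha: mu != mu0), "left" (Ha: mu < mu0) or "right" (Ha: mu > mu0)
    """
    if tail == "two":
        return abs(statistic) >= critical   # reject in either tail
    if tail == "left":
        return statistic <= -critical       # reject only for large negative values
    if tail == "right":
        return statistic >= critical        # reject only for large positive values
    raise ValueError("tail must be 'two', 'left' or 'right'")


# Hypothetical values: t = -2.2 with df = 24 (table values 2.064 two-tailed, 1.711 one-tailed)
print(reject_h0(-2.2, 2.064, "two"))    # True
print(reject_h0(-2.2, 1.711, "right"))  # False
```

The same statistic can lead to opposite decisions depending on which alternative was stated before looking at the data.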
20. POPULATION STANDARD DEVIATION(σ) KNOWN
OR UNKNOWN
As with confidence intervals there are two types of single
sample hypothesis tests:
1. When the population standard deviation(σ) is known or
given
2. When the population standard deviation (σ) is not
known and therefore we have to use an estimate.
When σ is known, we use the standard normal (z)
distribution to establish the non-rejection region and
critical values.
When σ is not known, we use the t-distribution instead;
every sample size has its own t-distribution with (n − 1)
degrees of freedom.
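When σ is known, the z critical values can be computed instead of read from a table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `z_critical` is hypothetical. The standard library has no t-distribution, so t critical values would still come from a t table or a library such as SciPy.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Positive critical z value: alpha split across both tails for "two", else in one tail."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    if tail == "two":
        return standard_normal.inv_cdf(1 - alpha / 2)
    return standard_normal.inv_cdf(1 - alpha)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: two-tailed {z_critical(alpha, 'two'):.3f}, "
          f"one-tailed {z_critical(alpha, 'one'):.3f}")
```

This reproduces the familiar table values 1.960 and 1.645 at α = 0.05, and 2.576 and 2.326 at α = 0.01.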
21. T-TEST
The statistician and chemist W. S. Gosset discovered it in
1908.
The t-test looks at the t-statistic, the t-distribution and the degrees of
freedom to determine the probability of a difference between
populations. The test statistic is known as the t-statistic.
“The t-test is a hypothesis test that uses the t-statistic and
the t-distribution to arrive at a decision. When small
samples are used and when the population standard
deviation is unknown, the hypothesis test about one mean
and the test involving two means is the t-test.”
[SCHMIDT:1979;485]
22. ASSUMPTIONS FOR T-TEST
1. Data are interval or ratio level
2. Simple random sample has been taken. The data is
collected from a representative randomly selected
portion of the total population.
3. The data, when plotted, result in a normal,
bell-shaped distribution curve.
4. Homogeneity of variance: homogeneous or equal
variance exists when the standard deviations of the
samples are approximately equal.
23. Z- Test
A Z-test is a statistical test used to determine whether two
population means are different when the variances are
known and the sample size is large. The test statistic is
assumed to have a normal distribution, and nuisance
parameters such as the standard deviation should be known
for an accurate Z-test to be performed.
A Z-test is a hypothesis test that uses a Z-score as the
obtained statistic and the normal distribution. When
population standard deviations are known, hypothesis
tests about one or two means are Z-tests.
[Schmidt:1979;488]
24. Assumptions of Z- test:
All parametric statistics have a set of
assumptions that must be met in order to
properly use the statistics to test hypotheses.
The assumptions of the Z-test are listed
below:
Random sampling from a defined
population.
Interval or ratio scale of measurement.
Population is normally distributed.
25. Question No. 1: A claim states that the mean waste recycled by adults in
the United States is more than 10 pounds per person per
month. You want to test this claim. You find that the mean
waste recycled per person per month for a random sample
of 18 adults in the United States is 12.4 pounds and the
standard deviation is 2.7 pounds. At α = 0.01, can you support
the claim?
26. Solution:
Let the hypotheses be:
Null hypothesis (H0): µ ≤ µ0
The mean waste recycled by U.S. adults is not
more than 10 pounds per person per month.
Alternative hypothesis (Ha): µ > µ0 (claim)
The mean waste recycled by U.S. adults is more than 10 pounds per person per month.
Here,
Given that,
Sample mean $\bar{x}$ = 12.4
Population mean µ0 = 10
Sample size n = 18
Sample standard deviation s = 2.7
Here, the sample size is less than 30 and the population standard
deviation is unknown, so we use a t-test. We know the formula of the t-test:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
27. Cont…
We can put the values into the formula:
$t = \frac{12.4 - 10}{2.7/\sqrt{18}} = \frac{2.4}{2.7/4.243} = \frac{2.4}{0.636} = 3.77$
So our calculated value is 3.77.
28. Here,
Degrees of freedom df = n − 1 = 18 − 1 = 17. Since Ha: µ > µ0, it is a one-tailed (right-tailed) test.
Here, level of significance = 0.01
So the table value with df = 17 at the 0.01 level of significance (one-tailed) is
2.567.
We know,
when calculated value (cv) ≥ table value (tv),
H0 is to be rejected.
Here, we have found,
calculated value (cv) = 3.77 and
table value (tv) = 2.567
Since the calculated value is greater than the table value, we
reject the null hypothesis (H0).
Decision: There is enough evidence at the 0.01 level to support the claim that the mean recycled waste for U.S. adults
is more than 10 pounds per person per month.
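The arithmetic of this example can be checked in a few lines of Python, recomputing the statistic from the given data (x̄ = 12.4, µ0 = 10, s = 2.7, n = 18). The standard library has no t-distribution, so the critical value 2.567 is entered by hand from a t table (df = 17, one-tailed α = 0.01); the helper name `one_sample_t` is hypothetical.

```python
import math


def one_sample_t(sample_mean, mu0, s, n):
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1


t, df = one_sample_t(12.4, 10, 2.7, 18)
T_TABLE = 2.567  # t table value for df = 17, right-tailed, alpha = 0.01 (entered by hand)
print(f"t = {t:.2f} with df = {df}; reject H0: {t >= T_TABLE}")
```

This prints t = 3.77 with df = 17 and rejects H0. If SciPy is available, `scipy.stats.t.ppf(0.99, 17)` gives the same critical value (about 2.567) without a table.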
29. Question No. 2:
Boys of a certain age are known to have a mean weight
of µ = 85 pounds. A complaint is made that the boys
living in a municipal children's home are underfed. As
one bit of evidence, n = 25 boys are weighed and found to
have a mean weight of $\bar{x}$ = 80.94 pounds. It is known
that the population standard deviation is 11.6 pounds
and the level of significance is 0.05. Based on the available
data, what should be concluded concerning the
complaint?
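30. Solution:
Let the hypotheses be:
Null hypothesis (H0): µ = µ0
The boys living in a municipal children's home are not
underfed.
• Alternative hypothesis (Ha): µ < µ0
• The boys living in a municipal children's home are underfed.
• Here,
Given that,
Sample mean $\bar{x}$ = 80.94
Population mean µ0 = 85
Sample size n = 25
Population standard deviation σ = 11.6
Here, the population standard deviation is known (and weights are assumed to be normally distributed), so it is a Z-test.
31. Cont…
We know the formula of the Z-test:
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{80.94 - 85}{11.6/\sqrt{25}} = \frac{-4.06}{2.32} = -1.75$
So our calculated value (cv) = −1.75
The level of significance = 0.05
Since the complaint is that the boys are underfed (Ha: µ < µ0), it is a one-tailed (left-tailed) test.
32. Cont…
The area between the mean and the critical value is 0.5 − 0.05 = 0.4500.
The closest areas to 0.4500 in the Z table are 0.4495 and 0.4505,
so the corresponding Z value is about 1.645, and the left-tailed table value is −1.645.
We know,
if calculated value (cv) ≤ table value (tv) in a left-tailed test, then H0 is to be rejected.
Here,
Calculated value (cv) = −1.75
Table value (tv) = −1.645
Since our calculated value is less than the table value, it falls in the rejection region, so we
reject the null hypothesis.
Decision: At the 0.05 level, there is evidence that the boys living in a municipal children's home
are underfed.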
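A sketch of this Z-test in Python, using the data stated in the question (n = 25) and a left-tailed alternative, since the complaint is specifically that the boys are underfed. `statistics.NormalDist` supplies both the critical value and the p-value; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 80.94, 85, 11.6, 25, 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))   # (80.94 - 85) / (11.6 / 5)
z_crit = NormalDist().inv_cdf(alpha)          # left-tailed critical value
p_value = NormalDist().cdf(z)                 # P(Z <= z) when H0 is true

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.3f}")
print("reject H0" if z <= z_crit else "do not reject H0")
```

This gives z = −1.75 against a critical value of −1.645 (p ≈ 0.040), so H0 is rejected at the 0.05 level.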
33. CHI- SQUARE TEST:
Chi-square test is a statistical test commonly used to
compare observed data with data we would expect to
obtain according to a specific hypothesis.
A chi-square statistic is a measurement of how
expectations compare to results. The data used in
calculating a chi-square statistic must be random, raw,
mutually exclusive, drawn from independent variables and
drawn from a large enough sample.
A chi-square test is designed to analyze categorical data.
That means that the data have been counted and divided
into categories.
It is a statistical method for assessing the goodness of fit between
a set of observed values and those expected theoretically.
34. Cont…
According to Spiegel, “A measure of discrepancy
existing between the observed and expected
frequencies is supplied by the statistic $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$,
where if the total frequency is N, $\sum f_o = \sum f_e = N$.”
According to Blalock, “The chi-square test is a very
general test that can be used whenever the researchers
wish to evaluate whether or not frequencies which
have been empirically obtained differ significantly from
those which would be expected under a certain set of
theoretical assumptions.”
35. ASSUMPTION OF CHI-SQUARE
TEST:
1. LEVEL OF MEASUREMENT: Chi-square tests are
sometimes used with ordinal scales and sometimes even
interval scales.
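2. NOT AN EXACT TEST: This test is not one of the class of “exact tests”
(such as Fisher's exact test). The significance of the deviation from the
“null hypothesis” is approximated by the chi-square distribution, which is
why expected frequencies must not be too small.
3. INDEPENDENCE ASSUMPTION: The chi-square test cannot
be used on correlated data.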
4. SAMPLING DISTRIBUTION: Distributions are
differentiated according to the degrees of freedom.
5. MODEL: Independent random samples.
36. Cont…
6. THE DATA ARE NOMINAL OR ORDINAL LEVEL.
7. NO EXPECTED CELL FREQUENCY IS LESS THAN 5.
8. CHI-SQUARED GOODNESS OF FIT TEST.
9. CHI- SQUARE TEST OF INDEPENDENCE.
37. PROPERTIES OF THE CHI-SQUARE
DISTRIBUTION:
The chi-square distribution is a continuous probability
distribution with values ranging from 0 to infinity in
the positive direction. Chi-square can never assume
negative values.
The total area under a chi-square curve is equal to 1.
Each chi-square curve (except when the degrees of freedom
are 1 or 2) begins at 0 on the horizontal axis, increases to a peak,
and then approaches the horizontal axis asymptotically
from above.
Each chi-square curve is skewed to the right. As the number
of degrees of freedom increases, the curve becomes more and
more like a normal curve.
38. Cont…
It is one of the most widely used distributions in
statistical applications.
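This distribution may be derived from the normal
distribution: the sum of the squares of k independent standard
normal variables has a chi-square distribution with k degrees of freedom.
Chi-square is non-negative.
Chi-square is non-symmetric.
The chi-square test is a non-parametric test, which is less
restrictive than parametric tests such as the Z-test.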
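The link to the normal distribution can be checked by simulation. This sketch assumes k = 4 degrees of freedom (an illustrative choice) and builds each chi-square value as the sum of squares of k standard normal draws from `random.gauss`; the results should be non-negative, have a mean near k and a variance near 2k, and be right-skewed (mean above median).

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility
k = 4                   # degrees of freedom (an illustrative choice)


def chi_square_draw(k, rng):
    """One chi-square value: the sum of squares of k independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


draws = [chi_square_draw(k, rng) for _ in range(20000)]
print("smallest value:", round(min(draws), 4))            # never below 0
print("mean:", round(statistics.fmean(draws), 2))          # theory: k
print("variance:", round(statistics.pvariance(draws), 2))  # theory: 2k
print("median < mean (right skew):", statistics.median(draws) < statistics.fmean(draws))
```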
39. Question:
A certain drug is claimed to be effective in curing colds.
In an experiment on 500 persons with colds, half of
them were given the drug and half of them were given
sugar pills. The patients' reactions to the treatment
are recorded in the following table:
              Helped     Harmed    No effect   Total
Drug          150 (a)    30 (b)    70 (c)      250
Sugar pills   130 (d)    40 (e)    80 (f)      250
Total         280        70        150         500
40. Cont…
On the basis of this data can it be concluded that there
is a significant difference in the effect of the drug and
sugar pills?
Let the hypotheses be:
H0 (Null hypothesis): there is no significant
difference between the drug and the sugar pills.
Ha (Alternative hypothesis): there is a significant
difference between the drug and the sugar pills. From the table above
we can determine the expected frequencies.
41. Cont…
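The formula for the expected frequency (fe) of a cell is
$f_e = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
fe(a) = (250 × 280)/500 = 140
fe(b) = (250 × 70)/500 = 35
fe(c) = (250 × 150)/500 = 75
fe(d) = (250 × 280)/500 = 140
fe(e) = (250 × 70)/500 = 35
fe(f) = (250 × 150)/500 = 75
The formula of the Chi-Square Test is $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$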
42. Now, we can prepare a calculation
of Chi-Square table:
Cell   fo    fe    fo − fe   (fo − fe)²   (fo − fe)²/fe
a      150   140    10        100          0.714
b       30    35    −5         25          0.714
c       70    75    −5         25          0.333
d      130   140   −10        100          0.714
e       40    35     5         25          0.714
f       80    75     5         25          0.333
                                     ∑ =   3.522
43. Cont…
Our calculated value (cv) of chi-square (χ²) is 3.522
Significance level = 0.05
Degrees of freedom, df = (R-1)(C-1)
=(2-1)(3-1)
= 1×2
=2
At 0.05 significance level and df =2, the table value (tv) is
5.99.
We know, if cv≥tv H0 is to be rejected.
Here,
CV= 3.522
TV= 5.99
44. Cont..
Since our calculated value is less than the table value,
we fail to reject H0 (we accept H0 and reject Ha).
Decision: There is no significant difference between the effects of the drug
and the sugar pills.
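The whole drug versus sugar pill calculation can be reproduced as a sketch in plain Python. It computes each expected frequency as row total × column total / grand total, sums (fo − fe)²/fe, and uses the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is e^(−x/2), so no table is needed. Without rounding the individual terms, χ² comes out at about 3.524 (the 3.522 above comes from adding rounded terms); the conclusion is unchanged.

```python
import math

observed = [
    [150, 30, 70],   # Drug: helped, harmed, no effect
    [130, 40, 80],   # Sugar pills
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi_sq += (f_o - f_e) ** 2 / f_e

df = (len(observed) - 1) * (len(observed[0]) - 1)
# Valid only for df = 2: the upper-tail probability of chi-square is exp(-x / 2),
# so the 0.05 critical value is -2 * ln(0.05) and the p-value is exp(-chi_sq / 2).
critical = -2 * math.log(0.05)
p_value = math.exp(-chi_sq / 2)
print(f"chi-square = {chi_sq:.3f}, df = {df}, critical = {critical:.2f}, p = {p_value:.3f}")
```

This prints χ² = 3.524 with df = 2, a critical value of 5.99 and p ≈ 0.172, so H0 is not rejected.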
45. Significance / importance of Chi-
Square test:
One of the most useful and popular tools in social science
research is the chi-square test. The test has many
applications, the most common of which in the social
sciences are 'contingency' problems in which two
nominal scale variables have been cross-classified.
[H.M. BLALOCK; SOCIAL STATISTICS]
46. Weaknesses of the Chi-Square Test:
The chi-square test is very popular in practice. However,
there are a few limitations that we need to be aware of when we
consider using it:
It tends to be less accurate with very small expected frequencies.
It tends to be less accurate for small degrees of freedom and for
small N too.
Chi-square does not measure the strength of the relationship. It
merely indicates whether there is a relationship that is not likely
to be due to chance.
It is less powerful than parametric tests, though also less restrictive.
If the observations are not independent of each other, the
chi-square test cannot properly be used.
47. A Test of Goodness of Fit of Chi-
Square Test:
A test of goodness of fit of the Chi-Square test: the Chi-Square
test enables us to see how well the assumed
theoretical distribution fits the observed data. When
some theoretical distribution is fitted to the given data,
we are always interested in knowing how well this
distribution fits the observed data. The Chi-Square
test can answer this question.
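As an illustration of a goodness-of-fit test, the sketch below compares hypothetical counts from 120 rolls of a die (made-up numbers, not data from this document) with the counts a fair die would predict. The critical value 11.070 (df = 5, α = 0.05) is entered by hand from a chi-square table.

```python
# Hypothetical observed counts for faces 1-6 from 120 rolls (illustrative only).
observed = [25, 17, 15, 23, 24, 16]
n = sum(observed)
expected = [n / 6] * 6            # a fair die predicts 20 rolls per face

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1            # categories - 1 (no parameters estimated from the data)
CRITICAL_05 = 11.070              # chi-square table value for df = 5, alpha = 0.05

print(f"chi-square = {chi_sq:.2f}, df = {df}")
print("reject the fair-die model" if chi_sq >= CRITICAL_05 else "the fair-die model fits")
```

Here χ² = 5.00 is well below 11.070, so these counts are consistent with a fair die.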
48. Difference between Z test and T test.
1. Z-test: a statistical hypothesis test whose test statistic follows a normal distribution.
   T-test: follows Student's t-distribution.
2. Z-test: appropriate when the sample size is moderate to large (n ≥ 30).
   T-test: appropriate when the sample size is small (n < 30).
3. Z-tests are less commonly used than T-tests.
   T-tests are more commonly used than Z-tests.
4. Z-tests are preferred when population standard deviations are known.
   T-tests are preferred when population standard deviations are unknown.
5. Z-test is less adaptable.
   T-test is more adaptable; it has many variants that suit different needs.
6. Z-tests find fewer users than T-tests.
   T-test is more commonly used than Z-test.
7. In the Z-test there is no fluctuation due to sample variances, because σ is known.
   In the T-test, the sample variance used to estimate σ fluctuates from sample to sample.
49. Difference between T-distribution and
chi-square distribution:
1. T-test: a parametric test.
   Chi-square test: a non-parametric test.
2. T-test: the variables involved are measured at the interval (or ratio) level.
   Chi-square test: nominal or ordinal level data are used.
3. T-test: may be a one-tailed or two-tailed test.
   Chi-square test (of goodness of fit or independence): always a one-tailed (right-tailed) test.
4. T-distribution: the critical region may appear on either side of the mean.
   Chi-square test: the critical region always appears in the right tail.
5. T-distribution: the value of t may be positive or negative.
   Chi-square distribution: the value of chi-square is never negative.
6. T-test: the sample size is relatively small.
   Chi-square test: the sample size is large.
50. Conclusion:
Hypothesis testing begins with the drawing of a
sample & calculating its characteristics. The statistical
testing of hypotheses is the most important technique
in statistical inference; it is based on probability & is
used to draw conclusions about population parameters.
It is widely used in business & industry for making
decisions. So, the major purpose of hypothesis testing
is to choose between two competing hypotheses about
the value of a population parameter.
51. Reference:
Social Statistics, Hubert M. Blalock, Jr.
Business Statistics, S. P. Gupta and M. P. Gupta.
Statistics to Social Science, Anthony Walsh.
Understanding and Using Statistics, Schmidt.
Statistics for Management, Richard I. Levin and David S. Rubin.