What is statistical analysis? It's the science of collecting, exploring and presenting large amounts of data to discover underlying patterns and trends. Statistics are applied every day – in research, industry and government – to become more scientific about decisions that need to be made.
3. Statistical Inference
It is the procedure whereby inferences about a population are made on the basis of the results obtained from a sample drawn from that population (the results of the sample are reflected onto the corresponding population).
5. Suppose you take your car to the dealer's service department and ask the service manager how much the repair will cost. If the manager says it will cost $500, he is providing a point estimate. If the manager says it will cost somewhere between $400 and $600, he is providing an interval estimate.
6. The process of estimation entails calculating a statistic from the sample that is offered as an approximation of the corresponding parameter of the population.
7. Estimation Process
[Figure: a random sample with mean X̄ = 50 is drawn from a population whose mean µ is unknown, leading to the statement "I am 95% confident that µ is between 40 and 60."]
8. Estimation
For each population parameter we can compute two types of estimate:
a- Point estimation: a single numerical value (a statistic) obtained from a random sample and used to estimate the corresponding population value (the parameter).
9. The sample mean (x̄) is the best point estimator for the population mean (µ).
The sample proportion (p̂) is the best point estimator for the population proportion (P).
10. Because of sampling variation, a statistic cannot be expected to equal the corresponding parameter exactly. The size of this difference is estimated by the standard error (SE) of the mean:
SE = σ/√n when the population SD (σ) is known
SE = S/√n when σ is estimated by the sample SD (S)
The sample variance (S²) is the best point estimator for the population variance (σ²), so the sample SD (S) is the best point estimator for the population SD (σ).
11. b- Interval estimation: estimation of the confidence interval (CI).
CI: consists of two numerical values defining an interval within which the unknown parameter lies with a certain degree of confidence. These values depend on the confidence level, which is equal to 1 − α, where α is the probability of error.
12. Three properties of a good estimator
1. Unbiased: the expected value (the mean of the estimates obtained from samples of a given size) is equal to the parameter being estimated.
2. Consistent: as the sample size increases, the value of the estimator approaches the value of the parameter being estimated.
3. Efficient: it has the smallest variance among the competing estimators.
13. Health researchers who use statistical inference procedures must be aware of the difference between the Sampled Population and the Target Population.
Statistical inference allows one to make inferences about the sampled population.
14. Only when the Sampled Population and the Target Population are the same can statistical inference be used to reach conclusions about the target population.
Otherwise, the step from the sampled to the target population relies on nonstatistical means: showing that the two populations are similar with respect to all important characteristics (e.g. age, sex, severity of illness, duration of illness and so on).
15. A confidence interval gives an
estimated range of values which is
likely to include an unknown
population parameter, the estimated
range being calculated from a given
set of sample data.
16. Confidence Interval
We are trying to draw a conclusion about
a population based on a finite sample
We can not be 100% sure about our
conclusion
Instead, we express a confidence interval
17. Level of Confidence
The level of confidence (1 − α) is the probability that the interval estimation procedure produces an interval containing the unknown population parameter.
α is the probability that the parameter is not within the interval (this is also called the "level of significance").
α = 1 − level of confidence, e.g., 1 − 95% = 5%
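18. A confidence interval provides additional information about variability.
[Figure: a confidence interval drawn from the lower confidence limit to the upper confidence limit, with the point estimate at its centre; the distance between the two limits is the width of the confidence interval.]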
19. General Interpretation: 90% CI
If we repeated the sampling 100 times and computed a 90% CI each time, we would expect about 90 of the intervals to contain the true population mean and about 10 not to.
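The repeated-sampling interpretation above can be checked with a small simulation. This is only a sketch under stated assumptions: the population (normal, mean 50, SD 10), the sample size of 25, the number of repetitions and the function name `coverage` are all hypothetical choices made for illustration, not values from the slides.

```python
import random
from statistics import NormalDist, mean


def coverage(n_intervals=2000, n=25, mu=50.0, sigma=10.0, level=0.90, seed=1):
    """Fraction of simulated CIs (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # reliability coefficient
    half_width = z * sigma / n ** 0.5  # margin of error = z * SE
    hits = 0
    for _ in range(n_intervals):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))  # one sample mean
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_intervals


print(f"Share of 90% intervals containing the true mean: {coverage():.3f}")
```

The printed share should be close to 0.90, but not exactly 0.90, because the simulation itself is subject to sampling variation.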
20. 1- Calculation of C.I for the population mean (µ)
The sample mean (x̄) is the best point estimator for the population mean (µ).
a- When the population variance is known or the sample size is large (>30), the formula is:
C.I (µ) = x̄ ± Z (σ/√n)
21. Ex. The mean serum bilirubin level of 16 four-day-old infants was found to be 5.98 mg/dl with σ = 3.5 mg/dl. Find:
➨ 90% CI for the mean?
90% CI (µ) = 5.98 ± 1.64 × 3.5/√16
= 5.98 ± 1.43 = (4.54 – 7.42)
➨ 95% CI for the mean?
95% CI (µ) = 5.98 ± 1.96 × 3.5/√16
= 5.98 ± 1.71 = (4.26 – 7.69)
➨ 99% CI for the mean?
99% CI (µ) = 5.98 ± 2.58 × 3.5/√16
= 5.98 ± 2.25 = (3.72 – 8.24)
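The three intervals in this example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `z_interval` is hypothetical, and the reliability coefficient is taken from the standard normal distribution rather than from a rounded table, so the last digit may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level):
    """CI for a mean when the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.96 for 95%
    margin = z * sigma / sqrt(n)  # reliability coefficient x standard error
    return xbar - margin, xbar + margin


# Serum bilirubin example: mean 5.98 mg/dl (the value used in the worked
# calculation), sigma 3.5 mg/dl, n = 16.
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(5.98, 3.5, 16, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

Note how the interval widens as the confidence level rises: more confidence is bought with less precision.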
22. Confidence Intervals
[Figure: sampling distribution of the sample mean, marking the ranges of ±1.645, ±1.96 and ±2.58 standard errors (σ/√n) that contain 90%, 95% and 99% of samples respectively.]
23. CI = Point Estimator ± Margin of Error
Margin of Error = Reliability Coefficient × SE = z* × σ/√n
24. Sample size effect
As sample size increases, the
width of our confidence
interval clearly decreases.
95% CI (µ) = 5.98 ± 1.96 X 3.5 / √16
= 5.98 ± 1.71 = (4.26 - 7.69)
If n=100, then
95% CI (µ) = 5.98 ± 1.96 X 3.5 / √100
= 5.98 ± 0.686 = (5.29 - 6.67)
25. b- When the population variance is unknown or the sample size is small (≤30), the formula is:
C.I (µ) = x̄ ± t (S/√n)
Ex. The mean weight of 16 rats was found to be 60.75 kg with S = 3.8 kg. Find the 95% CI for the mean (df = n − 1 = 15).
95% CI (µ) = 60.75 ± 2.13 × 3.8/√16
= 60.75 ± 2.02
= (58.73 – 62.77)
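The t-based interval can be sketched in the same way. Python's standard library has no t-distribution quantile function, so this sketch takes the tabulated t value as an input; the function name `t_interval` is hypothetical, and 2.131 is the usual table entry for df = 15 at the 95% level.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """CI for a mean when sigma is unknown; t_crit is the tabulated t for n - 1 df."""
    margin = t_crit * s / sqrt(n)  # t x estimated standard error
    return xbar - margin, xbar + margin


# Rat weight example: mean 60.75, S = 3.8, n = 16, tabulated t (df = 15) = 2.131
low, high = t_interval(60.75, 3.8, 16, t_crit=2.131)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Passing the critical value explicitly mirrors the slide's workflow of looking it up in a table; a statistics package could supply it instead.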
27. 2- Calculation of C.I for the difference between two population means (µ1 − µ2)
a- When the population variances are known or the sample sizes are large (>30), the formula is:
C.I (µ1 − µ2) = (x̄1 − x̄2) ± Z √[σ1²/n1 + σ2²/n2]
28. Ex: Two samples, n1 = 10 and n2 = 10, yield mean heights of x̄1 = 59.8 cm and x̄2 = 58.5 cm, with σ1 = 2 cm and σ2 = 3 cm. Find the 95% CI for the difference between the two mean heights.
95% C.I (µ1 − µ2) = (x̄1 − x̄2) ± Z √[σ1²/n1 + σ2²/n2]
= (59.8 − 58.5) ± 1.96 √[4/10 + 9/10]
= 1.3 ± 2.23
= (−0.93 to 3.53)
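A short sketch of the same calculation in Python follows. The function name `diff_means_z_interval` is hypothetical; the inputs are the values of the height example.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_z_interval(x1, x2, sd1, sd2, n1, n2, level=0.95):
    """CI for mu1 - mu2 when both population SDs are known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SE of the difference
    diff = x1 - x2
    return diff - z * se, diff + z * se


low, high = diff_means_z_interval(59.8, 58.5, 2, 3, 10, 10)
print(f"95% CI for the difference: ({low:.2f}, {high:.2f})")
```

Because the interval includes zero, these data are compatible with no difference between the two population means at the 95% level.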
29. b- When the population variances are unknown or the sample sizes are small (≤30), the formula is:
C.I (µ1 − µ2) = (x̄1 − x̄2) ± t √[S1²/n1 + S2²/n2]
(df = n1 + n2 − 2)
Ex: A breast cancer research team collected the following data on tumor size:
Type 1: n1 = 21, x̄1 = 3.85 cm, S1 = 1.95 cm
Type 2: n2 = 16, x̄2 = 2.8 cm, S2 = 1.7 cm
Find the 95% CI for the difference between the two mean sizes (df = 21 + 16 − 2 = 35, t = 2.03).
95% C.I (µ1 − µ2) = (3.85 − 2.8) ± 2.03 √[(1.95)²/21 + (1.7)²/16]
= 1.05 ± 1.22 = (−0.17 to 2.27)
30. 3- Calculation of C.I for the population proportion
The sample proportion (p̂) is the best point estimator for the population proportion (P). We use the z distribution and the formula is:
C.I (P) = p̂ ± Z √[p̂(1 − p̂)/n]
Ex: In a survey, 300 adults were interviewed; 123 said they have a yearly medical check-up. Find the 95% C.I for the proportion having a yearly check-up.
95% C.I (P) = p̂ ± 1.96 √[p̂(1 − p̂)/n]
= 123/300 ± 1.96 √[0.41(1 − 0.41)/300]
= 0.41 ± 0.06 = (0.35 – 0.47)
31. 4- Calculation of C.I for the difference between two population proportions (P1 − P2)
C.I (P1 − P2) = (p̂1 − p̂2) ± Z √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]
Ex: Two samples (n1 = 100, n2 = 100) of patients suffering from a certain disease: the first group received a new treatment and 90 recovered in 3 days; the second group received the standard treatment and 78 recovered in 3 days. Find the 95% C.I for the difference between the two proportions.
= (90/100 − 78/100) ± 1.96 √[0.9(1 − 0.9)/100 + 0.78(1 − 0.78)/100]
= 0.12 ± 0.10 = (0.02 – 0.22)
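Both proportion intervals (the single proportion from the check-up survey and the difference between the two treatment groups) can be sketched together. The function names are hypothetical, and the formulas are the normal-approximation ones given on the slides.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, level=0.95):
    """Normal-approximation CI for one population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def diff_proportion_interval(s1, n1, s2, n2, level=0.95):
    """Normal-approximation CI for the difference of two proportions."""
    p1, p2 = s1 / n1, s2 / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin


print("Yearly check-up:", proportion_interval(123, 300))
print("New vs standard treatment:", diff_proportion_interval(90, 100, 78, 100))
```

In the treatment example the whole interval lies above zero, which is consistent with a higher recovery proportion in the new-treatment group.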
33. Hypothesis testing
The purpose of hypothesis testing is to help the clinician, researcher or administrator reach a decision concerning a population by examining a sample from that population. It involves conducting a test of statistical significance and quantifying the degree to which sampling variability may account for the results observed in a particular study.
Hypothesis: may be defined simply as a statement about one or more populations. It is usually concerned with the parameters of the population, and by means of hypothesis testing one can determine whether or not such statements are compatible with the available data. Researchers are concerned with two types of hypotheses:
34. 1- Research hypothesis
2- Statistical hypothesis
Research hypothesis: the supposition that motivates the researcher to do the research. It may be the result of years of observation, clinical experience or scientific speculation. For example, a clinician observes newborn babies of diabetic mothers with birth weights of 4.3, 4.5, 4.7 and 5 kilograms, and notes that for many such babies the birth weight tends to be higher than 3.5 kg, which is above the normal mean birth weight of healthy newborn babies of non-diabetic mothers.
35. The research hypothesis here arises from the desire to determine whether or not the researcher's suspicion, that the mother's condition (being diabetic) has an effect on birth weight, can be supported when subjected to scientific investigation. The research hypothesis thus leads directly to the statistical hypotheses (the hypotheses that are going to be tested through certain statistical procedures), which are then evaluated by appropriate statistical techniques.
36. Test of significance
Many tests of significance have been developed and are in use. The most common are the "Z" test (or "normal curve test"), Student's "t" test, the Chi-square test (χ²), the F-test, ...
37. Procedure and steps
In general, the use of any test of significance can be presented as a SEVEN-STEP procedure:
1- Data: the nature of the data that form the basis of testing must be determined and well understood. The data must be identified as quantitative or qualitative; quantitative data are presented as mean and SD, qualitative data as frequency and proportion.
2- Assumptions: the assumptions of importance in hypothesis testing include:
a- Random selection of the sample
b- Independence of samples
c- Normal distribution of the population
d- Equality of the variances of the populations
Each test of significance has its own proper assumptions.
3- Hypothesis: we have two types of statistical hypotheses:
a- Null hypothesis
b- Alternative hypothesis
38. Null hypothesis (Ho): the hypothesis to be tested, defined as the hypothesis of no difference. It is a statement of agreement with (or no difference from) conditions presumed to be true in the population of interest: the samples or populations being compared in an experiment, study or test are similar, and any difference that occurs is due to chance and not to any other factor. Ho is either not rejected (meaning that the data tested do not provide sufficient evidence to cause rejection) or rejected (the data are not compatible with Ho but are supportive of some other hypothesis, the alternative hypothesis).
39. Alternative hypothesis (HA): the hypothesis of difference (the hypothesis adopted when we reject Ho). It is important to remember that when we fail to reject Ho, we do not say that it is true, but that it may be true (the data fail to support our suspicion).
Ho: µ1 = µ2 (no difference)
HA: µ1 ≠ µ2 (difference)
In general, statistical inference does not lead to the proof of a hypothesis; it merely indicates whether the hypothesis is supported or not supported by the available data.
40. Type I and Type II errors: the following table summarizes the situations that can arise when testing Ho against HA.
               Accept Ho        Accept HA
Ho is true     No error         Type I error
HA is true     Type II error    No error
The probabilities of committing a Type I error and a Type II error are written as α and β respectively. α is called the size of the test and (1 − β) is called the power of the test.
41. 4- Level of significance: it specifies the area under the curve of the distribution of the test statistic that forms the basis for determining the rejection region and the acceptance region for the tested data. A tabulated value (the "critical value") is obtained from the appropriate table (each test has its own table) according to the level of significance (α: 0.05, 0.025, 0.01) and the degrees of freedom.
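42. [Figure: distribution of the test statistic centred at 0. The central 95% of the area (the "influencing factor effect area") is the acceptance region; the 2.5% in each tail, beyond −tabulated value and +tabulated value (the "chance factor" area), is the rejection region.]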
43. The level of significance, α, specifies the area under the curve of the distribution of the test statistic that lies beyond the critical value and forms the rejection region of Ho. α is the probability of rejecting a true null hypothesis; since this is an error, we choose a small value of α to keep that probability small. The most frequently used values of α are 0.05 (5% chance factor), 0.025 (2.5% chance factor) and 0.01 (1% chance factor).
44. Rejecting Ho when it is true is called a Type I error, or α type of error.
Accepting Ho when it is false is called a Type II error, or β type of error.
α: the probability of rejecting the null hypothesis although it is really true.
β: the probability of accepting the null hypothesis although it is really false.
45. Conclusion of significance testing
               Reality: Ho is True                    Reality: Ho is False
Reject Ho      Type I error, α (Prob. = Sig.)         Correct decision (Prob. = Power)
Accept Ho      Correct decision (Prob. = 1 − Sig.)    Type II error, β (Prob. = 1 − Power)
46. The probability that we do not make a Type II error (1 − β, i.e. 100% − β%) is called the power of the test. Increasing the sample size increases the power, since the sampling distributions become taller and narrower and therefore overlap less.
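The effect of sample size on power can be illustrated for the simplest case, a two-sided one-sample z-test with known σ. This is a sketch under assumptions: the true difference (delta = 1.0) and σ = 2.0 are hypothetical numbers chosen for illustration, and `z_test_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test.

    delta is the true difference between the population mean and the
    value stated in Ho; sigma is the known population SD.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma  # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region
    return std.cdf(-z_crit + shift) + std.cdf(-z_crit - shift)


for n in (10, 30, 100):
    print(f"n = {n:3d}  power = {z_test_power(delta=1.0, sigma=2.0, n=n):.3f}")
```

When delta is zero (Ho true), the same function returns α, which ties the power calculation back to the Type I error rate.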
47. 5- Apply the proper test of significance:
The data derived from the sample are used to compute the test statistic, whose value determines the decision to reject or not reject Ho. The selection of the appropriate test and the calculation of the test criterion are based on the type of data. There are many tests of significance; the most common are the "Z" test, the "t" test, χ² and the F-test. Applying the test gives the calculated value (the "test statistic"), which is compared with the tabulated value obtained at the level-of-significance step.
48. 6- Statistical decision:
a- P value determination: we determine the magnitude of the P value, the probability of obtaining a result at least as extreme as the one observed if chance alone were operating (that is, if Ho were true), and compare it with the level of significance. In biostatistics and medicine the level of significance is usually set at 0.05 or less.
b- Decision to reject or not reject Ho: the calculated test criterion value is compared with the theoretical (tabulated) value at the 5% or 1% level.
- If the calculated test criterion value is higher than the theoretical value at the chosen level of significance, we fall in the rejection region and the P value is < 0.05, so we reject Ho and accept HA.
49. - If the calculated test criterion value is lower than the theoretical value at the chosen level of significance, we fall in the acceptance region and the P value is > 0.05, so we do not reject (accept) Ho.
Drawing the conclusion (or inference) on the basis of the level of significance means deciding whether the observed difference is due to chance or due to some other known factor.
The value of the calculated test statistic tells us to reject Ho if it falls in the rejection region (larger in absolute value than the critical value determined by α) and not to reject (or accept) Ho if it falls in the acceptance region (smaller than the critical value determined by α).
51. 7- Conclusion:
- If Ho is rejected, we conclude that the data support HA.
- If Ho is not rejected, we conclude that Ho may be true.
Meaning of statistically significant and not statistically
significant results; statistical significance is not
synonymous with biologic or clinical relevance,
conversely, the failure to demonstrate statistical
significance does not rule out the existence of a clinically
important difference between two populations.
Any difference, however small, may be found
statistically significant (unlikely to have occurred by
random chance) if the sample size (n) is sufficiently
large. However, a difference of small magnitude, while
statistically significant, may not be clinically important.
52. The purpose of hypothesis testing is to assist us in making a decision. We must emphasize, however, that the outcome of the statistical test is only one piece of evidence influencing that decision. The statistical decision should not be interpreted as definitive, but should be considered along with all the other relevant information available to the researcher (pathological, medical, cytological, microbiological, immunological, etc.).
53. Power
Power is the ability of a statistical test to detect a difference of a specified magnitude (known as a clinically important difference), given that this difference exists in the populations being compared. That is, it is the probability that a statistical test will reject Ho given that Ho is false:
Power = 1 − β
Unlike α and β, power is not the risk of a particular error. Instead, it is the probability that a statistical test will reach a particular correct conclusion. The power of a statistical test is analogous to the sensitivity (true positive rate) of a diagnostic test.
54. The P-value versus the α level
A conclusion based on a statistical test is typically reported in conjunction with a P-value. The P-value and the α level, while similar in the information they convey, have different definitions.
a- The P-value represents the probability of obtaining the particular sample outcome (or one more extreme) from a population for which Ho is true. It is computed from the data and therefore varies from sample to sample; it is not itself the probability of a Type I error.
b- The α level represents the risk of incurring a Type I error that the investigator is willing to tolerate. It is chosen by the investigator and is independent of the data obtained from any given sample.
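The distinction can be made concrete for a z statistic: α is fixed in advance, while the P-value is computed from each sample's test statistic. The sketch below assumes a two-sided test; the three z values are hypothetical examples, and `two_sided_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a z statistic at least this extreme when Ho is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # chosen by the investigator before seeing the data
for z in (0.5, 1.96, 3.0):  # hypothetical test statistics from three samples
    p = two_sided_p_value(z)
    print(f"z = {z:.2f}  P = {p:.4f}  reject Ho at alpha = {alpha}: {p < alpha}")
```

The decision "P < α" is equivalent to comparing the calculated statistic with the tabulated critical value, as described in the statistical-decision step.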
55. t-test:
It is one of the most commonly used tests for testing the significance of a difference in quantitative data. It depends on a distribution called the t-distribution with (n − 1) degrees of freedom. This distribution was introduced by Gosset, who used the pen-name "Student", and the test is often called Student's t-test (or t-test). Like the normal distribution, the t-distribution is a symmetrical, bell-shaped distribution with a mean of zero, but it has a lower peak and is more spread out in its two tails.
[Figure: the normal distribution and the t-distribution drawn on the same axis, both centred at zero, with the t-distribution lower and wider.]
56. The exact shape of the t-distribution depends on the degrees of freedom (d.f. = n − 1): the fewer the degrees of freedom, the more spread out the t-distribution.
The t-test is mainly needed for small sample sizes (less than 30); for larger samples it gives almost the same result as the Z test.
The t-test measures the significance of the difference between two means:
57.
$$t = \frac{\text{difference between two means}}{\text{standard error of difference}}$$
58. Applications of t-test:
1- Estimating the population mean (confidence interval).
2- Testing the significance of the difference between a sample mean and a population mean.
3- Testing the significance of the difference between two independent means.
4- Testing the significance of the difference between two dependent means (paired observations).
59. 1- Calculation of population mean:
In general, a confidence interval (C.I.) for the population mean (µ) is calculated using the t distribution at the appropriate significance level (α = 0.05 for a 95% C.I., α = 0.01 for a 99% C.I.) with (n − 1) degrees of freedom.
This is applied for small sample sizes (n < 30), because for large degrees of freedom the t distribution is almost the same as the standard normal distribution.
60. e.g. The following are the numbers of hours of relief obtained by 6 patients after receiving a new drug:
2.2, 2.4, 4.9, 2.5, 3.7 and 4.3 hours
Mean = 3.333 hours, SD = 1.129 hours (computed from the six values), n = 6
Estimate the population mean (using α = 0.05).
Tabulated t for α = 0.05 and d.f. = n − 1 = 6 − 1 = 5 is t = 2.571.
C.I. = mean ± t × SE, where SE = SD/√n
= 3.333 ± 2.571 × 1.129/√6 = 3.333 ± 1.185 = (2.15 – 4.52) hours
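The interval for this example can be computed directly from the six listed observations. This is a minimal sketch: the variable names are hypothetical, the sample SD is calculated from the raw values with the n − 1 divisor, and the critical value 2.571 is the tabulated t quoted above.

```python
from math import sqrt
from statistics import mean, stdev

hours = [2.2, 2.4, 4.9, 2.5, 3.7, 4.3]  # hours of relief for the 6 patients
t_crit = 2.571  # tabulated t for alpha = 0.05, d.f. = 5

n = len(hours)
xbar = mean(hours)
s = stdev(hours)   # sample SD (n - 1 in the denominator)
se = s / sqrt(n)   # standard error of the mean
low, high = xbar - t_crit * se, xbar + t_crit * se

print(f"mean = {xbar:.3f}, SD = {s:.3f}, SE = {se:.3f}")
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f}) hours")
```

Working from the raw data avoids carrying rounded summary statistics through the calculation.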
62. 2-Difference between sample mean and population mean:
3- Difference between two independent means:
The difference between the means of two independent samples is normally distributed. The same procedure for calculation is followed, except that:
• Assumption: we assume that we have two independent samples, each randomly chosen from a normally distributed population, and that the populations have equal variances.
• Equation: the SE of the difference is calculated from the pooled standard deviation SP, where S1² is the variance of group 1 and S2² is the variance of group 2:
$$S_P = \sqrt{\frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$$
$$SE_{\text{difference}} = S_P \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
• The d.f. = n1 + n2 − 2, or (n1 − 1) + (n2 − 1)
63. e.g. A study of the birth weight of infants born to 15 non-smoker and 14 heavy smoker mothers (smoking during pregnancy).
Birth weight (kg) of infants born to:
Non-smoker mothers (n = 15)    Heavy smoker mothers (n = 14)
3.99                           3.18
3.79                           2.84
3.60                           2.90
3.73                           3.27
3.21                           3.85
3.60                           3.52
4.08                           3.23
3.61                           2.76
3.83                           3.60
3.31                           3.75
4.13                           3.59
3.26                           3.63
3.54                           2.38
3.51                           2.34
2.71
Mean1 = 3.5933 kg              Mean2 = 3.2029 kg
SD1 = 0.37 kg                  SD2 = 0.49 kg
n1 = 15                        n2 = 14
64. Do these data provide any evidence of an effect of smoking during pregnancy on birth weight? (use α = 0.05).
• Data: the data represent the birth weight in kilograms of two independent groups, infants of non-smoker mothers and infants of heavy smoker mothers, with a mean birth weight of 3.5933 kg for infants of non-smoker mothers and 3.2029 kg for infants of heavy smoker mothers.
• Assumption: we assume that the two independent groups (infants born to non-smoker mothers and those born to heavy smoker mothers) were each randomly drawn from a normally distributed population, and that the populations have equal variances.
• HO: There is no significant difference between the mean birth weight of infants of non-smoker mothers and that of infants of heavy smoker mothers (µ1 = µ2; µ1 − µ2 = 0). OR There is no significant influence (effect) of the mother's smoking during pregnancy on the birth weight of her infant.
• HA: There is a significant difference between the mean birth weight of infants of non-smoker mothers and that of infants of heavy smoker mothers (µ1 ≠ µ2; µ1 − µ2 ≠ 0). OR There is a significant influence (effect) of the mother's smoking during pregnancy on the birth weight of her infant.
65. • Level of significance (α = 0.05):
5% chance factor effect area
95% influencing factor effect area
d.f. = n1 + n2 − 2 = (n1 − 1) + (n2 − 1) = 15 + 14 − 2 = 27
Tabulated t for d.f. = 27 is 2.052, so the rejection region lies below −2.052 and above +2.052.
66. • Apply the proper test of significance
$$t = \frac{\text{difference between two means}}{\text{standard error of difference}} = \frac{\bar{x}_1 - \bar{x}_2}{\text{standard error of difference (pooled SE)}}$$
67.
$$S_P = \sqrt{\frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} = \sqrt{\frac{0.37^2 \times (15 - 1) + 0.49^2 \times (14 - 1)}{15 + 14 - 2}} = 0.43 \text{ kg}$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_P \sqrt{1/n_1 + 1/n_2}} = \frac{3.5933 - 3.2029}{0.43 \times \sqrt{1/15 + 1/14}} \approx 2.42 \quad \text{(calculated t value)}$$
2.42 > 2.052
Since calculated t > tabulated t, P < 0.05.
68. Then reject Ho and accept HA:
• There is a significant difference between the mean birth weight of infants of non-smoker mothers and that of infants of heavy smoker mothers.
• There is a significant influence (effect) of the mother's smoking during pregnancy on the birth weight of her infant.
• That is, smoking by mothers during pregnancy significantly lowers the birth weight of their infants.
Tabulated t values for d.f. = 27:
α = 0.05: t = 2.05;  α = 0.02: t = 2.47;  α = 0.01: t = 2.77;  α = 0.001: t = 3.69
The calculated t (2.42) lies between 2.05 and 2.47, so 0.02 < P < 0.05: a significant effect at the 5% level.
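The pooled-variance t-test of this example can be reproduced from the raw birth weights. This is a sketch only: `pooled_t` is a hypothetical helper name, and the critical value 2.052 is the tabulated t for d.f. = 27 quoted in the example.

```python
from math import sqrt
from statistics import mean, variance

non_smokers = [3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
               3.83, 3.31, 4.13, 3.26, 3.54, 3.51, 2.71]
heavy_smokers = [3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76,
                 3.60, 3.75, 3.59, 3.63, 2.38, 2.34]


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # SE of the difference
    return (mean(a) - mean(b)) / se, df


t, df = pooled_t(non_smokers, heavy_smokers)
t_crit = 2.052  # tabulated t, alpha = 0.05, d.f. = 27
print(f"t = {t:.2f}, d.f. = {df}, reject Ho: {abs(t) > t_crit}")
```

If SciPy is available, `scipy.stats.ttest_ind` with its default equal-variance setting performs the same test and also returns the P value.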
69. 4- Difference between two dependent sample means:
Here the t-test is used for the difference between pairs of values measured on each individual, such as the blood pressure of each individual before and after taking a hypotensive agent. If the drug has no effect, there is no difference in blood pressure before and after it is taken (or only a small, non-significant difference); if the drug has an effect, there is a significant difference between the two measurements. So the same sample (one group) is considered on two occasions, e.g. before and after.
70. The use of this test is restricted to samples of fewer than 30 matched pairs. The technique is applied in order to eliminate as many sources of variation as possible by making the members of each pair as similar as possible with respect to as many variables as possible (for the same person, one measurement before and one after, so that the difference is mostly due to the effect of the factor itself).
e.g. The effect of a certain sleeping drug is to be tested on 10 patients. On the first night we gave them a placebo and measured the hours of sleep; on the next night we gave them the sleeping drug and measured the hours of sleep. The following results were obtained from this placebo-controlled clinical trial to test the effectiveness of the sleeping drug:
71. Hours of sleep
Patient    Drug    Placebo
1          6.1     5.2
2          7.0     7.9
3          8.2     3.9
4          7.6     4.7
5          6.5     5.3
6          8.4     5.4
7          6.9     4.2
8          6.7     6.1
9          7.4     3.8
10         5.8     6.3
What can you conclude from these data? (Use α = 0.05)
73. • Data: the data represent the hours of sleep of 10 patients, each measured after taking a placebo and after taking a sleeping agent, with a mean duration of sleep of 7.06 hours after the sleeping agent and 5.28 hours after the placebo.
• Assumption: we assume that the sample was selected randomly from a normally distributed population. OR we assume that the observed differences constitute a simple random sample from a normally distributed population of differences.
• HO: There is no significant difference between the mean hours of sleep after taking the drug and after taking the placebo (µ1 = µ2; µ1 − µ2 = 0). OR There is no significant effect of the drug as a sleeping agent.
• HA: There is a significant difference between the mean hours of sleep after taking the drug and after taking the placebo (µ1 ≠ µ2; µ1 − µ2 ≠ 0). OR There is a significant effect of the drug as a sleeping agent.
74. • Level of significance (α = 0.05):
5% chance factor effect area
95% influencing factor effect area (sleeping agent)
d.f. = n − 1 = 10 − 1 = 9
Tabulated t for d.f. = 9 is 2.2622, so the rejection region lies below −2.2622 and above +2.2622.
75. • Apply the proper test of significance
$$SD_d = \sqrt{\frac{\sum d^2 - (\sum d)^2 / n}{n - 1}} = \sqrt{\frac{59.82 - 17.8^2/10}{9}} = 1.77$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{SD_d / \sqrt{n}} = \frac{7.06 - 5.28}{1.77/\sqrt{10}} = \frac{1.78}{0.56} \approx 3.18 \quad \text{(calculated t)}$$
3.18 > 2.2622
Since calculated t > tabulated t, P < 0.05.
76. Then reject Ho and accept HA:
• There is a significant difference between the mean hours of sleep after taking the drug and after taking the placebo.
• There is a significant effect of the drug as a sleeping agent.
• That is, this agent significantly increases the hours of sleep after it is taken.
Tabulated t values for d.f. = 9:
α = 0.05: t = 2.2622;  α = 0.02: t = 2.82;  α = 0.01: t = 3.25;  α = 0.001: t = 4.78
The calculated t (3.18) lies between 2.82 and 3.25, so 0.01 < P < 0.02: a significant effect.
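The paired t-test of this example can be reproduced from the table of sleep hours. This is a minimal sketch: `paired_t` is a hypothetical helper name, and 2.2622 is the tabulated t for d.f. = 9 quoted in the example.

```python
from math import sqrt
from statistics import mean, stdev

drug = [6.1, 7.0, 8.2, 7.6, 6.5, 8.4, 6.9, 6.7, 7.4, 5.8]
placebo = [5.2, 7.9, 3.9, 4.7, 5.3, 5.4, 4.2, 6.1, 3.8, 6.3]


def paired_t(first, second):
    """Paired t statistic: mean of the within-pair differences over its SE."""
    d = [a - b for a, b in zip(first, second)]  # one difference per patient
    n = len(d)
    se = stdev(d) / sqrt(n)  # SD of differences / sqrt(n)
    return mean(d) / se, n - 1


t, df = paired_t(drug, placebo)
t_crit = 2.2622  # tabulated t, alpha = 0.05, d.f. = 9
print(f"t = {t:.2f}, d.f. = {df}, reject Ho: {abs(t) > t_crit}")
```

The test works on the ten differences only, which is why pairing removes the between-patient variation from the comparison.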
77. Z test:
To determine whether two mean values, derived from two samples drawn from two populations, differ or not, we use the Z test. Z is the standard normal deviate ("SND"): the distance of a value from the mean expressed in standard deviations (95% of values lie within ±1.96 SD).
Values are obtained from a table of z values available in most statistics texts. The test is used for data of large sample size (n > 30; some texts require n > 60), even when the data are not normally distributed.
A critical value is selected from a table for the normal distribution; there are three commonly used values:
79. Applications of z test:
1- For testing the significance of the difference between a sample mean and a population mean, for large sample size (n > 30), using the following formula:
$$Z = \frac{\bar{x} - \mu}{SD / \sqrt{n}}$$
80. 2- For testing the significance of the difference between two sample means derived from two independent, not normally distributed populations, for large sample size (n > 30), using the following formula:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{S_P \sqrt{1/n_1 + 1/n_2}}$$
for two populations with equal variance, and the formula:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SE_1^2 + SE_2^2}}$$
81. The second formula is used for two independent populations with unequal variances (in this case the d.f. differ and are estimated through a certain equation).
3- For testing the significance of the difference between the sample means before and after a certain condition (one sample on two occasions, paired observations), for large sample size (n > 30), using the following formula, where SD_d is the SD of the differences:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{SD_d / \sqrt{n}}$$
82. e.g. A researcher wishes to know whether the serum uric acid level in a sample of 70 patients with leukemia (mean = 8.1 mg/dl, SD = 1.82 mg/dl) differs from the normal standard serum uric acid level (6 mg/dl) (use α = 0.05).
z = (x̄ − µ) / (SD/√n)
z = (8.1 − 6) / (1.82/√70) = 9.654
9.654 > 1.96, so P < 0.05; then reject Ho and accept HA.
83. Let's perform a one-sample z-test: In the population, the average IQ is 100 with a standard deviation of 15. A team of scientists wants to test a new medication to see whether it has a positive or negative effect on intelligence, or no effect at all. A sample of 30 participants who have taken the medication has a mean IQ of 140. Did the medication affect intelligence?
Let's begin.
1. Define Null and Alternative Hypotheses
Ho: µ = 100; HA: µ ≠ 100 (two-tailed).
2. State Alpha
α = 0.05.
3. State Decision Rule
84. With a two-tailed test we have 0.025 in each tail, which gives a critical value of 1.96. Thus, our decision rule for this two-tailed test is: if Z is less than −1.96 or greater than 1.96, reject the null hypothesis.
4. Calculate Test Statistic
Z = (140 − 100) / (15/√30) = 14.60
85. 5. State Results
Z = 14.60
Result: Reject the null hypothesis.
6. State Conclusion
The medication significantly affected intelligence, z = 14.60, p < 0.05.
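Both one-sample z-test examples above (serum uric acid and IQ) follow the same steps, which can be sketched as one small function. The name `one_sample_z_test` is hypothetical; the critical value comes from the standard normal distribution for a two-tailed test at the chosen α.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, sd, n, alpha=0.05):
    """Two-tailed one-sample z-test: returns (z, critical value, reject Ho?)."""
    z = (sample_mean - pop_mean) / (sd / sqrt(n))  # calculated test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit


examples = [("uric acid", (8.1, 6.0, 1.82, 70)), ("IQ", (140, 100, 15, 30))]
for label, args in examples:
    z, z_crit, reject = one_sample_z_test(*args)
    print(f"{label}: z = {z:.2f}, critical value = ±{z_crit:.2f}, reject Ho: {reject}")
```

In both examples the calculated z lies far beyond ±1.96, so Ho is rejected at the 5% level.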