Comparing means

Inferential Statistics
• Hypothesis testing
• Drawing conclusions about differences between groups
• Are differences likely due to chance?
• Comparing means
• t-test: 2 means
• Analysis of variance: 2 or more means

Are there differences?
• One of the fundamental questions of survey research is whether
there is a difference among respondents
• When seeking to evaluate differences in means, we can
use t-test or ANOVA = analysis of variance

Comparison of the z-test and t-test
• Interval or ratio scaled variables
• t-test
• When groups are small
• When population standard deviation is unknown
• z-test
• When groups are large
• σ is known

The t Statistic
• The t statistic allows researchers to use sample
data to test hypotheses about an unknown
population mean.
• The particular advantage of the t statistic is that
it does not require any knowledge of
the population standard deviation.
• Thus, the t statistic can be used to test
hypotheses about a completely unknown
population; that is, both μ and σ are unknown,
and the only available information about the
population comes from the sample.

T-test
• Dependent t-test
– Compares two means based on related data.
– E.g., Data from the same people measured at different times.
– Data from ‘matched’ samples.
• Independent t-test
– Compares two means based on independent data
– E.g., data from different groups of people
• Significance testing
– Testing the significance of Pearson’s correlation coefficient
– Testing the significance of b in regression.

(c) 2007 IUPUI SPEA K300 (4392)
Types of t-test
• One-sample t-test compares one sample
mean with a hypothesized value
• Paired sample t-test (dependent sample)
compares the means of two dependent
variables
• Independent sample t-test compares the
means of two independent variables
• Equal variance
• Unequal variance

Assumptions of the t-test
• Both the independent t-test and the dependent t-test are
parametric tests based on the normal distribution.
Therefore, they assume:
– The sampling distribution is normally distributed. In the dependent
t-test this means that the sampling distribution of the differences
between scores should be normal, not the scores themselves.
– Data are measured at least at the interval level.
• The independent t-test, because it is used to test different
groups of people, also assumes:
– Variances in these populations are roughly equal (homogeneity of
variance).
– Scores in different treatment conditions are independent (because
they come from different people).

The One-sample t-test
• Evaluating a hypothesis about a population
• Taking a single sample
• Does it likely come from the population?
• Test statistics
• z test if σ known
• t test if σ unknown

One sample t-test
• Compare a sample mean with a particular
(hypothesized) value
• H0: µ=c, Ha: µ≠c, where c is a particular value
• Degrees of freedom: n-1
$$t = \frac{\bar{x} - c}{s / \sqrt{n}} \sim t(n-1)$$
$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le c \le \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
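As a minimal sketch of the one-sample test, the statistic and interval can be computed with Python's standard library. The function names and the five data values are hypothetical, chosen only for illustration, and the critical value still has to be looked up in a t table.

```python
import math
import statistics

def one_sample_t(sample, c):
    """Return (t, df) for H0: mu = c, using t = (xbar - c) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (xbar - c) / (s / math.sqrt(n))
    return t, n - 1

def confidence_interval(sample, t_crit):
    """Return xbar -/+ t_crit * s / sqrt(n); t_crit is taken from a t table."""
    n = len(sample)
    xbar = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical data: five measurements tested against c = 10.
scores = [12, 11, 9, 13, 10]
t, df = one_sample_t(scores, c=10)
print(round(t, 3), df)  # 1.414 4
```

With df = 4 the two-tailed .05 critical value is 2.776, so this hypothetical sample would not lead to rejecting H0: µ = 10.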

t statistic
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \qquad df = n - 1$$

t-test
$$t = \frac{(\text{observed difference between sample means}) - (\text{expected difference between population means, if the null hypothesis is true})}{\text{estimate of the standard error of the difference between two sample means}}$$

Paired sample t-test 1
• Compare two paired (matched) samples.
• Ex. Compare means of pre- and post-
scores given a treatment. We want to know
the effect of treatment.
• Ex. Compare means of midterm and final
exam of K300.
• Each subject has two data points (pre- and
post-, or midterm and final)

Paired sample t-test 2
• Compute d=x1-x2 (order does not matter)
• H0: µd=c, Ha: µd≠c, where c is a particular value
(often 0)
• Degrees of freedom: n-1
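$$\bar{d} = \frac{\sum d_i}{n}, \qquad s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1}$$
$$t_d = \frac{\bar{d} - c}{s_d / \sqrt{n}} \sim t(n-1)$$
$$\bar{d} - t_{\alpha/2}\frac{s_d}{\sqrt{n}} \le c \le \bar{d} + t_{\alpha/2}\frac{s_d}{\sqrt{n}}$$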

Paired sample t-test 3: Example
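• Example: Cholesterol levels
• H0: µd=0, Ha: µd≠0
• n=6, dbar=16.7, s_d=25.4 (standard deviation of the differences)
• df=n-1=5, critical value=2.015 (two-tailed test size .10; at the .01 level the critical value is 4.032)
• Test statistic is 1.61, which is smaller than the CV
• Do not reject the null hypothesis. A value of 1.61 is likely
when the null hypothesis is true.
$$t_d = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = \frac{16.7 - 0}{25.4 / \sqrt{6}} = 1.61 \sim t(6-1)$$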
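The cholesterol calculation can be checked with a short sketch. It assumes n = 6 and treats 25.4 as the standard deviation of the differences, because the slide's arithmetic divides 25.4 by √6. The function names are hypothetical.

```python
import math
import statistics

def paired_t_from_summary(dbar, s_d, n, c=0.0):
    """Return (t, df) from the mean and standard deviation of the differences."""
    t = (dbar - c) / (s_d / math.sqrt(n))
    return t, n - 1

def paired_t(first, second, c=0.0):
    """Return (t, df) from two matched lists of scores (d = first - second)."""
    diffs = [a - b for a, b in zip(first, second)]
    return paired_t_from_summary(
        statistics.mean(diffs), statistics.stdev(diffs), len(diffs), c
    )

# Summary statistics of the cholesterol example (n = 6 is an assumption, see above).
t, df = paired_t_from_summary(dbar=16.7, s_d=25.4, n=6)
print(round(t, 2), df)  # 1.61 5
```

`paired_t` does the same from raw matched scores, for cases where the differences have not been summarized yet.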

Independent sample t-test
• Compare two independent samples
• Ex. Compare means of personal income between
Colombo and Jaffna
• Ex. Compare means of GPA between UOJ and
SJP
• Each sample includes different subjects that are
not related at all

Independent sample t-test
•H0: μ1= μ2 or μ1 – μ2= 0
•H1: μ1≠ μ2 or μ1 – μ2 ≠ 0

How to get standard error?
• If the variances of the two samples are equal, use the
pooled variance.
• Otherwise, use the individual variances to
get the standard error of the mean difference (µ1-µ2)
• How do we know two variances are equal?
• (Folded form) F test is the answer.

F-test for equal variance
• Compute variances of two samples
• Conduct the F-test as follows.
• Larger variance should be the numerator so that F is always
greater than or equal to 1.
• Look up the F distribution table with two degrees of freedom.
• degrees of freedom numerator (dfn) and degrees of
freedom denominator (dfd)
• If H0 of equal variance is not rejected, treat the two samples as having the
same variance.
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$F = \frac{s_L^2}{s_S^2} \sim F(n_L - 1,\; n_S - 1)$$

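F-test for equal variance
• Look up the F distribution table with two degrees of
freedom.
• degrees of freedom numerator (dfn) and degrees of
freedom denominator (dfd)
• For the folded F-test of two samples: dfn = (size of the sample with the larger variance) - 1,
dfd = (size of the sample with the smaller variance) - 1
• For the ANOVA F-test: dfn = a-1, dfd = N-a,
where "a" is the number of groups and "N" is the total
number of subjects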

Independent sample t-test: Equal variance
• Compare means of two independent samples
that have the same variance
• The null hypothesis is µ1-µ2=c (often 0)
• Degrees of freedom is n1+n2-2
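$$s_{pool}^2 = \frac{\sum (y_{1i} - \bar{y}_1)^2 + \sum (y_{2j} - \bar{y}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
$$t = \frac{(\bar{y}_1 - \bar{y}_2)}{s_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$$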

• Example
• X1bar=$26,800, s1=$600, n1=10
• X2bar=$25,400, s2=$450, n2=8
• F-test: F 1.78 is smaller than CV 4.82; do not
reject the null hypothesis of equal variance at
the .01 level.
• Therefore, we can use the pooled variance.
$$F = \frac{s_L^2}{s_S^2} = \frac{600^2}{450^2} = 1.78 \sim F(10-1,\; 8-1)$$
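The folded F statistic is easy to verify. The sketch below (the function name `folded_f` is hypothetical) puts the larger sample variance in the numerator and returns the matching degrees of freedom. The critical value is still read from an F table.

```python
def folded_f(s1, n1, s2, n2):
    """Return (F, dfn, dfd) with the larger sample variance in the numerator."""
    v1, v2 = s1 ** 2, s2 ** 2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

# Income example: s1 = 600 (n1 = 10), s2 = 450 (n2 = 8).
f, dfn, dfd = folded_f(s1=600, n1=10, s2=450, n2=8)
print(round(f, 2), dfn, dfd)  # 1.78 9 7
```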

• X1bar=$26,800, s1=$600, n1=10
• X2bar=$25,400, s2=$450, n2=8
• Since 5.47>2.921 (the two-tailed critical value for df=16; the large-sample z value is 2.58) and p-value <.01, reject the H0
at the .01 level.
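$$t = \frac{(\bar{x}_1 - \bar{x}_2)}{s_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$$
$$s_{pool}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10-1)600^2 + (8-1)450^2}{10 + 8 - 2} = 291093.75$$
$$t = \frac{(26800 - 25400)}{539.5\sqrt{\frac{1}{10} + \frac{1}{8}}} = 5.47 \sim t(10 + 8 - 2)$$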
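The pooled-variance calculation above can be reproduced with a few lines of Python. This is a sketch from summary statistics only. The function name `pooled_t` is hypothetical, and the hypothesized difference `c` defaults to 0, the case used in the example.

```python
import math

def pooled_t(x1bar, s1, n1, x2bar, s2, n2, c=0.0):
    """Return (pooled variance, t, df) for the equal-variance t-test."""
    df = n1 + n2 - 2
    var_pool = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = math.sqrt(var_pool) * math.sqrt(1 / n1 + 1 / n2)
    t = ((x1bar - x2bar) - c) / std_err
    return var_pool, t, df

# Income example: X1bar = 26800, s1 = 600, n1 = 10; X2bar = 25400, s2 = 450, n2 = 8.
var_pool, t, df = pooled_t(26800, 600, 10, 25400, 450, 8)
print(var_pool, round(t, 2), df)  # 291093.75 5.47 16
```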

Independent sample t-test: Unequal variance
• Compare means of two independent samples
that have different variances (if the null
hypothesis of the F-test is rejected)
• The null hypothesis is µ1-µ2=c (often 0)
• Individual variances need to be used
• Degrees of freedom is approximated; not
necessarily an integer
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t(df_{Satterthwaite})$$

• Approximation of degrees of freedom
• Not necessarily an integer
• Satterthwaite’s approximation (common)
• Cochran-Cox’s approximation
• Welch’s approximation
$$df_{Satterthwaite} = \frac{(n_1 - 1)(n_2 - 1)}{(n_1 - 1)(1 - c)^2 + (n_2 - 1)c^2}$$
$$c = \frac{s_1^2 / n_1}{s_1^2 / n_1 + s_2^2 / n_2}$$

• Example
• X1bar=191, s1=38, n1=8
• X2bar=199, s2=12, n2=10
• F-test: F 10.03 (7, 9) is larger than CV 4.20,
indicating unequal variances. Reject H0 of
equal variance at the .05 level.
• Therefore, we have to use individual variances
$$F = \frac{s_L^2}{s_S^2} = \frac{38^2}{12^2} = 10.03 \sim F(8-1,\; 10-1)$$

• Example
• X1bar=191, s1=38, n1=8
• X2bar=199, s2=12, n2=10
• The test statistic |-.57| is small.
• It is below the CV of 2.365 for 7 (8-1) degrees of freedom, so we do not
reject the null hypothesis
• However, we need the approximation of degrees of
freedom to get a more reliable df.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{191 - 199}{\sqrt{\frac{38^2}{8} + \frac{12^2}{10}}} = -.57 \sim t(df_{Satterthwaite})$$

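To finish the unequal-variance example, the sketch below computes both the test statistic and Satterthwaite's approximate degrees of freedom from the summary statistics. The function name `welch_t` and the variable `share` (the quantity the slides call c in the df formula) are hypothetical names, and the hypothesized difference `c` defaults to 0.

```python
import math

def welch_t(x1bar, s1, n1, x2bar, s2, n2, c=0.0):
    """Return (t, df) using individual variances and Satterthwaite's df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard error of each mean
    t = ((x1bar - x2bar) - c) / math.sqrt(v1 + v2)
    share = v1 / (v1 + v2)  # the "c" of the df formula
    df = (n1 - 1) * (n2 - 1) / (
        (n1 - 1) * (1 - share) ** 2 + (n2 - 1) * share ** 2
    )
    return t, df

# Example: X1bar = 191, s1 = 38, n1 = 8; X2bar = 199, s2 = 12, n2 = 10.
t, df = welch_t(191, 38, 8, 199, 12, 10)
print(round(t, 2), round(df, 2))  # -0.57 8.12
```

The approximate df of about 8.12 is not an integer, and it is close to the conservative choice of 7 used above, so the decision not to reject H0 is unchanged.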
Summary of Comparing Means
• One sample? → One sample t-test; H0: µ = c; df = n - 1
• Two samples, dependent? → Paired sample t-test; H0: µd = 0; df = n - 1
• Two samples, independent, equal variance? → Independent sample t-test (pooled variance); H0: µ1 - µ2 = 0; df = n1 + n2 - 2
• Two samples, independent, unequal variance? → Independent sample t-test (approximation of d.f.); H0: µ1 - µ2 = 0; df = approximated

ANOVA – Analysis of Variance
• Statistical technique specially designed
to test whether the means of more than 2
quantitative populations are equal.

• ANOVA is similar to regression in that it is used to
investigate and model the relationship between a
dependent (response) variable and one or more
independent (explanatory) variables.
• It differs from regression in that:
• the independent variables are qualitative (categorical)
• no assumption is made about the nature of the relationship
• ANOVA really extends the two-sample t-test for testing
the equality of two population means to a more
general null hypothesis of comparing the equality of
more than two means, versus them not all being
equal.

• An analysis of variance (ANOVA) is a statistical
method for detecting whether there is a statistically significant difference
between the means of the populations.
• The null hypothesis in the simple ANOVA test is the
following:
• H0: μ1 = μ2 = … = μk
• Against the alternative
• H1: at least two μ’s differ
• The test statistic for ANOVA is the ANOVA F-statistic.

Example ANOVA research question
• Are there differences in the degree of religious
commitment between countries (UK, USA, and Australia)?
• 1-way ANOVA
• 1-way repeated measures ANOVA
• Factorial ANOVA
• Mixed ANOVA
• ANCOVA

Example ANOVA research question
• Do university students have different levels of satisfaction
for educational, social, and campus-related domains?
• 1-way ANOVA
• Factorial ANOVA
• Mixed ANOVA
• ANCOVA

Example ANOVA research questions
• Are there differences in the degree of religious
commitment between countries (UK, USA, and Australia)
and gender (male and female)?
• 1-way ANOVA
• Factorial ANOVA
• Mixed ANOVA
• ANCOVA

• Does couples' relationship satisfaction differ between
males and females and before and after having children?
• 1-way ANOVA
• Factorial ANOVA
• Mixed ANOVA
• ANCOVA

• Are there differences in university student satisfaction
between males and females (gender) after controlling for
level of academic performance?
• 1-way ANOVA
• Factorial ANOVA
• Mixed ANOVA
• ANCOVA

ANOVA
• Variance can be separated into two major components
• Within groups – variability or differences in particular groups
(individual differences)
• Between groups - differences depending on which group one is in or
what treatment is received

F test
• ANOVA partitions the sums of squares (variance from the
mean) into:
• Explained variance (between groups)
• Unexplained variance (within groups), or error variance
F = ratio between explained & unexplained variance
p = probability that the observed mean differences between
groups could be attributable to chance
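As a minimal sketch of this partition, the F ratio for a one-way ANOVA can be computed directly from the between-group and within-group sums of squares. The function name and the three small groups are hypothetical, and the p-value would still come from an F table or statistical software.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    a, n_total = len(groups), len(all_scores)
    # Explained variation: group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Unexplained (error) variation: scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = a - 1, n_total - a
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_b, df_w = one_way_anova(groups)
print(round(f, 2), df_b, df_w)  # 13.0 2 6
```

Here the between-group sum of squares is 26 and the within-group sum of squares is 6, so F = (26/2)/(6/6) = 13.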

One-way ANOVA - Assumptions
• Dependent variable (DV) must be: Interval
or ratio
• Normality: Normally distributed for all
groups
• Variance: Equal variance across all
groups (homogeneity of variance)
• Independence: Participants' data should be
independent of others' data

One-way ANOVA:
• Are there differences in satisfaction
levels between students who get
different grades?

• The sample mean and standard deviation of each group are
calculated by double-clicking the variable to select it
and going to View/Descriptive
Statistics/Tests/Stats by classification...

The ANOVA test in EViews
• To determine whether to reject the null hypothesis,
we focus on the highlighted ANOVA F-test output. The
column named Probability contains the p-value of interest.
Since the p-value is below 5%, we reject the null
hypothesis and conclude that there is a statistically
significant difference in weight between the groups.

Testing assumptions
• A number of assumptions must be met to ensure the
validity of the above analysis of variance.
• The following three assumptions will be checked in this
section
• 1) Homogeneity of variance
• 2) Normally distributed errors
• 3) Independent error terms

Homogeneity of variance (1)
•To test for homogeneity of variance
between the different groups in the
analysis, we use Levene’s test for
equality of variance.

• Running Levene’s test in EViews is similar to
running the ANOVA test in the first place. Once again,
select the variable of interest and go to
• View/Descriptive Statistics/Tests/Equality Tests by
Classification...

• We get a p-value of 0.67, which is well above any
reasonable level of significance. Therefore we cannot
reject the null hypothesis, and the assumption of homogeneity
of variance is considered satisfied.
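Outside EViews, the idea behind Levene's test can be sketched as a one-way ANOVA F ratio computed on the absolute deviations of each score from its group mean. This is an illustrative assumption about the mean-centred variant, not a reproduction of EViews output. The function name and data are hypothetical, and the p-value would still come from an F table.

```python
import statistics

def levene_w(groups):
    """Return (W, dfn, dfd): an ANOVA F ratio on |score - group mean|."""
    # Absolute deviation of each score from its own group mean.
    z = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand = statistics.mean(all_z)
    k, n_total = len(z), len(all_z)
    between = sum(
        len(g) * (statistics.mean(g) - grand) ** 2 for g in z
    ) / (k - 1)
    within = sum(
        (v - statistics.mean(g)) ** 2 for g in z for v in g
    ) / (n_total - k)
    return between / within, k - 1, n_total - k

# Hypothetical data: the second group is twice as spread out as the first.
w, dfn, dfd = levene_w([[1, 2, 3], [2, 4, 6]])
print(round(w, 2), dfn, dfd)  # 0.8 1 4
```

A small W, as here, gives no evidence against equal variances. With only three scores per group the test has little power.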

Normally distributed errors
• We address the assumption by creating distribution
histograms for each group.

• Cross-group analysis is done with Q-Q plots,
which show whether the observations follow a
normal distribution when analyzed within their group. To
make this analysis in EViews, do the following:
• Select Quick > Graph from the top toolbar, which should
open the following window:

• In this window, type in the variable of interest

• First, choose Categorical Graph from the
dropdown menu (1). Then select the specific graph
Quantile – Quantile (2), which is also known as the Q-Q
plot. To make EViews create a separate graph for each
outcome of the grouping variable, type the
grouping variable into the Across Graphs window.

Quantile – Quantile
• The straight line in the output represents the perfect normal distribution; points lying close to it indicate approximately normal data.
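The points of a Q-Q plot can also be generated by hand. The sketch below pairs each sorted observation with the quantile of a normal distribution that has the same mean and standard deviation. The plotting positions (i - 0.5)/n are one common convention and an assumption here. The function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a Q-Q plot."""
    n = len(sample)
    reference = NormalDist(mean(sample), stdev(sample))
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))

# Hypothetical scores for one group.
for expected, observed in qq_points([3, 1, 2, 5, 4]):
    print(round(expected, 2), observed)
```

If the observed values track the theoretical quantiles closely, the pairs fall near a straight line, which is the pattern to look for within each group.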

Independent error terms
• The assumption of independent error terms is
checked simply by making scatter plots of the variable of
interest against the observation numbers. This is done to
ensure that no pattern related to the order in which the sample
was collected exists.
• Select Quick/Graph, type the variable of interest into
the resulting window, and click OK.
