Advance statistics 2

Advance Statistics: Introduction
to Regression
and Analysis of Variance
Fixed and Random Effects

Today’s Report
 Random effects.
 One-way random effects ANOVA.
 Two-way mixed & random effects
ANOVA.
 Sattherwaite’s procedure.

Two-way ANOVA
 Second generalization: more than one grouping variable.
 Two-way ANOVA model: observations:
(Yijk), 1 < i < r , 1 < j < m, 1 < k < nij : r groups in first grouping variable,
m groups ins second and nij samples in
(i; j)-“cell”:
Yijk = µ + α i + βj + (α β)ij + Eijk, Eijk~N(0, σ ² ):
 n Constraints:
r∑i=1 αi = 0
m∑j=1 βj = 0
m∑j=1 (αβ)ij = 0, 1 < i < r
r∑i=1 (αβ)ij = 0, 1 < j < m

Random and fixed effects
 In ANOVA examples we have seen so far, the categorical
variables are well-defined categories: below average fitness,
long duration, etc.
 In some designs, the categorical variable is “subject”.
 Simplest example: repeated measures, where more than
one (identical) measurement is taken on the same individual.
 In this case, the “group” effect αi is best thought of as
random because we only sample a subset of the entire
population of subjects.

When to use random effects?

 A “group” effect is random if we can think of the levels we
observe in that group to be samples from a larger population.

 Example: if collecting data from different medical centers,
“center” might be thought of as random.

 Example: if surveying students on different campuses,
“campus” may be a random effect.

Example: sodium content in beer
 How much sodium is there in North American beer? How?
much does this vary by brand?

 Observations: for 6 brands of beer, researchers recorded the
sodium content of 8 12 ounce bottles.

 Questions of interest: what is the “grand mean” sodium
content? How much variability is there from brand to brand?

 “Individuals” in this case are brands, repeated measures are
the 8 bottles.

One-way random effects model
 Suppose we take n identical measurements from r subjects

 Yij ~ µ + αi + Eij, 1 < I < r, 1 < j < n

 Eij ~ N(0, σ²), 1 < i < r, 1 < j <n

 αi ~ N(0,σ²µ), 1 < i < r

 We might be interested in the population mean, µ. : CIs, is it zero? etc.

 Alternatively, we might be interested in the variability across
subjects,σ²µ : CIs, is it zero?

One-way random ANOVA table
Source SS df E(MS)

Treatments SSTR = r ∑i=1 n(Y1 – Y..) ² r–1 σ² + n σ²µ
Error SSE = r ∑i=1 (Yij- Yi.) ² (n - 1)r σ²

 Only change here is the expectation of SSTR which reflects
randomness of αi’s
 ANOVA table is still useful to setup tests: the same F statistics for
fixed or random will work here.
 Under Ho : σ²µ = 0, it is to see that

MSTR ~Fr-1, (n-1) r’
MSE

Inference for µ
 We know that E(Y..) = µ. , and can show that
Var(Y..) = nσ²µ + σ²
rn
 Therefore,
Y.. - µ
square SSTR ~tr - 1
(r-1)rn
 Why r – 1 Degrees of freedom? Imagine we could record an infinite
number of observations for each individual, so that Yi ---- µi

 To learn anything about µ. We still only have r observations (µ1,…., µr).

 Sampling more within an individual cannot narrow the CI for µ,.

Estimating ²σµ

 From the ANOVA table
σ²µ = E(SSTR/(r-1))-E(SSE/((n-1)r))
n
 Natural estimate
S²µ = SSTR/(r-1)-SSE/((n-1)r))
n
 Problem: this estimate can be negative! One of the difficulties in random
effect model

Example: productivity study
 Imagine a study on the productivity of employees in a large
manufacturing company.

 Company wants to get an idea of daily productivity, and how it depends
on which machine on employee uses.

 Study: take m employee and r machines, having each employee work on
each machine for a total of n days.

 As these employees are not all employees, and these machines are not
all machines it take sense to think of both the effect of machine and
employees (and interactions) as random

Two-way random effects model
 Yijk ~ µ.. + αi + βj + (αβ)ij + Eij, 1< i < r, 1 < j < m, 1 < k < n

 Eijk ~ N(0, σ²), 1 < i < r, 1 < j < m, 1 < k < n

 αi ~ N(0, σ²α), 1 < i < r.

 βj ~ N(0, σ²β), 1 < j < m.

 (αβ)ij ~ N(0, ²σαβ), 1 < j < m, 1< i < r.

 Cov (Yijk,Yi’j’k’) = &ii’ ²σα + &jj’ ²σβ + &ii’ &jj’ ²σαβ + &ii’ &jj’ &kk’ σ².

ANOVA table: Two-way (random)
SS df E(SS)
SSA = nm r∑i=1 (Yi.. – Y…) ² r -1 σ² + nm σ²α + nσ²αβ

SSB = nr m∑j=1 (Yj.. – Y…) ² m -1 σ² +nr σ²β + nσ²αβ

SSAB = n r∑i=1 (Yij. – Yi.. –Yj.. + Y…) ² (m – 1)(r – 1) σ² + nσ²αβ

SSE = r∑i=1 m∑j=1 n ∑k=1 (Yijk – Y ij.)² (n – 1)ab σ²

 To test Ho : σ²α = 0 use SSA and SSAB.

 To test H0 : σ²αβ = 0 use SSAB and SSE.

Mixed effects model
 In some studies, some factors can be thought of as fixed, others
random.

 For instance, we might have a study of the effect of a standard
part of the brewing process on sodium levels in the beer
example

 Then, we might think of a model in which we have a fixed effect
for “brewing technique” and a random effect for beer

Two-way mixed effects model
 Yijk ~ µ.. + αi + βj + (αβ)ij + Eij, 1< i < r, 1 < j < m, 1 < k < n

 Eijk ~ N(0, σ²), 1 < i < r, 1 < j < m, 1 < k < n

 αi ~ N(0, σ²α), 1 < i < r.

 βj ~ N(0, σ²β), 1 < j < m. are constant!

 (αβ)ij ~ N (0,(m-1) σ²αβ/m), 1 < j < m, 1< i < r.

 Constraints:
m∑j=1 βj = 0
m∑j=1 (αβ)ij = 0, 1 < i < r
Cov ((αβ)ij, (αβ) i’j’ ) = - σ²αβ/m
 Cov (Yijk,Yi’j’k’) = &jj’ (σ²β + &ii’ m-1 σ²β – (1-&ii’) 1 σ²αβ + &ii’ &kk’ σ²).
m m

ANOVA table: Two-way (mixed)
SS df E(MS)
SSA r -1 σ² + nm σ²α

SSB m -1 σ² + m∑j=1 β²i+ nσ²αβ
m -1
SSAB (m – 1)(r – 1) σ² + nσ²αβ

SSE = r∑i=1 m∑j=1 n ∑k=1 (Yijk – Y ij.)² (n – 1)ab σ²

 To test Ho : σ²α = 0 use SSA and SSAB.

 To test Ho : β1 = … = βm = 0 0 use SSB and SSAB.

 To test Ho : σ²αβ use SSAB and SSE.

Confidence intervals for variances

 Consider estimating σ²β in two-way random effects ANOVA. A natural
estimate is
σ²β = nr (MSB – MSAB)
 What about CI?

 A linear combination of x² - but not x².

 To form a confidence interval for σ²β we need to know distribution of a
linear combination of MS.’s at least approximately.

Satterwaite’s procedure
 Given K independent MS.’s
L ~ k∑i=1 ciMSi
 Then
dfTL “~” x²dfT
E(L)
where
dfT = (k∑i=1 ciMSi)²
k∑i=1 ciMS²i/dfi

Where dfi the the degrees of freedom of the i-th MS.
 (1-α) . 100% CI for E(L):

LL = dfT x L , LU = dfT x L
x² dfT; 1-α/2 x² dfT; 1-α/2

Advance statistics 2

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (18)

Destacado

Destacado (18)

Similar a Advance statistics 2

Similar a Advance statistics 2 (20)

Más de Tim Arroyo

Más de Tim Arroyo (18)

Último

Último (20)

Advance statistics 2