Multi Level Modelling&Weights Workshop Kiel09

Please check if you have 4 files in the folder C:KIEL

Please start SPSS now

MULTIPLE REGRESSION AND
MULTILEVEL MODELLING

INTRODUCTION

1. Brief overview of multiple regression analysis

2. Multiple regression using PISA data

3. Brief overview of multilevel modelling

4. Multilevel modelling using PISA data

5. Differences between the two types of analyses

OVERVIEW MULTIPLE REGRESSION

SIMPLE REGRESSION MODEL

 Predicting the dependent variable using linear
relationship with independent variables

 Regression analysis with one independent
variable: Yi = β0 + β1Xi + εi

 β0 is the intercept (the value of Ŷi when Xi=0)

 β1 is the slope of the line that minimises the sum
of the squared residuals εi

MULTIPLE REGRESSION MODEL

[Figure: diagrams of the multiple regression model; surviving labels are Ŷi, R<>0 and r=0]

MULTIPLE REGRESSION WITH
PISA DATA

PISA DATA
PLAUSIBLE VALUES

 Plausible values for cognitive performance
 5 randomly drawn values from a student’s most
likely ability range (posterior distribution)

 Unbiased population estimates (even with one
PV)

 Imputation variance (measurement error)

 NEVER average the plausible values!
Instead, average 5 statistics (means, regression
coefficients, etc.)
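
The warning above can be illustrated with a small simulation. This is only a sketch on made-up data: the ability distribution (mean 500, SD 100) and the spread of the plausible values around each ability (30 points) are assumptions for illustration, not PISA parameters. It shows that averaging the five plausible values per student and then computing a standard deviation gives a smaller value than computing the standard deviation five times and averaging the results.

```python
import random
import statistics

random.seed(1)

# Hypothetical data: each student has an ability and 5 plausible values (PVs)
# drawn around it. The spread of 30 points stands in for measurement error.
students = []
for _ in range(2000):
    ability = random.gauss(500, 100)
    students.append([random.gauss(ability, 30) for _ in range(5)])

# Recommended: compute the statistic once per PV, then average the 5 statistics.
sd_per_pv = [statistics.pstdev([s[m] for s in students]) for m in range(5)]
sd_correct = statistics.mean(sd_per_pv)

# Not recommended: average the PVs per student first, then compute the statistic.
sd_of_averaged_pvs = statistics.pstdev([statistics.mean(s) for s in students])

print(f"mean of 5 SDs: {sd_correct:.1f}; SD of averaged PVs: {sd_of_averaged_pvs:.1f}")
```

Averaging within students removes part of the spread, so any statistic that depends on the variance (standard deviations, percentiles, correlations) is biased by that shortcut.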

PISA DATA
STUDENT WEIGHTS

 Final student weight: the number of students
represented in the population by each student

 The inverse of the product of the probability of
selecting the student’s school and the probability of
selecting the student given that the school is selected

 Non-response and post-stratification
adjustments and trimming
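
As a minimal sketch of the second bullet, the base weight before any adjustment is the inverse of the overall selection probability. The function name and the probabilities below are hypothetical, and the non-response, post-stratification and trimming adjustments are deliberately left out.

```python
def base_student_weight(p_school: float, p_student_given_school: float) -> float:
    """Inverse of the overall probability that a student ends up in the sample."""
    return 1.0 / (p_school * p_student_given_school)


# Hypothetical example: the school is sampled with probability 0.05 and
# 35 of its 350 eligible students are sampled.
weight = base_student_weight(0.05, 35 / 350)
print(weight)  # roughly 200 students represented by this one student
```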

PISA DATA
REPLICATE WEIGHTS

 80 BRR replicate weights with Fay’s factor k=0.5

 Used to compute sampling variance

 Computation of sampling variance using BRR
weights
 Takes the two-stage sampling method into account

 Takes stratification into account

 Is identical for any statistic
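
A minimal sketch of the BRR computation with Fay's factor: the statistic is computed once with the final weight and once with each replicate weight, and the squared differences are summed and divided by G(1-k)^2, which is 20 for G=80 and k=0.5. The function name and the replicate estimates below are made up for illustration.

```python
def brr_sampling_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Sampling variance from BRR replicate estimates with Fay's factor k."""
    g = len(replicate_estimates)
    squared_diffs = sum((rep - full_estimate) ** 2 for rep in replicate_estimates)
    return squared_diffs / (g * (1 - fay_k) ** 2)


# Hypothetical example: a mean of 500 estimated with the final student weight,
# and 80 replicate estimates that each differ from it by one point.
full_estimate = 500.0
replicate_estimates = [501.0, 499.0] * 40
variance = brr_sampling_variance(full_estimate, replicate_estimates)
print(variance, variance ** 0.5)
```

The same function works for a mean, a regression coefficient or any other statistic, because only the estimates enter the formula.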

ERROR VARIANCE

 Error variance is a combination of the sampling
variance and the imputation variance
(measurement error)

 Imputation variance can only be estimated when
using a set of plausible values

 Imputation variance is small compared to the
sampling variance

 Standard error is the square root of the error
variance

COMPUTATION OF
STANDARD ERROR

 Error variance

 Sampling variance

 Imputation variance

 Standard error is the square root of the error variance
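
The formulas on this slide were not preserved. The sketch below follows the usual approach for plausible values as I understand it: the sampling variance is averaged over the M plausible values, the imputation variance is the variance of the M estimates, and the error variance is the sampling variance plus (1 + 1/M) times the imputation variance. The function name and the input numbers are hypothetical.

```python
import math
import statistics


def final_standard_error(pv_estimates, pv_sampling_variances):
    """Combine sampling and imputation variance over M plausible values."""
    m = len(pv_estimates)
    final_estimate = statistics.mean(pv_estimates)
    sampling_var = statistics.mean(pv_sampling_variances)
    # statistics.variance divides by M - 1, as required here.
    imputation_var = statistics.variance(pv_estimates)
    error_var = sampling_var + (1 + 1 / m) * imputation_var
    return final_estimate, math.sqrt(error_var)


# Hypothetical regression coefficients estimated with PV1 to PV5,
# each with its own BRR sampling variance.
estimate, se = final_standard_error(
    [38.0, 39.0, 38.5, 39.5, 39.0],
    [4.0, 4.2, 3.9, 4.1, 4.0],
)
print(round(estimate, 2), round(se, 2))
```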

SPSS REPLICATES ADD-IN

 Password WI-FI: Hawking09+

 mypisa.acer.edu.au
  Public data & analysis

  Software & manuals

 Download and install replicates add-in

 Start SPSS

 Copy CD to C:Kiel and unzip file

EXAMPLE IN SPSS

 C:KielINT_Stu06_SCHWGT.sav

 German data

 Regress science performance on
 Sex

 Immigration status

 ESCS

OVERVIEW MULTILEVEL MODELLING

EXAMPLE

 For Japan in 2006:
 Strong relationship between ESCS and science
performance (overall slope of 38.8)

 Large intra-class correlation in performance

 Small intra-class correlation in ESCS

 For this example, only nine Japanese schools are
selected

SINGLE LEVEL REGRESSION

Overall slope is 38.8

MULTILEVEL MODEL WITH
RANDOM SLOPES

MULTILEVEL MODEL WITH
FIXED SLOPES

Average slope is 7.2

INTERPRETATION REGRESSION
COEFFICIENTS

 Single level regression gives the overall relationship
between ESCS and performance in a country (38.8 in
Japan)
 Multi-level regression takes the 2-level structure of
the data into account and
 Estimates a unique slope within each school (or the
variance of the slopes) or

 Estimates the average slope within schools (7.2 in
Japan)

 Which type of analysis is more correct?
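
The gap between the overall slope and the average within-school slope can be reproduced with a toy dataset. This is a sketch under stated assumptions: the data are constructed without noise, with a between-school slope of 80 and a within-school slope of 7 (both numbers are made up), and the within-school slope is obtained by regressing on school-mean-centred data, which is a simple stand-in for the fixed-slope multilevel estimate rather than the multilevel estimator itself.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Nine hypothetical schools whose mean ESCS ranges from -1 to 1.
school_mean_escs = [-1.0 + 0.25 * j for j in range(9)]
within_deviations = [-1.0, -0.5, 0.0, 0.5, 1.0]

schools = []
for school_escs in school_mean_escs:
    xs = [school_escs + d for d in within_deviations]
    # School means differ strongly (80 per ESCS unit); within a school the
    # relationship is weak (7 per ESCS unit).
    ys = [500 + 80 * school_escs + 7 * d for d in within_deviations]
    schools.append((xs, ys))

# Single level regression: ignore schools and pool all students.
all_x = [x for xs, _ in schools for x in xs]
all_y = [y for _, ys in schools for y in ys]
overall_slope = ols_slope(all_x, all_y)

# Within-school slope: centre both variables on their school means first.
centred_x, centred_y = [], []
for xs, ys in schools:
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    centred_x += [x - mean_x for x in xs]
    centred_y += [y - mean_y for y in ys]
within_slope = ols_slope(centred_x, centred_y)

print(round(overall_slope, 1), round(within_slope, 1))
```

Both numbers describe the same data. The overall slope mixes differences between schools with differences within schools, while the within-school slope only uses the latter.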

NOTATION MLM

 Random intercept model

 Level 1: $Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + r_{ij}$
 Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$
 Random slopes and random intercept

 Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$
 Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$

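RANDOM INTERCEPT

 System of equations

 Level 1: $Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + r_{ij}$
 Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$
 Mixed-effects model
$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \beta_1 X_{ij} + u_{0j} + r_{ij}$

Fixed part: $\gamma_{00} + \gamma_{01} W_j + \beta_1 X_{ij}$
Random part: $u_{0j} + r_{ij}$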

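RANDOM INTERCEPT AND
RANDOM SLOPES

 System of equations

 Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$
 Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$
 Mixed-effects model
$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + u_{0j} + (\gamma_{10} + \gamma_{11} W_j + u_{1j}) X_{ij} + r_{ij}$
$= \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + u_{1j} X_{ij} + u_{0j} + r_{ij}$

Fixed part: $\gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij}$
Random part: $u_{1j} X_{ij} + u_{0j} + r_{ij}$
Cross-level interaction: $\gamma_{11} W_j X_{ij}$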

VARIANCE DECOMPOSITION

 In single level regression analysis, the overall
variance of the dependent variable is estimated,
along with the proportion of this variance that is
explained by the independent variables (R-squared)

 In multilevel analysis, the variance is
decomposed into between-cluster (school) and
within-cluster variance

 The independent variables can explain variance
at either level or at both levels

VARIANCES

 Total variance = within-cluster variance +
between-cluster variance
 Average within-cluster variance

$$\sigma_r^2 = \sum_{j=1}^{n^{(2)}} \sum_{i=1}^{n^{(1)}} \frac{(y_{ij} - \bar{y}_j)^2}{n^{(2)} (n^{(1)} - 1)}$$

 Between-cluster variance

$$\sigma_{u_{0j}}^2 = \sum_{j=1}^{n^{(2)}} \frac{(\bar{y}_j - \bar{y})^2}{n^{(2)}} - \frac{\sigma_r^2}{n^{(1)}}$$

INTRACLASS CORRELATION
AND EXPLAINED VARIANCE

 Null model: $y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$
 Intraclass correlation (rho) =
between-cluster variance / total variance

 Explained variance (R-squared) of a model with
predictors, where var(W) is the within-cluster variance,
var(B) the between-cluster variance, p the model with
predictors and n the null model:
 Level 1: 1 - (var(W)p / var(W)n)

 Level 2: 1 - (var(B)p / var(B)n)
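
The two definitions above translate directly into code. A minimal sketch, with hypothetical variance components (the function names and all numbers are made up for illustration):

```python
def intraclass_correlation(between_var, within_var):
    """Share of the total variance that lies between clusters (rho)."""
    return between_var / (between_var + within_var)


def explained_variance(var_null, var_with_predictors):
    """R-squared at one level: reduction relative to the null model."""
    return 1 - var_with_predictors / var_null


# Hypothetical variance components from a null model and a model with predictors.
null_between, null_within = 1500.0, 8500.0
pred_between, pred_within = 900.0, 8000.0

rho = intraclass_correlation(null_between, null_within)
r2_level1 = explained_variance(null_within, pred_within)
r2_level2 = explained_variance(null_between, pred_between)
print(round(rho, 3), round(r2_level1, 3), round(r2_level2, 3))
```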

THE STANDARD ERROR

 One assumption of OLS is independence of
observations
 In 2-stage sampling designs, observations within
clusters are often not independent
 MLM allows for correlated errors and therefore gives
unbiased SEs
 Generally, SEs estimated with OLS are too small
 However, BRR replicate weights are designed to deal
with the dependence of observations within schools,
so OLS with BRR gives correct standard errors!

WEIGHTING - 1

 Single level regression: final student weights and
BRR replicate weights

 How do we use PISA weights in MLM?

 Data analysis manual: normalise final student
weights and replication weights and run the
analysis in SPSS or SAS

 We now know this is not the best way

WEIGHTING - 2

 SPSS and SAS do not assume the weights to be
sampling weights (they are precision weights)

 SPSS and SAS can only weight at the student level

 MLM and BRR are both taking the multi-level
structure of the data into account, so this is done
twice in the PISA data analysis manual method

 However, there is no final consensus about the
right way to use weights in MLM

WEIGHTING - 3

 In PISA, school-level sampling is much more
informative than student-level sampling
(stratification is at school level; students often
have very similar weights within schools)
 Therefore, schools should be weighted by a
school-level weight
 Students should be weighted by a conditional
student weight (inverse of the probability of being
selected given that the student’s school is
sampled)

WEIGHTING - 4

 Options for conditional student level weights:
 Equal weights (weight=1)
 Raw conditional student weights
 Rescaled weights: Pfeffermann method 1 when
student sampling is not informative
 Rescaled weights: Pfeffermann method 2 when
student sampling is informative

 Differences are small when cluster sizes are
larger than 20 students

RAW CONDITIONAL STUDENT
WEIGHTS

 Raw conditional student weights:
$$w^{(1)}_{i|j} = \frac{\text{W\_FSTUWT}}{\text{W\_FSCHWT}}$$
 School weight is included in the school
questionnaire data file
 Not exactly correct, because some adjustments
are made independent of schools (e.g. non-
response adjustment)
 Often leads to an overestimation of the
between-school variance
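
A minimal sketch of the division described on this slide, assuming the school weight has already been merged onto each student record. W_FSTUWT, W_FSCHWT and SCHOOLID come from the slides; the output name COND_WT and the weight values are hypothetical.

```python
# Hypothetical student records after merging the school weight from the
# school questionnaire file onto the student file.
students = [
    {"SCHOOLID": 1, "W_FSTUWT": 120.0, "W_FSCHWT": 12.0},
    {"SCHOOLID": 1, "W_FSTUWT": 132.0, "W_FSCHWT": 12.0},
    {"SCHOOLID": 2, "W_FSTUWT": 45.0, "W_FSCHWT": 30.0},
]

# Raw conditional student weight: final student weight / school weight.
for student in students:
    student["COND_WT"] = student["W_FSTUWT"] / student["W_FSCHWT"]

print([student["COND_WT"] for student in students])
```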

PFEFFERMANN METHOD 1

 When student sampling is not informative at
level 1

 Conditional student weights are multiplied by the
sum of weights within cluster divided by the sum
of squared weights within cluster
$$\text{PFEFF1} = w^{(1)}_{i|j} \cdot \frac{\sum_{i=1}^{n^{(1)}_j} w^{(1)}_{i|j}}{\sum_{i=1}^{n^{(1)}_j} \left(w^{(1)}_{i|j}\right)^2}$$
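
The rescaling described above can be sketched for a single school as follows. The function name and the example weights are hypothetical; the input is the list of conditional student weights within one school.

```python
def pfeffermann_method_1(conditional_weights):
    """Scale weights by (sum of weights) / (sum of squared weights) in a school."""
    factor = sum(conditional_weights) / sum(w * w for w in conditional_weights)
    return [w * factor for w in conditional_weights]


# Equal weights within a school are rescaled to 1.
print(pfeffermann_method_1([2.0, 2.0, 2.0, 2.0]))

# Unequal weights: the rescaled weights sum to less than the number of students.
scaled = pfeffermann_method_1([1.0, 3.0])
print(scaled, sum(scaled))
```

With this scaling the weights in a school sum to (sum of weights) squared divided by the sum of squared weights, which is at most the number of sampled students.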

PFEFFERMANN METHOD 2

 When student sampling is informative

 Conditional student weights are divided by the
average conditional student weight in school j, or
$$\text{PFEFF2} = w^{(1)}_{i|j} \cdot \frac{n^{(1)}_j}{\sum_{i=1}^{n^{(1)}_j} w^{(1)}_{i|j}}$$

 This is the same as normalising full student
weights within schools
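
A minimal sketch of this rescaling for one school, with a hypothetical function name and made-up weights:

```python
def pfeffermann_method_2(conditional_weights):
    """Divide each weight by the average weight within the school."""
    mean_weight = sum(conditional_weights) / len(conditional_weights)
    return [w / mean_weight for w in conditional_weights]


scaled = pfeffermann_method_2([1.0, 3.0])
print(scaled, sum(scaled))
```

After this scaling the weights within a school sum to the number of sampled students in that school, so each school contributes according to its sample size.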

LET’S TRY IT OUT IN MLWIN

 Australia, because it oversamples Indigenous
students, who on average perform lower than
non-Indigenous students (positive correlation between
conditional student weights and performance)

 C:Kiel INT_Stu06_SCHWGT.sav

 I have added the full school weights
(W_FSCHWT) and the normalised school weights
(N_FSCHWT)

 N_FSCHWT= W_FSCHWT*SAMPSIZE/POPSIZE
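
The normalisation formula can be sketched as below. The slides do not define SAMPSIZE and POPSIZE, so this assumes SAMPSIZE is the number of sampled schools and POPSIZE is the sum of the full school weights. Under that assumption the normalised weights sum to the number of sampled schools. The function name and weights are hypothetical.

```python
def normalise_school_weights(w_fschwt):
    """N_FSCHWT = W_FSCHWT * SAMPSIZE / POPSIZE (definitions are assumptions)."""
    sampsize = len(w_fschwt)   # assumed: number of sampled schools
    popsize = sum(w_fschwt)    # assumed: sum of the full school weights
    return [w * sampsize / popsize for w in w_fschwt]


n_fschwt = normalise_school_weights([10.0, 30.0, 60.0])
print(n_fschwt, sum(n_fschwt))
```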

COMPARING CONDITIONAL
STUDENT WEIGHTS IN MLWIN - 1

                   Equal     Raw       Pfeff1    Pfeff2    Std MLwiN
Response           PV1SCIE   PV1SCIE   PV1SCIE   PV1SCIE   PV1SCIE

Fixed Part
CONS               521       523       522       522       520

Random Part
Level: SCHOOLID
CONS/CONS          1527      1300      1517      1508      1782
Level: STIDSTD
CONS/CONS          8605      29105     8172      8472      8404

-2*loglikelihood:  169452    170788    169614    169633    173562
DIC:
Units: SCHOOLID    356       356       356       356       356
Units: STIDSTD     14170     14170     14170     14170     14170

COMPARING CONDITIONAL
STUDENT WEIGHTS IN MLWIN - 2

 Equal weights (=1) and Pfeffermann methods 1
and 2 give similar results when using PISA data

 Pfeffermann method 2 is the most conservative:
recommended

 Raw weights over-estimate the within-school
variance (I think this is MLwiN specific, similar
problem with unscaled school weights)

WEIGHTS STANDARDISED BY
MLWIN

 MLwiN’s standardisation of the weights:
 At the school level, the full school weight is
normalised at country level

 The student level weight is the Pfeffermann 2
conditional student weight * the normalised
school weight * a factor to make the average
student weight equal to one

 Odd that the school weight is included at both
levels, but results are the same as in HLM

WHICH WEIGHTS ARE BETTER?

 In a simulation study the differences in results
were minimal, but the differences were large when
using real data from some countries

 We do not know which method is best

 Probably safest in MLwiN to use standardised
weights, because we do not know how the
weights are built into their algorithm

 Need to explore what other software packages
do (gllamm in Stata)

REFERENCES

 Rabe-Hesketh, S. & Skrondal, A. (2006). Multilevel
modelling of complex survey data. Journal of the
Royal Statistical Society, Series A, 169(4), 805-827

 Chantala, K., Blanchette, D. & Suchindran, C. M.
(2006). Software to compute sampling weights
for multilevel analysis.
http://www.cpc.unc.edu/restools/data_analysis/ml_sampling_weights/Compute%20Weights%20for%20Multilevel%20Analysis.pdf

PRACTISE MLM WITH PISA DATA

EXERCISE

 For MLwiN, data has to be sorted first by the
highest level ID variable, then by the second
highest, etc. (SCHOOLID in PISA)

 MLwiN needs a constant in the data (compute
CONS=1.) to estimate the intercept

 Start with data from Chile, where the intraclass
correlation in both science performance and
ESCS is high

 Start MLwiN…
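
The two data preparation steps on this slide (sorting by the ID variables and adding a constant) can be sketched as follows. SCHOOLID, STIDSTD, PV1SCIE and CONS are names used in the slides; the records themselves are made up. In the workshop this would be done in SPSS before importing the file into MLwiN; Python is used here only to make the logic explicit.

```python
# Hypothetical student records in arbitrary order.
records = [
    {"SCHOOLID": 3, "STIDSTD": 2, "PV1SCIE": 512.0},
    {"SCHOOLID": 1, "STIDSTD": 7, "PV1SCIE": 455.0},
    {"SCHOOLID": 3, "STIDSTD": 1, "PV1SCIE": 530.0},
    {"SCHOOLID": 1, "STIDSTD": 4, "PV1SCIE": 478.0},
]

# Sort by the highest level ID first (school), then by the student ID.
records.sort(key=lambda r: (r["SCHOOLID"], r["STIDSTD"]))

# Add the constant that is used to estimate the intercept.
for record in records:
    record["CONS"] = 1

print([(r["SCHOOLID"], r["STIDSTD"], r["CONS"]) for r in records])
```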

WARNINGS

 The definition of a school is not the same in each
country and not always that clear (e.g. campuses)
 Differences in educational systems between or
even within countries or cycles (e.g. tracking)
 Risk of "swimming" in the data and of models that are
too complicated to interpret if MLM is more data driven
than theory driven
 To interpret results carefully, you need to know
enough about the educational system in a
country or differences across countries

COMPARING MULTIPLE REGRESSION
AND MULTILEVEL ANALYSIS

COMPARISONS

OLS with BRR                                    MLM

 Fixed effects                                 Random effects and cross-level interactions
 Includes measurement error                    Difficult to include measurement error
 Takes stratification into account             I think it doesn’t take school stratification into account
 Output is SPSS data file for easy editing     Output is often in text format

OPTIONS FOR FINAL PART OF
THE WORKSHOP

 Try an MLM on data from your own country
 Try school and student level variables

 Try to add cross level interactions (free the
slopes)

 Discuss MLMs that you have tried in the past or
would like to do in the future

 Ask any PISA related data analysis questions
