Predicting the quality of a survey question from its design characteristics: SQP
1. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Predicting the quality of a survey question
from its design characteristics: SQP
Daniel Oberski
(joint work with Willem Saris)
Universitat Pompeu Fabra
Predicting the quality of a survey question from its design characteristics: SQP Daniel Oberski
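[Figure: sources of survey error (Groves et al. 2004). Measurement side: Construct → Measurement → Response → Edited data, with validity, measurement error, and processing error between successive steps. Representation side: Inferential population → Target population → Sampling frame → Sample → Respondents, with coverage error, sampling error, and nonresponse error between steps. Both sides lead to the survey statistic.]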
[Figure detail: the measurement side only. Construct → Measurement → Response → Edited data, with validity, measurement error, and processing error.]
• Assume the step from construct to measurement is already
acceptable
→ Assume that the question measures the intended construct:
the respondent knows the answer, can interpret the question,
...
→ The respondent's reaction to the question depends on some
unobserved value/opinion, which is in turn a measure of the
construct.
• We focus only on the degree to which the response is a
good measure of this unobserved score/opinion:
“measurement error”.
• (NOT the degree to which the question is interpretable,
measures some construct, etc.)
Reasons to study measurement error
• Reliability is an upper bound on validity; responses can
never measure underlying construct better than the single
indicator.
• Unreliability increases the variance of estimators:
• $\operatorname{var}(\hat{\mu}) = \kappa^{-1} \sigma^2 / n$, where $\kappa \in (0, 1)$ is reliability
• Unreliability reduces apparent strength of relationships
between variables:
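• $\rho_{xy} = \sqrt{\kappa_x \kappa_y} \cdot \rho_{XY}$, where $\rho_{XY}$ is the true correlation and $\rho_{xy}$
the observed correlation (with $\kappa$ the ratio of true-score to observed variance).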
• Correlated measurement errors will make variables look
more related than they really are; e.g. “How many minutes
does it take to...” questions correlate partly because they
are all asked in the same way.
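The variance and correlation formulas on this slide can be checked numerically. What follows is a minimal sketch, not the author's code. It assumes that κ is the ratio of true-score variance to observed variance, that σ² is the true-score variance, and that measurement errors are independent and normally distributed. Under that reading the attenuation factor is the square root of κx·κy; if κ instead denotes a reliability coefficient (the square root of that ratio), the factor is κx·κy. All function names are illustrative.

```python
import math
import random


def variance_of_mean(sigma2_true, kappa, n):
    """Variance of the sample mean given true-score variance and reliability."""
    # Observed variance is sigma2_true / kappa, so low reliability inflates it.
    return sigma2_true / (kappa * n)


def attenuated_correlation(rho_true, kappa_x, kappa_y):
    """Observed correlation implied by the true correlation and reliabilities."""
    return math.sqrt(kappa_x * kappa_y) * rho_true


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_observed_correlation(rho_true, kappa_x, kappa_y, n=20000, seed=0):
    """Simulate error-prone measures of two correlated unit-variance true scores."""
    rng = random.Random(seed)
    # Error variance (1 - kappa) / kappa makes true variance / observed variance = kappa.
    sd_ex = math.sqrt((1 - kappa_x) / kappa_x)
    sd_ey = math.sqrt((1 - kappa_y) / kappa_y)
    xs, ys = [], []
    for _ in range(n):
        tx = rng.gauss(0, 1)
        ty = rho_true * tx + math.sqrt(1 - rho_true ** 2) * rng.gauss(0, 1)
        xs.append(tx + rng.gauss(0, sd_ex))
        ys.append(ty + rng.gauss(0, sd_ey))
    return pearson(xs, ys)


if __name__ == "__main__":
    print(attenuated_correlation(0.6, 0.8, 0.5))
    print(simulate_observed_correlation(0.6, 0.8, 0.5))
```

The simulated correlation should land close to the value implied by the formula, and well below the true correlation of 0.6 used here.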
Public health ranking: Correction of regression coefficients for κ
[Figure: educational differentials in subjective health with 2 s.e. interval (axis from -0.4 to 0.0), by country, comparing the uncorrected regression coefficient with the measurement error-corrected coefficient. Countries, in the order extracted: GR, CZ, PT, SI, FI, HU, PL, SK, LU, ES, EE, DK, DE, TR, IS, NO, CH, BE, IE, FR, UA, AT, NL, SE. Values printed alongside the plot, in the order extracted: 0.82, 0.85, 0.78, 0.73, 0.56, 0.75, 0.71, 0.81, 0.86, 0.85, 0.95, 0.84, 0.91, 0.70, 0.81, 0.87, 0.81, 0.82, 0.92, 0.85, 0.91, 0.81, 0.93, 0.99.]
Design characteristics of questions
• Social Desirability
• Centrality
• Reference period
• Question formulation
• WH word used
• Use of gradation
• Balance of the request
• Encouragement
• Showcards present
• Showcards have pictures
• ...
• Emphasis on subjective opinion in request
• Information about the opinion of other people
• Use of stimulus or statement in the question
• Absolute or comparative judgment
• Response scale: basic choice
• Number of categories
• Labels full, partial, or no
• Labels full sentences
• Knowledge provided
• Survey mode
• ...
• Order of the labels
• Correspondence between labels and numbers of the scale
• Theoretical range of the scale
• Neutral category
• Number of fixed reference points
• Don’t know option
• Interviewer instruction
• Respondent instruction
• Extra motivation, info or definition available?
• Agree-disagree scale
• ...
(Saris & Gallhofer 2007)
Question design choices
• There are a great number of question design
characteristics for which it has at some point been found or
suggested that they influence the response;
• Any question in a questionnaire represents a series of
choices (conscious or not) on those characteristics: a
method of asking the question;
• It is clear that what is a good method depends strongly on
the topic, for example
• The frequency and importance of an event or series of
events asked about determine: reasonable reference
periods; reasonable categories - wide or deep;
approximately or exactly (Tourangeau et al. 2000).
• But are some methods generally better than others?
• If so, what about those methods makes them better?
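The idea that a "method" is a series of choices on design characteristics can be made concrete as a small record type. This is a minimal sketch under stated assumptions: the class `QuestionMethod`, its fields (a small hypothetical subset of the characteristics listed earlier), and the two example methods are illustrative and are not taken from the SQP coding scheme.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class QuestionMethod:
    """One way of asking a question: a choice on each design characteristic."""
    response_scale: str                  # e.g. "categories" or "numeric"
    number_of_categories: Optional[int]  # None when there are no categories
    labels: str                          # "full", "partial", or "no"
    showcard_present: bool
    dont_know_option: bool
    neutral_category: bool


def differing_characteristics(a, b):
    """Names of the design characteristics on which two methods differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Two hypothetical methods for the same question (illustrative values only).
method_1 = QuestionMethod("categories", 5, "full", True, True, True)
method_2 = QuestionMethod("categories", 11, "partial", True, True, True)

if __name__ == "__main__":
    print(differing_characteristics(method_1, method_2))
```

Representing methods this way makes it explicit which characteristics vary between two versions of a question, which is the comparison the two closing questions on this slide ask about.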
Talk outline
1 Question design
The influence of the method
Variation in influence of the method
2 Modeling measurement error
Definitions
Formal model and assumptions
3 Estimating measurement error
Design requirements
Estimation of the model
4 Predicting measurement error
Description of the data
Meta-analysis of the MTMM experiments
Program demonstration
The influence of the method
The method influences the answers
European Social Survey, 2002
Method A:
ENTER START TIME:
A1 TvTot
CARD 1 On an average weekday, how much time, in total, do you
spend watching television? Please use this card to answer.
00 No time at all (GO TO A3)
01 Less than ½ hour
02 ½ hour to 1 hour
03 More than 1 hour, up to 1½ hours
04 More than 1½ hours, up to 2 hours
05 More than 2 hours, up to 2½ hours
06 More than 2½ hours, up to 3 hours
07 More than 3 hours
88 (Don’t know)
(01 to 07: ASK A2)
A2 TvPol
STILL CARD 1 And again on an average weekday, how much of
your time watching television is spent watching news or
programmes about politics and current affairs? Still use
this card.
Method B:
On an average weekday, how much time, in total, do you spend watching
television?
WRITE IN HOURS: ___ AND MINUTES: ___
On an average weekday, how much time, in total, do you spend listening to
the radio?
WRITE IN HOURS: ___ AND MINUTES: ___
On an average weekday, how much time, in total, do you spend reading the
newspapers?
WRITE IN HOURS: ___ AND MINUTES: ___
Newspaper reading: method A versus method B
[Figure: two panels. Left: hours of newspaper reading, categorical scale (0; h<0.5; 0.5<=h<=1; 1<h<=1.5; 1.5<h<=2; 2<h<=2.5; 2.5<h<=3; h>3). Right: hours of newspaper reading, write in hrs and mins.]
TV watching: method A versus method B
[Figure: two panels over the same eight categories. Left: hours of TV watching, categorical scale. Right: hours of TV watching, write in hrs and mins, recoded.]
Radio listening: method A versus method B
[Figure: two panels over the same eight categories. Left: hours of radio listening, categorical scale. Right: hours of radio listening, write in hrs and mins, recoded.]
Newspaper reading: method A versus method B
[Figure: two panels over the same eight categories. Left: hours of newspaper reading, categorical scale. Right: hours of newspaper reading, write in hrs and mins, recoded.]
Variation in influence of the method
Do people answer methods differently?
• The numeric method clearly produces many outliers, as
well as very high values that may or may not be outliers.
• To the extent that this is due to confusion of hours and
minutes, version C may remedy that problem.
• Distributions of hours with methods A and B (recoded) are
similar but not the same:
• There are far fewer people who watch very little TV with
method B (9% versus 4% of 40,355 respondents),
• Numeric method B has more people who watch a lot of TV.
• Numeric method B has a spike at exactly 1 hour for radio
and newspaper.
• Overall it is clear the method has some influence on
average over all 40,355 respondents.
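Comparing the two methods requires recoding the write-in answers of method B into the categories of method A. A minimal sketch of such a recode follows, using the category boundaries shown on the figure axes (0; h<0.5; 0.5<=h<=1; and so on up to h>3). The function name and the integer codes 0 to 7 are assumptions for illustration, and the sketch does not attempt to detect answers in which hours and minutes were confused.

```python
CATEGORY_LABELS = [
    "0", "h<0.5", "0.5<=h<=1", "1<h<=1.5",
    "1.5<h<=2", "2<h<=2.5", "2.5<h<=3", "h>3",
]


def recode_to_category(hours, minutes):
    """Map a write-in answer (hours, minutes) to a category code from 0 to 7."""
    total = 60 * hours + minutes  # work in whole minutes to avoid float issues
    if total < 0:
        raise ValueError("time spent cannot be negative")
    if total == 0:
        return 0
    if total < 30:
        return 1
    # Remaining categories are closed on the right: up to 60, 90, ..., 180 minutes.
    for code, upper in enumerate((60, 90, 120, 150, 180), start=2):
        if total <= upper:
            return code
    return 7


if __name__ == "__main__":
    for answer in [(0, 0), (0, 20), (1, 0), (1, 1), (3, 1)]:
        print(answer, CATEGORY_LABELS[recode_to_category(*answer)])
```

Because the boundaries are asymmetric (exactly 1 hour falls in "0.5<=h<=1", not in the next category), round-number answers such as the spike at exactly 1 hour end up at the top of a category.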
Variation in influence of the method
Is the difference between methods the same for all
respondents?
The same people were asked both versions. This allows us to
show variation in answers to the numeric question, within
categories of the categorical question.
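Since each respondent answered both versions, the numeric answers can be grouped by the category the same respondent chose. The sketch below summarizes that within-category variation; the function name and the example pairs are hypothetical and are not the survey data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def spread_within_categories(pairs):
    """Summarize numeric answers within each categorical answer.

    pairs: iterable of (category_code, numeric_hours), one pair per respondent.
    Returns {category_code: (count, mean, standard deviation)}.
    """
    groups = defaultdict(list)
    for category, numeric_hours in pairs:
        groups[category].append(numeric_hours)
    return {
        category: (len(values), mean(values), pstdev(values))
        for category, values in sorted(groups.items())
    }


if __name__ == "__main__":
    # Hypothetical respondents: (category chosen, hours written in).
    example = [(1, 0.25), (1, 0.75), (2, 1.0), (2, 1.0), (2, 2.0)]
    for category, summary in spread_within_categories(example).items():
        print(category, summary)
```

A standard deviation well above zero within a category indicates that respondents who gave the same categorical answer gave different numeric answers, which is the variation this slide refers to.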
[Figure: seven density panels, one per method A category (No time at all; Less than 0,5 hour; 0,5 hour to 1 hour; More than 1 hour, up to 1,5 hours; More than 1,5 hours, up to 2 hours; More than 2 hours, up to 2,5 hours; More than 2,5 hours, up to 3 hours). Horizontal axis: numeric value given (0 to 4); vertical axis: density (0.0 to 1.0).]
Variation in influence of the method
Do people answer methods differently?
• Not only does the method influence the distribution of
answers,
• the method effect also depends on the person.
Definitions
Traits, Methods, and Persons
• Can imagine the same question (“Trait”) being asked in
different ways (“Methods”);
• Can imagine the same method being used to ask different
questions;
• The responses to a survey question are then different persons’
answers to Trait-Method combinations.
Predicting the quality of a survey question from its design characteristics: SQP Daniel Oberski
50. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Definitions
Measurement error model
1 Responses are a measure of some underlying score
(“trait”) so that if a person’s memory were erased and the
person re-interviewed, they should give a similar answer.
2 Responses are influenced by random variation: errors,
such as mistaking minutes for hours, but also variation in
information retrieved from memory.
Predicting the quality of a survey question from its design characteristics: SQP Daniel Oberski
51. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Definitions
Measurement error model
1 Responses are a measure of some underlying score
(“trait”) so that if a person’s memory were erased and the
person re-interviewed, they should give a similar answer.
2 Responses are influenced by random variation: errors,
such as mistaking minutes for hours, but also variation in
information retrieved from memory.
3 The method influences the answers on average, e.g. there
might be more social desirability bias in one method than
another, the scale may suggest some unspoken norm, etc.
Predicting the quality of a survey question from its design characteristics: SQP Daniel Oberski
52. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Definitions
Measurement error model
1 Responses are a measure of some underlying score
(“trait”) so that if a person’s memory were erased and the
person re-interviewed, they should give a similar answer.
2 Responses are influenced by random variation: errors,
such as mistaking minutes for hours, but also variation in
information retrieved from memory.
3 The method influences the answers on average, e.g. there
might be more social desirability bias in one method than
another, the scale may suggest some unspoken norm, etc.
4 Influence of method is different for different people:
random variation in the differences between methods.
Definitions
Modeling measurement error
Definitions
Quasi-equation
Response =
  Trait + Trait × Person
    Responses are a measure of some underlying score (“trait”), so that if a person’s memory were erased and the person re-interviewed, they should give a similar answer.
  + Person × Moment
    Responses are influenced by random variation: errors, such as mistaking minutes for hours, but also variation in information retrieved from memory.
  + Method + Method × Trait
    The method influences the answers on average, e.g. there might be more social desirability bias in one method than another, the scale may suggest some unspoken norm, etc.
  + Method × Person
    Influence of method is different for different people: random variation in the differences between methods.
Definitions
Quasi-equation
Response = Trait + Method + Trait × Method+
Trait × Person + Method × Person+
Person × Moment
Definitions
Interpretation of the model
If persons are a random sample from a population U, consider
Person a random factor.
1 The residual (“rest”) variance is called “random measurement error”.
2 The proportion of residual variance in the total variance is called
“unreliability” (1 − r²).
3 The proportion of Method × Person variance in the total variance is
called “common method variance” (sometimes “invalidity”),
(1 − v²).
4 The proportion of Trait × Person variance in the total variance is called
the “quality” of the question (q² or κ).
5 “Quality” (q² or κ) equals v² · r².
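The last relation can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `question_quality` is hypothetical, and the input values are the A/D(4), Q1 means reported in Table 5 of the ESS example later in these slides.

```python
def question_quality(r2, v2):
    """Return the quality q^2 = r^2 * v^2 of a question.

    r2 and v2 are the reliability and validity, both proportions in [0, 1].
    """
    for name, value in (("r2", r2), ("v2", v2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    return r2 * v2


# Mean reliability and validity of the A/D(4) scale for question Q1 (Table 5).
r2, v2 = 0.65, 0.99
q2 = question_quality(r2, v2)

print(f"quality q^2           = {q2:.2f}")      # 0.64, as in Table 5
print(f"unreliability 1 - r^2 = {1 - r2:.2f}")  # 0.35
```

Because Table 5 reports means over countries, the product of two reported means need not reproduce the reported mean quality exactly for every row.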
Formal model and assumptions
Model
Response = Trait + Method + Trait × Method+
Trait × Person + Method × Person+
Person × Moment
$Y_{ijk} = \tau_{ijk} + \eta_{ij} + \xi_{ik} + \epsilon_{ijk}$,
where
i Indexes persons;
j Indexes traits;
k Indexes methods.
Formal model and assumptions
Equation including the interaction of Trait × Method with Trait × Person
$Y_{ijk} = \tau_{ijk} + \lambda_{jk} \eta_{ij} + \xi_{ik} + \epsilon_{ijk}$,
where
i Indexes persons;
j Indexes traits;
k Indexes methods.
Formal model and assumptions
Assumptions in the model
1 The (interaction) effects do not depend on other
Method×Trait combinations a person might receive;
(“no carry-over effects”, “SUTVA”, “independence
assumption”)
2 There is no separate Person main effect: Trait and Method
within Person already capture all within-person correlation
(“method variance is the only systematic
variance”, $\mathrm{Cov}_U(\epsilon_{ijk}, \xi_{ik}) = 0$ and
$\mathrm{Cov}_U(\epsilon_{ijk}, \eta_{ij}) = 0$)
Assumption 2 can sometimes be relaxed (Oberski et al., in Salzborn, Davidov
& Reinecke (eds.), 2012)
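Under these assumptions the variance of a response over persons splits into separate trait, method and residual parts, which is what makes the proportions of variance well defined. The following is a sketch for the version of the model with the coefficient λ; it additionally assumes that trait and method factors are uncorrelated, Cov_U(η_ij, ξ_ik) = 0, which is not stated explicitly on the slide. The fixed part τ contributes no variance over persons.

```latex
\begin{align*}
\mathrm{Var}_U(Y_{ijk})
  &= \lambda_{jk}^2\,\mathrm{Var}_U(\eta_{ij})
   + \mathrm{Var}_U(\xi_{ik})
   + \mathrm{Var}_U(\epsilon_{ijk}), \\
q^2_{jk}
  &= \frac{\lambda_{jk}^2\,\mathrm{Var}_U(\eta_{ij})}{\mathrm{Var}_U(Y_{ijk})}.
\end{align*}
```

The second line is the Trait × Person share of the total variance, that is, the “quality” of the question asking trait j with method k.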
Formal model and assumptions
The parameters of interest in the model are
• The variance over persons in the Trait effect;
• The variance over persons in the Method effect.
Expressed as proportions of the total variance over persons of
$Y_{jk}$, these two quantities equal, respectively,
• The reliability $\kappa_{jk}$ of a question asking Trait j with Method k;
• The correlation between two different questions that is
purely due to them being measured with the same method.
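A small simulation can make these two proportions concrete. This is a minimal sketch under stated assumptions, not the authors' estimation procedure: the function names and the variance values (0.6, 0.1, 0.3) are invented for illustration, the fixed effects τ are set to zero, there is no λ coefficient, and the two traits are drawn independently so that any correlation between questions on different traits comes from the shared method only.

```python
import random
import statistics


def simulate_mtmm(n_persons, var_trait, var_method, var_error, seed=0):
    """Simulate Y_ijk = eta_ij + xi_ik + eps_ijk for 2 traits x 2 methods."""
    rng = random.Random(seed)
    data = {(j, k): [] for j in (1, 2) for k in (1, 2)}
    for _ in range(n_persons):
        # One trait score per trait and one method effect per method, per person.
        eta = {j: rng.gauss(0.0, var_trait ** 0.5) for j in (1, 2)}
        xi = {k: rng.gauss(0.0, var_method ** 0.5) for k in (1, 2)}
        for (j, k), column in data.items():
            eps = rng.gauss(0.0, var_error ** 0.5)  # fresh error per question
            column.append(eta[j] + xi[k] + eps)
    return data


def correlation(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Total variance is 0.6 + 0.1 + 0.3 = 1.0, so the shares are 0.6 and 0.1.
data = simulate_mtmm(20000, var_trait=0.6, var_method=0.1, var_error=0.3)
same_trait = correlation(data[(1, 1)], data[(1, 2)])   # trait 1, methods 1 and 2
same_method = correlation(data[(1, 1)], data[(2, 1)])  # traits 1 and 2, method 1

print(f"same trait, different methods: {same_trait:.2f}")
print(f"different traits, same method: {same_method:.2f}")
```

In this setup the first correlation should come out close to the trait share of the variance and the second close to the method share, which is the correlation “purely due to” the common method.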
Estimation of measurement error with the MTMM design
Design requirements
What design is needed to estimate this model?
Response = Trait + Method + Trait × Method+
Trait × Person + Method × Person+
Person × Moment
$Y_{ijk} = \tau_{ijk} + \eta_{ij} + \xi_{ik} + \epsilon_{ijk}$,
i indexes persons; j indexes traits; k indexes methods.
• The model suggests that a Person×Method×Trait factorial
experiment would allow for the estimation of the reliability
and method variance.
• The residual (“measurement error”) term Person × Moment is
estimated by the Person × Trait × Method interaction.
Design requirements
What design is needed to estimate this model?
• A Person×Method×Trait factorial experiment would ask
each question (Trait) in several different ways (Methods),
so that each person answers the same questions with
different methods;
• Campbell and Fiske introduced such designs in 1959
under the name “Multitrait-multimethod” (MTMM)
experiment.
• Not all Trait-Method combinations are necessary, but at
least one repetition within each person is required (Saris,
Satorra & Coenders, 2004).
• Under the model and assumptions 1 and 2, the MTMM
design will provide data that allow for the estimation of the
reliability and method variance (“invalidity”).
Design requirements
Example of an MTMM experiment
On an average weekday, how much time, in total...
T = 1 ...do you spend watching television?
T = 2 ...do you spend listening to the radio?
T = 3 ...do you spend reading the newspapers?
Scales:
M = 1: 8-point scale (hours)
M = 2: Write in hours and minutes
M = 3: 7-point scale with vague quantifiers
Each respondent answered all three questions in two different
ways.
The repetition was given at the end of the interview (after
approximately 50 minutes had passed).
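The structure of this experiment can be written down as a small enumeration. The sketch below is illustrative only: the names `TRAITS`, `METHODS` and `questions_for` are hypothetical, and which pairs of methods were actually assigned to which respondents is not stated on the slide.

```python
from itertools import combinations, product

TRAITS = {
    1: "watching television",
    2: "listening to the radio",
    3: "reading the newspapers",
}
METHODS = {
    1: "8-point scale (hours)",
    2: "write in hours and minutes",
    3: "7-point scale with vague quantifiers",
}

# All trait-method combinations of the full 3 x 3 design.
all_cells = list(product(TRAITS, METHODS))

# Each respondent sees two of the three methods; these are the possible pairs.
method_pairs = list(combinations(METHODS, 2))


def questions_for(method_pair):
    """Return the (trait, method) questions one respondent answers."""
    return [(trait, method) for method in method_pair for trait in TRAITS]


for pair in method_pairs:
    print(pair, "->", questions_for(pair))
```

Each respondent therefore answers six of the nine trait-method combinations, which is consistent with the earlier remark that not all combinations are needed as long as there is at least one repetition within each person.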
Estimation of the model
Estimation issues
$Y_{ijk} = \tau_{ijk} + \lambda_{jk} \eta_{ij} + \xi_{ik} + \epsilon_{ijk}$.
• The model can be estimated with regression (with Person
a random factor);
• but this is not flexible enough: it gives little control over the
covariance structure, and $\lambda_{jk}$ cannot be included.
• The model can also be recognized as a factor analysis or,
more generally, as a structural equation model (SEM),
• or, through transformation, as an IRT or latent class model.
• The SEM framework allows enough flexibility to estimate
the parameters of interest: trait, method and residual
variance, or r², v², and quality q².
Estimation of the model
The model as a SEM (or IRT or latent class) model
[Figure: path diagram with method factors M1, M2, M3, trait factors T1, T2, T3, and observed variables y11, y21, y31, y12, y22, y32, y13, y23, y33.]
Estimation of the model
Another example
Table 4: Experiment 2 of round 2

Main questionnaire (“A/D”)
  Introduction: Using this card, please tell me how true each of the following statements is about your current job.
  Statements:
    - There is a lot of variety in my work
    - My job is secure
    - My health or safety is at risk because of my work
  Answer categories: not at all true; a little true; quite true; very true

SC group 1 (IS)
  Introduction: The next 3 questions are about your current job.
  Statements:
    - Please choose one of the following to describe how varied your work is.
    - Please choose one of the following to describe how secure your job is.
    - Please choose one of the following to say how much, if at all, your work puts your health and safety at risk.
  Answer categories: not at all varied; a little varied; quite varied; very varied (same type of response scale using terms secure and safe instead of varied)

SC group 2 (IS)
  Statements:
    - Please indicate, on a scale of 0 to 10, how varied your work is, where 0 is not at all varied and 10 is very varied.
    - Now please indicate, on a scale of 0 to 10, how secure your job is, where 0 is not at all secure and 10 is very secure.
    - Please indicate, on a scale of 0 to 10, how much your health and safety is at risk from your work, where 0 is not at all at risk and 10 is very much at risk.
  Answer categories: horizontal 11-point scale only labelled at the end points

Source: Révilla, Saris & Krosnick (2010), “Comparing questions with agree/disagree response options to questions with item-specific response options”, p. 69
Estimation of the model
Results from another example
Table 5: The mean reliability, validity and quality of the three questions of experiment 2 in Round 2 of the ESS across 10 countries for the different methods (standard deviations in brackets)

Method    Reliability r²          Validity v²             Quality q²
          Q1     Q2     Q3        Q1     Q2     Q3        Q1     Q2     Q3
A/D(4)    .65    .59    .61       .99    .98    .99       .64    .58    .60
          (.09)  (.18)  (.15)     (.02)  (.03)  (.03)     (.10)  (.18)  (.15)
IS(4)     .80    .80    .80       1      1      1         .80    .80    .80
          (.14)  (.13)  (.14)     (0)    (0)    (0)       (.14)  (.13)  (.14)
IS(11)    .81    .83    .77       .98    .98    .98       .80    .82    .76
          (.09)  (.11)  (.12)     (.03)  (.03)  (.04)     (.10)  (.12)  (.14)

[...] using a truth scale with the same number of categories for all three questions (around .7 to .9 versus .5 to .6). The position of the IS scale in the supplementary questionnaire is not an issue as the better quality of the IS scale is also observed both when it comes first and when it comes later.
Possibly the order of the observations with the different scale types has an impact on the size of the differences since we see fewer differences in this second experiment than in the first, but this may also be linked to the subject matter of the experiments or to other characteristics of the methods used (such as the number of points). More research is needed to determine this, however the important point here is that in different combinations, the superiority of the IS in terms of [...]
[...] scale with 11 categories was also better than the IS scale with 4 categories. So, not only might the kind of scale (IS versus A/D) impact the total quality of a measure, but so might the length of the scale (number of response categories). However, it seems that this effect varies across countries.
Experiments in Round 3 of the ESS
In round 3 of the ESS again two SB-MTMM experiments have been done which allow the comparison of the IS scales with A/D scales. The attraction of these experiments is that [...]
Estimation of the model
Results from another example
• It looks like there is much more measurement error
(residual variance) in the agree-disagree questions than
there is in the item-specific scales.
• This was true over all countries (shown is the average over
countries).
• It remains to be seen whether the same would be found with
other topics and under other conditions, and with other
combinations of methods.
Are some types of questions better than others?
Description of the data
• The examples given so far come from a much larger series
of MTMM experiments;
• In the European Social Survey (ESS), about six MTMM
experiments are done in every round;
• So far there have been five rounds (2002, 2004, 2006, 2008, and 2010).
• The experiments are done in 20-30 European countries
every two years;
• Effective sample size per country is at least 1500.
• Each experiment usually estimates the quality for 9
questions (Method-Trait combinations).
• Range of topics is reasonably diverse, though factual
questions are underrepresented.
• In total about 5000 questions are available, but only 3000 of
those will be used here for various reasons.
Description of the data
• In addition to the ESS, an older series of experiments also
exists (F. Andrews; Költringer; Saris; Billiet, 1990s).
• These add another 1089 questions for which reliability and
validity coefficients are estimated.
• Combining the two datasets (ESS question qualities and
old experiment qualities), we created a database of 3011
questions with their reliability and validity estimates.
Description of the data
Reliability and validity estimates of 3011 questions
[Figure: two histograms, “Reliability coefficient” (x-axis from 0.4 to 1.0) and “Validity coefficient” (x-axis from 0.2 to 1.0); y-axis: frequency.]
Description of the data
Logit transform of Reliability and validity estimates
[Figure: two histograms, “Reliability coefficient, logit” and “Validity coefficient, logit” (x-axes from 0 to 6); y-axis: frequency.]
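The logit transform maps a coefficient between 0 and 1 onto the whole real line. A minimal sketch follows, with hypothetical function names `logit` and `inv_logit`; the slides do not say how coefficients equal to exactly 1 (for which the logit is undefined) were handled, so this sketch simply rejects them.

```python
import math


def logit(p):
    """Map a proportion strictly between 0 and 1 to log(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError(f"logit is undefined for p = {p}")
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a real number back to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


for coefficient in (0.5, 0.8, 0.9, 0.99):
    transformed = logit(coefficient)
    print(f"{coefficient:.2f} -> {transformed:.2f} -> {inv_logit(transformed):.2f}")
```

On the logit scale, differences among coefficients close to 1 are stretched out, and predictions transformed back with `inv_logit` always fall between 0 and 1.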
Description of the data
Coding design characteristics of the 3011 questions
• For each of the 3011 questions in all countries, a team of
coders coded 40 design characteristics of the question;
• Some codes were automatically generated by Natural
Language Processing software (syllables, words, etc).
• Coders were students, assistants to the local coordinators
of the ESS, and two experts;
• For English source version, experts double-coded
questions independently, then created consensus codes;
• Non-expert codes were quality-controlled by detailed
comparison with consensus codes for the English source;
Predicting the quality of a survey question from its design characteristics: SQP Daniel Oberski
103. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Description of the data
Coding design characteristics of the 3011 questions
• For each of the 3011 questions in all countries, a team of
coders coded 40 design characteristics of the question;
• Some codes were automatically generated by Natural
Language Processing software (syllables, words, etc.).
• Coders were students, assistants to the local coordinators
of the ESS, and two experts;
• For the English source version, experts double-coded
questions independently, then created consensus codes;
• Non-expert codes were quality-controlled by detailed
comparison with consensus codes for the English source;
• In a meeting between the experts and each other coder,
the discrepancies were discussed and either corrected or
left in as true differences.
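The comparison of a coder's codes with the consensus codes can be sketched as follows. The data layout (a dictionary mapping characteristic names to codes for one question), the function name `find_discrepancies`, and the example values are all hypothetical; the slides do not describe the tooling that was actually used.

```python
def find_discrepancies(coder_codes, consensus_codes):
    """Return {characteristic: (coder_value, consensus_value)} where the two differ.

    Both arguments are assumed to map characteristic names to codes for a
    single question. A characteristic missing on one side is reported as None.
    """
    diffs = {}
    for name in sorted(set(coder_codes) | set(consensus_codes)):
        coder_value = coder_codes.get(name)
        consensus_value = consensus_codes.get(name)
        if coder_value != consensus_value:
            diffs[name] = (coder_value, consensus_value)
    return diffs

# Made-up codes for one question, for illustration only.
consensus = {"ncategories": 11, "dont know": 1, "usedshowcard": 1}
coder = {"ncategories": 11, "dont know": 0, "usedshowcard": 1}
print(find_discrepancies(coder, consensus))
```

Each flagged characteristic would then be a candidate for the discussion step: it is either corrected or kept as a true difference between the translated question and the English source.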
105. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Description of the data
• absolute
• avgabs intro
• avgabs total
• avgsy total
• avgwrd intro
• avgwrd total
• balance
• centrality
• computer.assisted
• concept
• country
• domain
• dont know
• encourage
• fixrefpoints
• form basic
• future
• labels
• instr interv
• instr respon
• interviewer
• intr request
• intropresent
• knowledge
• labels gramm
• labels order
• language
• motivation
• opinionother
• past
• position
• questiontype
• scal neutral
• scale basic
• scale corres
• scale trange
• scale urange
• showc boxes
• showc horiz
• showc letter
• showc over
• showc quest
• showc start
• socdesir
• stimulus
• subjectiveop
• symmetry
• used WH word
• usedshowcard
• visual
• from
106. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Description of the data
Domain of question        # questions
International politics             64
Health                            190
Living conditions                 453
Other beliefs                     292
Work                              469
Personal relations                320
Consumer behavior                  34
Leisure activities                131
National government               141
Institutions                      284
Political parties                  30
Trade unions                       12
Economy                           237
Other                             354
107. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Description of the data
Concept of question       # questions
Evaluative belief                 713
Feeling                           903
Importance                         96
Expectation                        39
Facts, behavior                    63
Judgement                         123
Relationship                        8
Evaluation                        704
Norm                               57
Policy                            250
Right                               4
Action tendency                    51
111. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Meta-analysis of the MTMM experiments
Meta-analysis dataset
• For each of the 3011 questions, we have in the database:
  • The estimated quality (reliability and validity coefficients)
  • About 50 design characteristics (through hand- and
    automatic coding)
• The next step was to relate the design characteristics to
the quality estimates:
  • Can the quality estimates be predicted from the design
    characteristics?
115. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Meta-analysis of the MTMM experiments
Meta-analysis
• Prediction by random forests of regression trees (Breiman
2001);
• Two separate models: one for validity and one for reliability
coefficients;
• Missing data are multiply imputed using the MICE
algorithm (van Buuren & Groothuis-Oudshoorn 2011).
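To illustrate the idea behind a random forest of regression trees, here is a deliberately simplified sketch: each "tree" is a single split (a stump) fitted to a bootstrap sample, with `mtry` randomly chosen candidate features, and the forest prediction is the average over trees. This is not the implementation used for SQP and not Breiman's full algorithm, which grows deep trees and resamples candidate features at every split. All function names, parameters, and the toy data are assumptions, and multiple imputation is not shown. In the setup described on this slide, one such model would be fitted for each outcome (validity and reliability).

```python
import random
from statistics import mean

def fit_stump(X, y, features):
    """Fit a one-split regression tree, choosing the best cut by squared error."""
    best = None  # (sse, feature, cut, left_mean, right_mean)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[f] < cut]
            right = [t for row, t in zip(X, y) if row[f] >= cut]
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, cut, ml, mr)
    if best is None:  # all candidate features constant: predict the sample mean
        m = mean(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    feature, cut, left_mean, right_mean = stump
    return left_mean if row[feature] < cut else right_mean

def fit_forest(X, y, n_trees=50, mtry=1, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and mtry random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(p), mtry)          # random candidate features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Average the predictions of all trees in the forest."""
    return mean(predict_stump(stump, row) for stump in forest)

# Toy data (made up): two numeric characteristics and one quality outcome.
X = [[i / 20.0, (i * 7 % 20) / 20.0] for i in range(20)]
y = [2.0 if row[0] >= 0.5 else 0.0 for row in X]
forest = fit_forest(X, y, n_trees=25, mtry=2, seed=1)
print(predict(forest, [0.9, 0.1]), predict(forest, [0.1, 0.9]))
```

Averaging over many trees fitted to resampled data is what stabilises the prediction compared with a single tree; for real analyses an established random forest library would be the natural choice.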
117. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Meta-analysis of the MTMM experiments
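Example regression tree for logit(reliability coefficient)
[Figure: regression tree. Each node shows the mean logit(reliability coefficient) and the number of questions n. Splitting variables in the plot: domain, gradation, position, concept, ncategories. Node listing below; indentation shows the parent-child structure implied by the node sizes.]
1.955 (n=1988)
  domain=3,4,7,11,13,14,112: 1.724 (n=1303)
    domain=3: 0.9636 (n=108)
      0.4959 (n=36)
      1.198 (n=72)
    domain=4,7,11,13,14,112: 1.793 (n=1195)
      1.642 (n=722)
      2.023 (n=473)
        1.544 (n=108)
          1.28 (n=76)
          2.17 (n=32)
        2.165 (n=365)
          1.97 (n=217)
          2.45 (n=148)
  domain=6,101,103,120: 2.394 (n=685)
    1.489 (n=138)
    2.622 (n=547)
      2.384 (n=233)
      2.799 (n=314)
        2.681 (n=260)
        3.364 (n=54)
Other split labels in the plot (left / right branch): gradation>=0.5 / gradation<0.5; position<339.5 / position>=339.5; position>=410 / position<410; concept=1,2 / concept=73,75,76; position<404.5 / position>=404.5; concept=1,73,78 / concept=2,76; position<322.5 / position>=322.5; ncategories>=4.5 / ncategories<4.5.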
119. Introduction Question design Modeling measurement error Estimating measurement error Predicting measurement error Concl
Meta-analysis of the MTMM experiments
Meta-analysis with random forests
• R² based on the out-of-bag (cross-validation) mean squared
error is 85% for the validity coefficient and 60% for the reliability
coefficient.
• Importance measures indicate that domain, number of
categories, concept, position in the questionnaire, number
of syllables, country, number of words, fixed reference
points, and other linguistic complexity measures are the
most influential for reliability.
• For validity, in addition to the above, order of the labels
(positive-negative), centrality of the trait and other
characteristics are also important.
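As a hedged illustration of the first bullet, an R² based on out-of-bag error is commonly computed as one minus the out-of-bag mean squared error divided by the variance of the observed outcome. The slides do not give the exact formula used, so the definition and the function name `oob_r_squared` below are assumptions. An out-of-bag prediction for a question is taken to be the average prediction of only those trees whose bootstrap sample did not contain that question.

```python
def oob_r_squared(y, oob_pred):
    """Return 1 - MSE_oob / Var(y).

    y        : observed outcomes (e.g. logit coefficients)
    oob_pred : out-of-bag predictions, with None where a case was never
               out of bag; such cases are skipped (an assumed convention)
    """
    pairs = [(t, p) for t, p in zip(y, oob_pred) if p is not None]
    n = len(pairs)
    mean_y = sum(t for t, _ in pairs) / n
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    var = sum((t - mean_y) ** 2 for t, _ in pairs) / n
    return 1.0 - mse / var
```

With this definition, predictions no better than the overall mean give a value of 0, and perfect out-of-bag predictions give 1; a reported 85% corresponds to a value of 0.85.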