The document summarizes a study that tested the measurement equivalence of European-language versions of the Business-focused Inventory of Personality (BIP). The study found:
1) All BIP scales demonstrated the same basic factor structure across language versions, showing structural equivalence.
2) Most scales showed similar relationships between items and constructs being measured, demonstrating metric equivalence.
3) No scales showed complete comparability of scores across versions, lacking scalar equivalence.
4) Despite some differences, the study demonstrated that personality scores can still be meaningfully compared across the different language versions of the BIP through the identification of fully invariant items and the use of techniques to handle culturally specific items.
ITC Measurement equivalence
1. Measurement equivalence of
Business-focused Inventory of
Personality:
A comparison of European language
versions
Tao Li
Hogrefe Ltd. UK
The 7th Conference of the International Test Commission
Hong Kong July 2010
2. To set the scene
• An international organisation wishes to use a
personality test to select managers globally for
expatriate assignments
– Does the test measure the same traits for the
candidates?
– Are the scores comparable across countries?
3. Measurement equivalence
• The relative comparability of the wording, scaling,
and scoring of constructs across groups
– A prerequisite for valid group comparison
– Implicitly assumed but RARELY examined
4. Levels of Measurement Equivalence
• Structural
– Constructs have the same basic factor structure across groups
– The constructs have similar meaning
• Metric
– The strength of the relationships between items and constructs being measured is equivalent
– The constructs have the same meaning
• Scalar
– Measure the constructs on the same scale
– The groups use the response scale in a similar way
– Complete comparability of scores
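The three levels can be read as nested constraints on a multi-group factor model. The notation below is a standard textbook formulation, not taken from the slides: for group g, x is the vector of item responses, τ the item intercepts, Λ the factor loadings, ξ the latent construct, and δ the residuals.

```latex
\begin{align*}
x^{(g)} &= \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)} \\
\text{Structural:}\quad & \Lambda^{(g)} \text{ has the same pattern of zero and non-zero loadings in every group} \\
\text{Metric:}\quad & \Lambda^{(1)} = \Lambda^{(2)} = \dots = \Lambda^{(G)} \\
\text{Scalar:}\quad & \Lambda^{(1)} = \dots = \Lambda^{(G)} \ \text{and}\ \tau^{(1)} = \dots = \tau^{(G)}
\end{align*}
```

Each level adds constraints to the one before it, so scalar equivalence can only hold for a scale that already shows metric equivalence.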
5. Partial Invariance
• Full invariance: ideal but often impossible
• Partial invariance: some, but not all, of the items are
equivalent across groups
9. Results
• Fit indices:
– Comparative Fit Index (CFI): >0.95 good fit
– The root mean square error of approximation (RMSEA): <0.08 good fit
• Structural equivalence: all scales
– min CFI=0.956; max RMSEA=0.070
• Metric equivalence: 11/14 scales
– min CFI=0.950; max RMSEA=0.060
• Scalar equivalence: none
• Partial equivalence: all scales
– min CFI=0.950; max RMSEA=0.055
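For readers unfamiliar with the two fit indices, the following is a minimal sketch of how they are commonly computed from chi-square statistics. The slides do not show the formulas or the software used, so the function names, the single-group RMSEA form (with N - 1 in the denominator), and the input numbers are all assumptions for illustration. Only the two cutoffs come from the slide.

```python
import math


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index from model and baseline (null-model) chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    # The denominator is never smaller than the numerator, so CFI stays in [0, 1].
    d_base = max(chi2_baseline - df_baseline, d_model)
    if d_base == 0.0:
        return 1.0
    return 1.0 - d_model / d_base


def rmsea(chi2_model, df_model, n):
    """RMSEA, single-group form, for a sample of size n (assumed convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


def good_fit(cfi_value, rmsea_value):
    """Apply the cutoffs quoted on the slide: CFI > 0.95 and RMSEA < 0.08."""
    return cfi_value > 0.95 and rmsea_value < 0.08


# Hypothetical chi-square values, not results from the study.
example_cfi = cfi(chi2_model=150.0, df_model=100, chi2_baseline=2100.0, df_baseline=120)
example_rmsea = rmsea(chi2_model=150.0, df_model=100, n=500)
print(round(example_cfi, 3), round(example_rmsea, 3), good_fit(example_cfi, example_rmsea))
```

Multi-group software often applies a group-count correction to RMSEA, so values from such a package may differ from this single-group form.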
10. Fully invariant items
Scale                     Number of fully invariant items
Achievement Motivation    5
Power Motivation          3
Leadership Motivation     3
Conscientiousness         3
Flexibility               3
Action Orientation        4
Social Sensitivity        4
Openness to Contact       4
Sociability               3
Team Orientation          2
Assertiveness             3
Emotional Stability       2
Working under Pressure    3
Self-confidence           4
11. Differential item functioning
• People from different groups with the same
underlying ability/trait level have a different
probability of endorsing an item
• MIMIC (multiple indicators, multiple causes) approach to DIF detection
– Modelling DIF and latent mean difference
simultaneously
12. MIMIC approach to DIF detection
[Path diagram with nodes: Country, Construct, and five Items]
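The MIMIC model can be sketched with two equations. This is a standard formulation, not one given on the slides, and the symbols are assumptions: z is the country indicator, η the latent construct, x_i the response to item i, λ_i its loading, and ζ and ε_i residual terms.

```latex
\begin{align*}
\eta &= \gamma z + \zeta \\
x_i &= \lambda_i \eta + \beta_i z + \varepsilon_i
\end{align*}
```

Here γ is the latent mean difference between countries, and a non-zero β_i is a direct country effect on item i, that is, uniform DIF on that item. Fixing every β_i to zero corresponds to assuming no DIF. For the model to be identified, at least one item must keep β_i = 0 and serve as an anchor.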
13. Example
• BIP Openness to Contact scale
• Portuguese vs. German.
                         Assume no DIF   DIF effect modelled
Latent mean difference   0.20            0.52
Effect size              Small           Medium
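The "Small" and "Medium" labels match Cohen's conventional benchmarks for standardized mean differences (0.2, 0.5, 0.8). The sketch below assumes that the latent mean differences on the slide are standardized and that these benchmarks were the ones applied; the slide does not confirm either point. The function name is hypothetical.

```python
def effect_size_label(d):
    """Label a standardized mean difference using Cohen's conventional cutoffs."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# The two latent mean differences reported on the slide.
print(effect_size_label(0.20))  # assuming no DIF
print(effect_size_label(0.52))  # DIF effect modelled
```

Under these assumptions, ignoring DIF in this example moves the Portuguese vs. German difference from the "medium" band down to the "small" band.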
14. Measurement equivalence involving emic items
• Etic vs. Emic
– Using the “same” items vs. using culturally specific items
• How to compare combined etic-emic instruments?
15. Missing data technique
• Introducing “imaginary” observed items
Country A                    Country B
Common item                  Common item
Common item                  Common item
Common item                  Common item
Emic item A                  Emic item B
Imaginary emic item B        Imaginary emic item A
[Diagram: in each country, the listed items are the indicators of one Construct]
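On the data side, the "imaginary" items amount to giving both countries the same set of item columns, with each country's unadministered emic items left entirely missing. A minimal sketch of that data layout follows. The item names, the responses, and the helper function are hypothetical, and the slides do not say how the data were prepared or which missing-data estimator was used.

```python
# Hypothetical item names: three common (etic) items plus one emic item per country.
COMMON = ["c1", "c2", "c3"]
EMIC_A = ["emic_a"]
EMIC_B = ["emic_b"]
ALL_ITEMS = COMMON + EMIC_A + EMIC_B


def pad_with_imaginary(rows, country):
    """Give every respondent all item columns; unadministered items become None."""
    padded = []
    for row in rows:
        full = {item: row.get(item) for item in ALL_ITEMS}
        full["country"] = country
        padded.append(full)
    return padded


# Made-up responses, one respondent per country, for illustration only.
country_a = [{"c1": 4, "c2": 3, "c3": 5, "emic_a": 2}]
country_b = [{"c1": 2, "c2": 4, "c3": 3, "emic_b": 5}]

stacked = pad_with_imaginary(country_a, "A") + pad_with_imaginary(country_b, "B")
for row in stacked:
    print(row)
```

Both groups now share one model specification, and the fully missing columns would be handled by whatever missing-data estimation the SEM software provides.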
16. Example
• BIP Flexibility scale: Spanish vs. German
– 12 common items
– 2 items unique to German version
– 1 item unique to Spanish version
Model fit CFI: 0.978; RMSEA: 0.046
Latent mean difference 0.10
17. Summary
• Measurement equivalence
– All BIP scales demonstrated structural invariance
– Most scales showed metric invariance
– No scales presented scalar invariance
– Fully invariant items were identified for each scale
18. Implications
• Common items make it possible to equate
scores across versions in the presence of DIF
• Comparing instruments involving emic items is
possible and necessary