The Art and Science of Applied Test Development. This is the fifth in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.
Applied Psych Test Design: Part G: Psychometric/technical statistical analysis: External
1. The Art and Science of Test Development—Part G
Psychometric/technical statistical analysis: External
Kevin S. McGrew, PhD.
Educational Psychologist
Research Director
Woodcock-Muñoz Foundation
The basic structure and content of this presentation is grounded extensively on the test
development procedures developed by Dr. Richard Woodcock
2. “In god we trust….all others must show data”
(unknown source)
Test authors and publishers have a standards-based responsibility to provide supporting psychometric/technical information about the tests and battery
Typically in the form of a series of technical chapters in the manual or a separate technical manual
4. External evidence is focused on relations between test battery variables (measures or latent constructs) and other external (outside of battery) constructs, measures, or criteria
[Figure: Theoretical domain (CHC: g; Gf, Gv, Glr, Gs, Gc, Gsm, Ga) shown above the measurement or empirical domain]
5. External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Group differentiation
• Structural equation modeling
• Correlation of observed measures with other measures
• Multitrait-Multimethod matrix
Characteristics of strong test validity program:
• Focal constructs vary in theorized ways with other constructs
• Measures of the constructs differentiate existing groups that are known to differ on the constructs
• Measures of focal constructs correlate with other validated measures of the same constructs
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses
6. External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Correlation of observed measures with other measures
Characteristics of strong test validity program:
• Measures of focal constructs correlate with other validated measures of the same constructs
7. Concurrent external validity example:
WJ III GIA clusters correlations with other IQ
battery full scale scores
Provide evidence at select key age groups (related to intended age range
and purpose of battery) in normal samples
8. Concurrent external validity example:
WJ III Achievement (reading, math, writing) cluster correlations
with measures from other (external) ach. batteries
Provide evidence at select key age groups (related to intended age range and purpose of battery) in normal samples
9. Concurrent external validity example:
Comparative predictive validity (of achievement)
Comparisons of correlations (across reading, math, written language, and total achievement domains) of the average WJ III GIA and Predicted Achievement score options and full scale scores from other (external) major intelligence batteries

Battery     Other Battery Total     WJ III       WJ III          WJ III
            (Full Scale) Score      Pred. Ach.   GIA-Extended    GIA-Standard
DAS         .41                     --           .52             .47
WPPSI-R     .37                     --           .52             .47
WISC-III    .50                     .68          .67             .63
WAIS-III    .39                     .56          --              .56
KAIT        .53                     .56          --              .56

Provide evidence at select key age groups (related to intended age range
and purpose of battery) in normal samples
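The concurrent validity coefficients on the preceding slides are correlations between scores on the focal battery and scores on an external battery for the same examinees. A minimal sketch of that computation follows, assuming a plain Pearson correlation; the function name and the score lists are hypothetical illustrations, not data from the WJ III studies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical standard scores for the same seven examinees
focal_battery = [85, 92, 100, 104, 110, 118, 125]      # focal composite (made up)
external_battery = [88, 90, 97, 108, 105, 121, 119]    # external full scale (made up)

r = pearson_r(focal_battery, external_battery)
print(f"concurrent validity r = {r:.2f}")
```

In practice the same calculation would be repeated within each key age group rather than on a pooled sample.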
10. External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Correlation of observed measures with other measures
Characteristics of strong test validity program:
• Focal constructs vary in theorized ways with other constructs
• Measures of focal constructs correlate with other validated measures of the same constructs
11. • Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative examples—concurrent external validity correlations)
12. • Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example: exploratory factor analysis of select WJ III and WISC-III tests)
13. • Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example: WJ III Block Rotation [Gv-Vz] correlation with WISC-III tests in grade 3-5 sample)

WISC-III Test          WJ III BLKROT
Information            0.27
Coding                 0.08
Similarities           0.29
Picture Arrangement    0.14
Arithmetic             0.09
Block Design           0.38
Vocabulary             0.23
Object Assembly        0.31
Comprehension          0.15
Symbol Search          0.23
Digit Symbol           0.08

Note: The absolute magnitude of the correlations is artificially low due to sample range restriction. The important observation is the relative magnitude of the correlations.
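The range restriction mentioned in the note can be illustrated with the standard correction for direct range restriction (Thorndike's Case 2). This is only a sketch of how restriction lowers a correlation: the slide does not say which correction, if any, was applied, and the two standard deviations below are hypothetical values, not figures from the grade 3-5 sample.

```python
import math


def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction for direct range restriction.

    r_c = r * U / sqrt(1 - r^2 + r^2 * U^2), where U = SD_unrestricted / SD_restricted.
    """
    if sd_restricted <= 0 or sd_unrestricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * u) / math.sqrt(1 - r ** 2 + (r ** 2) * (u ** 2))


# Observed Block Rotation / Block Design correlation from the table above,
# with ASSUMED (hypothetical) SDs: 10 in the restricted sample, 15 in the population.
r_corrected = correct_range_restriction(0.38, sd_unrestricted=15.0, sd_restricted=10.0)
print(f"observed r = 0.38, corrected r = {r_corrected:.2f}")
```

Because the correction is monotonic in r for a fixed SD ratio, the rank order of the correlations is preserved, which is why the relative magnitudes remain interpretable even without correcting.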
14. • Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example: confirmatory factor analysis of select WJ III and WISC-III tests)
Phelps et al. (2005) WISC-III/WJ III cross-battery (joint) CFA
15. Phelps et al. (2005) WISC-III/WJ III cross-battery (joint) CFA
17. External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Structural equation modeling
Characteristics of strong test validity program:
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses
20. Structural equation modeling external validity evidence example
[Figure: SEM path diagram, ages 6-8. Latent factors: g, Gs, Mem Span, Work Mem, Gv, Gc, Glr, Gf, Ga. Indicator tests: Visual Matching, Decision Speed, Cross Out, Mem for Sentences, Mem for Words, Aud Working Mem, Numbers Reversed, Block Rotation, Spatial Relations, Picture Recognition, Verbal Comp, Oral Comp, General Information, Memory for Names, Retrieval Fluency, DR: Vis-Aud Lrng, Vis-Aud Learning, Analysis-Synthesis, Concept Formation, Numerical Reas, Sound Blending, Incomplete Words, Sound Patterns. Outcome: Word Attack (WA). Path coefficients are not recoverable from the extracted text.]
21. External Stage of Test Development
Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics
Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
Method and concepts:
• Group differentiation
Characteristics of strong test validity program:
• Measures of the constructs differentiate existing groups that are known to differ on the constructs
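One simple way to express group differentiation is a standardized mean difference between two groups known to differ on the construct. The sketch below assumes Cohen's d with a pooled standard deviation; the slide does not name a specific statistic, and both groups' scores are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least 2 scores")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd


# Hypothetical standard scores: a typically developing group and a clinical
# group expected (by theory) to score lower on the focal construct.
typical_group = [100, 105, 110, 95, 90]
clinical_group = [85, 80, 90, 95, 75]

d = cohens_d(typical_group, clinical_group)
print(f"group difference d = {d:.2f}")
```

A large d in the theorized direction supports the claim that the measure differentiates the groups; a near-zero d would count against it.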
28. (Note: The following information is almost identical to that presented in Part F—Internal
psychometric/statistical analysis)
Lack of rigor and quality control in any earlier stage will “rattle through the data” and rear its ugly head when performing the final statistical analysis, especially multivariate validity analyses (SEM, DF, multiple regression, EFA, CFA)
Short cuts in prior stages will “bite you in the ____” as you attempt to perform the final statistical analysis
Data screening, data screening, data screening! Do it prior to performing the final statistical analysis
• Compute extensive descriptive statistical analyses for all variables (e.g., histograms, scatterplots, box-whisker plots, etc.)
• More than means and SDs. Also calculate median, skew, kurtosis, n-tiles, etc.
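The screening statistics listed above can be computed for each variable with a few lines of code. This is a minimal sketch using only the Python standard library; the function name, the moment-based (population) formulas for skew and kurtosis, and the raw scores are assumptions chosen for illustration.

```python
import statistics


def screen_variable(scores):
    """Return basic screening statistics for one variable."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least 2 scores")
    mean = statistics.fmean(scores)
    # Central moments (population form) for skew and kurtosis
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    if m2 == 0:
        raise ValueError("variable has zero variance")
    return {
        "n": n,
        "mean": mean,
        "sd": statistics.stdev(scores),
        "median": statistics.median(scores),
        "skew": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
        "quartiles": statistics.quantiles(scores, n=4),
    }


# Hypothetical raw scores with one suspicious value (48) that a mean and SD
# alone would not make obvious.
raw_scores = [12, 15, 14, 16, 13, 15, 14, 48]
summary = screen_variable(raw_scores)
for name, value in summary.items():
    print(name, value)
```

A large positive skew alongside a mean well above the median is the kind of pattern that should send you back to the data file before any multivariate analysis is run.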
Deliberately planned and sophisticated “front end” data collection short-cuts (e.g., matrix
sampling) introduce an extreme level of “back end” complexity to routine
statistical/psychometric analysis
Know your limits, level of expertise, and skills. Even those with extensive test development
experience often need access to trusted measurement/statistical consultants
(cont. next slide)
29. Published statistics/psychometric information needs to be based on final publication
length tests
• Often need to use test-length correction formulas (e.g., Spearman-Brown) for test reliabilities
• Correlations between short and/or long norming versions of a test and other tests may need special adjustments/corrections when the norming version differs in length (number of items) from the publication-length test.
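A widely used correction for projecting a reliability estimate to a different test length is the Spearman-Brown prophecy formula. The sketch below assumes that formula applies (roughly parallel items); the item counts and the norming-form reliability are hypothetical.

```python
def spearman_brown(reliability, old_length, new_length):
    """Project reliability to a new test length.

    r_new = k * r / (1 + (k - 1) * r), where k = new_length / old_length.
    """
    if old_length <= 0 or new_length <= 0:
        raise ValueError("test lengths must be positive")
    k = new_length / old_length
    return (k * reliability) / (1 + (k - 1) * reliability)


# Hypothetical case: reliability of .80 on a 20-item norming form,
# projected to a 30-item publication-length test.
r_publication = spearman_brown(0.80, old_length=20, new_length=30)
print(f"projected reliability = {r_publication:.3f}")
```

The same function covers the familiar split-half case by setting the new length to twice the old length.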
Back up, back up, back up! Do it constantly. Don’t let a dead hard drive or computer destroy your work and progress. Build redundancy into your files and into your people's skill sets
Sad fact: Majority of test users do NOT pay attention to the fancy and special
psychometric/statistical analysis you report in technical chapters or manuals. Be
prepared for post-publication education via other methods.
Post-manual publication technical reports of special/sophisticated analyses are good
when publication time-line pressures dictate making difficult decisions.
30. Most test developers are stuck in a methodological rut. There is much that
can be learned about the internal and external validity of a test battery using
lesser-used statistical methods.
• Multidimensional scaling (MDS), cluster analysis, CART (classification and regression tree analysis), MARS (multivariate adaptive regression splines)
Use of curve smoothing procedures to better estimate population
parameters from statistical analyses across age groups.
Multiple group CFA (planned incomplete data) reference variable validity
designs and methods (Jack McArdle).
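To give a feel for the curve smoothing point above, here is a deliberately simple sketch: a weighted (1-2-1) moving average applied to statistics computed separately by age group. The slide does not specify a smoothing procedure, so this is an assumed stand-in for more sophisticated curve fitting, and the age-group means are hypothetical.

```python
def smooth_121(values):
    """Weighted moving average with weights (1, 2, 1) / 4.

    Endpoints are left unchanged because they have only one neighbor.
    """
    if len(values) < 3:
        return list(values)
    smoothed = [values[0]]
    for i in range(1, len(values) - 1):
        smoothed.append((values[i - 1] + 2 * values[i] + values[i + 1]) / 4)
    smoothed.append(values[-1])
    return smoothed


# Hypothetical mean raw scores for ages 6 through 12. The dips at ages 8 and 11
# are more plausibly sampling noise than real developmental reversals.
age_group_means = [20.0, 24.5, 23.0, 29.0, 31.5, 30.0, 35.0]
smoothed_means = smooth_121(age_group_means)
print(smoothed_means)
```

The smoothed values borrow information from neighboring age groups, which is the basic idea behind using smoothing to better estimate population parameters across age.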
31. End of Part G
Additional steps in test development process will be
presented in subsequent modules as they are developed