Applied Psych Test Design: Part G: Psychometric/technical statistical analysis: External

The Art and Science of Test Development—Part G

Psychometric/technical statistical analysis: External

Kevin S. McGrew, PhD.

Educational Psychologist

Research Director
Woodcock-Muñoz Foundation

The basic structure and content of this presentation is grounded extensively on the test
development procedures developed by Dr. Richard Woodcock

“In god we trust….all others must show data”
(unknown source)

Test authors and publishers have a standards-based responsibility to provide supporting
psychometric technical information about the tests and the battery

Typically in the form of a series of technical chapters in the manual or a separate
technical manual

Calculate psychometric/measurement
statistics for technical manual/chapters

With external measures

Use Joint Test Standards as a guide

External evidence is focused on relations between test battery variables (measures or
latent constructs) and other external (outside of battery) constructs, measures, or criteria

[Figure: two panels, Theoretical domain - CHC (g; Gf, Gv, Glr, Gs, Gc, Gsm, Ga) and
Measurement or empirical domain]

External Stage of Test Development

Purpose:
Examine the external relations among the focal construct (i.e., intelligence or cognitive
abilities) and other constructs and/or subject characteristics

Questions asked:
Do the focal constructs and observed measures “fit” within a network of expected construct
relations (i.e., the nomological network)?

Method and concepts:
• Group differentiation
• Structural equation modeling
• Correlation of observed measures with other measures
• Multitrait-Multimethod matrix

Characteristics of strong test validity program:
• Focal constructs vary in theorized ways with other constructs
• Measures of the constructs differentiate existing groups that are known to differ on the
constructs
• Measures of focal constructs correlate with other validated measures of the same constructs
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses


Method and concepts:
• Correlation of observed measures with other measures

Characteristics of strong test validity program:
• Measures of focal constructs correlate with other validated measures of the same constructs

Concurrent external validity example:
WJ III GIA clusters correlations with other IQ
battery full scale scores

Provide evidence at select key age groups (related to intended age range
and purpose of battery) in normal samples

WJ III Achievement (reading, math, writing) cluster correlations
with measures from other (external) achievement batteries

Provide evidence at select key age groups (related to intended age range and purpose of battery) in normal samples
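A concurrent validity coefficient of this kind is a Pearson correlation between scores that the same examinees earn on the focal measure and on the external measure. The sketch below shows the arithmetic only. The scores are hypothetical and are not WJ III data, and the function name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical standard scores for the same eight examinees (NOT WJ III data).
focal_scores = [85, 92, 100, 104, 110, 97, 118, 88]
external_scores = [88, 90, 103, 99, 112, 101, 115, 91]

r = pearson_r(focal_scores, external_scores)
print(f"concurrent validity coefficient r = {r:.2f}")
```

In practice the coefficient would be computed separately within each key age group, as the slide recommends.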

Comparative predictive validity (of achievement)

Comparisons of correlations (across reading, math, written language, and total
achievement domains) of the average WJ III GIA and Predicted Achievement score
options and full scale scores from other (external) major intelligence batteries

Battery     Other battery total    WJ III        WJ III          WJ III
            (Full Scale) score     Pred. Ach.    GIA-Extended    GIA-Standard
DAS         .41                    --            .52             .47
WPPSI-R     .37                    --            .52             .47
WISC-III    .50                    .68           .67             .63
WAIS-III    .39                    .56           --              .56
KAIT        .53                    .56           --              .56

Provide evidence at select key age groups (related to intended age range
and purpose of battery) in normal samples


Method and concepts:
• Correlation of observed measures with other measures

Characteristics of strong test validity program:
• Focal constructs vary in theorized ways with other constructs
• Measures of focal constructs correlate with other validated measures of the same constructs

• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative examples—concurrent external validity correlations)


• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example: exploratory factor analysis of select WJ III and WISC-III tests)

• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example—WJ III Block Rotation [Gv-Vz] correlation
with WISC-III tests in grade 3-5 sample)

WISC-III test          Correlation with WJ III Block Rotation (BLKROT)
Information            0.27
Coding                 0.08
Similarities           0.29
Picture Arrangement    0.14
Arithmetic             0.09
Block Design           0.38
Vocabulary             0.23
Object Assembly        0.31
Comprehension          0.15
Symbol Search          0.23
Digit Symbol           0.08

Note: The absolute magnitude of the correlations is artificially low due to sample range
restriction. The important observation is the relative magnitude of the correlations.
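To see how range restriction depresses a correlation, one common adjustment is Thorndike's Case 2 formula for direct restriction on one variable. The slide does not say that this correction was applied or what the sample SDs were, so the SD values below are hypothetical and the function name is illustrative. The two input correlations (0.38 for Block Design, 0.08 for Coding) come from the table above.

```python
import math

def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case 2: estimate the unrestricted correlation.

    u is the ratio of the unrestricted SD to the restricted SD on the
    variable that was directly restricted.
    """
    u = sd_unrestricted / sd_restricted
    r2 = r_restricted ** 2
    return (r_restricted * u) / math.sqrt(1 - r2 + r2 * u ** 2)

# Hypothetical SDs (assumption): population SD of 15, restricted-sample SD of 10.
SD_POPULATION = 15.0
SD_SAMPLE = 10.0

block_design = correct_range_restriction(0.38, SD_POPULATION, SD_SAMPLE)
coding = correct_range_restriction(0.08, SD_POPULATION, SD_SAMPLE)
print(f"Block Design: 0.38 -> {block_design:.2f}")
print(f"Coding:       0.08 -> {coding:.2f}")
```

Under these assumed SDs both values rise, but their rank order is unchanged, which is why the relative pattern can be interpreted even when the absolute values cannot.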

• Focal constructs vary in theorized ways with other constructs
• Measures correlate with other validated measures of the same constructs
(select illustrative example: confirmatory factor analysis of select WJ III and WISC-III tests)

Phelps et al. (2005) WISC-III/WJ III cross-battery (joint) CFA

Joint WJ III/WAIS-III/WMS-III/KAIT CFA
Gregg/Hoy College LD/NLD (n=200) Sample; analysis by K. McGrew
(This is NOT the complete model; only the portion that includes Gv factor information)

[Figure: path diagram of g and the broad factors Gf, Gc, Gv, Gq, Gsm, Gs, and Grw with their
test indicators; the individual loadings are not recoverable from the slide text]


Method and concepts:
• Structural equation modeling

Characteristics of strong test validity program:
• Theory-based hypotheses are supported, particularly when compared to rival hypotheses

Structural equation modeling external validity evidence example

[Figure: structural equation model, ages 6-8. Latent factors: g, Gs, Gv, Gc, Glr, Gf, Ga,
Mem Span, and Work Mem, plus the Word Attack (WA) measure. Indicator tests: Visual Matching,
Decision Speed, Cross Out, Mem for Sentences, Mem for Words, Aud Working Mem, Numbers
Reversed, Block Rotation, Spatial Relations, Picture Recognition, Verbal Comp, Oral Comp,
General Information, Memory for Names, Retrieval Fluency, DR: Vis-Aud Lrng, Vis-Aud Learning,
Analysis-Synthesis, Concept Formation, Numerical Reas, Sound Blending, Incomplete Words,
Sound Patterns. Path coefficients are not recoverable from the slide text.]


Method and concepts:
• Group differentiation

Characteristics of strong test validity program:
• Measures of the constructs differentiate existing groups that are known to differ on the
constructs

Group differentiation external validity evidence example:
LD vs Non-LD university samples
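One simple way to quantify how well a measure separates two known groups is a standardized mean difference (Cohen's d with a pooled SD). The slide does not state which statistic was reported for the LD vs Non-LD comparison, so treat this as an illustrative sketch. The scores are hypothetical and the function name is an assumption.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical cluster standard scores (NOT data from the presentation).
non_ld = [104, 98, 110, 101, 95, 107, 112, 99]
ld = [92, 88, 101, 85, 97, 90, 94, 89]

d = cohens_d(non_ld, ld)
print(f"Non-LD vs LD standardized difference d = {d:.2f}")
```

A positive d here means the Non-LD group scored higher, which is the theorized direction for this contrast.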

Group differentiation external validity evidence example:
Normal/Gifted/LD/MR samples

Group differentiation external validity evidence example—
discriminant function analysis
(Normal/Gifted/LD/MR samples)

Group differentiation external validity evidence example—
discriminant function analysis classification accuracy
(Normal/Gifted/LD/MR samples—grade 3-4)
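Classification accuracy from a discriminant function analysis is usually summarized from the table of actual by predicted group membership. The sketch below computes the overall hit rate and compares it with the proportional chance criterion (the sum of squared group proportions). The counts are hypothetical, not the grade 3-4 results, and the function names are illustrative.

```python
# Rows are actual groups, columns are predicted groups, in the same order.
groups = ["Normal", "Gifted", "LD", "MR"]

# Hypothetical classification counts (NOT the results shown on the slide).
confusion = [
    [40, 4, 5, 1],   # actual Normal
    [6, 22, 2, 0],   # actual Gifted
    [9, 1, 18, 2],   # actual LD
    [1, 0, 3, 16],   # actual MR
]

def hit_rate(matrix):
    """Proportion of all cases assigned to their actual group."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def proportional_chance(matrix):
    """Expected hit rate if cases were assigned at random in proportion to group size."""
    total = sum(sum(row) for row in matrix)
    return sum((sum(row) / total) ** 2 for row in matrix)

overall = hit_rate(confusion)
chance = proportional_chance(confusion)
print(f"overall hit rate: {overall:.3f}; proportional chance: {chance:.3f}")
for name, row, i in zip(groups, confusion, range(len(groups))):
    print(f"  {name}: {row[i] / sum(row):.2f} correctly classified")
```

Reporting the hit rate next to a chance baseline guards against overstating accuracy when one group dominates the sample.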

Group differentiation external validity evidence example
(variety of “clinical disorder groups”)

Group differentiation external validity evidence example, continued
(variety of “clinical disorder groups”)

(Note: The following information is almost identical to that presented in Part F—Internal
psychometric/statistical analysis)

Lack of rigor and quality control in all prior/earlier stages will “rattle through the data” and
rear its ugly head when performing the final statistical analysis, especially multivariate
validity analyses (SEM, DF, multiple regression, EFA, CFA)

Shortcuts in prior stages will “bite you in the ____” as you attempt to perform final
statistical analysis

Data screening, data screening, data screening!!!! Do it prior to performing final statistical
analysis
• Compute extensive descriptive statistical analyses for all variables (e.g., histograms,
scatterplots, box-whisker plots, etc.)

• More than means and SDs. Also calculate median, skew, kurtosis, n-tiles, etc.
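The screening statistics listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the author's procedure: skew and kurtosis use simple population-moment formulas, the scores are hypothetical, and `describe` is an illustrative name.

```python
from statistics import mean, median, pstdev, quantiles

def describe(scores):
    """Screening statistics for one variable (population-moment skew and kurtosis)."""
    n = len(scores)
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; skew and kurtosis are undefined")
    z = [(x - m) / sd for x in scores]
    return {
        "n": n,
        "mean": m,
        "median": median(scores),
        "sd": sd,
        "skew": sum(v ** 3 for v in z) / n,
        "excess_kurtosis": sum(v ** 4 for v in z) / n - 3,
        "quartiles": quantiles(scores, n=4),
    }

# Hypothetical scores; the second list contains one keying error (1040 for 104).
clean = [96, 98, 99, 100, 100, 101, 102, 104]
with_entry_error = clean[:-1] + [1040]

for label, data in (("clean", clean), ("with entry error", with_entry_error)):
    stats = describe(data)
    print(label, {k: (round(v, 2) if isinstance(v, float) else v) for k, v in stats.items()})
```

The keying error barely moves the median but inflates the mean, SD, skew, and kurtosis, which is the kind of signal that means and SDs alone can hide.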

Deliberately planned and sophisticated “front end” data collection short-cuts (e.g., matrix
sampling) introduce an extreme level of “back end” complexity to routine
statistical/psychometric analysis

Know your limits, level of expertise, and skills. Even those with extensive test development
experience often need access to trusted measurement/statistical consultants

Published statistics/psychometric information needs to be based on final publication
length tests
• Often need to use test-length correction formulas (e.g., Spearman-Brown) for test reliabilities

• Correlations between short and/or long norming versions of a test and other tests,
that differ in test length (number of items) from the publication length test, may need
special adjustments/corrections.
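A standard test-length adjustment for reliability is the Spearman-Brown prophecy formula, r_new = k·r / (1 + (k − 1)·r), where k is the ratio of the new test length to the old one. The slide does not say which adjustment was used for any particular test, so this is a sketch under that assumption. The item counts and reliability below are hypothetical.

```python
def spearman_brown(reliability, old_length, new_length):
    """Project reliability to a new test length (assumes added items are parallel)."""
    k = new_length / old_length
    return (k * reliability) / (1 + (k - 1) * reliability)

# Hypothetical case: a 30-item norming form with reliability .80,
# published as a 40-item test.
projected = spearman_brown(0.80, old_length=30, new_length=40)
print(f"projected reliability at publication length: {projected:.3f}")

# Shortening works the same way: a 40-item form cut to 30 items.
shortened = spearman_brown(0.80, old_length=40, new_length=30)
print(f"projected reliability after shortening: {shortened:.3f}")
```

The formula assumes the added or removed items behave like the existing ones, so the projection is only as good as that assumption.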

Back up, back up, back up!!!!!!!!!! Don’t let a dead hard drive or computer destroy your
work and progress. Do it constantly. Build redundancy into your files and people skill
sets

Sad fact: Majority of test users do NOT pay attention to the fancy and special
psychometric/statistical analysis you report in technical chapters or manuals. Be
prepared for post-publication education via other methods.

Post-manual publication technical reports of special/sophisticated analyses are good
when publication time-line pressures dictate making difficult decisions.

Most test developers are stuck in a methodological rut. There is much that
can be learned about the internal and external validity of a test battery using
lesser-used statistical methods.
• Multidimensional scaling (MDS), cluster analysis, CART
(classification and regression tree analysis), MARS (multivariate
adaptive regression splines)

Use of curve smoothing procedures to better estimate population
parameters from statistical analyses across age groups.
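The idea behind smoothing across age groups is that sample statistics bounce around from one age group to the next because of sampling error, while the population values are expected to change gradually. The slide does not name a smoothing procedure, so the sketch below uses a deliberately simple stand-in, a 1-2-1 weighted moving average. The age-group values are hypothetical and `smooth_121` is an illustrative name.

```python
def smooth_121(values):
    """Weighted moving average with weights 1-2-1; endpoints are left unchanged."""
    if len(values) < 3:
        return list(values)
    smoothed = [values[0]]
    for i in range(1, len(values) - 1):
        smoothed.append((values[i - 1] + 2 * values[i] + values[i + 1]) / 4)
    smoothed.append(values[-1])
    return smoothed

# Hypothetical reliability estimates for consecutive age groups (ages 6 to 12).
ages = [6, 7, 8, 9, 10, 11, 12]
observed = [0.84, 0.88, 0.83, 0.89, 0.86, 0.91, 0.90]

for age, raw, fit in zip(ages, observed, smooth_121(observed)):
    print(f"age {age:2d}: observed {raw:.2f}, smoothed {fit:.3f}")
```

A moving average leaves a straight-line trend untouched and damps isolated spikes; polynomial or spline fits are alternatives when the trend is clearly curved.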

Multiple group CFA (planned incomplete data) reference variable validity
designs and methods (Jack McArdle).

End of Part G
Additional steps in test development process will be
presented in subsequent modules as they are developed
