2. Importance of Test Theories
• Estimating examinee ability and showing how the contribution of error might be minimized
• Disattenuation of variables
• Reporting true scores or ability scores and the associated confidence bands
3. Psychometric History
• Lord (1952, 1953) and other
psychometricians were interested in
psychometric models with which to assess
examinees independently of the particular
choice of items or assessment tasks that
were used in the assessment.
• Measurement practices would be enhanced if item and test statistics were made sample independent.
• Birnbaum (1957, 1958)
• George Rasch (1960)
• Wright (1968)
4. Limitations of the CTT
• Item difficulty and item discrimination
are group dependent.
• The p (item difficulty) and r (item discrimination) values depend on the examinee sample from which they are obtained.
• Scores are entirely test dependent.
• No basis to predict the performance of
examinees on an item.
5. Assumptions in IRT
• Unidimensionality
– Examinee performance can be explained by a single ability.
• Dichotomous responses
– The relationship between examinee performance on each item and the ability measured by the test is described by a monotonically increasing function.
6. • The monotonic relationship between item performance and ability is depicted by an item characteristic curve (ICC).
• Examinees with more ability have higher probabilities of giving correct answers to items than lower-ability examinees (Hambleton, 1989).
7. • Mathematical model linking the observable, dichotomously scored data (item performance) to the unobservable data (ability)
• Pi(θ) gives the probability of a correct response to item i as a function of ability (θ)
• b = item difficulty: the ability level at which the probability of a correct answer is (1 + c)/2
• a = item discrimination
• c = pseudoguessing parameter
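Written out, the three-parameter logistic (3PL) form of this model is commonly given as follows (a sketch in standard notation; D ≈ 1.7 is the usual scaling constant):

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-D a_i (\theta - b_i)}}
```

At θ = b_i this reduces to P_i(b_i) = (1 + c_i)/2, which is why b is defined as the ability level where the probability of a correct answer equals (1 + c)/2.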
8. • Two-parameter model: c = 0
• One-parameter model: c = 0, a = 1
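A minimal Python sketch of these nested models, assuming the logistic form above (function and parameter names are illustrative):

```python
import math

def p_correct(theta, a=1.0, b=0.0, c=0.0, D=1.7):
    """3PL probability of a correct response at ability theta.

    Setting c = 0 gives the two-parameter model; setting c = 0 and
    a = 1 gives the one-parameter (Rasch-type) model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# The same item evaluated under the three nested models
print(p_correct(0.5, a=1.2, b=0.0, c=0.2))  # three-parameter
print(p_correct(0.5, a=1.2, b=0.0))          # two-parameter (c = 0)
print(p_correct(0.5))                        # one-parameter (c = 0, a = 1)
```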
9. • Three items showing different item difficulties (b)
12. Polychotomous IRT Models
• Responses with more than two categories (e.g., a 4-point scale)
• Partial credit model
• Graded response model (sketched below)
• Nominal model
• Rating scale model
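As one illustration, the graded response model handles ordered categories through cumulative 2PL-type curves (a sketch; b_ik are ordered category thresholds for item i, with P*_i0 = 1 and P*_i,m+1 = 0 for an item with m + 1 categories):

```latex
P^{*}_{ik}(\theta) = \frac{1}{1 + e^{-D a_i (\theta - b_{ik})}},
\qquad
P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta)
```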
14. • In the IRT measurement framework, ability estimates of an examinee obtained from tests that vary in difficulty will be the same.
• Because ability estimates are invariant across tests, measurement errors are smaller.
• A true score can be determined for each test.
• Item parameters are independent of the particular examinee sample used.
• Measurement error is estimated at each ability level.
15. Test Characteristic Curve (TCC)
• TCC: the sum of the ICCs that make up a test or assessment; it can be used to predict the scores of examinees at given ability levels (a computational sketch follows this list).
TCC(θ) = ∑ Pi(θ)
• Links the true score to the underlying ability measured by the test.
• A TCC shifted to the right on the ability scale indicates more difficult items.
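A minimal computational sketch of the TCC, assuming 3PL item parameters (names are illustrative):

```python
import math

def p_correct(theta, a, b, c, D=1.7):
    """3PL item characteristic curve: probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def tcc(theta, items, D=1.7):
    """TCC(theta): sum of the ICCs, i.e. the predicted true score at theta."""
    return sum(p_correct(theta, a, b, c, D) for a, b, c in items)

# Hypothetical three-item test, one (a, b, c) tuple per item
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]
print(tcc(0.0, items))  # predicted true score for an examinee at theta = 0
```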
16. Item Information Function
• I(θ): the contribution of a particular item to the assessment of ability.
• Items with higher discriminating power contribute more to measurement precision than items with lower discriminating power.
• Items tend to make their best contribution to measurement precision around their b value.
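One standard way to write the item information function under the 3PL model (it reduces to I_i(θ) = D² a_i² P_i(θ)[1 − P_i(θ)] when c_i = 0) makes both points explicit: information scales with a_i² and peaks near b_i:

```latex
I_i(\theta) =
\frac{D^2 a_i^2\,\bigl[P_i(\theta) - c_i\bigr]^2}{(1 - c_i)^2}
\cdot \frac{1 - P_i(\theta)}{P_i(\theta)}
```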
18. Figure 6: Item characteristic curves and corresponding item information functions (left panel: four item characteristic curves; right panel: item information for four test items; x-axis: Ability (θ)).
19. Test Information Function
• The sum of the item information functions in a test.
• Higher values of the a parameter increase the amount of information an item provides.
• The lower the c parameter, the more information an item provides.
• The more information provided by an assessment at a particular ability level, the smaller the errors associated with ability estimation.
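In symbols, the test information function is the sum of the item information functions, and the standard error of ability estimation is its inverse square root, which formalizes the last point:

```latex
I(\theta) = \sum_{i=1}^{n} I_i(\theta),
\qquad
SE(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}
```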
20. Figure 7: Test information function for a four-item test (x-axis: Ability (θ)).
21. Item Parameter Invariance
• Item/test characteristic
functions and item/test
information functions are
integral features of IRT.
22. Benefits of Item
Response Models
• Item statistics that are independent of the
groups from which they were estimated.
• Scores describing examinee proficiency or
ability that are not dependent on test
difficulty.
• Test models that provide a basis for
matching items or assessment tasks to
ability levels.
• Models that do not require strictly parallel tests or assessments for assessing reliability.
23. Application of IRT to Test Development
• Item Analysis
– Determining sample-invariant item parameters.
– Using goodness-of-fit criteria to detect items that do not fit the specified response model (χ², analysis of residuals); a sketch of such a check follows.
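A minimal sketch of such a fit check, assuming ability estimates and 3PL item parameters are already available (all names are illustrative; a full χ² item-fit statistic would normally use more refined grouping and degrees-of-freedom conventions):

```python
import math

def p_correct(theta, a, b, c, D=1.7):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def chi_square_item_fit(thetas, responses, a, b, c, n_groups=5):
    """Pearson-type fit statistic for one item.

    Examinees are sorted into ability groups; the observed proportion
    correct in each group is compared with the model-predicted
    proportion at the group's mean ability.
    """
    pairs = sorted(zip(thetas, responses))
    size = max(1, len(pairs) // n_groups)
    chi2 = 0.0
    for start in range(0, len(pairs), size):
        group = pairs[start:start + size]
        n = len(group)
        mean_theta = sum(t for t, _ in group) / n
        observed = sum(u for _, u in group) / n
        expected = p_correct(mean_theta, a, b, c)
        chi2 += n * (observed - expected) ** 2 / (expected * (1 - expected))
    return chi2  # large values flag items that misfit the model
```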
24. Application of IRT to Test Development
• Item Selection
– Assess the contribution of each item to the test information function independently of the other items.
25. – Using item information functions:
• Describe the shape of the desired test information function over the desired range of abilities.
• Select items with information functions that will fill the hard-to-fill areas under the target information function.
• Calculate the test information function for the selected assessment material.
• Continue selecting material until the test information function approximates the target information function to a satisfactory degree. (A sketch of this greedy selection procedure follows.)
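A sketch of this selection loop as a greedy procedure, assuming a bank of 3PL items and a target information function evaluated on a grid of ability points (all names are illustrative):

```python
import math

def item_information(theta, a, b, c, D=1.7):
    """3PL item information at ability theta."""
    p = c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((p - c) / (1 - c)) ** 2 * (1 - p) / p

def greedy_select(bank, grid, target, n_items):
    """Pick n_items from `bank` (list of (a, b, c) tuples) whose summed
    information best fills the gap below `target` at each grid point."""
    chosen, test_info = [], [0.0] * len(grid)

    def gain(idx):
        a, b, c = bank[idx]
        return sum(min(item_information(t, a, b, c), max(tv - ti, 0.0))
                   for t, tv, ti in zip(grid, target, test_info))

    while len(chosen) < n_items:
        best = max((i for i in range(len(bank)) if i not in chosen), key=gain)
        chosen.append(best)
        a, b, c = bank[best]
        test_info = [ti + item_information(t, a, b, c)
                     for t, ti in zip(grid, test_info)]
    return chosen

# Hypothetical item bank and target information function
bank = [(0.8, -1.5, 0.2), (1.0, -0.5, 0.2), (1.2, 0.0, 0.2),
        (1.5, 0.5, 0.2), (0.9, 1.5, 0.2)]
grid = [-2, -1, 0, 1, 2]
target = [0.3, 0.6, 0.8, 0.6, 0.3]
print(greedy_select(bank, grid, target, n_items=3))
```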
26. • Item banking
– Test developers can build an assessment to fit any desired test information function, provided the bank contains items with suitable properties.
– Comparisons of items can be made across dissimilar samples.