This slideshow was used for teacher training workshops I conducted in the fall of 2011 at the Center for English as a Second Language, University of Arizona (Tucson, USA).
1. ‘Testing a test’ – Evaluating our
Assessment Tools
Eddy White, Ph.D.
Assessment Coordinator
Center for English as a Second Language
University of Arizona
2. Targets
1. My background
2. Classroom-based assessment
3. Tests – purposes/functions
4. The ‘cardinal criteria’ for evaluating a test
5. Conclusions
12. ESL Assessment – Purposes
• identify strengths and weaknesses of individual students,
• adjust instruction to build on students’ strengths and alleviate weaknesses,
• monitor the effectiveness of instruction,
• provide feedback to students (sponsors, parents, etc.), and
• make decisions about the advancement of students to the next level of the program.
(Source: ESL Senior High Guide to Implementation, 2002)
13. Consider
• Research suggests that teachers spend from one-quarter to one-third of their professional time on assessment-related activities.
• Almost all do so without the benefit of having learned the principles of sound assessment.
(Stiggins, 2007)
15. Assessment literacy
• the kinds of assessment know-how and understanding that teachers need to assess their students effectively
• Assessment-literate educators should have knowledge and skills related to the basic principles of quality assessment practices
(SERVE Center, University of North Carolina, 2004)
16. Assessment Literacy
Know-how and understanding teachers need to assess students effectively and maximize learning
17. Importance of classroom assessment
• We may not like it, but students can and do ignore our teaching;
• however, if they want to get a qualification, they have to participate in the assessment processes we design and implement.
(Brown, S. 2004. Assessment for learning. Learning and Teaching in Higher Education, 1, 81–89)
22. Assessment framework
• the series of assessment tools (exams, tasks, projects, etc.) that are scored and used to arrive at a summative grade for a course
• it should be skills-based and knowledge-based (i.e. students demonstrate what they know about and can do with English)
• based on learning outcomes
23. • The spirit and style of student assessment defines the de facto curriculum.
(Rowntree, 1987)
de facto = existing in fact, actual, whether intended or not
32. A test . . .
• is a method of measuring a person’s ability, knowledge, or performance in a given domain.
• is an instrument – a set of techniques, procedures, or items – that requires performance on the part of the test-taker.
34. A test must measure
• Some tests measure general ability, while others focus on very specific competencies or objectives.
• Examples: a multi-skill proficiency test measures general ability; a quiz on recognizing correct use of definite articles measures very specific knowledge.
35. • A test measures performance, . . .
• but the results imply the test-taker’s ability, or competence.
36. • Performance-based tests sample the test-takers’ actual use of language,
• but from those samples the test administrator infers general competence.
37. • A well-constructed test is an instrument that provides an accurate measure of a test-taker’s ability within a particular domain.
• Constructing a good test is a complex task.
45. • Exploring how principles of language assessment can and should be applied to formal tests.
• These principles apply to assessment of all kinds.
• How to use these principles to design a good test.
46. • What are the ‘five cardinal criteria’ that can be used to design and evaluate all types of assessment?
60. 2. Reliability
• Is all work being consistently marked to the same standard?
61. • A reliable test is consistent and dependable.
• If you give the same test to the same student or matched students on two different occasions, the test should yield similar results.
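The test-retest idea above lends itself to a quick numerical check: give the same test twice and correlate the two sets of scores. The sketch below is illustrative only – the `pearson` helper and the student scores are invented for this example, not taken from the workshop materials.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical scores for the same five students on two sittings of one test
first_sitting = [72, 85, 64, 90, 78]
second_sitting = [70, 88, 66, 91, 75]

# A coefficient near 1.0 suggests the test ranks students consistently
print(f"test-retest estimate: r = {pearson(first_sitting, second_sitting):.2f}")
```

A coefficient near 1.0 indicates the two administrations rank students similarly; a low coefficient is a warning sign of unreliability.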
62. What factors contribute to the unreliability of a test?
63. Test unreliability – contributing factors
• student-related reliability
• rater reliability (inter-rater, intra-rater)
• test administration reliability
• test reliability
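Of the factors above, rater reliability is the easiest to estimate numerically: Cohen’s kappa measures how often two raters agree beyond what chance alone would produce. A minimal sketch, assuming invented pass/fail marks from two raters (the data are hypothetical, not from the workshop):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label at random
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail marks from two raters on the same ten essays
rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "pass", "fail", "fail", "fail",
           "pass", "pass", "fail", "pass", "pass"]

print(f"inter-rater agreement: kappa = {cohens_kappa(rater_1, rater_2):.2f}")
# → inter-rater agreement: kappa = 0.78
```

Kappa of 1.0 means perfect agreement and 0 means agreement no better than chance, which makes two raters’ marking standards directly comparable.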
69. • A valid test of reading ability . . .
• actually measures reading ability –
• not math skills,
• nor previous knowledge in a subject,
• nor writing skills,
• nor some other variable of questionable relevance.
71. Content validity
• If a test requires the test-taker to perform the behavior that is being measured . . .
• it can claim content-related evidence of validity (content validity).
• e.g. a test of a person’s ability to speak an L2 requires the student to actually speak within some sort of authentic context.
• A test with paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity.
72. Another way of understanding content validity is to consider the difference between direct and indirect testing.
• direct testing – involves the test-taker in actually performing the target task
• indirect testing – students not performing the task itself, but a related task (e.g. testing oral production of syllable stress)
76. Face validity
• The extent to which students view the assessment as:
1. fair
2. relevant
3. useful for improving learning
• Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure.
77. High face validity: the test . . .
• is well constructed, with an expected format and familiar tasks
• is clearly doable within the allotted time
• has items that are clear and uncomplicated
• has directions that are crystal clear
• has tasks related to course work (content validity)
• has a difficulty level that presents a reasonable challenge
82. Authenticity checklist
• Is the language in the test as natural as possible?
• Are topics as contextualized as possible rather than isolated?
• Are topics and situations interesting, enjoyable, and/or humorous?
• Is some thematic organization provided, such as through a story line or episode?
• Do tasks represent, or closely approximate, real-world tasks?
86. Washback
• Classroom assessment: the effects of an assessment on teaching and learning prior to the assessment itself (preparation).
• Another form of washback: the information that ‘washes back’ to students in the form of useful diagnoses of strengths and weaknesses.
• Formal tests provide no washback if students receive a simple letter grade or single overall numerical score.
87. A test that provides beneficial washback . . .
• positively influences what and how teachers teach
• positively influences what and how students learn
• offers learners a chance to adequately prepare
• gives learners feedback that enhances their language development
• provides conditions for peak performance by the learner
92. Answer. A ‘good’ test:
• can be given within appropriate administrative constraints,
• is dependable,
• accurately measures what you want it to measure,
• uses language representative of real-world language use, and
• provides information that is useful for the learner.
93. • These principles will help you make accurate judgments about the English competence of your students.
• They provide useful guidelines for evaluating existing tests, and designing your own.
94. Assessment Literacy
Know-how and understanding teachers need to assess students effectively and maximize learning
95. • There is no getting away from the fact that most of the things that go wrong with assessment are our fault,
• the result of poor assessment design – and not the fault of our students.
(Race et al., 2005)
96. • Improving student learning implies improving the assessment system.
• Teachers often assume that it is their teaching that directs student learning.
• In practice, assessment directs student learning, because it is the assessment system that defines what is worth learning.
(Havnes, 2004, p. 1)
97. (Boud & Falchikov, 2007, Rethinking Assessment in Higher Education)
• There is substantial evidence that assessment, rather than teaching, has the major influence on students’ learning.
• It directs attention to what is important, acts as an incentive for study, and has a powerful effect on students’ approaches to their work.
98. “We owe it to ourselves and our students to devote at least as much energy to ensuring that our assessment practices are worthwhile as we do to ensuring that we teach well.”
Dr. David Boud, University of Technology, Sydney, Australia