This slideshow was used for teacher training workshops I conducted in the fall of 2011 at the Center for English as a Second Language, University of Arizona (Tucson, USA).
1. ‘Testing a test’ – Evaluating our
Assessment Tools
Eddy White, Ph.D.
Assessment Coordinator
Center for English as a Second Language
University of Arizona
2. Targets
1. My background
2. Classroom-based assessment
3. Tests – purposes/functions
4. The ‘cardinal criteria’ for evaluating a test
5. Conclusions
12. ESL Assessment – Purposes
• identify strengths and weaknesses of individual students,
• adjust instruction to build on students’ strengths and alleviate weaknesses,
• monitor the effectiveness of instruction,
• provide feedback to students (sponsors, parents, etc.), and
• make decisions about the advancement of students to the next level of the program.
(Source: ESL Senior High Guide to Implementation, 2002)
13. Consider
• Research suggests that teachers spend from one-quarter to one-third of their professional time on assessment-related activities.
• Almost all do so without the benefit of having learned the principles of sound assessment.
(Stiggins, 2007)
15. Assessment literacy
• the kinds of assessment know-how and understanding that teachers need to assess their students effectively
• Assessment-literate educators should have knowledge and skills related to the basic principles of quality assessment practices
(SERVE Center, University of North Carolina, 2004)
16. Assessment Literacy
Know-how and understanding teachers need to assess students effectively and maximize learning
17. Importance of classroom assessment
• We may not like it, but students can and do ignore our teaching;
• however, if they want to get a qualification, they have to participate in the assessment processes we design and implement.
(Brown, S. 2004. Assessment for learning. Learning and Teaching in Higher Education, 1, 81–89)
22. Assessment framework
• the series of assessment tools (exams, tasks, projects, etc.) that are scored and used to arrive at a summative grade for a course
• it should be skills-based and knowledge-based (i.e. students demonstrate what they know about and can do with English)
• based on learning outcomes
23. • The spirit and style of student assessment defines the de facto curriculum.
(Rowntree, 1987)
de facto = existing in fact, actual, whether intended or not
32. A test . . .
• is a method of measuring a person’s ability, knowledge, or performance in a given domain.
• is an instrument – a set of techniques, procedures, or items – that requires performance on the part of the test-taker.
34. A test must measure
• Some tests measure general ability, while others focus on very specific competencies or objectives.
• Examples: a multi-skill proficiency test measures general ability; a quiz on recognizing correct use of definite articles measures very specific knowledge.
35. • A test measures performance, . . .
• but the results imply the test-taker’s ability, or competence.
36. • Performance-based tests sample the test-takers’ actual use of language,
• but from those samples the test administrator infers general competence.
37. • A well-constructed test is an instrument that provides an accurate measure of a test-taker’s ability within a particular domain.
• Constructing a good test is a complex task.
45. • Exploring how principles of language assessment can and should be applied to formal tests.
• These principles apply to assessment of all kinds.
• How to use these principles to design a good test.
46. • What are the ‘five cardinal criteria’ that can be used to design and evaluate all types of assessment?
60. 2. Reliability
• Is all work being consistently marked to the same standard?
61. • A reliable test is consistent and dependable.
• If you give the same test to the same student or matched students on two different occasions, the test should yield similar results.
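The test-retest idea above lends itself to a quick numerical check: give the same test twice and correlate the two sets of scores. The sketch below is illustrative only – the `pearson` helper and the student scores are invented for this example, not taken from the workshop materials.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical scores for the same five students on two sittings of one test
first_sitting = [72, 85, 64, 90, 78]
second_sitting = [70, 88, 66, 91, 75]

# A coefficient near 1.0 suggests the test ranks students consistently
print(f"test-retest estimate: r = {pearson(first_sitting, second_sitting):.2f}")
```

A coefficient near 1.0 indicates the two administrations rank students similarly; a low coefficient is a warning sign of unreliability.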
62. What factors contribute to the unreliability of a test?
63. Test unreliability – contributing factors
• student-related reliability
• rater reliability (inter-rater, intra-rater)
• test administration reliability
• test reliability
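Of the factors above, rater reliability is the easiest to estimate numerically: Cohen’s kappa measures how often two raters agree beyond what chance alone would produce. A minimal sketch, assuming invented pass/fail marks from two raters (the data are hypothetical, not from the workshop):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label at random
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail marks from two raters on the same ten essays
rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "pass", "fail", "fail", "fail",
           "pass", "pass", "fail", "pass", "pass"]

print(f"inter-rater agreement: kappa = {cohens_kappa(rater_1, rater_2):.2f}")
# → inter-rater agreement: kappa = 0.78
```

Kappa of 1.0 means perfect agreement and 0 means agreement no better than chance, which makes two raters’ marking standards directly comparable.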
69. • A valid test of reading ability . . .
• actually measures reading ability –
• not math skills,
• nor previous knowledge in a subject,
• nor writing skills,
• nor some other variable of questionable relevance.
71. Content validity
• If a test requires the test-taker to perform the behavior that is being measured . . .
• it can claim content-related evidence of validity (content validity).
• e.g. a test of a person’s ability to speak an L2 requires the student to actually speak within some sort of authentic context.
• A test with paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity.
72. Another way of understanding content validity is to consider the difference between direct and indirect testing.
• direct testing – involves the test-taker in actually performing the target task
• indirect testing – students not performing the task itself, but a related task (e.g. testing oral production of syllable stress)
76. Face validity
• The extent to which students view the assessment as:
1. fair
2. relevant
3. useful for improving learning
• Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure.
77. High face validity: the test . . .
• is well constructed, with an expected format and familiar tasks
• is clearly doable within the allotted time
• has items that are clear and uncomplicated
• has directions that are crystal clear
• has tasks related to course work (content validity)
• has a difficulty level that presents a reasonable challenge
82. Authenticity checklist
• Is the language in the test as natural as possible?
• Are topics as contextualized as possible rather than isolated?
• Are topics and situations interesting, enjoyable, and/or humorous?
• Is some thematic organization provided, such as through a story line or episode?
• Do tasks represent, or closely approximate, real-world tasks?
86. Washback
• Classroom assessment: the effects of an assessment on teaching and learning prior to the assessment itself (preparation).
• Another form of washback: the information that ‘washes back’ to students in the form of useful diagnoses of strengths and weaknesses.
• Formal tests provide no washback if students receive a simple letter grade or single overall numerical score.
87. A test that provides beneficial washback . . .
• positively influences what and how teachers teach
• positively influences what and how students learn
• offers learners a chance to adequately prepare
• gives learners feedback that enhances their language development
• provides conditions for peak performance by the learner
92. Answer. A ‘good’ test:
• can be given within appropriate administrative constraints,
• is dependable,
• accurately measures what you want it to measure,
• uses language representative of real-world language use, and
• provides information that is useful for the learner.
93. • These principles will help you make accurate judgments about the English competence of your students.
• They provide useful guidelines for evaluating existing tests, and designing your own.
94. Assessment Literacy
Know-how and understanding teachers need to assess students effectively and maximize learning
95. • There is no getting away from the fact that most of the things that go wrong with assessment are our fault,
• the result of poor assessment design – and not the fault of our students.
(Race et al., 2005)
96. • Improving student learning implies improving the assessment system.
• Teachers often assume that it is their teaching that directs student learning.
• In practice, assessment directs student learning, because it is the assessment system that defines what is worth learning.
(Havnes, 2004, p. 1)
97. (Boud & Falchikov, 2007, Rethinking Assessment in Higher Education)
• There is substantial evidence that assessment, rather than teaching, has the major influence on students’ learning.
• It directs attention to what is important, acts as an incentive for study, and has a powerful effect on students’ approaches to their work.
98. “We owe it to ourselves and our students to devote at least as much energy to ensuring that our assessment practices are worthwhile as we do to ensuring that we teach well.”
Dr. David Boud, University of Technology, Sydney, Australia