2. DEFINITION:
The process of making a judgment or forming an
opinion, after considering something or someone
carefully.
PURPOSE:
Inform people about their progress in learning.
10. Alderson and Wall (1993) list some «Washback Hypotheses»:
Tests will have washback on what teachers teach (content).
Tests also have an impact on how teachers teach (methodology).
High-stakes tests (tests with important consequences) would have more impact than low-stakes tests.
11. Alderson and Hamp-Lyons (1996) show that teachers
may indeed change the way they teach when teaching
towards a test (in this case, the TOEFL, the Test of
English as a Foreign Language).
They show that the nature of the change and the
methodology adopted vary from teacher to teacher,
a finding supported by Watanabe (1996).
12. Shohamy et al. (1996) show that the nature of
washback varies according to factors such as the status
of the language being tested and the uses of the test.
Watanabe concludes that washback results from a
complex interplay between the test and the test
taker.
He emphasises that what may be most important is
not the objective difficulty of the test, but the students'
perception of difficulty.
13. Wall summarises research findings which show that test
design is only one of the factors affecting washback, and
lists the other factors that influence the nature of test washback.
She makes a number of recommendations about the steps
that test developers might take in the future in order to
assess the amount of risk involved in attempting to bring
about change through testing, for example by
assessing the feasibility of examination reform through
studying the 'antecedent' conditions, in what is
increasingly referred to as a 'baseline study'.
14. Policy makers should be aware that tests on their own
will not have a positive impact if the materials and
practices they are based on have not been effective.
15. Messick (1994) argues that all testing involves
making value judgements, and therefore language
testing is open to a critical discussion of whose
values are being represented and served.
Spolsky (1997) points out that tests and examinations have always
been used as instruments of social policy and control, with the
gatekeeping function of tests often justifying their existence.
Shohamy (1997a) argues that uses of tests which exercise control
over and manipulate stakeholders, rather than providing information on
proficiency levels, are also unethical, and she advocates the
development of "critical language testing".
16. A number of case studies have been presented recently
which illustrate the use and misuse of language tests.
Hawthorne (1997) describes two examples of the
misuse of language tests: the use of the access test to
regulate the flow of migrants into Australia, and the
step test, allegedly designed to play a central role in
determining asylum seekers' residential status.
Norton and Starfield (1997) argue that criteria for
assessment should be made explicit and public if
testers are to behave ethically.
17. The International Language Testing Association
(ILTA) has recently developed a Code of Ethics (rather
than finalising its draft Code of Practice), which is
'a set of principles which draws upon moral philosophy
and strives to guide good professional conduct'.
This Code is clear: testers should follow ethical
practices, and have a moral responsibility to do so.
18. Tests are frequently used as instruments of
educational policy, and they can be very powerful, as
Shohamy (2001a) attests.
Brindley (1998, 2001) describes the political use of test-based
assessment for reasons of public accountability,
often in the context of national frameworks, standards
or benchmarking.
Politics can be defined as action, or activities, to
achieve power or to use power, and as beliefs about
government, attitudes to power, and to the use of
power.
19. National educational policy often involves innovations
in testing in order to influence the curriculum, or in order
to open up or restrict access to education and employment.
Politics can be seen as methods, tactics, intrigue and
manoeuvring within institutions which are themselves
not political, but commercial, financial and educational.
21. EUROPEAN COMMON FRAMEWORK
Levels of proficiency
Level certified by public examinations
Guarantee of educational and employment mobility
22. The development of national language tests continues
to be the focus of many publications, although many
are either simply descriptions of test development or
discussions of controversies, rather than reports on
research done in connection with test development.
Page (1993) argues that the use of the target language
in questions makes it more difficult to sample the
syllabus adequately, and claims that the more
communicative and authentic the tasks in
examinations become, the more English (the mother
tongue) has to be used on the examination paper in
order to safeguard both the validity and the
authenticity of the task.
23. In the Netherlands, Jansen and Peer (1999) report a study
of the recently introduced use of dictionaries in Dutch
foreign language examinations and show that dictionary use
does not have any significant effect on test scores.
Pupils are very positive about being allowed to use
dictionaries, claiming that it reduces anxiety and
enhances their text comprehension.
Guillon (1997) recommends that more open-ended tasks
be used, and that teachers be trained in the reliable use
of valid criteria for subjective marking, instead of their
current practice of merely counting errors in production.
24. Language testing can inform debates in language
education more generally.
Washback studies have also been used in teacher
training, both to influence test preparation
practices and to encourage teachers to reflect on
the reasons for their own and others' practices.
25. Douglas (1997, 2000) identifies two aspects that typically
distinguish LSP testing from general purpose testing.
a) the authenticity of the tasks;
b) the interaction between language knowledge and specific
content knowledge.
The development of an LSP test typically begins with an
in-depth analysis of the target language use situation, perhaps
using genre analysis (see Tarone, 2001).
Attention is paid to general situational features such as
topics, typical lexis and grammatical structures.
26. Douglas (2000) stands firmly by claims made much
earlier in the decade that in highly field-specific
language contexts, a field-specific language test is a
better predictor of performance than a general purpose
test (Douglas & Selinker, 1992).
27. Computer-based testing has grown rapidly in the past
decade, and computers are now used to deliver language
tests in many settings.
A computer-based version of the TOEFL was introduced on a
regional basis in the summer of 1998, tests are now available
on CD-ROM, and the Internet is increasingly used to deliver
tests to users.
Computers can be used at all stages of the test development
and administration process.
The commonest use of computers in language testing is to
deliver tests adaptively (e.g., Young et al., 1996).
29. The introduction of self-assessment was viewed
as promising by many, especially in formative assessment
contexts (Oscarson, 1989).
It was considered to encourage increasing sophistication in
learner awareness, helping learners to:
A) gain confidence in their own judgement.
B) acquire a view of evaluation that covers the whole
learning process.
C) see errors as something helpful.
It was also seen to be potentially useful to teachers,
providing information on learning styles, on areas needing
remediation, and feedback on teaching (Barbot, 1991).
30. Carton (1993) discusses how self-assessment can
become part of the learning process.
In general, studies have found self-assessment to
be a robust method for gathering information about
learner proficiency, and the risk of cheating to be low
(see Barbot, 1991).
31. Alternative assessment is usually taken to mean assessment procedures which are less formal than
traditional testing, which are gathered over a period of time rather than being
taken at one point in time, which are usually formative
rather than summative in function, are often
low-stakes in terms of consequences, and are claimed
to have beneficial washback effects. Although such
procedures may be time-consuming and not very
easy to administer and score, their claimed advantages
are that they provide easily understood information,
they are more integrative than traditional tests and
they are more easily integrated into the classroom.
McNamara (1998) makes the point that alternative
assessment procedures are often developed in an
attempt to make testing and assessment