Language testing and the use of the Common European Framework of Reference for Languages
1. Language testing and the use of the
Common European Framework of
Reference for Languages (CEFR)
J Charles Alderson,
Department of Linguistics and English
Language,
Lancaster University
3. Test Principles and Grammatical Tense:
The Simple Past?
In many countries in Europe:
• Teacher knew best
• Having a degree in a language meant you
were an ‘Expert’
• Experience was all
• But 20 years' experience may be one year
repeated twenty times and is never checked
4. Past (?) European tradition
• Quality of important examinations not monitored
• No obligation to show that exams are fair, unbiased,
reliable, and measure relevant skills
• University degree in a foreign language qualifies one
to examine language competence, despite lack of
training in language testing
• In many circumstances merely being a native speaker
qualifies one to assess language competence.
• Teachers assess students’ ability without having been
trained in assessment.
5. Past (?) European tradition
Teacher-centred
Teacher develops the questions
Teacher's opinion the only one that counts
Teacher-examiners are not standardised
Assumption that the teacher-examiner makes
reliable and valid judgements
Authority, professionalism, reliability and validity of
teacher rarely questioned
Rare for students to fail
6. Psychometric tradition: Perfect?
Tests externally developed and administered
National or regional agencies responsible for
development, following accepted standards
Tests centrally constructed, piloted and revised
Difficulty levels empirically determined
External, trained assessors
Empirical equating to known standards or levels
of proficiency
7. Validity
• My parents think the test looks good.
• The test measures what I have been taught.
• My teachers tell me that the test is
communicative and authentic.
• If I take the X test instead of the Cambridge
FCE, I will get the same result.
• I got a good English test result, and I had no
difficulty studying in English at university.
8. Validity
Note: a test that is not reliable
cannot, by definition, be valid
• All tests should be piloted, and the results
analysed to see if the test performed as
predicted
• A test’s items should work well: they should
be of suitable difficulty, and good students
should get them right, whilst weak students
are expected to get them wrong.
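A minimal sketch of the kind of item analysis this slide points to, assuming dichotomously scored items (1 = correct, 0 = wrong). The pilot data and the function names `item_facility` and `discrimination_index` are hypothetical and for illustration only; the slides do not say which statistics or software were used.

```python
from statistics import mean


def item_facility(item_responses):
    """Proportion of candidates who answered the item correctly (0.0 to 1.0)."""
    return mean(item_responses)


def discrimination_index(item_responses, total_scores, fraction=1 / 3):
    """Item facility in the top-scoring group minus facility in the bottom group.

    A clearly positive value means strong candidates tend to get the item
    right and weak candidates tend to get it wrong.
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Candidate indices ordered from lowest to highest total test score.
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    bottom, top = ranked[:group_size], ranked[-group_size:]
    top_facility = mean(item_responses[i] for i in top)
    bottom_facility = mean(item_responses[i] for i in bottom)
    return top_facility - bottom_facility


# Hypothetical pilot data: one entry per candidate, in the same order.
item = [1, 1, 1, 0, 1, 0, 0, 0, 1]
totals = [38, 35, 33, 30, 27, 22, 18, 15, 12]

print(f"facility: {item_facility(item):.2f}")
print(f"discrimination: {discrimination_index(item, totals):.2f}")
```

An item with facility near 0 or 1, or with discrimination near zero or negative, would be a candidate for revision or rejection after piloting.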
9. Reliability
• If I take the test again tomorrow, will I get the same
result?
• If I take a different version of the test, will I get the
same result?
• If the test had had different items, would I have got
the same result?
• Do all markers agree on the mark I got?
• If the same marker marks my test paper again
tomorrow, will I get the same result?
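Two of these questions can be put into numbers. The sketch below assumes Cronbach's alpha as the estimate of internal consistency ("if the test had had different items") and simple exact agreement as the check on markers; both are common choices, not methods named in the slides, and all data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(score_matrix):
    """Internal-consistency estimate.

    score_matrix has one row per candidate and one column per item.
    """
    k = len(score_matrix[0])
    item_variances = [
        pvariance([row[j] for row in score_matrix]) for j in range(k)
    ]
    total_variance = pvariance([sum(row) for row in score_matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def exact_agreement(marks_a, marks_b):
    """Share of scripts on which two markers awarded the same mark."""
    same = sum(a == b for a, b in zip(marks_a, marks_b))
    return same / len(marks_a)


# Hypothetical responses: 5 candidates, 4 items scored 1/0.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical marks given by two markers to the same four scripts.
marker_1 = [3, 4, 2, 5]
marker_2 = [3, 4, 3, 5]

print(f"alpha: {cronbach_alpha(scores):.2f}")
print(f"marker agreement: {exact_agreement(marker_1, marker_2):.2f}")
```

Exact agreement is the crudest possible index of marker consistency; it takes no account of agreement expected by chance.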
10. Practicality
• Number of tests to be produced
• Length of test in time
• Cost of test
• Cost of training
• Cost of monitoring
• Difficulty in piloting/ pre-testing
• Time to report results
11. Washback
• Test can have positive or negative effects
• Test can affect content of teaching
• Test can affect method of teaching
• Test can affect attitudes and motivation
• Test can affect all teachers and students in
same way, or individuals differently
• Importance of test will affect washback
12. WASHBACK
Testing is too important to be left to the
teacher
Testing is too important to be left to the
tester
Both are needed, to reflect and influence
teaching, validly and reliably.
14. Present Tense / Tension:
Practice vs. Principles
Teacher-based assessment vs central development
Internal vs external assessment
Quality control of exams vs. no quality control
Piloting or not
Test analysis and the role of the expert
The existence of test specifications – or not
Guidance and training for test developers and
markers – or not
15. Exam Reform in Europe
(mainly school-leaving exams)
• Slovenia
• The Baltic States
• Hungary
• Russia
• Slovakia
• Czech Republic
• Poland
• Germany
• Austria
16. Hungarian English Exams Reform Teacher
Support Project
• Project philosophy:
“The ultimate goal of examination reform is to
encourage, to foster and to bring about
change in the way language is taught and
learned in Hungary.”
17. Achievements of English Exam Reform
Teacher Support Project
– Trained item writers, including class
teachers
– Trained teacher trainers and disseminators
– Developed, refined and published Item
Writer Guidelines and Test Specifications
– Developed a sophisticated item production
system
18. Achievements of English Exam Reform
Teacher Support Project
• In-service courses for teachers in modern test
philosophy and exam preparation
– Modern Examinations Teacher Training (60 hrs)
– Assessing Speaking at A2/B1 (30 hrs)
– Assessing Speaking at B2 (30 hrs)
– Assessing Writing at A2/B1 (30 hrs)
– Assessing Writing at B2 (30 hrs)
– Assessing Receptive Skills (30 hrs)
19. Achievements of English Exam Reform
Teacher Support Project
– Developed sets of rating scales and trained
markers
– Developed Interlocutor Frame for speaking
tests and trained interlocutors
– Items / tasks piloted, IRT-calibrated and
standard set to CEFR using DIALANG
procedures
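As an illustration of what "IRT-calibrated" means, the sketch below assumes the one-parameter (Rasch) model, in which the chance of a correct answer depends only on the gap between candidate ability and item difficulty. The slides do not state which IRT model or software the project used, and the values below are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model.

    Ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical calibrated item difficulties, in logits.
item_difficulties = {"easy item": -1.5, "medium item": 0.0, "hard item": 1.5}

candidate_ability = 0.0  # hypothetical candidate of average ability
for name, difficulty in item_difficulties.items():
    p = rasch_probability(candidate_ability, difficulty)
    print(f"{name}: P(correct) = {p:.2f}")
```

Once items sit on a common difficulty scale, cut-off points on that scale can be related to CEFR levels through a standard-setting procedure.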
20. Achievements of English Exam Reform
Teacher Support Project
• Into Europe series: textbook series for test
preparation:
– many calibrated tasks
– explanations of rationale for task design
– explanations of correct answers
– CDs of listening tasks
– DVDs of speaking performances
21. Into Europe
Reading + Use of English
Writing Handbook
Listening + CDs
Speaking Handbook + DVD
All downloadable for free from
http://www.lancs.ac.uk/fass/projects/examreform
22. Testing cycle
• Item Writer Training
• Test specification
• Text mapping
• Task development
• Peer review
• Expert review
• Trial 1
• Central Correction
• Statistical Analysis
• Revision - Rejection
• Trial 2
• Central Correction
• Statistical Analysis
• Banking - Rejection
• Standard setting
• Live administration
• Marking support
• Post-test analysis
23. Good tests and assessment,
following professional practice, cost
money and time
But
Bad tests and assessment,
ignoring professional practice,
waste money, time and LIVES
24. Use and abuse of
the Common
European Framework of
Reference for Languages:
Learning, teaching and
assessment (CEFR)
25. Hands up!
• Who owns a copy of the CEFR – the Blue
Book?
• Who has read it?
• Who is familiar with its contents?
• Who has already heard of the CEFR?
26. Outline
• Background
• Uses in various contexts
• Advantages
• Limitations
• Misuse
• Improvement and development
27. Background
• 1970s work encouraged by the Council of
Europe
• Notional-functional syllabus (Wilkins, Morrow)
– Threshold
– Waystage
– Vantage
– Learning target specifications
• 1996
• 2001
28. CEFR: comprehensive, non-
prescriptive, reflection tool
Common reference points + Common metalanguage
Relevant to objectives + progress + outcomes
Descriptive scheme / chapters + Common reference
levels / scales
Tool for reflection
29. CEFR: comprehensive, non-
prescriptive, reflection tool
• Guides for users
• Compendium of case studies
• CEFR Tool kit
• CDs for Reading and Listening
• DVDs for Speaking
• Dutch Grid for Reading and Listening
• Grids for Writing and Speaking
• Manual for relating exams to the CEFR 2003, 2009
(standard-setting)
30. Descriptive scheme: ‘action-oriented’
Users as social agents: «members of society who
have tasks to accomplish in a given set of
circumstances in a specific environment and
within a particular field of action»
General competences (knowledge, skills,
existential competence; ability to learn)
Communicative language competences (linguistic,
pragmatic, sociolinguistic and sociocultural)
31. Descriptive scheme: ‘action-oriented’
• Dimensions of communicative language
competence:
– general linguistic range, vocabulary range,
vocabulary control, grammatical accuracy,
phonological control, sociolinguistic
appropriateness, flexibility, turn-taking,
thematic development, coherence and
cohesion, spoken fluency, propositional
precision
32. Uses in various contexts
• Case studies 2002 and 2004
• Intergovernmental Language Policy Forum,
2007:
– “The clear success of the CEFR has significantly
changed the context in which language teaching
and assessment of language learning outcomes
now take place in Europe”
• Martyniuk and Noijons Survey, 2006
33. Uses in various contexts
• The usefulness of the CEFR rated at 2.44 on a
0-3 scale
• The CEFR most useful in the domains of
testing / assessment / certification (2.70 on a
0-3 scale) and curriculum / syllabus
development (2.66 on a 0-3 scale)
• Institutionally, the CEFR most useful for
examination providers (2.88 on a 0-3 scale)
34. Uses in various contexts
• Curriculum development
– Varying impact
• Teacher education/training
– Wide spectrum of use
– Useful for defining proficiency of teachers
• Testing and assessment
– Support for a common reference
– CEFR-based examinations attempted in most
countries
35. EALTA’s Guidelines for Good Practice
1. What evidence is there of the quality of the
process followed to link tests and
examinations to the Common European
Framework?
2. Have the procedures recommended in the
Manual and the Reference Supplement been
applied appropriately?
3. Is there a publicly available report on the
linking process?
36. Example use of CEFR:
DIALANG
A European System
for
On-line
Diagnostic Language Assessment
37. What is DIALANG?
• Computer-based diagnostic language testing
system
• 14 European languages
• Delivers tests across the Internet
• Supports language learners
• Institutional or private use, free of charge
• Still widely used throughout Europe and
beyond, 8 years after launch
38. COUNCIL OF EUROPE
• DIALANG is an application of the Common
European Framework of Reference
• DIALANG uses
– Common European Framework
– scales
– self-assessment statements (modified)
• DIALANG provides some evidence of their
validity
39. PURPOSE
• to provide language users and learners with
diagnostic information about their strengths
and weaknesses and to help them to find ways
of improving their proficiency
40. INNOVATIVE ASPECTS
• first large-scale system for diagnosis /
feedback rather than certification
• on-line, Internet-delivered, universally
available, not restricted to a particular place
or time
• first implementation of CEFR in tests
• first attempt at standard-setting – empirically
relating test items and sections to the CEFR
41. ASSESSMENT PROCEDURE
1. Client enters DIALANG
2. Vocabulary Size Placement Test
3. Selection of section: reading, writing, listening, structures, vocabulary
42. ASSESSMENT PROCEDURE
4. Self-assessment
5. Responding to tasks
6. Feedback
7. Selection: EXIT (Goodbye!) or another section / language
43. SECTIONS
• Reading Comprehension (CEFR)
• Listening Comprehension (CEFR)
• Writing (CEFR)
• Structures
• Vocabulary
• no overall section (nor grade & feedback)
• from beginners to advanced
44. LANGUAGES
• Danish • Icelandic
• Dutch • Irish
• English • Italian
• Finnish • Norwegian
• French • Portuguese
• German • Spanish
• Greek • Swedish
45. Feedback
• VSPT
– score band and description
• results (and self-assessment)
– CEFR scales and report on self assessment
• explanatory feedback
– Why self-assessment may not match test result
• advisory feedback
– What you can do and how to progress, based on CEFR
• item review
46. Example use of CEFR:
Standardisierte Reifeprüfung
The current Austrian Matura:
– Only one examiner: the class teacher
– Teachers set tasks for their own students
– Teachers mark the essays with whatever criteria
they wish
– No central training, no central monitoring
– No piloting
– No post-test analysis
47. The Reform
• Began in 2007, obligatory use by law in 2014/15
• Parallel reforms, coordinated by University of
Innsbruck, in English, French, Spanish, Italian,
Latin and Greek.
• First foreign language (English) aims at CEFR B2 in
Listening, Reading and Language Use (The Written
Examination)
• Second foreign languages (French, Italian,
Spanish) 6-year and 4-year courses, targeted next
(for 6-year courses, B2 except for Listening and
Writing = B1. For 4-year courses, target is B1).
48. The Reform
• Rolling reform, first with 59 schools in 2008,
gradually spreading as schools or teachers
volunteer for the new standardised Written
Exam tasks.
• Spring 2011, 300+ gymnasia volunteered for
tests in Reading, Listening and Language in Use
in English, French, Italian or Spanish
• Standardised Written Exam obligatory for all
gymnasia in 2014 and for all vocational schools
in 2015
• See http://uibk.ac.at/srp/
49. Advantages of the CEFR
• European: not American, Australian or British
• Relevant to much more than testing and
assessment
• Widely accepted
• Levels frequently cited: A common currency
50. Advantages of the CEFR
• The CEFR claims to be comprehensive;
• “...it should attempt to specify as full a range
of language knowledge, skills and use as
possible…and all users should be able to
describe their objectives, etc., by reference to
it”. (Council of Europe, 2001: 7).
51. Advantages of the CEFR
• Research-based: teachers’ perceptions of
levels and progression, Rasch-scaled
• Descriptive Scheme and Illustrative Scales
• Intended to enhance transparency in language
education, mutual understanding and thus to
encourage mobility
52. Advantages of the CEFR
• Point of reference, not an instrument of
coercion, nor for accountability
• Nevertheless, a force for change and
innovation, especially in testing and
assessment
• e.g. European Language Portfolio, DIALANG,
school-leaving exam reforms
53. Limitations of the CEFR
• Not enough information for test development
– DIALANG experience
• Lack of specificity as to how language
proficiency develops
• No reference to specific languages - but see
reference level descriptions:
www.coe.int/t/dg4/linguistic/DNR_EN.asp
54. Limitations
• Limited empirical research to underpin
• Based on teachers’ opinions / perceptions
about the level of the descriptors and on that
of some of their learners
• No theoretical basis
• Draws on Waystage, Threshold, Vantage, etc
but these documents are barely different from
each other
55. Limitations
• All too frequently couched in language that is not
easy to understand, often vague, undefined and
imprecise
• Has needed a plethora of accompanying documents
to help users: The Manual, now in revised form; The
Reference Supplement; Guidance on conducting
case studies, the Tool Kit CDs and DVDs, and still
users request more teacher training, simpler
versions, more illustrative performances, etc, etc
56. USE and MISUSE
• CEF R
• Yet politicians legislate levels for school-leaving (A2,
B1, B2), for University graduation (C2!), for migration
(A1 minus to B1), for citizenship (A1 to B2)
• How to establish the appropriacy of a level?
• How to engage politicians in a debate about “levels”?
59. ‘Destination B2 is the ideal
grammar and vocabulary
practice book for all
students preparing to
take a B2 level exam, for
example the Cambridge
FCE examination.
Key Features:
A well researched
grammatical and lexical
syllabus based on the B2
(Vantage) level of the
Council of Europe’s
Common European
Framework’
60. Claims about links with the CEFR and
reality
• Importance of CEFR in testing, training,
publishing and curricula
• Many claims of links to CEFR
• How many claims are empirically based?
• Who monitors the quality of the claims?
– Council of Europe?
– ALTE?
– Self-monitoring?
61. Results of 2006 Survey
Curriculum development
Need for further dissemination, guidance and training
Need to develop additional level specifications, descriptors
and scales
Need for plans to relate curricula and/or textbooks to the
CEFR empirically
Teacher education/training
Need for more dissemination, guidance and training
Need for co-operation at international level
Testing and assessment
Complexity of relating tests to the CEFR levels
Need for more guidance and training
63. Problems with the CEFR
• Terminology problems: synonymy or not?
• Inconsistency?
• Lack of definition
• Gaps
64. Terminology problems: synonymy ?
Operations at A2 Operations at B2
• Understand • Understand
• Take • Scan
• Get • Monitor
• Follow • Obtain
• Identify • Select
• Infer • Evaluate
• Locate
• Identify
65. Inconsistency?
• “I can understand familiar names, words and
very simple sentences, for example on notices
and posters or in catalogues” (page 26)
• “Can recognise familiar names, words and
very basic phrases on simple notices in the
most common everyday situations” (page 70)
66. Lack of definitions
• Simple, the most common, everyday, familiar,
concrete, predictable, straightforward, factual,
complex, specialised, highly colloquial, short,
long
• Is a short text necessarily “easier” than a
longer text?
67. Gaps in the CEFR
• The Task: what is it that candidates have to do
with text?
• Test methods and the processing demands
they create
• CEFR is NOT a test specification
68. Gaps: Processes of comprehension
• Focus on and retrieve explicitly stated
information
• Make straightforward inferences
• Interpret and integrate ideas and information
• Examine and evaluate content, language and
textual elements
69. Intergovernmental Forum
• Language of CEFR needs simplifying
• Training essential to avoid oversimplifications
• Need to ensure the quality of the
implementation of the CEFR
• How to avoid prescriptive use of CEFR and the
scales?
• Need for international networks and training
to ensure proper application in assessment
and curricula
• Importance of national, regional and local
contexts and their needs when applying the
CEFR
70. Improvement and development
More research needed into the development of
language proficiency as learners progress through
the levels of the CEFR
Design and construction of learner language corpora
linked to the CEFR, based on standardised tasks
Investigation of instruction aimed at the different CEFR
levels
Diagnosis of learner strengths and weaknesses at the
different CEFR levels
Revision and (further) supplementation of the CEFR
71. Some issues
• How does L2 proficiency develop?
• What are the linguistic features that
characterise CEFR levels?
• How are the abstract constructs in the CEFR to
be operationalised?
• What and how do teachers teach at the
various CEFR levels?
72. Some issues
The design of tasks to measure development of
language proficiency
1. How can we ensure that we elicit target
language features?
2. How can we check both what the learners are
able to do and also what they freely choose to
do?
3. How can we ensure that tasks at a given CEFR
level are parallel? Is my B1 your B1?
4. We need banks of validated reading and
listening tasks to illustrate CEFR levels
73. Will the future be perfect?
There will probably always be misuse of the
CEFR
Politicians will probably always lack assessment
literacy
Governments will always want simple
(simplistic) solutions to complex problems
But relevant research is ongoing
The CEFR can be improved
The Council of Europe might publish a revised
second edition of the CEFR
75. Thank you for your attention!
c.alderson@lancaster.ac.uk