Chapters from Hughes’ Testing for Language Teachers
8. Common Test Techniques: Elaine, 24th
9. Testing Writing: Marta, Idoia, 22nd
10. Testing Oral Abilities: Paula, Ángela, 24th
11. Testing Reading: Lucía, 24th
12. Testing Listening: Lorena, 22nd
13. Testing Grammar and Vocabulary: Clara, Cristina, 22nd
14. Testing Overall Ability: Jefferson, 22nd
15. Tests for Young Learners: Tania, Diego, 24th
  • Common Test Techniques

    We need techniques which:
    - will elicit behaviour which is a reliable and valid indicator of the ability in which we are interested;
    - will elicit behaviour which can be reliably scored;
    - are as economical of time and effort as possible;
    - will have a beneficial backwash effect, where this is relevant.

    MULTIPLE CHOICE

    Advantages:
    Reliable
    Economical
    Good for receptive skills
    (It used to be seen as the perfect, almost the only, way to test)

    Disadvantages:

    Only for recognition
    Guessing may have a considerable but unknowable effect
    The technique severely restricts what can be tested
    It is very difficult to write successful items
    Washback may be harmful
    Cheating may be facilitated

    YES/NO TRUE/FALSE ITEMS
    Essentially multiple choice, but with a 50% chance of getting the item right by guessing. Acceptable in class activities, but not appropriate in real testing.
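The guessing problem above can be quantified. One common remedy (not prescribed by Hughes; added here as a sketch) is the classic correction-for-guessing formula, under which blind guessing has an expected gain of zero:

```python
# Correction-for-guessing sketch: corrected = right - wrong / (k - 1),
# where k is the number of options per item.

def corrected_score(right: int, wrong: int, options: int) -> float:
    """Penalize blind guessing so its expected payoff is zero."""
    return right - wrong / (options - 1)

# True/false (k = 2): a candidate guessing all 20 items expects
# 10 right and 10 wrong, so the corrected score is 10 - 10/1 = 0.
print(corrected_score(10, 10, 2))   # 0.0
print(corrected_score(15, 5, 4))    # four-option multiple choice
```

This is only one of several penalty schemes; many modern tests simply accept the guessing noise instead.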

    SHORT-ANSWER ITEMS

    Advantages:
    Less guessing
    No need for distractors
    Less cheating
    Items are easier to write

    Disadvantages:
    Responses may take longer
    The test taker has to produce language (a mixture of skills in a receptive test) (TRY TO MAKE RESPONSES REALLY SHORT)
    Judging may be required (less validity or reliability)
    Scoring may take longer (SOLUTION: MAKE THE REQUIRED RESPONSE UNIQUE)

    GAP-FILLING ITEMS: very similar to short-answer items

  • Set representative tasks
    Specify all possible content (in the specifications)
    Include a representative sample of the specified content (in the test)

    Elicit valid samples of writing ability
    Set as many separate tasks as feasible
    Test only writing ability and nothing else (not creativity, imagination, etc.; no extra-long instructions with complicated reading)
    Restrict candidates

    Ensure valid and reliable scoring:
    Set as many tasks as possible
    Restrict candidates
    Give no choice of tasks
    Ensure long enough samples
    Avoid taboo topics
    Create appropriate scales for scoring: HOLISTIC/ANALYTIC (see examples). Holistic scales are good when there are many scorers. Analytic scales may give equal or unequal weight to the different parts; their main disadvantage is that they are time-consuming, and if too much attention is paid to the parts, one may forget the general impression. IMPORTANT POTENTIAL FOR WASHBACK.
    Calibrate the scale to be used (collect samples, choose representative ones, and use them as reference points; this is called “benchmarking”)
    Select and train scorers
    Follow acceptable scoring procedures: benchmarking, two scorers (and a third, senior one for discrepancies), statistical analysis
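The scoring procedure above (a weighted analytic scale, two scorers, and a senior scorer for discrepancies) can be sketched in code. The criterion names, weights, and discrepancy threshold below are illustrative assumptions, not figures from Hughes:

```python
# Sketch: analytic scale with (possibly unequal) weights, two scorers,
# and a senior scorer who resolves large discrepancies.

WEIGHTS = {"content": 0.3, "organisation": 0.2, "vocabulary": 0.25, "grammar": 0.25}
DISCREPANCY = 1.0  # largest tolerated gap between the two weighted totals

def weighted_total(ratings):
    """Combine per-criterion ratings into one weighted score."""
    return sum(WEIGHTS[c] * r for c, r in ratings.items())

def final_score(rater_a, rater_b, senior=None):
    """Average the two scorers; defer to a senior scorer if they diverge."""
    a, b = weighted_total(rater_a), weighted_total(rater_b)
    if abs(a - b) <= DISCREPANCY:
        return (a + b) / 2
    if senior is None:
        raise ValueError("discrepancy: refer the script to a senior scorer")
    return weighted_total(senior)

a = {"content": 4, "organisation": 3, "vocabulary": 4, "grammar": 3}
b = {"content": 4, "organisation": 4, "vocabulary": 3, "grammar": 3}
print(final_score(a, b))
```

Benchmarked sample scripts would normally anchor what a rating of 3 or 4 means on each criterion.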

  • “The most highly prized language skill”, a source of cultural capital (Lado’s Language Testing, 1961). However, it has not always been properly assessed.
    Challenges: speech is ephemeral and intangible, assessment is simultaneous, and the situation is very stressful. Partial solutions: recording it, and also sound waves, spectrographs.
    Features of oral speech: inaccuracy, unfinished sentences, less precision, generic vocabulary, pauses.
    Some tests (TOEFL in particular) have a long history of ignoring it: TOEFL only included speaking in 2005, with the iBT. Contrast this with the Cambridge Certificate of Proficiency in English (1913), which already included it. However, Grammar-Translation approaches ignored it almost completely. Kaulfers (1944) created the first scales used to assess oral proficiency, designed for the military abroad.
    Key notion: not accent, but intelligibility (the ease or difficulty with which a listener understands L2 speech). You can be highly intelligible with a non-native accent. It is only when the accent interferes with a learner’s ability to be understood that it should be considered in speaking scales.
    Very different approaches:
    Indirect (multiple choice as an indicator; not really valid or reliable)
    Direct or semi-direct (responding to stimulus from a computer: TOEFL iBT, OTE, Aptis). Problems: raters and rating scales (which oversimplify the complexity of oral speech). Despite the practical challenges, these are the only valid formats for assessing L2 speech today. They conflict with the American “psychometrically influenced assessment tradition”, which focuses on the technical (statistical) reliability of test items (multiple choice) and the most administratively feasible test formats and item types in the context of large-scale, high-stakes tests (GRE?).
    The future? Fully automated L2 speaking tests: the Versant test, SpeechRater. Automatic scoring systems (measuring grammatical accuracy, lexical frequency, acoustic variables, temporal variables).
    Not only speaking, but also interaction (listening and speaking): Cambridge included interaction in 1996. Washback effect (usual practice in class: pairwork, groupwork). Problems: peer interlocutor variables (L2 proficiency, L1 background, gender, personality, etc.). Solutions: more tasks.
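The temporal variables mentioned for automatic scoring can be illustrated with a toy calculation. The timed-transcript input format below is an assumption for illustration only, not the actual Versant or SpeechRater interface:

```python
# Sketch of two temporal fluency measures an automatic scoring system
# might compute from a transcript with word-level timestamps.

def temporal_measures(segments, total_time):
    """Speech rate (words per minute) and pause ratio from timed words.

    segments: list of (word, start_sec, end_sec) tuples.
    total_time: total response duration in seconds.
    """
    words = len(segments)
    speaking_time = sum(end - start for _, start, end in segments)
    return {
        "speech_rate_wpm": words / total_time * 60,
        "pause_ratio": 1 - speaking_time / total_time,  # share of silence
    }

segments = [("the", 0.0, 0.3), ("test", 0.5, 0.9),
            ("was", 1.2, 1.4), ("easy", 1.6, 2.1)]
print(temporal_measures(segments, total_time=2.5))
```

Real systems combine many more variables (lexical frequency, grammatical accuracy, acoustic features) into a trained scoring model.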






  • Set representative tasks
    Specify all possible content
    Include a representative sample of the specified content
    Elicit valid samples of oral ability.
    Techniques:
    Interview (the candidate may feel intimidated): Questions, pictures, role play, interpreting (L1 to L2), prepared monologue, reading aloud
    Interaction with fellow candidates: discussion, roleplay
    Responses to audio- or video-recordings (semi-direct)
    Plan and structure the test carefully
    Make the oral test as long as it is feasible
    Plan the test carefully
    As many tasks (“fresh starts”) as possible
    Use a second tester
    Set only tasks that candidates could do easily in L1
  • Quiet room with good acoustics
    Put candidates at ease (at first, easy questions, not assessed, problem with note-taking?)
    Collect enough relevant information
    Do not talk too much
    (select interviewers carefully and train them)

    Ensure valid and reliable scoring:
    Create appropriate scales for scoring: HOLISTIC/ANALYTIC. Used as a check on each other
    Calibrate the scale to be used
    Select and train scorers (different from interviewers if possible)
    Follow acceptable scoring procedures

  • PROBLEMS:
    Indirect assessment: the exercise of receptive skills does not manifest itself directly. We need an instrument.
    We read in very different ways: scanning, skimming, inferring, intensive, extensive reading… All of them should be specified and tested

    SOME TIPS
    As many texts and operations as possible (Dialang). (Time limits for scanning or skimming?)
    Avoid texts which deal with general knowledge (answers will be guessed)
    Avoid disturbing topics, or texts students might have read
    Use, as much as possible, authentic texts
    Techniques: short answer and gap filling are better than multiple choice. Also information transfer.
    Task difficulty can be lower than text difficulty
    Items should follow the order of the text
    Make items independent of each other
    Do not take into account errors of grammar or spelling
    Instructions: easy to understand, even in L1
    If the stakes are higher, trialling/piloting is essential
    Provide an example of the task
  • Similar PROBLEMS to reading:
    Indirect assessment: the exercise of receptive skills does not manifest itself directly. We need an instrument.
    We listen in very different ways: scanning, skimming, inferring, intensive, extensive listening… All of them should be specified and tested.
    And to speaking:
    The transient nature of speech. Anxiety! (Everything happens in real time: no re-reading, no stopping or slowing down.)

    http://www.usingenglish.com/articles/why-your-students-have-problems-with-listening-comprehension.html

    Similar tips to those for reading (go back to the list)
    If a recording is used, make it as natural as possible (with typical spoken redundancy). Don’t read aloud written texts.
    Items should be far apart in the text (so candidates have time to write the answers down)
    Give students time to become familiar with the tasks
    Write items after listening to the recording, not while looking at the script
    Techniques: apart from multiple choice, short answers and gap filling: information transfer (draw a map of the accident), note-taking, partial dictation (problem: do you consider spelling?), transcription (spelling names, numbers: a real-life task)
    Moderation (more teachers, trialling) is essential
    How many times should the text be played? Usually twice, never three times
  • GRAMMAR:
    Why? Easy to test, Content validity: more than in any of the skills (Skills: we just cover a few of the topics, or operations from the specifications. Grammar: we can cover many more items)
    Why not? Harmful washback effect
    Maybe not in proficiency tests, but, if grammar is taught (and it almost always is), it should be included in achievement, placement and diagnostic tests. However, because of the potential harmful washback effect, it should not be given too much (percentage) prominence.
    Specifications: from the Council of Europe books (Threshold, etc.)
    Techniques: Gap filling, rephrasings, completion
    Don’t penalize for mistakes that were not tested (-s if the item is testing relatives, for example)

    VOCABULARY
    Why (not)? Similar arguments as for grammar.
    Specifications: use frequency considerations (COBUILD dictionaries)
    Techniques:
    Recognition: recognise synonyms, recognise definitions, recognise the appropriate word for a context
    Production: pictures, definitions, gap filling
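Frequency-based vocabulary specification can be sketched as follows. The miniature "corpus" here is only a stand-in for a real frequency list such as those behind the COBUILD dictionaries:

```python
# Sketch: rank words in a reference corpus by frequency, then draw
# candidate test items from a chosen frequency band.

from collections import Counter
import re

corpus = """The teacher gave the class a short reading test.
The test was short, and the class enjoyed the reading."""

tokens = re.findall(r"[a-z]+", corpus.lower())
freq = Counter(tokens)

# Words ranked from most to least frequent
ranked = [w for w, _ in freq.most_common()]
print(ranked[:5])

# Sample candidate items from outside the top-N most frequent words,
# e.g. to target vocabulary beyond the highest-frequency band.
top_n = 3
candidates = ranked[top_n:]
print(candidates)
```

A real specification would use frequency bands from a large corpus, not raw counts from a toy text.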
  • Special techniques which are more useful in tests where washback is not important: placement tests, for example

    Types:

    Cloze test (from “closure”). Based on the idea of “reduced redundancy”. Texts are always redundant; if we reduce the redundancy (by deleting a few words), native speakers can easily cope and guess the missing words. Originally, every seventh word was deleted. In the 1980s it used to be considered a language-testing panacea: easy to construct, administer and score. Unfortunately, it has poor validity: native speakers cannot always guess the words. SUBTYPES:
    Selected deletion cloze
    Conversational cloze
    The C-Test: a variety of cloze with the second half of every second word deleted. Puzzle-like. Example: “The passen___ sits bes___ the dri___”
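Both deletion procedures are mechanical enough to sketch in code. How to halve odd-length words in a C-test varies between authors, so the output below differs slightly from the slide's example:

```python
# Sketch of the two deletion procedures: an every-nth-word cloze and a
# C-test (second half of every second word deleted).

def cloze(text: str, n: int = 7) -> str:
    """Replace every nth word with a blank."""
    words = text.split()
    for i in range(n - 1, len(words), n):
        words[i] = "____"
    return " ".join(words)

def c_test(text: str) -> str:
    """Delete the second half of every second word."""
    words = text.split()
    out = []
    for i, w in enumerate(words):
        if i % 2 == 1 and len(w) > 1:
            keep = (len(w) + 1) // 2   # keep the first half, rounded up
            out.append(w[:keep] + "___")
        else:
            out.append(w)
    return " ".join(out)

sentence = "The passenger sits beside the driver"
print(c_test(sentence))   # The passe___ sits bes___ the dri___
```

In practice the first sentence of a cloze passage is usually left intact to give candidates context.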
    Dictation: traditionally used (particularly, but not only, in places like France). However, in the 1960s dictation testing was considered misguided. Later, nevertheless, research showed a correlation between scores on dictation tests and scores on more complex tests, or on cloze tests. Dictations are easy to create and easy to administer, but very difficult to score properly.

    Main problem with all of these tests: horrible washback effect.
  • Primary School: other types of assessment are more appropriate. However, a common yardstick at the end is necessary: standardized tests (“pruebas estandarizadas”).
    Good opportunity to develop good attitudes towards assessment. Recommendations:
    Make testing an integral part of assessment, and assessment an integral part of the teaching program
    Feedback from tests should be immediate and positive
    Self assessment should be part of the teaching program
    Washback is more important than ever
    TIPS
    Short tasks: Short attention span
    Use stories and games
    Use pictures and color
    Don’t forget that children are still developing L1 and cognitive abilities
    Include interaction

    SOME TECHNIQUES:
    Placing objects or identifying people
    Multiple choice pictures
    Colour and draw
    Use pictures in reading and in writing
    Cartoon stories for writing
    Long warm-ups in speaking
    Use cards and pictures


