David Petrie - IATEFL 2014
“Chalk and Cheese – Equivalency Issues with IELTS and TOEFL”
(SLIDE 1: TITLE)
Good Morning. I’m David Petrie and I have a confession to make. I like exams. I feel I should say this and
get it out there in the open. I like exams. What I like is the clarity of thinking and purpose that exams can
bring to a class. I went to Susan Sheehan’s session yesterday on Teaching for Learning and I agree with
pretty much everything she said but I firmly believe that most of the problems involved with testing and
assessment come down to inappropriate use. People just doing it wrong. For me, the right test, in the
right hands, and used for the right purposes, is a thing of beauty. And for the record, I think that both
IELTS and TOEFL are good tests. They are meticulously researched, tested and validated and they do what
they do, describing differences in learner English ability or competence, very well.
One of the reasons I like exams is because I work with them a lot. I’ve been the Senior Teacher for adult
and exam classes at International House Coimbra, Portugal, for the last six years or so, though I should add
that the views expressed here are my own and do not necessarily reflect those of my employers. But exam
preparation classes form the basis of my current professional responsibilities – helping learners prepare for
their exams, helping teachers prepare their learners and generally acting as a bit of a walking
encyclopaedia on all things exam related.
(SLIDE 2: VENN DIAGRAMS)
This presentation came about because I wanted to investigate how transferable exam teaching skills are.
If, for example, you spend five years teaching IELTS in one country, how good would you be at teaching
TOEFL in another? Or FCE in a third? These exams had always seemed quite distinct entities and I wanted
to find out how true that was.
(SLIDE 3: TALK OUTLINE)
In this talk I want to look at how these exams have been mapped onto each other and at the
equivalency research that has been done into them. I want to show you what my findings were when I
looked at the structural similarities and differences between these two exams and finally I want to discuss
some of the implications for stakeholders in the examinations process.
(SLIDE 4: Part One Intro – IELTS and TOEFL graphics)
Why IELTS and TOEFL? Firstly, because these are possibly the most popular English language exams
available. I say possibly because I did have some difficulty finding decent statistics on the annual
candidatures. ETS say on their website that over 27 million people have taken the TOEFL test – which is up
by 2 million on a similar claim they make in a 2011 press release. IELTS announced last year that they had
2 million candidates in 2012, making them – in their own carefully worded phrase – the “world’s most popular English
proficiency test for higher education and global migration”.
And this is, of course, the other key point, that IELTS and TOEFL are both used as either access points or
entry barriers (depending on your point of view) for higher education.
(SLIDE 5: the numbers)
According to UK government statistics for 2011/12, ELT in the UK was worth 2.5 billion pounds – this is all
the students studying at language schools and centres around the country. On top of that, an additional
10.2 billion pounds was spent by international students in higher education. I’m not quite sure where
things like pre-sessional EAP figure in these statistics, but by any accounting, that’s an awful lot of IELTS
and TOEFL exams. When David Graddol yesterday mentioned our ambivalence towards the
corporatisation of ELT, I think it’s fair to say there are at least 10.2 billion reasons to feel ambivalent right
there!
So let’s look at how these two exams line up. Almost every university in the UK and indeed the wider
world beyond, at least those offering tuition in English, has an English language requirement and these are
usually expressed in the form of desirable language test scores, which are usually presented in equivalency
tables – like this one which I lifted off the University of Portsmouth website.
(SLIDE 6: Portsmouth screenshot)
There seems to be some disagreement and contradiction both in how these scores align across different
universities and in terms of what different universities consider acceptable for admission – and there does
not seem to be much in the way of actual research done to verify these equivalencies.
The only suitably stringent score equivalency research I was able to uncover was a study by ETS, the
makers of TOEFL, who in 2010, tried to establish a correlation between IELTS and TOEFL iBT scores from a
sample of over 1,000 people. This was an opportunity sample of students who submitted their TOEFL iBT
and their IELTS scores to the researchers and data was concentrated in a central band, giving limited
information at higher and lower achievement levels. The researchers also point out that as the two tests
are built on different frameworks, “TOEFL iBT scores do not mean exactly the same thing as IELTS scores”
(ETS 2010: 14). We’ll look at that comment in more detail a little later on.
(SLIDE 7: ETS equivalency data)
Several key issues not addressed in the report are (a) when the test takers took the relevant tests, (b)
which test was taken first, (c) whether the candidates received test preparation in advance, and if so, for
which test, (d) whether the candidates received test instruction or preparation between taking the two
tests. It seems likely that some of these issues would have affected some of these candidates, given the
potentially life-changing nature of these high-stakes exams, thus also potentially skewing the
correspondence between the results. As far as I know, this is the only reliable study that looks at this.
Note the little asterisks in the mid-range there – and the footnote at the bottom that says “indicates score
comparison ranges with the highest degree of confidence” – as I said earlier one of the other problems the
researchers had was finding a suitably broad research sample, so scores at the upper and lower ranges are
not as statistically strong as those in the middle.
You’ll also note the equivalency table for the CEFR – I was going to talk a little bit about that, but I’m wary
of the time, so perhaps on another occasion!
Of course, attempting to find score equivalencies between IELTS and TOEFL rests on the assumption that
the exams themselves are equivalent – in other words that they test the same things. Otherwise it’s a bit
like going to one doctor and having your blood pressure checked and then going to another doctor to have
your cholesterol looked at and asking both of them to separately tell you how healthy you are.
So – the next obvious question is to find out how similar the two exams are:
(SLIDE 8: Structural Comparison)
There is no overt test of language knowledge in either IELTS or TOEFL iBT, though ETS state grammar is
“evaluated in speaking and writing responses” (2008) for the TOEFL iBT and IELTS also gives “lexical
resource” and “grammatical range and accuracy” as assessment criteria in the speaking and writing
components (2010).
IELTS has two different versions available, the academic module and the general module. However, this
only affects task types, not the timings or number of questions (IELTS, 2008).
ETS give a range for the number of questions and the timings on some papers. This is because ETS
incorporates additional questions into the Reading and Listening papers that do not count towards the
candidate’s score but are included as part of the test and item validity and reliability checking procedure
(ETS 2012).
(SLIDE 9: Task Types Comparison)
So for this table I went through the descriptions provided by IELTS of what tasks can be used in the exam
and then did the same thing with the ETS TOEFL description. It’s probably a bit too difficult to see
everything, so let me just highlight the similarities.
(SLIDE 10: Task Types with similarities)
You’ll notice there are four task types that correspond to each other – writing an essay, multiple choice
reading, multiple choice listening and multiple matching listening. This isn’t necessarily a problem though
– the two exams may simply be testing different constructs. So I went on to look at the testing focus of the different sections. To do this, I used
the content analysis checklists developed by ALTE – the Association of Language Testers in Europe. As
ALTE emphasise, the tool is meant to help describe a single version of a test as subsequent versions of tests
may differ from each other. It should also be noted that Cambridge ESOL is a member of ALTE and that
they may have contributed to the research that helped devise this instrument, which may therefore more
readily reflect the composition of Cambridge ESOL tests than their competitors’. Nevertheless, in an
adapted form that selects the language ability testing focus, and with the addition of any foci
that appear in the exams being described but not on the original ALTE criteria, these checklists provide a
very useful tool for the description and comparison of IELTS and TOEFL iBT.
While the receptive skills checklist details possible subskills testing foci, those listed for productive skills
could also be taken as a list of language functions, for example “giving instructions, making suggestions,
persuading”. While these are clearly important, there are other factors which need to be considered, not
least the form the output is required in, as elements of audience reciprocity and reaction as well as
discourse patterns and structures will influence the success of any language produced. The checklists for
speaking and writing have therefore been broadened to include additional criteria: prompts & inputs, task
type used and (for the writing checklist) the text type the candidate is expected to produce. These
additional criteria have also been drawn from the ALTE checklists and expanded as necessary to
incorporate additional features present in the target exams but not listed in the checklist. Equally, items
originally occurring in the ALTE criteria which analysis has shown do not apply to either of the target exams
have been removed from the checklist.
Now I’m very wary of time, and I don’t want to rush through this bit, but at the same time, it’s essentially
data display, which, if you feel so inclined, you can of course review later at your leisure. So I’m just going
to flash each section up for a minute or so.
(SLIDE 11: Listening)
These are the listening correspondences.
(SLIDE 12: Reading)
And these are the reading correspondences.
(SLIDE 13: Speaking)
I’ve had to flip this on its side to get it onto the slide – but these are the speaking bits.
(SLIDE 14: Writing)
And finally here are the writing bits. I’m sorry it’s not so easy to read – again, this will all be available for
download. I was interested to see that in the writing section there is a greater degree of correspondence
than in the other sections.
I wonder whether this is because higher education systems are fundamentally different between the US
and the UK, but academic writing is more of a global skill because of the international nature of the
journals. A thought anyway.
Looking through all the criteria across both exams there are 55 distinct items, of which 42% are the same in
both exams. Or to look at it from the other side, they’re 58% different.
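As a quick sanity check on those percentages: 42% of 55 items corresponds to roughly 23 shared criteria. A minimal sketch of the arithmetic, assuming a shared count of 23 (back-calculated from the quoted percentage, not a figure taken from the checklist data itself):

```python
# Back-of-the-envelope check of the overlap figures quoted above.
# total = 55 comes from the talk; shared = 23 is an assumption,
# back-calculated from the quoted 42%.

shared = 23   # criteria assumed to appear in both IELTS and TOEFL iBT
total = 55    # distinct criteria across both exams (from the talk)

same_pct = round(shared / total * 100)                 # 23/55 ≈ 41.8 -> 42
different_pct = round((total - shared) / total * 100)  # 32/55 ≈ 58.2 -> 58

print(f"{same_pct}% the same, {different_pct}% different")
```

Running this prints "42% the same, 58% different", matching the figures above.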
So what are the implications?
(SLIDE 15: Implications)
Again – I think time is running out – so I’ll keep this short.
The implications are not particularly startling. The most obvious implication is quite simple – these are
different exams. They have different structures, they have different task types and there is considerable
variance within the testing focus of each exam – it is no wonder that it has proved difficult to conduct
score equivalency research – put simply, they aren’t equivalent. And this realisation, or at least having
the results of an analysis to back up what was always a heartfelt conviction – that the exams were
fundamentally different creatures – has caused me a certain amount of consternation. I do feel a little bit
like the boy in the story of the Emperor’s New Clothes – surely I can’t be the only person who sees that the
King is naked? I’m sure I’m not. But what I don’t understand is why there isn’t a double page tabloid
spread with the headline “NAKED KING IN TAILOR SCAM SHOCKER”. Of course, if I was a cynic I might think
that there were 10.2 billion reasons why that headline has never appeared.
But putting naked kings to one side for a moment, I do think that certain stakeholders need to rethink their
relationships with the exams.
Students need to think about which exams play to their strengths – it is not enough for them to choose an
exam based on utility or convenience – their strengths and weaknesses need to be determined and the
right exam chosen.
Teachers need to be fully informed about the options available to their students and I think need a greater
understanding of what each exam entails. It is not enough to recommend an exam based on expertise or
experience – as I think often happens. Schools need to develop the expertise in their teachers across the
exams so that the teachers can fit the best exam to the student. After all, if institutions don’t seem to
mind which exam the students take, then the choice, made after analysis and informed advice, needs to be
the student’s.
Institutions, however, need to rethink whether either of these exams meets their needs. Both exams are
meant to provide information about the suitability of candidates for higher education – given the disparity
between them, I would suggest that at the very least they reflect very different philosophical views of
education – almost certainly they do not provide the same information about candidates. I’m also not
quite sure whether the prevalence of pre-sessional EAP courses reflects a shortcoming in the exams’ ability to test
desirable skills and abilities or whether academic skills just aren’t a necessary part of the testing process.
Perhaps institutions need to think about the exams which reflect their own values and needs more
accurately.
OK – I’m going to wrap up now.
(SLIDE 16: Contact info)
The slides from this talk will be available via the blog shortly, do feel free to download them – I welcome
any and all criticism!
Thank you.