John De Jong: Optimizing Test & Courseware Development

Optimizing Test & Courseware Development
Lisbon
23 April 2016
John De Jong
SVP Global Assessment Standards, Pearson
Professor of Language Testing VU University Amsterdam

PISA
Programme for International Student Assessment

PISA Development over time
2000: Reading, Mathematics and Science
+ Optional Electronic Reading
+ Optional Electronic Mathematics
2015: Electronic: Reading, Mathematics and Science
+ Collaborative Problem Solving
2018: Reading, Mathematics and Science
+ Global Competence

Lessons from PISA
Major drivers of success of countries
• Clear standards defined at national level
• High level of teacher autonomy

… then, how to define standards?

Ranking CPS in higher education and workplace
Applied Skill                         Rank Educ   Rank Work
Oral Communications                   3           1
Teamwork / Collaboration              3           1
Problem Solving                       1           2
Written Communications                2           2
Information Technology Application    4           3
Lifelong Learning / Self Direction    2           4
Professionalism / Work Ethic          5           4
Ethics / Social Responsibility        6           4
Creativity / Innovation               3           5
Diversity                             7           6
Leadership                            7           7

Survey results
The CPS definition …                                  Agree %
is clearly described                                  97
matches my own understanding of CPS                   95
will help higher ed institutions to understand CPS    88
will help employers to understand CPS                 100
is what is taught in my country                       52

Crucial reformation targets
• Establish needs
• Define learning objectives
• Define coherent and realistic curriculum
• Engage students

Structural approach to defining objectives
[Diagram labels: Difficulty, Domain, Language]

Domains of language use / Topics
Difficulty
Self / personal experience
Negotiating with others
Deal with new
Academic
Specialized
Jokes
GE: A1 A2 B1 B2 C1 C2
AE: General MBA
PE: Waiter Politician
Coherent bank of objectives

A General Model of Language Development
[Figure axes: General Cognition (0 1 2 3 4 5 etc. “cognitive age”) and Language Proficiency (0 1 2 3 4 5 etc. “language age”)]
• Measuring within a population of language learners measures both linguistic and general cognitive development.
• Measuring across two populations of language learners may measure cognitive development only.
• Including an appropriate native speaker population can help to measure linguistic development only.

The Global Scale of English
[Figure: Comparison of PTE Academic (GSE scale) with IELTS and TOEFL iBT]

Sample page (from B1)
The Pearson Syllabus – General English

Overview
• A vocabulary framework linked to the Global Scale of English
(GSE) and the CEFR
• Organized by topics and subtopics based on the CoE Vantage
specifications categorization
• Describing vocabulary targets for learners of general English
• A probabilistic model of productive vocabulary learning
• Based on the principle of incremental learning of word
meanings, from basic to specialized
• Including 20k+ lemmas; 37k+ meanings; 80k+ collocations;
7k+ functional units
• Helping learners, teachers, and materials designers identify
level-appropriate vocabulary

Methodology
Combines frequency data and teacher judgements via the following main steps:
1. Corpus of 2.5 billion words > extraction of frequency list
2. Semantic annotation
• Manual tagging of 37k word meanings using the CoE ‘Vantage’ categorization
3. Teacher ratings
• Rating of 37k word meanings by 10 teachers (scale: 1 to 5 + 99)
4. Statistical analysis
• Rank word meanings by combining frequency data and teacher ratings
5. Fit the data onto a model, link each meaning to the CEFR / GSE
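
Step 1 of the methodology (extracting a frequency list from a corpus) can be sketched as below. This is only a minimal illustration, not the pipeline used for the syllabus: the naive tokenizer, the toy corpus and the function name `frequency_list` are assumptions, and no lemmatization or sense tagging is attempted.

```python
import re
from collections import Counter

def frequency_list(texts):
    """Return (word, count, rank) tuples, most frequent first."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer (assumption): lowercase alphabetic runs,
        # optionally with one internal apostrophe.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    # Sort by descending count, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count, rank)
            for rank, (word, count) in enumerate(ordered, start=1)]

# Toy corpus (hypothetical); the real corpus is described as 2.5 billion words.
toy_corpus = ["The fork is on the table.", "Take the left fork in the road."]
freq = frequency_list(toy_corpus)
print(freq[:3])
```

The rank in the third position is the kind of value that a later step could rescale before combining it with teacher ratings.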

Lemmas and meanings
Structure vocabulary around pedagogically relevant
sets using the CoE Vantage categorization
Example:
Specific Notions (Topics)
Fork > FOOD&DRINKS_tableware
SPORT&HOBBIES_gardening
TRAVEL_directions

Theoretical assumptions
A model of vocabulary growth based on current literature:
• Basic (A1) > 500-1k words (500 words as min. elementary level, Hill, 2013; 500-1k as general teaching target)
• Basic (A2) > boundary for high-frequency vocabulary set at 3k families for everyday conversation (Adolphs & Schmitt, 2003)
• Independent (B1) > 5k families to read authentic texts (Schmitt, 2007)
• Independent (B2) > minimum target of 10k lemmas at university level (Hazenberg & Hulstijn, 1996) for Dutch; 8-9k families for unassisted comprehension (Nation, 2006)
• Proficient (C1 upwards) > 20k families known by educated L1 speakers (Nation, 2001); 50k words known by most L1 speakers (Crystal, 1981)
Hill, D. R. (2001). Survey: Graded readers. ELT Journal, 55(3), 300-324. Oxford University Press.
Adolphs, S. & Schmitt, N. (2003). Lexical coverage of spoken discourse. Applied Linguistics, 24(4), 425-438.
Schmitt, N. (2007). Current perspectives on vocabulary teaching and learning. In J. Cummins & C. Davison (eds.), International Handbook of English Language Teaching: Part II. NY: Springer, 827-841.
Hazenberg, S. & Hulstijn, J. H. (1996). Defining a minimal receptive second-language vocabulary for non-native university students: An empirical investigation. Applied Linguistics, 17(2), 145-163.
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59-82.
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Schmitt, N. (2000). Vocabulary in language teaching. Cambridge: Cambridge University Press, pp. 7-8.
Crystal, D. (1981). Clinical Linguistics. Vienna: Springer.

Data modelling 1
Fitted curve: $y = 0.006\,x^{3.539}$, $R^2 = 0.9842$
[Figure: From GSE to ModelLem. x-axis: 10 to 90; y-axis: 0 to 60,000. Series: Hypothesis 'CumLem'; Model 'ModelLem']
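
The fitted curve reported on this slide, y = 0.006·x^3.539, can be evaluated directly. The sketch below assumes, from the chart title and axis ranges, that x is a GSE value (10 to 90) and y is the modelled cumulative number of lemmas; the function name `model_lemmas` is hypothetical.

```python
def model_lemmas(gse, a=0.006, b=3.539):
    """Evaluate the fitted power curve y = a * x**b from the slide.

    Assumption: x is a GSE value and y is the modelled cumulative
    lemma count ('ModelLem').
    """
    return a * gse ** b

for gse in (10, 30, 50, 70, 90):
    print(f"GSE {gse}: about {model_lemmas(gse):,.0f} lemmas")
```

Under this reading the curve gives roughly 20 lemmas at GSE 10 and just under 50,000 at GSE 90, which is consistent with the chart's 0 to 60,000 axis.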

Meanings vs Lemmas
[Figure: Average number of Meanings per Lemma by level (<T, T, A1, A2, A2+, B1, B1+, B2, B2+, C1, C2); y-axis: 1.0 to 2.5]

Vocabulary growth
[Figure: Vocabulary growth by level (PreT, T, A1, A2, A2+, B1, B1+, B2, B2+, C1, C2); y-axis: 0 to 14,000. Series: New Meanings; New Lemmas]

Cumulative vocabulary growth
[Figure: Cumulative Vocabulary Growth by Level (PreT, T, A1, A2, A2+, B1, B1+, B2, B2+, C1, C2); y-axis: 0 to 60,000. Series: Cumul Meanings; Cumul Lemmas]

The vocabulary usefulness rating
1 = Essential words learners would want to acquire first
2 = Important words that become necessary at a next stage
3 = Useful words enabling more detailed and specific language
4 = Nice to have words to express concepts more accurately
5 = Extra words some language users will use occasionally
99 = “Escape” words which are impossible to rate: you have never heard of the word before or you cannot decide between widely different ratings
• Teachers received online training and followed specific guidelines
• Each word was rated by a random 10 out of the 19 raters in an overlapping design using a pre-defined scale of 1-5
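
The rating design described above (a random 10 of 19 raters per word, with 99 as an escape code) might be sketched as follows. The function names, the rater labels and the choice to drop escape codes before averaging are assumptions made for illustration; the slides do not say how 99s were handled.

```python
import random
from statistics import mean

ESCAPE = 99  # "escape" code from the rating scale, not a usefulness value

def assign_raters(word_ids, raters, per_word=10, seed=0):
    """Pick a random subset of raters for each word (overlapping design)."""
    rng = random.Random(seed)
    return {w: rng.sample(raters, per_word) for w in word_ids}

def rating_average(ratings):
    """Mean of the 1-5 ratings, ignoring escape codes (assumption).

    Returns None when no usable rating remains.
    """
    valid = [r for r in ratings if r != ESCAPE]
    return mean(valid) if valid else None

raters = [f"rater_{i:02d}" for i in range(1, 20)]  # 19 hypothetical raters
plan = assign_raters(["fork", "bashful"], raters)
print(plan["fork"])
print(rating_average([1, 2, 1, 99, 2, 1, 1, 2, 1, 1]))
```

Because every word draws its own random subset, any two raters share many words, which is what makes the design overlapping and allows rater severity to be compared.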

Combine ratings and Frequency data
$$\text{Combine} = \frac{\text{Ra} \times \text{rRating} + \text{Frank} \times (1 - \text{rRating}) + \text{Frank}}{2}$$
Where
Combine is the optimal combination of ratings and Frequency data
Ra is the Rating average
rRating is the Reliability of rating data
Frank is the scaled frequency rank.
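
A small sketch of this combination rule follows. It reads the slide layout as the whole sum Ra × rRating + Frank × (1 − rRating) + Frank divided by 2, and it assumes that Ra and Frank are already expressed on the same scale and that rRating lies between 0 and 1. The function and parameter names are hypothetical.

```python
def combine(rating_avg, rating_reliability, freq_rank):
    """Combine = (Ra * rRating + Frank * (1 - rRating) + Frank) / 2.

    rating_avg         -- Ra, the rating average
    rating_reliability -- rRating, the reliability of the rating data
    freq_rank          -- Frank, the scaled frequency rank
    """
    if not 0.0 <= rating_reliability <= 1.0:
        raise ValueError("rating_reliability must lie in [0, 1]")
    # Reliability-weighted blend of ratings and frequency ...
    weighted = (rating_avg * rating_reliability
                + freq_rank * (1.0 - rating_reliability))
    # ... averaged with the frequency rank itself.
    return (weighted + freq_rank) / 2.0

print(combine(rating_avg=2.0, rating_reliability=0.8, freq_rank=3.0))
```

Under this reading, perfectly reliable ratings (rRating = 1) give the plain average of Ra and Frank, while completely unreliable ratings (rRating = 0) leave the frequency rank unchanged.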

adj. in People & relationships [personal traits]
• A1: happy (23), good (22)
• A2: angry (34), kind (36)
• A2+: noisy (39), silly (40)
• B1: upset (47), lonely (48)
• B1+: confident (51), nasty (53)
• B2: creative (59), sympathetic (63)
• B2+: kind-hearted (67), spoiled (70)
• C1: hypocritical (76), bashful (80)
• C2: shifty (86), sycophantic (88)

Fitted curve: $y = -3.8806x^2 + 42.05x - 24.081$, $R^2 = 0.9974$
[Figure: x-axis: ratings 1 to 5 (Essential, Important, Useful, Nice to have, Extra); y-axis: 10 to 90, with level labels Tourist, A1, A2, B1, B2, C1, C2]
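
The quadratic reported for this chart, y = -3.8806x² + 42.05x - 24.081, can be evaluated at the five rating points. The sketch assumes, from the axis labels, that x is the usefulness rating (1 to 5) and y is a position on the 10 to 90 scale; the function name is hypothetical.

```python
def rating_to_scale(rating):
    """Evaluate the fitted quadratic from the slide.

    Assumption: x is the usefulness rating (1-5), y a value on the
    10-90 scale shown on the chart's vertical axis.
    """
    return -3.8806 * rating ** 2 + 42.05 * rating - 24.081

labels = ["Essential", "Important", "Useful", "Nice to have", "Extra"]
for rating, label in enumerate(labels, start=1):
    print(f"{rating} ({label}): {rating_to_scale(rating):.1f}")
```

On this reading the curve rises from about 14 for rating 1 to about 89 for rating 5, flattening towards the top of the scale.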

[Figure: Likelihood of Success (0% to 100%) against GSE Task Difficulty (10 to 90) for a learner at 25 on GSE. Labelled example items: Girl, Mother; Boy, Father]
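
The shape of such a likelihood-of-success curve can be illustrated with a logistic function. This is purely a sketch under stated assumptions: the slides do not give the model, so the 50% point at the learner's own level, the `spread` value and the function name are all hypothetical.

```python
import math

def likelihood_of_success(learner_gse, task_gse, spread=5.0):
    """Logistic curve, falling as task difficulty rises.

    Assumptions: success is 50% when task difficulty equals the
    learner's level; `spread` (in GSE points) is an arbitrary
    illustrative value, not a published parameter.
    """
    return 1.0 / (1.0 + math.exp((task_gse - learner_gse) / spread))

learner = 25  # "A learner at 25 on GSE"
for task in range(10, 100, 10):
    print(f"task difficulty {task}: {likelihood_of_success(learner, task):.0%}")
```

Tasks well below the learner's level come out close to certain success and tasks far above it close to zero, which is the qualitative pattern the chart title describes.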
