Dr. R. Green, Aug 2006 1
Principles of language testing
EUROPOS SĄJUNGA
Overview
• What are the principles of language testing?
• How can we define them?
• What factors can influence them?
• How can we measure them?
• How do they interrelate?
Reliability
Related to accuracy, dependability and consistency
e.g. 20°C here today, 20°C in North Italy: are they the same?
According to Henning [1987], reliability is
• a measure of accuracy, consistency, dependability, or fairness of scores resulting from the administration of a particular examination, e.g. a test taker scoring 75% on a test today and 83% tomorrow points to a problem with reliability.
Validity: internal & external
Construct validity [internal]
• the extent to which evidence can be found to support the underlying theoretical construct on which the test is based
Content validity [internal]
• the extent to which the content of a test can be said to be sufficiently representative and comprehensive of the purpose for which it has been designed
Validity [2]
Response validity [internal]
• the extent to which test takers respond in the way expected by the test developers
Concurrent validity [external]
• the extent to which test takers' scores on one test relate to those on another externally recognised test or measure
Validity [3]
Predictive validity [external]
• the extent to which scores on test Y predict test takers' ability to do X, e.g. IELTS scores and success in academic studies at university
Face validity [internal/external]
• the extent to which the test is perceived to reflect the stated purpose, e.g. is writing in a listening test appropriate? This depends on the target language situation, such as an academic environment
Validity [4]
• 'Validity is not a characteristic of a test, but a feature of the inferences made on the basis of test scores and the uses to which a test is put.' Alderson [2002: 5]
Practicality
The ease with which:
• test items can be replicated in terms of resources needed, e.g. time, materials, people
• the test can be administered
• the test can be graded
• the results can be interpreted
Factors which can influence reliability, validity and practicality…
Test [1]
• quality of items
• number of items
• difficulty level of items
• level of item discrimination
• type of test methods
• number of test methods
Test [2]
• time allowed
• clarity of instructions
• use of the test
• selection of content
• sampling of content
• invalid constructs
Test taker
• familiarity with test method
• attitude towards the test i.e. interest, motivation, emotional/mental state
• degree of guessing employed
• level of ability
Test administration
• consistency of administration procedure
• degree of interaction between invigilators and test takers
• time of day the test is administered
• clarity of instructions
• test environment: light / heat / noise / space / layout of room
• quality of equipment used e.g. for listening tests
Scoring
• accuracy of the key e.g. does it include all possible alternatives?
• inter-rater reliability e.g. in writing, speaking
• intra-rater reliability e.g. in writing, speaking
• machine vs. human
How can we measure reliability?
Test-retest
• same test administered to the same test takers following an interval of no more than 2 weeks
Inter-rater reliability
• two or more independent estimates on a test e.g. written scripts marked by two raters independently and results compared
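Both estimates above come down to comparing two sets of scores for the same people. One common way to do this is a Pearson correlation. The sketch below is only an illustration: the scores are invented, the 0-9 band scale is an assumption, and the slides do not say which coefficient should be used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: the same six test takers sit the same test twice.
first_sitting = [75, 62, 88, 54, 91, 70]
second_sitting = [83, 60, 85, 58, 94, 66]
test_retest_r = pearson(first_sitting, second_sitting)

# Hypothetical data: two raters mark the same six scripts independently
# (a 0-9 band scale is assumed here).
rater_a = [6, 5, 8, 4, 7, 6]
rater_b = [6, 4, 8, 5, 7, 7]
inter_rater_r = pearson(rater_a, rater_b)

print(round(test_retest_r, 2), round(inter_rater_r, 2))
```

A coefficient close to +1 suggests that the two sittings (or the two raters) rank the test takers in much the same way. It does not show that the actual marks agree, so raters who are consistently one band apart can still correlate highly.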
Measuring reliability [2]
Internal consistency reliability estimates e.g.
• Split half reliability
• Cronbach's alpha / Kuder-Richardson 20 [KR-20]
Split half reliability
• the test administered to a group of test takers is divided into halves, and the scores on each half are correlated with those on the other half
• the resulting coefficient is then adjusted by the Spearman-Brown Prophecy Formula to allow for the fact that the total score is based on an instrument that is twice as long as its halves
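The two steps above can be sketched as follows. This is a minimal illustration with invented right/wrong responses. The odd/even split is an assumption, since the slide does not say how the halves should be formed, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown adjusted estimate).

    item_scores holds one list of item scores per test taker.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd_half, even_half)
    # Spearman-Brown adjustment for a full test twice as long as each half.
    adjusted = 2 * r_half / (1 + r_half)
    return r_half, adjusted


# Hypothetical responses: six test takers, six items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, adjusted = split_half_reliability(responses)
print(round(r_half, 2), round(adjusted, 2))
```

For any positive half-test correlation below 1, the adjusted estimate is higher than the raw one, which reflects the greater length of the full test.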
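Cronbach's Alpha [KR-20]
• this approach looks at how consistently test takers perform across the individual items, comparing the variation in scores on each item with the variation in scores on the test as a whole
• usually reported on a 0 to +1 scale; unlike discrimination it has no lower limit of -1, and a negative value signals a problem with the items or their scoring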
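As a rough sketch of the calculation, the standard formula for alpha is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores), where k is the number of items. KR-20 is the same calculation for items scored right/wrong, with each item variance written as p times q. The data and function names below are invented for illustration.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per test taker."""
    k = len(item_scores[0])
    item_variances = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def kr20(item_scores):
    """KR-20 for items scored 0/1, using p * q as each item's variance."""
    n = len(item_scores)
    k = len(item_scores[0])
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n  # proportion answering correctly
        pq_sum += p * (1 - p)
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - pq_sum / total_variance)


# Hypothetical responses: six test takers, six items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), round(kr20(responses), 2))
```

On right/wrong data the two functions return the same value, which is why the two names are often used together.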
Reliability is influenced by…
• the longer the test, the more reliable it is likely to be [though there is a point beyond which extra items bring no extra return]
• items which discriminate will add to reliability; therefore, if the items are too easy / too difficult, reliability is likely to be lower
• if there is a wide range of abilities amongst the test takers, the test is likely to have higher reliability
• the more homogeneous the items are, the higher the reliability is likely to be
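The first point, on test length, can be illustrated with the general form of the Spearman-Brown Prophecy Formula. This is a sketch under two assumptions: the starting reliability of 0.60 is invented, and the added items are taken to be of the same quality as the existing ones.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by length_factor.

    Assumes the added items behave like the existing ones.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point: a short test with reliability 0.60.
projections = {k: spearman_brown(0.60, k) for k in (1, 2, 3, 4)}
for factor, value in projections.items():
    print(f"{factor} x length -> {value:.2f}")
```

Each extra block of items raises the projected reliability by less than the previous block did, which is the "no extra return" effect noted above.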
How can we measure validity?
According to Henning [1987]
• non-empirically, involving inspection, intuition and common sense
• empirically, involving the collection and analysis of qualitative and quantitative data
Construct validity
• evidence is usually obtained through such statistical analyses as factor analysis [looks for items which group together], discrimination; also through retrospection procedures
Content validity
• this type of validity cannot be measured statistically; need to involve experts in an analysis of the test; detailed specifications should be drawn up to ensure the content is both representative and comprehensive
Response validity
• can be ascertained by means of interviewing test takers [Henning]; asking them to take part in introspection / retrospection procedures [Alderson]
Concurrent validity
• determined by correlating the results on the test with another externally recognised measure. Care needs to be taken that the two measures are measuring similar skills and using similar test methods
Predictive validity
• can be determined by investigating the relationship between a test taker's score, e.g. on IELTS/TOEFL, and his/her success in the academic program chosen
• problem: other factors may influence success, e.g. life abroad, ability in chosen field, peers, tutors, personal issues, etc.; the time that passes between the test and the outcome is also a factor
Reliability vs. validity?
• 'an observation can be reliable without being valid, but cannot be valid without first being reliable. In other words, reliability is a necessary, but not sufficient, condition for validity.' [Hubley & Zumbo 1996]
• 'Of all the concepts in testing and measurement, it may be argued, validity is the most basic and far-reaching, for without validity, a test, measure or observation and any inferences made from it are meaningless' [Hubley & Zumbo 1996, 207]
Reliability vs. validity [2]
• even an ideal test which is perfectly reliable and possessing perfect criterion-related validity will be invalid for some purposes [Henning 1987]
Practicality
Designing and developing good test items requires
• working with other colleagues
• materials i.e. paper, computer, printer etc.
• time
Some items look very attractive but this attraction has to be weighed against these factors.
References
• Alderson, J. C. 2002. Conceptions of validity and validation. Paper presented at a conference in Bucharest, June 2002.
• Angoff, W. H. 1988. Validity: An evolving concept. In H. Wainer & H. Braun [Eds.] Test validity [pp. 19-32]. Hillsdale, NJ: Erlbaum.
• Bachman, L. F. 1990. Fundamental considerations in language testing. Oxford: O.U.P.
• Cumming, A. & Berwick, R. [Eds.] 1996. Validation in language testing. Multilingual Matters.
• Hatch, E. & Lazaraton, A. 1991. The research manual: Design and statistics for applied linguistics. Newbury House.
References [2]
• Henning, G. 1987. A guide to language testing: Development, evaluation and research. Cambridge, Mass: Newbury House.
• Hubley, A. M. & Zumbo, B. D. 1996. A dialectic on validity: Where we have been and where we are going. The Journal of General Psychology, 123[3], 207-215.
• Messick, S. 1988. The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. Braun [Eds.] Test validity [pp. 33-45]. Hillsdale, NJ: Erlbaum.
• Messick, S. 1989. Validity. In R. L. Linn [Ed.] Educational measurement [3rd ed., pp. 13-103]. New York: Macmillan.
Item-total Statistics

Item   Corrected Item-Total Correlation   Alpha if Item Deleted
R01    .5259                              .7964
R02    .6804                              .7594
R03    .6683                              .7623
R04    .5516                              .7940
R05    .7173                              .7489
R16    .3946                              .8288

N of Cases = 194.0   N of Items = 6   Alpha = .8121
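In the table above, R16 has the lowest corrected item-total correlation and is the only item whose removal would push alpha above the overall .8121. The sketch below shows one way such columns can be computed. It uses invented scores for four items, with the last item deliberately out of step with the rest, and the function names are hypothetical. It is not the data behind the slide, and it is not necessarily the exact procedure of the statistics package that produced the table.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per test taker."""
    k = len(item_scores[0])
    item_variances = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def item_total_statistics(item_scores):
    """Return (corrected item-total correlation, alpha if item deleted) per item."""
    k = len(item_scores[0])
    results = []
    for i in range(k):
        item = [row[i] for row in item_scores]
        # Leave item i out, so it is not correlated with itself.
        rest = [row[:i] + row[i + 1:] for row in item_scores]
        rest_totals = [sum(row) for row in rest]
        results.append((pearson(item, rest_totals), cronbach_alpha(rest)))
    return results


# Hypothetical scores (0-4) for six test takers on four items.
scores = [
    [4, 4, 3, 1],
    [3, 4, 4, 3],
    [3, 2, 3, 0],
    [2, 2, 1, 4],
    [1, 1, 2, 2],
    [0, 1, 0, 2],
]
overall_alpha = cronbach_alpha(scores)
stats = item_total_statistics(scores)
for number, (correlation, alpha_without) in enumerate(stats, start=1):
    print(f"Item {number}: r = {correlation:.2f}, alpha if deleted = {alpha_without:.2f}")
```

An item whose alpha-if-deleted value is higher than the overall alpha is a candidate for review, because the remaining items are more consistent with one another without it.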
Item-total Statistics

Item   Corrected Item-Total Correlation   Alpha if Item Deleted
R16    .5773                              .7909
R17    .5995                              .7863
R18    .7351                              .7553
R19    .7920                              .7419
R20    .6490                              .7753
R01    .1939                              .8663

N of Cases = 194.0   N of Items = 6   Alpha = .8185
Component Matrix [a]

Item   Component 1   Component 2
R01    .502          .559
R02    .690          .423
R03    .683          .461
R04    .571          .404
R05    .750          .343
R16    .670          -.223
R17    .631          -.508
R18    .770          -.368
R19    .789          -.383
R20    .646          -.494

Extraction Method: Principal Component Analysis.
a. 2 components extracted.
Thank you for your attention!
EUROPOS SĄJUNGA

Más contenido relacionado

La actualidad más candente

Validity, reliablility, washback
Validity, reliablility, washbackValidity, reliablility, washback
Validity, reliablility, washbackMaury Martinez
 
validity and reliability
validity and reliabilityvalidity and reliability
validity and reliabilityaffera mujahid
 
Chapter 2(principles of language assessment)
Chapter 2(principles of language assessment)Chapter 2(principles of language assessment)
Chapter 2(principles of language assessment)Kheang Sokheng
 
Qualities of a good test (1)
Qualities of a good test (1)Qualities of a good test (1)
Qualities of a good test (1)kimoya
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validityshobhitsaxena67
 
Testing for Language Teachers Arthur Hughes
Testing for Language TeachersArthur HughesTesting for Language TeachersArthur Hughes
Testing for Language Teachers Arthur HughesRajputt Ainee
 
Principles of language_assessment
Principles of language_assessmentPrinciples of language_assessment
Principles of language_assessmentLeidylanda
 
Language testing and evaluation validity and reliability.
Language testing and evaluation validity and reliability.Language testing and evaluation validity and reliability.
Language testing and evaluation validity and reliability.Vadher Ankita
 
Understanding reliability and validity
Understanding reliability and validityUnderstanding reliability and validity
Understanding reliability and validityMuhammad Faisal
 
Validity & reliability an interesting powerpoint slide i created
Validity & reliability  an interesting powerpoint slide i createdValidity & reliability  an interesting powerpoint slide i created
Validity & reliability an interesting powerpoint slide i createdSze Kai
 
Principles of language assessment
Principles of language assessmentPrinciples of language assessment
Principles of language assessmentSutrisno Evenddy
 
Validity and Reliability
Validity and ReliabilityValidity and Reliability
Validity and ReliabilityMaury Martinez
 
4. qualities of good measuring instrument
4. qualities of good measuring instrument4. qualities of good measuring instrument
4. qualities of good measuring instrumentJohn Paul Hablado
 
Language Assessment Principles and Issues
Language Assessment Principles and IssuesLanguage Assessment Principles and Issues
Language Assessment Principles and IssuesMaury Martinez
 
Validity, Reliability and Feasibility
Validity, Reliability and FeasibilityValidity, Reliability and Feasibility
Validity, Reliability and FeasibilityJasna3134
 
Week 2 exercise_2015 (9)
Week 2 exercise_2015 (9)Week 2 exercise_2015 (9)
Week 2 exercise_2015 (9)Saida Efendieva
 

La actualidad más candente (20)

Validity
ValidityValidity
Validity
 
Validity, reliablility, washback
Validity, reliablility, washbackValidity, reliablility, washback
Validity, reliablility, washback
 
validity and reliability
validity and reliabilityvalidity and reliability
validity and reliability
 
Chapter 2(principles of language assessment)
Chapter 2(principles of language assessment)Chapter 2(principles of language assessment)
Chapter 2(principles of language assessment)
 
Qualities of a good test (1)
Qualities of a good test (1)Qualities of a good test (1)
Qualities of a good test (1)
 
Criteria of a good language test
Criteria of a good language testCriteria of a good language test
Criteria of a good language test
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Testing for Language Teachers Arthur Hughes
Testing for Language TeachersArthur HughesTesting for Language TeachersArthur Hughes
Testing for Language Teachers Arthur Hughes
 
Principles of language_assessment
Principles of language_assessmentPrinciples of language_assessment
Principles of language_assessment
 
Language testing and evaluation validity and reliability.
Language testing and evaluation validity and reliability.Language testing and evaluation validity and reliability.
Language testing and evaluation validity and reliability.
 
Understanding reliability and validity
Understanding reliability and validityUnderstanding reliability and validity
Understanding reliability and validity
 
Validity & reliability an interesting powerpoint slide i created
Validity & reliability  an interesting powerpoint slide i createdValidity & reliability  an interesting powerpoint slide i created
Validity & reliability an interesting powerpoint slide i created
 
Test Usefulness
Test UsefulnessTest Usefulness
Test Usefulness
 
Principles of language assessment
Principles of language assessmentPrinciples of language assessment
Principles of language assessment
 
Validity and Reliability
Validity and ReliabilityValidity and Reliability
Validity and Reliability
 
4. qualities of good measuring instrument
4. qualities of good measuring instrument4. qualities of good measuring instrument
4. qualities of good measuring instrument
 
Language Assessment Principles and Issues
Language Assessment Principles and IssuesLanguage Assessment Principles and Issues
Language Assessment Principles and Issues
 
Validity, Reliability and Feasibility
Validity, Reliability and FeasibilityValidity, Reliability and Feasibility
Validity, Reliability and Feasibility
 
Week 2 exercise_2015 (9)
Week 2 exercise_2015 (9)Week 2 exercise_2015 (9)
Week 2 exercise_2015 (9)
 
Reliablity
ReliablityReliablity
Reliablity
 

Destacado

Presentation Validity & Reliability
Presentation Validity & ReliabilityPresentation Validity & Reliability
Presentation Validity & Reliabilitysongoten77
 
UTPL-LANGUAGE TESTING-II-BIMESTRE-(OCTUBRE 2011-FEBRERO 2012)
UTPL-LANGUAGE TESTING-II-BIMESTRE-(OCTUBRE 2011-FEBRERO 2012)UTPL-LANGUAGE TESTING-II-BIMESTRE-(OCTUBRE 2011-FEBRERO 2012)
UTPL-LANGUAGE TESTING-II-BIMESTRE-(OCTUBRE 2011-FEBRERO 2012)Videoconferencias UTPL
 
Testing for Language Teachers
Testing for Language TeachersTesting for Language Teachers
Testing for Language Teachersmpazhou
 
Principles of language assessment
Principles of language assessmentPrinciples of language assessment
Principles of language assessmentAstrid Caballero
 
Principles of Language Assessment
Principles of Language AssessmentPrinciples of Language Assessment
Principles of Language AssessmentA Faiz
 
Validity, reliability & practicality
Validity, reliability & practicalityValidity, reliability & practicality
Validity, reliability & practicalitySamcruz5
 
validity its types and importance
validity its types and importancevalidity its types and importance
validity its types and importanceIerine Joy Caserial
 
Validity, its types, measurement & factors.
Validity, its types, measurement & factors.Validity, its types, measurement & factors.
Validity, its types, measurement & factors.Maheen Iftikhar
 
Testing for language teachers 101 (1)
Testing for language teachers 101 (1)Testing for language teachers 101 (1)
Testing for language teachers 101 (1)Paul Doyon
 

Destacado (11)

Presentation Validity & Reliability
Presentation Validity & ReliabilityPresentation Validity & Reliability
Presentation Validity & Reliability
 
UTPL-LANGUAGE TESTING-II-BIMESTRE-(OCTUBRE 2011-FEBRERO 2012)
UTPL-LANGUAGE TESTING-II-BIMESTRE-(OCTUBRE 2011-FEBRERO 2012)UTPL-LANGUAGE TESTING-II-BIMESTRE-(OCTUBRE 2011-FEBRERO 2012)
UTPL-LANGUAGE TESTING-II-BIMESTRE-(OCTUBRE 2011-FEBRERO 2012)
 
Criterion-related Validity (Overview)
Criterion-related Validity (Overview)Criterion-related Validity (Overview)
Criterion-related Validity (Overview)
 
Testing for Language Teachers
Testing for Language TeachersTesting for Language Teachers
Testing for Language Teachers
 
Validity
ValidityValidity
Validity
 
Principles of language assessment
Principles of language assessmentPrinciples of language assessment
Principles of language assessment
 
Principles of Language Assessment
Principles of Language AssessmentPrinciples of Language Assessment
Principles of Language Assessment
 
Validity, reliability & practicality
Validity, reliability & practicalityValidity, reliability & practicality
Validity, reliability & practicality
 
validity its types and importance
validity its types and importancevalidity its types and importance
validity its types and importance
 
Validity, its types, measurement & factors.
Validity, its types, measurement & factors.Validity, its types, measurement & factors.
Validity, its types, measurement & factors.
 
Testing for language teachers 101 (1)
Testing for language teachers 101 (1)Testing for language teachers 101 (1)
Testing for language teachers 101 (1)
 

Similar a Principles of language_testing_rita_green

Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptx
Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptxCopie de PRESENTATION_ RELIABILITY _ VALIDITY.pptx
Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptxMonsefJraid
 
Validity in Research
Validity in ResearchValidity in Research
Validity in ResearchEcem Ekinci
 
Principles of Second Language Assessmentc.pptx
Principles of Second Language Assessmentc.pptxPrinciples of Second Language Assessmentc.pptx
Principles of Second Language Assessmentc.pptxSubramanian Mani
 
PRINCIPLES OF ASSESSMENT 2.pptx
PRINCIPLES OF ASSESSMENT 2.pptxPRINCIPLES OF ASSESSMENT 2.pptx
PRINCIPLES OF ASSESSMENT 2.pptxJoelGuamani2
 
Principles of assessment
Principles of assessmentPrinciples of assessment
Principles of assessmentmunsif123
 
CHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTCHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTMusfera Nara Vadia
 
RCH 8301, Quantitative Research Methods 1 Course L
  RCH 8301, Quantitative Research Methods 1 Course L  RCH 8301, Quantitative Research Methods 1 Course L
RCH 8301, Quantitative Research Methods 1 Course LVannaJoy20
 
Basic Principles of Assessment
Basic Principles of AssessmentBasic Principles of Assessment
Basic Principles of AssessmentYee Bee Choo
 
Validity and reliability (aco section 6a) sheena jayma msgs ed
Validity and reliability (aco section 6a) sheena jayma msgs edValidity and reliability (aco section 6a) sheena jayma msgs ed
Validity and reliability (aco section 6a) sheena jayma msgs edSheena Gyne Jayma
 
research-instruments (1).pptx
research-instruments (1).pptxresearch-instruments (1).pptx
research-instruments (1).pptxJCronus
 
Pilot Study for Validity and Reliability of an Aptitude Test
Pilot Study for Validity and Reliability of an Aptitude TestPilot Study for Validity and Reliability of an Aptitude Test
Pilot Study for Validity and Reliability of an Aptitude TestBahram Kazemian
 
Testing and Evaluation Strategies in Second Language Teaching.pptx
Testing and Evaluation Strategies in Second Language Teaching.pptxTesting and Evaluation Strategies in Second Language Teaching.pptx
Testing and Evaluation Strategies in Second Language Teaching.pptxSubramanian Mani
 
MAAM DAGUDAG -RESEARCH REPORT 2019.pptx
MAAM DAGUDAG -RESEARCH REPORT 2019.pptxMAAM DAGUDAG -RESEARCH REPORT 2019.pptx
MAAM DAGUDAG -RESEARCH REPORT 2019.pptxRODELAZARES3
 
POLIT.pptx
POLIT.pptxPOLIT.pptx
POLIT.pptxbeminaja
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011introKAthy Cea
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011introKAthy Cea
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011introKAthy Cea
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011introKAthy Cea
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011introKAthy Cea
 

Similar a Principles of language_testing_rita_green (20)

Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptx
Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptxCopie de PRESENTATION_ RELIABILITY _ VALIDITY.pptx
Copie de PRESENTATION_ RELIABILITY _ VALIDITY.pptx
 
Validity in Research
Validity in ResearchValidity in Research
Validity in Research
 
Principles of Second Language Assessmentc.pptx
Principles of Second Language Assessmentc.pptxPrinciples of Second Language Assessmentc.pptx
Principles of Second Language Assessmentc.pptx
 
PRINCIPLES OF ASSESSMENT 2.pptx
PRINCIPLES OF ASSESSMENT 2.pptxPRINCIPLES OF ASSESSMENT 2.pptx
PRINCIPLES OF ASSESSMENT 2.pptx
 
Research Design
Research DesignResearch Design
Research Design
 
Principles of assessment
Principles of assessmentPrinciples of assessment
Principles of assessment
 
CHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENTCHARACTERISTICS OF A GOOD INSTRUMENT
CHARACTERISTICS OF A GOOD INSTRUMENT
 
RCH 8301, Quantitative Research Methods 1 Course L
  RCH 8301, Quantitative Research Methods 1 Course L  RCH 8301, Quantitative Research Methods 1 Course L
RCH 8301, Quantitative Research Methods 1 Course L
 
Basic Principles of Assessment
Basic Principles of AssessmentBasic Principles of Assessment
Basic Principles of Assessment
 
Validity and reliability (aco section 6a) sheena jayma msgs ed
Validity and reliability (aco section 6a) sheena jayma msgs edValidity and reliability (aco section 6a) sheena jayma msgs ed
Validity and reliability (aco section 6a) sheena jayma msgs ed
 
research-instruments (1).pptx
research-instruments (1).pptxresearch-instruments (1).pptx
research-instruments (1).pptx
 
Pilot Study for Validity and Reliability of an Aptitude Test
Pilot Study for Validity and Reliability of an Aptitude TestPilot Study for Validity and Reliability of an Aptitude Test
Pilot Study for Validity and Reliability of an Aptitude Test
 
Testing and Evaluation Strategies in Second Language Teaching.pptx
Testing and Evaluation Strategies in Second Language Teaching.pptxTesting and Evaluation Strategies in Second Language Teaching.pptx
Testing and Evaluation Strategies in Second Language Teaching.pptx
 
MAAM DAGUDAG -RESEARCH REPORT 2019.pptx
MAAM DAGUDAG -RESEARCH REPORT 2019.pptxMAAM DAGUDAG -RESEARCH REPORT 2019.pptx
MAAM DAGUDAG -RESEARCH REPORT 2019.pptx
 
POLIT.pptx
POLIT.pptxPOLIT.pptx
POLIT.pptx
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011intro
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011intro
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011intro
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011intro
 
Evaluation.2011intro
Evaluation.2011introEvaluation.2011intro
Evaluation.2011intro
 

Último

Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfSanaAli374401
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 

Último (20)

Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx

Principles of language_testing_rita_green

  • 1. Dr. R. Green, Aug 2006 1 Principles of language testing EUROPOS SĄJUNGA
  • 2. Overview
      • What are the principles of language testing?
      • How can we define them?
      • What factors can influence them?
      • How can we measure them?
      • How do they interrelate?
  • 3. Reliability
      Related to accuracy, dependability and consistency, e.g. 20°C here today, 20°C in North Italy: are they the same?
      According to Henning [1987], reliability is
      • a measure of accuracy, consistency, dependability, or fairness of scores resulting from the administration of a particular examination, e.g. 75% on a test today and 83% tomorrow indicates a problem with reliability.
  • 4. Validity: internal & external
      • Construct validity [internal]: the extent to which evidence can be found to support the underlying theoretical construct on which the test is based
      • Content validity [internal]: the extent to which the content of a test can be said to be sufficiently representative and comprehensive of the purpose for which it has been designed
  • 5. Validity [2]
      • Response validity [internal]: the extent to which test takers respond in the way expected by the test developers
      • Concurrent validity [external]: the extent to which test takers' scores on one test relate to those on another externally recognised test or measure
  • 6. Validity [3]
      • Predictive validity [external]: the extent to which scores on test Y predict test takers' ability to do X, e.g. IELTS scores and success in academic studies at university
      • Face validity [internal/external]: the extent to which the test is perceived to reflect the stated purpose, e.g. writing in a listening test: is this appropriate? It depends on the target language situation, i.e. academic environment
  • 7. Validity [4]
      • 'Validity is not a characteristic of a test, but a feature of the inferences made on the basis of test scores and the uses to which a test is put.' Alderson [2002: 5]
  • 8. Practicality
      The ease with which:
      • test items can be replicated in terms of resources needed, e.g. time, materials, people
      • the test can be administered
      • the test can be graded
      • the results can be interpreted
  • 9. Factors which can influence reliability, validity and practicality…
  • 10. Test [1]
      • quality of items
      • number of items
      • difficulty level of items
      • level of item discrimination
      • type of test methods
      • number of test methods
  • 11. Test [2]
      • time allowed
      • clarity of instructions
      • use of the test
      • selection of content
      • sampling of content
      • invalid constructs
  • 12. Test taker
      • familiarity with test method
      • attitude towards the test, i.e. interest, motivation, emotional/mental state
      • degree of guessing employed
      • level of ability
  • 13. Test administration
      • consistency of administration procedure
      • degree of interaction between invigilators and test takers
      • time of day the test is administered
      • clarity of instructions
      • test environment: light / heat / noise / space / layout of room
      • quality of equipment used, e.g. for listening tests
  • 14. Scoring
      • accuracy of the key, e.g. does it include all possible alternatives?
      • inter-rater reliability, e.g. in writing, speaking
      • intra-rater reliability, e.g. in writing, speaking
      • machine vs. human
  • 15. How can we measure reliability?
      • Test-retest: same test administered to the same test takers following an interval of no more than 2 weeks
      • Inter-rater reliability: two or more independent estimates on a test, e.g. written scripts marked by two raters independently and results compared
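Both approaches on slide 15 come down to comparing two sets of scores for the same test takers. One common way to do that is a Pearson correlation; the slides do not name a specific statistic, so treat this as an illustrative sketch. The function name `pearson_r` and the rater marks below are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one score set has no spread")
    return cov / sqrt(var_x * var_y)


# Hypothetical marks given by two raters to the same six written scripts.
rater_a = [12, 15, 9, 18, 14, 11]
rater_b = [13, 14, 10, 17, 15, 10]
print(round(pearson_r(rater_a, rater_b), 3))
```

The same function applies to test-retest data: pass the scores from the first and second administrations instead of the two raters' marks. Note that a high correlation shows the raters rank scripts similarly, not that they award identical marks.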
  • 16. Measuring reliability [2]
      Internal consistency reliability estimates, e.g.
      • Split half reliability
      • Cronbach's alpha / Kuder Richardson 20 [KR20]
  • 17. Split half reliability
      • test to be administered to a group of test takers is divided into halves, scores on each half correlated with the other half
      • the resulting coefficient is then adjusted by the Spearman-Brown Prophecy Formula to allow for the fact that the total score is based on an instrument that is twice as long as its halves
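The two steps on slide 17 can be sketched as follows. The slide does not say how the test is halved; an odd/even item split is assumed here, and the function names and the response data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Adjust a half-test correlation to estimate full-length reliability."""
    if r_half <= -1:
        raise ValueError("adjustment is undefined for r_half <= -1")
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list per test taker, one score per item."""
    # Assumption: items in odd and even positions form the two halves.
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(first_half, second_half))


# Hypothetical right/wrong (1/0) responses: five test takers, six items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(responses), 3))
```

Different ways of halving the test can give different coefficients, which is one reason the internal consistency estimates on the next slide are often preferred.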
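  • 18. Cronbach's Alpha [KR 20]
      • this approach looks at how test takers perform on each individual item and then compares that performance against their performance on the test as a whole
      • KR 20 is the form of alpha used when items are scored right/wrong
      • alpha cannot exceed +1 and is normally reported on a 0 to +1 scale; unlike item discrimination, which runs from -1 to +1, a negative alpha signals a problem with the items or the scoring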
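A minimal sketch of how alpha and KR 20 can be computed, using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). Population variances are used throughout so that the two functions agree on right/wrong data; statistical packages may use sample variances instead. The function names and response data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """item_scores: one list per test taker, one score per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores have no spread")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(item_scores):
    """KR 20 for items scored 1 (right) or 0 (wrong)."""
    n = len(item_scores)
    k = len(item_scores[0])
    # p = proportion answering each item correctly; item variance is p * (1 - p).
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_pq / total_var)


# Hypothetical right/wrong responses: six test takers, four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 3), round(kr20(responses), 3))
```

On right/wrong data the two printed values coincide, which is why the slide treats the two names together.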
  • 19. Reliability is influenced by…
      • the longer the test, the more reliable it is likely to be [though there is a point of no extra return]
      • items which discriminate will add to reliability; therefore, if the items are too easy / too difficult, reliability is likely to be lower
      • if there is a wide range of abilities amongst the test takers, the test is likely to have higher reliability
      • the more homogeneous the items are, the higher the reliability is likely to be
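The first point on slide 19 can be illustrated with the general form of the Spearman-Brown formula mentioned on slide 17, which predicts reliability when a test is made n times longer. This is a sketch under the assumption that the added items are of the same quality as the existing ones; the function name and the starting reliability of 0.60 are hypothetical.

```python
def predicted_reliability(current_r, length_factor):
    """Spearman-Brown prediction for a test made `length_factor` times longer."""
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * current_r / (1 + (length_factor - 1) * current_r)


# Hypothetical test with reliability 0.60 at its current length.
for factor in (1, 2, 3, 4, 8):
    print(factor, round(predicted_reliability(0.60, factor), 3))
```

Each further doubling adds less than the one before, which is the "point of no extra return" the slide refers to.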
  • 20. How can we measure validity?
      According to Henning [1987]
      • non-empirically, involving inspection, intuition and common sense
      • empirically, involving the collection and analysis of qualitative and quantitative data
  • 21. Construct validity
      • evidence is usually obtained through such statistical analyses as factor analysis [looks for items which group together], discrimination; also through retrospection procedures
      Content validity
      • this type of validity cannot be measured statistically; need to involve experts in an analysis of the test; detailed specifications should be drawn up to ensure the content is both representative and comprehensive
  • 22. Response validity
      • can be ascertained by means of interviewing test takers [Henning]; asking them to take part in introspection / retrospection procedures [Alderson]
      Concurrent validity
      • determined by correlating the results on the test with another externally recognised measure. Care needs to be taken that the two measures are measuring similar skills and using similar test methods
  • 23. Predictive validity
      • can be determined by investigating the relationship between a test taker's score, e.g. on IELTS/TOEFL, and his/her success in the academic program chosen
      • problem: other factors may influence success, e.g. life abroad, ability in chosen field, peers, tutors, personal issues, etc.; also time factor element
  • 24. Reliability vs. validity?
      • 'an observation can be reliable without being valid, but cannot be valid without first being reliable. In other words, reliability is a necessary, but not sufficient, condition for validity.' [Hubley & Zumbo 1996]
      • 'Of all the concepts in testing and measurement, it may be argued, validity is the most basic and far-reaching, for without validity, a test, measure or observation and any inferences made from it are meaningless' [Hubley & Zumbo 1996, 207]
  • 25. Reliability vs. validity [2]
      • even an ideal test which is perfectly reliable and possessing perfect criterion-related validity will be invalid for some purposes [Henning 1987]
  • 26. Practicality
      Designing and developing good test items requires
      • working with other colleagues
      • materials, i.e. paper, computer, printer etc.
      • time
      Some items look very attractive but this attraction has to be weighed against these factors.
  • 27. References
      • Alderson, J. C. 2002. Conceptions of validity and validation. Paper presented at a conference in Bucharest, June 2002.
      • Angoff, 1988. Validity: An evolving concept. In H. Wainer & H. Braun [Eds.], Test validity [pp. 19-32]. Hillsdale, NJ: Erlbaum.
      • Bachman, L. F. 1990. Fundamental considerations in language testing. Oxford: O.U.P.
      • Cumming, A. & Berwick, R. [Eds.] 1996. Validation in Language Testing. Multilingual Matters.
      • Hatch, E. & Lazaraton, A. 1991. The Research Manual: Design & Statistics for Applied Linguistics. Newbury House.
  • 28. References [2]
      • Henning, G. 1987. A guide to language testing: Development, evaluation and research. Cambridge, Mass: Newbury House.
      • Hubley, A. M. & Zumbo, B. D. 1996. A dialectic on validity: where we have been and where we are going. The Journal of General Psychology, 123[3], 207-215.
      • Messick, S. 1988. The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. Braun [Eds.], Test validity [pp. 33-45]. Hillsdale, NJ: Erlbaum.
      • Messick, S. 1989. Validity. In R. L. Linn [Ed.], Educational measurement [3rd ed., pp. 13-103]. New York: Macmillan.
  • 29. Item-total Statistics

      Item   Corrected Item-Total Correlation   Alpha if Item Deleted
      R01    .5259                              .7964
      R02    .6804                              .7594
      R03    .6683                              .7623
      R04    .5516                              .7940
      R05    .7173                              .7489
      R16    .3946                              .8288

      N of Cases = 194.0   N of Items = 6   Alpha = .8121
  • 30. Item-total Statistics

      Item   Corrected Item-Total Correlation   Alpha if Item Deleted
      R16    .5773                              .7909
      R17    .5995                              .7863
      R18    .7351                              .7553
      R19    .7920                              .7419
      R20    .6490                              .7753
      R01    .1939                              .8663

      N of Cases = 194.0   N of Items = 6   Alpha = .8185
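The two columns in the tables on slides 29 and 30 can be reproduced in outline as follows. This is a sketch, not the routine that generated the slides' output: "corrected" is taken to mean that each item is correlated with the total of the remaining items, population variances are assumed, and the function names and response data are hypothetical.

```python
from math import sqrt
from statistics import pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cronbach_alpha(item_scores):
    """Alpha from population variances of items and of total scores."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_total_statistics(item_scores):
    """Return (corrected item-total correlation, alpha if item deleted) per item."""
    k = len(item_scores[0])
    results = []
    for i in range(k):
        item = [row[i] for row in item_scores]
        # Drop item i so it does not inflate its own correlation with the total.
        remaining = [row[:i] + row[i + 1:] for row in item_scores]
        rest_totals = [sum(row) for row in remaining]
        results.append((pearson_r(item, rest_totals), cronbach_alpha(remaining)))
    return results


# Hypothetical right/wrong responses: six test takers, four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
for number, (r_it, alpha_del) in enumerate(item_total_statistics(responses), start=1):
    print(f"item {number}: r = {r_it:.3f}, alpha if deleted = {alpha_del:.3f}")
```

An item is a candidate for review when its corrected correlation is low and the alpha-if-deleted value exceeds the overall alpha, as with R16 on slide 29 (.8288 against .8121) and R01 on slide 30 (.8663 against .8185).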
  • 31. Component Matrix [a]

      Item   Component 1   Component 2
      R01    .502           .559
      R02    .690           .423
      R03    .683           .461
      R04    .571           .404
      R05    .750           .343
      R16    .670          -.223
      R17    .631          -.508
      R18    .770          -.368
      R19    .789          -.383
      R20    .646          -.494

      Extraction Method: Principal Component Analysis.
      a. 2 components extracted.
  • 32. Thank you for your attention!