Using tests for high stakes evaluation: what educators need to know in Connecticut


Presenter - John Cronin, Ph.D.

Contacting us:
Rebecca Moore: 503-548-5129
E-mail: rebecca.moore@nwea.org

Visit our website: www.kingsburycenter.org
Connecticut requirements
•   Components of the evaluation (a weighting sketch follows this list)
     – Student growth (45%) - including the state test, one non-standardized
       indicator, and (optional) one other standardized indicator.
         • Requires a beginning-of-year, mid-year, and end-of-year conference
     – Teacher practice and performance (40%) –
        • First and second year teachers – 3 in-class observations
        • Developing or below standard – 3 in-class observations
        • Proficient or exemplary – 3 observations of practice, one in-class
     – Whole-school learning indicator or student feedback (5%)
     – Parent or peer feedback (10%)
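
The component weights above suggest a simple weighted combination. Below is a minimal sketch, not the official Connecticut calculation: it assumes each component has already been scored on a common 0-100 scale, which is a simplification of the real guidance.

```python
# Minimal sketch, not the official Connecticut calculation: assumes each
# component has already been scored on a common 0-100 scale.
WEIGHTS = {
    "student_growth": 0.45,
    "practice_and_performance": 0.40,
    "whole_school_or_student_feedback": 0.05,
    "parent_or_peer_feedback": 0.10,
}

def summative_rating(scores):
    """Weighted average of the four component scores (each 0-100)."""
    assert set(scores) == set(WEIGHTS), "all four components are required"
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

example = {
    "student_growth": 72,
    "practice_and_performance": 85,
    "whole_school_or_student_feedback": 90,
    "parent_or_peer_feedback": 80,
}
print(summative_rating(example))  # 32.4 + 34.0 + 4.5 + 8.0 = 78.9
```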
Connecticut requirements
Requirements for goal setting
• Each teacher sets one to four goals with their principal. Goals must:
   – Take into account the academic track record and overall needs and strengths of the students the teacher is teaching that year/semester;
   – Address the most important purposes of a teacher’s assignment through self-reflection;
   – Be aligned with school, district, and state student achievement objectives;
   – Take into account the students’ starting learning needs vis-à-vis relevant baseline data, when available;
   – Consider control factors tracked by the state-wide public school information system that may influence teacher performance ratings, including, but not limited to, student characteristics, student attendance, and student mobility.
What changes for educators?

1. The proficiency standards get higher.
2. Teachers become accountable for all students.
Difficulty of ACT college readiness standards
Moving from Proficiency to Growth

All students count when accountability is measured through growth.
One district’s change in 5th grade math performance relative to Kentucky cut scores

[Charts: percent of students meeting the proficiency cut score vs. the college readiness cut score]
Number of 5th grade students meeting math growth target in the same district
How does the process work?
How does the process work?
Connecticut requirements
•   Criteria for student growth indicator
     – Fair to students
         • The indicator of academic growth and development is used in such a way as to provide
           students an opportunity to show that they have met or are making progress in meeting the
           learning objective. The use of the indicator of academic growth and development is as free as
           possible from bias and stereotype.
     – Fair to teachers
         • The use of an indicator of academic growth and development is fair when a teacher has the
           professional resources and opportunity to show that his/her students have made growth
           and when the indicator is appropriate to the teacher’s content, assignment and class
           composition.
     – Reliable
     – Valid
     – Useful
         • The indicator may be used to provide the teacher with meaningful feedback about student
           knowledge, skills, perspective and classroom experience that may be used to enhance student
           learning and provide opportunities for teacher professional growth and development.
Issues in the use of growth and value-added measures

Measurement design of the instrument

Many assessments are not designed to measure growth. Others do not measure growth equally well for all students.
Tests are not equally accurate for all students

[Charts: score precision for California STAR and NWEA MAP]
Tests are not equally accurate for all students

[Chart: Grade 6 New York Mathematics]
Issues in the use of growth and value-added measures

Measurement sensitivity

Assessments must align with the curriculum and should be instructionally sensitive.
College and career readiness assessments will not necessarily be instructionally sensitive

“…when science is defined in terms of facts that are taught in school…(then) those students who have been taught the facts will know them, and those who have not will…not. A test that assesses these facts is likely to be highly sensitive to instruction. When science is defined in terms of scientific reasoning…achievement will be less closely tied to age and curriculum exposure and more closely related to general intelligence. Here science reasoning tasks are relatively insensitive to instruction. A third case might arise in the discussion of knowledge of the ethical and moral dimensions of science, where maturity, rather than intelligence or exposure, might be the most important factor. In other words, it may well be that the assessment is not particularly sensitive to instruction.”


Black, P. and Wiliam, D. (2007) 'Large-scale assessment systems: Design principles drawn from international comparisons', Measurement: Interdisciplinary Research & Perspective, 5(1), 1-53.
Issues in the use of growth and value-added measures

Measurement sensitivity

Classroom tests, which are designed to measure mastery, may not measure improvement well.
Issues in the use of growth and value-added measures

Instructional alignment

Tests should align to the teacher’s instructional responsibilities.
Issues in the use of growth and value-added measures

Uncovered Subjects and Teachers

High-quality tests may not be administered, or available, for many teachers and grades. Subjects like social studies may be particularly problematic.
Considerations for developing your own assessment and student learning objectives

• Developing valid instruments is very time-consuming and resource intensive.
• The assessments developed must discriminate between effective and ineffective teachers.
• The assessments must be valid in other respects.
   – Aligned to curriculum
   – Unbiased items
• The assessments can’t be open to security violations or cheating.
How does the process work?
Issues in the use of growth and value-added measures

Control for statistical error

All models attempt to address this issue. Nevertheless, many teachers’ value-added scores will fall within the range of statistical error.
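
To make “within the range of statistical error” concrete, here is a hedged illustration with invented numbers: a value-added point estimate only distinguishes a teacher from the average when its confidence interval excludes zero.

```python
# Invented numbers for two teachers: a value-added estimate and its
# standard error, both in student-level standard deviation units.
def confidence_interval(estimate, se, z=1.96):
    """95% confidence interval for a value-added estimate."""
    return estimate - z * se, estimate + z * se

for name, est, se in [("Teacher A", 0.10, 0.12), ("Teacher B", 0.35, 0.12)]:
    lo, hi = confidence_interval(est, se)
    distinct = lo > 0 or hi < 0  # does the interval exclude the average (0)?
    print(f"{name}: estimate {est:+.2f}, 95% CI ({lo:+.2f}, {hi:+.2f}), "
          f"distinguishable from average: {distinct}")
# Teacher A looks above average, but the interval spans zero: the data
# cannot rule out an average or even below-average teacher.
```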
Sources of error in assessment

• The students.
• The testing conditions.
• The assessments.

Measurement error in the assessments can be dwarfed by error
  introduced by the testing conditions and the students.
New York City

• Margins of error can be very large.
• Increasing n doesn't always decrease the margin of error (see the sketch below).
• The margin of error in math is typically less than in reading.
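
A sketch of the scaling behind the second bullet, under the simplifying assumption that a teacher's score is a simple mean of student growth scores: the standard error falls only with the square root of n, so quadrupling class size merely halves the margin.

```python
import math

sd = 15.0  # invented spread of student growth scores
for n in (10, 25, 50, 100):
    margin = 1.96 * sd / math.sqrt(n)  # SE of a mean = sd / sqrt(n)
    print(f"n = {n:3d}: margin of error = ±{margin:4.1f}")
# Quadrupling the number of students only halves the margin, and the model
# error in a real value-added estimate does not shrink with n at all.
```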
Range of teacher value-added estimates
Issues in the use of growth and value-added measures



                         “Among those who ranked in the top
                         category on the TAKS reading test, more
                         than 17% ranked among the lowest two
                         categories on the Stanford. Similarly
                         more than 15% of the lowest value-added
                         teachers on the TAKS were in the highest
                         two categories on the Stanford.”



Corcoran, S., Jennings, J., & Beveridge, A., Teacher Effectiveness on High and Low Stakes
Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI
(2010).
Issues in the use of growth and value-added measures

Instability of results

A variety of factors can cause value-added results to lack stability.

Results are more likely to be stable at the extremes. The use of multiple years of data is highly recommended.
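
A small simulation, with invented parameters, of why multiple years are recommended: if each year's measure is the true teacher effect plus independent noise, averaging years strengthens the link between the measure and the underlying effect.

```python
import random
from statistics import correlation  # Python 3.10+

random.seed(1)

# Invented parameters: true teacher effects (sd 1) and annual noise (sd 1.5).
true_effect = [random.gauss(0, 1) for _ in range(5000)]

def measured(years, noise_sd=1.5):
    """Average of `years` noisy annual measures for each teacher."""
    return [t + sum(random.gauss(0, noise_sd) for _ in range(years)) / years
            for t in true_effect]

for years in (1, 2, 3):
    print(f"{years} year(s) of data: corr(measure, true effect) = "
          f"{correlation(true_effect, measured(years)):.2f}")
# More years -> less noise in the average -> a more stable, more accurate
# estimate. The same logic underlies the multiple-year recommendation above.
```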
Los Angeles Unified

•   Teachers can easily rate in multiple categories.
•   The choice of model can have a large impact.
•   Models affect English more than math.
•   Teachers do better in some subjects than others.
•   More complex models don't necessarily favor the teacher.
Possible racial bias in models

“Significant evidence of bias plagued the value-added model
estimated for the Los Angeles Times in 2010, including significant
patterns of racial disparities in teacher ratings both by the race of
the student served and by the race of the teachers (see Green,
Baker and Oluwole, 2012). These model biases raise the possibility
that Title VII disparate impact claims might also be filed by teachers
dismissed on the basis of their value-added estimates.

Additional analyses of the data, including richer models using
additional variables mitigated substantial portions of the bias in the
LA Times models (Briggs & Domingue, 2010).”


                 Baker, B. (2012, April 28).
                 If it’s not valid, reliability doesn’t matter so much! More on VAM-ing
Instability at the tails of the distribution

      “The findings indicate that these modeling
      choices can significantly influence outcomes
      for individual teachers, particularly those in
      the tails of the performance distribution who
      are most likely to be targeted by high-stakes
      policies.”

Ballou, D., Mokher, C. and Cavalluzzo, L. (2012) Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specif…

[Charts: year-over-year value-added estimates for LA Times Teacher #1 and LA Times Teacher #2]
Reliability of teacher value-added estimates

Teachers with growth scores in the lowest and highest quintile over two years, using NWEA’s Measures of Academic Progress:

   Bottom quintile Y1 & Y2: 59/493 (12%)
   Top quintile Y1 & Y2:    63/493 (13%)

   r = .64   r² = .41

Typical r values for measures of teaching effectiveness range between .30 and .60 (Brown Center on Education Policy, 2010)
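
The flavor of this table can be reproduced in a few lines. The sketch below simulates 493 teachers with two years of value-added scores correlated at r = .64 (so r² = .64² ≈ .41: year 1 explains about 41% of the variance in year 2) and counts how many land in the same extreme quintile both years.

```python
import random

random.seed(1)
r, n = 0.64, 493

# Two years of standardized value-added scores with correlation r.
y1 = [random.gauss(0, 1) for _ in range(n)]
y2 = [r * a + (1 - r * r) ** 0.5 * random.gauss(0, 1) for a in y1]

def quintiles(values):
    """Quintile (0 = bottom ... 4 = top) of each value within its year."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    q = [0] * len(values)
    for rank, i in enumerate(order):
        q[i] = rank * 5 // len(values)
    return q

q1, q2 = quintiles(y1), quintiles(y2)
both_bottom = sum(a == 0 and b == 0 for a, b in zip(q1, q2))
both_top = sum(a == 4 and b == 4 for a, b in zip(q1, q2))
print(f"bottom quintile both years: {both_bottom}/{n}")
print(f"top quintile both years:    {both_top}/{n}")
# With r = .64, each tail lands in the same ballpark as the 12-13%
# reported in the table above.
```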
How does the process work?
Challenges with goal setting

• Lack of a “racing form”. What have this
  teacher and these students done in the past?
• Lack of comparison groups. What have other teachers done in the past?
• What is the objective? Is the objective to
  meet a standard of performance or
  demonstrate improvement?
• Do you set safety goals or stretch goals?
Issues in the use of growth and value-added measures

Model Wars

There are a variety of models in the marketplace. These models may come to different conclusions about the effectiveness of a teacher or school. Differences in findings are more likely to happen at the extremes.
Issues in the use of growth and value-added measures

Lack of random assignment

The use of a value-added model assumes that the school doesn’t add a source of variation that isn’t controlled for in the model.

e.g., Young teachers are assigned disproportionate numbers of students with poor discipline records.
How does the process work?
New York Rating System



•   60 points assigned from classroom observation
•   20 points assigned from state assessment
•   20 points assigned from local assessment
•   A score of 64 or less is rated ineffective (see the scoring sketch below).
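
A sketch of the arithmetic described above. Only the "ineffective" cut (64 or less) is given on this slide; the cut scores for the other rating bands are not, so this only flags the ineffective band.

```python
# Composite as described on the slide: 60 observation points,
# 20 state-assessment points, 20 local-assessment points.
def ny_composite(observation, state, local):
    assert 0 <= observation <= 60 and 0 <= state <= 20 and 0 <= local <= 20
    total = observation + state + local
    return total, ("ineffective" if total <= 64 else "not ineffective")

print(ny_composite(50, 8, 9))   # (67, 'not ineffective')
print(ny_composite(45, 10, 9))  # (64, 'ineffective')
# Note the arithmetic: even a perfect 60-point observation score cannot
# clear the 64-point bar without at least 5 assessment points.
```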
Connecticut requirements
Other issues

      Security and Cheating

      When measuring growth, one
      teacher who cheats disadvantages
      the next teacher.
Other issues
      (1) Each district shall define effectiveness and
          ineffectiveness utilizing a pattern of summative
          ratings derived from the new evaluation system.

      (2) At the request of a district or employee, the State
          Department of Education or a third-party entity
          approved by the SDE will audit the evaluation
          components that are combined to determine an
          individual's summative rating in the event that such
          components are significantly dissimilar (i.e., include
          both exemplary and below standard ratings) to
          determine a final summative rating.

      (3) The State Department of Education or a third party
          designated by the SDE will audit evaluation ratings
          of exemplary and below standard to validate such
          exemplary or below standard ratings by selecting ten
          districts at random annually.
Other issues

      Security and Cheating

      When measuring growth, one
      teacher who cheats disadvantages
      the next teacher.
Cheating

      Atlanta Public Schools
      Crescendo Charter Schools
      Philadelphia Public Schools
      Washington DC Public Schools
      Houston Independent School
      District
      Michigan Public Schools
Case Study #1 - Mean value-added performance in mathematics by school – fall to spring
Case Study #1 - Mean spring and fall test duration in minutes by school
Case Study #1 - Mean value-added growth by school and test duration
Case Study #2

[Charts: differences in fall-spring test durations; differences in growth index score based on fall-spring test durations]
Case Study #2

How much of summer loss is really summer loss?

[Charts: differences in spring-fall test durations; differences in raw growth by spring-fall test duration]
Case Study #2

[Chart: differences in fall-spring test duration (yellow-black) and differences in growth index scores (green) by school]
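
The case-study logic lends itself to a simple check: compute, by school, the change in mean test duration alongside the growth index, and see whether they move together. The data below is invented for illustration; a strong positive correlation would suggest test-taking effort, not instruction, is driving part of the measured growth.

```python
from statistics import correlation  # Python 3.10+

# Invented per-school values: change in mean test duration (minutes) from
# fall to spring, and the school's growth index score.
duration_change = [-12, -8, -3, 0, 2, 5]
growth_index = [-0.9, -0.5, -0.2, 0.1, 0.2, 0.6]

r = correlation(duration_change, growth_index)
print(f"r = {r:.2f}")  # strongly positive for this invented data
# Schools where students spent much less time on the spring test also show
# much lower measured growth, a red flag worth investigating before
# treating the growth differences as instructional.
```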
Security considerations

• Teachers should not be allowed to view the contents
  of the item bank or record items.
• Districts should have policies for accommodation that
  are based on student IEPs.
• Districts should consider having both the teacher and
  a proctor in the test room.
• Districts should consider whether other security
  measures are needed for both the protection of the
  teacher and administrators.
Other issues

      Proctoring

      Proctoring both with and without the
      classroom teacher raises possible
      problems.

      Documentation that test
      administration procedures were
      properly followed is important.
Potential Litigation Issues


The use of value-added data for high stakes
personnel decisions does not yet have a strong,
coherent body of case law.

Expect litigation if value-added results are the
lynchpin evidence for a teacher-dismissal case
until a body of case law is established.
Possible legal issues

• Title VII of the Civil Rights Act of 1964 –
  Disparate impact of sanctions on a protected
  group.
• State statutes that provide tenure and other
  related protections to teachers.
• Challenges to a finding of “incompetence”
  stemming from the growth or value-added
  data.
Recommendations

• Embrace the formative advantages of growth
  measurement as well as the summative.
• Create comprehensive evaluation systems with
  multiple measures of teacher effectiveness (RAND,
  2010).
• Select measures as carefully as value-added models.
• Use multiple years of student achievement data.
• Understand the issues and the tradeoffs.
Thank you for attending


Presenter - John Cronin, Ph.D.

Contacting us:
NWEA Main Number: 503-624-1951
E-mail: rebecca.moore@nwea.org

The presentation and recommended resources are
available at our website: www.kingsburycenter.org


Editor's notes

  1. Race to the Top, Gates Foundation, Teach for America…