Presentation on the Hill by the Department of Education's (ED's) Technical Monitors. It details flaws in the published Mathematica reports and presents strong positive impact estimates from the study, drawn from a more credible, standards-based re-analysis that corrects for the identified report flaws. Contrary to the conclusions put forth by Mathematica for almost a decade, the re-analysis found strong positive impacts for Upward Bound.
1. April 19 2012 Briefing
210 Cannon House Office Building
David Goodwin, Ph.D. Technical Monitor, First UB
Evaluation Contract; Former Division Director Policy
Analysis Studies (PAS); US Department of Education;
Retired Gates Foundation, currently Independent
Consultant
Margaret Cahalan, Ph.D. Technical Monitor, Final UB
Evaluation Contract; Currently Senior Scientist, Pell
Institute for the Study of Opportunity in Higher Education
3. Extreme unequal weighting and serious representation issues
Project with 26 percent of weight (known as 69) was sole representative of 4-year public strata, but was a former 2-year school with largely less than 2-year programs
Project partnered with job training program
Inadequate representation of 4-year
Figure 1. Percentage of sum of the weights by project of the 67 projects making up the study sample: National Evaluation of Upward Bound, study conducted 1992-93 to 2003-04
[Bar chart: percent of weight (y-axis, 0 to 30) by project (x-axis); the tallest bar is labeled 26.38]
NOTE: Of the 67 projects making up the UB sample, just over half (54 percent) have less than 1 percent of the weights each, and one project (69) accounts for 26.4 percent of the weights.
SOURCE: Data tabulated December 2007 using: National Evaluation of Upward Bound data files, study sponsored by the Policy and Planning Studies Services (PPSS), of the Office of Planning, Evaluation and Policy Development (OPEPD), US Department of Education: study conducted 1992-93 to 2003-04.
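One way to see how much a single project with 26.4 percent of the weight can dominate the estimates is to compute an effective number of projects as the inverse of the sum of squared weight shares (the same form as Kish's effective sample size approximation). The sketch below is illustrative only. It uses the two figures stated in the note (67 projects, 26.4 percent for project 69) and assumes, hypothetically, that the remaining weight is spread evenly across the other 66 projects. The function name and the even split are assumptions, not part of the study.

```python
# Best-case effective number of projects under unequal weighting.
# From the slide: 67 projects, one of which holds 26.4 percent of the weight.
# ASSUMPTION (not from the study): the other 66 projects share the rest evenly.

def effective_count(shares):
    """Kish-style approximation: (sum of shares)^2 / sum of squared shares."""
    total = sum(shares)
    return total ** 2 / sum(s ** 2 for s in shares)

N_PROJECTS = 67
TOP_SHARE = 0.264  # project 69, from the Figure 1 note

# Hypothetical even split of the remaining weight across 66 projects.
rest = [(1 - TOP_SHARE) / (N_PROJECTS - 1)] * (N_PROJECTS - 1)
shares = [TOP_SHARE] + rest

n_eff = effective_count(shares)
print(f"Effective number of projects (best case): {n_eff:.1f} of {N_PROJECTS}")
```

An even split is the most favorable case for a fixed top share, so under these assumptions the result (roughly 13 of 67) is an upper bound. The note's observation that 54 percent of projects hold less than 1 percent each implies the actual spread is less even than this.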
4. Severe non-equivalency in project 69 in favor of control group: explains observed negative results from project 69
[Two stacked bar charts, Treatment and Control, y-axis 0 to 100; values as labeled on the bars]
Project 69
High academic risk: Treatment 80, Control 20
In 9th (younger) grade in 1993-94: Treatment 77, Control 23
Expect advanced degree: Treatment 21, Control 79
Other 66 projects in sample
High academic risk: Treatment 51, Control 49
In 9th (younger) grade in 1993-94: Treatment 51, Control 49
Expect advanced degree: Treatment 49, Control 51
The Pell Institute
5. [Stacked bar chart, Treatment and Control, y-axis 0 to 100; slide title not recoverable; values as labeled on the bars]
High academic risk: Treatment 58, Control 42
In 9th (younger) grade in 1993-94: Treatment 56, Control 44
Expect advanced degree: Treatment 42, Control 58
6. Re-analyses corrected for
identified issues
Used similar statistical analysis procedures but unlike
published impact estimates the re-analyses:
1. Presented results with and without project 69
2. Standardized outcomes to expected high school graduation year
for sample that spanned 5 years of high school graduation dates
3. Used all applicable follow-up surveys (3 to 5) and 10 years of
federal aid files for source of data
4. Used National Student Clearinghouse (NSC) data only for BA
degree and not for enrollment or 2-year or less degrees because
coverage too low or non-existent in applicable period
7. Figure 3. Treatment on the Treated (TOT) and Intent to Treat (ITT) estimates of impact of Upward Bound (UB) on postsecondary entrance within +1 year (18 months) of expected high school graduation year (EHSGY) 1992-93 to 2003-04
[Horizontal bar chart, x-axis 0 to 80 percent; values as labeled on the bars]
TOT (excludes project 69): Not UB participant (control) 60.4; UB participant (treatment) 74.6; Difference 14.2****
ITT (excludes project 69): Not UB participant (control) 64.3; UB participant (treatment) 73.3; Difference 9.0***
TOT (includes project 69): Not UB participant (control) 62.5; UB participant (treatment) 73.5; Difference 11.0****
ITT (includes project 69): Not UB participant (control) 66; UB participant (treatment) 72.9; Difference 6.9****
*/**/***/**** Significant at 0.10/0.05/.01/00 level. NOTE: Model based estimates based on STATA logistic and instrumental variables regression and also taking into account the complex sample design. Based on responses to three follow-up surveys and federal student aid files. SOURCE: Data tabulated January 2008 using: National Evaluation of Upward Bound data files, study sponsored by the Policy and Program Studies Services (PPSS), US Department of Education: study conducted 1992-93 to 2003-04; and federal Student Financial Aid (SFA) files 1994-95 to 2003-04. (Excerpted from the Cahalan Re-Analysis Report, Figure IV)
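The differences reported in Figure 3 are percentage-point gaps between the treatment and control rates. As a simple arithmetic check, the sketch below recomputes each gap from the rates shown on the slide. The variable and function names are illustrative. It does not reproduce the model-based estimation itself, only the subtraction.

```python
# Figure 3 values as shown on the slide: percent entering postsecondary
# education within +1 year of expected high school graduation year.
FIGURE_3 = [
    # (label, control %, treatment %, difference reported on the slide)
    ("TOT (excludes project 69)", 60.4, 74.6, 14.2),
    ("ITT (excludes project 69)", 64.3, 73.3, 9.0),
    ("TOT (includes project 69)", 62.5, 73.5, 11.0),
    ("ITT (includes project 69)", 66.0, 72.9, 6.9),
]

def percentage_point_gap(control, treatment):
    """Treatment minus control, rounded to one decimal like the slide."""
    return round(treatment - control, 1)

gaps = {label: percentage_point_gap(c, t) for label, c, t, _ in FIGURE_3}

for label, control, treatment, reported in FIGURE_3:
    print(f"{label}: {gaps[label]:.1f} points (slide reports {reported:.1f})")
```

Each recomputed gap matches the difference printed on the slide, and the gaps are larger in both the TOT and ITT rows when project 69 is excluded.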
8. Figure 4. Impact of Upward Bound (UB) on Bachelor’s (BA) degree attainment: estimates based on 66 of 67 projects in UB sample: National Evaluation of Upward Bound, study conducted 1992-93 to 2003-04
[Horizontal bar chart, x-axis 0 to 35 percent; values as labeled on the bars]
TOT (Longitudinal file BA in +8 years of EHSGY; evidence from any Followup Survey (Third to Fifth) or NSC; no evidence set to 0)****: Control 14.6; Treatment 21.7
TOT (BA by end of the survey period, Fifth Follow-Up responders only, adjusted for non-response)****: Control 21.1; Treatment 28.7
ITT (Longitudinal file BA in +8 years of EHSGY; evidence from any Followup Survey (Third to Fifth) or NSC; no evidence set to 0)****: Control 13.7; Treatment 17.5
*/**/***/**** Significant at 0.10/0.05/.01/00 level; NS = not significant at the .10 level or below. NOTE: TOT = Treatment on the Treated; ITT = Intent to Treat; EHSGY = Expected High School Graduation Year; NSC = National Student Clearinghouse; SFA = Student Financial Aid. All estimates significant at the .01 level or higher. Estimates based on 66 of 67 projects in sample representing 74 percent of UB at the time of the study. One project removed due to introducing bias into estimates in favor of the control group and representational issues. Model based estimates based on STATA logistic and instrumental variables regression taking into account the complex sample design. We use a 2-stage instrumental variables regression procedure to control for selection effects for the Treatment on the Treated (TOT) impact estimates. ITT estimates include 14 percent of control group who were in Upward Bound Math Science and 20-26 percent of treatment group who did not enter Upward Bound. SOURCE: Calculated January 2010 using: National Evaluation of Upward Bound data files, study sponsored by the Policy and Program Studies Services (PPSS), U.S. Department of Education; study conducted 1992-93 to 2003-04.
9. Summary
Mathematica conclusions of “no detectable impact” of the Upward Bound program
on postsecondary entrance, financial aid, and degree attainment are not robust
and are seriously flawed
A credible standards-based re-analysis correcting for identified sources of study
error detected statistically significant and educationally meaningful
positive impacts for the Upward Bound program that are not acknowledged in the
Mathematica reports
The reports are not transparent enough in reporting study issues and alternative
results for readers, including expert peer reviewers, to have enough information to
make judgments concerning the validity of the Mathematica conclusions about
Upward Bound
There is a need to acknowledge publicly that the Mathematica study is
not capable of producing robust estimates for the entire population of UB
at the time, and can produce reasonably robust estimates only for the 74
percent of UB not represented by project 69.
10. Flawed reports have had serious negative
consequences for the UB program
• Based on earlier reports from the study, the program has an “Ineffective” OMB PART
rating that still stands
• Administration recommendations for zero funding in FY05 and FY06 were justified by
the study’s Third Follow-Up findings published in 2004
• Lack-of-impact findings are widely quoted in academic research and in testimony
to Congress.
Dr. Russ Whitehurst, former Director of IES, listed UB in November 2011 as a program that
did not work, in testimony on the federal role in education research for the reauthorization
of IES.
http://www.brookings.edu/testimony/2011/1116_education_research_whitehurst.aspx
The American Youth Policy Forum’s Success at Every Step publication
(www.aypf.org/publications/SuccessAtEveryStep.htm) also reports the Mathematica
findings
11. Serious Concern Needs Addressing
The UB program reputation continues to be hurt by the
evaluation
Missed opportunity to build on the program’s successes
and find ways to strengthen and adapt program to achieve
nation’s goals of increased postsecondary access and
completion
Evaluation research as a whole suffers from not correcting
mistakes made and learning from them
12. More Information can be found
The full text of the COE Request for Correction can be
found at:
http://www.coenet.us/files/pubs_reports-COE_Request_for_C
Statement of concern by leading researchers in field:
http://www.coenet.us/files/ED-Statement_of_Concern_011712.p
Results of the re-analysis detailing study error issues can
be found at:
http://www.pellinstitute.org/downloads/publications-Do
Information on obtaining the restricted use UB data files
for additional research can be obtained by contacting:
Sandra.Furey@ed.gov
In this part of the presentation, I’m going to share with you some findings from a re-analysis we did at ED as part of a QA review of the Upward Bound study. We became aware of the issues that David has reviewed only gradually, over a period of several years. ED first became aware in 2005 that one project was carrying 26 percent of the weight, when a Mathematica analyst reported that one project, project 69, had negative impacts and that this was changing the conclusions. We did not become aware of the representational issues with project 69 until 2 years later, after the contract was over, when ED received a copy of the sampling frame and learned the identity of project 69. We also did not know at first that project 69 had seemingly negative impacts because of the extreme imbalance between its treatment and control groups. We started the re-analysis to understand what was going on and then to correct and mitigate these issues. After consulting with statistical experts, we took a standards-based approach, using NCES and evaluation research standards as guides. I used procedures similar to those Mathematica used (logistic regression and instrumental variables regression), but the re-analysis differed in 4 important ways: 1) presented the results with and without project 69; 2) standardized outcomes to expected high school graduation year for the sample, which spanned 5 years of high school graduation dates; 3) used all applicable follow-up survey data from the third through fifth follow-ups and 10 years of the federal aid files; and 4) used National Student Clearinghouse data only for BA degrees, and not for enrollment impact estimation or for 2-year and less than 2-year degrees, because coverage in the applicable period was too low or non-existent, with high potential for bias due to non-participation in the applicable period for project 69.
Figure 3 presents the Treatment on the Treated (TOT) and Intent to Treat (ITT) estimates of the impact of Upward Bound on postsecondary entrance within 1 year of expected high school graduation. The TOT estimates compare students who completed a baseline survey to get onto a “waiting list” in middle school or early high school, were randomly assigned to UB, and participated in the program with those who were not assigned to UB and did not participate. The ITT estimates compare those randomly assigned to treatment or control regardless of participation in the program. As can be seen, there are substantial positive impacts with and without project 69, with larger impacts without project 69 due to the imbalance that David has pointed out. Looking at the Treatment on the Treated (TOT) estimates without project 69, we see a 14.2 percentage point difference between the treatment and control groups (60.4 percent for the control group and 74.6 percent for the treatment group).
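The relationship between the ITT and TOT estimates can be illustrated with the textbook Wald ratio, which divides the ITT difference by the difference in program take-up between the two assigned groups. The sketch below is a simplification and is not the re-analysis procedure, which the slides describe as model-based 2-stage instrumental variables regression accounting for the complex sample design. It assumes that the take-up figures from the Figure 4 note (20-26 percent of the treatment group did not enter UB, and 14 percent of the control group were in Upward Bound Math Science) also apply to the Figure 3 sample, and that Upward Bound Math Science participation counts as receiving the program. Both are assumptions made only for illustration, as is the function name.

```python
# Textbook Wald ratio relating ITT to TOT. A simplification for illustration,
# NOT the model-based 2-stage IV regression used in the re-analysis.

def wald_tot(itt_points, takeup_treatment, takeup_control):
    """ITT effect divided by the difference in take-up rates between arms."""
    return itt_points / (takeup_treatment - takeup_control)

ITT_POINTS = 9.0          # ITT difference excluding project 69 (Figure 3)
CONTROL_CROSSOVER = 0.14  # ASSUMPTION: control share in UB Math Science (Figure 4 note)
NO_SHOW_RANGE = (0.20, 0.26)  # treatment group that did not enter UB (Figure 4 note)

estimates = []
for no_show in NO_SHOW_RANGE:
    tot = wald_tot(ITT_POINTS, 1 - no_show, CONTROL_CROSSOVER)
    estimates.append(tot)
    print(f"No-show rate {no_show:.0%}: implied TOT of about {tot:.1f} points")
```

Under these assumptions the implied TOT values bracket the 14.2 percentage point TOT difference reported in Figure 3, which is consistent with the TOT estimate being larger than the ITT estimate because some assigned students did not participate and some control students received a similar service.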
Figure 4 presents the significant, large impact of Upward Bound on BA degree attainment for the 66 of the 67 sampled projects that, as David has shown, when taken together have a reasonable balance between the treatment and control groups on baseline characteristics. This is among the most significant findings missed in the Mathematica reports, which conclude that Upward Bound had no detectable impact on BA attainment. These results indicate an almost 50 percent relative increase in BA attainment for the TOT estimate (14.6 versus 21.7 percent) and an increase of about 28 percent for the ITT estimate (13.7 versus 17.5 percent). These are significant and very large impacts.
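The percent increases quoted for BA attainment are relative increases over the control group rate, not percentage-point differences. The sketch below recomputes them from the Figure 4 values as a check on the arithmetic. The short row labels and the function name are illustrative, and matching the "almost 50 percent" statement to the longitudinal-file TOT row is an interpretation rather than something the slides state.

```python
# Figure 4 values as shown on the slide: percent attaining a BA degree.
FIGURE_4 = {
    # short label: (control %, treatment %)
    "TOT, longitudinal file": (14.6, 21.7),
    "TOT, Fifth Follow-Up responders": (21.1, 28.7),
    "ITT, longitudinal file": (13.7, 17.5),
}

def relative_increase(control, treatment):
    """Percent increase of the treatment rate over the control rate."""
    return 100 * (treatment - control) / control

increases = {
    label: relative_increase(control, treatment)
    for label, (control, treatment) in FIGURE_4.items()
}

for label, pct in increases.items():
    control, treatment = FIGURE_4[label]
    points = treatment - control
    print(f"{label}: +{points:.1f} points, {pct:.1f} percent relative increase")
```

Keeping the two measures separate matters when citing these results: the longitudinal-file TOT row is a gap of about 7 percentage points, which corresponds to a relative increase of a little under 50 percent.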
In summary, as the Technical Monitors for the study, after a close examination of the study design and work with the data files over a period of several years, we believe that the Mathematica conclusions of “no detectable impact” of the UB program on postsecondary entrance, financial aid, and degree attainment are seriously flawed. These flaws are a result of three factors: 1) seriously unequal weighting; 2) misrepresentation of the largest 4-year stratum by a 2-year program school; and 3) importantly, a serious bias in favor of the control group introduced by project 69’s severe differences between its treatment and control groups, indicating that a breakdown of random assignment may have occurred in this project. A credible standards-based re-analysis correcting for study errors shows significant and substantial positive impacts for UB on the goals of the program. Moreover, the reports are not transparent enough in reporting study issues and alternative results for readers, including expert peer reviewers, to have enough information to make a judgment concerning the validity of the Mathematica conclusions about UB. There is a need to acknowledge publicly that the Mathematica study is not capable of producing robust estimates for the entire population of UB at the time, and can produce reasonably robust estimates only for the 74 percent of UB not represented by project 69.