An Introduction to Impact Evaluation
in Observational (Non-Experimental) Settings

                                 Alexis Diamond
                  Development Impact Department
Goals for this Presentation


• To explain key differences between randomized experiments
  (RCTs) and observational studies

• To briefly sketch some of the most important methods of
  causal inference in observational studies, showing how they
  might be applied to answer questions in access to finance
  projects, and offering practical guidance on:
     Matching (an estimator, or a tool for designing observational studies?)
     Differences-in-differences
     Encouragement design (Instrumental variable, or “IV” regression)
     Regression discontinuity design
     Synthetic control methods




Basic concepts

• Observational study: comparison of treated and control
  groups in which the objective is to estimate cause and effect
  relationships, without the benefit of random assignment.
  Observational studies are also known as quasi-experiments or
  natural experiments
• In a randomized experiment, random chance forms
  comparison groups (treatment and control), making groups
  comparable in terms of both measurable characteristics and
  characteristics that cannot be measured.
• In general, if the assumptions are met, causal conclusions follow.
  But usually only in randomized experiments do we KNOW the
  assumptions are met; otherwise, assumptions aren’t testable.

Experiments (RCTs) vs. observational studies




            “That’s not an experiment you have there,
            that’s an experience.”

               — Sir R. A. Fisher (England, 1890-1962)
        Why the distinction matters: selection bias, i.e., the presence of confounders.
This is reflected in the controversy over Pitt/Khandker and Roodman/Morduch.

Selection bias: “perfect implementation”

 A microfinance project is reporting the ex-post impact
indicator $/day for participants and non-participants…

       i      Yi(observed)       Treatment Status
       1           5                Treatment
       2           6                Treatment
       3           4                Treatment
       4           4                  Control
       5           2                  Control
       6           6                  Control




        Average for the treatment group: $5/day



         Average for the control group: $4/day


                 Difference = +$1/day

 How should one think about that result, +$1/day?

  Does it mean the project has positive impact?





                     Impact on whom?
i    Yi    Yi(1)     Yi(0)    Treatment Status   Yi(1) – Yi(0)
1     5     5          ?          Treatment           ?
2     6     6          ?          Treatment           ?
3     4     4          ?          Treatment           ?
4     4      ?        4            Control            ?
5     2      ?        2            Control            ?
6     6      ?        6            Control            ?




              Avg Treatment Effect for Treated (ATT) = +3
              Avg Treatment Effect for Control (ATC) = -1
              Avg Treatment Effect (ATE) = (3 × (+3) + 3 × (-1)) / 6 = +1

    i    Yi      Yi(1)      Yi(0)    Treatment Status   Yi(1) – Yi(0)
    1     5        5         2           Treatment           +3
    2     6        6         3           Treatment           +3
    3     4        4         1           Treatment           +3
    4     4        3         4            Control            -1
    5     2        1         2            Control            -1
    6     6        5         6            Control            -1

That simple $1/day difference we identified earlier = ATT + BIAS = (+3) + (-2)
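
The decomposition on this slide can be checked with a few lines of Python. This is a minimal sketch that hard-codes the hypothetical potential outcomes from the table above; the variable names are illustrative and not part of the original presentation. It only works because the example reveals both Y(1) and Y(0) for every unit, which real data never do.

```python
from statistics import mean

# (unit, Y(1), Y(0), treated?) copied from the hypothetical table above.
units = [
    (1, 5, 2, True),
    (2, 6, 3, True),
    (3, 4, 1, True),
    (4, 3, 4, False),
    (5, 1, 2, False),
    (6, 5, 6, False),
]

treated = [u for u in units if u[3]]
control = [u for u in units if not u[3]]

# What the project observes: Y(1) for the treated, Y(0) for the controls.
naive_diff = mean(y1 for _, y1, _, _ in treated) - mean(y0 for _, _, y0, _ in control)

# Unit-level effects averaged over each group and over everyone.
att = mean(y1 - y0 for _, y1, y0, _ in treated)
atc = mean(y1 - y0 for _, y1, y0, _ in control)
ate = mean(y1 - y0 for _, y1, y0, _ in units)

# Selection bias: the gap in untreated outcomes Y(0) between the two groups.
bias = mean(y0 for _, _, y0, _ in treated) - mean(y0 for _, _, y0, _ in control)

print(f"naive difference = {naive_diff}, ATT = {att}, ATC = {atc}, ATE = {ate}, bias = {bias}")
```

Here the treated group would have earned less than the control group even without the loan, so the naive comparison understates the effect on the treated.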


Selection bias: Ignore it at your peril

Identifying impacts requires identifying Y(1) and Y(0) for the same units

      BIAS can be positive/negative, big/small, observed/hidden…

        i   Yi    Yi(1)     Yi(0)    Treatment Status   Yi(1) – Yi(0)
        1   5      5          ?          Treatment           ?
        2   6      6          ?          Treatment           ?
        3   4      4          ?          Treatment           ?
        4   4       ?         4           Control            ?
        5   2       ?         2           Control            ?
        6   6       ?         6           Control            ?



Observational studies: Are they credible? Yes, but…

   A judgment-free method for dealing with problems of
   sample selection bias is the Holy Grail of the evaluation
   literature, but this search reflects more the aspirations of
   researchers than any plausible reality…
     - Rajeev Dehejia, “Practical Propensity Score Matching”

   • Some have tried to set up tests for observational methods:
   e.g., “Can method X (matching, regression, IV, etc.) recover
   the true (experimental) benchmark?”

   • Such efforts have generally failed to conclusively validate
   observational studies.


Observational studies: Are they credible? Yes, but…
     History abounds with examples where causality has
     ultimately found general acceptance without any
     experimental evidence…
     The evidence of a causal effect of smoking on lung
     cancer is now generally accepted, without any direct
     experimental evidence to support it…
     At the same time, the long road toward general
     acceptance of the causal interpretation …shows the
     difficulties in gaining acceptance for causal claims
     without randomization.
             - Guido Imbens, “Better LATE than Nothing”


Why bother with observational studies?

• Studies that start as perfect RCTs often end as broken RCTs,
  not “gold-standard” RCTs. These broken RCTs may be better
  than many observational studies, but there is no bright line
  distinguishing broken RCTs from observational studies.

• Standard RCTs cannot address many important policy issues
  (e.g., macroeconomic questions, or cases with general
  equilibrium effects more broadly)

• Other issues are difficult to address with RCTs, setting up a
  trade-off between rigor and relevance. What’s better—the
  RCT in a lab setting, or the equivalent observational study?

• RCTs are often more expensive, time-consuming, and fragile
  than alternatives—can be high risk and not always strategic.

More advantages of observational studies

• Sometimes you can use pre-existing data, which has time and
  cost advantages (though there are clear trade-offs)
    o Typical out-of-pocket time/cost of a World Bank RCT: > 1 year & $500K

    o Occasionally RCTs can be done cheaply and easily, especially in a place
      like India (there are examples where it costs < $50,000)

    o With administrative data, observational studies may have no (or trivial)
      out-of-pocket costs, and be completed in days or weeks.

• Sometimes you want to apply observational methods to
  experimental data

• Good for hypothesis-generation

• Avoids the ethical considerations that RCTs raise

Methodology #1: Matching


You: My clients enjoy big impacts from our bank’s financing
Critic: Compared to whom? Where’s the control group?
You: Ok, I’ll go find one—and then you’ll see!




              E.g.: Boonperm/Haughton, “Thailand Village Fund” (2009)

[Figure: scatter of one Treated unit and Controls 1-3, with X1: Education on the horizontal axis and X2: Age on the vertical axis. A second panel repeats the plot after rescaling X1 (Education multiplied by 2).]

Matching: Points to consider


• Matching is (unfortunately) as much art as science, and there
  are more methodological varieties of matching than there are
  flavors of ice cream

• Widespread agreement that matching is, at a minimum, a
  useful pre-processing step to reduce model dependence.
  Unfortunately, no consensus on balance tests/diagnostics.

• Hugely important benefit of matching is that it is performed
  “blind to the answer”—comparing favorably with regression

• Matching helps with selection bias due to observed variables
  (confounders)—it does not help with unobserved confounders.
  For the latter, one can (and should) do sensitivity analysis.
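
One reason matching is "as much art as science" is that distance-based matches depend on how covariates are scaled. The sketch below illustrates this with nearest-neighbor matching on Euclidean distance. The covariate values, the names `treated` and `controls`, and the helper `nearest_control` are all hypothetical and chosen only so that the match flips when education is multiplied by 2; they are not read off the original figure.

```python
import math

def nearest_control(treated, controls, scale=(1.0, 1.0)):
    """Return the name of the control closest to `treated` after scaling each covariate."""
    def dist(a, b):
        return math.sqrt(sum((s * (x - y)) ** 2 for s, x, y in zip(scale, a, b)))
    return min(controls, key=lambda name: dist(treated, controls[name]))

# Hypothetical (education in years, age in years) values.
treated = (10.0, 30.0)
controls = {
    "Control 1": (12.0, 31.0),
    "Control 2": (10.5, 33.0),
    "Control 3": (14.0, 27.0),
}

match_original = nearest_control(treated, controls)
match_rescaled = nearest_control(treated, controls, scale=(2.0, 1.0))

print("original scale:", match_original)
print("education x 2: ", match_rescaled)
```

A common way to reduce this arbitrariness is to standardize covariates or use a scale-invariant distance such as the Mahalanobis distance before matching; which choice is appropriate remains a judgment call.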

Methodology #2: Differences-in-Differences (D-i-D)


You:      My clients enjoy big impacts from our bank’s financing
Critic:   Compared to whom? Where’s the control group?
You:      Ok, I’ll go find one—and then you’ll see!
Critic:   Too many unobservables. It’s a waste of time.
You:      Well, can you assume that my control group’s growth rate
          (e.g., near zero) is a good proxy for the treatment
          group’s counterfactual growth rate (without the loan)?

  D-i-D: subtract one before/after difference from the other

  Addresses observed confounders (regression assumptions) &
  unobserved time-invariant confounders common to treatment
  and control groups. See Kondo’s work in the Philippines (ADB).

Diffs-in-Diffs: Points to consider

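[Figure: Income (vertical axis) for the Treated and Control groups, Before and After treatment. The treated group's counterfactual extends from its pre-treatment level, carrying forward the pre-treatment difference from the control group; the gap between the observed treated outcome and this counterfactual is the estimated ATET.]
NOTE: Circles are observed; the square (counterfactual) is unobserved (imputed).
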



• If matching is implausible, why would D-i-D be plausible?
  Does the parallel trend assumption seem easier to believe?

• The parallel trend assumption must hold over the time
  period, implying composition of two groups should remain
  constant over time.

• D-i-D benefits from “placebo tests” run pre-treatment
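
The D-i-D arithmetic and a pre-treatment placebo check can be sketched as follows. All income figures are hypothetical and the function name `diff_in_diff` is illustrative; this is a sketch of the basic two-group, two-period calculation using group means, not a regression-based estimator with covariates.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treated_after - treated_before) - (control_after - control_before)

# Hypothetical mean incomes ($/day) before and after the loan program.
estimate = diff_in_diff(treated_before=4.0, treated_after=7.0,
                        control_before=3.0, control_after=4.0)

# Imputed counterfactual for the treated group under parallel trends:
# its starting level plus the control group's change.
counterfactual_after = 4.0 + (4.0 - 3.0)

# Placebo test: repeat the calculation on two PRE-treatment periods.
# A value far from zero would cast doubt on the parallel trend assumption.
placebo = diff_in_diff(treated_before=3.5, treated_after=4.0,
                       control_before=2.5, control_after=3.0)

print(f"D-i-D estimate = {estimate}, counterfactual = {counterfactual_after}, placebo = {placebo}")
```

A placebo estimate near zero does not prove that trends would have stayed parallel after treatment; it only fails to contradict the assumption.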




Methodology #3: Encouragement Design


You:    Well, can you assume that my control group’s growth rate
        (e.g., near zero) is a good proxy for the treatment
        group’s counterfactual growth rate (without the loan)?
Critic: No, also not credible.
You: OK, how about a natural experiment?
        Our FI established additional info kiosks in 100 villages
        to encourage loan take-up—these villages were not
        chosen at random, but it was “practically” random.

  The encouragement (“instrument”, assumed “as good as
  random”) has an effect (for some) on probability of finance.
  This method leverages this “exogenous” variation to overcome
  potential bias from both observed and unobserved confounders.

Encouragement Design: Points to consider

• Encouragement design requires strong assumptions:
    o Encouragement must really be random or almost random, and must
      have no direct effect on impacts (only an indirect effect via treatment)

    o The encouragement must NEVER discourage take-up (no defiers)

    o Causal estimates restricted to “compliers” only… (Who?)

    o Also, for credible results, encouragement had better be effective

• Strange quirk: different answers, from different models, can
  all be “correct” because complier populations may differ

• Was popular, now more disparaged in observational work

• Again, sensitivity tests are available and should be run
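
With a binary encouragement and binary take-up, the simplest IV estimate is the Wald ratio: the effect of encouragement on the outcome divided by its effect on take-up. The sketch below assumes that setup. The records, the function name `wald_estimate`, and the `min_first_stage` cutoff are hypothetical; the cutoff in particular is an arbitrary illustration of the point that the encouragement "had better be effective", not a recommended threshold.

```python
from statistics import mean

def wald_estimate(records, min_first_stage=0.1):
    """records: iterable of (encouraged, took_loan, outcome) with 0/1 indicators."""
    enc = [r for r in records if r[0] == 1]
    non = [r for r in records if r[0] == 0]
    # Intention-to-treat effect of the encouragement on the outcome.
    itt = mean(r[2] for r in enc) - mean(r[2] for r in non)
    # First stage: effect of the encouragement on loan take-up.
    first_stage = mean(r[1] for r in enc) - mean(r[1] for r in non)
    if abs(first_stage) < min_first_stage:
        raise ValueError("encouragement barely moves take-up; the ratio is unreliable")
    return itt / first_stage

# Hypothetical households: (kiosk in village?, took loan?, income in $/day).
records = [
    (1, 1, 6), (1, 1, 6), (1, 1, 5), (1, 0, 3),
    (0, 1, 6), (0, 0, 3), (0, 0, 4), (0, 0, 3),
]

late = wald_estimate(records)
print(f"estimated effect for compliers = {late}")
```

Under the assumptions listed above, this ratio is interpreted as the average effect for compliers only, which is why different instruments can yield different yet valid answers.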

Methodology #4: Regression Discontinuity Design


You:    OK, how about a natural experiment?
        Our FI established additional info kiosks in 100 villages
        to encourage loan take-up—these villages were not
        chosen at random, but it was “practically” random.
Critic: I don’t buy it. Rollout was in fact strategic, not random.
You: Ok, I’ll try again. This bank always provides extra lines
        of credit at great terms to customers with credit scores
        above a certain threshold. Let’s compare results for
        customers just above and below the threshold.

  Treatment assumed as good as random at the threshold if the
  discontinuity is sharp. RDD addresses observed and unobserved
  confounders. What question will the RDD design above answer?

Regression Discontinuity Design: Points to consider


 • Generally considered a very strong design: US Dept of
   Education classifies it in the same category as RCT

 • Only informative for those at the discontinuity threshold

 • No “gaming” the threshold allowed (ideally, the threshold is
   unknown to the subjects, or outside subjects’ control)

 • Relatively low statistical power, requiring much larger
   sample sizes than RCTs or other observational methods.

 • Watch out for contamination by other treatments at the same
   discontinuity

 • Sensitivity tests available to probe plausibility of assumptions
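
A sharp regression discontinuity estimate can be sketched by fitting a straight line on each side of the threshold and comparing the two predictions at the threshold itself. The code below assumes a sharp cutoff where scores at or above the threshold are treated; the data are simulated with a known jump of 2, and the names `fit_line`, `rdd_jump`, `cutoff`, and `bandwidth` are hypothetical. Bandwidth choice and standard errors, which matter a great deal in practice, are left out.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def rdd_jump(scores, outcomes, cutoff, bandwidth):
    """Difference between right and left fitted values at the cutoff."""
    pairs = list(zip(scores, outcomes))
    # Center scores at the cutoff so each intercept is the fitted value there.
    left = [(s - cutoff, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in pairs if cutoff <= s <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

# Simulated data: outcome rises smoothly with credit score, plus a jump of 2
# for customers at or above a hypothetical cutoff of 600.
scores = list(range(560, 650, 10))
outcomes = [1 + 0.01 * (s - 600) + (2 if s >= 600 else 0) for s in scores]

jump = rdd_jump(scores, outcomes, cutoff=600, bandwidth=40)
print(f"estimated jump at the threshold = {jump:.3f}")
```

The estimate describes customers near the threshold only, which is the sense in which the design is informative just for those at the discontinuity.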

Methodology #5: Synthetic control method


Critic: I don’t buy it. It must’ve been strategic, not random.
You: Ok, I’ll try again. This bank always offers extra lines of
        credit at great terms to customers with credit scores
        above a certain threshold. Let’s compare results for
        customers just above and below the threshold.
Critic: I’m not interested in only a narrow set of borrowers.
You: Last try. How about we do an in-depth case-study of a
        greenfield microfinance institution, asking about the
        social welfare impact on the neighboring community?

  Synthetic control methods allow inference for a single treated unit.

  This approach addresses observed and unobserved confounders.

[Figure: "Estimating Average Impact on Household Consumption in a Single Village". Time series by year, 1995 to 2010, with a vertical axis running from 200,000 to 1,000,000, comparing the Treated District (Kabil) with the Synthetic Control District.]

Synthetic controls: Points to consider


• Only method allowing for rigorous quantitative causal
  inference for a single treated unit

• Enormous growth in popularity in the last 5 years

• Particularly well-suited to case-studies exploring program
  impacts at village/city/state/country level

• Requires time-series data and many control units

• Placebo tests are available to assess plausibility of critical
  assumptions
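
The core idea, building a weighted average of control units that tracks the treated unit before treatment, can be sketched with a small grid search. Everything here is hypothetical: the consumption series, the three donor units, and the function `synthetic_weights`. Real applications typically solve a constrained optimization over many donors and covariates rather than searching a coarse grid.

```python
def synthetic_weights(treated_pre, donors_pre, steps=20):
    """Grid-search non-negative weights summing to 1 over exactly three donors,
    minimizing squared pre-treatment error against the treated unit."""
    best, best_loss = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            loss = sum(
                (t - sum(wk * d[p] for wk, d in zip(w, donors_pre))) ** 2
                for p, t in enumerate(treated_pre)
            )
            if loss < best_loss:
                best, best_loss = w, loss
    return best

# Hypothetical pre-treatment outcomes for the treated unit and three donors.
treated_pre = [10, 12, 14]
donors_pre = [[8, 10, 12], [12, 14, 16], [20, 20, 20]]
weights = synthetic_weights(treated_pre, donors_pre)

# Apply the same weights to post-treatment donor outcomes to build the
# synthetic control, then compare with what the treated unit experienced.
treated_post = [20, 22]
donors_post = [[13, 14], [17, 18], [20, 20]]
synthetic_post = [
    sum(w * d[p] for w, d in zip(weights, donors_post))
    for p in range(len(treated_post))
]
effects = [t - s for t, s in zip(treated_post, synthetic_post)]

print("weights:", weights)
print("estimated effects by period:", effects)
```

A placebo version of this exercise reassigns the "treated" label to each donor in turn and checks whether the real treated unit's gap stands out.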




Elaborate theories, multiple tests
When asked what can be done in observational
studies to clarify the step from association to
causation, Fisher replied: “Make your theories
elaborate.” (Cochran)

This is sage advice, but often misunderstood.
Fisher didn’t mean you should make your
theories and explanations complicated.

He meant, when constructing a causal hypothesis,
envisage as many different consequences of its
truth as possible, and plan observational studies
to discover whether each holds.
      • Creating/testing elaborate theories is particularly helpful for
        indirectly testing for hidden biases (unconfoundedness).

Final thoughts






• Ex-ante, be clear as to the standard of evidence (this will depend upon the
  purpose of your inquiry, and who your audience is)





• Also ex-ante, be clear re treatment, covariates, units, and assumptions.





• Try to adjust for (eliminate) differences in observed characteristics
  while remaining blind to the answer.





• Run diagnostics/sensitivity tests for unobserved (hidden) bias





• Devise/test multiple “elaborate theories”. Invest in learning about the
  substantive problem to be solved, and be skeptical of your own results.




                                   37

Más contenido relacionado

Similar a Alexis Diamond - quasi experiments

RCT to causal inference.pptx
RCT to causal inference.pptxRCT to causal inference.pptx
RCT to causal inference.pptxFrancois MAIGNEN
 
Experimental Evaluation Methods
Experimental Evaluation MethodsExperimental Evaluation Methods
Experimental Evaluation Methodsclearsateam
 
Quick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative researchQuick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative researchAlan Fricker
 
eHealth 2010, Barcelona
eHealth 2010, BarcelonaeHealth 2010, Barcelona
eHealth 2010, BarcelonaIlkka Korhonen
 
Personal Health Technologies for Management of Mental Health – Prevention, Ea...
Personal Health Technologies for Management of Mental Health – Prevention, Ea...Personal Health Technologies for Management of Mental Health – Prevention, Ea...
Personal Health Technologies for Management of Mental Health – Prevention, Ea...Plan de Calidad para el SNS
 
Outdoor therapy: Maverick or mainstream? A survey of clinical psychologists
Outdoor therapy: Maverick or mainstream? A survey of clinical psychologistsOutdoor therapy: Maverick or mainstream? A survey of clinical psychologists
Outdoor therapy: Maverick or mainstream? A survey of clinical psychologistsUniversity of Leicester
 
Surgical_audit_&_research_mm (1).ppt
Surgical_audit_&_research_mm (1).pptSurgical_audit_&_research_mm (1).ppt
Surgical_audit_&_research_mm (1).pptSofiaJohn5
 
Evidence Aid: who and why
Evidence Aid: who and whyEvidence Aid: who and why
Evidence Aid: who and whyALNAP
 
Systematic reviews and trials (Claire Allen, Evidence Aid)
Systematic reviews and trials (Claire Allen, Evidence Aid)Systematic reviews and trials (Claire Allen, Evidence Aid)
Systematic reviews and trials (Claire Allen, Evidence Aid)ALNAP
 
Improving clinical services: no magic bullet... some things work better than ...
Improving clinical services: no magic bullet... some things work better than ...Improving clinical services: no magic bullet... some things work better than ...
Improving clinical services: no magic bullet... some things work better than ...NIHR CLAHRC West Midlands
 
3. How to Randomize
3. How to Randomize3. How to Randomize
3. How to Randomizevinhthedang
 
Research Design and Validity
Research Design and ValidityResearch Design and Validity
Research Design and ValidityHora Tjitra
 
ADAD forum webinar May 2012
ADAD forum webinar May 2012ADAD forum webinar May 2012
ADAD forum webinar May 2012Marty Reiswig
 
What are Patient Preferences, How Do You Measure Patient Preferences, and How...
What are Patient Preferences, How Do You Measure Patient Preferences, and How...What are Patient Preferences, How Do You Measure Patient Preferences, and How...
What are Patient Preferences, How Do You Measure Patient Preferences, and How...OARSI
 
Random control trial RCT community medicine .pptx
Random control trial RCT community medicine  .pptxRandom control trial RCT community medicine  .pptx
Random control trial RCT community medicine .pptxAkshayRaj781072
 
The art of the possible will
The art of the possible   willThe art of the possible   will
The art of the possible willhowardcooper
 
RSS 2012 Developing Research Idea and Question
RSS 2012 Developing Research Idea and QuestionRSS 2012 Developing Research Idea and Question
RSS 2012 Developing Research Idea and QuestionWesam Abuznadah
 
Outcomes: ASH 2010 (Multiple Myeloma)
Outcomes: ASH 2010 (Multiple Myeloma)Outcomes: ASH 2010 (Multiple Myeloma)
Outcomes: ASH 2010 (Multiple Myeloma)Curatio CME Institute
 
JPI Conference Dublin - Edvard Beem - Evaluation and Monitoring Framework
JPI Conference Dublin - Edvard Beem - Evaluation and Monitoring FrameworkJPI Conference Dublin - Edvard Beem - Evaluation and Monitoring Framework
JPI Conference Dublin - Edvard Beem - Evaluation and Monitoring Frameworkjpndresearch
 

Similar a Alexis Diamond - quasi experiments (20)

RCT to causal inference.pptx
RCT to causal inference.pptxRCT to causal inference.pptx
RCT to causal inference.pptx
 
Experimental Evaluation Methods
Experimental Evaluation MethodsExperimental Evaluation Methods
Experimental Evaluation Methods
 
Quick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative researchQuick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative research
 
How did Goldilocks find out about the bears? Effectiveness of risk assessment...
How did Goldilocks find out about the bears? Effectiveness of risk assessment...How did Goldilocks find out about the bears? Effectiveness of risk assessment...
How did Goldilocks find out about the bears? Effectiveness of risk assessment...
 
eHealth 2010, Barcelona
eHealth 2010, BarcelonaeHealth 2010, Barcelona
eHealth 2010, Barcelona
 
Personal Health Technologies for Management of Mental Health – Prevention, Ea...
Personal Health Technologies for Management of Mental Health – Prevention, Ea...Personal Health Technologies for Management of Mental Health – Prevention, Ea...
Personal Health Technologies for Management of Mental Health – Prevention, Ea...
 
Outdoor therapy: Maverick or mainstream? A survey of clinical psychologists
Outdoor therapy: Maverick or mainstream? A survey of clinical psychologistsOutdoor therapy: Maverick or mainstream? A survey of clinical psychologists
Outdoor therapy: Maverick or mainstream? A survey of clinical psychologists
 
Surgical_audit_&_research_mm (1).ppt
Surgical_audit_&_research_mm (1).pptSurgical_audit_&_research_mm (1).ppt
Surgical_audit_&_research_mm (1).ppt
 
Evidence Aid: who and why
Evidence Aid: who and whyEvidence Aid: who and why
Evidence Aid: who and why
 
Systematic reviews and trials (Claire Allen, Evidence Aid)
Systematic reviews and trials (Claire Allen, Evidence Aid)Systematic reviews and trials (Claire Allen, Evidence Aid)
Systematic reviews and trials (Claire Allen, Evidence Aid)
 
Improving clinical services: no magic bullet... some things work better than ...
Improving clinical services: no magic bullet... some things work better than ...Improving clinical services: no magic bullet... some things work better than ...
Improving clinical services: no magic bullet... some things work better than ...
 
3. How to Randomize
3. How to Randomize3. How to Randomize
3. How to Randomize
 
Research Design and Validity
Research Design and ValidityResearch Design and Validity
Research Design and Validity
 
ADAD forum webinar May 2012
ADAD forum webinar May 2012ADAD forum webinar May 2012
ADAD forum webinar May 2012
 
What are Patient Preferences, How Do You Measure Patient Preferences, and How...
What are Patient Preferences, How Do You Measure Patient Preferences, and How...What are Patient Preferences, How Do You Measure Patient Preferences, and How...
What are Patient Preferences, How Do You Measure Patient Preferences, and How...
 
Random control trial RCT community medicine .pptx
Random control trial RCT community medicine  .pptxRandom control trial RCT community medicine  .pptx
Random control trial RCT community medicine .pptx
 
The art of the possible will
The art of the possible   willThe art of the possible   will
The art of the possible will
 
RSS 2012 Developing Research Idea and Question
RSS 2012 Developing Research Idea and QuestionRSS 2012 Developing Research Idea and Question
RSS 2012 Developing Research Idea and Question
 
Outcomes: ASH 2010 (Multiple Myeloma)
Outcomes: ASH 2010 (Multiple Myeloma)Outcomes: ASH 2010 (Multiple Myeloma)
Outcomes: ASH 2010 (Multiple Myeloma)
 
JPI Conference Dublin - Edvard Beem - Evaluation and Monitoring Framework
JPI Conference Dublin - Edvard Beem - Evaluation and Monitoring FrameworkJPI Conference Dublin - Edvard Beem - Evaluation and Monitoring Framework
JPI Conference Dublin - Edvard Beem - Evaluation and Monitoring Framework
 

Más de Microfinance Gateway

Kenya Financial Diaries Project: Making Ends Meet in the Land of M-PESA & Bra...
Kenya Financial Diaries Project: Making Ends Meet in the Land of M-PESA & Bra...Kenya Financial Diaries Project: Making Ends Meet in the Land of M-PESA & Bra...
Kenya Financial Diaries Project: Making Ends Meet in the Land of M-PESA & Bra...Microfinance Gateway
 
Microfinance Ratings: What use are they to investors?
Microfinance Ratings: What use are they to investors? Microfinance Ratings: What use are they to investors?
Microfinance Ratings: What use are they to investors? Microfinance Gateway
 
Jake Kendall - hot issues session mobile banking
Jake Kendall -  hot issues session mobile banking Jake Kendall -  hot issues session mobile banking
Jake Kendall - hot issues session mobile banking Microfinance Gateway
 
Stefan Dercon - acting on evidence
Stefan Dercon  - acting on evidenceStefan Dercon  - acting on evidence
Stefan Dercon - acting on evidenceMicrofinance Gateway
 
Blaine Stephens and Mayada El-Zoghbi - Role of Monitoring
Blaine Stephens and Mayada El-Zoghbi - Role of MonitoringBlaine Stephens and Mayada El-Zoghbi - Role of Monitoring
Blaine Stephens and Mayada El-Zoghbi - Role of MonitoringMicrofinance Gateway
 
Syed Hashemi - qualitative research
Syed Hashemi  - qualitative researchSyed Hashemi  - qualitative research
Syed Hashemi - qualitative researchMicrofinance Gateway
 
Philip Davies - introduction to impact evaluation
Philip Davies - introduction to impact evaluationPhilip Davies - introduction to impact evaluation
Alexis Diamond - quasi experiments

  • 1. An Introduction to Impact Evaluation in Observational (Non-Experimental) Settings Alexis Diamond Development Impact Department
  • 2. Goals for this Presentation • To explain key differences between randomized experiments (RCTs) and observational studies • To briefly sketch some of the most important methods of causal inference in observational studies, showing how they might be applied to answer questions in access to finance projects, and offering practical guidance on: (1) Matching (an estimator, or a tool for designing observational studies?); (2) Differences-in-differences; (3) Encouragement design (instrumental variable, or “IV”, regression); (4) Regression discontinuity design; (5) Synthetic control methods
  • 3. Basic concepts • Observational study: a comparison of treated and control groups in which the objective is to estimate cause-and-effect relationships without the benefit of random assignment. Observational studies are also known as quasi-experiments or natural experiments. • In a randomized experiment, random chance forms the comparison groups (treatment and control), making the groups comparable in terms of both measurable characteristics and characteristics that cannot be measured. • If the assumptions are met, causal conclusions follow, but generally only in randomized experiments do we KNOW the assumptions are met; otherwise, the assumptions aren’t testable.
  • 4. Experiments (RCTs) vs. observational studies “That’s not an experiment you have there, that’s an experience.” (Sir R. A. Fisher, England, 1890-1962) Because of selection bias: the presence of confounders. Reflected in the controversy over Pitt/Khandker and Roodman/Morduch.
  • 8. Selection bias: “perfect implementation” A microfinance project is reporting the ex-post impact indicator ($/day) for participants and non-participants:
      i   Yi (observed)   Treatment status
      1   5               Treatment
      2   6               Treatment
      3   4               Treatment
      4   4               Control
      5   2               Control
      6   6               Control
    Average for the treatment group: $5/day. Average for the control group: $4/day. Difference = +$1/day.
  • 9. Selection bias: “perfect implementation” How should one think about that result, +$1/day? Does it mean the project has positive impact?
  • 10. Selection bias: “perfect implementation” How should one think about that result, +$1/day? Does it mean the project has positive impact? Impact on whom?
      i   Yi   Yi(1)   Yi(0)   Treatment status   Yi(1) - Yi(0)
      1   5    5       ?       Treatment          ?
      2   6    6       ?       Treatment          ?
      3   4    4       ?       Treatment          ?
      4   4    ?       4       Control            ?
      5   2    ?       2       Control            ?
      6   6    ?       6       Control            ?
  • 11. Selection bias: “perfect implementation”
    Avg Treatment Effect for Treated (ATT) = +3
    Avg Treatment Effect for Control (ATC) = -1
    Avg Treatment Effect (ATE) = (3 × (+3) + 3 × (-1)) / 6 = +1
      i   Yi   Yi(1)   Yi(0)   Treatment status   Yi(1) - Yi(0)
      1   5    5       2       Treatment          +3
      2   6    6       3       Treatment          +3
      3   4    4       1       Treatment          +3
      4   4    3       4       Control            -1
      5   2    1       2       Control            -1
      6   6    5       6       Control            -1
    That simple $1/day difference we identified earlier = ATT + BIAS = +3 + (-2), where BIAS is the gap in average Yi(0) between the treatment group ($2/day) and the control group ($4/day).
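The arithmetic behind this slide can be checked with a short sketch. It uses only the six rows of potential outcomes from the table; the variable names are illustrative, and the ATE is taken here as the plain average of the unit-level effects over all six units.

```python
# Potential outcomes from the slide's table: (Y(1), Y(0), treated?)
units = [
    (5, 2, True),
    (6, 3, True),
    (4, 1, True),
    (3, 4, False),
    (1, 2, False),
    (5, 6, False),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Average unit-level effects for the treated, the controls, and everyone.
att = mean(y1 - y0 for y1, y0, t in units if t)
atc = mean(y1 - y0 for y1, y0, t in units if not t)
ate = mean(y1 - y0 for y1, y0, t in units)

# What an evaluator actually observes: Y(1) for treated, Y(0) for controls.
naive = mean(y1 for y1, y0, t in units if t) - mean(y0 for y1, y0, t in units if not t)

# Selection bias: the groups differ in Y(0) even without any treatment.
bias = mean(y0 for y1, y0, t in units if t) - mean(y0 for y1, y0, t in units if not t)

print(att, atc, ate, naive, bias)
```

The naive difference equals ATT plus bias, so a positive observed gap can understate (or overstate) the effect on participants depending on who selects into treatment.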
  • 12. Selection bias: Ignore it at your peril. Identifying impacts requires identifying Y(1) and Y(0) for the same units. BIAS can be positive/negative, big/small, observed/hidden…
      i   Yi   Yi(1)   Yi(0)   Treatment status   Yi(1) - Yi(0)
      1   5    5       ?       Treatment          ?
      2   6    6       ?       Treatment          ?
      3   4    4       ?       Treatment          ?
      4   4    ?       4       Control            ?
      5   2    ?       2       Control            ?
      6   6    ?       6       Control            ?
  • 13. Observational studies: Are they credible? Yes, but… “A judgment-free method for dealing with problems of sample selection bias is the Holy Grail of the evaluation literature, but this search reflects more the aspirations of researchers than any plausible reality…” (Rajeev Dehejia, “Practical Propensity Score Matching”) • Some have tried to set up tests for observational methods: e.g., “Can method X (matching, regression, IV, etc.) recover the true (experimental) benchmark?” • Such efforts have generally failed to conclusively validate observational studies.
  • 14. Observational studies: Are they credible? Yes, but… “History abounds with examples where causality has ultimately found general acceptance without any experimental evidence… The evidence of a causal effect of smoking on lung cancer is now generally accepted, without any direct experimental evidence to support it… At the same time, the long road toward general acceptance of the causal interpretation …shows the difficulties in gaining acceptance for causal claims without randomization.” (Guido Imbens, “Better LATE than Nothing”)
  • 15. Why bother with observational studies? • Studies that start as perfect RCTs often end as broken RCTs, not “gold-standard” RCTs. These broken RCTs may be better than many observational studies, but there is no bright line distinguishing broken RCTs from observational studies. • Standard RCTs cannot address many important policy issues (e.g., macroeconomic questions, or cases with general equilibrium effects more broadly). • Other issues are difficult to address with RCTs, setting up a trade-off between rigor and relevance. What’s better: the RCT in a lab setting, or the equivalent observational study? • RCTs are often more expensive, time-consuming, and fragile than alternatives; they can be high risk and are not always strategic.
  • 16. More advantages of observational studies
    • Sometimes you can use pre-existing data, which has time and cost advantages (though there are clear trade-offs):
      - Typical out-of-pocket time/cost of a World Bank RCT: > 1 year & $500K
      - Occasionally RCTs can be done cheaply and easily, especially in a place like India (there are examples where it costs < $50,000)
      - With administrative data, observational studies may have no (or trivial) out-of-pocket costs, and be completed in days or weeks.
    • Sometimes you want to apply observational methods to experimental data
    • Good for hypothesis generation
    • Avoids the ethical considerations raised by RCTs
  • 17. Methodology #1: Matching You: My clients enjoy big impacts from our bank’s financing. Critic: Compared to whom? Where’s the control group? You: Ok, I’ll go find one, and then you’ll see!
  • 18. Methodology #1: Matching You: My clients enjoy big impacts from our bank’s financing. Critic: Compared to whom? Where’s the control group? You: Ok, I’ll go find one, and then you’ll see! E.g.: Boonperm/Haughton, “Thailand Village Fund” (2009) [Figure: one treated unit and three candidate controls (Control 1, Control 2, Control 3) plotted with X1: Education on the horizontal axis and X2: Age on the vertical axis]
  • 19. Methodology #1: Matching You: My clients enjoy big impacts from our bank’s financing. Critic: Compared to whom? Where’s the control group? You: Ok, I’ll go find one, and then you’ll see! E.g.: Boonperm/Haughton, “Thailand Village Fund” (2009) [Figure: two panels, each showing the treated unit and Controls 1-3 with X2: Age on the vertical axis; left panel horizontal axis is X1: Education, right panel is rescaled, X1: Education multiplied by 2]
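The rescaling panel suggests that which control counts as “closest” depends on the units of the covariates. A minimal sketch of that point follows, assuming plain Euclidean nearest-neighbor matching on two covariates; the coordinates are hypothetical and are not read off the slide's figure.

```python
import math

# Hypothetical covariates (education in years, age in years); illustrative only.
treated = (10.0, 30.0)
controls = {
    "Control 1": (12.0, 30.0),
    "Control 2": (10.0, 33.0),
    "Control 3": (7.0, 34.0),
}

def nearest_control(treated, controls, scale=(1.0, 1.0)):
    """Return the name of the control closest to the treated unit
    in Euclidean distance after multiplying each covariate by `scale`."""
    def distance(point):
        return math.sqrt(
            sum((s * (a - b)) ** 2 for s, a, b in zip(scale, treated, point))
        )
    return min(controls, key=lambda name: distance(controls[name]))

match_original = nearest_control(treated, controls)
# Same data, but education is multiplied by 2 as in the rescaled panel.
match_rescaled = nearest_control(treated, controls, scale=(2.0, 1.0))
print(match_original, match_rescaled)
```

With these numbers the matched control switches when education is doubled, which is one reason matching routines commonly standardize covariates or use a scale-invariant distance.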
  • 20. Matching: Points to consider • Matching is (unfortunately) as much art as science, and there are more methodological varieties of matching than there are flavors of ice cream. • There is widespread agreement that matching is, at a minimum, a useful pre-processing step to reduce model dependence. Unfortunately, there is no consensus on balance tests/diagnostics. • A hugely important benefit of matching is that it is performed “blind to the answer”, which compares favorably with regression. • Matching helps with selection bias due to observed variables (confounders); it does not help with unobserved confounders. For the latter, one can (and should) do sensitivity analysis.
  • 21. Methodology #2: Differences-in-Differences (D-i-D) You: My clients enjoy big impacts from our bank’s financing. Critic: Compared to whom? Where’s the control group? You: Ok, I’ll go find one, and then you’ll see! Critic: Too many unobservables. It’s a waste of time. You: Well, can you assume my control group’s growth rate (e.g., near zero) is a good proxy for the treatment group’s counterfactual growth rate (without the loan)? D-i-D: subtract one before/after difference from the other. Addresses observed confounders (regression assumptions) & unobserved time-invariant confounders common to treatment and control groups. See Kondo’s work in the Philippines (ADB).
  • 22. Diffs-in-Diffs: Points to consider [Figure: Income (vertical axis) at two time points, Before and After, for the Treated and Control groups; labels mark the Pre-treatment difference, the Treated Counterfactual, and the Estimated ATET] NOTE: Circles are observed, square (counterfactual) is unobserved (imputed).
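The “subtract one before/after difference from the other” calculation can be sketched in a few lines. The four group means below are hypothetical, chosen only to illustrate the arithmetic, and the counterfactual relies on the parallel trend assumption.

```python
# Hypothetical mean incomes by group and period; illustrative only.
means = {
    ("treated", "before"): 10.0,
    ("treated", "after"): 16.0,
    ("control", "before"): 8.0,
    ("control", "after"): 9.0,
}

def diff_in_diff(m):
    """Treated before/after change minus control before/after change."""
    treated_change = m[("treated", "after")] - m[("treated", "before")]
    control_change = m[("control", "after")] - m[("control", "before")]
    return treated_change - control_change

# Imputed counterfactual: treated baseline plus the control group's change
# (valid only if the two groups would have moved in parallel).
control_change = means[("control", "after")] - means[("control", "before")]
counterfactual_after = means[("treated", "before")] + control_change

did = diff_in_diff(means)
print(counterfactual_after, did)
```

The estimate is the gap between the observed treated outcome after treatment and the imputed counterfactual.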
  • 23. Diffs-in-Diffs: Points to consider • If matching is implausible, why would D-i-D be plausible? Does the parallel trend assumption seem easier to believe? • The parallel trend assumption must hold over the whole time period, implying that the composition of the two groups should remain constant over time. • D-i-D benefits from “placebo tests” run pre-treatment.
  • 24. Methodology #3: Encouragement Design You: Well, can you assume my control group’s growth rate (e.g., near zero) is a good proxy for the treatment group’s counterfactual growth rate (without the loan)? Critic: No, also not credible. You: OK, how about a natural experiment? Our FI established additional info kiosks in 100 villages to encourage loan take-up. These villages were not chosen at random, but it was “practically” random. The encouragement (“instrument”, assumed “as good as random”) has an effect (for some) on the probability of finance. This method leverages this “exogenous” variation to overcome potential bias from both observed and unobserved confounders.
  • 25. Encouragement Design: Points to consider
    • Encouragement design requires strong assumptions:
      - Encouragement must really be random or almost random, and must have no direct effect on impacts (only an indirect effect via treatment)
      - The encouragement must NEVER discourage take-up (no defiers)
      - Causal estimates restricted to “compliers” only… (Who?)
      - Also, for credible results, encouragement had better be effective
    • Strange quirk: different answers, from different models, can all be “correct” because complier populations may differ
    • Was popular, now more disparaged in observational work
    • Again, sensitivity tests are available and should be run
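One common way to turn an encouragement design into an estimate is the Wald (instrumental variable) ratio: the effect of encouragement on the outcome divided by its effect on take-up. The sketch below assumes a binary encouragement and a binary treatment; the 20 records are hypothetical and built so that the loan raises the outcome by 3 for everyone, which is a simplification.

```python
# Hypothetical records: (encouraged Z, took loan D, outcome Y); illustrative only.
# Outcomes are built as Y = 4 + 3 * D, so the true effect of the loan is 3.
data = (
    [(1, 1, 7.0)] * 6 + [(1, 0, 4.0)] * 4    # encouraged villages: 60% take-up
    + [(0, 1, 7.0)] * 2 + [(0, 0, 4.0)] * 8  # other villages: 20% take-up
)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def wald_estimate(records):
    """Intention-to-treat effect on Y divided by the effect on take-up."""
    itt = mean(y for z, d, y in records if z == 1) - mean(y for z, d, y in records if z == 0)
    first_stage = mean(d for z, d, y in records if z == 1) - mean(d for z, d, y in records if z == 0)
    return itt / first_stage

wald = wald_estimate(data)
print(round(wald, 6))
```

If the encouragement barely moves take-up, the denominator is close to zero and the ratio becomes unstable, which is why the encouragement “had better be effective”.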
  • 26. Methodology #4: Regression Discontinuity Design You: OK, how about a natural experiment? Our FI established additional info kiosks in 100 villages to encourage loan take-up. These villages were not chosen at random, but it was “practically” random. Critic: I don’t buy it. Rollout was in fact strategic, not random. You: Ok, I’ll try again. This bank always provides extra lines of credit at great terms to customers with credit scores above a certain threshold. Let’s compare results for customers just above and below the threshold. Treatment is assumed as good as random at the threshold if the discontinuity is sharp. RDD addresses observed and unobserved confounders. What question will the RDD design above answer?
  • 27. Regression Discontinuity Design: Points to consider • Generally considered a very strong design: the US Dept of Education classifies it in the same category as RCTs • Only informative for those at the discontinuity threshold • No “gaming” the threshold allowed (ideally, the threshold is unknown to the subjects, or outside subjects’ control) • Relatively low statistical power, requiring much larger sample sizes than RCTs or other observational methods • Watch out for contamination by other treatments at the same discontinuity • Sensitivity tests are available to probe the plausibility of assumptions
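The comparison of customers just above and just below the credit-score threshold can be sketched as two local linear fits, one on each side, with the jump in fitted values at the cutoff as the estimate. Everything below is an assumption for illustration: the cutoff of 650, the bandwidth of 30 points, the true jump of 2, and the simulated outcomes.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rdd_estimate(scores, outcomes, cutoff, bandwidth):
    """Difference between the right and left fitted values at the cutoff."""
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s < cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept

# Simulated data (hypothetical): outcome rises smoothly with the score
# and jumps by TRUE_JUMP for customers at or above the cutoff.
rng = random.Random(0)
CUTOFF = 650
TRUE_JUMP = 2.0
scores = list(range(600, 700))
outcomes = [
    1.0 + 0.02 * (s - CUTOFF) + (TRUE_JUMP if s >= CUTOFF else 0.0) + rng.gauss(0, 0.2)
    for s in scores
]

estimate = rdd_estimate(scores, outcomes, CUTOFF, bandwidth=30)
print(round(estimate, 2))
```

The estimate speaks only to customers near the cutoff, and it uses only the observations inside the bandwidth, which is one way to see why the design needs large samples.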
  • 28. Methodology #5: Synthetic control method Critic: I don’t buy it. It must’ve been strategic, not random. You: Ok, I’ll try again. This bank always offers extra lines of credit at great terms to customers with credit scores above a certain threshold. Let’s compare results for customers just above and below the threshold. Critic: I’m not interested in only a narrow set of borrowers. You: Last try. How about we do an in-depth case study of a greenfield microfinance institution, asking about the social welfare impact on the neighboring community? The synthetic control method allows inference for a single treated unit. This approach addresses observed and unobserved confounders.
  • 29. Methodology #5: Synthetic control method Estimating Average Impact on Household Consumption in a Single Village [Figure: time series over 1995-2010 (horizontal axis: year; vertical axis ticks from 200000 to 1000000) comparing the Treated District (Kabil) with the Synthetic Control District]
  • 30. Synthetic controls: Points to consider • Only method allowing for rigorous quantitative causal inference for a single treated unit • Enormous growth in popularity in the last 5 years • Particularly well-suited to case studies exploring program impacts at village/city/state/country level • Requires time-series data and many control units • Placebo tests are available to assess the plausibility of critical assumptions
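The core idea of a synthetic control is to pick non-negative weights, summing to one, on the control units so that their weighted average tracks the treated unit before treatment; the post-treatment gap is then read as the impact. The sketch below is a simplification under stated assumptions: three hypothetical donor villages, made-up consumption figures, matching on pre-treatment outcomes only, and a coarse grid search in place of the constrained optimizer a real analysis would use.

```python
from itertools import product

# Hypothetical household consumption series; the first N_PRE periods are pre-treatment.
N_PRE = 4
donors = {
    "Village A": [100, 110, 120, 130, 140, 150],
    "Village B": [200, 190, 210, 220, 230, 240],
    "Village C": [50, 80, 60, 90, 70, 100],
}
treated = [150, 150, 165, 175, 235, 245]

def synthetic_weights(treated_pre, donors_pre, steps=10):
    """Grid search over weights in increments of 1/steps that sum to one,
    minimising squared pre-treatment error."""
    names = list(donors_pre)
    best_loss, best_weights = None, None
    for combo in product(range(steps + 1), repeat=len(names)):
        if sum(combo) != steps:
            continue
        weights = [c / steps for c in combo]
        loss = sum(
            (target - sum(w * donors_pre[name][k] for w, name in zip(weights, names))) ** 2
            for k, target in enumerate(treated_pre)
        )
        if best_loss is None or loss < best_loss:
            best_loss, best_weights = loss, dict(zip(names, weights))
    return best_weights

weights = synthetic_weights(
    treated[:N_PRE], {name: series[:N_PRE] for name, series in donors.items()}
)

# Post-treatment gap between the treated unit and its synthetic control.
gaps = [
    treated[k] - sum(weights[name] * donors[name][k] for name in donors)
    for k in range(N_PRE, len(treated))
]
print(weights, gaps)
```

A placebo version of the same calculation treats each donor in turn as if it were the treated unit and checks whether gaps of similar size appear where no treatment occurred.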
  • 31. Elaborate theories, multiple tests When asked what can be done in observational studies to clarify the step from association to causation, Fisher replied: “Make your theories elaborate.” (Cochran) This is sage advice, but often misunderstood. Fisher didn’t mean you should make your theories and explanations complicated. He meant that, when constructing a causal hypothesis, you should envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each holds. • Creating/testing elaborate theories is particularly helpful for indirectly testing for hidden biases (unconfoundedness).
  • 37. Final thoughts • Ex-ante, be clear as to the standard of evidence (this will depend upon the purpose of your inquiry and who your audience is). • Also ex-ante, be clear about treatment, covariates, units, and assumptions. • Try to adjust for (eliminate) differences in observed characteristics while remaining blind to the answer. • Run diagnostics/sensitivity tests for unobserved (hidden) bias. • Devise/test multiple “elaborate theories”. Invest in learning about the substantive problem to be solved, and be skeptical of your own results.