Talk given at the Machine Learning and Data Analytics Symposium (MLDAS 2019). https://qcai.qcri.org/index.php/events/mldas-2019/.
Contact me if you're interested in the topic of poverty mapping or data for development in general.
Using advertising data to model migration, poverty and digital gender gaps
1. Using Advertising Data to Model Migration, Poverty and Digital Gender Gaps
Ingmar Weber
April 1, 2019
MLDAS
@ingmarweber
2. Great Collaborators
• Mapping poverty in the Philippines
– with UNICEF and Thinking Machines
• Tracking digital gender gaps
– with Data2X and University of Oxford
• Monitoring the Venezuelan exodus
– with UNHCR, UNICEF and iMMAP
Joao Palotti
Masoomali Fatehkia
8. Why Map Poverty?
• Monitor sustainable development
• Plan better poverty reduction interventions
• Impact assessment of interventions
– Low latency a huge plus
9. Obtaining Training Data
• 2017 household survey implemented by the Philippine Statistics Authority (PSA)
• Representative sample of ~40 households in n=1214 “clusters”
• Asset ownership based wealth index (y=WI)
=> standard regression task
10. Sources of Ground Truth Noise
• Sampling noise
– Wealth index depends on particular households
– Expected R^2 = .95 (bootstrap estimate)
• Spatial perturbation
– True location is (x,y), but reported at (x’,y’)
– Protects privacy
– Expected R^2 = .89 (simulations)
• Combined
– Expected R^2 = .84
– “Expected upper bound”
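The slides do not say how the bootstrap estimate for sampling noise was computed. The following is a minimal sketch of one way such an estimate could be obtained, on synthetic data: households within each cluster are resampled with replacement, the cluster-level wealth index is recomputed as the mean, and the R^2 against the observed cluster values is averaged over replicates. The function names, the synthetic data generator, the cluster and household counts, and the choice of squared Pearson correlation as R^2 are all assumptions made for illustration.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def bootstrap_r2(clusters, n_boot=200, seed=0):
    """Average R^2 between observed cluster means and bootstrap cluster means.

    `clusters` is a list of lists; each inner list holds household-level
    wealth scores for one cluster (hypothetical input layout).
    """
    rng = random.Random(seed)
    observed = [mean(households) for households in clusters]
    scores = []
    for _ in range(n_boot):
        # Resample households with replacement inside every cluster.
        resampled = [
            mean(rng.choices(households, k=len(households)))
            for households in clusters
        ]
        scores.append(r_squared(observed, resampled))
    return mean(scores)


def make_synthetic_clusters(n_clusters=200, households=40, seed=1):
    """Synthetic stand-in for survey data: cluster effect plus household noise."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        center = rng.gauss(0.0, 1.0)
        clusters.append([center + rng.gauss(0.0, 1.0) for _ in range(households)])
    return clusters


clusters = make_synthetic_clusters()
estimate = bootstrap_r2(clusters, n_boot=50)
print(f"bootstrap R^2 estimate on synthetic data: {estimate:.3f}")
```

The number printed here reflects only the synthetic noise levels chosen above, not the survey. The spatial-perturbation bound would need a separate simulation that displaces cluster coordinates before features are extracted.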
11. Features to Map Poverty
24 variables on connection type, device manufacturer, device type
12. Modeling the Wealth Index
• Model selection using LASSO:
Wealth Index / 1000 = - 96
+ 115 * (frac. FB users with 4G)
+ 216 * (frac. FB users with WiFi)
+ 48 * (frac. FB users with iOS)
- 89 * (frac. FB users with Cherry Mobile)
+ 11 * (frac. FB users with high end phones)
+ 30 * (FB penetration)
+ 3 * (log population density)
Tried regression trees, didn’t help
13. Modeling the Wealth Index
(Figure panels: 2017, 2019)
• R^2 = 0.58 (10-fold CV)
• Offl. baseline: R^2 = .37
• Upper bound: R^2 = .84, due to DHS noise
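For readers unfamiliar with how a cross-validated R^2 is obtained, the sketch below shows one common variant: every point is predicted by a model fitted without its fold, and R^2 is computed once over the pooled out-of-fold predictions. The slides do not say how R^2 was aggregated across folds, so this pooling is an assumption. A one-feature least-squares line stands in for the LASSO model, and the data is synthetic.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validated_r2(xs, ys, k=10, seed=0):
    """R^2 of pooled out-of-fold predictions from k-fold cross-validation."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    predictions = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            predictions[i] = intercept + slope * xs[i]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data: a linear signal plus Gaussian noise.
rng = random.Random(42)
xs = [rng.random() for _ in range(300)]
ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
cv_r2 = cross_validated_r2(xs, ys, k=10)
print(f"10-fold CV R^2 on synthetic data: {cv_r2:.3f}")
```

Because each prediction comes from a model that never saw that point, this figure is directly comparable to a noise-limited upper bound in a way that an in-sample R^2 would not be.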
15. Summary
- Challenging in low population areas (k-anonymity)
- Can catch temporal changes? Unclear.
+ Potentially more “causal” than satellite features
+ Supports demographic disaggregation
+ Does not break down at lowest decile
+ Promising to combine with other data sources
• Interested? Launching poverty mapping initiative
32. Advertising Audience Estimates
+ Global reach with over 2 billion users
+ FB, LinkedIn, Google, Snapchat, IG, ...
+ Real-time estimates
+ Uses anonymous and aggregate data
+ Gender, age, location, country of origin, ...
33. Advertising Audience Estimates
- Black box on how attributes are inferred
- Needs modeling for bias correction
- Usage patterns change over time
- Only includes people who are online
- Could create “use FB!” incentives
- Risk of misuse