1. Official data and poverty indicators: small area estimates
in local governance
Monica Pratesi, Stefano Marchetti, Caterina Giusti, Nicola Salvati
Department of Statistics and Mathematics Applied to Economics, University of Pisa
SISVSP 2012
Rome, 19-20 April 2012
M. Pratesi (DSMAE, Pisa) Official data and poverty indicators 19-20 April 2012 1 / 25
2. Structure of the Presentation
1 Motivation
2 Poverty indicators and SAE methods
3 Oversampling and Small Area Estimation: A Comparison
4 Application of small area M-quantile models to poverty mapping in Tuscany
5 Concluding remarks
3. Part I
Motivation
4. Motivation
Motivation
Problem: to estimate some key statistics for poverty at the small area level to
drive local governance
We focus on small area estimation of Laeken poverty indicators, such as head
count ratio and poverty gap
Proposed methodology
Using M-quantile models to estimate poverty indicators and to provide also an
estimator of the corresponding mean squared errors
Opportunity
Comparing model-based estimates with direct estimates computed with an
EU-SILC oversampling of households
5. Motivation
Motivation
Available data to measure poverty and living conditions in Italy come mainly
from sample surveys, such as the Survey on Income and Living Conditions
(EU-SILC)
However, EU-SILC data can be used to produce accurate estimates only at
the NUTS 2 level (that is, regional level)
To satisfy the increasing demand from official and private institutions for
statistical estimates on poverty and living conditions in smaller
domains (LAU 1 and LAU 2 levels, that is, Provinces and Municipalities),
we need to resort to small area methodologies
We focus on the estimation of poverty measures at the small area level. For
this purpose we use data coming from the EU-SILC survey 2008 and from the
Population Census 2001
6. Part II
Poverty indicators and SAE methods
7. Poverty indicators and SAE methods
Poverty Measures
Denoting by t the poverty line and by y a measure of welfare, the Foster et al.
(1984) poverty measures (FGT) for a small area d can be defined as
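$$Z_d(\alpha, t) = N_d^{-1} \Big[ \sum_{j \in s_d} z_{jd}(\alpha, t) + \sum_{k \in r_d} z_{kd}(\alpha, t) \Big],$$
where for a generic unit i in area d
$$z_{id}(\alpha, t) = \Big( \frac{t - y_{id}}{t} \Big)^{\alpha} I(y_{id} \le t), \quad i = 1, \ldots, N_d$$
$z_{jd}(\alpha, t)$ is known for the sampled units, $j \in s_d$
$z_{kd}(\alpha, t)$ is unknown for the non-sampled units, $k \in r_d$, and should be predicted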
Setting α = 0 defines the Head Count Ratio whereas setting α = 1 defines the
Poverty Gap.
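
As a minimal sketch of the definition above, the FGT measure can be computed for a fully observed area as follows. The function name `fgt` and the income values are hypothetical and serve only as an illustration; they are not taken from the EU-SILC data.

```python
def fgt(incomes, poverty_line, alpha):
    """FGT measure: average of ((t - y) / t) ** alpha over all units,
    where only units with y <= t contribute."""
    if not incomes:
        raise ValueError("incomes must be non-empty")
    total = 0.0
    for y in incomes:
        if y <= poverty_line:
            total += ((poverty_line - y) / poverty_line) ** alpha
    return total / len(incomes)


# Hypothetical equivalised incomes for one small area (illustrative only).
incomes = [4000.0, 8000.0, 10000.0, 12000.0, 20000.0]
t = 10000.0

hcr = fgt(incomes, t, alpha=0)  # Head Count Ratio: share of units with y <= t
pg = fgt(incomes, t, alpha=1)   # Poverty Gap: average relative shortfall
print(hcr, pg)
```

With α = 0 every unit at or below the line contributes 1, so the measure reduces to a proportion; with α = 1 each poor unit contributes its relative distance from the line.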
8. Poverty indicators and SAE methods
Poverty Measures
The HCR indicator is a widely used measure of poverty because of its ease of
construction and interpretation, since it counts the number of individuals
with income below the poverty line. At the same time this indicator also
assumes that all poor individuals are in the same situation. For example, the
easiest way of reducing the headcount index is by targeting benefits to people
just below the poverty line because they are the ones who are cheapest to
move across the line. Hence, policies based on the headcount index might be
sub-optimal.
For this reason we also obtain estimates of the PG indicator. The PG can be
interpreted as the average shortfall of poor people. It shows how much would
have to be transferred on average to the poor to bring their expenditure up to
the poverty line.
9. Poverty indicators and SAE methods
M-quantile models
With regression models we model the mean of the variable of interest (y)
given the covariates (x)
A more complete picture is offered, however, by modeling not only the mean
of (y) given (x) but also other quantiles. Examples include the median and the
25th and 75th percentiles. This is known as quantile regression
An M-quantile regression model for quantile q
$$Q_q = x_{jd}^T \beta_\psi(q)$$
Main features of these models
No hypothesis of normal distribution
Robust methods (influence function of the M-quantile regression)
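
A minimal sketch of how an M-quantile regression line can be fitted, assuming a Huber influence function with tuning constant c = 1.345, a MAD-type scale estimate, and iteratively reweighted least squares (IRLS) with a single covariate plus intercept. These choices, the function names and the toy data are assumptions made for illustration; the slides do not specify the fitting algorithm.

```python
import statistics


def mq_weights(residuals, q, c=1.345):
    """IRLS weights psi_q(u) / u for the Huber influence function."""
    # Robust scale estimate: median absolute residual / 0.6745.
    scale = statistics.median(abs(r) for r in residuals) / 0.6745
    if scale == 0.0:
        scale = 1.0  # degenerate case: fall back to unit scale
    weights = []
    for r in residuals:
        u = r / scale
        huber = 1.0 if abs(u) <= c else c / abs(u)  # Huber psi(u) / u
        asymmetry = q if u > 0 else 1.0 - q         # tilt towards quantile q
        weights.append(2.0 * asymmetry * huber)
    return weights


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        raise ValueError("x must not be constant")
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def mq_line_fit(x, y, q, c=1.345, max_iter=100, tol=1e-8):
    """Fit the M-quantile line of order q (0 < q < 1) by IRLS."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(max_iter):
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        a_new, b_new = weighted_line_fit(x, y, mq_weights(residuals, q, c))
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Toy data: two observations at each x, one unit above and one below y = 2x.
x = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
y = [-1.0, 1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0]
print(mq_line_fit(x, y, 0.5))   # central line
print(mq_line_fit(x, y, 0.75))  # line shifted towards the upper observations
```

Fitting the same data at several values of q gives a family of lines, which is what allows each unit, and then each area, to be associated with its own q value.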
10. Poverty indicators and SAE methods
Using M-quantile models to measure area effects
Central Idea: Area effects can be described by estimating an area-specific q value
($\hat{\theta}_d$) for each area (group) of a hierarchical dataset (Chambers & Tzavidis 2006)
Estimate the area-specific target parameter by fitting an M-quantile model
for each area at $\hat{\theta}_d$
$$\hat{y}_{jd} = x_{jd}^T \hat{\beta}_\psi(\hat{\theta}_d)$$
Mixed effects models use random effects to capture the dissimilarity between
domains. M-quantile models attempt to capture this dissimilarity via the
domain-specific M-quantile coefficients $\hat{\theta}_d$
11. Poverty indicators and SAE methods
SAE Poverty Measures Estimators
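Using a smearing-type predictor that follows the same idea as the Chambers and
Dunstan (1986) distribution function estimator, we can predict the $z_{kd}(\alpha, t)$ values
$$\hat{z}_{kd}(\alpha, t) = n_d^{-1} \sum_{j \in s_d} \Big( \frac{t - \hat{y}_{kjd}}{t} \Big)^{\alpha} I(\hat{y}_{kjd} \le t), \quad k \in r_d, \; j \in s_d$$
$$\hat{y}_{kjd} = x_{kd}^T \hat{\beta}_\psi(\hat{\theta}_d) + e_{jd}$$
$$e_{jd} = y_{jd} - x_{jd}^T \hat{\beta}_\psi(\hat{\theta}_d)$$
Finally, the small area estimator of FGT poverty measures is
$$\hat{Z}_d(\alpha, t) = N_d^{-1} \Big[ \sum_{j \in s_d} z_{jd}(\alpha, t) + \sum_{k \in r_d} \hat{z}_{kd}(\alpha, t) \Big].$$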
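
The predictor above can be sketched as follows, assuming the fitted values for the non-sampled units and the sample residuals of the area have already been obtained from an M-quantile fit. The function names and the numbers are hypothetical and purely illustrative.

```python
def predict_z(fitted_k, residuals, t, alpha):
    """Smearing-type prediction of z_kd for one non-sampled unit k:
    average the FGT contribution over fitted value + each sample residual."""
    total = 0.0
    for e in residuals:
        y_hat = fitted_k + e
        if y_hat <= t:
            total += ((t - y_hat) / t) ** alpha
    return total / len(residuals)


def fgt_small_area(y_sampled, fitted_nonsampled, residuals, t, alpha):
    """Small area FGT estimate: observed z for sampled units plus
    predicted z for non-sampled units, divided by the area size N_d."""
    z_obs = [((t - y) / t) ** alpha if y <= t else 0.0 for y in y_sampled]
    z_hat = [predict_z(f, residuals, t, alpha) for f in fitted_nonsampled]
    n_area = len(y_sampled) + len(fitted_nonsampled)
    return (sum(z_obs) + sum(z_hat)) / n_area


# Hypothetical area: 3 sampled units and 2 non-sampled units.
y_sampled = [8.0, 11.0, 15.0]
fitted_nonsampled = [9.0, 12.0]   # x_kd' beta_hat(theta_hat_d), assumed given
residuals = [-2.0, 0.0, 2.0]      # e_jd from the sampled units, assumed given
hcr_hat = fgt_small_area(y_sampled, fitted_nonsampled, residuals, t=10.0, alpha=0)
print(hcr_hat)
```

Each non-sampled unit therefore contributes a value between 0 and 1 for α = 0, interpretable as its estimated probability of falling below the poverty line.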
12. Poverty indicators and SAE methods
A Mean Squared Error Estimator of the Poverty Measures
Estimators
To estimate the mean squared error of the M-quantile poverty estimators we can
use the bootstrap proposed by Tzavidis et al. (2010) and Marchetti et al. (2012).
Let b = (1, . . . , B), where B is the number of bootstrap populations
Let r = (1, . . . , R), where R is the number of bootstrap samples
Let Ω = (yk , xk ), k ∈ (1, . . . , N), be the target population
By $\cdot^{*}$ we denote bootstrap quantities
$\hat{Z}_d(\alpha, t)$ denotes the FGT poverty measures estimator of the small area d
Let y be the study variable that is known only for sampled units and let x be
the vector of auxiliary variables that is known for all the population units
Let s = (1, . . . , n) be a within area simple random sample of the finite
population Ω = {1, . . . , N}
13. Poverty indicators and SAE methods
A Mean Squared Error Estimator of the Poverty Measures
Estimator
Fit the M-quantile regression model on sample s, $\hat{y}_{jd} = x_{jd}^T \hat{\beta}_\psi(\hat{\theta}_d)$
Compute the residuals, $e_{jd} = y_{jd} - \hat{y}_{jd}$
Generate B bootstrap populations of dimension N, $\Omega^{*b}$
1 $y_{kd}^{*} = x_{kd}^T \hat{\beta}_\psi(\hat{\theta}_d) + e_{kd}^{*}$, $k = 1, \ldots, N$
2 $e_{kd}^{*}$ are obtained by sampling with replacement from the residuals $e_{jd}$
3 residuals can be sampled from the empirical distribution function or from a
smoothed distribution function
4 we can consider all the residuals (ej , j = 1, . . . , n), that is the unconditional
approach or only area residuals (ejd , j = 1, . . . , nd ), that is the conditional
approach.
From every bootstrap population draw R samples of size n without
replacement
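
The resampling steps above can be sketched as follows, assuming residuals are drawn from the empirical (unsmoothed) distribution and that the R samples are drawn within areas. The function names, the `conditional` flag and the toy values are hypothetical choices for illustration.

```python
import random


def bootstrap_population(fitted, areas, residuals_by_area, rng, conditional=False):
    """One bootstrap population: y*_kd = fitted_kd + e*_kd.
    conditional=False pools the residuals of all areas (unconditional approach);
    conditional=True resamples only the residuals of the unit's own area."""
    pooled = [e for res in residuals_by_area.values() for e in res]
    population = []
    for f, d in zip(fitted, areas):
        pool = residuals_by_area[d] if conditional else pooled
        population.append(f + rng.choice(pool))  # sampling with replacement
    return population


def draw_sample(areas, n_by_area, rng):
    """Indices of a within-area simple random sample without replacement."""
    sample = []
    for d, n_d in n_by_area.items():
        units = [k for k, a in enumerate(areas) if a == d]
        sample.extend(rng.sample(units, n_d))
    return sorted(sample)


# Toy inputs (illustrative only): fitted values for N = 6 units in two areas.
rng = random.Random(2012)
fitted = [9.0, 11.0, 10.0, 14.0, 15.0, 13.0]
areas = ["A", "A", "A", "B", "B", "B"]
residuals_by_area = {"A": [-1.0, 1.0], "B": [-2.0, 2.0]}
n_by_area = {"A": 2, "B": 2}

B, R = 3, 2
populations = [
    bootstrap_population(fitted, areas, residuals_by_area, rng, conditional=True)
    for _ in range(B)
]
samples = [[draw_sample(areas, n_by_area, rng) for _ in range(R)] for _ in range(B)]
```

Each of the B × R samples would then be used to refit the model and recompute the FGT estimator on the corresponding bootstrap population.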
14. Poverty indicators and SAE methods
A Mean Squared Error Estimator of the Poverty Measures
Estimator
Using the B bootstrap populations and the R samples drawn from each of them,
we can estimate the mean squared error of the FGT estimator
Bias
$$\hat{E}\big[\hat{Z}(\alpha, t)^{*} - Z(\alpha, t)^{*}\big] = B^{-1} \sum_{b=1}^{B} R^{-1} \sum_{r=1}^{R} \big( \hat{Z}(\alpha, t)^{*br} - Z(\alpha, t)^{*b} \big)$$
Variance
$$\widehat{Var}\big[\hat{Z}(\alpha, t)^{*} - Z(\alpha, t)^{*}\big] = B^{-1} \sum_{b=1}^{B} R^{-1} \sum_{r=1}^{R} \big( \hat{Z}(\alpha, t)^{*br} - \bar{\hat{Z}}(\alpha, t)^{*br} \big)^{2}$$
where
$Z(\alpha, t)^{*b}$ is the FGT of the bth bootstrap population
$\hat{Z}(\alpha, t)^{*br}$ is the FGT estimate for $Z(\alpha, t)^{*b}$ estimated using the rth sample
drawn from the bth bootstrap population
$\bar{\hat{Z}}(\alpha, t)^{*br} = R^{-1} \sum_{r=1}^{R} \hat{Z}(\alpha, t)^{*br}$
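
A minimal sketch of these two quantities, assuming the bootstrap FGT values have already been computed. Combining them as variance plus squared bias is the standard mean squared error decomposition; the function name and the numbers below are hypothetical.

```python
def bootstrap_bias_var_mse(z_pop, z_est):
    """z_pop[b]: FGT of bootstrap population b.
    z_est[b][r]: FGT estimate from sample r drawn from population b.
    Returns (bias, variance, mse) with mse = variance + bias ** 2."""
    n_pop = len(z_pop)
    bias = 0.0
    variance = 0.0
    for b in range(n_pop):
        n_rep = len(z_est[b])
        mean_b = sum(z_est[b]) / n_rep  # average estimate within population b
        bias += sum(z - z_pop[b] for z in z_est[b]) / n_rep
        variance += sum((z - mean_b) ** 2 for z in z_est[b]) / n_rep
    bias /= n_pop
    variance /= n_pop
    return bias, variance, variance + bias ** 2


# Hypothetical values with B = 2 populations and R = 2 samples each.
z_pop = [0.2, 0.3]
z_est = [[0.1, 0.3], [0.3, 0.5]]
bias, variance, mse = bootstrap_bias_var_mse(z_pop, z_est)
print(bias, variance, mse)
```

The square root of the resulting value is the kind of quantity reported as an estimated root mean squared error alongside each small area estimate.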
15. Part III
Poverty Mapping in the Province of Pisa: Oversampling
vs. Small Area Estimation
16. Oversampling and Small Area Estimation: A Comparison
Oversampling and Small Area Estimation: A Comparison
When direct estimates are unreliable there are two possible solutions:
Increase the sample size in the domains of interest in such a way that direct
estimates become reliable (oversampling solution)
Use small area methods (small area solution)
In order to make a comparison between these alternatives we can take the
opportunity to use data referring to an EU-SILC 2008 oversampling of households
for the Province of Pisa - side result of the SAMPLE project
(www.sample-project.eu).
Sample size for the province of Pisa EU-SILC 2008: 149 households
Sample size for the province of Pisa Oversample: 675 households (including
the 149 households of the EU-SILC survey)
REMARK: The oversample has been managed by ISTAT, which guarantees the high
quality of the data
17. Oversampling and Small Area Estimation: A Comparison
SAE methods for poverty indicators in Tuscany Provinces
Data on the equivalised income in 2007 are available from the EU-SILC
survey 2008 for 1495 households in the 10 Tuscany Provinces
To better compare the living conditions in these areas we estimate the
indicators considering the gender of the head of the household
A set of explanatory variables is available for each unit in the population from
the Population Census 2001
We employ an M-quantile model to estimate Head Count Ratio (HCR) and
Poverty Gap (PG) for the Provinces by gender of the head of the household
(HH), for a total of 20 areas
National poverty line: 9310.74 Euros (equivalised household income)
18. Oversampling and Small Area Estimation: A Comparison
Model Specifications
The selection of covariates to fit the small area models relies on prior studies
of poverty assessment
The following covariates have been selected:
household size (integer value)
ownership of dwelling (owner/tenant)
age of the head of the household (integer value)
years of education of the head of the household (integer value)
working position of the head of the household (employed / unemployed in the
previous week)
19. Oversampling and Small Area Estimation: A Comparison
Oversampling and Small Area Estimation: A Comparison
We estimate the Head Count Ratio (HCR) and the Poverty Gap (PG) in the
Province of Pisa considering the gender of the Head of the Household (HH) using:
Direct estimators based on the EU-SILC survey data
Direct estimators based on the Oversampling data
M-quantile small area estimators based on the EU-SILC survey data
Table: Direct estimates (without and with oversampling) and MQ/CD estimates of the HCR
and PG, with estimated Root Mean Squared Errors in brackets and number of sampled
households (h), in the Province of Pisa, by gender of the Head of the Household (HH).

Estimates                              HH gender     h    HCR % (RMSE)    PG % (RMSE)
Direct estimates                       Female       44     9.88 (4.28)    4.48 (2.56)
Direct estimates                       Male        105     6.62 (2.24)    2.25 (0.91)
MQ/CD estimates                        Female       44    20.72 (3.13)    8.64 (2.00)
MQ/CD estimates                        Male        105     9.02 (1.63)    2.91 (0.74)
Direct estimates (with oversampling)   Female      193    23.57 (4.92)    6.64 (2.77)
Direct estimates (with oversampling)   Male        482     8.21 (1.61)    2.40 (0.60)
20. Part IV
Application of small area M-quantile models to poverty
mapping in Tuscany
21. Application of small area M-quantile models to poverty mapping in Tuscany
Estimates of the HCR at small area level in Tuscany
[Maps of the 10 Tuscan Provinces (MS, LU, PT, PO, FI, AR, PI, LI, SI, GR); legend values: 8.48, 10.17, 16.76, 24.04, 31.63]
Figure: Provinces by gender of the HH: males (left) and females (right)
22. Application of small area M-quantile models to poverty mapping in Tuscany
Estimates of the PG at small area level in Tuscany
[Maps of the 10 Tuscan Provinces (MS, LU, PT, PO, FI, AR, PI, LI, SI, GR); legend values: 2.69, 3.31, 6.37, 10.39, 15.05]
Figure: Provinces by gender of the HH: males (left) and females (right)
23. Part V
Concluding remarks
24. Concluding remarks
Concluding remarks and ongoing research
Main results
Focus on small area estimators of poverty indicators
Small area methods play a crucial role in providing poverty measures for local
governance
Small area estimates are very close to the oversampling estimates and they are
(almost) costless
Ongoing and future research
Consider non-monetary measures of poverty (Cheli and Lemmi, 1995)
Enhance the fitting of the models, considering non parametric models and
spatial models
Compare with alternative methods
Take into account the survey weights
25. Concluding remarks
Essential bibliography
Breckling, J. and Chambers, R. (1988). M -quantiles. Biometrika, 75, 761–771.
Chambers, R. and Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika, 73, 597–604.
Chambers, R. and Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika, 93, 255–268.
Chambers, R., Chandra, H. and Tzavidis, N. (2007). On robust mean squared error estimation for linear predictors for domains. CCSR Working
paper 2007-10, University of Manchester.
Cheli B. and Lemmi, A. (1995). A Totally Fuzzy and Relative Approach to the Multidimensional Analysis of Poverty. Economic Notes, 24,
115-134.
Foster, J., Greer, J. and Thorbecke, E. (1984) A class of decomposable poverty measures. Econometrica, 52, 761-766.
Giusti C., Pratesi M., Salvati N. (2009). Estimation of poverty indicators: a comparison of small area methods at LAU1-2 level in Tuscany,
Abstract Book, NTTS - Conferences on New Techniques and Technologies for Statistics, Brussels, 18-20 February 2009.
Hall, P. and Maiti, T. (2006). On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B, 68,
2, 221–238.
Marchetti, S., Tzavidis, N. and Pratesi, M. (2012). Non-parametric bootstrap mean squared error estimation for M-quantile estimators of
small area averages, quantiles and poverty indicators. Computational Statistics and Data Analysis, doi:10.1016/j.csda.2012.01.023
Lombardia M.J., Gonzalez-Manteiga W. and Prada-Sanchez J.M. (2003). Bootstrapping the Chambers-Dunstan estimate of finite population
distribution function. Journal of Statistical Planning and Inference, 116, 367-388.
Royall, R. and Cumberland, W.G. (1978). Variance Estimation in Finite Population Sampling. Journal of the American Statistical Association, 73,
351-358.
Tzavidis N., Marchetti S. and Chambers R. (2010). Robust estimation of small area means and quantiles. Australian and New Zealand Journal of
Statistics, 52, 2, 167–186.
Tzavidis, N., Salvati, N., Pratesi, M. and Chambers, R. (2007). M-quantile models for poverty mapping. Statistical Methods & Applications, 17,
393-411.