Figure 1: License: CC BY-SA 4.0
Applied econometrics with panel data: a replication
of Ani Katchova and Greene
Adriano Marcos Rodrigues Figueiredo1
aEscola de Administração e Negócios-ESAN, Rua Senador Filinto Muller n.1555 - Vila
Ipiranga - Unidade 10A Sala 04, Campo Grande, MS, 79074-460
Abstract
We apply panel data methods to economic questions by replicating Ani Katchova’s
script and examples from Greene (2012).
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0
International License.
Script adapted from Panel Data Models in R, Copyright 2013 by
Ani Katchova, Ohio State University, Department of Agricultural,
Environmental, and Development Economics, available at
https://sites.google.com/site/econometricsacademy/econometrics-models/panel-data-models,
and from examples in Greene (2012). Cornwell and Rupert (1988)
analyzed the returns to schooling in a balanced panel of 595 observations
on heads of households. The sample data are drawn from years
1976–1982 from the “Non-Survey of Economic Opportunity” from
the Panel Study of Income Dynamics. See also Panel101R by
Oscar Torres-Reyna.
Introduction
The following panel data estimators are analyzed: pooled OLS,
between, fixed effects (within), first differences, and random effects.
The first steps are to load the plm package and import the data. Ani Katchova's
illustrative script uses wage data from the file panel_wage.csv.
These are panel data from the Panel Study of Income Dynamics (PSID), so
both the variation over time (within) and the variation across individuals
(between) must be considered.
Email address: adriano.figueiredo@ufms.br (Adriano Marcos Rodrigues Figueiredo)
Preprint submitted to Journal of Development Economics January 22, 2019
The logarithm of the wage (lwage) is the dependent variable, and the
independent variables are education (ed), experience (exp) and weeks worked
(wks). Experience squared (exp2) is also included to accommodate
nonlinearities in this variable.
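For reference, the wage equation described in this paragraph can be written as follows. This is only a sketch of the specification: the symbols \beta_k, \alpha_i (individual effect) and \varepsilon_{it} (idiosyncratic error) are notation assumed here and do not appear in the original script.

```latex
\ln(\mathit{wage}_{it}) = \beta_0 + \beta_1\,\mathit{exp}_{it} + \beta_2\,\mathit{exp}_{it}^{2}
  + \beta_3\,\mathit{wks}_{it} + \beta_4\,\mathit{ed}_{i} + \alpha_i + \varepsilon_{it},
\qquad i = 1,\dots,n,\quad t = 1,\dots,T
```

The estimators compared in this text differ mainly in how they treat \alpha_i: pooled OLS ignores it, the within and first-difference estimators remove it, and random effects treats it as part of the error term.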
For convenience, Prof. Ani Katchova wrote the script using matrices
Y and X to hold the model's variables.
library(plm) # basic package for panel data
## Loading required package: Formula
mydata <- read.csv("panel_wage.csv") # read the data
attach(mydata) # make the columns available by name
Y <- cbind(lwage) # matrix with the dependent variable
X <- cbind(exp, exp2, wks, ed) # matrix with the independent variables
The next step is to put the data set in panel format. This is done with
the pdata.frame function, specifying the cross-section index “id”
and the time index “t”. Descriptive statistics are generated next. The
original routine used the plm.data function, whose use is now discouraged by the
plm package.
# define the panel data set with cross-section "id" and time "t"
pdata <- pdata.frame(mydata, index=c("id","t"))
# Descriptive statistics
library(knitr)
kable(summary(Y),caption="Descriptive statistics of the dependent variable Y")
Table 1: Descriptive statistics of the dependent variable Y
lwage
Min. :4.605
1st Qu.:6.395
Median :6.685
Mean :6.676
3rd Qu.:6.953
Max. :8.537
kable(summary(X),caption="Descriptive statistics of the independent variables X")
Table 2: Descriptive statistics of the independent variables X
exp exp2 wks ed
Min. : 1.00 Min. : 1.0 Min. : 5.00 Min. : 4.00
1st Qu.:11.00 1st Qu.: 121.0 1st Qu.:46.00 1st Qu.:12.00
Median :18.00 Median : 324.0 Median :48.00 Median :12.00
Mean :19.85 Mean : 514.4 Mean :46.81 Mean :12.85
3rd Qu.:29.00 3rd Qu.: 841.0 3rd Qu.:50.00 3rd Qu.:16.00
Max. :51.00 Max. :2601.0 Max. :52.00 Max. :17.00
The dimensions of the panel can be inspected. The panel is balanced, with
n = 595 individuals (heads of households), T = 7 time periods (1976 to 1982), and
a total of N = 4165 observations (595 x 7).
pdim(pdata)
## Balanced Panel: n = 595, T = 7, N = 4165
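The same three numbers can be recovered by hand from the two index columns. The sketch below uses base R only and a toy index built with expand.grid; the object idx and the variable names n_ind, n_per and n_obs are hypothetical and stand in for the id and t columns of the real data.

```r
# Toy balanced index: 595 individuals observed in 7 periods (hypothetical data)
idx <- expand.grid(id = 1:595, t = 1:7)

n_ind <- length(unique(idx$id))   # number of individuals (n)
n_per <- length(unique(idx$t))    # number of periods (T)
n_obs <- nrow(idx)                # total number of observations (N)

# Balanced: every individual has exactly n_per rows
# (assuming there are no duplicated id-t pairs)
balanced <- all(table(idx$id) == n_per)

cat("n =", n_ind, " T =", n_per, " N =", n_obs, " balanced:", balanced, "\n")
```

In a balanced panel N = n x T holds exactly; in an unbalanced panel some individuals have fewer than T rows and the check returns FALSE.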
The variation of Y (lwage) over time and across cross-sections can be plotted:
coplot(Y ~ id|t, type="l", data=pdata) # Lines
[Figure: coplot of Y against id, given t = 1, ..., 7 (lines)]
coplot(Y ~ id|t, type="b", data=pdata) # Points and lines
[Figure: coplot of Y against id, given t = 1, ..., 7 (points and lines)]
# The bars at the top indicate which panel corresponds to each value of t,
# from left to right starting on the bottom row (Muenchen/Hilbe:355)
A plot of the means gives a further view of the panel:
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
plotmeans(Y ~ id, main="Heterogeneity across units", data=pdata)
[Figure: "Heterogeneity across units", mean of Y by id, n = 7 observations per unit]
plotmeans(Y ~ t, main="Heterogeneity across periods", data=pdata)
[Figure: "Heterogeneity across periods", mean of Y by t = 1, ..., 7, n = 595 observations per period]
# plotmeans draws a 95% confidence interval around the means
With the data organized as a panel, we proceed to the
estimates.
Pooled OLS (OLS on stacked data)
The first estimator is that of the “stacked” model, or “pooled estimator”, obtained
by Ordinary Least Squares (OLS) on the data set without distinguishing
the cross-section and time dimensions. This method does not account for
heterogeneity across groups or over time.
pooling <- plm(Y ~ X, data=pdata, model= "pooling")
suppressMessages(library(stargazer))
stargazer(pooling,
          title="Pooled OLS regression results",
          align=TRUE,
          type = "text", style = "all",
          keep.stat=c("aic","bic","rsq", "adj.rsq","n")
)
##
## Pooled OLS regression results
## ========================================
## Dependent variable:
## ---------------------------
## Y
## ----------------------------------------
## Xexp 0.045***
## (0.002)
## t = 18.670
## p = 0.000
## Xexp2 -0.001***
## (0.0001)
## t = -13.555
## p = 0.000
## Xwks 0.006***
## (0.001)
## t = 4.927
## p = 0.00000
## Xed 0.076***
## (0.002)
## t = 34.151
## p = 0.000
## Constant 4.908***
## (0.067)
## t = 72.894
## p = 0.000
## ----------------------------------------
## Observations 4,165
## R2 0.284
## Adjusted R2 0.283
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Pooled OLS (Greene, 2012, p.351-352)
In Example 11.1 of Greene (2012, p.351), the pooled model is
estimated as follows, now using the data set shipped with the plm package,
in which:
• exp years of full-time work experience.
• wks weeks worked.
• bluecol blue collar?
• ind works in a manufacturing industry?
• south resides in the south?
• smsa resides in a standard metropolitan statistical area?
• married married?
• sex a factor with levels “male” and “female” .
• union individual’s wage set by a union contract?
• ed years of education.
• black is the individual black?
• lwage logarithm of wage.
library(plm)
options("scipen"=100, "digits"=4)
# the 'Wages' data set is organized as stacked
# time series (balanced panel)
data("Wages", package = "plm")
Wag <- pdata.frame(Wages, index=595) # put the data in panel format
pooled.tab11_1 <- plm(lwage ~ exp + I(exp ^ 2) + wks + bluecol +
                        ind + south + smsa + married + union + ed +
                        sex + black,
                      data = Wag, index = 595,
                      model = "pooling")
summary(pooled.tab11_1)
## Pooling Model
##
## Call:
## plm(formula = lwage ~ exp + I(exp^2) + wks + bluecol + ind +
## south + smsa + married + union + ed + sex + black, data = Wag,
## model = "pooling", index = 595)
##
## Balanced Panel: n = 595, T = 7, N = 4165
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.18965 -0.23536 -0.00988 0.22906 2.08738
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 5.2511236 0.0712868 73.66 < 0.0000000000000002 ***
## exp 0.0401047 0.0021592 18.57 < 0.0000000000000002 ***
## I(exp^2) -0.0006734 0.0000474 -14.19 < 0.0000000000000002 ***
## wks 0.0042161 0.0010814 3.90 0.000098175299491 ***
## bluecolyes -0.1400093 0.0146567 -9.55 < 0.0000000000000002 ***
## ind 0.0467886 0.0117935 3.97 0.000073910494992 ***
## southyes -0.0556374 0.0125271 -4.44 0.000009171785569 ***
## smsayes 0.1516671 0.0120687 12.57 < 0.0000000000000002 ***
## marriedyes 0.0484485 0.0205687 2.36 0.019 *
## unionyes 0.0926267 0.0127995 7.24 0.000000000000545 ***
## ed 0.0567042 0.0026128 21.70 < 0.0000000000000002 ***
## sexfemale -0.3677852 0.0250971 -14.65 < 0.0000000000000002 ***
## blackyes -0.1669376 0.0220422 -7.57 0.000000000000044 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 887
## Residual Sum of Squares: 507
## R-Squared: 0.429
## Adj. R-Squared: 0.427
## F-statistic: 259.544 on 12 and 4152 DF, p-value: <0.0000000000000002
# the model with Arellano robust standard errors
summary(pooled.tab11_1, vcov = function(x) vcovHC(x, method = "arellano"))
## Pooling Model
##
## Note: Coefficient variance-covariance matrix supplied: function(x) vcovHC(x, method = "ar
##
## Call:
## plm(formula = lwage ~ exp + I(exp^2) + wks + bluecol + ind +
## south + smsa + married + union + ed + sex + black, data = Wag,
## model = "pooling", index = 595)
##
## Balanced Panel: n = 595, T = 7, N = 4165
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.18965 -0.23536 -0.00988 0.22906 2.08738
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 5.2511236 0.1232643 42.60 < 0.0000000000000002 ***
## exp 0.0401047 0.0040671 9.86 < 0.0000000000000002 ***
## I(exp^2) -0.0006734 0.0000911 -7.39 0.00000000000017514 ***
## wks 0.0042161 0.0015384 2.74 0.00616 **
## bluecolyes -0.1400093 0.0271807 -5.15 0.00000027103326127 ***
## ind 0.0467886 0.0236087 1.98 0.04756 *
## southyes -0.0556374 0.0260996 -2.13 0.03309 *
## smsayes 0.1516671 0.0240477 6.31 0.00000000031434550 ***
## marriedyes 0.0484485 0.0408504 1.19 0.23569
## unionyes 0.0926267 0.0236178 3.92 0.00008927341363737 ***
## ed 0.0567042 0.0055519 10.21 < 0.0000000000000002 ***
## sexfemale -0.3677852 0.0454704 -8.09 0.00000000000000079 ***
## blackyes -0.1669376 0.0442280 -3.77 0.00016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 887
## Residual Sum of Squares: 507
## R-Squared: 0.429
## Adj. R-Squared: 0.427
## F-statistic: 66.2142 on 12 and 594 DF, p-value: <0.0000000000000002
# the model with White heteroskedasticity-consistent standard errors
summary(pooled.tab11_1, vcov = function(x) vcovHC(x, method = "white1"))
## Pooling Model
##
## Note: Coefficient variance-covariance matrix supplied: function(x) vcovHC(x, method = "wh
##
## Call:
## plm(formula = lwage ~ exp + I(exp^2) + wks + bluecol + ind +
## south + smsa + married + union + ed + sex + black, data = Wag,
## model = "pooling", index = 595)
##
## Balanced Panel: n = 595, T = 7, N = 4165
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.18965 -0.23536 -0.00988 0.22906 2.08738
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 5.2511236 0.0743506 70.63 < 0.0000000000000002 ***
## exp 0.0401047 0.0021578 18.59 < 0.0000000000000002 ***
## I(exp^2) -0.0006734 0.0000479 -14.06 < 0.0000000000000002 ***
## wks 0.0042161 0.0011426 3.69 0.00023 ***
## bluecolyes -0.1400093 0.0149357 -9.37 < 0.0000000000000002 ***
## ind 0.0467886 0.0119942 3.90 0.0000973424968314 ***
## southyes -0.0556374 0.0127442 -4.37 0.0000129800945396 ***
## smsayes 0.1516671 0.0120790 12.56 < 0.0000000000000002 ***
## marriedyes 0.0484485 0.0204945 2.36 0.01813 *
## unionyes 0.0926267 0.0123331 7.51 0.0000000000000717 ***
## ed 0.0567042 0.0027265 20.80 < 0.0000000000000002 ***
## sexfemale -0.3677852 0.0231003 -15.92 < 0.0000000000000002 ***
## blackyes -0.1669376 0.0207472 -8.05 0.0000000000000011 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 887
## Residual Sum of Squares: 507
## R-Squared: 0.429
## Adj. R-Squared: 0.427
## F-statistic: 256.818 on 12 and 594 DF, p-value: <0.0000000000000002
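To make the idea behind heteroskedasticity-consistent standard errors concrete, the sketch below builds the textbook White (HC0) sandwich covariance matrix by hand in base R on simulated data. It is an illustration under stated assumptions: the data and all object names are hypothetical, and the result is not claimed to reproduce plm's vcovHC numerically. It also does not cluster by individual, which is what distinguishes the Arellano estimator used earlier.

```r
set.seed(4)
N <- 200
x <- rnorm(N)
y <- 1 + 0.5 * x + rnorm(N, sd = exp(0.5 * x))   # error variance depends on x

fit <- lm(y ~ x)
X <- model.matrix(fit)   # N x 2 regressor matrix (intercept and x)
e <- residuals(fit)      # OLS residuals

bread <- solve(crossprod(X))        # (X'X)^{-1}
meat  <- crossprod(X * e)           # X' diag(e^2) X: each row of X scaled by its residual
V_hc0 <- bread %*% meat %*% bread   # White (HC0) covariance matrix

# Classical versus White standard errors, side by side
cbind(classical = sqrt(diag(vcov(fit))), white_hc0 = sqrt(diag(V_hc0)))
```

The point estimates are unchanged by this correction; only the standard errors, and therefore the t and p values, differ, as in the three summaries of pooled.tab11_1 above.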
Between estimator (variation across individuals)
between <- plm(Y ~ X, data=pdata, model= "between")
summary(between)
## Oneway (individual) effect Between Model
##
## Call:
## plm(formula = Y ~ X, data = pdata, model = "between")
##
## Balanced Panel: n = 595, T = 7, N = 4165
## Observations used in estimation: 595
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.9782 -0.2203 0.0366 0.2501 0.9856
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 4.683039 0.210099 22.29 < 0.0000000000000002 ***
## Xexp 0.038153 0.005697 6.70 0.00000000005 ***
## Xexp2 -0.000631 0.000126 -5.02 0.00000067570 ***
## Xwks 0.013090 0.004066 3.22 0.0014 **
## Xed 0.073784 0.004898 15.06 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 92.3
## Residual Sum of Squares: 62.2
## R-Squared: 0.326
## Adj. R-Squared: 0.322
## F-statistic: 71.4768 on 4 and 590 DF, p-value: <0.0000000000000002
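The between estimator uses one observation per individual, which is why only 595 observations enter the estimation above. A minimal base R sketch of the idea, on simulated data with hypothetical names (d, means, between_by_hand), is to average each variable over time and run OLS on the averages:

```r
set.seed(1)
n <- 50; n_t <- 5                                  # toy panel dimensions
d <- data.frame(id = rep(1:n, each = n_t), t = rep(1:n_t, times = n))
d$x <- rnorm(n * n_t)
d$y <- 1 + 0.5 * d$x + rep(rnorm(n), each = n_t) + rnorm(n * n_t)

# Collapse to one row per individual: time averages of y and x
means <- aggregate(cbind(y, x) ~ id, data = d, FUN = mean)

# Between estimator: OLS on the n individual means
between_by_hand <- lm(y ~ x, data = means)
coef(between_by_hand)
```

Because all time variation is averaged out, this estimator exploits only the differences across individuals.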
First differences estimator
firstdiff <- plm(Y ~ X, data=pdata, model= "fd")
summary(firstdiff)
## Oneway (individual) effect First-Difference Model
##
## Call:
## plm(formula = Y ~ X, data = pdata, model = "fd")
##
## Balanced Panel: n = 595, T = 7, N = 4165
## Observations used in estimation: 3570
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.022 -0.049 0.016 0.027 0.087 2.426
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Xexp2 0.0017323 0.0000702 24.67 <0.0000000000000002 ***
## Xwks 0.0000810 0.0005910 0.14 0.89
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 118
## Residual Sum of Squares: 129
## R-Squared: 0.00406
## Adj. R-Squared: 0.00378
## F-statistic: -300.514 on 1 and 3568 DF, p-value: 1
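First differencing drops the first period of each individual, which matches the 3570 = 595 x 6 observations reported above, and it wipes out any regressor that does not change over time. The base R sketch below illustrates both points on simulated data; all names are hypothetical, and the intercept is omitted for simplicity, which may differ from how plm handles the intercept in its "fd" model.

```r
set.seed(3)
n <- 50; n_t <- 5
d <- data.frame(id = rep(1:n, each = n_t), t = rep(1:n_t, times = n))
alpha <- rep(rnorm(n), each = n_t)                         # individual effect
d$ed <- rep(sample(8:16, n, replace = TRUE), each = n_t)   # time-invariant regressor
d$x  <- alpha + rnorm(n * n_t)
d$y  <- 2 * d$x + 0.1 * d$ed + alpha + rnorm(n * n_t)

# First difference within each individual (rows are sorted by id, then t)
fd <- function(v, g) ave(v, g, FUN = function(z) c(NA, diff(z)))
d$dy  <- fd(d$y, d$id)
d$dx  <- fd(d$x, d$id)
d$ded <- fd(d$ed, d$id)

# Time-invariant regressors difference to zero and cannot be estimated
all(d$ded[!is.na(d$ded)] == 0)

# OLS on the differenced data; rows with NA (first period) are dropped by lm()
fd_by_hand <- lm(dy ~ dx - 1, data = d)
nobs(fd_by_hand)   # n * (n_t - 1) observations remain
```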
Fixed effects or within estimator
fixed <- plm(Y ~ X, data=pdata, model= "within")
summary(fixed)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Y ~ X, data = pdata, model = "within")
##
## Balanced Panel: n = 595, T = 7, N = 4165
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.81209 -0.05111 0.00371 0.06143 1.94341
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Xexp 0.1137879 0.0024689 46.09 < 0.0000000000000002 ***
## Xexp2 -0.0004244 0.0000546 -7.77 0.00000000000001 ***
## Xwks 0.0008359 0.0005997 1.39 0.16
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 241
## Residual Sum of Squares: 82.6
## R-Squared: 0.657
## Adj. R-Squared: 0.599
## F-statistic: 2273.74 on 3 and 3567 DF, p-value: <0.0000000000000002
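The within estimator can be reproduced by subtracting each individual's time mean from every variable and running OLS on the demeaned data. The base R sketch below, on simulated data with hypothetical names, also checks that this gives the same slope as a regression with one dummy per individual (LSDV). A time-invariant variable becomes identically zero after demeaning, which is consistent with ed being absent from the coefficient table above.

```r
set.seed(2)
n <- 50; n_t <- 5
d <- data.frame(id = rep(1:n, each = n_t))
alpha <- rep(rnorm(n), each = n_t)    # individual effect
d$x <- alpha + rnorm(n * n_t)         # regressor correlated with the effect
d$y <- 2 * d$x + alpha + rnorm(n * n_t)

# Within transformation: subtract each individual's time mean
d$x_w <- d$x - ave(d$x, d$id)
d$y_w <- d$y - ave(d$y, d$id)

# OLS on the demeaned data (no intercept: demeaned variables have mean zero)
b_within <- coef(lm(y_w ~ x_w - 1, data = d))["x_w"]

# Equivalent least-squares dummy-variable (LSDV) regression
b_lsdv <- coef(lm(y ~ x + factor(id), data = d))["x"]

c(within = unname(b_within), lsdv = unname(b_lsdv))
```

The standard errors from the plain lm() on demeaned data are not adjusted for the n estimated means, so only the coefficients should be compared with a dedicated panel routine.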
Random effects estimator
random <- plm(Y ~ X, data=pdata, model= "random")
summary(random)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Y ~ X, data = pdata, model = "random")
##
## Balanced Panel: n = 595, T = 7, N = 4165
##
## Effects:
## var std.dev share
## idiosyncratic 0.0232 0.1522 0.18
## individual 0.1021 0.3195 0.82
## theta: 0.823
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.0440 -0.1057 0.0071 0.1147 2.0876
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 3.8293661 0.0936336 40.9 <0.0000000000000002 ***
## Xexp 0.0888609 0.0028178 31.5 <0.0000000000000002 ***
## Xexp2 -0.0007726 0.0000623 -12.4 <0.0000000000000002 ***
## Xwks 0.0009658 0.0007433 1.3 0.19
## Xed 0.1117100 0.0060572 18.4 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 261
## Residual Sum of Squares: 151
## R-Squared: 0.42
## Adj. R-Squared: 0.419
## F-statistic: 753.113 on 4 and 4160 DF, p-value: <0.0000000000000002
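The theta reported in the output is the share of each individual mean that is subtracted in the random effects (quasi-demeaning) transformation. Assuming the usual balanced-panel formula, theta = 1 - sqrt(sigma2_idios / (sigma2_idios + T * sigma2_indiv)), it can be checked by hand from the variance components printed above. The variable names below are hypothetical.

```r
sigma2_idios <- 0.0232   # idiosyncratic variance, taken from the output above
sigma2_indiv <- 0.1021   # individual-effect variance, taken from the output above
n_t <- 7                 # periods per individual

# Quasi-demeaning parameter for a balanced panel
theta <- 1 - sqrt(sigma2_idios / (sigma2_idios + n_t * sigma2_indiv))
round(theta, 3)
```

With these rounded inputs the result rounds to 0.823, in line with the reported theta. A value of 0 would correspond to pooled OLS and a value of 1 to the within estimator, so these random effects estimates sit closer to the fixed effects ones.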
LM test for random effects versus OLS
plmtest(pooling)
##
## Lagrange Multiplier Test - (Honda) for balanced panels
##
## data: Y ~ X
## normal = 72, p-value <0.0000000000000002
## alternative hypothesis: significant effects
F test for fixed effects versus OLS
pFtest(fixed, pooling)
##
## F test for individual effects
##
## data: Y ~ X
## F = 40, df1 = 590, df2 = 3600, p-value <0.0000000000000002
## alternative hypothesis: significant effects
Hausman test for fixed versus random effects model
phtest(random, fixed)
##
## Hausman Test
##
## data: Y ~ X
## chisq = 6200, df = 3, p-value <0.0000000000000002
## alternative hypothesis: one model is inconsistent
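For reference, the Hausman statistic in its standard textbook form compares the coefficients that both models estimate. The notation below is assumed here and is not taken from the original script; k is the number of common coefficients, which is 3 in this case (exp, exp2 and wks).

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)'
    \left[\widehat{\mathrm{Var}}\!\left(\hat{\beta}_{FE}\right)
        - \widehat{\mathrm{Var}}\!\left(\hat{\beta}_{RE}\right)\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\sim\; \chi^{2}_{k}
```

Under the null hypothesis both estimators are consistent and the difference should be small. A large statistic, as in the output above, rejects the null and favors the fixed effects model.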