Applied econometrics with panel data

Figure 1: License: CC BY-SA 4.0
Applied econometrics with panel data: a replication
of Ani Katchova and Greene
Adriano Marcos Rodrigues Figueiredo
Escola de Administração e Negócios (ESAN), Rua Senador Filinto Muller n. 1555, Vila
Ipiranga, Unidade 10A, Sala 04, Campo Grande, MS, 79074-460
Abstract
We analyse panel data models applied to economic questions by replicating Ani
Katchova’s script and examples from Greene (2012).
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0
International License.
Script adapted from Panel Data Models in R, Copyright 2013 by
Ani Katchova, Ohio State University, Department of Agricultural,
Environmental, and Development Economics, available at
https://sites.google.com/site/econometricsacademy/econometrics-models/panel-data-models,
and from examples in Greene (2012). Cornwell and Rupert (1988)
analyzed the returns to schooling in a balanced panel of 595
observations on heads of households. The sample data are drawn from years
1976–1982 from the “Non-Survey of Economic Opportunity” from
the Panel Study of Income Dynamics. See also Panel101R by
Oscar Torres-Reyna.
Introduction
The following panel data estimators are analysed: pooled OLS, between,
fixed effects (within), first differences, and random effects.
The first steps are to load the plm package and import the data. Ani Katchova's
illustrative script uses wage data stored in the file panel_wage.csv.
These are panel data from the Panel Study of Income Dynamics (PSID), so both
the variation over time (within) and the variation across individuals (between)
must be considered.
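The distinction between within and between variation can be illustrated with a minimal sketch in base R. The toy data frame and its column names are hypothetical and are not part of the original script; the sketch only shows that the total sum of squares of a variable splits into a within-unit part and a between-unit part.

```r
set.seed(42)
# Hypothetical toy panel: 5 units observed in 4 periods each
toy <- data.frame(id = rep(1:5, each = 4), t = rep(1:4, times = 5))
toy$y <- rnorm(20) + toy$id                  # outcome with unit-level shifts

unit_mean <- ave(toy$y, toy$id)              # unit mean of y, repeated on every row
overall   <- mean(toy$y)

ss_total   <- sum((toy$y - overall)^2)       # total variation
ss_within  <- sum((toy$y - unit_mean)^2)     # variation over time inside units
ss_between <- sum((unit_mean - overall)^2)   # variation across unit means

c(total = ss_total, within = ss_within, between = ss_between)
```

The within and between components add up to the total, which is the sense in which the two sources of variation exhaust the information in the panel.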
Email address: adriano.figueiredo@ufms.br (Adriano Marcos Rodrigues Figueiredo)
Preprint submitted to Journal of Development Economics January 22, 2019

The logarithm of the wage (lwage) is the dependent variable, and the
independent variables are education (ed), experience (exp) and weeks worked
(wks). Experience squared (exp2) is also included to accommodate
nonlinearities in this variable.
For convenience, Prof. Ani Katchova wrote the script with matrices
Y and X that hold the variables of the model.
library(plm) # core package for panel data
## Loading required package: Formula
mydata <- read.csv("panel_wage.csv") # read the data
attach(mydata) # make the columns accessible by name
Y <- cbind(lwage) # matrix with the dependent variable
X <- cbind(exp, exp2, wks, ed) # matrix with the independent variables
The next step is to declare the data set as a panel. This is done with
the pdata.frame function, specifying the cross-section index “id”
and the time index “t”. Descriptive statistics are generated next. The original
routine used the plm.data function, which the plm package now
discourages.
# define the panel data set with cross-section "id" and time "t"
pdata <- pdata.frame(mydata, index=c("id","t"))
# Descriptive statistics
library(knitr)
kable(summary(Y),caption="Descriptive statistics of the dependent variable Y")
Table 1: Descriptive statistics of the dependent variable Y
    lwage
Min.   :4.605
1st Qu.:6.395
Median :6.685
Mean   :6.676
3rd Qu.:6.953
Max.   :8.537
kable(summary(X),caption="Descriptive statistics of the independent variables X")
Table 2: Descriptive statistics of the independent variables X
     exp              exp2              wks              ed
Min.   : 1.00    Min.   :   1.0    Min.   : 5.00    Min.   : 4.00
1st Qu.:11.00    1st Qu.: 121.0    1st Qu.:46.00    1st Qu.:12.00
Median :18.00    Median : 324.0    Median :48.00    Median :12.00
Mean   :19.85    Mean   : 514.4    Mean   :46.81    Mean   :12.85
3rd Qu.:29.00    3rd Qu.: 841.0    3rd Qu.:50.00    3rd Qu.:16.00
Max.   :51.00    Max.   :2601.0    Max.   :52.00    Max.   :17.00
The dimensions of the panel can be inspected. The panel is balanced, with
n = 595 cross-section units (heads of households), T = 7 time periods (1976 to 1982), and
N = 4165 observations in total (595 x 7).
pdim(pdata)
## Balanced Panel: n = 595, T = 7, N = 4165
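As a rough illustration of what pdim reports, the same quantities can be recovered by counting rows. The sketch below uses a hypothetical toy panel (not the PSID data) and base R only; the object names are assumptions made for the example.

```r
# Hypothetical toy panel: 3 units observed in 2 periods each
toy <- data.frame(id = rep(1:3, each = 2), t = rep(1:2, times = 3))

obs_per_unit <- table(toy$id)               # number of periods observed per unit
n_units   <- length(obs_per_unit)           # n: cross-section units
n_periods <- length(unique(toy$t))          # T: time periods
n_obs     <- nrow(toy)                      # N: total observations

# A panel is balanced when every unit is observed in every period
balanced <- all(obs_per_unit == n_periods)

c(n = n_units, T = n_periods, N = n_obs)
balanced
```

In a balanced panel N equals n x T, as in the 595 x 7 = 4165 observations reported above.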
The variation of Y (lwage) over time and across cross-section units can be plotted:
coplot(Y ~ id|t, type="l", data=pdata) # Lines
[Figure: conditioning plot of Y against id, given t (lines); x-axis: id, y-axis: Y, one panel for each t = 1, ..., 7]
coplot(Y ~ id|t, type="b", data=pdata) # Points and lines
[Figure: conditioning plot of Y against id, given t (points and lines); x-axis: id, y-axis: Y, one panel for each t = 1, ..., 7]
# Bars at the top indicate the corresponding panel (here, time periods),
# from left to right starting on the bottom row (Muenchen/Hilbe:355)
The plot of means shows a little more about the panel:
library(gplots)
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
## lowess
plotmeans(Y ~ id, main="Heterogeneity across units", data=pdata)
[Figure: “Heterogeneity across units”; mean of Y by id, with n = 7 observations per unit; x-axis: id, y-axis: Y]
plotmeans(Y ~ t, main="Heterogeneity across periods", data=pdata)
[Figure: “Heterogeneity across periods”; mean of Y by t (1 to 7), with n = 595 observations per period; x-axis: t, y-axis: Y]
# plotmeans draws a 95% confidence interval around the means
With the data organised as a panel, the estimations follow.
Pooled OLS (OLS on stacked data)
The first estimator is the pooled estimator, obtained by
Ordinary Least Squares (OLS) on the data set without distinguishing
cross-section units from time periods. This method ignores
heterogeneity across groups or over time.
pooling <- plm(Y ~ X, data=pdata, model= "pooling")
suppressMessages(library(stargazer))
stargazer(pooling,
          title="Pooled OLS regression results",
          align=TRUE,
          type = "text", style = "all",
          keep.stat=c("aic","bic","rsq", "adj.rsq","n")
          )
##
## Pooled OLS regression results
## ========================================
##                  Dependent variable:
##              ---------------------------
##                           Y
## ----------------------------------------
## Xexp                  0.045***
##                        (0.002)
##                      t = 18.670
##                       p = 0.000
## Xexp2                 -0.001***
##                       (0.0001)
##                      t = -13.555
##                       p = 0.000
## Xwks                  0.006***
##                        (0.001)
##                       t = 4.927
##                      p = 0.00000
## Xed                   0.076***
##                        (0.002)
##                      t = 34.151
##                       p = 0.000
## Constant              4.908***
##                        (0.067)
##                      t = 72.894
##                       p = 0.000
## ----------------------------------------
## Observations            4,165
## R2                      0.284
## Adjusted R2             0.283
## ========================================
## Note:        *p<0.1; **p<0.05; ***p<0.01
Pooled OLS (Greene, 2012, p.351-352)
In Example 11.1 of Greene (2012, p.351), the pooled model is estimated
as follows, now using the data set shipped with the plm package,
in which:
• exp: years of full-time work experience.
• wks: weeks worked.
• bluecol: blue collar?
• ind: works in a manufacturing industry?
• south: resides in the south?
• smsa: resides in a standard metropolitan statistical area?
• married: married?
• sex: a factor with levels “male” and “female”.
• union: individual’s wage set by a union contract?
• ed: years of education.
• black: is the individual black?
• lwage: logarithm of wage.
library(plm)
options("scipen"=100, "digits"=4)
# the 'Wages' data set is organised as stacked time series
# (balanced panel)
data("Wages", package = "plm")
Wag <- pdata.frame(Wages, index=595) # convert to panel format (595 individuals)
pooled.tab11_1 <- plm(lwage ~ exp + I(exp ^ 2) + wks + bluecol +
                        ind + south + smsa + married + union + ed +
                        sex + black,
                      data = Wag, index = 595,
                      model = "pooling")
summary(pooled.tab11_1)
## Pooling Model
##
## Call:
## plm(formula = lwage ~ exp + I(exp^2) + wks + bluecol + ind +
## south + smsa + married + union + ed + sex + black, data = Wag,
## model = "pooling", index = 595)
##
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.18965 -0.23536 -0.00988 0.22906 2.08738
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 5.2511236 0.0712868 73.66 < 0.0000000000000002 ***
## exp 0.0401047 0.0021592 18.57 < 0.0000000000000002 ***
## I(exp^2) -0.0006734 0.0000474 -14.19 < 0.0000000000000002 ***
## wks 0.0042161 0.0010814 3.90 0.000098175299491 ***
## bluecolyes -0.1400093 0.0146567 -9.55 < 0.0000000000000002 ***
## ind 0.0467886 0.0117935 3.97 0.000073910494992 ***
## southyes -0.0556374 0.0125271 -4.44 0.000009171785569 ***
## smsayes 0.1516671 0.0120687 12.57 < 0.0000000000000002 ***
## marriedyes 0.0484485 0.0205687 2.36 0.019 *
## unionyes 0.0926267 0.0127995 7.24 0.000000000000545 ***
## ed 0.0567042 0.0026128 21.70 < 0.0000000000000002 ***
## sexfemale -0.3677852 0.0250971 -14.65 < 0.0000000000000002 ***
## blackyes -0.1669376 0.0220422 -7.57 0.000000000000044 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 887
## Residual Sum of Squares: 507
## R-Squared: 0.429
## Adj. R-Squared: 0.427
## F-statistic: 259.544 on 12 and 4152 DF, p-value: <0.0000000000000002
# the model with Arellano robust standard errors
summary(pooled.tab11_1, vcov = function(x) vcovHC(x, method = "arellano"))
## Pooling Model
##
## Note: Coefficient variance-covariance matrix supplied: function(x) vcovHC(x, method = "ar
##
## Call:
##
##
## Residuals:
## -2.18965 -0.23536 -0.00988 0.22906 2.08738
##
## Coefficients:
## (Intercept) 5.2511236 0.1232643 42.60 < 0.0000000000000002 ***
## exp 0.0401047 0.0040671 9.86 < 0.0000000000000002 ***
## I(exp^2) -0.0006734 0.0000911 -7.39 0.00000000000017514 ***
## wks 0.0042161 0.0015384 2.74 0.00616 **
## bluecolyes -0.1400093 0.0271807 -5.15 0.00000027103326127 ***
## ind 0.0467886 0.0236087 1.98 0.04756 *
## southyes -0.0556374 0.0260996 -2.13 0.03309 *
## smsayes 0.1516671 0.0240477 6.31 0.00000000031434550 ***
## marriedyes 0.0484485 0.0408504 1.19 0.23569
## unionyes 0.0926267 0.0236178 3.92 0.00008927341363737 ***
## ed 0.0567042 0.0055519 10.21 < 0.0000000000000002 ***
## sexfemale -0.3677852 0.0454704 -8.09 0.00000000000000079 ***
## blackyes -0.1669376 0.0442280 -3.77 0.00016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-Squared: 0.429
# the model with White heteroskedasticity-consistent standard errors
summary(pooled.tab11_1, vcov = function(x) vcovHC(x, method = "white1"))
## Pooling Model
##
## Note: Coefficient variance-covariance matrix supplied: function(x) vcovHC(x, method = "wh
##
## Call:
##
##
## Residuals:
## -2.18965 -0.23536 -0.00988 0.22906 2.08738
##
## Coefficients:
## (Intercept) 5.2511236 0.0743506 70.63 < 0.0000000000000002 ***
## exp 0.0401047 0.0021578 18.59 < 0.0000000000000002 ***
## I(exp^2) -0.0006734 0.0000479 -14.06 < 0.0000000000000002 ***
## wks 0.0042161 0.0011426 3.69 0.00023 ***
## bluecolyes -0.1400093 0.0149357 -9.37 < 0.0000000000000002 ***
## ind 0.0467886 0.0119942 3.90 0.0000973424968314 ***
## southyes -0.0556374 0.0127442 -4.37 0.0000129800945396 ***
## smsayes 0.1516671 0.0120790 12.56 < 0.0000000000000002 ***
## marriedyes 0.0484485 0.0204945 2.36 0.01813 *
## unionyes 0.0926267 0.0123331 7.51 0.0000000000000717 ***
## ed 0.0567042 0.0027265 20.80 < 0.0000000000000002 ***
## sexfemale -0.3677852 0.0231003 -15.92 < 0.0000000000000002 ***
## blackyes -0.1669376 0.0207472 -8.05 0.0000000000000011 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-Squared: 0.429
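The difference between the two corrections can be sketched by hand. The example below uses simulated data (an assumption, not the Wages data) and builds two sandwich estimators in base R: a White-type one that treats all observations as independent, and one clustered by individual, in the spirit of the Arellano method. It illustrates the idea and does not claim to reproduce the internals of vcovHC.

```r
set.seed(1)
n <- 40; n_periods <- 5                      # hypothetical panel dimensions
id <- rep(1:n, each = n_periods)
x  <- rnorm(n * n_periods)
u  <- rnorm(n)[id] + rnorm(n * n_periods)    # errors correlated within each unit
y  <- 1 + 0.5 * x + u

fit <- lm(y ~ x)                             # pooled OLS
X <- model.matrix(fit)
e <- resid(fit)
bread <- solve(crossprod(X))                 # (X'X)^{-1}

# White-type: each observation contributes its own squared score
meat_white <- crossprod(X * e)

# Clustered by unit: scores are summed within each id before squaring
scores <- rowsum(X * e, group = id)
meat_cluster <- crossprod(scores)

se_white   <- sqrt(diag(bread %*% meat_white %*% bread))
se_cluster <- sqrt(diag(bread %*% meat_cluster %*% bread))
rbind(white = se_white, cluster = se_cluster)
```

When the errors of the same individual are correlated over time, the clustered standard errors are typically larger, which is consistent with the Arellano results above being roughly twice the conventional ones.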
Between estimator (variation between individuals)
between <- plm(Y ~ X, data=pdata, model= "between")
summary(between)
## Oneway (individual) effect Between Model
##
## Call:
## plm(formula = Y ~ X, data = pdata, model = "between")
##
## Observations used in estimation: 595
##
## Residuals:
## -0.9782 -0.2203 0.0366 0.2501 0.9856
##
## Coefficients:
## (Intercept) 4.683039 0.210099 22.29 < 0.0000000000000002 ***
## Xexp 0.038153 0.005697 6.70 0.00000000005 ***
## Xexp2 -0.000631 0.000126 -5.02 0.00000067570 ***
## Xwks 0.013090 0.004066 3.22 0.0014 **
## Xed 0.073784 0.004898 15.06 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 92.3
## Residual Sum of Squares: 62.2
## R-Squared: 0.326
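One way to read the between estimator is as OLS on the individual means, which is why only 595 observations are used above. A minimal sketch on simulated data (hypothetical names, base R only):

```r
set.seed(2)
n <- 30; n_periods <- 4                      # hypothetical panel dimensions
toy <- data.frame(id = rep(1:n, each = n_periods))
toy$x <- rnorm(n * n_periods)
toy$y <- 1 + 2 * toy$x + rnorm(n)[toy$id] + rnorm(n * n_periods)

# Collapse the panel to one row per unit: the time averages of y and x
means <- aggregate(cbind(y, x) ~ id, data = toy, FUN = mean)

# OLS on the unit means uses only the variation across units
between_fit <- lm(y ~ x, data = means)
nobs(between_fit)
coef(between_fit)
```

The regression runs on n rows rather than n x T, so all information about changes over time within a unit is discarded.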
First differences estimator
firstdiff <- plm(Y ~ X, data=pdata, model= "fd")
summary(firstdiff)
## Oneway (individual) effect First-Difference Model
##
## Call:
## plm(formula = Y ~ X, data = pdata, model = "fd")
##
## Observations used in estimation: 3570
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.022 -0.049 0.016 0.027 0.087 2.426
##
## Coefficients:
## Xexp2 0.0017323 0.0000702 24.67 <0.0000000000000002 ***
## Xwks 0.0000810 0.0005910 0.14 0.89
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-Squared: 0.00406
## F-statistic: -300.514 on 1 and 3568 DF, p-value: 1
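The first-difference transformation can be sketched by hand, assuming simulated data and base R only. Differencing within each individual removes one period per unit, which matches the 595 x 6 = 3570 observations reported above. Time-invariant regressors are differenced away, and because exp grows by one each year its first difference is a constant, which is presumably why no separate Xexp coefficient appears in the output.

```r
set.seed(3)
n <- 30; n_periods <- 4                      # hypothetical panel dimensions
toy <- data.frame(id = rep(1:n, each = n_periods),
                  t  = rep(1:n_periods, times = n))
toy$x <- rnorm(n * n_periods)
toy$y <- 1 + 2 * toy$x + rnorm(n)[toy$id] + rnorm(n * n_periods, sd = 0.3)

# Differences must be taken in time order inside each unit
toy <- toy[order(toy$id, toy$t), ]
dy <- unlist(tapply(toy$y, toy$id, diff))    # y_it - y_i,t-1
dx <- unlist(tapply(toy$x, toy$id, diff))    # x_it - x_i,t-1

# The individual effect cancels in the differences
fd_fit <- lm(dy ~ dx)
length(dy)                                   # n * (n_periods - 1)
coef(fd_fit)
```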
Fixed effects or within estimator
fixed <- plm(Y ~ X, data=pdata, model= "within")
summary(fixed)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Y ~ X, data = pdata, model = "within")
##
##
## Residuals:
## -1.81209 -0.05111 0.00371 0.06143 1.94341
##
## Coefficients:
## Xexp 0.1137879 0.0024689 46.09 < 0.0000000000000002 ***
## Xexp2 -0.0004244 0.0000546 -7.77 0.00000000000001 ***
## Xwks 0.0008359 0.0005997 1.39 0.16
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual Sum of Squares: 82.6
## R-Squared: 0.657
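The within estimator is equivalent to OLS on data demeaned by individual, and also to a regression with one dummy per individual (LSDV). Time-invariant regressors such as ed are wiped out by the demeaning, which is why Xed does not appear above. A minimal sketch with simulated data, base R only; all names are hypothetical:

```r
set.seed(4)
n <- 30; n_periods <- 4                      # hypothetical panel dimensions
id <- rep(1:n, each = n_periods)
alpha <- rnorm(n)[id]                        # individual effect
x <- rnorm(n * n_periods) + 0.5 * alpha      # regressor correlated with the effect
y <- 1 + 2 * x + alpha + rnorm(n * n_periods, sd = 0.3)

# Within transformation: subtract each unit's own mean
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
within_fit <- lm(y_dm ~ x_dm - 1)

# Least squares dummy variables: one intercept per unit
lsdv_fit <- lm(y ~ x + factor(id))

c(within = unname(coef(within_fit)), lsdv = unname(coef(lsdv_fit)["x"]))
```

The two slope estimates coincide, although the standard errors from the demeaned regression would still need a degrees-of-freedom correction for the n estimated means.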
Random effects estimator
random <- plm(Y ~ X, data=pdata, model= "random")
summary(random)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Y ~ X, data = pdata, model = "random")
##
##
## Effects:
## var std.dev share
## idiosyncratic 0.0232 0.1522 0.18
## individual 0.1021 0.3195 0.82
## theta: 0.823
##
## Residuals:
## -2.0440 -0.1057 0.0071 0.1147 2.0876
##
## Coefficients:
## (Intercept) 3.8293661 0.0936336 40.9 <0.0000000000000002 ***
## Xexp 0.0888609 0.0028178 31.5 <0.0000000000000002 ***
## Xexp2 -0.0007726 0.0000623 -12.4 <0.0000000000000002 ***
## Xwks 0.0009658 0.0007433 1.3 0.19
## Xed 0.1117100 0.0060572 18.4 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-Squared: 0.42
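The theta reported above can be checked from the variance components, assuming the usual quasi-demeaning formula for a balanced panel, theta = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T sigma_u^2)). The inputs below are the rounded values printed in the output, so the result is approximate.

```r
sigma2_e  <- 0.0232   # idiosyncratic variance, from the output above
sigma2_u  <- 0.1021   # individual variance, from the output above
n_periods <- 7        # T in the balanced panel

# Share of the individual mean removed by the random effects transformation
theta <- 1 - sqrt(sigma2_e / (sigma2_e + n_periods * sigma2_u))
round(theta, 3)
```

The value agrees with the reported theta of 0.823 up to rounding. A theta of 0 would reproduce pooled OLS and a theta of 1 the within estimator, so here the random effects estimates sit closer to the fixed effects ones.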
LM test for random effects versus OLS
plmtest(pooling)
##
## Lagrange Multiplier Test - (Honda) for balanced panels
##
## data: Y ~ X
## normal = 72, p-value <0.0000000000000002
## alternative hypothesis: significant effects
F test for fixed effects versus OLS
pFtest(fixed, pooling)
##
## F test for individual effects
##
## data: Y ~ X
## F = 40, df1 = 590, df2 = 3600, p-value <0.0000000000000002
## alternative hypothesis: significant effects
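The logic of this F test can be sketched as a comparison of residual sums of squares between the pooled model and the model with individual intercepts. The example assumes simulated data with the same regressors in both models and uses base R only; it is not a reproduction of the pFtest call above.

```r
set.seed(5)
n <- 30; n_periods <- 4                      # hypothetical panel dimensions
id <- rep(1:n, each = n_periods)
x <- rnorm(n * n_periods)
y <- 1 + 2 * x + rnorm(n)[id] + rnorm(n * n_periods, sd = 0.3)

pooled_fit <- lm(y ~ x)                      # restricted: common intercept
lsdv_fit   <- lm(y ~ x + factor(id))         # unrestricted: one intercept per unit

rss_p <- sum(resid(pooled_fit)^2)
rss_w <- sum(resid(lsdv_fit)^2)
df1 <- n - 1                                 # number of restrictions
df2 <- n * n_periods - n - 1                 # N - n - K, with K = 1 slope

f_stat <- ((rss_p - rss_w) / df1) / (rss_w / df2)
p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)
c(F = f_stat, df1 = df1, df2 = df2, p.value = p_val)
```

A large F statistic means that the individual intercepts reduce the residual sum of squares by more than chance would explain, so pooled OLS is rejected in favour of fixed effects.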
Hausman test for fixed versus random effects model
phtest(random, fixed)
##
## Hausman Test
##
## data: Y ~ X
## chisq = 6200, df = 3, p-value <0.0000000000000002
## alternative hypothesis: one model is inconsistent
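The Hausman statistic compares the coefficients that both models estimate, weighting their difference by the difference of the covariance matrices. The helper below is a hypothetical sketch in base R, and the inputs are made-up numbers chosen for illustration, not values from the models above.

```r
# Hypothetical helper: Hausman statistic for the coefficients common to
# the fixed effects (fe) and random effects (re) models
hausman_stat <- function(b_fe, b_re, v_fe, v_re) {
  d <- b_fe - b_re                           # difference in estimates
  stat <- as.numeric(t(d) %*% solve(v_fe - v_re) %*% d)
  df <- length(d)                            # one degree of freedom per coefficient
  c(chisq = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Made-up inputs, for illustration only
b_fe <- c(0.10, 0.02)
b_re <- c(0.08, 0.03)
v_fe <- diag(c(0.0004, 0.0001))
v_re <- diag(c(0.0003, 0.00005))

res <- hausman_stat(b_fe, b_re, v_fe, v_re)
res
```

In the output above the test has 3 degrees of freedom because only exp, exp2 and wks are estimated by both models; ed is absent from the fixed effects model. The large statistic favours fixed effects over random effects.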