The document provides an introduction to panel data analysis. It defines time series data, cross-sectional data, and panel data, which combines the two. Panel data has advantages over single time series or cross-sectional data like more observations, capturing heterogeneity and dynamics. Panel data can be balanced or unbalanced, and micro or macro. The document demonstrates structuring panel data in Excel for empirical analysis in Eviews, including an activity to arrange time series data into a panel data format.
2. 2
Introduction to Panel Data Analysis
OUTLINE
OUTLINE
OUTLINE
OUTLINE
1.0
1.0
1.0
1.0Preamble
Preamble
Preamble
Preamble
Time
Time
Time
Time series data
series data
series data
series data
Cross
Cross
Cross
Cross-
-
-
-sectional data
sectional data
sectional data
sectional data
Panel data
Panel data
Panel data
Panel data
Practical illustrations
Practical illustrations
Practical illustrations
Practical illustrations
Objectives of the course
Objectives of the course
Objectives of the course
Objectives of the course
2.0
2.0
2.0
2.0 Why panel data?
Why panel data?
Why panel data?
Why panel data?
3.0
3.0
3.0
3.0 Structure of panel data
Structure of panel data
Structure of panel data
Structure of panel data
Balanced panel vs. Unbalanced panel
Balanced panel vs. Unbalanced panel
Balanced panel vs. Unbalanced panel
Balanced panel vs. Unbalanced panel
Macro panel vs. Micro panel
Macro panel vs. Micro panel
Macro panel vs. Micro panel
Macro panel vs. Micro panel
Practical illustrations
Practical illustrations
Practical illustrations
Practical illustrations
Activity 1
Activity 1
Activity 1
Activity 1
Activ
Activ
Activ
Activity 2
ity 2
ity 2
ity 2
4.0
4.0
4.0
4.0 Panel data models
Panel data models
Panel data models
Panel data models
Static panel vs. Dynamic panel
Static panel vs. Dynamic panel
Static panel vs. Dynamic panel
Static panel vs. Dynamic panel
Estimation of static panel data models
Estimation of static panel data models
Estimation of static panel data models
Estimation of static panel data models
Pooled regression
Pooled regression
Pooled regression
Pooled regression
Practical illustrations
Practical illustrations
Practical illustrations
Practical illustrations
Activity 3
Activity 3
Activity 3
Activity 3
Fixed effects
Fixed effects
Fixed effects
Fixed effects
Empirical illustrations
Empirical illustrations
Empirical illustrations
Empirical illustrations
Activity 4
Activity 4
Activity 4
Activity 4
Random effects
Random effects
Random effects
Random effects
Empirical illustrations
Empirical illustrations
Empirical illustrations
Empirical illustrations
Activity 5
Activity 5
Activity 5
Activity 5
3. 3
Introduction to Panel Data Analysis
1.0 Preamble
In applied research, three categories of data sets have remained prominent. They are time
series data, cross-sectional data and panel data. A time series data set is a collection of
observations on several variables over a period of time e.g. Exchange rate and Consumer Price
Index (CPI) in Nigeria over the period 1980 - 2008. A cross-sectional data set refers to data
collected on several units such as households, firms, countries, states, regions etc. at a
particular point in time. Examples of these data in Nigeria are the 2009 Labour Force Survey
(LFS) and 2006 Core Welfare Indicators Questionnaire (CWIQ). However, a panel data set refers
to data covering several units over a period of time. Thus, a panel data set is a combination of
time series data and cross-sectional data. Examples of panel data are Gross Domestic Product
(GDP) per capita for Sub-saharan African (SSA) countries over the period 1960 – 2008 and stock
prices of listed companies in Nigeria over the period 1990 - 2010.
In a typical symbolic representation, time series variables are usually denoted by subscript ݐ
while cross-sectional variables are denoted by subscript ݅. Since panel data have both time
series and cross-sectional dimensions, their variables are represented by subscript ݅ݐ. Using the
absolute income hypothesis as an example, we can specify the following single-equation
models in respect of these data sets below:
ܥ௧ = ߙ + ߚܻ௧ + ߝ௧; ݐ = 1, … , ܶ (1)
ܥ = ߙ + ߚܻ + ߝ; ݅ = 1, … , ܰ (2)
ܥ௧ = ߙ + ߚܻ௧ + ߝ௧; (3)
Where ܥ denotes consumption and ܻ represents disposable income. ݐ is the time series
dimension and ݅ is the cross-sectional dimension. Equation (1) above follows a time series
framework, equation (2) follows a cross-sectional framework while equation (3) follows a panel
data framework. If we are testing this hypothesis in a time series framework using Nigerian
data for example over the period 1980 - 2010, it implies that ݐ = 1980, 1981, … ,2010. If it
follows a cross-sectional framework using data covering SSA for year 2010, then, ݅= list of SSA.
If we are dealing with a panel data framework also using data covering SSA however over a
period of 1980 - 2010, ݅= list of SSA and ݐ = 1980, 1981, … ,2010.
Note the following on the three modelling frameworks:
1. On the time series case, we are dealing with only one unit – ܰ = 1 (i.e. one individual,
one firm, one country, state or any other single type of unit) over a period of time -
ݐ = 1, … , ܶ. Thus, we can draw one or more variables over a period of time for the
single unit. The total number of observations is ܶ.
4. 4
2. On the cross-sectional case, we are dealing with several units - ݅ = 1, … , J (individuals,
firms, states or countries) at a specific point in time – H B 1. That is, each unit has only
one data point on each variable. Therefore, the total number of observations is J.
3. As regards the panel data framework, several units are considered over a period of time.
That is, one or more variables can be drawn on each unit over a period of time.
Therefore, the total number of observations is JH.
We consider practical illustrations of these data sets below.
Figure 1: Time series data set covering selected macroeconomic variables in Nigeria over the
period 1980 – 2007.
Time Series
Dimension
5. 5
Figure 2: Cross-sectional data set covering selected SSA for year 2005
Figure 3: Panel Data covering selected SSA over the period 1996-2005
Cross-sectional
Dimension
6. 6
For the purpose of this exercise, our subsequent discussions shall focus on panel data and,
therefore, we provide a more detailed explanation on the latter in the next sections. At the end
of the lecture, students should be able to:
Structure panel data for the purpose of estimation
Deal with pooled regression models, fixed effect models and random effect models
Compute relevant diagnostic tests to validate panel data regressions
Interpret panel data regressions
2.0 Why Panel Data?
The use of panel data in applied research is increasingly gaining relevance both in developed
and developing countries partly because of the need to harmonize regional policies and more
generally due to the inherent benefits for empirical analyses. Hsiao (2003) documents the
advantages of using panel data. These are represented in Baltagi (2008) and summarized as
follows:
1. Panel data provide sufficient observations and, consequently, more sample variability,
less collinearity, more degrees of freedom, and more accurate inference of model
parameters. For example, when dealing with cross-sectional data, ܶ = 1 and similarly,
for time series data, ܰ = 1. However, in the case of panel data, ݐ = 1, … , ܶ like wise
݅ = 1, … , ܰ providing more observations and more sample variability than either cross-
sectional data or time series data.
2. In connection with (1), panel data models better capture the complexity of human
behaviour than a single cross-section or time series data. For example, consider a cross-
sectional sample of university students with an average grade of 50% in all the courses
for the same period in time. This suggests that every student has the chances of having
50% grade based on information obtained from their performance at a particular
year/level of academic study. Thus, current information about a student’s academic
performance is a perfect predictor of his/her future performance. However, sequential
observations for students contain information about their academic performance in
different years/levels of academic study that are captured in the cross-sectional
framework. With panel data models, performance of each student can be observed over
time and more informed judgments can be made.
Similarly, consider a time series sample of a student’s academic performance.
Generalizing with the information obtained from the student will lead to unbiased and
inefficient estimates. Therefore, by pooling a cross-sectional sample of students over
time, variations in each unit over time are captured.
3. In connection with (2), panel data models are better able to capture the heterogeneity
inherent in each individual unit. The structure of panel data suggests that the cross-
sectional units whether individuals, firms, states or countries are heterogeneous. In
empirical modelling, ignoring these heterogeneous effects when in fact they exist leads
to biased and inefficient results. We can illustrate this with an empirical example using
Baltagi and Levin (1992) paper. They consider cigarette demand across 46 American
7. 7
states for the years 1963-88. Consumption is modelled as a function of lagged
consumption, price and income. They however note that there are a lot of variables that
may be state-invariant or time-invariant that may affect consumption. Examples of
these state invariant variables are advertisement on nationwide television and radio and
national policies while time-invariant variables are religion and education.
4. In connection (2), behaviour of economic agents is intrinsically dynamic and this can be
accounted for using panel data models. Cross-sectional models can only ascertain the
behavioural pattern at a particular point and therefore cannot be used to capture
behavioural dynamics. Similarly, time series data that provides information over a
period of time only limits itself to one unit and, therefore, we are unable to compare
changes in the behaviour of different economic agents. For example, we may be
interested in quantifying the consumption pattern of a number of cross-sections over
time. These dynamics can neither be captured by cross-sectional framework nor time
series models. However, since panel data sets provide time series on each cross-
sectional unit in a group, it becomes relatively easy to evaluate the changes in the
behavioural pattern using the data.
5. The use of micro-data in panel regressions gives more accurate prediction than the
prediction of aggregate data often used in the time series case. More notably, if the
micro units are heterogeneous, policy prescriptions drawn from time series aggregate
data may be invalid. Panel data containing time series observations for a number of
individuals is ideal for investigating the “homogeneity” versus “heterogeneity” issue
(Hsiao, 2006).
Although not limited to those highlighted above, these benefits have provided some reasonable
justification for the consideration of panel data in empirical and policy based studies.
3.0 Structure of Panel Data
Essentially, panel data sets are broadly categorized into two namely: Balanced Panel data and
Unbalanced Panel data. The balanced panels refer to a case where all the cross-sectional units
have equal time series dimension. That is, all the cross-sectional units are observed over the
same period of time. Thus, balanced panels are also referred to complete panels. The
unbalanced panels on the other hand refer to situations where the cross-section units under
scrutiny have unequal time periods. While some cross-sectional units may have complete
information over a specified period (say ܶ), others may have unequal time periods less than ܶ
which may be due to missing observations. This is practically possible particularly in longitudinal
data sets of ܶ dimension where the death of one or more individuals in a survey or exit of one
or more firms in a market being surveyed or withdrawal of one or more countries from an
economic union/cooperation being studied creates some missing observations for the affected
cross-sectional units.
Whether balanced or unbalanced, panel data sets can either be micro-based or macro-based.
When large cross-sectional units (usually households or individuals) are surveyed over a short
time period (not exceeding a maximum of ܶ = 20), such panel data are referred to as micro
8. 8
panels. However, when large cross-sectional units (usually countries) are surveyed over a
considerable period of time (exceeding 20 time periods), such are regarded as macro panels.
Thus, for micro panels, ܰ is large and ܶ is small (less than 20) while for macro panels, we are
dealing with large ܰ and ܶ. As a result of the nature of time series dimension of both micro
and macro panels, they attract different treatments in empirical research. Since we are dealing
with large ܶ in macro panels, the problem of unit root (i.e. non-stationarity) must be addressed.
Also, because macro panels usually involve cross-section of countries, the problem of cross-
sectional dependence may also be evident and therefore, has to be dealt otherwise the results
will be inefficient. We provide below a schema showing the structure of panel data.
Figure 4: Panel Data Structure
Equal T for all i’s Unequal T for the i’s
If ܶ 20 If ܶ 20
Panel Data
Balanced/Complete
Panels
Unbalanced/Incomplete
Panels
Micro Panels Macro Panels
Stationarity and Cross-
sectional Independence
Unit Root and Cross-
sectional Dependence
9. 9
3.1 Some Practical Illustrations
How can we properly structure panel data for estimation purpose? Our instructional software
for the purpose of this exercise is Eviews 7 version. We provide a step-by-step procedure on
how to structure panel data.
Step 1:It is convenient to start with an empty Excel Sheet. We provide below a figure showing a
fresh excel sheet with appropriate labels.
Figure 5: Panel Data Structure in Excel Sheet
Individuals/Households/
Firms/States/Countries
Code for representing
each CS for ease of
identification
Day/Week/Month/Year
List of variables
of interest
10. 10
For example, if CS=countries= Ghana, Nigeria and Time period = years= 2004 – 2009 with the
same set of variables as in figure 5, then we have:
Figure 6: Panel Data for CS=countries= Ghana, Nigeria and Time period = years= 2004 – 2009
Activity 1:
Arrange the following time series data in tables 1- 10 for selected SSA in panel structure in excel
worksheet. Note that llr=Liquid Liabilities as a percentage of GDP; pcr = Private Credit by
Deposit Money Banks as a percentage of GDP; bdr=Bank Deposits as a percentage of GDP; and
rgdp=GDP per capita (constant 2000 US$). Also, all the countries have the same units of
measurement. This is very important in panel data to make meaningful comparative analyses.
Table 1: Time Series for Togo
Year llr Pcr bdr Rgdp
1996 23.0315 16.0748 15.0802 250.4119
1997 20.0072 15.3494 13.713 276.0873
1998 21.3712 17.1686 14.4285 259.7281
1999 21.1455 15.905 13.4929 256.5784
2000 24.5442 15.6822 14.9995 245.9931
15. 15
2004 90.6856 54.2893 72.8972 1540.83
2005 97.1707 57.7634 78.3015 1561.895
Table 10: Time Series for Botswana
Year llr pcr bdr Rgdp
1996 20.882 11.3541 18.8731 2725.958
1997 20.7326 9.7002 18.9318 2938.997
1998 24.4001 10.5208 22.6142 3185.879
1999 27.409 12.6244 25.3974 3354.679
2000 26.0662 14.0234 23.9447 3572.956
2001 24.8806 13.9019 23.2918 3706.873
2002 27.9743 16.6799 26.5515 3867.929
2003 27.6 17.9923 26.1646 4060.682
2004 29.4949 19.6398 27.353 4264.321
2005 28.1966 19.3561 26.1213 4382.497
Step 2: Save your panel data generated from tables 1 – 10 above in a directory you can easily
remember (say Desktop) and ensure the excel worksheet is in 2003 version. Kindly note the file
name for your worksheet. This is important for loading data into Eviews for estimation purpose.
Step 3: Load data into Eviews. This requires the following sub-steps:
a. View your data in the excel sheet to ensure they are structured in panel data form.
b. Click the Eviews 7 icon. You will find the Eviews window as shown below:
16. 16
c. Define the structure of the Panel Data in Eviews. The following pictures show relevant
steps on how to do this.
(i) Opening the Eviews Workfile
17. 17
(ii) Choose Balanced Panel as the workfile structure type
(iii) By choosing balanced panel, it allows you to indicate the time and cross-section dimensions
for your panel data as shown below:
18. 18
(iv) By clicking OK in (iii) above, you will find the Eviews work area in panel data form
Time Series
Dimension
Cross-sectional
Dimension
Work Area
Total Observations
19. 19
d. Import panel data from the excel file: Although, there are several methods of loading data into
Eviews, for the purpose of this exercise however, we will consider the import method. Students
may explore other methods later. This also involves a number of sub-steps as follows:
(i) Choose import method as shown below
20. 20
(ii) By clicking Read Text-Lotus-Excel in (di) above, you will find a box prompting you to indicate
the excel file name containing the panel data. Ensure the excel file is not opened before you
import the data.
(iii) Following the information provided in (dii) above and clicking ok, you will find the import
box below:
File Directory
File name
File type
Column row identifiers for the
first cell of actual data in excel
No. of columns
for actual data
Sample period
21. 21
(iv) By clicking ok after providing necessary information in (diii) above, the Eviews work area is updated
to reflect the data imported. This is shown below:
When you compare (div) with (civ), you will find that the former contains data on variables intended
to be modelled in panel data form while the latter is an empty Eviews workfile.
(e) At this juncture, save your Eviews workfile to secure the data
22. 22
Also note the file name for your Eviews workfile and its directory.
Activity 2
Structure tables 1 -5 in exercise 1 in panel data form in excel sheet and create an Eviews
workfile for the generated excel sheet using the import method.
4.0 Panel Data Models
Panel data models are broadly divided into two namely Static Panel data models and Dynamic
Panel Models. The most notable difference between the two models is the inclusion of the
lagged dependent variable as a regressor in the latter. This exercise focuses on the static panel
data models. A typical static panel data regression can be expressed as:
ܻ௧ = ߙ + ߚܺ௧
ୀଵ
+ ݑ௧ ; ݇ = 1, … , ܬ (4)
where ܻ is the dependent variable and ܺ are the explanatory variables. The subscripts ݅ and ݐ
as earlier defined refer to cross-sectional dimension and time series dimension respectively. ݑ௧
is the composite error term which can be decomposed further into specific effects and
remainder disturbance term. Henceforth, we will refer to ݅ as individual observations which of
course may imply either households, firms, countries or any other group of units. There are two
sets of specific effects namely the individual specific effects and time specific effects. If only one
set of specific effects is included in the regression, such is referred to as one-way error
components model. However, if both sets of specific effects are included, we refer to the model
as two-way error components model. Equations (5) and (6) show decomposition of ݑ௧ into
one-way and two-way error components.
ݑ௧ = ߮ + ߝ௧ (5ܽ)
ݑ௧ = ߛ௧ + ߝ௧ (5ܾ)
ݑ௧ = ߮ + ߛ௧ + ߝ௧ (6)
where ߮ and ߛ௧ denote the unobserved individual and time specific effects respectively. We
shall limit our empirical applications to the one-way error components.
4.1 Estimation of Panel Data Models
Basically, the static panel data models can be estimated using Ordinary Least Square (OLS),
Fixed Effects (FE) and Random Effects (RE). Each of these methods has its underlying
assumptions which must necessarily be satisfied to obtain unbiased and efficient estimates. We
consider in turn each of these methods with the underlying assumptions.
23. 23
Figure 5: Panel Data Models
4.1.1 Pooled Regression
The pooled regression involves estimating equation (4) with OLS. The Pooled regression is
employed based on the assumption that the ܺ are able to capture all the relevant
characteristics of the individual units and, therefore, the unobserved specific effects as
expressed in equations (5) and (6) are redundant. Thus, ߮ in the case of equation (5a), ߛ௧ in
equation (5b) and both effects in the case of (6) will be dropped and and a pooled OLS
regression may be used to fit the model, treating all the observations for all of the time periods
as a single sample. However, ignoring these effects when in fact they are significant yields
inefficient estimates and biased standard errors.
Empirical Application
We are using the panel data in the Eviews workfile created from tables 1-10 to illustrate the
pooled regression. The Panel data regression model is given as:
݀݃ݎ௧ = ߙ + ߚଵ݈݈ݎ௧ + ߚଶݎܿ௧ + ߚଷܾ݀ݎ௧ + ݑ௧
We have carefully structured the procedure for estimating the model in Eviews in the following pictures
below:
Panel Data Models
Static Panel Data
Models
Dynamic Panel Data
Models
Pooled
Regression
Fixed
Effects
Arellano
and Bover
Random
Effects
Arellano
Bond
Blundell and
Bond
25. 25
(c) Pooled Regression Results
Interpretation of Regression Results
Based on the probability values, llr and bdr are statistically significant while pcr is not. Since the
regressors in the model are in percentages, therefore, 1% increase in llr leads to a significant
reduction in rgdp by 122.64USD on the average. Conversely, a 1% increase in bdr leads to a
significant rise in bdr by 177.15USD on the average. Although, the effect of pcr is not
statistically significant, its increase by 1% has the potential effect of reducing rgdp by 3.37USD.
Note that llr is a typical measure of financial depth in economy and thus of the overall size of
the financial sector; pcr is a measure of the activity of financial intermediaries (i.e. ability to
channell savings to investors) and bdr is a measure of size and efficiency of the financial
intermediaries as it captures the ability of the banks to mobilize funds from the public (the
surplus unit). Based on these definitions, the economic interpretations can also be offered.
The two dominant approaches used when the unobserved specific effects are significant in a
panel data model are the fixed effects regressions and the random effects regression models.
We consider these two in the next sections.
Activity 3
Using the Eviews workfile created in exercise 2, estimate the panel data model below assuming
there is no specific effects.
Z[]IA B C * DR^^ZIA * D_]`ZIA * DaYZIA * bI * FIA
Prob. Value (Pv) is used to
determine the level of
significance of each regressor in
the model. If the Pv is 0.05 for
example, it implies that the
regressor in question is
statistically significant at 5%
level; otherwise, it is not
significant at that level.
26. 26
4.1.2 Fixed Effects Regressions
As earlier emphasized, one of the approaches used to capture specific effects in panel data
models is the fixed effects (FE) regression. Thus, equation (4) is modified using any of the error
components equations (5a, 5b or 6) depending on the specific effects being considered. The FE
approach is based on the assumption that the effects are fixed parameters that can be
estimated. Consequently, a number of econometric problems may be encountered: (i) the
unobserved specific fixed effects may be correlated with the regressors employed; (ii) some of
the regressors may be correlated with shocks (remainder disturbance term) that affect the
dependent variable; and (iii) there is possibility of simultaneity biases resulting from the
endogeneity of some regressors. The FE approach can overcome these problems by using any
of the three estimation methods namely: Within Group (WG) estimator; Least Square Dummy
Variable (LSDV) estimator and First Difference (FD) estimator. The WG estimator involves the
deviation approach which eliminates the unobserved effects. Similarly, the FD estimator
involves taking the first difference of the modified panel data model to eliminate the
unobserved effects. Thus, under the WG estimator and FD estimator, the model is transformed
in such a way that the unobserved effect is eliminated. The LSDV estimator however involves
the inclusion of dummy variables as regressors in the panel data model (equation 4) to capture
the specific effects. In this case, either the intercept ߙ or one of the dummy variables must be
dropped to avoid perfect collinearity or dummy trap problem.
We provide below the specifications in respect of these estimation techniques for fixed effects.
Within Regression: ܻ௧ − ܻ
ത. = ߚ(ܺ௧ −
ୀଵ
ܺ
ത.) + ߝ௧ − ߝ. ; ݇ = 1, … , ܬ (7)
First Difference Regression: ܻ௧ − ܻ௧ିଵ = ߚ(ܺ௧ −
ୀଵ
ܺ௧ିଵ) + ߝ௧ − ߝ௧ିଵ (8)
Or
First Difference Regression: ∆ܻ௧ = ߚ∆ܺ௧
ୀଵ
+ ߝ௧ − ߝ௧ିଵ (8)
LSDV Regression: ܻ௧ = ߙ + ߚܺ௧
ୀଵ
+ ߛΓ +
ே
ୀଶ
ߝ௧ ; (9)
Or
LSDV Regression: ܻ௧ = ߚܺ௧
ୀଵ
+ ߛΓ +
ே
ୀଵ
ߝ௧ ; (9)
To confirm whether the specific effects estimated are actually fixed effects, the F-test is used in
this regard. The null hypothesis for this test is given as:
ܪ: ߤଵ = ߤଶ = ⋯ = ߤଶ = 0
27. 27
We reject ܪ when the F-statistic is statistically significant suggesting that the fixed effects are
important in the model, otherwise, we do not reject it.
Empirical Application
We are still using the panel data in the Eviews workfile created from tables 1-10 to illustrate the
FE regressions. The Panel data regression model with fixed effects is given as:
Z[]IA B C * DR^^ZIA * D_]`ZIA * DaYZIA * bI * FIA
(a) Equation editor
(b) Model Specification
Click panel options to include
the fixed effects
28. 28
(c) Specifying the fixed effects
(d) Fixed Effects Regression
Click ok after specifying
the fixed effects
Fixed effects
29. 29
The interpretation of regression results here is not different from the style used under the
pooled regression. However, we need to further test whether the inclusion of the effects is
significant or not.
(e) Fixed Effects Testing
Prob. Value (Pv) is used to
test the significance of the
fixed effects in the model.
If the Pv is 0.05 for
example, it implies that the
effects are statistically
significant at 5% level;
otherwise, they are not
significant at that level.
1
2
3
30. 30
Recall that the null hypothesis for the fixed effect test is that the fixed effects are not important
in the model. Based on the Pv, we can conclude that the fixed effects are significant in the
model and, therefore, estimating without these effects which is the case in pooled regression
will yield biased standard errors and policy prescriptions drawn from the analysis will be invalid.
Activity 4
Using the Eviews workfile created in exercise 2, estimate the panel data model below assuming
the cross-sectional units are fixed parameters that can be estimated. Also, conduct relevant
diagnostic tests to validate your results.
݀݃ݎ௧ = ߙ + ߚଵ݈݈ݎ௧ + ߚଶݎܿ௧ + ߚଷܾ݀ݎ௧ + ߤ + ߝ௧
4.1.3 Random Effects Regressions
One of the fundamental limitations of fixed effects regressions is that there is usually a high loss
of degrees of freedom when there are too many parameters to estimate due to large cross-
sectional units. This high loss of degrees of freedom can be avoided if the cross-sectional units
can be assumed random. In addition, the specific effects (߮ ߛ௧) are assumed independent of
ߝ௧ and also ܺ are independent of both the specific effects (߮ ߛ௧) and ߝ௧. Thus, random
effects model assumes exogeneity of all regressors. Practically, this may be the case if the
individual observations constitute a random sample from a given population. However, where
the unobserved specific effects are not distributed independently of the explanatory variables,
the fixed effects should be used instead to avoid the problem of unobserved heterogeneity
bias. The random effects regression is given as:
ܻ௧ − ߶ܻ
ത. = ߚ(ܺ௧ −
ୀଵ
߶ܺ
ത.) + ߝ௧ − ߶ߝҧ. (10)
Where ߶ = 1 −
ఙഄ
మ
்ఙഋ
మାఙഄ
మ . If there are no random effects, then, ߪఓ
ଶ
= 0 and therefore ߶ = 0. By
substituting this into equation (10) gives back equation (4). In such a case, we can perform
either the pooled regression or the fixed effects regression. Thus, we may need to test further
using the F-test earlier explained to ascertain whether there are fixed effects or not.
To confirm whether the specific effects estimated are actually random effects and are
uncorrelated with the explanatory variables, we carry out the Hausman test. The rejection of
the null hypothesis (when the statistic is statistically significant) implies an adoption of the fixed
effects model and non-rejection is considered as an adoption of the random effects model. The
rejection of the null hypothesis also implies correlated specific effects are better captured with
31. 31
fixed effects model. Figure 6 below provides some useful hints on how to choose the most
appropriate panel data regression model.
Empirical Application
We are still using the panel data in the Eviews workfile created from tables 1-10. The Panel data
regression model with random effects is given as:
Z[]IA B C * DR^^ZIA * D_]`ZIA * DaYZIA * bI * FIA
(a) Equation editor
(b) Model Specification
Click panel options to include
the random effects
33. 33
The interpretation of regression results here is not different from the style used under the
pooled regression. However, we need to further test whether the inclusion of the random
effects is significant and also whether these effects are correlated or not. As earlier explained,
the Hausman test is used in this regard.
(e) Random Effects Testing
Prob. Value (Pv) is used to
test whether the random
effects are correlated in the
model. If the Pv is 0.05
for example, it implies that
the effects are correlated at
5% level; otherwise, they
are not at that level.
34. 34
It should be noted that one of the assumptions underlying the use of random effects model is
that specific effects are independently and identically distributed. To confirm whether the
specific effects estimated are actually random effects and are uncorrelated, the Hausman test is
often used. As earlier emphasized, the rejection of the null hypothesis (when the statistic is
statistically significant) implies an adoption of the fixed effects model and non-rejection is
considered as an adoption of the random effects model. The rejection of the null hypothesis
also implies correlated specific effects are better captured with fixed effects model. Based on
the pv of the test summary as shown above, the effects are correlated and therefore the fixed
effects regression may give a better fit than the random effects model.
Figure 6: Static Panel Data Models
Yes No
No
Yes
Yes No
Are the cross-sectional units
randomly sampled?
Consider Pooled, Fixed Effects
Random Effects regressions
Consider Fixed Effects
Regression and Pooled
regressions
Perform Hausman Test to
choose the appropriate Model
Does the Hausaman Test
indicate non-rejection of H0?
Perform Random Effects
Regression
Perform F-test to choose btw
the two regressions
Does the F-test indicate non-
rejection of H0?
Perform Pooled Reg. Perform Fixed Effects Reg.
35. 35
The table below will guide you as regards the interpretation of linear regression results.
Model Interpretation
Linear – Linear:
ݕ௧ = ߙ + ߙଵݔ௧ + ݒ௧
a 1 unit change in ݔ will lead to a ߙଵ unit change in the mean of ݕ. For
example, if the estimated coefficient is 0.05 that implies a one unit
increase in ݔ will generate a 0.05 unit increase in ݕ.
Linear – Log (Semi-log):
ݕ௧ = ߙ + ߙଵlnݔ௧ + ݒ௧
a 1 per cent change in ݔ will lead to a (ߙଵ/100) unit change in the
mean of ݕ. For example, if the estimated coefficient is 72.07, it implies
that a 1 per cent increase in ݔ, will lead to a 0.72 unit increase in ݕ.
Log – Linear (Semi-log):
lnݕ௧ = ߙ + ߙଵݔ௧ + ݒ௧
a 1 unit change in ݔ will lead to a (100ߙଵ) per cent change in the mean
of ݕ. For example, if the estimated coefficient is -0.20 that means a one
unit increase in ݔ will lead to a 20 per cent decrease in ݕ.
Double log:
lnݕ௧ = ߙ + ߙଵlnݔ௧ + ݒ௧
a 1 per cent change in ݔ will lead to a ߙଵ per cent change in the mean
of ݕ. For example, if the estimated coefficient is 0.05 that implies a one
per cent increase in ݔ will yield a 0.05 per cent increase in ݕ.
Activity 5
Using the Eviews workfile created in exercise 2, estimate the panel data model below assuming
the cross-sectional units are randomly drawn. Also, conduct relevant diagnostic tests to validate
your results.
݀݃ݎ௧ = ߙ + ߚଵ݈݈ݎ௧ + ߚଶݎܿ௧ + ߚଷܾ݀ݎ௧ + ߤ + ߝ௧
References
Baltagi, B.H. (2008). Econometric Analysis of Panel Data. Fourth Edition. John Wiley Sons Ltd.,
UK.
Hsiao, C. (2003). Analysis of Panel Data. Cambridge University Press, Cambridge.