Econometrics Methods for Empirical Economics
By Tarn Suwankiri
Member of Kori Zanyari Meriwan
www.meriwan.com
This paper explains the econometric methods that are widely used in empirical economic studies. Three methods are described: Ordinary Least Squares (OLS), the Fixed Effects Model (FEM) and the Random Effects Model (REM). I discuss in detail the advantages and drawbacks of each method. At the end of the paper, I briefly explain two tests that are commonly used to decide which method is most appropriate for a given set of empirical data. The first is the Breusch-Pagan Lagrange Multiplier test, which is used for testing OLS against REM. The second is the Hausman test, which is used to test FEM against REM.
The Econometrics methods
To illustrate the estimation methods, for simplicity, we use a model with only one explanatory variable ($X_{ijt}$). We can write a typical estimated equation as

$$Y_{ijt} = \beta_0 + \beta_1 X_{ijt} + a_{ij} + u_{ijt} \tag{1}$$

where
$Y_{ijt}$ is the dependent variable.
$\beta_0$ is the intercept.
$X_{ijt}$ is the independent variable.
$\beta_1$ is a parameter to be estimated.
$a_{ij}$ is an unobserved effect, or the individual fixed effect, which is specific to each country pair.
$u_{ijt}$ is the idiosyncratic error, or time-varying error.
The Method of Ordinary Least Squares (OLS)
Pooled OLS is the simplest approach to regression analysis, since we treat the data as if they came from a single data set. That is, we stack all observations for each individual (each country pair $ij$) one on top of the other. The typical equation estimated by OLS is

$$Y_{ijt} = \beta_0 + \beta_1 X_{ijt} + \varepsilon_{ijt}, \qquad \varepsilon_{ijt} = a_{ij} + u_{ijt} \tag{2}$$

where $\varepsilon_{ijt}$ is unobserved and referred to as an error term or disturbance term. The error term of pooled OLS is known as a white noise error term, as it is assumed to satisfy the standard OLS assumptions, that is

$$E(\varepsilon_{ijt}) = 0, \qquad \operatorname{var}(\varepsilon_{ijt}) = \sigma_\varepsilon^2, \qquad \operatorname{cov}(\varepsilon_{ijt}, \varepsilon_{ij,t+s}) = 0 \quad (s \neq 0)$$

where the error term has a normal distribution with mean zero and variance $\sigma_\varepsilon^2$, and the covariance of the error terms between two points of time is zero (Gujarati, 2003, pp. 67-70).
$a_{ij}$ is assumed to be constant through time and the same for all individuals; therefore it is fixed and can be written as $a$. $u_{ijt}$ is the idiosyncratic error and has a normal distribution, $u_{ijt} \sim N(0, \sigma_u^2)$.
The method of ordinary least squares is a basic linear regression technique that estimates parameters from a sample by choosing the values that produce the smallest residual sum of squares (Gujarati, 2003, p. 79). According to the Gauss-Markov theorem, the OLS estimator is the Best Linear Unbiased Estimator (BLUE): it is unbiased, as the expected value of the estimated coefficient $\hat{\beta}$ is the same as the true $\beta$, and it is efficient, since it has the minimum variance among linear unbiased estimators.
However, pooled OLS has some drawbacks. It cannot distinguish between two different points in time for the same individual. It also disregards the cross-sectional dimension of the data by assuming that the intercept and slope coefficients are the same for all individuals. Though OLS is widely used because it is easy to compute, its usefulness is limited by its restrictive assumptions, which are often incompatible with real-world data. It may therefore fail to show the true relationship between the explanatory variables and the dependent variable.
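The stacking idea behind pooled OLS can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not an analysis from the paper: the function names (`simulate_panel`, `pooled_ols`), the sample sizes and the true parameter values (intercept 1, slope 2) are all assumptions made for the example, and the unobserved effect is drawn independently of X so that pooled OLS is well behaved.

```python
import random

def simulate_panel(n_pairs=200, n_periods=5, beta0=1.0, beta1=2.0, seed=0):
    """Simulate Y = beta0 + beta1*X + a + u with a_ij independent of X_ijt.

    All values here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []  # each row: (pair id, x, y)
    for pair in range(n_pairs):
        a = rng.gauss(0.0, 1.0)          # unobserved pair effect
        for _ in range(n_periods):
            x = rng.gauss(0.0, 1.0)      # explanatory variable
            u = rng.gauss(0.0, 1.0)      # idiosyncratic error
            rows.append((pair, x, beta0 + beta1 * x + a + u))
    return rows

def pooled_ols(rows):
    """Ignore the panel structure: fit one intercept and one slope to all rows."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    sxy = sum((x - x_bar) * (y - y_bar) for _, x, y in rows)
    sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    b1 = sxy / sxx                # slope that minimises the residual sum of squares
    b0 = y_bar - b1 * x_bar       # intercept
    return b0, b1

rows = simulate_panel()
b0_hat, b1_hat = pooled_ols(rows)
print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
```

Note that the pair identifier is never used in `pooled_ols`; this is exactly the sense in which the method cannot tell apart two observations of the same individual from two observations of different individuals.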
The Fixed Effects Model (FE)
The idea of FE is to run a regression on the deviations of the variables from their means. The equation for the Fixed Effects Model is

$$Y_{ijt} = \beta_0 + \beta_1 X_{ijt} + a_{ij} + u_{ijt} \tag{3}$$

Here, $a_{ij}$ represents all the unobserved effects that influence the dependent variable but cannot be measured. $a_{ij}$ is specific to each individual but is assumed to be fixed through time. In order to estimate the equation by FE, we take the average of the equation over time, that is, the mean of each variable, as shown in equation (4), and then subtract it from the actual values of the observations, as shown in equation (5).
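$$\bar{Y}_{ij} = \beta_0 + \beta_1 \bar{X}_{ij} + a_{ij} + \bar{u}_{ij} \tag{4}$$

where a bar denotes the mean over $t$ for a given individual.

$$Y_{ijt} - \bar{Y}_{ij} = (\beta_0 - \beta_0) + \beta_1 (X_{ijt} - \bar{X}_{ij}) + (a_{ij} - a_{ij}) + (u_{ijt} - \bar{u}_{ij}) \tag{5}$$

Since $a_{ij}$ is fixed for all $t$, we get the following equation.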
$$Y_{ijt} - \bar{Y}_{ij} = \beta_1 (X_{ijt} - \bar{X}_{ij}) + (u_{ijt} - \bar{u}_{ij}) \tag{6}$$

In this FE model, we can eliminate the unobserved effect ($a_{ij}$) from the equation, since it is assumed to be fixed through time. As it does not vary through time, it does not deviate from its mean value. Hence, the term $a_{ij}$ cancels out of the error term. The FE estimator is consistent because its error term does not contain the unobserved effect, which might be correlated with the explanatory variables. In addition, the fixed effects model allows the intercept of the equation to vary across individuals, but the intercept for each individual is fixed through time. Wooldridge (2002) explains that FE produces consistent parameter estimates under the following assumptions:
1) There is no correlation between the error term and the unobserved effect, $E(u_{ijt} a_{ij}) = 0$.
2) The error term is not correlated with the observed variables, $E(u_{ijt} X_{ijt}) = 0$.
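The demeaning steps in equations (4) to (6) can be illustrated with a short Python sketch. It is a minimal example on simulated data, assuming a balanced panel in which X is deliberately constructed to be correlated with the unobserved effect; the function names and parameter values (true slope 2) are hypothetical. In this setup one would expect the pooled OLS slope to be pushed away from the true value while the within (FE) slope stays close to it.

```python
import random
from collections import defaultdict

def simulate_panel(n_pairs=300, n_periods=6, beta0=1.0, beta1=2.0, seed=1):
    """Simulate equation (3) with X_ijt correlated with a_ij (hypothetical data)."""
    rng = random.Random(seed)
    rows = []  # each row: (pair id, x, y)
    for pair in range(n_pairs):
        a = rng.gauss(0.0, 1.0)              # unobserved pair effect
        for _ in range(n_periods):
            x = a + rng.gauss(0.0, 1.0)      # X depends on a, so the two are correlated
            u = rng.gauss(0.0, 1.0)          # idiosyncratic error
            rows.append((pair, x, beta0 + beta1 * x + a + u))
    return rows

def slope(xs, ys):
    """OLS slope of ys on xs, with an intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def within_transform(rows):
    """Subtract each pair's time mean from X and Y, as in equation (5)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # pair -> [sum x, sum y, count]
    for pair, x, y in rows:
        t = totals[pair]
        t[0] += x
        t[1] += y
        t[2] += 1
    xs = [x - totals[pair][0] / totals[pair][2] for pair, x, _ in rows]
    ys = [y - totals[pair][1] / totals[pair][2] for pair, _, y in rows]
    return xs, ys

rows = simulate_panel()
b_pooled = slope([x for _, x, _ in rows], [y for _, _, y in rows])
b_fe = slope(*within_transform(rows))
print(f"pooled OLS slope    = {b_pooled:.3f}")
print(f"fixed effects slope = {b_fe:.3f}")
```

Because the demeaned variables have zero mean within every pair, the regression on them has no intercept to estimate, which matches the disappearance of $\beta_0$ in equation (6).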
The advantage of the fixed effects model is that it is able to produce consistent parameter estimates when the unobserved effect ($a_{ij}$) is correlated with the observed variables ($X_{ijt}$). However, the FE model has some disadvantages. Firstly, if we apply FE when $a_{ij}$ and $X_{ijt}$ are uncorrelated, FE produces inefficient estimators. Secondly, FE has a degrees-of-freedom problem when there are many dummy variables in the equation (df = the total number of observations minus the number of parameters to be estimated). Thirdly, FE fails to identify the effect of time-invariant variables. Time-invariant variables, such as the distance between countries or a common language, do not change through time and therefore disappear from the model.
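The third drawback can be seen directly by demeaning a time-invariant column. The toy panel below is entirely hypothetical (pair labels, years and values are invented for the illustration): distance is constant within each pair, so its deviation from the pair mean is zero in every row and there is nothing left to regress on, whereas a time-varying column keeps some variation.

```python
from collections import defaultdict

# Hypothetical toy panel: (pair, year, distance_km, trade_cost)
panel = [
    ("A-B", 2000, 500.0, 1.2),
    ("A-B", 2001, 500.0, 1.5),
    ("A-B", 2002, 500.0, 1.1),
    ("A-C", 2000, 1200.0, 2.0),
    ("A-C", 2001, 1200.0, 2.4),
    ("A-C", 2002, 1200.0, 2.2),
]

def demean_by_pair(panel, column):
    """Return the chosen column as deviations from its pair-specific time mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in panel:
        totals[row[0]] += row[column]
        counts[row[0]] += 1
    return [row[column] - totals[row[0]] / counts[row[0]] for row in panel]

distance_dev = demean_by_pair(panel, 2)   # time-invariant regressor
cost_dev = demean_by_pair(panel, 3)       # time-varying regressor

print("distance deviations:", distance_dev)
print("cost deviations:    ", [round(v, 3) for v in cost_dev])
```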
The Random Effects Model (RE)
The aim of the Random Effects Model, or Error Components Model, is to capture the differences between cross-section units through the error term. In the Random Effects Model, we assume that $a_{ij}$ in equation (1) is a random variable, unlike OLS and FE, in which $a_{ij}$ is assumed to be fixed. Egger (2002) summarizes some important assumptions for the RE model as follows.

1.) $E(a_{ij} X_{ijt}) = 0$
2.) $E(X_{ijt} u_{ijt}) = 0$
3.) $E(a_{ij} u_{ijt}) = 0$
4.) $a_{ij} \sim N(0, \sigma_a^2)$
5.) $u_{ijt} \sim N(0, \sigma_u^2)$ (7)

The first assumption means that $a_{ij}$ and $X_{ijt}$ are uncorrelated. If we apply RE when $a_{ij}$ and $X_{ijt}$ are correlated, the RE estimator is inconsistent. Second, the observations ($X_{ijt}$) must be independent of the idiosyncratic error ($u_{ijt}$). Third, the error of the specific individual ($a_{ij}$) must be independent of the residual errors ($u_{ijt}$). Fourth, we assume that the unobserved effect $a_{ij}$ is normally distributed with zero mean and variance $\sigma_a^2$. Fifth, the error term $u_{ijt}$ is expected to be normally distributed with mean zero and variance $\sigma_u^2$.
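The text does not spell out how the RE estimates are computed. One common textbook route, assumed here rather than taken from the paper, is feasible GLS by quasi-demeaning: estimate the two variance components, form $\theta = 1 - \sqrt{\sigma_u^2 / (\sigma_u^2 + T\sigma_a^2)}$ for a balanced panel with $T$ periods, and run OLS on $Y_{ijt} - \theta \bar{Y}_{ij}$ and $X_{ijt} - \theta \bar{X}_{ij}$. The sketch below follows that route on simulated data; the function names, the particular variance-component estimators and all parameter values are illustrative assumptions.

```python
import math
import random
from collections import defaultdict

def simulate_panel(n_pairs=300, n_periods=6, beta0=1.0, beta1=2.0, seed=2):
    """Y = beta0 + beta1*X + a + u, with a_ij drawn independently of X_ijt."""
    rng = random.Random(seed)
    rows = []  # each row: (pair id, x, y)
    for pair in range(n_pairs):
        a = rng.gauss(0.0, 1.0)
        for _ in range(n_periods):
            x = rng.gauss(0.0, 1.0)
            rows.append((pair, x, beta0 + beta1 * x + a + rng.gauss(0.0, 1.0)))
    return rows

def ols(xs, ys):
    """Simple regression with intercept; returns (intercept, slope, RSS)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b0, b1, rss

def pair_means(rows):
    """Time means of X and Y for every pair."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for pair, x, y in rows:
        s = sums[pair]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {p: (sx / n, sy / n) for p, (sx, sy, n) in sums.items()}

def random_effects(rows, n_periods):
    """Feasible GLS by quasi-demeaning (balanced panel assumed)."""
    means = pair_means(rows)
    n_pairs = len(means)
    # sigma_u^2 from the residuals of the within (demeaned) regression
    xw = [x - means[p][0] for p, x, _ in rows]
    yw = [y - means[p][1] for p, _, y in rows]
    _, _, rss_w = ols(xw, yw)
    sigma_u2 = rss_w / (n_pairs * (n_periods - 1) - 1)
    # the between regression on pair means estimates sigma_a^2 + sigma_u^2 / T
    xb = [m[0] for m in means.values()]
    yb = [m[1] for m in means.values()]
    _, _, rss_b = ols(xb, yb)
    sigma_a2 = max(0.0, rss_b / (n_pairs - 2) - sigma_u2 / n_periods)
    theta = 1.0 - math.sqrt(sigma_u2 / (sigma_u2 + n_periods * sigma_a2))
    # OLS on the quasi-demeaned data; the constant is scaled by (1 - theta)
    xq = [x - theta * means[p][0] for p, x, _ in rows]
    yq = [y - theta * means[p][1] for p, _, y in rows]
    b0_scaled, b1, _ = ols(xq, yq)
    return b0_scaled / (1.0 - theta), b1, theta

rows = simulate_panel()
b0_re, b1_re, theta = random_effects(rows, n_periods=6)
print(f"theta = {theta:.3f}, intercept = {b0_re:.3f}, slope = {b1_re:.3f}")
```

In this formulation $\theta = 0$ gives back pooled OLS and $\theta$ close to 1 approaches the within (FE) transformation, so RE can be read as a compromise between the two whose weight depends on the estimated variance of $a_{ij}$.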
RE produces more efficient estimators than FE when all the conditions above are satisfied. However, RE is not suitable when the observations cannot be regarded as randomly drawn from a large population, for example, data on states or provinces, which are limited in number (Gujarati, 2003, p. 650). In brief, the feature distinguishing each method is the way it treats the unobserved effect. OLS and FE treat $a_{ij}$ as a fixed variable, whereas RE treats $a_{ij}$ as a random variable.
The Justification of the Estimators
The Breusch-Pagan Lagrange Multiplier Test
This test compares pooled OLS with RE by testing the null hypothesis that there are no individual-specific effects:

$$H_0: \sigma_a^2 = 0 \qquad H_A: \sigma_a^2 \neq 0$$

If the null hypothesis is rejected, RE is preferable. In contrast, if the null cannot be rejected, pooled OLS is more appropriate, because under the null hypothesis RE reduces to the pooled OLS regression. In practice, however, this test almost always rejects the null hypothesis (Verbeek, 2000, p. 325).
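The paper does not give the test statistic itself. The sketch below uses the standard textbook form for a balanced panel, which is an assumption relative to the text: with pooled OLS residuals $e_{ijt}$, $N$ pairs and $T$ periods, $LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{ij} (\sum_t e_{ijt})^2}{\sum_{ij} \sum_t e_{ijt}^2} - 1 \right]^2$, compared with a chi-square distribution with one degree of freedom. The data are simulated and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def simulate_panel(sigma_a, n_pairs=200, n_periods=5, seed=4):
    """Y = 1 + 2*X + a + u, where a_ij has standard deviation sigma_a."""
    rng = random.Random(seed)
    rows = []  # each row: (pair id, x, y)
    for pair in range(n_pairs):
        a = sigma_a * rng.gauss(0.0, 1.0)    # sigma_a = 0 means no pair effect
        for _ in range(n_periods):
            x = rng.gauss(0.0, 1.0)
            rows.append((pair, x, 1.0 + 2.0 * x + a + rng.gauss(0.0, 1.0)))
    return rows

def breusch_pagan_lm(rows, n_periods):
    """LM statistic and chi-square(1) p-value from pooled OLS residuals."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    b1 = sum((x - x_bar) * (y - y_bar) for _, x, y in rows) / sxx
    b0 = y_bar - b1 * x_bar

    resid_sum_by_pair = defaultdict(float)   # sum over t of e_ijt for each pair
    rss = 0.0                                # sum of squared residuals
    for pair, x, y in rows:
        e = y - b0 - b1 * x
        resid_sum_by_pair[pair] += e
        rss += e * e

    n_pairs = len(resid_sum_by_pair)
    ratio = sum(s * s for s in resid_sum_by_pair.values()) / rss
    lm = n_pairs * n_periods / (2.0 * (n_periods - 1)) * (ratio - 1.0) ** 2
    # For one degree of freedom, P(chi2 > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

lm_alt, p_alt = breusch_pagan_lm(simulate_panel(sigma_a=1.0), n_periods=5)
lm_null, p_null = breusch_pagan_lm(simulate_panel(sigma_a=0.0), n_periods=5)
print(f"pair effects present: LM = {lm_alt:.1f}, p = {p_alt:.4f}")
print(f"no pair effects:      LM = {lm_null:.2f}, p = {p_null:.4f}")
```

The statistic is large when residuals of the same pair tend to share a sign, which is what a common $a_{ij}$ hidden in the pooled error term would produce.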
The Hausman Test
This test examines whether the unobserved effect ($a_{ij}$) and the observed variables ($X_{ijt}$) are uncorrelated. To run the Hausman test, it is important that the observations ($X_{ijt}$) are independent of the idiosyncratic error ($u_{ijt}$) across time and individuals, because both estimators, FE and RE, are inconsistent if this assumption is not satisfied. The null hypothesis of the Hausman test is that the difference between the coefficients of the two models, FE and RE, is not significant. FE is consistent when $a_{ij}$ and $X_{ijt}$ are correlated, but RE is not. Therefore, a large difference between the two sets of estimates indicates that $a_{ij}$ and $X_{ijt}$ are correlated, which results in an inconsistent estimate from RE. In this case, we reject the null hypothesis and prefer FE, since it produces consistent estimates. The advantage of FE is that it produces consistent estimates irrespective of whether the unobserved effect is correlated with the observed variables, provided the assumption above holds.
On the other hand, if we cannot reject the null, the difference between the FE and RE coefficients is not significant. If the assumption that $a_{ij}$ and $X_{ijt}$ are uncorrelated holds under the null, then the RE estimator is preferable because it is consistent and more efficient than the FE estimator.
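With a single slope coefficient, the usual form of the Hausman statistic (assumed here, since the paper does not state it) is $H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^2 / [\operatorname{var}(\hat{\beta}_{FE}) - \operatorname{var}(\hat{\beta}_{RE})]$, compared with a chi-square distribution with one degree of freedom. The sketch below computes it on two simulated balanced panels, one where X is correlated with the unobserved effect and one where it is not. The RE step uses quasi-demeaning, and every name and parameter value is a hypothetical choice made for the example.

```python
import math
import random
from collections import defaultdict

def simulate_panel(correlated, n_pairs=300, n_periods=6, seed=3):
    """Y = 1 + 2*X + a + u; if `correlated`, X_ijt depends on a_ij."""
    rng = random.Random(seed)
    rows = []  # each row: (pair id, x, y)
    for pair in range(n_pairs):
        a = rng.gauss(0.0, 1.0)
        for _ in range(n_periods):
            x = rng.gauss(0.0, 1.0) + (a if correlated else 0.0)
            rows.append((pair, x, 1.0 + 2.0 * x + a + rng.gauss(0.0, 1.0)))
    return rows

def ols(xs, ys):
    """Regression with intercept; returns (slope, RSS, centred sum of squares of x)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b1, rss, sxx

def hausman(rows, n_periods):
    """Return (b_fe, b_re, H, p-value) for a balanced panel with one regressor."""
    means = defaultdict(lambda: [0.0, 0.0])  # pair -> [mean x, mean y]
    for p, x, y in rows:
        means[p][0] += x / n_periods
        means[p][1] += y / n_periods
    n_pairs = len(means)
    # FE: regression on deviations from pair means
    b_fe, rss_w, sxx_w = ols([x - means[p][0] for p, x, _ in rows],
                             [y - means[p][1] for p, _, y in rows])
    sigma_u2 = rss_w / (n_pairs * (n_periods - 1) - 1)
    # Between regression gives an estimate of sigma_a^2 + sigma_u^2 / T
    _, rss_b, _ = ols([m[0] for m in means.values()],
                      [m[1] for m in means.values()])
    sigma_a2 = max(0.0, rss_b / (n_pairs - 2) - sigma_u2 / n_periods)
    theta = 1.0 - math.sqrt(sigma_u2 / (sigma_u2 + n_periods * sigma_a2))
    # RE: regression on quasi-demeaned data
    b_re, _, sxx_q = ols([x - theta * means[p][0] for p, x, _ in rows],
                         [y - theta * means[p][1] for p, _, y in rows])
    var_fe = sigma_u2 / sxx_w
    var_re = sigma_u2 / sxx_q
    h = (b_fe - b_re) ** 2 / (var_fe - var_re)
    p_value = math.erfc(math.sqrt(h / 2.0))  # chi-square(1) upper tail
    return b_fe, b_re, h, p_value

b_fe_c, b_re_c, h_corr, p_corr = hausman(simulate_panel(True), n_periods=6)
b_fe_u, b_re_u, h_unc, p_unc = hausman(simulate_panel(False), n_periods=6)
print(f"correlated:   FE = {b_fe_c:.3f}, RE = {b_re_c:.3f}, H = {h_corr:.1f}, p = {p_corr:.4f}")
print(f"uncorrelated: FE = {b_fe_u:.3f}, RE = {b_re_u:.3f}, H = {h_unc:.2f}, p = {p_unc:.4f}")
```

In the correlated panel the two slopes should differ noticeably and the statistic should be large, which is the situation in which the text recommends FE; in the uncorrelated panel the two slopes should be close to each other.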
References
Egger, P. (2002). An econometric view of the estimation of gravity models and the calculation of trade potential. The World Economy, 25, pp. 297-312.
Gujarati, D. N. (2003). Basic Econometrics. New York; London: McGraw-Hill.
Verbeek, M. (2000). A Guide to Modern Econometrics. Chichester: Wiley.
Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, Mass.; London: MIT Press.