ECON 240B, Spring 2011

Alex Rothenberg

GLS and SUR

Let's revisit the assumptions of the classical normal linear regression model (CNLRM):

1. Linear Expectation: E[y] = Xβ, where β is an unknown (K × 1) vector.
2. Scalar Variance-Covariance Matrix: V[y] = E[(y − E[y])(y − E[y])'] = σ²I_N, where σ² > 0.
3. Non-stochastic Regressors: The (N × K) matrix X is non-random.
4. Invertibility: rank(X) = K.
5. Normality: y is normally distributed.

So far, we've dropped several assumptions of the CNLRM. In 240A, we dropped the normality assumption, and in doing so, we began to develop an understanding of asymptotic approximations. In the first few weeks of class, we dropped the linear expectation assumption when we started learning about non-parametric estimators. Moreover, when working with time series models, we've seen how to relax the independence assumptions typically required for consistent estimation. In this set of notes, we go after another assumption: the scalar variance-covariance matrix.

1 Generalized Linear Regression Model (GLRM)

1.1 Assumptions

The second assumption in the CNLRM actually had two nested implications:

1. Homoskedasticity: Each observation i has the same variance, i.e. Var(y_i) = Var(ε_i) = σ² for all i.
2. No cross-observation dependence: Individual observations are uncorrelated, i.e. Cov(y_i, y_j) = Cov(ε_i, ε_j) = 0 for all i ≠ j. In a time series context, as we discussed last week, this assumption would mean no serial correlation.

This spherical error assumption was critical for showing that our least squares estimator, β̂, was the most efficient estimator for β within the class of all other linear, unbiased estimators (the Gauss-Markov theorem). Looking back over the proof of the Gauss-Markov theorem, you will see that assumption 2 is critical. In the generalized linear regression model, we now assume that the variance-covariance matrix of the error term has a general structure, Σ, where Σ is any symmetric, positive-definite (and hence non-singular) matrix.

Definition 1.1: Let y be (N × 1) and X an (N × K) matrix. The generalized linear regression model (GLRM) is governed by the following assumptions:

1. Linear Expectation: E[y] = Xβ, where β is an unknown vector of dimension K.


2. Non-Scalar Variance-Covariance Matrix: V[y] = Σ = σ²Ω, where σ² > 0 is a constant of proportionality and Ω is a symmetric, positive-definite (and hence non-singular) matrix.
3. Non-stochastic Regressors: The (N × K) matrix X is non-random.
4. Invertibility: rank(X) = K.

1.2 Why study the GLRM?

Studying the GLRM, the optimality of the least squares estimator, and the properties of other estimators that might outperform least squares in the GLRM is important because many classical topics fit into the GLRM framework. These topics will make up the bulk of the work we do this quarter, and include:

• Seemingly Unrelated Regression (SUR) models (covered in this section).
• Heteroskedasticity (covered in section 5).
• Serial correlation (covered in section 6).
• Panel Data (covered in section 7).

Here, we present the general theory for handling all of these topics, and in future sections we will discuss specific issues when applying GLS to certain situations.

2 Properties of LS in the GLRM

2.1 Finite-Sample Properties of LS in GLRM

How does least squares perform in the GLRM? What are its properties?

1. Does least squares provide unbiased estimates? Yes, β̂_OLS is unbiased, because

E[β̂_OLS − β] = E[(X'X)⁻¹X'ε] = (X'X)⁻¹X'E[ε] = 0

Hence, non-spherical errors do not cause first moment problems. We would expect, then, that all of the problems come from the second moments.

2. What is the variance-covariance matrix of the least squares estimator? From the usual rules for working with variance-covariance matrices, we have:

V(β̂_OLS) = [(X'X)⁻¹X'] V(y) [(X'X)⁻¹X']' = σ²(X'X)⁻¹X'ΩX(X'X)⁻¹


This matrix is commonly called the "sandwich" matrix, and it is definitely different from σ²(X'X)⁻¹, the variance-covariance matrix in the LRM.

3. Is it efficient? No, it is not. To prove this, we will find another estimator (the GLS estimator, which we explore in detail later in these notes), and we will prove that this estimator has a smaller variance in the GLRM context.
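To see the sandwich formula in action, here is a minimal numerical sketch in Python; the variance pattern, sample size, and parameter values are illustrative assumptions, not part of the notes. With a known diagonal Ω, the exact sampling variance of OLS matches the sandwich formula, not σ²(X'X)⁻¹:

import numpy as np

rng = np.random.default_rng(0)
N, sigma2 = 200, 1.0
X = np.column_stack([np.ones(N), rng.uniform(1, 5, N)])
Omega = np.diag(X[:, 1] ** 2 / np.mean(X[:, 1] ** 2))  # known Omega, normalized to mean 1

XtX_inv = np.linalg.inv(X.T @ X)
V_naive = sigma2 * XtX_inv
V_sandwich = sigma2 * XtX_inv @ X.T @ Omega @ X @ XtX_inv

# Monte Carlo check: the sampling covariance of the OLS draws matches the sandwich
beta = np.array([1.0, 2.0])
sd = np.sqrt(sigma2 * np.diag(Omega))
draws = np.array([XtX_inv @ X.T @ (X @ beta + rng.normal(0.0, sd))
                  for _ in range(20000)])
print(np.round(V_sandwich, 4))
print(np.round(np.cov(draws.T), 4))  # close to V_sandwich, not to V_naive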

2.2 Asymptotic Properties of LS in GLRM

Is least squares a consistent estimator in the GLRM? Does it have a sensible limiting distribution? As always, we can write:

β̂ − β = ( (1/N) Σ_{i=1}^N x_i x_i' )⁻¹ ( (1/N) Σ_{i=1}^N x_i ε_i )

This is our starting point for any exercise in asymptotic approximations. Since we didn't change anything about the nature of our regressors, X, we can use a weak law of large numbers and a continuity theorem argument to show:

( (1/N) Σ_{i=1}^N x_i x_i' )⁻¹ →_p (E[x_i x_i'])⁻¹ ≡ D⁻¹

However, the error term now has a new variance-covariance matrix, so we will have to do some extra work to pin down the limiting behavior of the numerator term. We want to make use of a weak law of large numbers and a central limit theorem, but the WLLN and the Lindeberg-Lévy CLT both require i.i.d. data. Assuming V(y) = σ²Ω violates this, in general. So, how do we proceed? Without assuming specific forms for Ω, it is hard to say anything in general. The problem is that because of the general, positive definite variance-covariance matrix for y, we can't impose restrictions on the variance of the sample average of x_i ε_i. Instead of trying to work out a general result, we'll revisit this under special cases (e.g. heteroskedasticity and autocorrelation). For now, we just assume that we have enough conditions to make use of a law of large numbers and a central limit theorem, so that we can derive the following results:

(1/N) Σ_{i=1}^N x_i ε_i →_p E[x_i ε_i] = 0

√N ( (1/N) Σ_{i=1}^N x_i ε_i − 0 ) →_d N(0, C)

where we define C as:

C = plim_{N→∞} (1/N) V[X'ε] = plim_{N→∞} (1/N) σ² X'ΩX

1. Consistency: Combining the results above using the Slutsky theorem, it is easy to show:

β̂ − β →_p 0

2. Limiting Distribution: From above, we know that:

√N ( (1/N) Σ_{i=1}^N x_i ε_i − 0 ) →_d N(0, C)

So, by Slutsky,

√N (β̂ − β) = ( (1/N) X'X )⁻¹ √N ( (1/N) X'ε ) →_d N(0, D⁻¹CD⁻¹)

2.3 Robust Inference with LS in GLRM

Note that if we want to stick with β̂_OLS, we will need to modify our standard errors. Our usual estimator of the variance-covariance matrix is not consistent for the asymptotic variance in the GLRM: in general,

N s² (X'X)⁻¹ = s² ( (1/N) X'X )⁻¹ does not converge in probability to D⁻¹CD⁻¹.

So, we will need to work with a more "robust" estimator of the variance-covariance matrix. We will return to this subject when we discuss heteroskedasticity, specifically by discussing the White (1980) robust variance-covariance matrix estimator.
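As a preview, here is a minimal sketch of the White (1980) heteroskedasticity-robust ("HC0") covariance estimator, which replaces the unknown meat σ²X'ΩX with a sum weighted by squared OLS residuals; the function name and interface are my own:

import numpy as np

def ols_hc0(X, y):
    """OLS with the White (1980) 'HC0' robust covariance, a minimal sketch."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b                              # OLS residuals
    meat = X.T @ (e[:, None] ** 2 * X)         # sum_i e_i^2 x_i x_i'
    return b, XtX_inv @ meat @ XtX_inv         # sandwich with estimated meat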

3 Estimation in the GLRM: Generalized Least Squares

Another possibility is to look for a different estimator that is actually efficient in the GLRM. We explore this here. To find a better estimator, we first transform the model by multiplying y = Xβ + ε through by Ω^{-1/2}; the matrix square root exists because Ω is symmetric and positive definite. We then confirm that this transformed model satisfies the classical linear regression assumptions, so that we can apply the Gauss-Markov Theorem.

3.1 Transforming the Model

Recall that we said that Ω was assumed to be positive definite, so its inverse is well defined. Moreover, we can define a variety of "matrix square roots" of this inverse:

Ω⁻¹ = Ω^{-1/2}' Ω^{-1/2}

This decomposition is not unique, and we can choose Ω^{-1/2} so that it is symmetric. To transform the model, we simply premultiply by Ω^{-1/2}:

y* ≡ Ω^{-1/2} y
X* ≡ Ω^{-1/2} X
ε* ≡ Ω^{-1/2} ε

So, the transformed model can be written as:

y* = X*β + ε*

Now, consider the properties of this transformed regression model:

1. Linear Expectation:

E[ε*] = E[Ω^{-1/2} ε] = Ω^{-1/2} E[ε] = 0

2. Scalar Variance-Covariance Matrix:

V[Ω^{-1/2} ε] = Ω^{-1/2} V[ε] Ω^{-1/2}' = Ω^{-1/2} (σ² Ω^{1/2} Ω^{1/2}') Ω^{-1/2}' = σ² I_N

3. Non-Stochastic Regressors: If X is non-stochastic, then X* is non-stochastic.

4. Full Rank Regressors: rank(X*) = rank(X) = K. (See Ruud, p. 855 for a proof.)

Thus, transforming the model leads us back to a world we know and love, and more importantly, one where we already know how to define the most efficient linear unbiased estimator.

3.2 The GLS Estimator

Given that the transformed model leads us to a transformed version of the LRM, it is natural to define the generalized least squares estimator as just the least squares estimator of the transformed regression model:

β̂_GLS = (X*'X*)⁻¹ X*'y*
      = [(Ω^{-1/2}X)'(Ω^{-1/2}X)]⁻¹ (Ω^{-1/2}X)'(Ω^{-1/2}y)
      = (X'Ω^{-1/2}'Ω^{-1/2}X)⁻¹ X'Ω^{-1/2}'Ω^{-1/2}y
      = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹y

Note that this estimator does not depend upon the constant of proportionality, σ².
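Both routes to β̂_GLS — OLS on the transformed model, or the closed form — give identical answers. A minimal sketch in Python (the function name is illustrative; the symmetric square root is built from an eigendecomposition):

import numpy as np

def gls(X, y, Omega):
    """GLS two ways: closed form vs. OLS on the transformed model."""
    Om_inv = np.linalg.inv(Omega)
    # (1) closed form: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
    b_direct = np.linalg.solve(X.T @ Om_inv @ X, X.T @ Om_inv @ y)
    # (2) premultiply by a symmetric Omega^{-1/2}, then run OLS
    w, V = np.linalg.eigh(Omega)               # Omega = V diag(w) V'
    T = V @ np.diag(w ** -0.5) @ V.T           # symmetric Omega^{-1/2}
    b_transf = np.linalg.lstsq(T @ X, T @ y, rcond=None)[0]
    return b_direct, b_transf                  # identical up to rounding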

3.3 Finite-Sample Properties of GLS

• Unbiased:

E[β̂_GLS] = E[(X'Ω⁻¹X)⁻¹X'Ω⁻¹y] = (X'Ω⁻¹X)⁻¹X'Ω⁻¹E[y] = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Xβ = β

• Variance-covariance matrix:

V[β̂_GLS] = V[(X'Ω⁻¹X)⁻¹X'Ω⁻¹y]
         = [(X'Ω⁻¹X)⁻¹X'Ω⁻¹] V[y] [(X'Ω⁻¹X)⁻¹X'Ω⁻¹]'
         = (X'Ω⁻¹X)⁻¹X'Ω⁻¹ (σ²Ω) Ω⁻¹X(X'Ω⁻¹X)⁻¹
         = σ² (X'Ω⁻¹X)⁻¹ X'Ω⁻¹X (X'Ω⁻¹X)⁻¹
         = σ² (X'Ω⁻¹X)⁻¹

3.4 GLS is BLUE

We will show that in the GLRM the GLS estimator is the most efficient estimator among all linear unbiased estimators. This is really just a review of the original proof of the Gauss-Markov theorem.

Let β̃ be some linear, unbiased estimator of β. Since β̃ is linear, it can be written as β̃ = Ay for some (K × N) non-stochastic matrix A. Since it is an unbiased estimator, we have the following:

E[β̃] = β ⟺ E[Ay] = β ⟺ AE[y] = β


⟺ AXβ = β for all β ⟺ AX = I and X'A' = I' = I

Now, note that by an add-and-subtract trick, we have:

V[β̃] = V[β̂_GLS + (β̃ − β̂_GLS)]
     = V[β̂_GLS] + V[β̃ − β̂_GLS] + C[β̂_GLS, β̃ − β̂_GLS] + C[β̂_GLS, β̃ − β̂_GLS]'

Now, note that the covariance term is given by:

C[β̂_GLS, β̃ − β̂_GLS] = C[β̂_GLS, β̃] − V[β̂_GLS]

So, if we could show that the covariance term (and its transpose) is zero, so that C[β̂_GLS, β̃] = V[β̂_GLS], we would have shown that the variance of β̃ is equal to the variance of β̂_GLS plus a positive semi-definite matrix, so it is at least as large in the positive semi-definite sense, and we'd be done. Working out the covariance between β̂_GLS and β̃ directly, we have:

C[β̂_GLS, β̃] = C[(X'Ω⁻¹X)⁻¹X'Ω⁻¹y, Ay]
            = (X'Ω⁻¹X)⁻¹X'Ω⁻¹ V[y] A'
            = σ² (X'Ω⁻¹X)⁻¹X'Ω⁻¹ Ω A'
            = σ² (X'Ω⁻¹X)⁻¹ X'A'
            = σ² (X'Ω⁻¹X)⁻¹ (AX)'
            = σ² (X'Ω⁻¹X)⁻¹ = V[β̂_GLS]

So, we've shown C[β̂_GLS, β̃] = V[β̂_GLS]. Hence, we have the following:

V[β̃] = V[β̂_GLS] + V[β̃ − β̂_GLS] ≥ V[β̂_GLS]

where I'm abusing notation with the ≥ sign. What this says is that the variance of β̃ is equal to the variance of β̂_GLS plus a positive semi-definite matrix, so it is at least as large in that sense. Since we can do this for an arbitrary choice of β̃, this implies that β̂_GLS has the smallest variance among all linear unbiased estimators.
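The key step, C(β̂_GLS, β̃) = V(β̂_GLS) for any linear unbiased β̃, is easy to check numerically. A sketch with β̃ taken to be OLS and σ² = 1 (the Ω below is an arbitrary construction used only for the check):

import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 3
X = rng.normal(size=(N, K))
A = rng.normal(size=(N, N))
Omega = A @ A.T + N * np.eye(N)          # an arbitrary symmetric p.d. Omega
Om_inv = np.linalg.inv(Omega)

# With beta-tilde = OLS (so A = (X'X)^{-1}X') and sigma^2 = 1:
V_gls = np.linalg.inv(X.T @ Om_inv @ X)
C = V_gls @ X.T @ Om_inv @ Omega @ X @ np.linalg.inv(X.T @ X)
print(np.allclose(C, V_gls))             # True: C(b_GLS, b_OLS) = V(b_GLS)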

3.5 FGLS: Unknown Ω

Until now, we have assumed that we know the structure of Ω when computing GLS. What happens if this is not true? We have two alternatives:

1. Stick with OLS, but fix the standard errors. Use the robust estimator of the variance-covariance matrix to get the right inferences. (We will discuss this in more detail when we get to heteroskedasticity.)

2. Use Feasible GLS (FGLS). We assume a structure for Ω in terms of a set of parameters θ, with the hope that a consistent estimator of θ will give us a consistent estimator of the population parameters. We first parameterize Ω = Ω(θ). We can then use the LS residuals to obtain an estimate θ̂ such that θ̂ →_p θ, and thus estimate Ω̂ ≡ Ω(θ̂) →_p Ω(θ). With this estimate, compute:

β̂_FGLS = (X'Ω̂⁻¹X)⁻¹ X'Ω̂⁻¹y

This estimator is asymptotically equivalent to β̂_GLS when θ̂, and thus Ω̂, are consistent estimators. We will discuss an example of Feasible GLS when we discuss the SUR model.

4 SUR Models

Notationally, it will be convenient to express our SUR estimator using Kronecker product notation. We introduce this here:

Definition 4.1: Let A be an (M × N) matrix defined as A = [a_ij]. Let B be an arbitrary (P × Q) matrix. The Kronecker product of A and B is the (MP × NQ) matrix given by:

A ⊗ B = [a_ij B] = [ a_11 B   a_12 B   ...   a_1N B ]
                   [ a_21 B   a_22 B   ...   a_2N B ]
                   [   ...      ...    ...     ...  ]
                   [ a_M1 B   a_M2 B   ...   a_MN B ]

Some useful properties of the Kronecker product are:

1. Transposes satisfy a distributive law: (A ⊗ B)' = A' ⊗ B'

2. The mixed product property: (A ⊗ B)(C ⊗ D) = AC ⊗ BD

3. Inverses, if they exist, satisfy a distributive law: (A ⊗ B) is invertible if and only if A and B are both invertible. Suppose A is an (N × N) matrix and B is an (M × M) matrix. It should be easy to see from the mixed product property that:

I_{NM} = I_N ⊗ I_M = (AA⁻¹) ⊗ (BB⁻¹) = (A ⊗ B)(A⁻¹ ⊗ B⁻¹) ⟹ (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹
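A quick numerical check of the three properties with numpy's kron (random square matrices are almost surely invertible here):

import numpy as np

rng = np.random.default_rng(2)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(2, 2))
C, D = rng.normal(size=(3, 3)), rng.normal(size=(2, 2))

print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))                    # transposes
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))  # mixed product
print(np.allclose(np.linalg.inv(np.kron(A, B)),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))           # inverses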

4.1 General Description

Seemingly Unrelated Regression (SUR) is least squares estimation on a system of equations. Here, we follow Powell's notation (and Goldberger's): we first stack the individual observations, indexed by i, on top of each other for a given equation, indexed by j, and then we stack each of these equations on top of each other. (Ruud, Chapter 30, presents SUR estimation slightly differently.) The system of equations thus contains at least two distinct dependent variables, and we assume that all individuals in our sample are observed in each equation. The key thing in SUR models is that they allow the errors of a given individual to be correlated across equations.

4.2 An Example

Katz, Kling, and Liebman (2007, Econometrica) study the effects of the Moving to Opportunity (MTO) program. (The authors don't actually report any SUR estimates in their main findings, but they discuss SUR estimation in the appendix to their Econometrica paper.) This program randomly assigned housing vouchers to families who were eligible for public housing in the cities of Baltimore, Boston, Chicago, Los Angeles, and New York. By comparing people who received housing vouchers and moved away from public housing with people who did not receive vouchers and stayed behind, the authors provide experimental estimates of "neighborhood effects." These experimental estimates are an advance over most work on neighborhoods, which suffers from pernicious identification problems due to the absence of random assignment of individuals into locations.

Their empirical model is described as follows. Let i = 1, ..., N index individuals, and let Y_1, Y_2, ..., Y_J denote a set of J outcome variables. In their study, outcome variables include a measure of psychological distress, a measure of anxiety, an indicator for depression, indicators for drug and alcohol use, indicators for whether or not the respondent was working, on welfare, in school, etc. The point is that they have a bunch of dependent variables, and they want to estimate the impact of the Moving to Opportunity program on all of these dependent variables.

Let Z_i be an indicator variable, equal to 1 if individual i received a housing voucher and 0 otherwise. The authors estimate intent-to-treat effects with the following model:

Y_ij = W_ij' δ_j + π_j Z_i + ε_ij    (1)

The parameter π_j captures the difference between the treatment-group and control-group means of outcome variable Y_j, conditional on covariates W_ij.

4.3 Estimation with OLS

One could estimate the parameters in (1) just by doing OLS, equation by equation. Collect W_ij and Z_i in a vector: X_ij = [W_ij', Z_i]', and let β_j = [δ_j', π_j]'. Then, for each individual and equation, our model is:

Y_ij = X_ij' β_j + ε_ij,    i = 1, ..., N,    j = 1, ..., J


Stacking observations i for a given equation j, we have:

[ y_1j ]   [ X_1j'β_j ]   [ ε_1j ]
[ y_2j ] = [ X_2j'β_j ] + [ ε_2j ]
[  ...  ]   [    ...    ]   [  ...  ]
[ y_Nj ]   [ X_Nj'β_j ]   [ ε_Nj ]

(N × 1)    (N × K)(K × 1)  (N × 1)

or y_j = x_j β_j + ε_j. So, our OLS estimates for equation j are given by:

β̂_j = (x_j'x_j)⁻¹ x_j'y_j

All OLS estimates for all equations, β̂_1, β̂_2, ..., β̂_J, can be obtained immediately by one giant OLS regression on a stacked dataset:

[ y_1 ]   [ x_1  0   ...  0  ] [ β_1 ]   [ ε_1 ]
[ y_2 ] = [ 0   x_2  ...  0  ] [ β_2 ] + [ ε_2 ]
[ ... ]   [ ...  ...  ... ... ] [ ... ]   [ ... ]
[ y_J ]   [ 0    0   ... x_J ] [ β_J ]   [ ε_J ]

(NJ × 1)       (NJ × KJ)       (KJ × 1)  (NJ × 1)

or Y = Xβ + ε. Some notes on this setup:

• Note that here, we're explicitly allowing for a different set of regressors in each equation. For instance, we might want to control for local unemployment rates when estimating whether or not the program had an effect on working, but we might not want to control for that when estimating whether or not the program affected the incidence of depression or anxiety.

• Also, if we run the giant stacked regression system and impose homoskedasticity on the error structure across all equations, we won't get the same standard errors that we would get if we estimated the equations separately. To get the same standard errors as equation-by-equation least squares, we'll need to use robust standard errors when estimating the entire system.
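A minimal sketch confirming the coefficient claim: stacking into a block-diagonal design reproduces equation-by-equation OLS exactly (the data-generating choices below are arbitrary illustrations):

import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
N, K, J = 100, 2, 3
xs = [np.column_stack([np.ones(N), rng.normal(size=N)]) for _ in range(J)]
ys = [x @ rng.normal(size=K) + rng.normal(size=N) for x in xs]

# equation-by-equation OLS, then one giant block-diagonal regression
b_eq = np.concatenate([np.linalg.lstsq(x, y, rcond=None)[0] for x, y in zip(xs, ys)])
b_stack = np.linalg.lstsq(block_diag(*xs), np.concatenate(ys), rcond=None)[0]
print(np.allclose(b_eq, b_stack))   # True: identical coefficient estimates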

4.4 Problem with OLS

What's wrong with the OLS approach? To see the problem, consider the intent-to-treat model again, but focus on only two dependent variables:

Y_i1 = W_i1' δ_1 + π_1 Z_i + ε_i1
Y_i2 = W_i2' δ_2 + π_2 Z_i + ε_i2

In this model, ε_i1 and ε_i2 might represent unobservables, such as motivation, willingness to overcome addictive behavior, or "ability," which affect our dependent variables but which we can't control for. Since we observe each individual's realization of multiple outcome variables, such as drug addiction, schooling, etc., it is sensible to assume that these unobserved components might be correlated across outcome variables for a given individual.

By construction, Z_i ⊥ ε_ij, so our parameters are identified and OLS will produce unbiased, consistent estimates. But if ε_i1 is correlated with ε_i2 (e.g. both the probability of working and the incidence of depression are affected by some "ability" unobservable), OLS is not the most efficient estimator: it doesn't take this cross-equation correlation information into account.

4.5 Estimation with SUR

The SUR model allows for a very specific type of cross-equation correlation in the ε_ij. The assumptions are:

1. E[y_j] = x_j β_j.

2. V[y_j] = σ_jj I_N.

3. C(y_j, y_k) = E[(y_j − x_j β_j)(y_k − x_k β_k)'] = E[ε_j ε_k'] = σ_jk I_N. In other words,

C(y_j, y_k) = C(ε_j, ε_k) = [ σ_jk   0    ...   0   ]
                            [  0    σ_jk  ...   0   ]  = σ_jk I_N
                            [ ...    ...  ...  ...  ]
                            [  0     0    ...  σ_jk ]

so that we have:
(a) Cov(y_ij, y_ik) = σ_jk for all i
(b) Cov(y_ij, y_i'k) = 0 for i ≠ i'

4. The x_j are non-stochastic and full rank with probability 1.

Assumptions 1, 2, and 4 have the same interpretation as in the classical regression model. Assumption 3 is new. It says that the errors are correlated only across equations for a given individual (and that the cross-equation correlation is the same for all individuals). The errors for different individuals, both within and across equations, are uncorrelated.

Working with the giant stacked equation representation, we have Y = Xβ + ε. To write V[Y] nicely, we will use the Kronecker product representation. Assumptions 2 and 3 on the behavior of the individual equations imply:

V[Y] = [ σ_11 I_N   σ_12 I_N   ...   σ_1J I_N ]
       [ σ_21 I_N   σ_22 I_N   ...   σ_2J I_N ]  = Σ ⊗ I_N
       [    ...        ...     ...      ...   ]
       [ σ_J1 I_N   σ_J2 I_N   ...   σ_JJ I_N ]

Substituting this variance into β̂_GLS thus yields:

β̂_SUR = (X'(Σ ⊗ I_N)⁻¹X)⁻¹ X'(Σ ⊗ I_N)⁻¹y

4.6 Comparing OLS and GLS

The conditional variances of each estimator are:

V(β̂_OLS) = [(X'X)⁻¹X'] V(y) [(X'X)⁻¹X']' = (X'X)⁻¹ X'(Σ ⊗ I_N)X (X'X)⁻¹

V(β̂_GLS) = [(X'(Σ ⊗ I_N)⁻¹X)⁻¹X'(Σ ⊗ I_N)⁻¹] V(y) [(X'(Σ ⊗ I_N)⁻¹X)⁻¹X'(Σ ⊗ I_N)⁻¹]'
         = (X'(Σ ⊗ I_N)⁻¹X)⁻¹ X'(Σ ⊗ I_N)⁻¹ (Σ ⊗ I_N) (Σ ⊗ I_N)⁻¹ X (X'(Σ ⊗ I_N)⁻¹X)⁻¹
         = (X'(Σ ⊗ I_N)⁻¹X)⁻¹

You should be able to follow the general GLS efficiency proof for the SUR special case.

4.7 When do SUR and OLS estimates coincide?

Powell derives in his lecture notes (and Ruud discusses this in his textbook) two distinct cases in which GLS in the SUR model is equivalent to estimating each dependent variable separately with OLS:

1. The equations are completely unrelated (not just seemingly): Σ is diagonal because σ_jk = 0 for j ≠ k. (See Exercise 5.3 on this, below.)

2. Each equation has the same explanatory variables: x_j = X_0 for each j.


To see the second claim, note that we can write the SUR estimator as:

β̂_SUR = (X'(Σ ⊗ I_N)⁻¹X)⁻¹ X'(Σ ⊗ I_N)⁻¹y = (Z'X)⁻¹ Z'y

where we define Z = (Σ ⊗ I_N)⁻¹X. With the same regressors X_0 in every equation, X is block diagonal with X_0 in each block, i.e. X = I_J ⊗ X_0. Now, note the following:

Z = (Σ ⊗ I_N)⁻¹ X
  = (Σ ⊗ I_N)⁻¹ (I_J ⊗ X_0)
  = (Σ⁻¹ ⊗ I_N)(I_J ⊗ X_0)
  = Σ⁻¹ ⊗ X_0
  = (I_J ⊗ X_0)(Σ⁻¹ ⊗ I_K)
  = X (Σ ⊗ I_K)⁻¹

where the third, fourth, and fifth lines use properties of the Kronecker product. Given this representation for Z, it is now easy to show:

β̂_SUR = (Z'X)⁻¹ Z'y
      = [(Σ ⊗ I_K)⁻¹X'X]⁻¹ (Σ ⊗ I_K)⁻¹X'y
      = (X'X)⁻¹ (Σ ⊗ I_K)(Σ ⊗ I_K)⁻¹ X'y
      = (X'X)⁻¹ X'y = β̂_OLS

To get some intuition for this result, consider the two-dependent-variable case. Prof. Powell notes that in this case, the SUR estimators β̂_S,1 and β̂_S,2 have a nice relationship to their corresponding OLS estimators:

β̂_S,1 = β̂_LS,1 − (σ_21/σ_22)(X_1'X_1)⁻¹X_1' e_S,2
β̂_S,2 = β̂_LS,2 − (σ_12/σ_11)(X_2'X_2)⁻¹X_2' e_S,1

where e_S,1 and e_S,2 are the SUR residuals (i.e. e_S,1 = y_1 − X_1β̂_S,1). Essentially, you can think of SUR as an adjustment to OLS, where the adjustment involves a residual regression. For instance, the adjustment for equation 1 involves regressing the residuals from equation 2 on X_1.


In a single equation, ordinary least squares involves an orthogonal projection of y. This projection decomposes y into a component that lives in the space spanned by the columns of X and a component that lives in the orthogonal complement of that space. To predict y given X, we don't use any information in the residuals; by construction, e is orthogonal to X. However, when we do SUR, if we have cross-equation correlations in the errors, then there is information in the other equations' residuals that can be used to predict a given y. The formulas above tell us exactly how to use that information to obtain a more efficient estimator. But when the X's are all the same, every equation's residuals are orthogonal to the common regressor matrix, so there isn't any information in the other equations' residuals that can be used to adjust the least squares estimates for a given dependent variable. This is why least squares and SUR coincide when we use the same explanatory variables in each equation.
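A numerical illustration of the coincidence result: with X = I_J ⊗ X_0 (the same regressors in every equation), the SUR/GLS formula collapses to OLS for any Σ and any y, since the identity is purely algebraic (the Σ below is an arbitrary illustrative choice):

import numpy as np

rng = np.random.default_rng(4)
N, K, J = 60, 3, 2
X0 = rng.normal(size=(N, K))
Sigma = np.array([[1.0, 0.7],
                  [0.7, 2.0]])                 # cross-equation error covariance
X = np.kron(np.eye(J), X0)                     # identical regressors in each equation
y = rng.normal(size=N * J)

V_inv = np.kron(np.linalg.inv(Sigma), np.eye(N))
b_sur = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(b_sur, b_ols))               # True: SUR collapses to OLS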

4.8 Feasible GLS estimation

Note that by writing V(y) = (Σ ⊗ I_N), we've parameterized the variance-covariance matrix (i.e. Ω = Ω(θ), where θ is a parameter vector). In our case, we have a total of J variances and J(J − 1)/2 cross-equation covariances, so in this notation, θ would be a ((J + J(J − 1)/2) × 1) vector of parameters.

Because we've successfully parameterized the variance-covariance matrix, we can use the feasible GLS approach. We just need to find some θ̂ →_p θ and plug it into Ω(θ). A natural routine for feasibly estimating the SUR system is the following:

1. Estimate the system using OLS, equation by equation. This gives us a set of J residual vectors:

e_j = y_j − x_j β̂_j,    j = 1, ..., J

2. Estimate the cross-equation covariances as follows:

σ̂_jk = (1/N) e_j'e_k = (1/N) Σ_{i=1}^N e_ij e_ik    for all j, k

3. Form the feasible GLS estimator, β̂_FGLS, as follows:

β̂_FGLS = (X'(Σ̂ ⊗ I_N)⁻¹X)⁻¹ X'(Σ̂ ⊗ I_N)⁻¹y

where

Σ̂ = [ σ̂_11  σ̂_12  ...  σ̂_1J ]
    [ σ̂_21  σ̂_22  ...  σ̂_2J ]
    [  ...   ...   ...   ...  ]
    [ σ̂_J1  σ̂_J2  ...  σ̂_JJ ]
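A sketch of this three-step routine in Python (the function name and interface are my own; it assumes every equation is observed for the same N individuals):

import numpy as np
from scipy.linalg import block_diag

def sur_fgls(xs, ys):
    """Feasible GLS for SUR: the three-step routine sketched above.
    xs: list of J (N x K_j) design matrices; ys: list of J (N,) outcomes."""
    N = ys[0].shape[0]
    # Step 1: equation-by-equation OLS residuals
    E = np.column_stack([y - x @ np.linalg.lstsq(x, y, rcond=None)[0]
                         for x, y in zip(xs, ys)])
    # Step 2: sigma_jk-hat = e_j'e_k / N for all j, k
    Sigma_hat = E.T @ E / N
    # Step 3: plug Sigma_hat into the GLS formula with V = Sigma kron I_N
    X, Y = block_diag(*xs), np.concatenate(ys)
    V_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))
    return np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)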

5 Exercises

5.1 2004 Exam, Question 1A

Question: True/False/Explain. If the generalized regression model holds – that is, E[y | X] = Xβ, V[y | X] = σ²Ω, and X is full rank with probability one – then the covariance matrix between Aitken's generalized LS estimator β̂_GLS (with known Ω matrix) and the classical LS estimator β̂_LS is equal to the variance matrix of the LS estimator.

Answer: False.

C(β̂_GLS, β̂_LS | X) = C((X'Ω⁻¹X)⁻¹X'Ω⁻¹y, (X'X)⁻¹X'y | X)
                    = (X'Ω⁻¹X)⁻¹X'Ω⁻¹ C(y, y | X) [(X'X)⁻¹X']'
                    = (X'Ω⁻¹X)⁻¹X'Ω⁻¹ (σ²Ω) X(X'X)⁻¹
                    = σ² (X'Ω⁻¹X)⁻¹X'Ω⁻¹ΩX(X'X)⁻¹
                    = σ² (X'Ω⁻¹X)⁻¹X'X(X'X)⁻¹
                    = σ² (X'Ω⁻¹X)⁻¹
                    = V[β̂_GLS | X]

The correct statement would be that the covariance of the GLS and LS estimators is equal to the variance of the GLS estimator.

5.2 SUR (From Goldberger 30.1)

Question: True or False? In the SUR model with two equations, if the explanatory variables in the two equations are identical, then the LS residuals from the two equations are uncorrelated with each other.

Answer: The statement is false unless σ_12 = 0, thereby making the equations unrelated. Let

[ y_1 ]   [ X_1   0  ] [ β_1 ]   [ ε_1 ]
[ y_2 ] = [  0   X_2 ] [ β_2 ] + [ ε_2 ]

where

V[y | X] = [ σ_11 I   σ_12 I ]
           [ σ_21 I   σ_22 I ]

Suppose X_1 = X_2 = X. Then, using OLS, β̂_1 = (X_1'X_1)⁻¹X_1'y_1 = (X'X)⁻¹X'y_1 and β̂_2 = (X_2'X_2)⁻¹X_2'y_2 = (X'X)⁻¹X'y_2. The residual vector from the first equation is:

e_1 = y_1 − X_1β̂_1 = y_1 − X(X'X)⁻¹X'y_1 = (I − P_X)y_1


where P_X = X(X'X)⁻¹X' is a projection matrix, so (I − P_X) is also a projection matrix. Similarly, for the second equation, e_2 = y_2 − X_2β̂_2 = y_2 − X(X'X)⁻¹X'y_2 = (I − P_X)y_2.

C(e_1, e_2 | X) = C((I − P_X)y_1, (I − P_X)y_2 | X)
               = (I − P_X) C(y_1, y_2 | X) (I − P_X)'
               = (I − P_X) σ_12 I (I − P_X)
               = σ_12 (I − P_X)(I − P_X)
               = σ_12 (I − P_X) ≠ 0
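A small numerical confirmation that (I − P_X) σ_12 I (I − P_X)' = σ_12 (I − P_X), using the idempotence of the annihilator (the dimensions and σ_12 below are arbitrary):

import numpy as np

rng = np.random.default_rng(5)
N, K, s12 = 30, 2, 0.6
X = rng.normal(size=(N, K))
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)  # I - P_X, idempotent

C_e1e2 = M @ (s12 * np.eye(N)) @ M.T               # (I-P) C(y1,y2|X) (I-P)'
print(np.allclose(C_e1e2, s12 * M))                # True: equals sigma_12 (I - P_X)
print(np.allclose(C_e1e2, np.zeros((N, N))))       # False unless sigma_12 = 0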

5.3 SUR II (From Goldberger 30.2)

Question: True or False?

1. In the SUR model, if the explanatory variables in the two equations are orthogonal to each other, then the LS coefficient estimates for the two equations are uncorrelated with each other.

2. The GLS estimate reduces to the LS estimate.

Answer: The first statement is true; the second statement is false.

1. Let

[ y_1 ]   [ X_1   0  ] [ β_1 ]   [ ε_1 ]
[ y_2 ] = [  0   X_2 ] [ β_2 ] + [ ε_2 ]

where

V(y | X) = [ σ_11 I   σ_12 I ]
           [ σ_21 I   σ_22 I ]

Using OLS, β̂_1 = (X_1'X_1)⁻¹X_1'y_1 and β̂_2 = (X_2'X_2)⁻¹X_2'y_2. If the explanatory variables in the two equations are orthogonal to each other, then X_1'X_2 = 0.

C(β̂_1, β̂_2 | X) = (X_1'X_1)⁻¹X_1' C(y_1, y_2 | X) [(X_2'X_2)⁻¹X_2']'
                = (X_1'X_1)⁻¹X_1' σ_12 I X_2(X_2'X_2)⁻¹
                = σ_12 (X_1'X_1)⁻¹X_1'X_2(X_2'X_2)⁻¹
                = σ_12 (X_1'X_1)⁻¹ (0) (X_2'X_2)⁻¹ = 0

Thus, it is true that the covariance of the OLS estimators β̂_1 and β̂_2 is zero.

2. (Note: Professor Powell added this part to Goldberger 30.2 in the 2003 exam.) Using X_1'X_2 = 0, the off-diagonal blocks of X'(Σ ⊗ I)⁻¹X vanish, and the common scalar factor 1/(σ_11σ_22 − σ_12σ_21) in Σ⁻¹ cancels between the two pieces of the GLS formula, so:

β̂_GLS = (X'(Σ ⊗ I)⁻¹X)⁻¹ X'(Σ ⊗ I)⁻¹y

       = [ σ_22 X_1'X_1       0       ]⁻¹ [ σ_22 X_1'y_1 − σ_12 X_1'y_2 ]
         [      0       σ_11 X_2'X_2  ]   [ σ_11 X_2'y_2 − σ_21 X_2'y_1 ]

       = [ (X_1'X_1)⁻¹X_1'y_1 − (σ_12/σ_22)(X_1'X_1)⁻¹X_1'y_2 ]
         [ (X_2'X_2)⁻¹X_2'y_2 − (σ_21/σ_11)(X_2'X_2)⁻¹X_2'y_1 ]

       ≠ [ (X_1'X_1)⁻¹X_1'y_1 ]
         [ (X_2'X_2)⁻¹X_2'y_2 ] = β̂_OLS

Thus, β̂_GLS does not reduce to β̂_OLS in this case (unless σ_12 = 0).

5.4 GLS vs. LS (From Goldberger 30.3)

Question: Suppose that E(y_1) = X_1β_1, E(y_2) = X_2β_2, V(y_1) = 4I, V(y_2) = 5I, and C(y_1, y_2) = 2I. Here y_1, y_2, X_1, and X_2 are (n × 1), with X_1'X_1 = 5, X_2'X_2 = 6, and X_1'X_2 = 3. Calculate the variances of the OLS and GLS estimators.

Answer: Let

[ y_1 ]   [ X_1   0  ] [ β_1 ]   [ ε_1 ]
[ y_2 ] = [  0   X_2 ] [ β_2 ] + [ ε_2 ]

where

V(y | X) = (Σ ⊗ I_N) = [ 4I  2I ]
                       [ 2I  5I ]

OLS Variance. Recall that V(β̂_OLS | X) = V((X'X)⁻¹X'y | X) = (X'X)⁻¹X'(Σ ⊗ I_N)X(X'X)⁻¹:

(X'X)⁻¹ = [ X_1'X_1    0    ]⁻¹ = [ 5  0 ]⁻¹ = [ 1/5   0  ]
          [    0    X_2'X_2 ]     [ 0  6 ]     [  0   1/6 ]

X'(Σ ⊗ I_N)X = [ 4X_1'X_1   2X_1'X_2 ] = [ 20   6 ]
               [ 2X_2'X_1   5X_2'X_2 ]   [  6  30 ]

(X'X)⁻¹X'(Σ ⊗ I_N)X(X'X)⁻¹ = [ 1/5   0  ] [ 20   6 ] [ 1/5   0  ] = [ 4/5  1/5 ]
                             [  0   1/6 ] [  6  30 ] [  0   1/6 ]   [ 1/5  5/6 ]

GLS Variance. Recall that V(β̂_GLS | X) = (X'(Σ ⊗ I_N)⁻¹X)⁻¹:

(Σ ⊗ I_N)⁻¹ = [ 4I  2I ]⁻¹ = (1/16) [  5I  −2I ]
              [ 2I  5I ]            [ −2I   4I ]

X'(Σ ⊗ I_N)⁻¹X = (1/16) [ 5X_1'X_1    −2X_1'X_2 ] = (1/16) [ 25  −6 ]
                        [ −2X_2'X_1    4X_2'X_2 ]          [ −6  24 ]

(X'(Σ ⊗ I_N)⁻¹X)⁻¹ = 16 [ 25  −6 ]⁻¹ = [ 32/47    8/47  ]
                        [ −6  24 ]     [  8/47  100/141 ]

Note that the difference between the OLS and GLS variances is positive definite, which is what we expect in this case since GLS is more efficient.
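These hand computations are easy to verify numerically. The sketch below works only with the scalar cross-products given in the question:

import numpy as np

# Only the scalar cross-products matter: X1'X1 = 5, X2'X2 = 6, X1'X2 = 3.
XtX = np.array([[5.0, 0.0],
                [0.0, 6.0]])
XtVX = np.array([[4 * 5, 2 * 3],
                 [2 * 3, 5 * 6]])                      # X'(Sigma kron I)X
V_ols = np.linalg.inv(XtX) @ XtVX @ np.linalg.inv(XtX)

S_inv = np.linalg.inv(np.array([[4.0, 2.0],
                                [2.0, 5.0]]))          # Sigma^{-1}
XtVinvX = np.array([[S_inv[0, 0] * 5, S_inv[0, 1] * 3],
                    [S_inv[1, 0] * 3, S_inv[1, 1] * 6]])
V_gls = np.linalg.inv(XtVinvX)

print(np.round(V_ols, 4))   # [[0.8, 0.2], [0.2, 0.8333]]
print(np.round(V_gls, 4))   # [[32/47, 8/47], [8/47, 100/141]]
print(np.all(np.linalg.eigvalsh(V_ols - V_gls) > 0))   # difference is p.d.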

5.5 Relative Efficiency of GLS to OLS

Question: True/False/Explain. β̂_GLS is efficient relative to β̂_OLS in the generalized regression model.

Answer: True. We expect this statement to be true because both are linear unbiased estimators of β, and the case in which β̂_OLS is the most efficient estimator is a special case of the generalized regression model. β̂_OLS is as efficient as β̂_GLS in the special case Σ = σ²I_N, but it is less efficient for all other non-singular, positive-definite, symmetric Σ. As usual, we prove this claim by showing that V(β̂_OLS) − V(β̂_GLS) is positive semi-definite.

V(β̂_OLS | X) = V((X'X)⁻¹X'y | X) = (X'X)⁻¹X' V(y | X) [(X'X)⁻¹X']' = σ²(X'X)⁻¹X'ΩX(X'X)⁻¹

The question reduces to showing that σ²(X'X)⁻¹X'ΩX(X'X)⁻¹ − σ²(X'Ω⁻¹X)⁻¹ is positive semi-definite. σ² does not affect the positive semi-definiteness of this difference because it is positive. Accordingly, we use Amemiya (p. 461) and instead check the positive semi-definiteness of the difference of the inverses, taken in the reverse order:

(X'Ω⁻¹X) − [(X'X)⁻¹(X'ΩX)(X'X)⁻¹]⁻¹
  = (X'Ω⁻¹X) − (X'X)(X'ΩX)⁻¹(X'X)
  = X'Ω^{-1/2}'Ω^{-1/2}X − X'Ω^{-1/2}'Ω^{1/2}X (X'Ω^{1/2}'Ω^{1/2}X)⁻¹ X'Ω^{1/2}'Ω^{-1/2}X
  = X'Ω^{-1/2}' (I − Ω^{1/2}X(X'Ω^{1/2}'Ω^{1/2}X)⁻¹X'Ω^{1/2}') Ω^{-1/2}X
  = X'Ω^{-1/2}' (I − P_{Ω^{1/2}X}) Ω^{-1/2}X

This matrix is positive semi-definite by the same argument given in Section 3.4.

5.6 2004 Exam, Question 2

Question: A feasible GLS fit of the generalized regression model with K = 3 regressors yields the estimates β̂ = (2, −1, 2)', where the GLS covariance matrix V = σ²[X'Ω⁻¹X]⁻¹ is estimated as

V̂ = [ 2  1  0 ]
    [ 1  1  0 ]
    [ 0  0  1 ]

using consistent estimators of σ² and Ω. The sample size N = 403 is large enough so that it is reasonable to assume a normal approximation holds for the GLS estimator. Use these results to test the null hypothesis H_0: θ = 1 against a two-sided alternative at the asymptotic 5% level, where

θ = g(β) = ||β|| = (β_1² + β_2² + β_3²)^{1/2}

Answer: We reject the null hypothesis, using the delta method to construct an approximate t-statistic. Recall that

√N (β̂_GLS − β) →_d N(0, V)

where V = σ²(X'Ω⁻¹X)⁻¹, and we are given a V̂ such that V̂ →_p V. We are interested in the limiting distribution of θ̂ = g(β̂), which we analyze by the Delta Method:

√N (θ̂ − θ) →_d N(0, GVG')

where

G = ∂g(β)/∂β' = ∂(β_1² + β_2² + β_3²)^{1/2}/∂β' = (β_1² + β_2² + β_3²)^{-1/2} (β_1, β_2, β_3) = (1/g(β)) (β_1, β_2, β_3)

Therefore an approximate test statistic is

(θ̂ − θ) / √(GVG') ~ N(0, 1) (approximately).


We estimate G with Ĝ, because Ĝ →_p G by the Continuous Mapping Theorem, where

Ĝ = (1/g(β̂)) (β̂_1, β̂_2, β̂_3) = (1/(2² + (−1)² + 2²)^{1/2}) (2, −1, 2) = (1/3)(2, −1, 2)

By Slutsky's Theorem, ĜV̂Ĝ' →_p GVG', where

ĜV̂Ĝ' = (1/3)(2, −1, 2) [ 2  1  0 ] (1/3) [  2 ]
                        [ 1  1  0 ]       [ −1 ]
                        [ 0  0  1 ]       [  2 ]
      = (1/9)(3, 1, 2)(2, −1, 2)'
      = (1/9)(6 − 1 + 4)
      = 1

Thus, to test H_0: θ = 1 against a two-sided alternative, the absolute value of the statistic is

|θ̂ − θ_0| / √(ĜV̂Ĝ') = |3 − 1| / 1 = 2

which exceeds 1.96, the upper 97.5% critical value of a standard normal. Note that the sample size is absorbed in the estimation of the variance-covariance matrix.
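The whole delta-method calculation fits in a few lines; a sketch reproducing the numbers above:

import numpy as np

beta_hat = np.array([2.0, -1.0, 2.0])
V_hat = np.array([[2.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

theta_hat = np.linalg.norm(beta_hat)     # g(beta-hat) = 3
G_hat = beta_hat / theta_hat             # gradient of the norm at beta-hat
se = np.sqrt(G_hat @ V_hat @ G_hat)      # sqrt(G V G') = 1
t = abs(theta_hat - 1.0) / se            # = 2 > 1.96: reject H0
print(theta_hat, se, t)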
