
Tests Based on t-Statistics for IV Regression with Weak Instruments

Benjamin Mills1 New York Fed

Marcelo J. Moreira FGV/EPGE

Lucas P. Vilela FGV/EPGE

December 4, 2013

1 This paper supersedes the working paper “Optimal One-Sided Tests for Instrumental Variables Regression.” We thank Donald Andrews, James Stock, two anonymous referees, and an associate editor for comments and suggestions. We gratefully acknowledge the research support of CNPq, FAPERJ, and NSF (via grant SES-0819761).

Abstract

This paper considers tests of the parameter of an endogenous variable in an instrumental variables regression model. The focus is on one-sided conditional t-tests. Theoretical and numerical work shows that the conditional 2SLS and Fuller t-tests perform well even when instruments are weakly correlated with the endogenous variable. When the population F-statistic is as small as two, their power is reasonably close to the power envelopes for similar and non-similar tests which are invariant to rotation transformations of the instruments. This ﬁnding is surprising considering the bad performance of two-sided conditional t-tests found in Andrews, Moreira, and Stock (2007). We show these tests have bad power because the conditional null distributions of t-statistics are asymmetric when instruments are weak. Taking this asymmetry into account, we propose two-sided tests based on t-statistics. These novel tests are approximately unbiased and can perform as well as the conditional likelihood ratio (CLR) test.

Keywords: Instrumental variables regression, invariant tests, optimal tests, similar tests, unbiased tests, weak instruments. JEL Classiﬁcation Numbers: C12, C30.

1 Introduction

Instrumental variables (IVs) are commonly used to make inferences about the coefﬁcient β of an endogenous regressor in a structural equation. When instruments are strongly correlated with the regressor, the tests based on the score, likelihood ratio, and t-statistics are asymptotically equivalent. This trinity of tests provides reliable inference as long as the instruments are strong. When identiﬁcation is weak, the three approaches are no longer comparable. Kleibergen (2002) and Moreira (2002) show that a score (LM) statistic has a standard chi-square distribution regardless of the strength of the instruments. Moreira (2003) proposes a conditional likelihood ratio (CLR) test which is shown by Andrews, Moreira, and Stock (2006a) (hereinafter, AMS06a) to be nearly optimal. However, most results in the literature on the performance of tests based on the commonly used t-statistics are negative: Dufour (1997) shows that standard tests based on t-statistics can have size arbitrarily close to one; Andrews, Moreira, and Stock (2007) (hereinafter, AMS07) ﬁnd that conditional t-tests are severely biased; and Andrews and Guggenberger (2010) prove that subsampling tests based on the two-stage least squares (2SLS) t-statistic do not have correct asymptotic size. See Stock, Wright, and Yogo (2002), Dufour (2003), and Andrews and Stock (2007) for surveys on weak IVs. In this paper, we propose to use the conditional one-sided t-tests for testing the null hypothesis H0 : β = β 0 (or the augmented null H0 : β ≤ β 0 ) against the alternative H1 : β > β 0 (the adjustment for H1 : β < β 0 is straightforward). We consider t-statistics centered around the 2SLS, the limited information maximum likelihood (LIML), the bias-adjusted 2SLS (B2SLS), and Fuller’s (1977) estimators. We also introduce conditional tests based on a one-sided score (LM1) statistic, a likelihood ratio (LR1) statistic for H0 : β = β 0 , and a likelihood ratio statistic (MLR1) for H0 : β ≤ β 0 . 
We develop a theory of optimal tests for one-sided alternatives that parallels the two-sided results of AMS06a. We adopt the invariance condition under which inference is unchanged if the IVs are transformed by an orthogonal matrix, e.g., by changing the order in which the IVs appear. We develop the Gaussian power envelope using point-optimal invariant similar (POIS) tests. When the null hypothesis is $H_0: \beta = \beta_0$, the conditional LR1 (CLR1) test is nearly optimal in the sense that its power function is numerically close to the power envelope. The LM1 test is a POIS test and does not have good power overall. For the more relevant null $H_0: \beta \le \beta_0$, the CLR1 test does not control size but the conditional t-tests do. The conditional test based on the 2SLS estimator numerically outperforms the one based on the B2SLS estimator. The test based on the Fuller estimator dominates the LIML counterpart and the conditional MLR1 (CMLR1) test. Hence, we recommend the conditional 2SLS and Fuller t-tests in empirical practice.

The good performance of the one-sided conditional 2SLS and Fuller t-tests is somewhat surprising considering the bad performance of two-sided conditional t-tests found in AMS07. We show that the bad performance is due to the asymmetric distribution of t-statistics under the null $H_0: \beta = \beta_0$ when instruments are weak. We consider two methods to improve power for two-sided tests based on t-statistics. First, we propose novel tests which are by construction approximately unbiased. Second, we modify the t-statistics so that their null distribution is nearly symmetric. Both methods yield some t-tests whose power is close to that of the CLR test of Moreira (2003). Hence, this paper restores the triad of tests based on score, likelihood ratio, and t-statistics with reasonably good performance even when instruments are weak for two-sided hypothesis testing. By inverting the conditional t-tests, we can obtain informative confidence regions around different estimators, including the commonly used 2SLS estimator.

The foregoing results are developed under the assumption of normal reduced-form errors with known covariance matrix. The finite-sample theory is extended to nonnormal errors with unknown covariance matrix at the cost of introducing asymptotic approximations. Under weak instrumental variable (WIV) asymptotics, the exact distributional results extend in large samples to feasible versions of the proposed tests. The finite-sample Gaussian power envelopes are also the asymptotic Gaussian power envelopes with unknown covariance matrix. Under strong-IV (SIV) asymptotics, we derive consistency even when errors are nonnormal and asymptotic efficiency (AE) when errors are normal.1

This paper is organized as follows. Section 2 introduces the model with one endogenous regressor variable, multiple exogenous regressor variables, and multiple IVs.
This section determines sufficient statistics for this model with normal errors and known reduced-form covariance matrix. Section 3 introduces one-sided invariant similar tests. Section 4 focuses on the one-sided conditional t-tests. Section 5 finds the power envelope for invariant similar one-sided tests. Section 6 adjusts the tests to allow for an estimated error covariance matrix. Section 7 obtains consistency and asymptotic efficiency for one-sided tests. Section 8 compares the power of the tests considered in earlier sections under WIV asymptotics and provides numerical evidence that the conditional t-tests have correct size. Section 9 introduces novel unbiased two-sided t-tests. An appendix contains proofs of the results. The supplement derives the nonsimilar power envelope, which is numerically very close to the similar power envelope (this fact further strengthens our optimality results); studies asymptotic properties under weak IVs; shows that the conditional t-tests are asymptotically similar in a uniform sense under general assumptions; presents power comparisons for different one-sided and two-sided tests; and obtains confidence intervals for returns to schooling using the data of Angrist and Krueger (1991).

1 In principle, we could follow Cattaneo, Crump, and Jansson (2012) to obtain efficient one-sided tests when errors are nonnormal, but we do not pursue this line of research here.

2 Model and Sufficient Statistics

The model consists of a structural equation and a reduced-form equation:

$$y_1 = y_2\beta + X\gamma_1 + u, \qquad y_2 = \tilde{Z}\pi + X\xi_1 + v_2, \qquad (2.1)$$

where $y_1, y_2 \in R^n$, $X \in R^{n\times p}$, and $\tilde{Z} \in R^{n\times k}$ are observed variables; $u, v_2 \in R^n$ are unobserved errors; and $\beta \in R$, $\gamma_1, \xi_1 \in R^p$, and $\pi \in R^k$ are unknown parameters. The matrices $X$ and $\tilde{Z}$ are taken to be fixed (i.e., non-stochastic) and $\bar{Z} = [X : \tilde{Z}]$ has full column rank $p + k$. We are interested in the one-sided hypothesis testing problem

$$H_0: \beta = \beta_0 \ (\text{or } H_0: \beta \le \beta_0) \ \text{ against } \ H_1: \beta > \beta_0. \qquad (2.2)$$

It is convenient to transform the IV matrix $\tilde{Z}$ into a matrix $Z$ which is orthogonal to $X$: $Z'X = 0$. We choose $Z = M_X\tilde{Z}$ as the transformed IV matrix, where $M_A = I - P_A$ and $P_A = A(A'A)^{-1}A'$ for any full column rank matrix $A$. The reduced-form model for $Y = [y_1 : y_2]$ can be written in matrix notation as

$$Y = Z\pi a' + X\eta + V, \qquad (2.3)$$

where $a = (\beta, 1)'$, $\eta = [\gamma : \xi]$, $\gamma = \gamma_1 + \xi\beta$, and $\xi = \xi_1 + (X'X)^{-1}X'\tilde{Z}\pi$.

The reduced-form errors $V = [v_1 : v_2] = [u + v_2\beta : v_2]$ are assumed to be iid across rows. To obtain exact optimal tests, we assume that each row has a mean zero bivariate normal distribution with known $2 \times 2$ nonsingular covariance matrix $\Omega = [\omega_{ij}]$. As shown below, this assumption can be relaxed when asymptotic approximations are considered.

The probability model for (2.3) is a member of the curved exponential family, and low dimensional sufficient statistics are available. Lemma 1 of AMS06a shows that $X'Y$ and $Z'Y$ are independent and sufficient for $(\gamma', \xi')'$ and $(\beta, \pi')'$, respectively. Standard sufficiency arguments show that we can focus on tests based on $Z'Y$. As shown by Moreira (2003), we can apply a one-to-one transformation to $Z'Y$ that yields the $k \times 2$ sufficient statistic $[S : T]$, where

$$S = (Z'Z)^{-1/2}Z'Yb_0 \cdot (b_0'\Omega b_0)^{-1/2} \ \text{ and } \ T = (Z'Z)^{-1/2}Z'Y\Omega^{-1}a_0 \cdot (a_0'\Omega^{-1}a_0)^{-1/2}, \qquad (2.4)$$

where $b_0 = (1, -\beta_0)'$ and $a_0 = (\beta_0, 1)'$.
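As a minimal sketch of the transformation in (2.4), the following NumPy snippet computes $S$ and $T$ from simulated data. The helper names `sym_inv_sqrt` and `sufficient_stats`, the omission of $X$, and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def sym_inv_sqrt(A):
    """Inverse of the symmetric square root of a positive definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U / np.sqrt(w)) @ U.T          # U diag(w^{-1/2}) U'

def sufficient_stats(Y, Z, Omega, beta0):
    """Return (S, T) as in (2.4); Z is assumed to be already orthogonal to X."""
    b0 = np.array([1.0, -beta0])
    a0 = np.array([beta0, 1.0])
    ZY = sym_inv_sqrt(Z.T @ Z) @ (Z.T @ Y)  # k x 2 matrix (Z'Z)^{-1/2} Z'Y
    Oinv_a0 = np.linalg.solve(Omega, a0)    # Omega^{-1} a0
    S = ZY @ b0 / np.sqrt(b0 @ Omega @ b0)
    T = ZY @ Oinv_a0 / np.sqrt(a0 @ Oinv_a0)
    return S, T

# Simulated example (hypothetical design, no exogenous regressors X).
rng = np.random.default_rng(0)
n, k, beta, beta0 = 500, 4, 0.5, 0.5
Omega = np.array([[1.0, 0.3], [0.3, 1.0]])
Z = rng.standard_normal((n, k))
pi = np.full(k, 0.1)
V = rng.multivariate_normal(np.zeros(2), Omega, size=n)
Y = np.outer(Z @ pi, [beta, 1.0]) + V       # Y = Z pi a' + V
S, T = sufficient_stats(Y, Z, Omega, beta0)
```

The symmetric square root is taken through an eigendecomposition, in line with the convention stated in the text that the matrix square root is the symmetric one.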

Henceforth, the matrix square root is the (unique) symmetric square root. The distribution of the sufficient statistic $[S : T]$ is multivariate normal,

$$\mathrm{vec}[S : T] \sim N(h_\beta \otimes \mu_\pi, I_{2k}), \qquad (2.5)$$

with first moment depending on the following quantities:

$$\mu_\pi = (Z'Z)^{1/2}\pi \in R^k \ \text{ and } \ h_\beta = (c_\beta, d_\beta)' \in R^2, \ \text{ where } \ c_\beta = (\beta - \beta_0)\cdot(b_0'\Omega b_0)^{-1/2} \ \text{ and } \ d_\beta = a'\Omega^{-1}a_0\cdot(a_0'\Omega^{-1}a_0)^{-1/2}. \qquad (2.6)$$

3 Invariant Similar Tests

Most tests in the IV literature do not depend on the coordinate system used for the instruments. The only exceptions to our knowledge are tests that exclude specific instruments, cf. Donald and Newey (2001). We follow AMS06 and Chamberlain (2007), and consider tests which are invariant to orthogonal transformations on the sufficient statistic $[S : T]$.2 By Theorem 6.2.1 of Lehmann and Romano (2005, p. 214) and Theorem 1 of AMS06a, every invariant test can be written as a function of

$$Q = [S : T]'[S : T] = \begin{bmatrix} S'S & S'T \\ T'S & T'T \end{bmatrix} = \begin{bmatrix} Q_S & Q_{ST} \\ Q_{ST} & Q_T \end{bmatrix}. \qquad (3.1)$$

Henceforth, we use $Q$ and $(Q_1, Q_T) = (Q_S, Q_{ST}, Q_T)$ interchangeably. The statistic $Q = (Q_1, Q_T)$ has a Wishart distribution with rank one that depends on

$$\xi_\beta(q) = h_\beta' q h_\beta = c_\beta^2 q_S + 2c_\beta d_\beta q_{ST} + d_\beta^2 q_T, \ \text{ where } \ q = \begin{bmatrix} q_S & q_{ST} \\ q_{ST} & q_T \end{bmatrix} \in R^{2\times 2}. \qquad (3.2)$$

2 Moreira (2009a) shows that the group of transformations on $[S : T]$ is isomorphic to a group of transformations on the original data $Y$ without the explanatory variables $X$. We can accommodate $X$ by allowing a translation subgroup on $Y$.

Note that $\xi_\beta(q) \ge 0$ because $q$ is positive semi-definite a.s. The density of $Q$ evaluated at $(q_1, q_T) = (q_S, q_{ST}, q_T)$ is given by

$$f_{Q_1,Q_T}(q_1, q_T; \beta, \lambda) = K_1 \exp(-\lambda(c_\beta^2 + d_\beta^2)/2)\det(q)^{(k-3)/2} \exp(-(q_S + q_T)/2)\,(\lambda\xi_\beta(q))^{-(k-2)/4}\, I_{(k-2)/2}\big(\sqrt{\lambda\xi_\beta(q)}\big), \qquad (3.3)$$

where $K_1$ is a constant, $I_\nu(\cdot)$ denotes the modified Bessel function of the first kind of order $\nu$, and

$$\lambda = \pi'Z'Z\pi \ge 0. \qquad (3.4)$$

Examples of invariant test statistics are the Anderson and Rubin (1949), Lagrange Multiplier (also known as score), and likelihood ratio statistics:

$$AR = Q_S/k, \quad LM = Q_{ST}^2/Q_T, \ \text{ and } \ LR = \frac{1}{2}\Big(Q_S - Q_T + \sqrt{(Q_S - Q_T)^2 + 4Q_{ST}^2}\Big). \qquad (3.5)$$
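The statistics in (3.1) and (3.5) are simple functions of $S$ and $T$. The sketch below computes them for arbitrary illustrative vectors; the function name `invariant_stats` and the inputs are assumptions made for this example only.

```python
import numpy as np

def invariant_stats(S, T):
    """Compute Q_S, Q_ST, Q_T from (3.1) and AR, LM, LR from (3.5)."""
    k = S.shape[0]
    QS, QST, QT = S @ S, S @ T, T @ T
    AR = QS / k
    LM = QST**2 / QT
    LR = 0.5 * (QS - QT + np.sqrt((QS - QT) ** 2 + 4.0 * QST**2))
    return {"QS": QS, "QST": QST, "QT": QT, "AR": AR, "LM": LM, "LR": LR}

# Illustrative inputs (not from the paper): k = 5 instruments.
rng = np.random.default_rng(0)
S = rng.standard_normal(5)
T = rng.standard_normal(5) + 3.0
stats = invariant_stats(S, T)

# LR equals Q_S minus the smallest eigenvalue of the 2 x 2 matrix Q.
Q = np.array([[stats["QS"], stats["QST"]], [stats["QST"], stats["QT"]]])
lr_from_eig = stats["QS"] - np.linalg.eigvalsh(Q)[0]
```

The eigenvalue identity in the last lines follows directly from the closed form of the roots of a symmetric $2 \times 2$ matrix and gives a quick consistency check on the LR formula.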

When the concentration parameter $\lambda/(\omega_{22}\cdot k)$ is small, most test statistics are not approximately distributed normal or chi-square. For example, under the weak instrument asymptotics of Staiger and Stock (1997) where $\pi = C/\sqrt{n}$, the LR statistic is not asymptotically pivotal. Its asymptotic distribution is nonstandard and depends on the nuisance concentration parameter $\lambda/(\omega_{22}\cdot k)$ under the null. Consequently, the null rejection probability of the standard likelihood ratio test depends on the concentration parameter.

Hereinafter, it is convenient to work with the statistics

$$Q_{(k-1)} = S'M_TS, \quad LM1 = Q_{ST}/Q_T^{1/2}, \ \text{ and } \ Q_T, \qquad (3.6)$$

which are a one-to-one transformation of $Q$. We note that $(Q_{(k-1)}, LM1)$ is independent of $Q_T$ and has a nuisance-parameter free distribution when $\beta = \beta_0$. Moreira (2003) proposes similar tests which reject the null hypothesis when the test statistic $\psi$ exceeds a critical value that depends on $Q_T$:

$$\psi(Q_{(k-1)}, LM1, Q_T) > \kappa_{\psi,\alpha}(Q_T), \qquad (3.7)$$

where the critical value function is

$$\kappa_{\psi,\alpha}(q_T) \equiv \inf\{x \in R;\ P_{\beta_0}(\psi(Q_{(k-1)}, LM1, q_T) > x) \le \alpha\}. \qquad (3.8)$$
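The critical value function in (3.8) has no closed form in general, but it can be approximated by simulation because, under $\beta = \beta_0$, $Q_{(k-1)}$ is chi-square with $k-1$ degrees of freedom and $LM1$ is standard normal, independently of $Q_T$. The sketch below does this for the LR statistic; the function names, the number of draws, and the seed are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def lr_stat(QS, QST, QT):
    """LR statistic from (3.5)."""
    return 0.5 * (QS - QT + np.sqrt((QS - QT) ** 2 + 4.0 * QST**2))

def clr_psi(Qk1, LM1, qT):
    """LR written in terms of (Q_(k-1), LM1, q_T) using (3.6)."""
    QS = Qk1 + LM1**2               # Q_S = Q_(k-1) + LM1^2
    QST = LM1 * np.sqrt(qT)         # Q_ST = LM1 * q_T^{1/2}
    return lr_stat(QS, QST, qT)

def critical_value(psi, qT, k, alpha=0.05, draws=200_000, seed=0):
    """Monte Carlo approximation to kappa_{psi,alpha}(q_T) in (3.8)."""
    rng = np.random.default_rng(seed)
    Qk1 = rng.chisquare(k - 1, size=draws)
    LM1 = rng.standard_normal(draws)
    return np.quantile(psi(Qk1, LM1, qT), 1.0 - alpha)

cv_weak = critical_value(clr_psi, qT=0.0, k=5)     # LR reduces to Q_S here
cv_strong = critical_value(clr_psi, qT=1e6, k=5)   # LR is close to LM1^2 here
```

At $q_T = 0$ the LR statistic equals $Q_S$, so the simulated value should be near the chi-square($k$) quantile, while for very large $q_T$ it should be near the chi-square(1) quantile; both limits serve as sanity checks on the simulation.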

Here, we omit the dependence of the test statistic $\psi$ and its conditional quantile $\kappa_{\psi,\alpha}(q_T)$ on $\Omega$ for convenience. If the distribution of $\psi$ conditional on $q_T$ is continuous, we obtain

$$P_{\beta_0}\big(\psi(Q_{(k-1)}, LM1, q_T) > \kappa_{\psi,\alpha}(q_T)\big) \equiv \alpha. \qquad (3.9)$$

The conditional test based on the statistic $\psi$ is denoted C$\psi$, e.g., CLR is the conditional test based on the LR statistic. The conditional test has rejection probability at $\beta = \beta_0$ smaller than or equal to $\alpha$. If the conditional distribution is continuous, the test is exactly similar at $\beta = \beta_0$; see Moreira (2009b) for further results on similar tests in IV regression.

In practice, we approximate the critical value function $\kappa_{\psi,\alpha}(Q_T)$ of the test given in (3.7) by numerical simulations of the conditional distribution of $(Q_{(k-1)}, LM1)$ when $\beta = \beta_0$. The approximation to $\kappa_{\psi,\alpha}(q_T)$ is the $1 - \alpha$ sample quantile of $\{\psi(Q^i_{k-1}, LM1^i, q_T) : i = 1, ..., I\}$, where $Q^i_{k-1}$ and $LM1^i$ are $i$-th draws from a chi-square distribution with $k - 1$ degrees of freedom and a standard normal distribution.

We now introduce several new one-sided invariant similar tests for testing $H_0: \beta = \beta_0$ (or $H_0: \beta \le \beta_0$) against $H_1: \beta > \beta_0$. Each similar test rejects the null when the one-sided statistic $\psi$ is larger than the critical value function $\kappa_{\psi,\alpha}$. We briefly present each one-sided statistic $\psi$ now.

The k-class estimators $\hat\beta(\mathrm{k})$ yield one-sided t-statistics:3

$$t(\mathrm{k}) = \frac{\hat\beta(\mathrm{k}) - \beta_0}{\hat\sigma_u(\mathrm{k})\,[y_2'P_Zy_2 + n(1 - \mathrm{k})\,\omega_{22}]^{-1/2}}, \ \text{ where}$$

$$\hat\beta(\mathrm{k}) = \frac{y_2'P_Zy_1 + n(1 - \mathrm{k})\,\omega_{12}}{y_2'P_Zy_2 + n(1 - \mathrm{k})\,\omega_{22}} \ \text{ and } \ \hat\sigma_u^2(\mathrm{k}) = (1, -\hat\beta(\mathrm{k}))\,\Omega\,(1, -\hat\beta(\mathrm{k}))'. \qquad (3.10)$$

The nonstandard formula for the t-statistics in (3.10) arises here because we take $\Omega$ as known (for present purposes only). The commonly used 2SLS estimator, the limited information maximum likelihood for known $\Omega$ (LIMLK) estimator, the bias-adjusted (B2SLS) estimator proposed by Nagar (1959), and the estimator proposed by Fuller (1977) belong to the k-class:

$$\begin{array}{ll} \text{2SLS:} & \mathrm{k} = 1, \\ \text{LIMLK:} & \mathrm{k} = \mathrm{k}_{LIMLK} = \text{smallest root } \kappa \text{ of } |(Y'P_ZY/n + \Omega) - \kappa\Omega| = 0, \\ \text{B2SLS:} & \mathrm{k} = 1 + (k - 2)/n, \\ \text{Fuller:} & \mathrm{k} = \mathrm{k}_{LIMLK} - 1/n. \end{array} \qquad (3.11)$$

3 To avoid confusion with the number $k$ of exogenous variables, we use $\mathrm{k}$ (upright) to define Theil's class of estimators rather than the more traditional $k$.

The finite-sample properties of the estimators $\hat\beta(\mathrm{k})$ depend on $\mathrm{k}$. Consequently, the behavior of the $t(\mathrm{k})$ statistics can be sensitive to the choice of $\mathrm{k}$.

We construct two statistics from the likelihood of the model given in (2.3) with $\Omega$ known. The first statistic is based on the standard LR statistic (i.e., $-2$ times the logarithm of the likelihood ratio) for testing $H_0: \beta = \beta_0$:

$$LR1 = 2\Big[\sup_{\beta \ge \beta_0} l_c(Y; \beta, \Omega) - l_c(Y; \beta_0, \Omega)\Big] = R(\beta_0) - \inf_{\beta \ge \beta_0} R(\beta), \ \text{ where } \ R(\beta) = \frac{b'Y'P_ZYb}{b'\Omega b} \ \text{ with } \ b = (1, -\beta)', \qquad (3.12)$$

and $l_c(Y; \beta, \Omega)$ is the log-likelihood function for known $\Omega$ with all parameters concentrated out except $\beta$. In the Appendix, we show that $R(\beta)$ and $LR1$ depend on the observations only through $Q$ defined in (3.1) and

$$LR1 = LR \times 1(\hat\beta(\mathrm{k}_{LIMLK}) \ge \beta_0) + \max\{0, R(\beta_0) - R(\infty)\} \times 1(\hat\beta(\mathrm{k}_{LIMLK}) < \beta_0), \qquad (3.13)$$

where $1(\cdot)$ is an indicator function and $R(\infty) = \lim_{\beta\to\infty} R(\beta)$ (hence, $R(\infty)$ equals $R(\beta)$ with $b$ replaced by $(0, -1)'$). We show later that the CLR1 test's power function $P_{\beta,\lambda}(LR1 > \kappa_{LR1,\alpha}(Q_T))$ is not monotonic for $\beta < \beta_0$. Furthermore, the CLR1 test will not have correct size when the null hypothesis is $H_0: \beta \le \beta_0$.

The second statistic is a standard LR statistic for testing $H_0: \beta \le \beta_0$:

$$MLR1 = 2\Big[\sup_{\beta} l_c(Y; \beta, \Omega) - \sup_{\beta \le \beta_0} l_c(Y; \beta, \Omega)\Big] = \inf_{\beta \le \beta_0} R(\beta) - R(\hat\beta(\mathrm{k}_{LIMLK})). \qquad (3.14)$$

In the Appendix, we show that

$$MLR1 = [LR - \max\{0, R(\beta_0) - R(\infty)\}] \times 1(\hat\beta(\mathrm{k}_{LIMLK}) \ge \beta_0) = LR1 - \max\{0, R(\beta_0) - R(\infty)\}. \qquad (3.15)$$

(For $H_1: \beta < \beta_0$, the inequalities in (3.13) and (3.15) are reversed.)
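The closed forms (3.13) and (3.15) can be checked numerically against the definitions (3.12) and (3.14). The sketch below takes the $2 \times 2$ matrix $M = Y'P_ZY$ as given, obtains the LIMLK estimate from a generalized eigenproblem, and compares the closed forms with a brute-force grid search over $\beta$. The matrix `M`, the value of `Omega`, the grid, and the function names are made-up illustrations.

```python
import numpy as np

def R(beta, M, Omega):
    """R(beta) = b'Mb / b'Omega b with b = (1, -beta)'; works on arrays too."""
    num = M[0, 0] - 2.0 * beta * M[0, 1] + beta**2 * M[1, 1]
    den = Omega[0, 0] - 2.0 * beta * Omega[0, 1] + beta**2 * Omega[1, 1]
    return num / den

def one_sided_lr(M, Omega, beta0):
    """Return (LR1, MLR1, beta_LIMLK) using (3.13) and (3.15)."""
    w, U = np.linalg.eigh(Omega)
    O_inv_half = (U / np.sqrt(w)) @ U.T
    lam, vec = np.linalg.eigh(O_inv_half @ M @ O_inv_half)
    b = O_inv_half @ vec[:, 0]            # minimiser of b'Mb / b'Omega b
    beta_liml = -b[1] / b[0]
    R0 = R(beta0, M, Omega)
    excess = max(0.0, R0 - M[1, 1] / Omega[1, 1])   # max{0, R(beta0) - R(inf)}
    LR = R0 - lam[0]                      # R(beta0) - inf over all beta
    LR1 = LR if beta_liml >= beta0 else excess
    return LR1, LR1 - excess, beta_liml

# Brute-force comparison on a hypothetical M = Y'P_Z Y.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))
M = A.T @ A
Omega = np.array([[1.0, 0.4], [0.4, 2.0]])
theta = np.linspace(0.0, np.pi / 2, 200_001)[:-1]   # beta0 + tan(theta) covers [beta0, inf)
R_inf = M[1, 1] / Omega[1, 1]
max_err = 0.0
for beta0 in (-2.0, -0.5, 0.0, 0.7, 3.0):
    LR1, MLR1, beta_liml = one_sided_lr(M, Omega, beta0)
    inf_right = min(R(beta0 + np.tan(theta), M, Omega).min(), R_inf)
    inf_left = min(R(beta0 - np.tan(theta), M, Omega).min(), R_inf)
    brute_LR1 = R(beta0, M, Omega) - inf_right
    brute_MLR1 = inf_left - R(beta_liml, M, Omega)
    max_err = max(max_err, abs(LR1 - brute_LR1), abs(MLR1 - brute_MLR1))
```

The grid is spaced in the angle `theta` rather than in $\beta$ so that arbitrarily large values of $\beta$ are covered with a finite number of points; the limit $R(\infty)$ is added explicitly to each infimum.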

4 The Conditional t-Tests

We now elaborate more detailed expressions for the conditional t-tests. It is convenient to write

$$[S : T] = (Z'Z)^{-1/2}Z'Y\Omega^{-1/2}J, \qquad (4.1)$$

where $J$ is the orthogonal matrix

$$J = \left[\frac{\Omega^{1/2}b_0}{\sqrt{b_0'\Omega b_0}} : \frac{\Omega^{-1/2}a_0}{\sqrt{a_0'\Omega^{-1}a_0}}\right]. \qquad (4.2)$$

From expressions (4.2) and (4.1), we obtain

$$Y'P_ZY = \Omega^{1/2}JQJ'\Omega^{1/2}, \ \text{ where } \ \Omega^{1/2}J = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}. \qquad (4.3)$$

We show that the t-statistics are then given by

$$t(\mathrm{k}) = \frac{\hat\beta(\mathrm{k}) - \beta_0}{\hat\sigma_u(\mathrm{k})\,[c_{21}^2Q_S + 2c_{21}c_{22}Q_{ST} + c_{22}^2Q_T + n(1 - \mathrm{k})\,\omega_{22}]^{-1/2}}, \ \text{ where}$$

$$\hat\beta(\mathrm{k}) = \frac{c_{11}c_{21}Q_S + (c_{12}c_{21} + c_{11}c_{22})Q_{ST} + c_{12}c_{22}Q_T + n(1 - \mathrm{k})\,\omega_{12}}{c_{21}^2Q_S + 2c_{21}c_{22}Q_{ST} + c_{22}^2Q_T + n(1 - \mathrm{k})\,\omega_{22}}. \qquad (4.4)$$

The term $n(1 - \mathrm{k})$ simplifies for each estimator considered in (3.11). Algebraic manipulations show that

$$\begin{array}{ll} \text{2SLS:} & n(1 - \mathrm{k}) = 0, \\ \text{LIMLK:} & n(1 - \mathrm{k}_{LIMLK}) = LR - Q_S, \\ \text{B2SLS:} & n(1 - \mathrm{k}) = 2 - k, \\ \text{Fuller:} & n(1 - \mathrm{k}) = n(1 - \mathrm{k}_{LIMLK}) + 1. \end{array} \qquad (4.5)$$

By writing $Q_S = Q_{(k-1)} + LM = Q_{(k-1)} + LM1^2$ and $Q_{ST} = LM1 \cdot Q_T^{1/2}$, we can find the critical value function for each t-statistic as in expression (3.8). For example, the conditional null distribution of the t-statistic based on the 2SLS estimator becomes

$$t(1) = \frac{\hat\beta(1) - \beta_0}{\hat\sigma_u(1)\,\big[c_{21}^2(Q_{(k-1)} + LM1^2) + 2c_{21}c_{22}\,LM1 \cdot q_T^{1/2} + c_{22}^2q_T\big]^{-1/2}}, \ \text{ where}$$

$$\hat\beta(1) = \frac{c_{11}c_{21}(Q_{(k-1)} + LM1^2) + (c_{12}c_{21} + c_{11}c_{22})\,LM1 \cdot q_T^{1/2} + c_{12}c_{22}q_T}{c_{21}^2(Q_{(k-1)} + LM1^2) + 2c_{21}c_{22}\,LM1 \cdot q_T^{1/2} + c_{22}^2q_T}, \qquad (4.6)$$

where $Q_{(k-1)}$ has a chi-square distribution with $k - 1$ degrees of freedom and $LM1$ has a standard normal distribution.

A closed-form expression for conditional quantiles of t-statistics is unknown. However, we can approximate the critical value functions by numerical simulations for each fixed value $q_T$. The next lemma also finds the limit for the critical value functions when $q_T \to \infty$.
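As a hedged illustration of how the critical value function of the conditional 2SLS t-test might be simulated from (4.6), the sketch below builds $\Omega^{1/2}J$ from (4.2) and (4.3) and draws $(Q_{(k-1)}, LM1)$ from their null distribution. The function names, the choice of $\Omega$, $\beta_0$, $k$, the number of draws, and the seed are assumptions for this example only.

```python
import numpy as np

def c_matrix(Omega, beta0):
    """Omega^{1/2} J from (4.3). By (4.2) its columns simplify to
    Omega b0 / sqrt(b0' Omega b0) and a0 / sqrt(a0' Omega^{-1} a0)."""
    b0 = np.array([1.0, -beta0])
    a0 = np.array([beta0, 1.0])
    col1 = Omega @ b0 / np.sqrt(b0 @ Omega @ b0)
    col2 = a0 / np.sqrt(a0 @ np.linalg.solve(Omega, a0))
    return np.column_stack([col1, col2])

def t_2sls(Qk1, LM1, qT, Omega, beta0):
    """2SLS t-statistic written as in (4.6)."""
    (c11, c12), (c21, c22) = c_matrix(Omega, beta0)
    QS = Qk1 + LM1**2
    QST = LM1 * np.sqrt(qT)
    den = c21**2 * QS + 2.0 * c21 * c22 * QST + c22**2 * qT
    num = c11 * c21 * QS + (c12 * c21 + c11 * c22) * QST + c12 * c22 * qT
    beta_hat = num / den
    sigma2 = Omega[0, 0] - 2.0 * beta_hat * Omega[0, 1] + beta_hat**2 * Omega[1, 1]
    return (beta_hat - beta0) * np.sqrt(den) / np.sqrt(sigma2)

def crit_t_2sls(qT, k, Omega, beta0, alpha=0.05, draws=200_000, seed=0):
    """Monte Carlo approximation to the critical value function in (3.8)."""
    rng = np.random.default_rng(seed)
    Qk1 = rng.chisquare(k - 1, size=draws)
    LM1 = rng.standard_normal(draws)
    return np.quantile(t_2sls(Qk1, LM1, qT, Omega, beta0), 1.0 - alpha)

Omega = np.array([[1.0, 0.5], [0.5, 1.0]])
cv_weak = crit_t_2sls(qT=1.0, k=5, Omega=Omega, beta0=0.0)
cv_strong = crit_t_2sls(qT=1e6, k=5, Omega=Omega, beta0=0.0)
```

For very large $q_T$ the simulated critical value should be close to the standard normal quantile, which is a convenient check on the implementation; for small $q_T$ it can differ noticeably from that quantile.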

Lemma 1 Let $z_\alpha$ be the $1 - \alpha$ quantile of the standard normal distribution. For any $\mathrm{k}$ in expression (3.11), $\kappa_{t(\mathrm{k}),\alpha}(q_T) \to z_\alpha$ as $q_T \to \infty$.

Comments:

1. A standard t-test rejects the null when the t-statistic $t(\mathrm{k})$ is larger than $z_\alpha$. Hence, replacing $z_\alpha$ by the critical value function $\kappa_{t(\mathrm{k}),\alpha}(q_T)$ is innocuous for large $q_T$.

2. Analogously, we can show that $\kappa_{LR1,\alpha}(q_T)$ and $\kappa_{MLR1,\alpha}(q_T)$ converge to $z_\alpha$ provided the significance level $\alpha \in (0, 1/2)$.

5 Power Envelopes

In this section, we address the question of optimal invariant similar tests when the IVs may be weak. To evaluate the performance of the novel one-sided conditional tests, we derive a power envelope for similar tests. The use of sufficiency and invariance reduces the dimension of the parameters from $1 + k + 2p$ for $\theta = (\beta, \pi', \xi', \gamma')'$ to just 2 for $(\beta, \lambda)'$. The dimension reduction allows the power envelope to meaningfully assess the performance of our one-sided tests. The envelope we derive here consists of an upper bound for power and a lower bound for null rejection probabilities, for either $H_0: \beta = \beta_0$ or $H_0: \beta \le \beta_0$. The following theorem is the main result of this section:

Theorem 1 Define the statistic

$$LR_{\beta^*\lambda^*}(Q_1, Q_T) = \frac{f_{Q_1,Q_T}(q_1, q_T; \beta^*, \lambda^*)}{f_{Q_T}(q_T; \beta^*, \lambda^*)\,f_{Q_1|Q_T}(q_1|q_T; \beta_0)} = \frac{\phi_1(q_1, q_T; \beta^*, \lambda^*)}{\phi_2(q_T; \beta^*, \lambda^*)}, \qquad (5.1)$$

where

$$\phi_1(q_1, q_T; \beta, \lambda) = \exp(-\lambda c_\beta^2/2)\,(\lambda\xi_\beta(q))^{-(k-2)/4}\,I_{(k-2)/2}\big(\sqrt{\lambda\xi_\beta(q)}\big) \ \text{ and } \ \phi_2(q_T; \beta, \lambda) = (\lambda d_\beta^2q_T)^{-(k-2)/4}\,I_{(k-2)/2}\big(\sqrt{\lambda d_\beta^2q_T}\big). \qquad (5.2)$$

Let $\kappa_{\beta^*\lambda^*,\alpha}(Q_T)$ be a shorthand for $\kappa_{LR_{\beta^*\lambda^*},\alpha}(Q_T)$. Then the following hold:

(a) For $(\beta^*, \lambda^*)$ with $\beta^* > \beta_0$, the test that rejects $H_0: \beta = \beta_0$ when $LR_{\beta^*\lambda^*}(Q_1, Q_T) > \kappa_{\beta^*\lambda^*,\alpha}(Q_T)$ maximizes power over all level $\alpha$ invariant similar tests.

(b) For $(\beta^*, \lambda^*)$ with $\beta^* < \beta_0$, the test that rejects $H_0: \beta = \beta_0$ when $LR_{\beta^*\lambda^*}(Q_1, Q_T) < \kappa_{\beta^*\lambda^*,1-\alpha}(Q_T)$ minimizes the null rejection probability over all level $\alpha$ invariant similar tests.

Comments:

1. We denote the test that rejects the null when $LR_{\beta^*\lambda^*}(Q_1, Q_T) > \kappa_{\beta^*\lambda^*,\alpha}(Q_T)$ as a point-optimal invariant similar (POIS) test. We determine the power upper bound by considering the POIS tests for arbitrary values $(\beta^*, \lambda^*)$ when $\beta^* > \beta_0$. The power upper bound is for similar tests when the null hypothesis is $H_0: \beta = \beta_0$. We do not impose the additional constraint that tests must have correct size, and so the upper bound could be conservative for $H_0: \beta \le \beta_0$. We shall see later that even for small values of $\lambda$, some tests for $H_0: \beta \le \beta_0$ do reach the upper bound.

2. The test which rejects the null when $LR_{\beta^*\lambda^*}(Q_1, Q_T) < \kappa_{\beta^*\lambda^*,1-\alpha}(Q_T)$ is called the POIS0 test. We determine the null lower bound by finding the power of POIS0 tests for arbitrary values $(\beta^*, \lambda^*)$ when $\beta^* < \beta_0$.

3. The power envelope is the union of the power upper bound and null lower bound. Both bounds are relevant because we would like to compare the probability of making type I and type II errors for different tests.

4. The denominator $\phi_2(q_T; \beta^*, \lambda^*)$ does not depend on $q_1$ and can be absorbed into the conditional critical value. Thus, the test based on $LR_{\beta^*\lambda^*}(Q_1, Q_T)$ is equivalent to a test based on the numerator $\phi_1(q_1, q_T; \beta^*, \lambda^*)$. For reasons of numerical stability, however, we recommend constructing critical values using $\ln(LR_{\beta^*\lambda^*}(Q_1, Q_T))$.

We now show that such tests do not depend on $\lambda^*$, so that the POIS and POIS0 tests are of a relatively simple form. Using a series expansion of $I_{(k-2)/2}(x)$, we can write

$$\phi_1(q_1, q_T; \beta, \lambda) = 2^{-(k-2)/2}\exp(-\lambda c_\beta^2/2)\sum_{j=0}^{\infty}\frac{(\lambda\xi_\beta(q_1, q_T)/4)^j}{j!\,\Gamma((k-2)/2 + j + 1)}. \qquad (5.3)$$
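The series in (5.3) is easy to evaluate directly, which also makes its monotonicity in $\xi_\beta$ visible. The sketch below is one possible way to sum it using only the Python standard library; the function name `phi1`, the truncation at 200 terms, and the test values are illustrative assumptions, and the truncation is adequate only for moderate values of $\lambda\xi_\beta$.

```python
import math

def phi1(xi, lam, c_beta, k, terms=200):
    """phi_1 from (5.3), summing the series term by term."""
    nu = (k - 2) / 2.0
    x = lam * xi / 4.0
    total = 0.0
    term = 1.0 / math.gamma(nu + 1.0)            # j = 0 term
    for j in range(terms):
        total += term
        term *= x / ((j + 1) * (nu + j + 1))     # ratio of terms j+1 and j
    return 2.0 ** (-nu) * math.exp(-lam * c_beta**2 / 2.0) * total

# For k = 3 the Bessel function has a closed form, I_{1/2}(x) = sqrt(2/(pi x)) sinh(x),
# so with c_beta = 0 the series should equal sqrt(2/pi) * sinh(s) / s with s = sqrt(lam * xi).
lam, xi = 2.0, 3.0
s = math.sqrt(lam * xi)
series_value = phi1(xi, lam, 0.0, k=3)
closed_form = math.sqrt(2.0 / math.pi) * math.sinh(s) / s
```

Because every term of the sum is nonnegative and increasing in $\xi_\beta$, evaluating `phi1` on an increasing grid of `xi` values returns an increasing sequence.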

The term $\phi_2(q_T; \beta, \lambda)$ can be written analogously. The function $\phi_1(q_1, q_T; \beta, \lambda)$ is increasing in $\xi_\beta(q_1, q_T) \ge 0$. As a result, for a fixed value of $\beta$, say $\beta^* > \beta_0$, the optimal test for the fixed alternative $\beta^*$ rejects $H_0: \beta = \beta_0$ when

$$\xi_{\beta^*}(Q_1, Q_T) > \kappa_{\beta^*,\alpha}(Q_T), \qquad (5.4)$$

where $\kappa_{\beta^*,\alpha}(Q_T)$ is a shorthand for $\kappa_{\xi_{\beta^*},\alpha}(Q_T)$ as defined in (3.8). This POIS test is one-sided because it directs power at a single point $\beta^*$ that is greater than the null value $\beta_0$. An analogous argument shows that the POIS0 test that minimizes rejection probabilities for fixed $\beta^* < \beta_0$ rejects $H_0$ when

$$\xi_{\beta^*}(Q_1, Q_T) < \kappa_{\beta^*,1-\alpha}(Q_T). \qquad (5.5)$$

Corollary 1 For $\beta^* > \beta_0$, the POIS test based on $\xi_{\beta^*}(Q_1, Q_T)$ is the uniformly most powerful test among invariant similar tests against the alternative distributions indexed by $\{(\beta^*, \lambda) : \lambda > 0\}$. For $\beta^* < \beta_0$, the POIS0 test based on $\xi_{\beta^*}(Q_1, Q_T)$ uniformly minimizes the null rejection probability among invariant similar tests against the alternative distributions indexed by $\{(\beta^*, \lambda) : \lambda > 0\}$.

Comments:

1. The form of the POIS test depends on the alternative $\beta^*$. Hence, there does not exist a uniformly most powerful invariant (UMPI) test. Although the form of the POIS and POIS0 tests does not depend on $\lambda^*$, their power depends on the true value of $\lambda$. Hence, the power envelope depends on both parameters $\beta$ and $\lambda$.

2. A test based on $\xi_{\beta^*}(Q_1, Q_T)$ is equivalent to a test that rejects the null hypothesis when

$$POIS1_\delta = \frac{Q_S + \delta S_2\sqrt{Q_S} - k}{\sqrt{2k + \delta^2}} > \kappa_{\delta,\alpha}(Q_T), \ \text{ where } \ \delta = (2d_{\beta^*}/c_{\beta^*})\,Q_T^{1/2}, \quad S_2 = Q_{ST}/(||S|| \cdot ||T||), \qquad (5.6)$$

and $\kappa_{\delta,\alpha}(Q_T)$ is a shorthand for $\kappa_{POIS1_\delta,\alpha}(Q_T)$ defined in (3.8). This formulation of the test is convenient because $Q_S$, $S_2 = Q_{ST}/(||S|| \cdot ||T||)$, and $Q_T$ are independent under $\beta = \beta_0$, which simplifies the calculation of critical values.

3. Provided $\omega_{12} - \omega_{22}\beta_0 \ne 0$, the quantity $d_{\beta^*}$ is a linear function of $\beta^*$ and equals zero if and only if $\beta^* = \beta_{AR}$, where

$$\beta_{AR} = \frac{\omega_{11} - \omega_{12}\beta_0}{\omega_{12} - \omega_{22}\beta_0}. \qquad (5.7)$$

In this case, $\delta = 0$ and $POIS1_\delta$ reduces to $(Q_S - k)/\sqrt{2k}$, which is the AR statistic recentered and rescaled. Hence, the AR test, usually conceived as a two-sided test, is one-sided POIS against the alternative $\beta = \beta_{AR}$. This finding is in agreement with Chernozhukov, Hansen, and Jansson (2009), who use completeness of $Q_T$ to show that the weighted average power likelihood ratio (WAP-LR) tests of Andrews, Moreira, and Stock (2004) are admissible; see Moreira and Moreira (2013) on admissible WAP-LR similar tests without completeness.

4. The locally most powerful invariant (LMPI) test is the POIS test for $\beta^*$ local to $\beta_0$ with $\beta^* > \beta_0$. This test is equivalent to the one-sided LM test that rejects $H_0$ if

$$LM1 = Q_{ST}/Q_T^{1/2} > z_\alpha, \qquad (5.8)$$

where $z_\alpha$ is the $1 - \alpha$ quantile of the standard normal distribution. Analogously, if $\beta^*$ is local to $\beta_0$ with $\beta^* < \beta_0$, then the LMPI test rejects $H_0$ if $-Q_{ST}/Q_T^{1/2} > z_\alpha$.

5. The sign of $\delta$ in (5.6) can change as $\beta^*$ changes even for $\beta^*$ values on the same side of the null hypothesis because $d_{\beta^*}$ is a linear function of $\beta^*$. As a result, the form of the $POIS1_\delta$ statistic (and the power envelope) changes dramatically as $\beta^*$ varies. The constant $\delta$ determines the weight put on the statistic $S_2$. The optimal value of $\delta$ for small values of $\beta > \beta_0$ has the wrong sign for large values of $\beta$ and vice versa. This fact has adverse consequences for the overall one-sided power properties of POIS tests.

6. The optimal one-sided test for $\beta^*$ arbitrarily large rejects $H_0$ if

$$Q_S + 2(\det(\Omega))^{-1/2}(\beta_0\omega_{22} - \omega_{12})\,Q_{ST} > \kappa_{\infty,\alpha}(Q_T) \qquad (5.9)$$

for $\kappa_{\infty,\alpha}(\cdot)$ as defined in (3.8). Remarkably, the same test is the optimal one-sided test for $\beta^*$ negative and arbitrarily large in absolute value for any $\lambda^*$. Consequently, the optimal two-sided test for $|\beta^* - \beta_0|$ arbitrarily large is the test in (5.9).

Corollary 1 shows that the POIS test for an alternative $(\beta^*, \lambda^*)$ depends only on $\beta^*$. Because the true parameter $\beta$ is unknown, it is natural to assess the performance of feasible one-sided conditional t-tests (or likelihood ratio tests) using the power envelope.
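The quantities that drive the POIS statistic, namely $c_\beta$ and $d_\beta$ from (2.6), $\xi_\beta$ from (3.2), and $\beta_{AR}$ from (5.7), can be computed in a few lines. The sketch below also checks numerically that $d_\beta$ vanishes at $\beta_{AR}$ and that $d_\beta/c_\beta$ approaches the coefficient appearing in (5.9) for large $\beta$. Function names and the numerical values of $\Omega$ and $\beta_0$ are illustrative assumptions.

```python
import numpy as np

def c_d(beta, beta0, Omega):
    """c_beta and d_beta from (2.6)."""
    b0 = np.array([1.0, -beta0])
    a0 = np.array([beta0, 1.0])
    a = np.array([beta, 1.0])
    Oinv_a0 = np.linalg.solve(Omega, a0)
    c = (beta - beta0) / np.sqrt(b0 @ Omega @ b0)
    d = (a @ Oinv_a0) / np.sqrt(a0 @ Oinv_a0)
    return c, d

def xi(beta, beta0, Omega, QS, QST, QT):
    """POIS statistic xi_beta(Q) from (3.2)."""
    c, d = c_d(beta, beta0, Omega)
    return c * c * QS + 2.0 * c * d * QST + d * d * QT

def beta_ar(beta0, Omega):
    """beta_AR from (5.7); requires omega_12 - omega_22 * beta0 != 0."""
    return (Omega[0, 0] - Omega[0, 1] * beta0) / (Omega[0, 1] - Omega[1, 1] * beta0)

Omega = np.array([[1.0, 0.5], [0.5, 1.0]])
beta0 = 0.0
b_ar = beta_ar(beta0, Omega)                        # equals 2 for these values
d_at_ar = c_d(b_ar, beta0, Omega)[1]                # should be zero
c_big, d_big = c_d(1e8, beta0, Omega)
ratio_limit = (beta0 * Omega[1, 1] - Omega[0, 1]) / np.sqrt(np.linalg.det(Omega))
```

At $\beta^* = \beta_{AR}$ the statistic $\xi_{\beta^*}$ depends on the data only through $Q_S$, which is the sense in which the AR test is a one-sided POIS test.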

6 Unknown Ω and Nonnormal Errors

We now introduce tests that are suitable for (possibly) nonnormal, homoskedastic, uncorrelated errors and unknown covariance matrix. Here, we consider the same model and hypotheses as in Section 2, but with nonnormal disturbances with unknown error covariance matrix. We estimate the pd matrix $\Omega \in R^{2\times 2}$ via

$$\hat\Omega_n = \hat{V}'\hat{V}/n, \ \text{ where } \ \hat{V} = M_{Z,X}Y = Y - P_ZY - P_XY. \qquad (6.1)$$

The estimator $\hat\Omega_n = [\hat\omega_{ij}]$ is consistent under general assumptions; see Lemma 1 of Andrews, Moreira, and Stock (2006b) (hereinafter, AMS06b). The feasible versions of $S_n$, $T_n$, $Q_{1,n}$, and $Q_{T,n}$ are

$$\hat{S}_n = (Z'Z)^{-1/2}Z'Yb_0 \cdot (b_0'\hat\Omega_nb_0)^{-1/2}, \quad \hat{T}_n = (Z'Z)^{-1/2}Z'Y\hat\Omega_n^{-1}a_0 \cdot (a_0'\hat\Omega_n^{-1}a_0)^{-1/2},$$

$$\hat{Q}_{1,n} = (\hat{Q}_{S,n}, \hat{Q}_{ST,n}) = (\hat{S}_n'\hat{S}_n, \hat{S}_n'\hat{T}_n), \ \text{ and } \ \hat{Q}_{T,n} = \hat{T}_n'\hat{T}_n. \qquad (6.2)$$

The feasible one-sided t-statistics for unknown $\Omega$ are

$$\hat{t}(\hat{\mathrm{k}}) = \frac{\hat\beta(\hat{\mathrm{k}}) - \beta_0}{\hat\sigma_u(\hat{\mathrm{k}})\,[y_2'P_Zy_2 + n(1 - \hat{\mathrm{k}})\,\hat\omega_{22}]^{-1/2}}, \ \text{ where}$$

$$\hat\beta(\hat{\mathrm{k}}) = \frac{y_2'P_Zy_1 + n(1 - \hat{\mathrm{k}})\,\hat\omega_{12}}{y_2'P_Zy_2 + n(1 - \hat{\mathrm{k}})\,\hat\omega_{22}} \ \text{ and } \ \hat\sigma_u^2(\hat{\mathrm{k}}) = (1, -\hat\beta(\hat{\mathrm{k}}))\,\hat\Omega_n\,(1, -\hat\beta(\hat{\mathrm{k}}))'. \qquad (6.3)$$

The values $\hat{\mathrm{k}}$ are obtained from (3.11) after estimating $\Omega$:

$$\begin{array}{ll} \text{2SLS:} & \hat{\mathrm{k}} = 1, \\ \text{LIML:} & \hat{\mathrm{k}} = \hat{\mathrm{k}}_{LIML} = \text{smallest root } \kappa \text{ of } |Y'P_ZY/n + \hat\Omega_n - \kappa\hat\Omega_n| = 0, \\ \text{B2SLS:} & \hat{\mathrm{k}} = 1 + (k - 2)/n, \\ \text{Fuller:} & \hat{\mathrm{k}} = \hat{\mathrm{k}}_{LIML} - 1/n. \end{array} \qquad (6.4)$$

Simple algebraic manipulations show that $\hat\beta(\hat{\mathrm{k}})$ are indeed the k-class estimators in the presence of covariates and unknown $\Omega$:

$$\hat\beta(\hat{\mathrm{k}}) = \frac{y_2^{\perp\prime}(I - \hat{\mathrm{k}}M_Z)\,y_1^\perp}{y_2^{\perp\prime}(I - \hat{\mathrm{k}}M_Z)\,y_2^\perp}, \ \text{ where } \ Y^\perp = [y_1^\perp : y_2^\perp] = M_XY$$

(and $\hat\beta(\hat{\mathrm{k}}_{LIML})$ is the limited information maximum likelihood estimator).

The feasible $\hat{t}(\hat{\mathrm{k}})$ statistics depend on $\hat{Q}_{1,n}$, $\hat{Q}_{T,n}$, and $\hat\Omega_n$ in the same way the $t(\mathrm{k})$ statistics depend on $Q_1$, $Q_T$, and $\Omega$ as described in expression (4.4). For all remaining test statistics, we just need to replace $Q_1$ and $Q_T$ by their analogues in which $\Omega$ is estimated by $\hat\Omega_n$ as well. For example, the LR1 test statistic for unknown $\Omega$ is defined as in (3.13), but with $Q_1$ and $Q_T$ replaced by $\hat{Q}_{1,n}$ and $\hat{Q}_{T,n}$. We denote the resulting test statistic by $\widehat{LR1}_n$. The analogues of LM1 and MLR1 are denoted $\widehat{LM1}_n$ and $\widehat{MLR1}_n$, respectively. The critical value function for each test statistic $\hat\psi$ is simply $\kappa_{\psi,\alpha}(\hat{Q}_{T,n})$, as defined in equation (3.8).

Staiger and Stock (1997) model weak IVs and fixed alternatives (WIV-FA), where $\pi$ is local to zero and the alternative $\beta$ is fixed, not local to the null value $\beta_0$.

Assumption WIV-FA. (a) $\pi = C/n^{1/2}$ for some non-stochastic $k$-vector $C$. (b) $\beta$ is a fixed constant for all $n \ge 1$. (c) $k$ is a fixed positive integer that does not depend on $n$.

By the continuous mapping theorem, the limiting rejection probabilities of the feasible conditional tests equal the rejection probabilities of their infeasible counterparts derived under the assumption of normal errors and known covariance $\Omega$ (with $\lambda_n = \pi'Z'Z\pi$ replaced by $\lambda_\infty = C'D_ZC$, where $D_Z$ is the probability limit of $n^{-1}Z'Z$) under fairly general assumptions; see Section 3 of AMS07 and the supplement for details. In particular, the conditional t-tests which reject the null when $\hat{t}(\hat{\mathrm{k}})$ is larger than $\kappa_{\psi,\alpha}(\hat{Q}_{T,n})$ are asymptotically similar at level $\alpha$ under WIV-FA (as long as their conditional distribution is continuous).
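To make the feasible formulas (6.1), (6.3), and (6.4) concrete, here is a minimal sketch that computes $\hat\Omega_n$, the four values of $\hat{\mathrm{k}}$, and the corresponding estimates and t-statistics on simulated data with covariates. It also evaluates the alternative expression based on $Y^\perp = M_XY$ so that the two forms can be compared. The data-generating values, the function names, and the use of dense projection matrices are illustrative assumptions.

```python
import numpy as np

def proj(A):
    """Projection matrix P_A = A (A'A)^{-1} A'."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def feasible_k_class(y1, y2, X, Z_tilde, beta0, which="2sls"):
    """Feasible k-class estimate and t-statistic from (6.1), (6.3), (6.4)."""
    n, k = Z_tilde.shape
    Y = np.column_stack([y1, y2])
    MX = np.eye(n) - proj(X)
    PZ = proj(MX @ Z_tilde)                       # Z = M_X Z_tilde
    V_hat = MX @ Y - PZ @ Y                       # Y - P_Z Y - P_X Y
    Om = V_hat.T @ V_hat / n
    A = Y.T @ PZ @ Y
    # Smallest root of |A/n + Om - kappa Om| = 0 is 1 + min eig(Om^{-1} A / n).
    k_liml = 1.0 + np.linalg.eigvals(np.linalg.solve(Om, A / n)).real.min()
    k_hat = {"2sls": 1.0, "liml": k_liml,
             "b2sls": 1.0 + (k - 2) / n, "fuller": k_liml - 1.0 / n}[which]
    den = A[1, 1] + n * (1.0 - k_hat) * Om[1, 1]
    beta_hat = (A[1, 0] + n * (1.0 - k_hat) * Om[0, 1]) / den
    sigma2 = Om[0, 0] - 2.0 * beta_hat * Om[0, 1] + beta_hat**2 * Om[1, 1]
    t_hat = (beta_hat - beta0) * np.sqrt(den) / np.sqrt(sigma2)
    return beta_hat, t_hat, k_hat

def k_class_perp(y1, y2, X, Z_tilde, k_hat):
    """Same estimator written with Y_perp = M_X Y and (I - k M_Z)."""
    n = len(y1)
    MX = np.eye(n) - proj(X)
    W = np.eye(n) - k_hat * (np.eye(n) - proj(MX @ Z_tilde))
    y1p, y2p = MX @ y1, MX @ y2
    return (y2p @ W @ y1p) / (y2p @ W @ y2p)

# Hypothetical simulated design.
rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z_tilde = rng.standard_normal((n, k))
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
y2 = Z_tilde @ np.full(k, 0.3) + X @ np.array([1.0, -1.0]) + errs[:, 1]
y1 = 0.5 * y2 + X @ np.array([2.0, 0.5]) + errs[:, 0]

results, gaps = {}, {}
for name in ("2sls", "liml", "b2sls", "fuller"):
    b, t, kh = feasible_k_class(y1, y2, X, Z_tilde, beta0=0.5, which=name)
    results[name] = (b, t, kh)
    gaps[name] = abs(b - k_class_perp(y1, y2, X, Z_tilde, kh))
```

The entries of `gaps` measure the numerical difference between the two expressions for $\hat\beta(\hat{\mathrm{k}})$ and should be at the level of rounding error.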

7 Strong IV Asymptotics

In this section, we analyze asymptotic properties of the conditional tests. We study power of the conditional tests based on $t(\mathrm{k})$, LM1, LR1, and MLR1 statistics for both local alternatives (SIV-LA) and fixed alternatives (SIV-FA). We show below that $\hat{Q}_{T,n}/n$ converges in probability to a constant larger than zero under either SIV-LA or SIV-FA when $\beta \ne \beta_{AR}$. As $\hat{Q}_{T,n}$ diverges to infinity in probability as $n \to \infty$, Lemma 1 implies that the critical value functions converge in probability to the $1 - \alpha$ quantile of the standard normal distribution; see Lemma 1 and Comment 2 that follows that lemma. Under SIV-LA, we establish asymptotic efficiency (AE) for one-sided tests. Under SIV-FA, we address consistency. The t-statistics are asymptotically normal, and the conditional t-tests are asymptotically efficient and consistent. By contrast, the one-sided likelihood ratio statistics do not have a standard asymptotic distribution (such as normal or chi-square) and the score test is not consistent.

Hereinafter, we make the following assumptions for the asymptotic behavior of the instruments, exogenous regressors, and reduced-form errors.

Assumption 1. $n^{-1}\bar{Z}'\bar{Z} \to_p D$ for some pd $(k + p) \times (k + p)$ matrix $D$.

Assumption 2. $n^{-1}V'V \to_p \Omega$ for some pd $2 \times 2$ matrix $\Omega$.

Assumption 3. $n^{-1/2}\mathrm{vec}(\bar{Z}'V) \to_d N(0, \Phi)$ for some pd $2(k + p) \times 2(k + p)$ matrix $\Phi$, where $\mathrm{vec}(\cdot)$ denotes the column by column vec operator.

Assumption 4. $\Phi = \Omega \otimes D$, where $\Phi$ is defined in Assumption 3.

The quantities $C$, $D$, and $\Omega$ are assumed to be unknown. Andrews, Moreira, and Stock (2004) (hereinafter, AMS04) show that Assumptions 1-3 hold under general conditions. Assumption 4 holds under Assumptions 1-3 and homoskedasticity of the errors $V_i$, i.e., $E(V_iV_i'|\bar{Z}_i) = EV_iV_i' = \Omega$ a.s.

7.1 Strong IV with Local Alternatives

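For local alternatives, we consider the Pitman drift where β is local to the null value $\beta_0$ as $n \to \infty$.

Assumption SIV-LA. (a) $\beta = \beta_0 + B/n^{1/2}$ for some constant $B > 0$. (b) π is a fixed non-zero k-vector for all $n \geq 1$. (c) k is a fixed positive integer that does not depend on n.

We use Lemma 6 of AMS06a to establish the strong IV-local alternative limiting distribution of tests. Under Assumptions SIV-LA and 1-4,

$$
\begin{aligned}
(S_n, T_n/n^{1/2}) &\to_d (S_{B\infty}, \alpha_T), \\
(\hat{S}_n, \hat{T}_n/n^{1/2}, \hat{\Omega}_n) - (S_n, T_n/n^{1/2}, \Omega) &\to_p 0, \\
(\hat{S}_n, \hat{T}_n/n^{1/2}, \hat{\Omega}_n) &\to_d (S_{B\infty}, \alpha_T, \Omega),
\end{aligned}
\tag{7.1}
$$

where $S_{B\infty}$ and $\alpha_T$ are k-vectors defined as follows:

$$
S_{B\infty} \sim N(\alpha_S, I_k) \quad \text{and} \quad \alpha_T = D_Z^{1/2}\pi\,(a_0'\Omega^{-1}a_0)^{1/2}, \quad \text{where} \quad \alpha_S = D_Z^{1/2}\pi B\,(b_0'\Omega b_0)^{-1/2},
$$

$$
D_Z = D_{11} - D_{12}D_{22}^{-1}D_{21}, \quad \text{and} \quad
D = \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix}, \quad D_{11} \in R^{k \times k},\ D_{12} \in R^{k \times p},\ \text{and}\ D_{22} \in R^{p \times p}.
\tag{7.2}
$$

These definitions allow us to determine the behavior of the LM1, LR1, MLR1, and t(k) statistics under SIV-LA asymptotics.

Theorem 2 Under Assumptions SIV-LA and 1-4:
(a) if $\hat{k} = k + O_p(n^{-1}) = 1 + O_p(n^{-1})$, then $\hat{t}(\hat{k}) = t(k) + o_p(1) \to_d (\alpha_T'S_{B\infty})/\|\alpha_T\|$,
(b) $\widehat{LM1}_n = LM1_n + o_p(1) \to_d (\alpha_T'S_{B\infty})/\|\alpha_T\|$,
(c) $\widehat{LR1}_n^{1/2} = LR1_n^{1/2} + o_p(1) \to_d \max\{(\alpha_T'S_{B\infty})/\|\alpha_T\|, 0\}$,
(d) $\widehat{MLR1}_n^{1/2} = MLR1_n^{1/2} + o_p(1) \to_d \max\{(\alpha_T'S_{B\infty})/\|\alpha_T\|, 0\}$.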

Comments.
1. The requirement $k = 1 + O_p(n^{-1})$ is satisfied by many k-class estimators, including the 2SLS and LIMLK estimators, and modifications thereof given in (3.11).
2. The requirement $\hat{k} = k + O_p(n^{-1})$ allows us to show that the replacement of $k$ by $\hat{k}$ does not have any asymptotic effect for the t-statistics.

Together with Lemma 1 (and Comment 2 that follows that lemma), Theorem 2 yields the following optimality result for a sequence of experiments under SIV-LA and iid normal errors with unknown covariance matrix Ω. Under SIV-LA, the curvature of the model (2.3) vanishes asymptotically and standard local asymptotically normal (LAN) likelihood ratio theory is applicable. For one-sided alternatives, the usual one-sided LM test has standard large sample optimality properties among the class of unbiased tests for one-sided alternatives; see Lehmann and Romano (2005). We refer to tests with such properties as one-sided asymptotically efficient (AE) tests under SIV-LA asymptotics and iid normal errors. Other one-sided tests we propose are also shown to be AE.

Theorem 3 Suppose Assumptions SIV-LA and 1 hold and the reduced-form errors $\{V_i : i \geq 1\}$ are iid normal, independent of $\{\bar{Z}_i : i \geq 1\}$, with mean zero and pd covariance matrix Ω which may be known or unknown. Then the score test based on $\widehat{LM1}_n$ and the conditional tests based on $\hat{t}(k)$ are one-sided AE. If $\alpha \in (0, 1/2)$, the conditional tests based on $\widehat{LR1}_n^{1/2}$ and $\widehat{MLR1}_n^{1/2}$ are also AE.
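As a numerical illustration of Theorems 2 and 3, note that the limit $(\alpha_T'S_{B\infty})/\|\alpha_T\|$ is normal with unit variance and mean $\alpha_T'\alpha_S/\|\alpha_T\| = (\pi'D_Z\pi)^{1/2}B\,(b_0'\Omega b_0)^{-1/2}$, so a one-sided test with critical value $z_\alpha$ has asymptotic power $1 - \Phi(z_\alpha - \text{mean})$. The sketch below simply evaluates this expression. The function name `local_power`, its argument names, and the numerical inputs are illustrative assumptions and do not come from the paper.

```python
import math
from statistics import NormalDist


def local_power(B, lam, sigma0_sq, alpha=0.05):
    """Asymptotic power under SIV-LA of a one-sided test with a N(0,1) critical value.

    B         : Pitman drift constant in beta = beta0 + B / sqrt(n)
    lam       : value of pi' D_Z pi (treated as known, for illustration only)
    sigma0_sq : value of b0' Omega b0
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)          # 1 - alpha quantile of N(0,1)
    mean = math.sqrt(lam) * B / math.sqrt(sigma0_sq)   # mean of the normal limit
    return 1.0 - std_normal.cdf(z_alpha - mean)


if __name__ == "__main__":
    # Hypothetical inputs: lam = 1 and sigma0_sq = 1.
    for B in (0.0, 1.0, 2.0, 3.0):
        print(B, round(local_power(B, lam=1.0, sigma0_sq=1.0), 4))
```

At B = 0 the expression returns the nominal level α, and it increases with B, which is the sense in which all AE tests share the same local power curve.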

7.2 Strong IV with Fixed Alternatives

We now analyze properties of the tests under strong IV fixed alternative (SIV-FA) asymptotics. This asymptotic framework is novel in the weak-instrument literature and determines the consistency of tests.

Assumption SIV-FA. (a) $\beta \neq \beta_0$ is a fixed scalar for all $n \geq 1$. (b) π is a fixed non-zero k-vector for all $n \geq 1$. (c) k is a fixed positive integer that does not depend on n.

The strong IV-fixed alternative (SIV-FA) asymptotic behavior of tests depends on

$$
\varsigma_k \sim N(0, I_k) \quad \text{and} \quad \lambda_{FA} = \pi'D_Z\pi,
\tag{7.3}
$$

where $D_Z$ is defined in (7.2).

Lemma 2 Under Assumptions SIV-FA and 1-3,
(a) $(S_n/n^{1/2}, T_n/n^{1/2}) \to_p (c_\beta D_Z^{1/2}\pi, d_\beta D_Z^{1/2}\pi)$,
(b) $(\hat{S}_n/n^{1/2}, \hat{T}_n/n^{1/2}) = (S_n/n^{1/2}, T_n/n^{1/2}) + o_p(1)$, and
(c) if $\beta = \beta_{AR}$ and Assumption 4 holds, then $T_n \to_d \varsigma_k$ and $\hat{T}_n = T_n + o_p(1)$.

Lemma 2 allows us to determine the limiting behavior of the LM1, LR1, MLR1, and t(k) test statistics under SIV-FA asymptotics.

Theorem 4 Under Assumptions SIV-FA and 1-3,
(a) if $\hat{k} = k + o_p(1) = 1 + o_p(1)$, $\hat{t}(\hat{k})/n^{1/2} = t(k)/n^{1/2} + o_p(1) \to_p c_\beta\lambda_{FA}^{1/2} \times (b_0'\Omega b_0 / b'\Omega b)^{1/2}$,
(b) if $\beta \neq \beta_{AR}$, then $\widehat{LM1}_n/n^{1/2} = LM1_n/n^{1/2} + o_p(1) \to_p c_\beta\lambda_{FA}^{1/2}$,
(c) if $\beta = \beta_{AR}$ and Assumption 4 holds, then $\widehat{LM1}_n/n^{1/2} = LM1_n/n^{1/2} + o_p(1) \to_d c_\beta\pi'D_Z^{1/2}\varsigma_k/\|\varsigma_k\|$,
(d) if $\beta > \beta_0$, $\widehat{LR1}_n^{1/2}/n^{1/2} = LR1_n^{1/2}/n^{1/2} + o_p(1) \to_p c_\beta\lambda_{FA}^{1/2}$,
(e) if $\beta < \beta_0$, $\widehat{LR1}_n^{1/2}/n^{1/2} = LR1_n^{1/2}/n^{1/2} + o_p(1) \to_p \max\{c_\beta^2 - \omega_{22}^{-1}, 0\}^{1/2}\lambda_{FA}^{1/2}$,
(f) if $\beta > \beta_0$, $\widehat{MLR1}_n^{1/2}/n^{1/2} = MLR1_n^{1/2}/n^{1/2} + o_p(1) \to_p \min\{c_\beta^2, \omega_{22}^{-1}\}^{1/2}\lambda_{FA}^{1/2}$, and
(g) if $\beta < \beta_0$, $\widehat{MLR1}_n^{1/2}/n^{1/2} = MLR1_n^{1/2}/n^{1/2} + o_p(1) \to_p 0$.

Comments:
1. When $\beta \neq \beta_{AR}$, the critical values of the conditional tests are either constants or converge in probability to constants as $n \to \infty$ (see the comments following Lemma 1). When $\beta = \beta_{AR}$, the critical value functions of these tests (for each $\beta_0$) are bounded. Therefore, this theorem addresses the consistency of each test.
2. Part (a) establishes consistency of the conditional tests based on t-statistics for testing $H_0: \beta \leq \beta_0$ against $H_1: \beta > \beta_0$.
3. The one-sided LM test rejects the null when $\widehat{LM1}_n/n^{1/2} > z_\alpha/n^{1/2}$. Because $z_\alpha/n^{1/2}$ converges to zero and the probability of $c_\beta\pi'D_Z^{1/2}\varsigma_k/\|\varsigma_k\|$ being smaller than zero equals 50%, the LM1 test is not consistent at $\beta = \beta_{AR}$.
4. Part (d) shows that the CLR1 test is consistent against any alternative $\beta > \beta_0$ for the null hypothesis $H_0: \beta = \beta_0$. Part (e) shows that the CLR1 test asymptotically rejects the null with probability one for some value of $\beta < \beta_0$. Hence, the CLR1 test has asymptotic size equal to one once we augment the null hypothesis to $H_0: \beta \leq \beta_0$.
5. Parts (f) and (g) show that the conditional MLR1 test is consistent whether the null is $H_0: \beta = \beta_0$ or $H_0: \beta \leq \beta_0$.
6. Consider the ad-hoc statistic $ALR1 = LR \times 1(\beta(k_{LIMLK}) \geq \beta_0)$ instead of the LR1 and MLR1 statistics. Under SIV-LA, the CALR1 test is AE. Under SIV-FA, $ALR1^{1/2}/n^{1/2}$ converges to $c_\beta\lambda_{FA}^{1/2}$ when $\beta > \beta_0$ and to zero when $\beta < \beta_0$. Because $c_\beta\lambda_{FA}^{1/2} \geq \min\{c_\beta^2, \omega_{22}^{-1}\}^{1/2}\lambda_{FA}^{1/2}$, the limit found in part (f), the CALR1 test could dominate CMLR1 under strong instruments. For example, if $\beta_0 = 0$ and $\omega_{11} = \omega_{22}$, then $c_\beta\lambda_{FA}^{1/2} > \min\{c_\beta^2, \omega_{22}^{-1}\}^{1/2}\lambda_{FA}^{1/2}$ holds when $\beta > 1$. This finding suggests comparing the ALR1 and MLR1 statistics under either Bahadur or Hodges-Lehmann efficiency instead of Pitman drifts (i.e., SIV-LA asymptotics). We leave this theoretical exercise for future research.

8 Numerical Simulations

This section reports numerical simulations for power envelopes and comparative powers of tests developed earlier for known Ω and normal errors. By transforming variables and parameters in the model (2.3), we can set $\beta_0 = 0$ and the reduced-form errors to be normal with unit variances and correlation ρ (see footnote 4). Without loss of generality, no X matrix is included. The parameters characterizing the distribution of the tests are $\lambda$ ($= \pi'Z'Z\pi$), the number of IVs k, the correlation between the reduced-form errors ρ, and the structural coefficient β.

8.1 Power Comparison

The numerical simulations apply asymptotically to feasible tests which replace Ω with $\hat{\Omega}_n$ for stochastic regressors and non-normal errors. Following Section 6.4 of AMS06a, the power envelopes obtained here are asymptotically valid when the errors are iid normal with unknown covariance matrix. Numerical simulations have been computed at the significance level α = 0.05 for λ/k = 0.5, 1, 2, 4, 8, 16, which span the range from weak to strong instruments, ρ = 0.2, 0.5, 0.9, and k = 2, 5, 10, 20. To conserve space, we focus here on testing $H_0: \beta \leq 0$ against $H_1: \beta > 0$ when λ/k = 1, 2, ρ = 0.5, 0.9, and k = 5. Additional numerical simulations are available in the supplement (including testing $H_0: \beta \geq 0$ against $H_1: \beta < 0$).

The simulations are presented as plots of power envelopes and power functions against various alternative values of β and λ. Power is plotted as a function of the rescaled alternative $\beta\lambda^{1/2}$. This can be thought of as a local power plot, where the local neighborhood is $1/\lambda^{1/2}$ instead of the usual $1/n^{1/2}$, since λ measures the effective sample size.

Figures 1 and 2 assess the power properties of several tests for ρ = 0.5 and ρ = 0.9. We report power curves for all four conditional t-tests as well as both the CLR1 and CMLR1 tests. Conditional critical values for all test statistics are computed based on 100,000 Monte Carlo simulations for each observed value $Q_T = q_T$. In the absence of a UMPI test, we consider tests whose power functions may be near the one-sided power envelope for invariant similar tests based on Corollary 1. In the supplement, we provide numerical evidence that the power envelopes for similar and non-similar tests are alike.

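Footnote 4. There is no loss of generality in taking $\beta_0 = 0$ because the structural equation $y_1 = y_2\beta + X\gamma_1 + u$ and hypothesis $H_0: \beta = \beta_0$ can be transformed into $\tilde{y}_1 = y_2\tilde{\beta} + X\gamma_1 + u$ and $H_0: \tilde{\beta} = 0$, where $\tilde{y}_1 = y_1 - y_2\beta_0$ and $\tilde{\beta} = \beta - \beta_0$.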
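The Monte Carlo computation of conditional critical values described above can be sketched as follows for the 2SLS t-statistic with known Ω. This is a minimal illustration, not the authors' code, and it rests on stated assumptions: $\beta_0 = 0$, unit error variances with correlation ρ, no X matrix, and the definitions of $S_n$ and $T_n$ used in the proof of Lemma 2, which with $b_0 = (1, 0)'$ and $a_0 = (0, 1)'$ give $(Z'Z)^{-1/2}Z'y_1 = S$ and $(Z'Z)^{-1/2}Z'y_2 = \rho S + (1-\rho^2)^{1/2}T$. Because the statistic depends on $(S, T)$ only through $S'S$, $S'T$, and $T'T$, the conditioning vector is fixed at $T = (q_T^{1/2}, 0, \ldots, 0)'$. The function names and the default number of replications (smaller than the 100,000 used in the paper) are our own choices.

```python
import math
import random


def t_2sls(S, T, rho):
    """2SLS t-statistic for H0: beta = 0 with Omega = [[1, rho], [rho, 1]] known."""
    w1 = S                                                     # (Z'Z)^{-1/2} Z'y1
    w2 = [rho * s + math.sqrt(1.0 - rho ** 2) * t for s, t in zip(S, T)]
    num = sum(a * b for a, b in zip(w2, w1))                   # y2' P_Z y1
    den = sum(a * a for a in w2)                               # y2' P_Z y2
    beta = num / den                                           # 2SLS estimate
    sigma_u = math.sqrt(1.0 - 2.0 * rho * beta + beta ** 2)    # (1,-beta) Omega (1,-beta)'
    return beta * math.sqrt(den) / sigma_u


def conditional_cv(q_T, k=5, rho=0.5, alpha=0.05, reps=20000, seed=0):
    """Simulated 1 - alpha quantile of t_2sls under the null, given Q_T = q_T."""
    rng = random.Random(seed)
    T = [math.sqrt(q_T)] + [0.0] * (k - 1)                     # any T with T'T = q_T
    draws = sorted(
        t_2sls([rng.gauss(0.0, 1.0) for _ in range(k)], T, rho)  # S ~ N(0, I_k) under H0
        for _ in range(reps)
    )
    return draws[math.ceil((1.0 - alpha) * reps) - 1]


if __name__ == "__main__":
    for q in (1.0, 10.0, 100.0, 1e6):
        print(q, round(conditional_cv(q), 3))
```

For very large $q_T$ the simulated critical value should be close to the standard normal quantile 1.645, in line with Lemma 1.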

[Figure 1. Panels: λ/k = 1 and λ/k = 2. Vertical axis: power (0 to 1). Horizontal axis: β√λ (−6 to 6). Curves: PE-similar, t(k_2SLS), t(k_LIMLK), t(k_B2SLS), t(k_FULL), CLR1, CMLR1.]

Figure 1: Asymptotic power of one-sided conditional tests (ρ = 0.5)

The CLR1 test has rejection probabilities close to the power upper bound for alternatives $\beta > \beta_0$. However, this test has null rejection probabilities close to one for small enough values of $\beta < \beta_0$. This bad behavior is in accordance with Theorem 4, which shows that the CLR1 test has asymptotic size equal to one for the augmented null $H_0: \beta \leq \beta_0$. Hence, this test is not very useful for applied researchers (see footnote 5). The CMLR1 and all one-sided conditional t-tests do have correct size for $H_0: \beta \leq \beta_0$.

Perhaps surprisingly, the conditional t-tests based on the 2SLS and Fuller estimators have good performance. The conditional test based on the 2SLS estimator numerically outperforms the one based on the B2SLS estimator. The test based on the Fuller estimator dominates the LIML counterpart and the CMLR1 test. As λ increases, the power of the conditional t-tests approaches the conservative power envelope. This result is in accordance with Section 7, which shows that the conditional t-tests (along with the CLR1 and CMLR1 tests) are asymptotically efficient under strong-instrument asymptotics. When λ/k is as small as 2, the conditional 2SLS t-test performs near the power envelope for ρ = 0.5 while the conditional Fuller t-test has power close to the power envelope for ρ = 0.9. The supplement provides further evidence for the use of the one-sided conditional t-tests (in particular, the ones based on the 2SLS and Fuller estimators) in empirical practice.

Footnote 5. Additional numerical results show that the CALR1 test based on the $ALR1 = LR \times 1(\beta(k_{LIMLK}) \geq \beta_0)$ statistic also does not have correct size for testing $H_0: \beta \leq 0$. Its null rejection probability can be close to 20% when λ/k = 0.5 and close to 10% when λ/k = 1 for values of β away from zero.

[Figure 2. Panels: λ/k = 1 and λ/k = 2. Vertical axis: power (0 to 1). Horizontal axis: β√λ (−6 to 6). Curves: PE-similar, t(k_2SLS), t(k_LIMLK), t(k_B2SLS), t(k_FULL), CLR1, CMLR1.]

Figure 2: Asymptotic power of one-sided conditional tests (ρ = 0.9)

8.2 Monte Carlo Simulations

Both WIV-FA and SIV-LA asymptotic approximations show that we can drop the assumption of normal errors with known covariance Ω for a large sample size. In this section, we find evidence that the conditional $\hat{t}(\hat{k})$ tests based on the estimate $\hat{\Omega}_n$ have approximately correct size even for small samples. We perform 10,000 Monte Carlo simulations to evaluate rejection probabilities when the sample size is n = 100. We set the number of instruments k = 5 and choose the concentration parameter λ/k = 0, 0.5, 2, 8. The reduced-form error variances are $\omega_{11} = \omega_{22} = 1$ and $\omega_{12} = \rho = 0.5, 0.9, 0.99$.

Table 1 presents rejection probabilities at $\beta_0 = 0$ when the errors are normal. All four tests have null rejection probabilities very close to the 5% nominal level when testing against the alternative $H_1: \beta > 0$ (columns 3-6). The interesting case arises when testing against the alternative $H_1: \beta < 0$ (columns 7-10). As expected, all conditional t-tests still have correct size. However, the conditional B2SLS t-test is not approximately similar when the degree of endogeneity is large and λ/k = 0. This effect arises because we set the term $[y_2'P_Zy_2 - (k-2)\omega_{22}]^{1/2}$ in the B2SLS t-statistic equal to zero if the term inside the square root is negative. Hence, the conditional distribution of the B2SLS t-statistic is discontinuous when $q_T$ is close to zero. Because the discontinuity happens for the lower conditional quantiles, the conditional B2SLS t-test underrejects when testing against $H_1: \beta < 0$ for small significance levels α (or when testing against $H_1: \beta > 0$ for large α); see the definition of the critical value function in (3.8).

Table 1: Normal Disturbances (null rejection probabilities, in percent)

                      H1: β > 0                        H1: β < 0
ρ      λ/k     2SLS   LIML   FULL   B2SLS      2SLS   LIML   FULL   B2SLS
0.5    0       5.49   5.34   5.32   5.36       5.12   4.92   5.13   3.68
0.5    0.5     5.72   5.68   5.60   5.69       5.15   5.19   5.40   4.41
0.5    2       5.48   5.30   5.31   5.43       5.12   5.12   5.12   4.99
0.5    8       5.34   5.07   5.07   5.35       4.96   5.27   5.25   4.96
0.9    0       5.96   5.93   5.89   5.87       5.18   4.89   5.10   3.22
0.9    0.5     6.04   5.45   5.44   5.93       4.88   4.76   4.81   5.17
0.9    2       5.90   5.05   5.04   5.85       4.76   5.05   5.05   5.28
0.9    8       5.52   4.91   4.91   5.50       5.08   5.09   5.09   5.08
0.99   0       6.08   6.40   6.36   6.06       5.08   5.24   5.08   1.62
0.99   0.5     5.69   4.98   5.22   5.89       5.07   4.79   4.56   5.24
0.99   2       6.01   4.94   4.95   5.92       4.76   5.20   5.02   4.99
0.99   8       5.53   4.94   4.94   5.51       5.08   4.92   4.92   5.11

Table 2 presents the same set of results with nonnormal errors. The structural errors u and $v_2$ are serially uncorrelated with $v_{1t} = (\eta_{1t}^2 - 1)/\sqrt{2}$ and $v_{2t} = (\eta_{2t}^2 - 1)/\sqrt{2}$, where $\eta_{1t}$ and $\eta_{2t}$ are normally distributed with variance one and correlation $\sqrt{\rho}$, so that, for mean-zero $\eta_{1t}$ and $\eta_{2t}$, the errors $v_{1t}$ and $v_{2t}$ have unit variance and correlation ρ. As in the normal design, all tests have approximately correct size at $\beta_0 = 0$. Again, only the conditional B2SLS t-test is not approximately similar at λ/k = 0. We can obtain a similar test by replacing the term $[y_2'P_Zy_2 - (k-2)\omega_{22}]^{1/2}$ in the B2SLS t-statistic by either $[y_2'P_Zy_2]^{1/2}$ or Fuller's (1977) correction (see footnote 6). Numerical results (not reported here) show that (1) both corrections indeed yield conditional B2SLS t-tests which are approximately similar at $\beta_0 = 0$ even when ρ = 0.99 and λ/k = 0; and (2) the conditional 2SLS t-test dominates the two adjusted B2SLS t-tests.

In the supplement, we show that all conditional t-tests are asymptotically similar in a uniform sense as long as their conditional distributions are continuous. This fact supports the numerical findings in Tables 1 and 2.
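The nonnormal design can be reproduced in a few lines. The sketch below assumes mean-zero normal draws $\eta_{1t}$ and $\eta_{2t}$ (the text only states their variance and correlation) and checks by simulation that the centered and scaled squares have correlation close to ρ; the function names `draw_errors` and `sample_corr` are illustrative.

```python
import math
import random


def draw_errors(n, rho, seed=0):
    """Draw (v1t, v2t) = ((eta1^2 - 1)/sqrt(2), (eta2^2 - 1)/sqrt(2)),
    where (eta1, eta2) are standard normal with correlation sqrt(rho)."""
    rng = random.Random(seed)
    r = math.sqrt(rho)                                   # correlation of the etas
    v1, v2 = [], []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = r * e1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        v1.append((e1 * e1 - 1.0) / math.sqrt(2.0))
        v2.append((e2 * e2 - 1.0) / math.sqrt(2.0))
    return v1, v2


def sample_corr(x, y):
    """Plain sample correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    v1, v2 = draw_errors(50000, rho=0.5)
    print(round(sample_corr(v1, v2), 3))   # should be near rho = 0.5
```

The square root in the correlation of the underlying normals is what delivers correlation ρ after squaring, since the correlation of the squares of two standard normals equals the square of their correlation.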

Footnote 6. Specifically, we adapt his equation (4) to set the term equal to $[y_2'P_Zy_2 - (k-2)\omega_{22}]^{1/2}$ if $y_2'P_Zy_2 > k \cdot \omega_{22}$; and $[2 \cdot \omega_{22}]^{1/2}$, otherwise.
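The three treatments of the square-root term discussed above (truncation at zero, the 2SLS-type replacement, and the Fuller-type correction of footnote 6) can be compared side by side. The helper below is only a sketch: the function name `b2sls_scale` and the `rule` labels are hypothetical, and the inputs are scalars that would be computed from the data.

```python
import math


def b2sls_scale(y2_pz_y2, k, omega22, rule="truncate"):
    """Square-root term used in the B2SLS t-statistic under three rules.

    y2_pz_y2 : value of y2' P_Z y2 (non-negative)
    k        : number of instruments
    omega22  : variance of the second reduced-form error
    """
    inside = y2_pz_y2 - (k - 2) * omega22
    if rule == "truncate":          # set to zero when the term inside is negative
        return math.sqrt(inside) if inside > 0 else 0.0
    if rule == "2sls":              # drop the bias adjustment altogether
        return math.sqrt(y2_pz_y2)
    if rule == "fuller":            # footnote 6: floor at sqrt(2 * omega22)
        if y2_pz_y2 > k * omega22:
            return math.sqrt(inside)
        return math.sqrt(2.0 * omega22)
    raise ValueError("unknown rule: " + rule)


if __name__ == "__main__":
    for value in (1.0, 4.0, 5.0, 6.0, 20.0):
        print(value, [round(b2sls_scale(value, 5, 1.0, r), 3)
                      for r in ("truncate", "2sls", "fuller")])
```

Unlike truncation, the Fuller-type rule is continuous in $y_2'P_Zy_2$ (both branches equal $[2\omega_{22}]^{1/2}$ at the switching point) and never returns zero.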

Table 2: Nonnormal Disturbances (null rejection probabilities, in percent)

                      H1: β > 0                        H1: β < 0
ρ      λ/k     2SLS   LIML   FULL   B2SLS      2SLS   LIML   FULL   B2SLS
0.5    0       5.83   5.83   5.97   5.76       5.24   5.00   5.27   3.81
0.5    0.5     5.85   5.69   5.71   5.75       5.20   5.01   5.30   4.65
0.5    2       5.50   5.28   5.27   5.49       5.08   5.28   5.33   5.16
0.5    8       5.31   5.33   5.33   5.31       5.03   5.21   5.22   5.04
0.9    0       5.94   6.27   6.28   6.05       4.99   5.20   5.03   3.33
0.9    0.5     6.33   5.46   5.52   6.29       4.78   4.51   4.46   4.75
0.9    2       6.04   5.33   5.34   5.99       4.82   5.19   5.21   5.12
0.9    8       5.51   5.32   5.32   5.48       5.06   5.12   5.15   5.25
0.99   0       5.91   6.22   6.23   5.88       5.08   4.80   5.05   1.52
0.99   0.5     5.63   5.32   5.46   5.93       4.86   4.76   4.93   4.78
0.99   2       5.95   5.33   5.38   5.94       4.52   5.03   4.87   4.64
0.99   8       5.56   5.32   5.31   5.44       5.00   5.18   5.19   5.04

9 Two-Sided Tests

The good performance of one-sided conditional t-tests is striking, considering the bad performance documented by AMS07 for two-sided conditional t-tests. The goal of this section is to resolve this apparent contradiction between one-sided and two-sided conditional t-tests. It turns out that AMS07's finding strongly relies on the asymmetry of the conditional null distribution of the t-statistics considered. This can be corrected by properly augmenting the conditional argument or by using other t-statistics.

Let φ be a test function of $Q_{(k-1)}$, LM1, and $Q_T$ such that $0 \leq \phi \leq 1$. For each observation of $Q_{(k-1)}$, LM1, and $Q_T$, the test rejects the null with probability $\phi(Q_{(k-1)}, LM1, Q_T, \beta_0, \Omega)$. For example, Moreira's (2003) conditional tests can be written as

$$
1\left(\psi(Q_{(k-1)}, LM1, q_T) > \kappa_{\psi,\alpha}(q_T)\right).
\tag{9.1}
$$

For testing $H_0: \beta = \beta_0$ against $H_1: \beta \neq \beta_0$, Theorem 1 of AMS06b proves that an unbiased test $\phi(Q_{(k-1)}, LM1, Q_T, \beta_0, \Omega)$ must satisfy

$$
E_{\beta_0}\,\phi(Q_{(k-1)}, LM1, q_T) = \alpha
\tag{9.2}
$$

and

$$
E_{\beta_0}\,\phi(Q_{(k-1)}, LM1, q_T) \cdot LM1 = 0
\tag{9.3}
$$

for almost all values of $q_T$. By Corollary 1 of AMS06b, the CLR test satisfies both boundary conditions. Other conditional tests, such as tests based on $t(k)^2$, do not necessarily satisfy (9.3). This places considerable limits on the applicability of Moreira's (2003) conditional method of generating unbiased tests.

[Figure 3. Panels: ln(q_T/k) = 1 and ln(q_T/k) = 4. Vertical axis: density (0 to 0.6). Horizontal axis: value of the statistic (−3 to 3). Curves: t(k_2SLS), t(k_LIMLK), t(k_FULL), t(k_B2SLS), N(0, 1).]

Figure 3: Probability density function for t(k) conditional on $Q_T$

Consider instead two-sided unbiased tests based on one-sided statistics t(k) which reject the null when

$$
t(k) < \kappa_{t(k),1-x_\alpha}(q_T) \quad \text{or} \quad t(k) > \kappa_{t(k),\alpha-x_\alpha}(q_T),
\tag{9.4}
$$

where $x_\alpha \in [0, \alpha]$ is chosen to approximately satisfy (9.2) and (9.3). Inverting the approximately unbiased t-tests in (9.4) allows us to construct confidence regions around a chosen estimator (we do not obtain equal-tailed two-sided intervals, otherwise the test would be biased). In particular, we can construct confidence regions based on the 2SLS estimator, which is commonly used in applied research.

If the null distribution of t(k) conditional on $q_T$ were symmetric around zero, then $x_\alpha = \alpha/2$ and

$$
\kappa_{t(k),1-x_\alpha}(q_T) = -\kappa_{t(k),\alpha-x_\alpha}(q_T),
\tag{9.5}
$$

where $\kappa_{t(k),\alpha-x_\alpha}(q_T)$ is the $1 - \alpha/2$ quantile of the conditional distribution. This test is the same as rejecting the null when $t(k)^2 > \kappa_{t(k),\alpha/2}(q_T)^2$. That is, we would have obtained the conditional test based on $t(k)^2$ where the critical value function is $\kappa_{t(k)^2,\alpha}(q_T) = \kappa_{t(k),\alpha/2}(q_T)^2$. However, the symmetry of the null distribution of t(k) conditional on $q_T$ does not hold. Figure 3 illustrates this asymmetry around zero for the t(k) statistics based on the 2SLS, LIML, and Fuller estimators. When $q_T$ is small, the standard normal distribution does not adequately approximate the conditional null distribution of t-statistics. Only when $q_T$ is large does the normal approximation work well.
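One simple way to pick $x_\alpha$ in (9.4) from simulated null draws is a grid search: for each candidate $x \in [0, \alpha]$, take the empirical $x$ and $1 - (\alpha - x)$ quantiles of t(k) as the two cut-offs, so that (9.2) holds by construction, and keep the $x$ for which the sample analogue of (9.3) is closest to zero. The sketch below is our own illustration of that idea, not the authors' algorithm; it takes paired null draws of t(k) and LM1 (for a given $q_T$) as inputs, and the skewed example in the usage block is purely hypothetical.

```python
import math
import random


def quantile(sorted_vals, p):
    """Empirical p-quantile by nearest rank; returns -inf / +inf at p <= 0 / p >= 1."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    idx = min(len(sorted_vals) - 1, max(0, math.ceil(p * len(sorted_vals)) - 1))
    return sorted_vals[idx]


def choose_x_alpha(t_draws, lm_draws, alpha=0.05, grid_size=51):
    """Grid search for x in [0, alpha] making the sample mean of phi * LM1 closest to zero.

    Returns (x, moment, lower_cutoff, upper_cutoff), where the test rejects when
    t < lower_cutoff or t > upper_cutoff.
    """
    ordered = sorted(t_draws)
    n = len(t_draws)
    best = None
    for j in range(grid_size):
        x = alpha * j / (grid_size - 1)
        lower = quantile(ordered, x)                   # x quantile of t
        upper = quantile(ordered, 1.0 - (alpha - x))   # 1 - (alpha - x) quantile of t
        moment = sum(lm for t, lm in zip(t_draws, lm_draws)
                     if t < lower or t > upper) / n    # sample analogue of (9.3)
        if best is None or abs(moment) < abs(best[1]):
            best = (x, moment, lower, upper)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    lm = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    print(choose_x_alpha(lm, lm))                      # symmetric case: x near alpha / 2
    skewed = [z + 0.3 * (z * z - 1.0) for z in lm]     # hypothetical asymmetric statistic
    print(choose_x_alpha(skewed, lm))
```

In the symmetric case the search returns a value near α/2, matching (9.5); with an asymmetric null distribution the two tails receive unequal probability.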

[Figure 4. Panels: ln(q_T/k) = 1 and ln(q_T/k) = 4. Vertical axis: density (0 to 0.6). Horizontal axis: value of the statistic (−3 to 3). Curves: t0(k_2SLS), t0(k_LIMLK), t0(k_FULL), t0(k_B2SLS), N(0, 1).]

Figure 4: Probability density function for $t_0(k)$ conditional on $Q_T$

Figure 4 plots the conditional distribution for modified versions of t-statistics:

$$
t_0(k) = \frac{\beta(k) - \beta_0}{\sigma_0 \cdot [y_2'P_Zy_2 + n(1-k)\omega_{22}]^{-1/2}},
\tag{9.6}
$$

which use $\sigma_0^2 = (1, -\beta_0)\,\Omega\,(1, -\beta_0)'$ as the estimator of the variance of the structural error. The conditional distributions for the $t_0(k)$ statistics based on the 2SLS and Fuller estimators are also asymmetric around zero when $q_T$ is small. However, the $t_0(k)$ statistic for the LIML estimator is nearly symmetric around zero for any value of $q_T$. Hence, the conditional test based on $t_0(k_{LIMLK})^2$ is nearly unbiased and should not suffer the bad power properties found by AMS07 for the $t(k)^2$ statistics.

In the supplement, we provide numerical results showing that the conditional t-test based on the $t_0(k_{LIMLK})^2$ statistic and some of the unbiased t-tests can perform as well as the CLR test. Hence, the conclusion of AMS07 is only valid for a smaller class of t-tests.

10 Appendix of Proofs

Derivation of the One-sided Likelihood Ratio Statistics

Ignoring an additive constant, the log-likelihood function for known Ω with all parameters concentrated out except β is

$$
l_c(Y; \beta, \Omega) = -\frac{n}{2}\ln\det(\Omega) - \frac{1}{2}\left[\mathrm{tr}(\Omega^{-1}Y'M_XY) + R(\beta)\right].
\tag{10.1}
$$

Hence, we have

$$
LR1 = 2\left[\sup_{\beta \geq \beta_0} l_c(Y; \beta, \Omega) - l_c(Y; \beta_0, \Omega)\right] = R(\beta_0) - \inf_{\beta \geq \beta_0} R(\beta).
\tag{10.2}
$$

We now determine $\inf_{\beta \geq \beta_0} R(\beta)$. By definition, $\beta(k_{LIMLK})$ maximizes $l_c(Y; \beta, \Omega)$ over $\beta \in R$. Equivalently, $\beta(k_{LIMLK})$ minimizes $R(\beta)$ over $\beta \in R$. If $\beta(k_{LIMLK}) \geq \beta_0$, then $\inf_{\beta \geq \beta_0} R(\beta) = R(\beta(k_{LIMLK})) = \inf_{\beta \in R} R(\beta)$ and $LR1 = R(\beta_0) - \inf_{\beta \in R} R(\beta) = LR$. If $\beta(k_{LIMLK}) < \beta_0$, then $\inf_{\beta \geq \beta_0} R(\beta)$ equals either $R(\beta_0)$ or $R(\infty)$ because $R(\beta)$ is the ratio of two quadratic forms in β with pd weight matrices. Hence, the second equality in (3.13) holds. A similar reasoning yields (3.15).

We now provide an expression for $\beta(k_{LIMLK})$. The LIMLK estimator maximizes $l_c(Y; \beta, \Omega)$ or minimizes $R(\beta)$ over $\beta \in R$. We note that

$$
R(\beta) = \frac{\tilde{b}'\Omega^{-1/2}Y'P_ZY\Omega^{-1/2}\tilde{b}}{\tilde{b}'\tilde{b}} = \frac{\tilde{b}'JQJ'\tilde{b}}{\tilde{b}'\tilde{b}}, \quad \text{where } \tilde{b} = \Omega^{1/2}b.
\tag{10.3}
$$

The minimum of $R(\beta)$ is obtained by the eigenvector $\tilde{b}^*$ that corresponds to the smallest eigenvalue of $JQJ'$. Hence,

$$
\beta(k_{LIMLK}) = -b_2^*/b_1^*, \quad \text{where } b^* = (b_1^*, b_2^*)' = \Omega^{-1/2}\tilde{b}^*.
\tag{10.4}
$$
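Because the matrices involved are 2 × 2, the eigenvalue characterization in (10.3) and (10.4) can be checked with a closed-form computation. The sketch below uses the equivalent generalized eigenvalue form: the minimum of $R(\beta) = b'Y'P_ZYb / b'\Omega b$ is the smaller root κ of $\det(Y'P_ZY - \kappa\Omega) = 0$, and the minimizer is recovered from the second row of $(Y'P_ZY - \kappa\Omega)b = 0$ with $b = (1, -\beta)'$. The function names and the numerical matrices are illustrative assumptions, not values from the paper.

```python
import math


def liml_known_omega(A, Omega):
    """Minimize R(beta) = b'Ab / b'Omega b over beta, with b = (1, -beta)'.

    A     : 2x2 symmetric matrix playing the role of Y' P_Z Y
    Omega : 2x2 symmetric positive definite matrix
    Returns (beta, kappa), where kappa is the minimized value of R.
    """
    a11, a12, a22 = A[0][0], A[0][1], A[1][1]
    w11, w12, w22 = Omega[0][0], Omega[0][1], Omega[1][1]
    # det(A - kappa * Omega) = qa * kappa^2 + qb * kappa + qc
    qa = w11 * w22 - w12 ** 2
    qb = -(a11 * w22 + a22 * w11 - 2.0 * a12 * w12)
    qc = a11 * a22 - a12 ** 2
    disc = max(qb * qb - 4.0 * qa * qc, 0.0)
    kappa = (-qb - math.sqrt(disc)) / (2.0 * qa)       # smaller root
    # Second row of (A - kappa * Omega) b = 0 with b = (1, -beta)';
    # the denominator is zero only when the minimizer lies at infinity.
    beta = (a12 - kappa * w12) / (a22 - kappa * w22)
    return beta, kappa


def R(beta, A, Omega):
    """Ratio of quadratic forms b'Ab / b'Omega b at b = (1, -beta)'."""
    num = A[0][0] - 2.0 * beta * A[0][1] + beta ** 2 * A[1][1]
    den = Omega[0][0] - 2.0 * beta * Omega[0][1] + beta ** 2 * Omega[1][1]
    return num / den


if __name__ == "__main__":
    A = [[5.0, 3.0], [3.0, 4.0]]          # hypothetical value of Y' P_Z Y
    Omega = [[1.0, 0.5], [0.5, 1.0]]
    beta, kappa = liml_known_omega(A, Omega)
    print(beta, kappa, R(beta, A, Omega))
```

The expression used for `beta` has the same form as the k-class formula $[y_2'P_Zy_2 - \kappa\,\omega_{22}]^{-1}[y_2'P_Zy_1 - \kappa\,\omega_{12}]$ that appears in the proof of Lemma 1.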

Proof of Lemma 1. For part (a), we need to analyze t-statistics based on the 2SLS, LIMLK, B2SLS, and Fuller estimators. The null distribution of the t-statistics conditional on $Q_T = q_T$ depends on the null distribution of $Q_S$ and $S_2$. We first consider the t-statistic based on the 2SLS estimator. The null distribution of t(1) conditional on $Q_T = q_T$ is given by

$$
t(1) = \frac{\sigma_0}{\sigma_u(1)} \cdot \frac{c_{22}Q_S^{1/2}S_2 + c_{21}\dfrac{Q_S}{q_T^{1/2}}}{\left[c_{22}^2 + c_{21}^2\dfrac{Q_S}{q_T} + 2c_{21}c_{22}\dfrac{Q_S^{1/2}S_2}{q_T^{1/2}}\right]^{1/2}}, \quad \text{where}
$$

$$
\beta(1) = \frac{c_{12}c_{22} + (c_{12}c_{21} + c_{11}c_{22})\dfrac{Q_S^{1/2}S_2}{q_T^{1/2}} + c_{11}c_{21}\dfrac{Q_S}{q_T}}{c_{22}^2 + 2c_{21}c_{22}\dfrac{Q_S^{1/2}S_2}{q_T^{1/2}} + c_{21}^2\dfrac{Q_S}{q_T}} \quad \text{and}
$$

$$
\sigma_u^2(1) = (1, -\beta(1))\,\Omega\,(1, -\beta(1))'.
\tag{10.5}
$$

We have $\beta(1) \to_p c_{12}/c_{22} = \beta_0$ as $q_T \to \infty$. In consequence, $\sigma_u^2(1) \to_p b_0'\Omega b_0 \equiv \sigma_0^2$. As $q_T \to \infty$, we obtain

$$
t(1) \to_d Q_S^{1/2}S_2 = LM1,
\tag{10.6}
$$

which has a standard normal distribution. The same limiting result holds for the more general case (3.10) in which $k \neq 1$ as long as the additional term $n(1-k)/q_T$ converges to zero as $q_T \to \infty$.

The t-statistic based on the LIMLK estimator simplifies to

$$
t(k_{LIMLK}) = \frac{\beta(k_{LIMLK}) - \beta_0}{\sigma_u(k_{LIMLK})\,[y_2'P_Zy_2 - \kappa_{LIMLK}\,\omega_{22}]^{-1/2}}, \quad \text{where}
$$

$$
\kappa_{LIMLK} = \frac{1}{2}\left[Q_S + Q_T - \sqrt{(Q_S - Q_T)^2 + 4Q_{ST}^2}\right],
$$

$$
\beta(k_{LIMLK}) = [y_2'P_Zy_2 - \kappa_{LIMLK}\,\omega_{22}]^{-1}[y_2'P_Zy_1 - \kappa_{LIMLK}\,\omega_{12}] \quad \text{and} \quad \sigma_u^2(k_{LIMLK}) = (1, -\beta(k_{LIMLK}))\,\Omega\,(1, -\beta(k_{LIMLK}))'.
\tag{10.7}
$$

Because $n(1 - k_{LIMLK})/q_T \equiv -\kappa_{LIMLK}/q_T$ converges to zero as $q_T \to \infty$, the null distribution of $t(k_{LIMLK})$ conditional on $q_T$ also converges to a standard normal.

The critical values for the t-statistics based on the B2SLS and Fuller estimators are computed from the t-statistics based on $k = 1 + (k-2)/n$ and $k = k_{LIMLK} - 1/n$, respectively (on the right-hand side of the first expression, k denotes the number of instruments). The result holds as long as the term $n(1-k)/q_T$ converges to zero as $q_T \to \infty$. For example, for $k = 1 + (k-2)/n$:

$$
\frac{n(1-k)}{q_T} = -\frac{k-2}{q_T} \to 0
\tag{10.8}
$$

as $q_T \to \infty$. Hence the critical value of the t-statistic based on the B2SLS estimator also converges to that of a standard normal. Analogous results can be derived for the t-statistic based on the Fuller estimator.

Proof of Comment 2 to Lemma 1. We show two intermediate limiting results as $q_T \to \infty$. First, the null conditional distribution of $\max\{R(\beta_0) - R(\infty), 0\}$ goes to zero as $q_T \to \infty$:

$$
\max\left\{(k - c_{21}^2)Q_S - c_{22}q_T^{1/2}\left(c_{22}q_T^{1/2} + 2c_{21}Q_S^{1/2}S_2\right), 0\right\} \to_p 0.
\tag{10.9}
$$

Second, by expression (A.12) of AMS06a, the null conditional distribution of the LR statistic converges to a chi-square-one as $q_T \to \infty$:

$$
\begin{aligned}
LR &= \frac{1}{2}\left[Q_S - q_T + (q_T - Q_S)\left(1 + \frac{2q_T}{(q_T - Q_S)^2}\,(1 + o_p(1))\,Q_SS_2^2\right)\right] \\
&= Q_SS_2^2 + o_p(1).
\end{aligned}
\tag{10.10}
$$

Expression (10.7) gives the convergence of the null conditional distribution of $t(k_{LIMLK})$. In fact, because the joint null conditional distribution of LR and $t(k_{LIMLK})$ converges to that of $Q_SS_2^2$ and $Q_S^{1/2}S_2$,

$$
\begin{aligned}
\sqrt{LR1} &= \sqrt{LR} \times 1(t(k_{LIMLK}) > 0) + o_p(1) \\
&= \sqrt{Q_SS_2^2} \times 1(Q_S^{1/2}S_2 > 0) + o_p(1) \to_d \max\{Q_S^{1/2}S_2, 0\} \quad \text{as } q_T \to \infty.
\end{aligned}
\tag{10.11}
$$

The critical value for $\max\{Q_S^{1/2}S_2, 0\}$ at level α (with $0 < \alpha < 1/2$) is $z_\alpha$ because

$$
P\left(\max\{Q_S^{1/2}S_2, 0\} \geq z_\alpha\right) = P\left(Q_S^{1/2}S_2 \geq z_\alpha\right) = \alpha.
\tag{10.12}
$$

The null conditional distribution of $\max\{R(\beta_0) - R(\infty), 0\}$ goes to zero as $q_T \to \infty$. The result for the MLR1 statistic now follows from (10.11) and (10.12).

Proof of Theorem 1: The power function is given by

$$
K(\phi; \beta, \lambda) = \int_{R^+ \times R \times R^+} \phi(q_1, q_T)\,f_{Q_1,Q_T}(q_1, q_T; \beta, \lambda)\,dq_1\,dq_T.
\tag{10.13}
$$

We want to find a test that maximizes power at $(\beta^*, \lambda^*)$ among all level α invariant similar tests. By Theorem 2 of AMS06a, invariant similar tests must be similar conditional on $Q_T = q_T$ for almost all $q_T$. In addition, the unconditional power equals the expected conditional power given $Q_T$. Hence, it is sufficient to determine the test that maximizes conditional power given $Q_T = q_T$ among invariant tests that are similar conditional on $Q_T = q_T$, for each $q_T$. By the Neyman-Pearson Lemma, the test of significance level α that maximizes conditional power given $Q_T = q_T$ is of the likelihood ratio (LR) form and rejects $H_0$ when the LR is sufficiently large (part a) or small (part b). In particular, the conditional LR test statistic is

$$
LR_{\beta^*\lambda^*}(q_1, q_T) = \frac{f_{Q_1|Q_T}(q_1|q_T; \beta^*, \lambda^*)}{f_{Q_1|Q_T}(q_1|q_T; \beta_0)} = \frac{f_{Q_1,Q_T}(q_1, q_T; \beta^*, \lambda^*)}{f_{Q_T}(q_T; \beta^*, \lambda^*)\,f_{Q_1|Q_T}(q_1|q_T; \beta_0)}.
\tag{10.14}
$$

From the density $f_{Q_1,Q_T}(q_1, q_T; \beta, \lambda)$ given in (3.3), we can determine $f_{Q_T}(q_T; \beta^*, \lambda^*)$ and $f_{Q_1|Q_T}(q_1|q_T; \beta_0)$ to provide the explicit expression for $LR_{\beta^*\lambda^*}(Q_1, Q_T)$ that appears in (5.1); see Lemma 3 of AMS06a.

Proof of Theorem 2. For part (a),

$$
\begin{aligned}
Y'P_ZY + n(1 - \hat{k})\hat{\Omega}_n &= Y'P_ZY + n(1-k)\Omega + o_p(1) \quad \text{and} \\
\hat{\sigma}_u^2(\hat{k}) &= \sigma_u^2(k) + o_p(1) \quad (= \sigma_0^2 + o_p(1) \text{ as well}),
\end{aligned}
\tag{10.15}
$$

if $\hat{k} - 1 = O_p(n^{-1})$ and $\hat{\Omega}_n$ is a consistent estimator of Ω. Hence, $\hat{t}(\hat{k}) = t(k) + o_p(1)$. The t(k) statistic in turn is given by

$$
t(k) = \frac{n^{-1/2}y_2'P_Z(y_1 - y_2\beta_0) + n^{1/2}(1-k)(\omega_{12} - \omega_{22}\beta_0)}{\sigma_u(k)\,[n^{-1}y_2'P_Zy_2 + (1-k)\omega_{22}]^{1/2}}.
\tag{10.16}
$$

Because $k - 1 = O_p(n^{-1})$, we have

$$
\begin{aligned}
n^{-1/2}y_2'P_Z(y_1 - y_2\beta_0) + n^{1/2}(1-k)(\omega_{12} - \omega_{22}\beta_0) &= n^{-1/2}y_2'P_ZYb_0 + o_p(1) \\
&= \alpha_T'S_n\left(b_0'\Omega b_0 / a_0'\Omega^{-1}a_0\right)^{1/2} + o_p(1).
\end{aligned}
\tag{10.17}
$$

Analogously,

$$
n^{-1}y_2'P_Zy_2 + (1-k)\omega_{22} = \pi'D_Z\pi + o_p(1) = \alpha_T'\alpha_T/(a_0'\Omega^{-1}a_0) + o_p(1).
\tag{10.18}
$$

Using the fact that $\sigma_u(k) \to_p (b_0'\Omega b_0)^{1/2}$, we have

$$
t(k) \to_d (\alpha_T'S_{B\infty})/\|\alpha_T\|.
\tag{10.19}
$$

Finally, using $\hat{k}$ instead of $k$ does not have any effect asymptotically.

The proof of part (b) follows immediately from (7.1).

For part (c), recall that the LR1 statistic is

$$
LR1 = LR \times 1(\beta(k_{LIMLK}) > \beta_0) + \max\{R(\beta_0) - R(\infty), 0\} \times 1(\beta(k_{LIMLK}) < \beta_0).
\tag{10.20}
$$

Under local alternatives $\beta = \beta_0 + B/n^{1/2}$, $\max\{R(\beta_0) - R(\infty), 0\} \to_p 0$ because $R(\beta_0)$ is $O_p(1)$ and $R(\infty) \to_p \infty$. Therefore,

$$
LR1 = LR \times 1(t(k_{LIMLK}) > 0) + o_p(1) \to_d (\alpha_T'S_{B\infty})^2/\|\alpha_T\|^2 \times 1[\alpha_T'S_{B\infty}/\|\alpha_T\| > 0],
\tag{10.21}
$$

where the limit follows from the continuous mapping theorem and the joint convergence of LR and $t(k_{LIMLK})$ to $(\alpha_T'S_{B\infty})^2/\|\alpha_T\|^2$ and $\alpha_T'S_{B\infty}/\|\alpha_T\|$, respectively; see AMS06a, Theorem 6(c), regarding the convergence in distribution of LR to $(\alpha_T'S_{B\infty})^2/\|\alpha_T\|^2$.

Part (d) also follows from (10.21) because $\max\{R(\beta_0) - R(\infty), 0\}$ converges in probability to zero.

Proof of Theorem 3. Following the proof of Theorem 7 of AMS06a, we know that the one-sided LM statistic for known Ω is $LM1_n = Q_{ST}/Q_T^{1/2}$, which is asymptotically efficient by standard results.

By Lemma 1 (and Comment 2 that follows that lemma), the critical values of conditional tests based on $LR1_n$, $MLR1_n$, and t-statistics converge to a standard normal $1 - \alpha$ quantile (provided $\alpha \in (0, 1/2)$ for the likelihood ratio statistics).

The $LR1_n^{1/2}$ and $MLR1_n^{1/2}$ statistics are not asymptotically equivalent to $LM1_n$. However, the asymptotic power of the one-sided tests based on $LR1_n^{1/2}$, $MLR1_n^{1/2}$, and $LM1_n$ is the same:

$$
\begin{aligned}
P\left(\max\{(\alpha_T'S_{B\infty})/\|\alpha_T\|, 0\} \geq z_\alpha\right) &= P\left(\max\{\varsigma_1 + \lambda^{1/2}B\,(b_0'\Omega b_0)^{-1/2}, 0\} \geq z_\alpha\right) \\
&= P\left(\varsigma_1 + \lambda^{1/2}B\,(b_0'\Omega b_0)^{-1/2} \geq z_\alpha\right) \\
&= P\left(\alpha_T'S_{B\infty}/\|\alpha_T\| \geq z_\alpha\right),
\end{aligned}
\tag{10.22}
$$

where $\varsigma_1 \sim N(0, 1)$, $B > 0$, and $z_\alpha$ is a positive critical value.

By Theorem 2, the asymptotic behavior of the tests above is the same when $\hat{\Omega}_n$ replaces Ω. Hence, these tests are asymptotically efficient when Ω is estimated.

Proof of Lemma 2. Part (a) of the Lemma is established as follows:

$$
\begin{aligned}
S_n/n^{1/2} &= (n^{-1}Z'Z)^{-1/2}\,n^{-1}Z'Yb_0 \cdot (b_0'\Omega b_0)^{-1/2} \\
&\to_p D_Z^{1/2}\pi a'b_0 \cdot (b_0'\Omega b_0)^{-1/2} = D_Z^{1/2}\pi c_\beta.
\end{aligned}
\tag{10.23}
$$

Similarly,

$$
\begin{aligned}
T_n/n^{1/2} &= (n^{-1}Z'Z)^{-1/2}\,n^{-1}Z'Y\Omega^{-1}a_0 \cdot (a_0'\Omega^{-1}a_0)^{-1/2} \\
&\to_p D_Z^{1/2}\pi a'\Omega^{-1}a_0 \cdot (a_0'\Omega^{-1}a_0)^{-1/2} = D_Z^{1/2}\pi d_\beta.
\end{aligned}
\tag{10.24}
$$

Part (b) of the Lemma follows from Lemma 1 of AMS06b and part (a).

Next, we prove part (c) of the Lemma. If $\beta = \beta_{AR}$, then $a'\Omega^{-1}a_0 = 0$ and, using Assumption 4, we have

$$
T_n = (n^{-1}Z'Z)^{-1/2}\,n^{-1/2}Z'V\Omega^{-1}a_0 \cdot (a_0'\Omega^{-1}a_0)^{-1/2} \to_d \varsigma_k \sim N(0, I_k).
\tag{10.25}
$$

Part (c) now follows from Lemma 1 of AMS06b.

Proof of Theorem 4: The proof follows from Lemma 2 and $\hat{\Omega}_n \to_p \Omega$.

For part (a), the argument in the proof of Theorem 2 gives $n^{-1/2}\hat{t}(\hat{k}) = n^{-1/2}t(k) + o_p(1)$. As in (10.16), the statistic $n^{-1/2}t(k)$ can be written as

$$
n^{-1/2}t(k) = \frac{n^{-1}y_2'P_Z(y_1 - y_2\beta_0) + (1-k)(\omega_{12} - \omega_{22}\beta_0)}{\sigma_u(k)\,[n^{-1}y_2'P_Zy_2 + (1-k)\omega_{22}]^{1/2}}.
\tag{10.26}
$$

Because $k - 1 = o_p(1)$, we have

$$
\begin{aligned}
n^{-1}y_2'P_Zy_2 + (1-k)\omega_{22} &= n^{-1}y_2'P_Zy_2 + o_p(1) \to_p \lambda_{FA} \quad \text{and} \\
n^{-1}y_2'P_Z(y_1 - y_2\beta_0) + (1-k)(\omega_{12} - \omega_{22}\beta_0) &= n^{-1}y_2'P_ZYb_0 + o_p(1) \to_p c_\beta \cdot \lambda_{FA}\,(b_0'\Omega b_0)^{1/2}.
\end{aligned}
\tag{10.27}
$$

Using the fact that $\sigma_u(k) \to_p (b'\Omega b)^{1/2}$, we have

$$
t(k)/n^{1/2} \to_p c_\beta\,(\lambda_{FA})^{1/2}\,(b_0'\Omega b_0 / b'\Omega b)^{1/2}.
\tag{10.28}
$$

In addition, using $\hat{k}$ instead of $k$ does not have any effect asymptotically.

Parts (b) and (c) hold by simple calculations.

For parts (d) and (e), we write the LR1 statistic as in (3.13). To study the behavior of LR1 under fixed alternatives, we use the following:

$$
\beta(k_{LIMLK}) \to_p \beta, \quad R(\infty)/n \to_p \omega_{22}^{-1}\lambda_{FA}, \quad R(\beta_0)/n \to_p c_\beta^2\lambda_{FA}, \quad \text{and} \quad LR_n/n \to_p c_\beta^2\lambda_{FA}.
\tag{10.29}
$$

For a proof of (10.29), see Theorem 13 (a) of AMS04. If $\beta(k_{LIMLK}) \to_p \beta > \beta_0$,

$$
LR1/n \to_p c_\beta^2\lambda_{FA}.
\tag{10.30}
$$

If $\beta(k_{LIMLK}) \to_p \beta < \beta_0$,

$$
LR1/n \to_p \max\{c_\beta^2 - \omega_{22}^{-1}, 0\}\,\lambda_{FA}.
\tag{10.31}
$$

Further algebraic manipulations show that (10.30) and (10.31) hold even at the value $\beta = \beta_{AR}$.

For parts (f) and (g), we write the MLR1 statistic as in (3.15). If $\beta(k_{LIMLK}) \to_p \beta > \beta_0$,

$$
MLR1/n \to_p \min\{\omega_{22}^{-1}\lambda_{FA} - c_\beta^2\lambda_{FA}, 0\} + c_\beta^2\lambda_{FA} = \min\{c_\beta^2, \omega_{22}^{-1}\}\,\lambda_{FA}.
\tag{10.32}
$$

If $\beta(k_{LIMLK}) \to_p \beta < \beta_0$,

$$
MLR1/n \to_p 0,
\tag{10.33}
$$

as we wanted to show.

References

Anderson, T. W., and H. Rubin (1949): "Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations," Annals of Mathematical Statistics, 20, 46–63.
Andrews, D. W. K., and P. Guggenberger (2010): "Applications of subsampling, hybrid, and size-correction methods," Journal of Econometrics, 158, 285–305.
Andrews, D. W. K., M. J. Moreira, and J. H. Stock (2004): "Optimal Invariant Similar Tests for Instrumental Variables Regression," NBER Working Paper t0299.
Andrews, D. W. K., M. J. Moreira, and J. H. Stock (2006a): "Optimal Two-Sided Invariant Similar Tests for Instrumental Variables Regression," Econometrica, 74, 715–752.
Andrews, D. W. K., M. J. Moreira, and J. H. Stock (2006b): "Optimal Two-Sided Invariant Similar Tests for Instrumental Variables Regression," Econometrica, 74, 715–752, Supplement.
Andrews, D. W. K., M. J. Moreira, and J. H. Stock (2007): "Performance of Conditional Wald Tests in IV Regression with Weak Instruments," Journal of Econometrics, 139, 116–132.
Andrews, D. W. K., and J. H. Stock (2007): "Inference with Weak Instruments," in Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress of the Econometric Society, ed. by R. Blundell, W. K. Newey, and T. Persson, vol. 3, chap. 6. Cambridge University Press, Cambridge.
Angrist, J., and A. B. Krueger (1991): "Does Compulsory School Attendance Affect Schooling and Earnings?," The Quarterly Journal of Economics, 106, 979–1014.
Cattaneo, M., R. Crump, and M. Jansson (2012): "Optimal Inference for Instrumental Variables Regression with Non-Gaussian Errors," Journal of Econometrics, 167, 1–15.
Chamberlain, G. (2007): "Decision Theory Applied To an Instrumental Variables Model," Econometrica, 75, 609–652.
Chernozhukov, V., C. Hansen, and M. Jansson (2009): "Admissible Invariant Similar Tests for Instrumental Variables Regression," Econometric Theory, 25, 806–818.
Donald, S. G., and W. K. Newey (2001): "Choosing the Number of Instruments," Econometrica, 69, 1161–1192.
Dufour, J.-M. (1997): "Some Impossibility Theorems in Econometrics with Applications to Structural and Dynamic Models," Econometrica, 65, 1365–1388.
Dufour, J.-M. (2003): "Presidential Address: Identification, Weak Instruments, and Statistical Inference in Econometrics," Canadian Journal of Economics, 36, 767–808.
Fuller, W. A. (1977): "Some Properties of a Modification of the Limited Information Estimator," Econometrica, 45, 939–953.
Kleibergen, F. (2002): "Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression," Econometrica, 70, 1781–1803.
Lehmann, E. L., and J. P. Romano (2005): Testing Statistical Hypotheses. Third edn., Springer.
Moreira, H., and M. J. Moreira (2013): "Contributions to the Theory of Optimal Tests," Ensaios Economicos, 747, FGV/EPGE.
Moreira, M. J. (2002): "Tests with Correct Size in the Simultaneous Equations Model," Ph.D. thesis, UC Berkeley.
Moreira, M. J. (2003): "A Conditional Likelihood Ratio Test for Structural Models," Econometrica, 71, 1027–1048.
Moreira, M. J. (2009a): "A Maximum Likelihood Method for the Incidental Parameter Problem," Annals of Statistics, 37, 3660–3696.
Moreira, M. J. (2009b): "Tests with Correct Size when Instruments Can Be Arbitrarily Weak," Journal of Econometrics, 152, 131–140.
Nagar, A. (1959): "The Bias and Moment Matrix of the General k-Class Estimators of Parameters in Simultaneous Equations," Econometrica, 27, 575–595.
Staiger, D., and J. H. Stock (1997): "Instrumental Variables Regression with Weak Instruments," Econometrica, 65, 557–586.
Stock, J. H., J. Wright, and M. Yogo (2002): "A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments," Journal of Business and Economic Statistics, 20, 518–529.
