Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 273–313, http://dx.doi.org/10.1080/10485252.2012.752083

Downloaded by [Université du Luxembourg], [Gautam Tripathi] at 05:12 10 June 2013

Testing conditional symmetry without smoothing

Tao Chen^a and Gautam Tripathi^b*

^a Department of Economics, University of Waterloo, Waterloo, Ontario, Canada; ^b Faculty of Law, Economics and Finance, University of Luxembourg, 1511 Luxembourg, Luxembourg

(Received 15 June 2012; final version received 14 November 2012)

We test the assumption of conditional symmetry used to identify and estimate parameters in regression models with endogenous regressors, without making any distributional assumptions. The Kolmogorov–Smirnov-type statistic we propose is consistent, computationally tractable because it does not require optimisation over an uncountable set, free of any kind of nonparametric smoothing, and able to detect n^{1/2} deviations from the null. Results from a simulation experiment suggest that our test can work very well in moderately sized samples.

Keywords: testing conditional symmetry; endogenous regressors

1. Introduction

1.1. Motivation

Let (Y, X)_{(1+dim(X))×1} be an observed random vector such that Y = μ(X, θ0) + ε for some θ0 ∈ Θ ⊂ R^{dim(Θ)}, where μ is a real-valued function of X known up to θ0. The latent random variable ε, assumed to be continuously distributed with full support on R, represents the aggregate effect of all variables affecting the outcome Y that are not part of the model. In observational data models, it is therefore not unlikely for ε, interpreted as an unobserved regressor, to be correlated with some or all of the observed regressors. In other words, some or all coordinates of X, the vector of explanatory variables, may be endogenous. The usual approach to identifying and estimating θ0 in the presence of endogenous regressors is the method of instrumental variables (IV), whereby one assumes the existence of a vector of random variables W_{dim(W)×1}, called the IV (or just instruments) for X, which are exogenous with respect to ε in a certain sense. For example, it is common to assume that ε is mean independent of W, that is, E[ε|W] = 0 with probability one (w.p.1). However, when conditional mean restrictions are not plausible, such as when ε is believed to have thick tails, or when behavioural assumptions imply other restrictions on the distribution of ε|W (cf. Example 1.1), other assumptions asserting exogeneity have to be made. These include assuming ε to be stochastically independent or (more weakly) quantile independent of W, or assuming that the distribution of ε is symmetric conditional on W. See Powell (1994, Section 2) for an extensive discussion of the various stochastic restrictions used to identify and estimate semiparametric models.

*Corresponding author. Email: [email protected]
© American Statistical Association and Taylor & Francis 2013

In this paper, we focus on conditional symmetry and assume that the unknown conditional distribution of ε given W is symmetric about the origin, that is, the null hypothesis we would like to test is that


H0 : Law(ε|W) = Law(−ε|W).   (1)

Note that the instruments W and the regressors X can have elements in common because the exogenous coordinates of X act as their own instruments. If all regressors are exogenous, then, of course, W = X. For notational ease, the vector containing the distinct coordinates of X and W is written simply as (X, W). The main objective of this paper is to propose a smoothing-free test of Equation (1) against the alternative that it is false, that is, the alternative hypothesis is that

H1 : ∀θ ∈ Θ, Law(ε(θ)|W) ≠ Law(−ε(θ)|W),   (2)

where ε(θ) := Y − μ(X, θ). (Of course, ε(θ0) = ε.) Symmetry is a powerful shape restriction because it unambiguously fixes the location under minimal assumptions; for example, ε need not possess any moments. Moreover, it yields additional information about the underlying distribution in the form of an infinite number of moment conditions, namely, that the mean of every odd function vanishes, which can be exploited to increase the efficiency of estimators. These fundamental properties are essentially the reason why symmetry is often imposed on statistical models to identify parameters and estimate them more efficiently (cf., e.g., Bickel 1982; Manski 1984; Powell 1986; Newey and Powell 1987; Newey 1988a, 1988b).

Symmetry considerations can also play important roles in economic modelling. For instance, since constraints in the nominal wage adjustment process can affect the smooth functioning of labour markets, much attention has been paid in empirical labour economics to determining whether changes in nominal wages are symmetrically distributed, in order to test the hypothesis that nominal wages are downwards rigid (cf., e.g., Card and Hyslop 1996; Christofides and Stengos 2001; Stengos and Wu 2004). In empirical finance, the pricing of risky assets depends upon the skewness of the distribution of their returns: assets that make the portfolio returns more left (right) skewed command higher (lower) expected returns (cf. Harvey and Siddique 2000 and references therein). Additional examples of symmetry restrictions in empirical macroeconomics and finance can be found in Bai and Ng (2001). The following example illustrates how conditional symmetry can result from more primitive assumptions.
Example 1.1 Consider a paired sibling study where Y_{i1} := C_i + X_{i1}'θ0 + ε_{i1} and Y_{i2} := C_i + X_{i2}'θ0 + ε_{i2} represent observed market outcomes for the ith pair of twins, C_i denotes unobserved traits that for genetic reasons are common to the sibling pair, X_{i1} and X_{i2} are observed characteristics that vary across the siblings, and ε_{i1} and ε_{i2} denote idiosyncratic shocks, say, produced by 'luck', to the respective outcomes that may be correlated with C_i and some, or all, of the observed regressors. Assume the existence of IV W_i, for example, W_i could be an appropriately chosen subset of X_{i1} and X_{i2}, such that, given (C_i, W_i), the idiosyncratic shocks ε_{i1} and ε_{i2} are drawn independently from the same distribution. (A weaker alternative is to assume that ε_{i1} and ε_{i2} are exchangeable conditional on (C_i, W_i).) It is then straightforward to verify that the conditional distribution of ε_i := ε_{i1} − ε_{i2} given W_i is symmetric about the origin. Therefore, θ0 can be estimated from the model Y_{i1} − Y_{i2} = (X_{i1} − X_{i2})'θ0 + ε_i, where Law(ε_i|W_i) = Law(−ε_i|W_i). Note that conditional symmetry here was not directly imposed on the estimable model, but was a consequence of the primitive assumptions made regarding the error terms in the structural equations.
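The differencing argument in Example 1.1 is easy to check numerically. The sketch below is purely illustrative (exponential shocks are chosen only because they are strongly skewed; all names are hypothetical): each twin's shock is drawn independently from the same skewed law, yet their difference is essentially symmetric about the origin.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_skewness(x):
    """Standardised third central moment, a crude symmetry diagnostic."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)


n = 200_000
# Idiosyncratic shocks: i.i.d. draws from the same (heavily skewed) law,
# mimicking the conditional-on-(C_i, W_i) assumption of Example 1.1.
eps1 = rng.exponential(1.0, size=n)
eps2 = rng.exponential(1.0, size=n)
eps = eps1 - eps2  # the error of the differenced model

print(sample_skewness(eps1))  # close to 2: each shock is strongly skewed
print(sample_skewness(eps))   # close to 0: the difference is symmetric
```

The same conclusion holds for any common shock distribution, since the difference of two i.i.d. (or exchangeable) random variables is always symmetric about zero.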


1.2. Brief literature review

Since symmetry restrictions are widely used, it is not surprising that nonparametric tests to determine whether the assumption of symmetry is supported by the data have been extensively studied. For instance, there is a large statistical literature on testing symmetry in univariate models (cf. Randles and Wolfe 1979, Section 3.5 and references therein). For multivariate models, Fan and Gencay (1995), Ahmad and Li (1997), Zheng (1998), and Hyndman and Yao (2002) use kernel smoothing, Bai and Ng (2001) employ the martingale transformation proposed by Khmaladze (1993), and Neumeyer, Dette, and Nagel (2005), Delgado and Escanciano (2007), and Neumeyer and Dette (2007) rely on empirical process theory. While the smoothing approach leads to tests that are consistent against fixed alternatives, these tests possess zero power against n^{1/2}-deviations from the null and, moreover, they depend upon smoothing parameters that are not always easy to choose. Khmaladze's transformation leads to distribution-free tests that have power against n^{1/2}-alternatives, although it too requires nonparametric smoothing. The empirical process-based tests require no smoothing but are not distribution free, so that critical values for implementing the test cannot be tabulated in advance; instead, they are obtained by simulation.

1.3. Our contribution

Of the works cited earlier, the ones most closely related to our paper are Neumeyer, Dette, and Nagel (2005) and Delgado and Escanciano (2007), although our paper differs from these two papers in some important ways. First, unlike these papers, we allow the regressors in our model to be endogenous. (In contrast, the aforementioned papers assume ε to be independent (resp. mean independent) of X, so that endogenous regressors are ruled out by assumption.) This is a nontrivial extension because endogeneity of the regressors does affect the behaviour of the test statistic in both large and small samples (cf. Example 3.1 and Section 6.1). Second, the Kolmogorov–Smirnov (KS)-type statistic we propose is computationally more tractable than the one proposed in these papers because it only requires a search over a finite number of points (cf. Section 2). By contrast, the KS statistics in Delgado and Escanciano (2007) and Neumeyer et al. (2005) are implemented by searching over the uncountable set R^d, where d depends upon the dimension of the variables involved. (Delgado and Escanciano also propose a Cramér–von Mises-type statistic that is computationally less demanding than their KS statistic.) Third, we expand the literature by showing that the use of simulated critical values for specification tests can be extended to handle models with endogenous regressors (cf. Section 4). Finally, unlike the earlier papers, which only use the empirical distribution function to construct the test statistic, our technical treatment is very general and encompasses a large class of test functions, allowing us to determine the asymptotic properties of a large class of test statistics in a unified manner.

1.4. Organisation

The remainder of the paper is organised as follows. Section 2 motivates the test statistic, and its large sample behaviour under the null hypothesis is described in Section 3. As the limiting distribution of the test statistic turns out to be nonpivotal, Section 4 shows how to simulate the critical values. In Section 5, we demonstrate that our test is consistent against fixed alternatives and possesses nontrivial power against sequences of alternatives that are n^{1/2}-distant from the null. Small sample properties of the test are examined in Section 6, and Section 7 concludes the paper. All proofs are given in the appendices.

2. The test statistic


Our test for conditional symmetry is easy to motivate and is based on the almost obvious fact (whose proof, for the sake of completeness, is provided in the appendix) that

Law(ε|W) = Law(−ε|W) ⟺ (ε, W) =^d (−ε, W),   (3)

where '=^d' is shorthand for 'equal in distribution'. This suggests that a test of Equation (1) can be obtained by comparing the empirical distributions of Z := (ε, W)_{(1+dim(W))×1} and Z^r := (−ε, W), where Z^r denotes Z with its first coordinate reflected about the origin. Since Z and Z^r are unobserved, as they depend upon ε, we compare the empirical distributions of their feasible versions Ẑ := (ε̂, W) and Ẑ^r := (−ε̂, W) instead, where ε̂ := Y − μ(X, θ̂) and θ̂ is an estimator of θ0, for example, the IV estimator, obtained using data (Y_1, X_1, W_1), ..., (Y_n, X_n, W_n).

Given z ∈ R^{1+dim(W)}, let (−∞, z] := (−∞, z^{(1)}] × ··· × (−∞, z^{(1+dim(W))}] denote the closed lower orthant in R^{1+dim(W)}. Then, letting 1_S denote the indicator function of a set S, one possible statistic for testing Equation (1) is

R̂ := sup_{z∈R^{1+dim(W)}} |n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ_j) − n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ_j^r)|,

which resembles the usual KS statistic for testing whether two samples come from the same distribution. The null hypothesis is rejected if the observed value of R̂ is large enough. The computational cost of implementing R̂ can be reduced by restricting the search to the finite subset Ẑ := {Ẑ_1, ..., Ẑ_n}, which in a certain sense becomes dense in supp(Z), the support of Z, as the sample size increases. Therefore, motivated by Andrews (1997), we focus our attention on

R̂_max := max_{z∈Ẑ} |n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ_j) − n^{−1} Σ_{j=1}^n 1_{(−∞,z]}(Ẑ_j^r)|.   (4)

It will be shown later that R̂ and R̂_max behave similarly in large samples. (Note that since Ẑ consists of estimated observations, unlike the statistic in Andrews's paper, which is maximised over the 'true' observations, the effect of estimating θ0 has to be taken into account when showing that Ẑ is dense in supp(Z) as n → ∞; cf. Lemma A.4 for exact details.)
As far as checking the equality of two probability measures is concerned, it is worth keeping in mind that the lower orthants used in the construction of R̂ and R̂_max can be substituted by any measure-determining class of sets. For instance, letting S^{dim(W)} denote the unit sphere in R^{1+dim(W)}, Equation (1) could also be tested with

Ĥ := sup_{(z,t)∈S^{dim(W)}×R} |n^{−1} Σ_{j=1}^n 1_{H(z,t)}(Ẑ_j) − n^{−1} Σ_{j=1}^n 1_{H(z,t)}(Ẑ_j^r)|,

which employs closed half-spaces H(z, t) := {s ∈ R^{1+dim(W)} : s'z ≤ t} instead of orthants. Analogous statistics when closed/open lower orthants are replaced by closed/open upper orthants, or when closed half-spaces are replaced by open half-spaces, etc., follow mutatis mutandis.

It is useful to consider the statistics described above as special cases of a general statistic so that asymptotic properties of a large class of test statistics can be investigated in a unified manner. To accomplish this goal, we introduce some compact notation. Henceforth, let P_U be the distribution of a random vector U and P̂_U denote the empirical measure induced by n observations on U, that is, P̂_U := n^{−1} Σ_{j=1}^n δ_{U_j}, where δ_{U_j} is the Dirac (i.e. point) measure at U_j; for example, P̂_Ẑ := n^{−1} Σ_{j=1}^n δ_{Ẑ_j} and P̂_Ẑ^r := n^{−1} Σ_{j=1}^n δ_{Ẑ_j^r} are the empirical measures induced by Ẑ_1, ..., Ẑ_n and



Ẑ_1^r, ..., Ẑ_n^r, respectively. Let F be a collection of 'test' functions from R^{1+dim(W)} → R and ℓ^∞(F) denote the set of bounded functions from F → R. We often use linear functional notation to write the integral of f ∈ F with respect to P_Z as P_Z f := ∫ f(z) P_Z(dz) = ∫ f(u, w) P_{ε,W}(du, dw). (If the region of integration is omitted, it means that integration is over the support of Z.) A general KS statistic for testing Equation (1) can now be defined as

KS_F := sup_{f∈F} |P̂_Ẑ f − P̂_Ẑ^r f| = sup_{f∈F} |(P̂_Ẑ − P̂_Ẑ^r)f|.   (5)

The statistics described earlier are encompassed by KS_F for appropriately chosen F. For instance, if F_1 := {1_{(−∞,z]} : z ∈ R^{1+dim(W)}} is the set of indicator functions of lower orthants in R^{1+dim(W)}, then KS_{F_1} = R̂. Similarly, if F_2 := {1_{H(z,t)} : (z, t) ∈ S^{dim(W)} × R} is the collection of indicators of closed half-spaces in R^{1+dim(W)}, then KS_{F_2} = Ĥ. Although R̂_max can also be written as KS_{F̂_1} with F̂_1 := {1_{(−∞,z]} : z ∈ Ẑ}, for technical reasons, we prefer that F not be random. Hence, we deal with R̂_max separately when showing that it has the same asymptotic properties as R̂.

Remarks (i) It is important at this point to emphasise that the statistic we recommend to practitioners for applied work is n^{1/2} R̂_max as defined in Equation (4), because it is easy to implement and works well in small samples (cf. Section 6). The general KS statistic KS_F, though not suited for practical implementation, is useful nonetheless, because it allows us to derive the large sample behaviour of n^{1/2} R̂_max and related statistics in a unified manner. Finally, although it may be of interest to investigate the choice of F that leads to an 'optimal' KS_F statistic, this task is beyond the scope of our paper and we leave it for future research.

(ii) The expression for KS_F can be further simplified. In particular, instead of expressing KS_F as a contrast between P̂_Ẑ and P̂_Ẑ^r as we do in Equation (5), there is an equivalent representation in terms of a single measure. To see this, let f^r(Z) := f(Z^r). Then, KS_F = sup_{f∈F} |P̂_Ẑ(f − f^r)|. Besides reducing computational cost, as only one empirical measure has to be calculated instead of two, this equivalence also simplifies some technical arguments, for example, showing the asymptotic tightness of KS_F, because we only have to deal with one measure instead of two. Note that since f − f^r is antisymmetric in its first coordinate, KS_F will be small, ideally zero, if the null hypothesis is true; hence, the simpler form also has the correct interpretation.

(iii) Even though Equation (3) ⟺ (ε, −W) =^d (−ε, −W), because a sign change is a one-to-one transformation and conditional expectations are invariant to one-to-one transformations of conditioning variables, the R̂_max statistic based on instruments (W_1, ..., W_n) will, in general, be different from one based on (−W_1, ..., −W_n). This lack of invariance with respect to sign changes of the data is a generic feature of KS statistics (cf. Andrews 1997, p. 1100, who also describes a standard way to construct a sign-invariant version). Following his approach, an R̂_max statistic that is invariant to sign changes of the coordinates of W can be constructed as follows for use in applications where such invariance is desirable (say, for instance, applications where the instruments are obtained by differencing other variables). First, for each possible sign permutation π of the coordinates of W, construct the statistic R̂_max(π) as defined in Equation (4). (There are 2^{dim(W)} sign permutations, denoted by Π, to consider.) Next, define the sign-invariant statistic Ŝ_max to be the maximum of these statistics, that is, Ŝ_max := ‖R̂_max‖_∞ := max_{π∈Π} R̂_max(π). Ŝ_max is clearly invariant to sign changes in the coordinates of W, although, depending upon the dimension of W, it may be computationally more burdensome to calculate. The asymptotic properties of n^{1/2} Ŝ_max remain analogous to those of n^{1/2} R̂_max. (Briefly, under the null hypothesis or under the sequence of local alternatives specified in Section 5 (cf. Lemmas 3.3 and 5.3), each n^{1/2} R̂_max(π) converges in distribution to a functional of a Gaussian process indexed by π; hence, by the continuous mapping theorem, n^{1/2} Ŝ_max converges in distribution to the maximum (over




π ∈ Π) of this functional.) Therefore, as in Andrews's paper, we only provide a formal description of the properties of n^{1/2} R̂_max.

Additional notation. The following additional notation is used for the remainder of the paper. A(θ) := (ε(θ), X, W) is the random vector containing ε(θ) and the distinct coordinates of X and W; hence, if all regressors are exogenous, that is, W = X, then A(θ) = (ε(θ), X). We write A_0 := A(θ0) for notational convenience. Keeping in mind that ε(θ) is continuously distributed, whereas X and W may have discrete components, p_{A(θ)}(u, x, w) du κ(dx, dw) := P_{A(θ)}(du, dx, dw) denotes the density of P_{A(θ)}, where the dominating measure κ is a mixture of the Lebesgue and counting measures. Given a random vector U, L²(P_U) is the set of real-valued functions of U that are square integrable with respect to P_U. The L²(P_U) inner product and norm are ⟨a_1, a_2⟩_{P_U} := ∫ a_1(u) a_2(u) P_U(du) and ‖a‖_{2,P_U} := ⟨a, a⟩_{P_U}^{1/2}. The Euclidean norm is ‖·‖, and B(θ, ϵ) is the open ball of radius ϵ centred at θ; its ‖·‖-closure is B̄(θ, ϵ). Throughout the paper we maintain the assumption that the observations (Y_j, X_j, W_j), j = 1, ..., n, are independent and identically distributed (i.i.d.). Unless stated otherwise, all limits are taken as the sample size n → ∞.
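The sign-invariant statistic Ŝ_max of Remark (iii) simply maximises R̂_max(π) over the 2^{dim(W)} sign permutations of the coordinates of W. A brute-force sketch follows (names hypothetical; the R̂_max routine is repeated so the snippet is self-contained):

```python
import itertools
import numpy as np


def rmax_stat(eps_hat, W):
    """R-hat-max of Equation (4) for residuals eps_hat and instruments W."""
    Z = np.column_stack([np.asarray(eps_hat, dtype=float),
                         np.asarray(W, dtype=float)])
    Zr = Z * np.r_[-1.0, np.ones(Z.shape[1] - 1)]   # reflect first coordinate
    return max(abs(np.mean(np.all(Z <= z, axis=1)) -
                   np.mean(np.all(Zr <= z, axis=1))) for z in Z)


def smax_stat(eps_hat, W):
    """S-hat-max of Remark (iii): maximise R-hat-max(pi) over all
    2**dim(W) sign permutations pi of the coordinates of W."""
    W = np.asarray(W, dtype=float)
    d = W.shape[1]
    return max(rmax_stat(eps_hat, W * np.array(pi))
               for pi in itertools.product([1.0, -1.0], repeat=d))


rng = np.random.default_rng(1)
e, W = rng.standard_normal(40), rng.standard_normal((40, 2))
# By construction, S-hat-max is unchanged when W is replaced by -W.
print(smax_stat(e, W) == smax_stat(e, -W))  # True
```

The invariance follows because the set of sign-permuted instrument matrices {W·π : π ∈ Π} coincides with {(−W)·π : π ∈ Π}, so the maximum is taken over the same collection of statistics.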

3. Large sample results under the null hypothesis

In this section, we derive the asymptotic distribution of KS_F when the null hypothesis is true. To do so, we make use of the notion of convergence in distribution of stochastic processes taking values in ℓ^∞(F) (cf. Chapter 1.5 of van der Vaart and Wellner 1996, henceforth referred to as V&W, for details regarding convergence in distribution in ℓ^∞(F)). If θ0 is known and H0 is true, then (P̂_Ẑ − P̂_Ẑ^r)f = (P̂_Z − P_Z)(f − f^r), f ∈ F. Hence, under standard conditions on F, it can be shown that the process {n^{1/2}(P̂_Ẑ − P̂_Ẑ^r)f : f ∈ F} converges in distribution in ℓ^∞(F) to a mean zero Gaussian process {G_0 f : f ∈ F} with covariance function EG_0 f_1 G_0 f_2 := E[(f_1 − f_1^r)(Z)(f_2 − f_2^r)(Z)], f_1, f_2 ∈ F. Consequently, n^{1/2} KS_F converges in distribution to the random variable sup_{f∈F} |G_0 f|. Of course, in practice, θ0 is unknown and has to be estimated. As we now show, the estimation error from using θ̂ instead of θ0 shifts the limiting distribution of {n^{1/2}(P̂_Ẑ − P̂_Ẑ^r)f : f ∈ F} and, hence, of n^{1/2} KS_F.

For maximum generality, we try to impose as few restrictions as possible on our model. For instance, we do not require μ to be smooth in θ; instead, we only assume that it is mean-square differentiable at θ0, that is,

Assumption 3.1 There exists a neighbourhood of θ0 such that for all θ in this neighbourhood, μ(x, θ) = μ(x, θ0) + μ̇'(x, θ0)(θ − θ0) + ρ(x, θ, θ0), where μ̇ : R^{dim(X)} × R^{dim(Θ)} → R^{dim(Θ)} satisfies ∫ ‖μ̇(x, θ0)‖² P_X(dx) < ∞ and the remainder term ρ : R^{dim(X)} × Θ → R is such that ∫ sup_{θ∈B(θ0,δ)} ρ²(x, θ, θ0) P_X(dx) = o(δ²) as δ → 0.

Since mean-square differentiability is weaker than pointwise differentiability, our setup allows for kinks in the functional forms for μ. Note that ρ is identically zero if μ is linear in the parameters. Let ∂_1 denote partial differentiation with respect to the first argument. The next assumption stipulates that

Assumption 3.2 (i) u ↦ p_{ε|X=x,W=w}(u) is differentiable a.e. on R and the conditional second moment of ∂_1 log p_{ε|X=x,W=w}, denoted by v_{A_0}(x, w) := ∫ (∂_1 log p_{ε|X=x,W=w})²(u) p_{ε|X=x,W=w}(u) du, is uniformly bounded, that is, ‖v_{A_0}‖_∞ := sup_{(x,w)∈supp(X)×supp(W)} |v_{A_0}(x, w)| < ∞.



(ii) The conditional distribution function of ε|X, W is Lipschitz, that is, there exists a nonnegative ζ ∈ L²(P_{X,W}) such that |∫_{(−∞,t_1]} p_{ε|X=x,W=w}(u) du − ∫_{(−∞,t_2]} p_{ε|X=x,W=w}(u) du| ≤ ζ(x, w)|t_1 − t_2|, t_1, t_2 ∈ R.

(i) implies (cf. Hájek 1972, Lemma A.3) that the square-root density p_{ε|X=x,W=w}^{1/2} is mean-square differentiable, that is, given δ ∈ L²(P_{X,W}),

p_{ε|X=x,W=w}^{1/2}(u + δ(x, w)) − p_{ε|X=x,W=w}^{1/2}(u) = (1/2) [∂_1 p_{ε|X=x,W=w}(u) / p_{ε|X=x,W=w}^{1/2}(u)] δ(x, w) + r(δ(x, w), u, x, w),   (6)

where, for each ϵ > 0, ∫ r²(δ(x, w), u, x, w) P_{ε|X=x,W=w}(du) ≤ ϵ δ²(x, w). As a consequence, ‖r(δ, ·, ·, ·)‖_{2,P_{A_0}} = o(‖δ‖_{2,P_{X,W}}) as ‖δ‖_{2,P_{X,W}} → 0. This fact is used to bound remainder terms for Equation (A.3) in the proof of Lemma 3.1. (ii) implies that F_1 and F_2 satisfy Assumption 3.3(v). The requirements on the class of test functions are as follows.

Assumption 3.3 (i) F separates probability measures on R^{1+dim(W)}, that is, if P_1 ≠ P_2 are probability measures on R^{1+dim(W)}, then P_1 f ≠ P_2 f for some f ∈ F. (ii) F has a bounded envelope, that is, M_F := sup_{(f,z)∈F×R^{1+dim(W)}} |f(z)| < ∞. (iii) F is P_{A_0}-Donsker. (iv) Elements of F are stable with respect to perturbations in their first argument in the sense that f ∈ F ⟹ f(· + t, ·) ∈ F for all t ∈ R. (v) There exists a continuous function q : [0, ∞) → [0, ∞) such that q(0) = 0, sup_{f∈F} ‖f(· − Δ(X, θ, θ0), ·) − f(·, ·)‖_{2,P_{A_0}} ≤ q(‖θ − θ0‖), and sup_{f∈F} ‖f^r(· − Δ(X, θ, θ0), ·) − f^r(·, ·)‖_{2,P_{A_0}} ≤ q(‖θ − θ0‖), where Δ(X, θ_a, θ_b) := μ(X, θ_a) − μ(X, θ_b).

(i) is necessary because we employ elements of F to distinguish between the distributions of Z and Z^r (cf. the proof of Theorem 5.1). F_1 and F_2 satisfy (i) due to the fact that orthants and half-spaces separate probability measures. (ii), clearly satisfied by indicator functions, helps to uniformly (in F) bound remainder terms in Equation (A.3) in the proof of Lemma 3.1. In (iii), F is required to be Donsker with respect to the 'bigger' measure P_{A_0} rather than P_Z in order to control the estimation uncertainty (from estimating θ0) associated with the endogenous components of X. (iii), which implies that F^r := {f^r : f ∈ F} is also P_{A_0}-Donsker because F and F^r are covered by the same number of L²(P_{A_0})-brackets, is used in the proof of Lemma 3.2 to show that n^{1/2}(P̂_Ẑ − P̂_Ẑ^r) converges in distribution in ℓ^∞(F). Following V&W (Example 2.5.4), F_1 (hence F_1^r) is P_{A_0}-Donsker because its L²(P_{A_0})-bracketing entropy is finite; the same argument shows that F_2 (hence F_2^r) is P_{A_0}-Donsker. (iv) implies that test functions with estimation error as an argument are still test functions. To see this, let f ∈ F_1 so that f = 1_{(−∞,τ]×(−∞,v]} for some (τ, v) ∈ R × R^{dim(W)}.
Then, f(Y − μ(X, θ̂), W) = 1_{(−∞,τ]×(−∞,v]}(ε − Δ(X, θ̂, θ0), W) = 1_{(−∞,τ+Δ(X,θ̂,θ0)]×(−∞,v]}(ε, W), implying that f(Y − μ(X, θ̂), W) ∈ F_1; the same argument works for F_2 as well. (v) is similar to Equation (2.7) of Khmaladze and Koul (2004) and Equation (3) of van der Vaart and Wellner (2007). Under Assumption 3.2(ii), F_1 and F_2 satisfy (v) with q(t) ∝ t^{1/2} (cf. Lemma A.3). (iii)–(v), plus an asymptotic equicontinuity argument, help to uniformly (in F) bound Equation (A.2) in the proof of Lemma 3.1. We also assume that θ̂ is a consistent estimator of θ0, that is,

Assumption 3.4 plim(θ̂) = θ0.

For our results to hold, it is sufficient for θ̂ to be consistent under implications of conditional symmetry that are strictly weaker than Equation (3). For example, it is enough for θ̂ to be a




consistent generalised method of moments (GMM) or IV estimator of θ0 under E[ε|W] = 0 w.p.1 or E[Wε] = 0 (Newey and McFadden 1994), or even under the conditional median restriction median(ε|W) = 0 w.p.1 (Sakata 2007).

We are now ready to describe how the estimation of θ0 affects the empirical distributions of Ẑ and Ẑ^r; this will help us derive the limiting distribution of KS_F. Henceforth, μ̇_0(·) := μ̇(·, θ0), and the symbol o_{Pr°} indicates asymptotic negligibility in outer probability (Pr°) to take care of measurability issues that may arise when taking the supremum over an uncountable set.

Lemma 3.1 Let Assumptions 3.1–3.4 hold. Then,

sup_{f∈F} |(P̂_Ẑ − P̂_Z)f − ⟨f, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}}(θ̂ − θ0)| = o_{Pr°}(n^{−1/2}) + o_{Pr}(‖θ̂ − θ0‖),   (7)

sup_{f∈F} |(P̂_Ẑ^r − P̂_Z^r)f − ⟨f^r, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}}(θ̂ − θ0)| = o_{Pr°}(n^{−1/2}) + o_{Pr}(‖θ̂ − θ0‖).   (8)

Lemma 3.1 is a linearisation result. It shows how the processes {P̂_Ẑ f : f ∈ F} and {P̂_Ẑ^r f : f ∈ F} can be expanded about θ0 up to first order. The term ⟨·, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}}(θ̂ − θ0), which vanishes if θ̂ = θ0, captures the uncertainty arising from estimating θ0. It is useful to note that conditional symmetry of ε|W was not used to derive Equations (7) and (8). Lemma 3.1 is thus of independent interest and can be used in other situations as well; for example, the fact that Equations (7) and (8) hold whether the null hypothesis (1) is true or not, provided θ0 is suitably redefined (cf. Section 4), is used to show the consistency of n^{1/2} KS_F.

Next, we derive the limiting distribution of n^{1/2} KS_F. Begin by observing that since n^{1/2} KS_F is a functional of the stochastic process {n^{1/2}(P̂_Ẑ − P̂_Ẑ^r)f : f ∈ F}, its distribution is determined by the distribution of this process, which in turn can be identified from the joint distribution of its marginals. Towards this end, fix f ∈ F and write (P̂_Ẑ − P̂_Ẑ^r)f = (P̂_Ẑ − P̂_Z)f − (P̂_Ẑ^r − P̂_Z^r)f + (P̂_Z − P̂_Z^r)f. Hence, since P_Z = P_Z^r by Equation (1), under the null hypothesis we have that (P̂_Ẑ − P̂_Ẑ^r)f = ((P̂_Ẑ − P̂_Z)f − ⟨f, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}}(θ̂ − θ0)) − ((P̂_Ẑ^r − P̂_Z^r)f − ⟨f^r, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}}(θ̂ − θ0)) + (P̂_Z − P_Z)f − (P̂_Z^r − P_Z^r)f + ⟨f − f^r, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}}(θ̂ − θ0). Then, by Lemma 3.1 and the fact that (P̂_Z − P_Z)f − (P̂_Z^r − P_Z^r)f = (P̂_Z − P_Z)(f − f^r),

n^{1/2}(P̂_Ẑ − P̂_Ẑ^r)f = X̂_0(f) + o_{Pr°}(1) + o_{Pr}(‖n^{1/2}(θ̂ − θ0)‖)   uniformly in f ∈ F,   (9)

where X̂_0(f) := n^{1/2}(P̂_Z − P_Z)(f − f^r) + ⟨f − f^r, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}} n^{1/2}(θ̂ − θ0). To identify the marginal distribution of X̂_0, we need some additional information about θ̂. In particular, we assume that

Assumption 3.5 θ̂ is asymptotically linear with influence function φ, that is, n^{1/2}(θ̂ − θ0) = n^{−1/2} Σ_{j=1}^n φ(Y_j, X_j, W_j, θ0) + o_{Pr}(1), where Eφ(Y, X, W, θ0) = 0 and E‖φ(Y, X, W, θ0)‖² < ∞.

This assumption implies that n^{1/2}(θ̂ − θ0) is asymptotically normal. In practical terms, this means that θ̂, centred at θ0, is approximately Gaussian in finite samples. However, it is well known by now that IV estimators may not be normally distributed in small or large samples if the parameters they are estimating are poorly identified, as, for instance, may occur if the instruments




are weak and the degree of endogeneity is high (cf., e.g., Phillips 1989; Nelson and Startz 1990; Choi and Phillips 1992; Maddala and Jeong 1992). In Section 6.1, we examine how the strength of the instruments and the degree of endogeneity of the regressors affect the finite sample performance of our test.

By Assumption 3.5 and the central limit theorem (CLT) for i.i.d. random vectors, (n^{1/2}(P̂_Z − P_Z)(f − f^r), n^{1/2}(θ̂ − θ0)) is asymptotically multivariate normal for each f ∈ F. In particular, n^{1/2}(θ̂ − θ0) converges in distribution in R^{dim(Θ)} to N_{φ_0} := N(0_{dim(Θ)×1}, Eφ_0φ_0'), where φ_0 := φ(Y, X, W, θ0). Hence, for all f ∈ F,

X̂_0(f) →^d X_0(f)   in R,   (10)

where X_0(f) := G_0 f + ⟨f − f^r, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}} N_{φ_0} and G_0 is the Gaussian process defined earlier. The limit X_0 is a Gaussian process because all finite-dimensional distributions of X̂_0 are asymptotically Gaussian (due to the previously stated fact that (n^{1/2}(P̂_Z − P_Z)(f − f^r), n^{1/2}(θ̂ − θ0)) is asymptotically multivariate normal for each f ∈ F). Since the remainder terms in Equation (9) are asymptotically negligible in probability uniformly in f ∈ F, the process {n^{1/2}(P̂_Ẑ − P̂_Ẑ^r)f : f ∈ F} will converge in distribution in ℓ^∞(F) to the limiting process {X_0(f) : f ∈ F} provided that {X̂_0(f) : f ∈ F} is asymptotically tight. As the next result shows, this is indeed the case.

Lemma 3.2 Let Assumptions 3.1–3.5 hold and Equation (1) be true. Then, {n^{1/2}(P̂_Ẑ − P̂_Ẑ^r)f : f ∈ F} →^d {X_0(f) : f ∈ F} in ℓ^∞(F).

Since x ↦ sup_{f∈F} |x(f)| (being a norm) is continuous on ℓ^∞(F) and the limiting process in Lemma 3.2 takes values in ℓ^∞(F), an application of the continuous mapping theorem (V&W, Theorem 1.3.6) yields the limiting distribution of n^{1/2} KS_F := sup_{f∈F} |n^{1/2}(P̂_Ẑ − P̂_Ẑ^r)f|.

Theorem 3.1 Let Assumptions 3.1–3.5 hold and Equation (1) be true. Then, the random variable n^{1/2} KS_F converges in distribution to the random variable sup_{f∈F} |X_0(f)|.

This leads immediately to the asymptotic distribution of n^{1/2} R̂ = n^{1/2} KS_{F_1}.

Corollary 3.1 Let the conditions of Theorem 3.1 hold. Then, n^{1/2} R̂ converges in distribution to the random variable R_0 := sup_{z∈R^{1+dim(W)}} |G_0 1_{(−∞,z]} + ⟨1_{(−∞,z]} − 1_{(−∞,z]}^r, (∂_1 log p_{A_0})μ̇_0'⟩_{P_{A_0}} N_{φ_0}|.

Example 3.1 (No endogenous regressors) The limiting distribution of the test statistics simplifies if there are no endogenous regressors. To see this, suppose that all regressors are exogenous, that is, W = X, so that A_0 = Z. Since 2f = (f + f^r) + (f − f^r) and Equation (1) implies that ∂_1 p_Z is antisymmetric in its first coordinate, it follows from Corollary 3.1 that n^{1/2} R̂ converges in distribution to the random variable sup_{z∈R^{1+dim(X)}} |G_0 1_{(−∞,z]} + 2⟨1_{(−∞,z]}, (∂_1 log p_Z)μ̇_0'⟩_{P_Z} N_{φ_0}|.

We also expect n^{1/2} R̂_max to converge in distribution to R_0 under the null hypothesis because F̂_1 is dense in F_1 with probability approaching one (w.p.a.1) (cf. Lemma A.4). The next result confirms this intuition (cf. Andrews 1997, p. 1105).

Lemma 3.3 Let the conditions of Theorem 3.1 hold. Then, n^{1/2} R̂_max and n^{1/2} R̂ both converge in distribution to the same random variable, that is, R_0.

Now that we know how R̂_max behaves in large samples, a size-α test, α ∈ (0, 1), for Equation (1) based on R̂_max can be formalised as follows: reject H0 if n^{1/2} R̂_max ≥ c_α, where c_α is the 1 − α


quantile of R0. However, cα cannot be obtained from a table because the distribution of R0 depends upon θ0 (via Nϕ0) and PA0; in other words, n^{1/2}R̂max is not an asymptotically pivotal statistic. Instead, quantiles of R0 can be simulated as described in Section 4.


4. The test using simulated critical values

Simulated critical values for specification tests have been used earlier in the literature; cf., e.g., Su and Wei (1991), Hansen (1996), Neumeyer et al. (2005), Delgado, Domínguez, and Lavergne (2006), and Delgado and Escanciano (2007). In this section, we enlarge this literature by showing that the use of simulated critical values for specification tests can be extended to handle models with endogenous regressors.

Intuitively, the basic idea behind simulating critical values for R̂max is to introduce additional randomness into the data and construct an artificial random variable (say R̂*max) that has the same distribution as R̂max under the null hypothesis. Repeated draws of R̂*max then yield a random sample from the distribution of R̂max under the null, which can be used to estimate the quantiles of R̂max to any desired level of accuracy, since the number of draws is under the control of the researcher. The test using the simulated critical values is implemented as follows:

(i) Use θ̂ and the residuals ε̂j := Yj − μ(Xj, θ̂), j = 1, …, n, to calculate n^{1/2}R̂max.
(ii) Independently of the observed data Dn := {(Yj, Xj, Wj) : 1 ≤ j ≤ n}, use a random number generator to generate R1, …, Rn i.i.d. Rademacher and define Y*j := μ(Xj, θ̂) + Rj ε̂j, j = 1, …, n. Since Law(Rε̂ | W1, …, Wn) = Law(−Rε̂ | W1, …, Wn) by construction, the simulated sample {(Rj ε̂j, Wj) : 1 ≤ j ≤ n} satisfies the null hypothesis. (Recall that a random variable R is said to have the Rademacher, or symmetric Bernoulli, distribution if PR := (δ−1 + δ1)/2, i.e., R takes the values −1 and 1 with equal probability.)
(iii) Re-estimate θ0 with (Y*j, Xj, Wj), j = 1, …, n, using the same procedure used earlier to obtain θ̂, so that the estimation error in the simulated sample resembles the estimation error in the data. Denote the resulting estimator by θ̂* and let ε̂*j := Y*j − μ(Xj, θ̂*), j = 1, …, n, be the corresponding residuals.
(iv) Next, with Ẑ*j := (ε̂*j, Wj), Ẑ*r_j := (−ε̂*j, Wj), and Ẑ* := {Ẑ*1, …, Ẑ*n}, calculate n^{1/2}R̂*max := max_{z∈Ẑ*} |n^{−1/2} Σ_{j=1}^n 1(−∞,z](Ẑ*j) − n^{−1/2} Σ_{j=1}^n 1(−∞,z](Ẑ*r_j)|. As shown subsequently, n^{1/2}R̂*max has the same limiting distribution as n^{1/2}R̂max when the null hypothesis is true.
(v) Repeat (ii)–(iv) B times to obtain B random draws from the distribution of n^{1/2}R̂max under the null hypothesis. Calculate the 1 − α sample quantile (c*α,B) of these draws.
(vi) The decision rule 'Reject H0 if the value of n^{1/2}R̂max observed in (i) exceeds c*α,B' then leads to a size-α test for Equation (1). Alternatively, the p-value can be obtained by calculating the fraction of draws in (v) that exceed the observed value of n^{1/2}R̂max.

We now show that (vi) is justified asymptotically. To summarise our approach: we first prove that n^{1/2}K̂S*_F := n^{1/2} sup_{f∈F} |(P̂*Ẑ − P̂Ẑ)f| has a well defined limiting distribution irrespective of whether the null hypothesis is true or not. From this, it follows that n^{1/2}R̂* := n^{1/2} sup_{z∈R^{1+dim(W)}} |n^{−1} Σ_{j=1}^n 1(−∞,z](Ẑ*j) − n^{−1} Σ_{j=1}^n 1(−∞,z](Ẑj)| has the same limiting distribution as n^{1/2}R̂ under the null hypothesis and is bounded in probability otherwise. The demonstration ends by showing that n^{1/2}R̂*max has the same limiting distribution as n^{1/2}R̂* whether the null hypothesis is true or not, implying in particular that n^{1/2}R̂*max (hence its quantiles) is bounded in probability even when the null hypothesis is false. Therefore, the test using critical values from


the simulated distribution of n^{1/2}R̂*max is consistent, that is, it rejects a false null w.p.a.1, because n^{1/2}R̂max → ∞ w.p.a.1 when the null hypothesis is false (cf. Section 5). We begin by assuming that θ̂ and θ̂* have a well defined limit even when the null hypothesis is false, that is, irrespective of whether Equation (1) is true or not:


Assumption 4.1 plim((θ̂*, θ̂) − (θ1, θ1)) = 0 for some θ1 ∈ Θ.

The parameter θ1 is called the 'pseudo-true value' and exists under very general conditions (cf., e.g., Hall and Inoue 2003). (Of course, if the null hypothesis is true, then θ1 = θ0.) Let Z(θ) := (ε(θ), W), Z^r(θ) := (−ε(θ), W), A1 := A(θ1), and let R denote the operator that 'Rademacherizes' the first component of its argument; for example, R(Ẑ) := (Rε̂, W) and R(A1) := (Rε(θ1), X, W). Given f ∈ F, we can write (cf. the remark at the end of this section)

(P̂*Ẑ − P̂Ẑ)f = (P̂R(Z(θ1)) − P̂R(Z^r(θ1)))f + (P̂*Ẑ − P̂R(Z(θ1)))f − (P̂Ẑ − P̂R(Z^r(θ1)))f
            = (P̂R(Z(θ1)) − PR(Z(θ1)))(f − f^r) + (P̂*Ẑ − P̂R(Z(θ1)))f − (P̂Ẑ − P̂R(Z^r(θ1)))f.   (11)

To get some intuition for why it makes sense to centre the last two terms about P̂R(Z(θ1)) and P̂R(Z^r(θ1)), respectively, note that ε̂* = Rε̂ − Δ(X, θ̂*, θ̂) = Rε(θ1) − Δ(X, θ̂*, θ̂) − RΔ(X, θ̂, θ1). Hence, P̂*Ẑ − P̂R(Z(θ1)) captures the estimation error from re-estimating θ0 using the simulated

sample; a similar interpretation holds for P̂Ẑ − P̂R(Z^r(θ1)). To study the properties of P̂*Ẑ and P̂Ẑ without assuming the null hypothesis to be true, we strengthen Assumptions 3.1–3.3 as follows.

Assumption 4.2 (i) There exists a neighbourhood of θ1 such that for each θ in this neighbourhood, μ(x, θ̂*) = μ(x, θ) + μ̇'(x, θ)(θ̂* − θ) + ρ(x, θ̂*, θ) with sup_{θ̃∈B(θ,δ)} ‖ρ(·, θ̃, θ)‖2,PX = o(δ) as δ → 0, μ̇(·, θ1) ∈ L2(PX), and ‖μ̇(·, θ̂) − μ̇(·, θ1)‖2,PX = oPr(1). (ii) pε(θ1)|X,W satisfies Assumption 3.2. (iii) Assumption 3.3 holds with PA0 replaced by PR(A1) and Δ(X, θ, θ0) replaced by Δ̃(−1, X, θ̂*, θ̂, θ1) and Δ̃(1, X, θ̂*, θ̂, θ1), where Δ̃(R, X, θa, θb, θ1) := Δ(X, θa, θb) + RΔ(X, θb, θ1).

Since plim(θ̂) = θ1 by Assumption 4.1, (i) implies that, w.p.a.1, μ(x, θ̂*) = μ(x, θ̂) + μ̇'(x, θ̂)(θ̂* − θ̂) + ρ(x, θ̂*, θ̂) and sup_{θ̃∈B(θ̂,‖θ̂*−θ̂‖)} ‖ρ(·, θ̃, θ̂)‖2,PX = o(‖θ̂* − θ̂‖). Under (iii), ‖f^{θ̂*,θ̂} − f^{θ1,θ1}‖2,PR(A1) ≤ q(‖(θ̂*, θ̂) − (θ1, θ1)‖). These facts help prove the first result in this section, which shows how the re-estimation step affects the empirical distributions of Ẑ* and Ẑ (cf. Lemma 3.1). Henceforth, let μ̇1(·) := μ̇(·, θ1) and, for f ∈ F, f^{θa,θb}(Rε(θ1), X, W) := f(Rε(θ1) − Δ̃(R, X, θa, θb, θ1), W) and f^{r θa,θb}(Rε(θ1), X, W) := f^r(Rε(θ1) − Δ̃(R, X, θa, θb, θ1), W).

Lemma 4.1 Let Assumptions 4.1 and 4.2 hold. Then, whether or not the null hypothesis is true,

sup_{f∈F} |(P̂*Ẑ − P̂R(Z(θ1)))f − PR(A1)(f^{θ̂,θ̂} − f^{θ1,θ1}) − 0.5⟨f − f^r, (∂1 log pA1)μ̇1⟩PA1 (θ̂* − θ̂)| = oPr◦(n^{−1/2}) + oPr(‖θ̂* − θ̂‖)

and

sup_{f∈F} |(P̂Ẑ − P̂R(Z^r(θ1)))f − PR(A1)(f^{r θ̂,θ̂} − f^{r θ1,θ1}) − 0.5⟨f^r − f, (∂1 log pA1)μ̇1⟩PA1 (θ̂* − θ̂)| = oPr◦(n^{−1/2}) + oPr(‖θ̂* − θ̂‖).


As shown in the appendix,

PR(A1)(f^{θ̂,θ̂} − f^{θ1,θ1}) = PR(A1)(f^{r θ̂,θ̂} − f^{r θ1,θ1}).   (12)

Therefore, by Equation (11) and Lemma 4.1, and irrespective of whether the null hypothesis is true or not,


n^{1/2}(P̂*Ẑ − P̂Ẑ)f = X̂*1(f) + oPr◦(1) + oPr(‖n^{1/2}(θ̂* − θ̂)‖) uniformly in f ∈ F, where

X̂*1(f) := n^{1/2}(P̂R(Z(θ1)) − PR(Z(θ1)))(f − f^r) + ⟨f − f^r, (∂1 log pA1)μ̇1⟩PA1 n^{1/2}(θ̂* − θ̂).   (13)

The empirical process {n^{1/2}(P̂R(Z(θ1)) − PR(Z(θ1)))(f − f^r) : f ∈ F} converges in distribution in ℓ∞(F) to G1, a mean zero Gaussian process with covariance function EG1f1 G1f2 := E[(f1 − f1^r)(Z(θ1))(f2 − f2^r)(Z(θ1))], f1, f2 ∈ F (cf. the proof of Lemma 4.2). X̂*1 is thus a simulated version of X̂0 that remains well defined even if the null hypothesis is false. This also illustrates the importance of re-estimating θ0 using the simulated sample: if θ0 were not re-estimated, that is, if we set θ̂* = θ̂ in Equation (13), then X̂*1 would not mimic X̂0 under the null hypothesis and there would be no reason to believe that n^{1/2}K̂S*_F would possess the same limiting distribution as n^{1/2}K̂S_F when the null hypothesis is true.

Let Pr* and E* stand for probability and expectation conditional on Dn, that is, integrals with respect to the Rademacher distribution, because, given Dn, the only source of randomness in (Y*, X, W, θ̂) is R. Stochastic order symbols under Pr* are written as oPr* and OPr*.

To derive the distribution of {X̂*1(f) : f ∈ F}, we assume that

Assumption 4.3 (i) Conditional on Dn, θ̂* is asymptotically linear with influence function ϕ*; that is, n^{1/2}(θ̂* − θ̂) = n^{−1/2} Σ_{j=1}^n ϕ*(Y*j, Xj, Wj, θ̂) + oPr*(1), where E*ϕ*(Y*, X, W, θ̂) = 0 and E*‖ϕ*(Y*, X, W, θ̂)‖² < ∞. (ii) Lindeberg's condition is satisfied, that is, for all ϵ > 0 and λ ∈ R^{dim(Θ)}, E*[|λ'ϕ*(Y*, X, W, θ̂)|² 1(|λ'ϕ*(Y*, X, W, θ̂)| > ϵn^{1/2}σ*λ)] = o(σ*λ²), where σ*λ² := E*[n^{−1/2} Σ_{j=1}^n λ'ϕ*(Y*j, Xj, Wj, θ̂)]². (iii) σ*λ² = E[λ'ϕ1]² + oPr(1), where ϕ1 := ϕ(Y, X, W, θ1) and ϕ is the influence function defined earlier in Assumption 3.5.

This assumption is straightforward to verify if μ is linear in parameters (cf. Neumeyer et al. 2005, p. 705). For instance, suppose that μ(X, θ) = X'θ and dim(W) = dim(X). Then, letting θ̂ := (Σ_{j=1}^n Wj X'j)^{−1} Σ_{j=1}^n Wj Yj be the IV estimator of θ0, so that θ̂* := (Σ_{j=1}^n Wj X'j)^{−1} Σ_{j=1}^n Wj Y*j, it is easy to verify that

n^{1/2}(θ̂* − θ̂) = (n^{−1} Σ_{j=1}^n Wj X'j)^{−1} (n^{−1/2} Σ_{j=1}^n Wj εj Rj)
                − (n^{−1} Σ_{j=1}^n Wj X'j)^{−1} (n^{−1} Σ_{j=1}^n Wj X'j Rj) n^{1/2}(θ̂ − θ0).   (14)
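Equation (14) is an exact algebraic identity in any just-identified linear IV sample: substitute ε̂j = εj − X'j(θ̂ − θ0) into θ̂* − θ̂ = (Σj Wj X'j)^{−1} Σj Wj Rj ε̂j. It can therefore be checked numerically. The sketch below uses a hypothetical design of our own (not the paper's exact Monte Carlo design):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
theta0 = np.array([1.0, 1.0])

# Hypothetical just-identified design: one endogenous regressor, one instrument.
W = np.column_stack([np.ones(n), rng.choice([-0.5, 0.5], size=n)])  # instruments (with intercept)
V = rng.normal(size=n)
U = 0.65 * V + rng.normal(size=n)              # structural error, correlated with V
X = np.column_stack([np.ones(n), 1.0 + 3.0 * W[:, 1] + V])
Y = X @ theta0 + U                              # so eps_j = U_j

WX = (W.T @ X) / n                              # n^{-1} sum_j W_j X_j'
theta_hat = np.linalg.solve(W.T @ X, W.T @ Y)   # IV estimator theta-hat
eps_hat = Y - X @ theta_hat                     # residuals eps-hat_j
R = rng.choice([-1.0, 1.0], size=n)             # Rademacher draws
theta_star = np.linalg.solve(W.T @ X, W.T @ (X @ theta_hat + R * eps_hat))

# Right-hand side of Equation (14):
term1 = np.linalg.solve(WX, W.T @ (U * R) / np.sqrt(n))
WXR = np.einsum('j,jk,jl->kl', R, W, X) / n     # n^{-1} sum_j W_j X_j' R_j
term2 = np.linalg.solve(WX, WXR @ (np.sqrt(n) * (theta_hat - theta0)))
lhs = np.sqrt(n) * (theta_star - theta_hat)
assert np.allclose(lhs, term1 - term2)          # the identity holds exactly
```

Note that the factorisation isolates n^{1/2}(θ̂ − θ0) in the second term: only its boundedness in probability, not its normality, is used later.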

Since n^{−1} Σ_{j=1}^n Wj X'j Rj = oPr*(1), the second term is asymptotically negligible in probability, conditional on Dn.

A necessary condition for the critical values to be accurately simulated is that θ̂* mimic the behaviour of θ̂. In particular, since θ̂ centred at θ0 is asymptotically linear by Assumption 3.5, it is


required that θ̂* centred at θ̂ also be asymptotically linear, with its influence function resembling the influence function of θ̂. (That, in fact, is why θ̂* is re-estimated using the same IV procedure used to obtain θ̂, even though, conditional on the observed data, the simulated residuals (R1ε̂1, …, Rnε̂n) are independent of the regressors (X1, …, Xn).) Recall that when θ0 is poorly identified, as is the case when the instruments are weak and the regressors are highly endogenous, θ̂ − θ0 may not be Gaussian in small or large samples and, hence, not asymptotically linear. In contrast, θ̂*, by virtue of being centred at θ̂ instead of θ0, is likely to be asymptotically linear even under these circumstances. (This is easily seen from Equation (14), where the only requirement on θ̂ for the second term to be asymptotically negligible in probability is that n^{1/2}(θ̂ − θ0) = OPr(1); in particular, asymptotic normality of n^{1/2}(θ̂ − θ0) is not required.) The practical implication is that if the instruments are weak and the degree of endogeneity is high, then θ̂* may not mimic θ̂ well enough in finite samples to guarantee that the critical values are accurately simulated. We investigate this issue in Section 6.1.

Conditional on Dn, n^{1/2}λ'(θ̂* − θ̂)/σ*λ converges in distribution to a standard Gaussian random variable by Assumption 4.3(i), (ii), and Lindeberg's CLT. Hence, by Assumption 4.3(iii), n^{1/2}(θ̂* − θ̂) converges in distribution, conditional on Dn, to Nϕ1, a mean zero Gaussian random vector with variance E[ϕ1ϕ1']. A dominated convergence argument (Andrews 1997, p. 1101, Footnote 2) then implies that n^{1/2}(θ̂* − θ̂) also converges in distribution unconditionally to Nϕ1. This leads to the following result about the limiting distribution of P̂*Ẑ − P̂Ẑ.

Lemma 4.2 Let Assumptions 4.1–4.3 hold. Then, irrespective of whether the null hypothesis is true or not, {n^{1/2}(P̂*Ẑ − P̂Ẑ)f : f ∈ F} →d {X1(f) : f ∈ F} in ℓ∞(F), conditionally on Dn, hence unconditionally, where X1(f) := G1 f + ⟨f − f^r, (∂1 log pA1)μ̇1⟩PA1 Nϕ1.

It follows by the continuous mapping theorem that

Corollary 4.1 Under the assumptions maintained in Lemma 4.2, n^{1/2}K̂S*_F converges in distribution to sup_{f∈F} |X1(f)| whether the null hypothesis is true or not.

Consequently, letting F = F1, the same holds for n^{1/2}R̂*, that is, it converges in distribution to the random variable R1 := sup_{z∈R^{1+dim(W)}} |G1 1(−∞,z] + ⟨1(−∞,z] − 1^r(−∞,z], (∂1 log pA1)μ̇1⟩PA1 Nϕ1|. Therefore, n^{1/2}R̂* has the same limiting distribution as n^{1/2}R̂ under the null hypothesis (because then θ1 = θ0) and is bounded in probability otherwise. Finally, we have that

Lemma 4.3 Under Assumptions 4.1–4.3, n^{1/2}R̂*max and n^{1/2}R̂* have the same limiting distribution whether or not the null hypothesis is true.

Since n^{1/2}R̂max and n^{1/2}R̂ have the same asymptotic distribution under the null hypothesis (cf. Lemma 3.3), it follows by Corollary 4.1 and Lemma 4.3 that n^{1/2}R̂*max has the same limiting distribution as n^{1/2}R̂max when the null hypothesis is true and is bounded in probability otherwise. Therefore, under the null hypothesis, the simulated critical value c*α,B converges in probability (conditional on Dn) to cα, the 1 − α quantile of R0, provided B → ∞ as n → ∞ (cf. Andrews 1997, p. 1108). This completes our argument justifying the use of simulated critical values.
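Steps (i)–(vi) are straightforward to implement. The authors' simulation code is written in R; the sketch below is our own minimal Python version for the just-identified linear model Y = θ0 + θ1X + U with a single instrument W, computing R̂max by brute force over the sample points (all function names are ours):

```python
import numpy as np

def iv_fit(y, x, w):
    """Just-identified IV estimator of (theta0, theta1)."""
    Wt = np.column_stack([np.ones_like(w), w])
    Xt = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(Wt.T @ Xt, Wt.T @ y)

def r_max(eps, w):
    """n^{1/2} R-hat-max: maximal contrast between the empirical CDFs of
    Z = (eps, W) and its reflection Z^r = (-eps, W), over z in the sample."""
    n = len(eps)
    stat = 0.0
    for a, b in zip(eps, w):
        f = np.mean((eps <= a) & (w <= b))
        fr = np.mean((-eps <= a) & (w <= b))
        stat = max(stat, abs(f - fr))
    return np.sqrt(n) * stat

def symmetry_test(y, x, w, B=200, alpha=0.05, seed=0):
    """Steps (i)-(vi) of Section 4: test H0 via simulated critical values."""
    rng = np.random.default_rng(seed)
    t = iv_fit(y, x, w)
    eps = y - t[0] - t[1] * x                     # (i) residuals and statistic
    stat = r_max(eps, w)
    draws = np.empty(B)
    for b in range(B):
        R = rng.choice([-1.0, 1.0], size=len(y))  # (ii) Rademacher draws
        y_star = t[0] + t[1] * x + R * eps
        ts = iv_fit(y_star, x, w)                 # (iii) re-estimate theta0
        e_star = y_star - ts[0] - ts[1] * x
        draws[b] = r_max(e_star, w)               # (iv) simulated statistic
    crit = np.quantile(draws, 1 - alpha)          # (v) simulated critical value
    pval = np.mean(draws >= stat)                 # (vi) p-value
    return stat, crit, pval

# Example: exogenous design with conditionally symmetric errors (H0 holds).
rng = np.random.default_rng(1)
n = 60
w = rng.normal(size=n)
x = 1.0 + w + rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)
stat, crit, pval = symmetry_test(y, x, w, B=100)
```

Under the null, the event stat > crit should occur with probability close to α; under asymmetric errors the statistic diverges at rate n^{1/2} (Section 5), so the test rejects w.p.a.1.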

Remark By Rademacherization, (Rε(θ1), W) =d (−Rε(θ1), W). Thus, for f ∈ F,

PR(Z(θ1)) f = Ef(R(Z(θ1))) = Ef(Rε(θ1), W) = Ef(−Rε(θ1), W) = Ef^r(Rε(θ1), W) = PR(Z(θ1)) f^r.

Hence, PR(Z(θ1))(f − f^r) = 0. Moreover, since f(R(Z^r(θ1))) = f(−Rε(θ1), W) = f^r(Rε(θ1), W) = f^r(R(Z(θ1))) implies that P̂R(Z^r(θ1)) f = n^{−1} Σ_{j=1}^n f(R(Z^r_j(θ1))) = n^{−1} Σ_{j=1}^n f^r(R(Zj(θ1))) = P̂R(Z(θ1)) f^r, it follows that (P̂R(Z(θ1)) − P̂R(Z^r(θ1)))f = (P̂R(Z(θ1)) − PR(Z(θ1)))(f − f^r).
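The first display in the remark is a finite, exact computation: averaging over the two equally likely values of R makes the sign of the first coordinate irrelevant, so PR(Z(θ1))f = PR(Z(θ1))f^r for every f. A small sketch with an arbitrary (hypothetical) indicator f:

```python
import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(size=50)               # any fixed values of eps(theta1)
w = rng.choice([-1.0, 1.0], size=50)  # any fixed instrument values

def f(u, v):                          # arbitrary test function (hypothetical choice)
    return (u <= 0.3) & (v <= 0.5)

def f_r(u, v):                        # its reflection: f^r(u, v) = f(-u, v)
    return f(-u, v)

def P_RZ(g):
    """P_{R(Z(theta1))} g: average g(R e_j, w_j) over j and over R = +/-1."""
    return 0.5 * (np.mean(g(e, w)) + np.mean(g(-e, w)))

assert P_RZ(f) == P_RZ(f_r)           # hence P_{R(Z(theta1))}(f - f^r) = 0, exactly
```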

5. Large sample results under the alternative

We begin by showing that our test is consistent, that is, it rejects any deviation from conditional symmetry w.p.a.1.

Theorem 5.1 Let Equation (2) be true and Assumptions 3.1–3.4 hold with (θ0, ε) replaced by (θ1, ε(θ1)). Then, limn→∞ n^{1/2}K̂S_F = ∞ w.p.a.1.

Hence, n^{1/2}K̂S_F will reject Equation (1) w.p.a.1 as n → ∞ because the simulated critical values are bounded in probability (cf. Section 4). Letting F = F1 in Theorem 5.1, we have that n^{1/2}R̂ is consistent. With a little additional effort, we can also show that

Theorem 5.2 limn→∞ n^{1/2}R̂max = ∞ w.p.a.1 under the conditions of Theorem 5.1.

Hence, n^{1/2}R̂max is consistent against fixed alternatives as well.

Next, we derive the power of n^{1/2}K̂S_F against a sequence of alternatives that lie in a n^{−1/2}-neighbourhood of the null. To create the local alternatives of interest, begin by assuming that the null hypothesis is true, that is, PZ = PZ^r. Next, let θn be a sequence in Θ such that limn→∞ θn = θ0 and assume that An := (ε(θn), X, W) is drawn from the perturbed measure PAn := PA0(1 + n^{−1/2}h), where h : R^{1+dim(W)} → R is such that ‖h‖∞ := sup_{R×supp(W)} |h(u, w)| < ∞ and ∫h dPZ = 0 (these conditions ensure that PAn is a probability measure for n ≥ ‖h‖²∞). Reflecting the first coordinate about the origin, this leads to a sequence of distributions for Z(θn) := (ε(θn), W) and Z^r(θn) := (−ε(θn), W) given by

H1n : PZ(θn) := PZ(1 + n^{−1/2}h)   and   PZ^r(θn) := PZ(1 + n^{−1/2}h^r).   (15)

Henceforth, let h^r ≠ h. Then, for f ∈ F, Equation (15) implies that (PZ(θn) − PZ^r(θn))f = n^{−1/2}PZ(h − h^r)f =: n^{−1/2}Δh(f), and sup_{f∈F} |Δh(f)| > 0 because F is measure determining and h is not symmetric in its first coordinate. Thus, Equation (15) defines a sequence of local alternatives for Equation (3). Since observations under H1n are independent but (for different n) not identically distributed, because the underlying measures PZ(θn) and PZ^r(θn) depend upon n, the assumptions introduced in Section 3 have to be strengthened in order to derive the distribution of n^{1/2}K̂S_F under H1n. We begin with Assumption 3.1.

Assumption 5.1 There exists a neighbourhood of θ0 such that for each θn in this neighbourhood, μ(x, ·) is mean-square differentiable at θn, that is, μ(x, θ) = μ(x, θn) + μ̇'(x, θn)(θ − θn) + ρ(x, θ, θn) with sup_{θ∈B(θn,δ)} ‖ρ(·, θ, θn)‖2,PX = o(δ) as δ → 0, μ̇(·, θ0) ∈ L2(PX), and ‖μ̇(·, θn) − μ̇(·, θ0)‖2,PX = o(1).


Next, we strengthen Assumption 3.2.


Assumption 5.2 (i) There exists a neighbourhood of θ0 such that for each θn in this neighbourhood, pε(θn)|X=x,W=w is a.e. differentiable on R and the conditional second moment of the derivative is uniformly bounded on N × supp(X) × supp(W). (ii) For each θn in the aforementioned neighbourhood, the conditional distribution function of ε(θn) given (X, W) is Lipschitz and the nonnegative Lipschitz constants (ζn) are such that supn∈N ζn ∈ L2(PX,W).

Assumption 3.3 is modified as follows:

Assumption 5.3 (i), (ii), and (iv) are the same as (i), (ii), and (iv) of Assumption 3.3. (iii) The bracketing integral ∫0∞ supn∈N √(log N[](ϵ, F, L2(PAn))) dϵ < ∞. (v) Same as Assumption 3.3(v) but with (PA0, θ0) replaced by (PAn, θn).

F1 and F2 satisfy (iii) because indicators of orthants and half-spaces are VC, hence universally Donsker (V&W, Example 2.5.4 and Problem 2.6.14). The argument in Lemma A.3 shows that F1 and F2 also satisfy (v) under Assumption 5.2(ii). By V&W (Theorem 2.8.4), (ii) and (iii) imply that F is Donsker and pre-Gaussian uniformly in (PAn). Since F and −F^r are covered by the same number of (pointwise) L2(PAn)-brackets, N[](ϵ, F − F^r, L2(PAn)) ≤ N[](ϵ, F, L2(PAn)) × N[](ϵ, −F^r, L2(PAn)) = N[]²(ϵ, F, L2(PAn)). Hence, by (iii),

∫0∞ supn∈N √(log N[](ϵ, F − F^r, L2(PAn))) dϵ ≤ ∫0∞ supn∈N √(2 log N[](ϵ, F, L2(PAn))) dϵ < ∞.

Therefore, by V&W (Theorem 2.8.4), F − F^r is Donsker and pre-Gaussian uniformly in (PAn). This fact is used in the proof of Lemma 5.2. Finally, Assumption 3.4 becomes

Assumption 5.4 plim(θ̂ − θn) = 0.

Under these conditions, it is straightforward to show that Lemma 3.1 remains valid with θ0 replaced by θn (the proof of the following result is virtually identical to the proofs of Lemmas 3.1 and 4.1 and is therefore omitted), that is,

Lemma 5.1 Let Assumptions 5.1–5.4 hold. Then, under H1n,

sup_{f∈F} |(P̂Ẑ − P̂Z(θn))f − ⟨f, (∂1 log pAn)μ̇0⟩PAn (θ̂ − θn)| = oPr◦(n^{−1/2}) + oPr(‖θ̂ − θn‖),

sup_{f∈F} |(P̂Ẑ^r − P̂Z^r(θn))f − ⟨f^r, (∂1 log pAn)μ̇0⟩PAn (θ̂ − θn)| = oPr◦(n^{−1/2}) + oPr(‖θ̂ − θn‖).

As in Section 3, we use Lemma 5.1 to derive the distribution of n^{1/2}K̂S_F under H1n. Begin by observing that P̂Ẑ − P̂Ẑ^r = (P̂Ẑ − P̂Z(θn)) − (P̂Ẑ^r − P̂Z^r(θn)) + (P̂Z(θn) − PZ(θn)) − (P̂Z^r(θn) − PZ^r(θn)) + (PZ(θn) − PZ^r(θn)). Hence, by Equation (15) and Lemma 5.1,

n^{1/2}(P̂Ẑ − P̂Ẑ^r)f = X̂θn(f) + Δh(f) + oPr◦(1) + oPr(‖n^{1/2}(θ̂ − θn)‖) uniformly in f ∈ F,


where X̂θn(f) := n^{1/2}(P̂Z(θn) − PZ(θn))(f − f^r) + ⟨f − f^r, (∂1 log pAn)μ̇0⟩PAn n^{1/2}(θ̂ − θn). To identify the marginal distribution of X̂θn, assume that (cf. Andrews 1997, Assumption E2(i)):


Assumption 5.5 (i) Assumption 3.5 holds with θ0 replaced by θn. (ii) The Lindeberg condition E[|λ'ϕn|² 1(|λ'ϕn| > ϵn^{1/2}σλ,n)] = o(σλ,n²) holds for all ϵ > 0 and λ ∈ R^{dim(Θ)}, where ϕn := ϕ(Y, X, W, θn) and σλ,n² := E[λ'ϕn]² → E[λ'ϕ0]².

By Assumption 5.5, n^{1/2}(θ̂ − θn) converges in distribution to Nϕ0 by Lindeberg's CLT for inid random variables. Let ∂1h exist and ‖∂1h‖∞ < ∞. Since log pAn = log pA0 + log(1 + n^{−1/2}h) (the second term is well defined for n ≥ ‖h‖²∞),

⟨f − f^r, (∂1 log pAn)μ̇0⟩PAn = ⟨f − f^r, (∂1 log pA0)μ̇0⟩PAn + n^{−1/2}⟨f − f^r, (∂1h)μ̇0⟩PA0

so that, by Cauchy–Schwarz,

⟨f − f^r, (∂1 log pAn)μ̇0⟩PAn − ⟨f − f^r, (∂1 log pA0)μ̇0⟩PA0 ≤ n^{−1/2}⟨f − f^r, (h + ∂1h)(∂1 log pA0)μ̇0⟩PA0 ≤ n^{−1/2}‖f − f^r‖2,PA0 ‖h + ∂1h‖∞ ‖vA0‖∞ ‖(μ̇0'μ̇0)^{1/2}‖2,PX.

Hence, for each f ∈ F, X̂θn(f) converges in distribution to X0(f) by Lindeberg's CLT. Therefore, n^{1/2}(P̂Ẑ − P̂Ẑ^r)F converges in distribution in ℓ∞(F) because (X̂θn + Δh)F is asymptotically tight.

Lemma 5.2 Let Assumptions 5.1–5.5 hold. Then, under H1n, {n^{1/2}(P̂Ẑ − P̂Ẑ^r)f : f ∈ F} →d {X0(f) + Δh(f) : f ∈ F} in ℓ∞(F).

Consequently, by the continuous mapping theorem, we have the limiting distribution of n^{1/2}K̂S_F under H1n.

Theorem 5.3 Let Assumptions 5.1–5.5 hold. Then, under H1n, the random variable n^{1/2}K̂S_F converges in distribution to the random variable sup_{f∈F} |X0(f) + Δh(f)|.

Letting F = F1 in Theorem 5.3, we immediately obtain that

Corollary 5.1 If Assumptions 5.1–5.5 hold, then, under H1n, n^{1/2}R̂ converges in distribution to the random variable sup_{z∈R^{1+dim(W)}} |G0 1(−∞,z] + ⟨1(−∞,z] − 1^r(−∞,z], (∂1 log pA0)μ̇0⟩PA0 Nϕ0 + Δh(1(−∞,z])|.

Given that n^{1/2}R̂max and n^{1/2}R̂ behave similarly under the null hypothesis, it is also not surprising that

Lemma 5.3 Under the conditions of Theorem 5.3, n^{1/2}R̂max and n^{1/2}R̂ have the same limiting distribution under H1n.

Finally, as in Andrews (1997, p. 1114), it can be shown that n^{1/2}K̂S_F is asymptotically locally unbiased (the same argument works for n^{1/2}R̂max as well). Indeed, since (F, ‖·‖2,PA0) is totally bounded (cf. the proof of Lemma 3.2), there exists a sequence of increasing finite sets (Fj) whose limit ∪∞j=1 Fj is dense (in the ‖·‖2,PA0 norm) in F. Hence, sup_{f∈F} |(X0 + Δh)f| = sup_{f∈∪∞j=1 Fj} |(X0 + Δh)f| = lim_{j→∞} sup_{f∈Fj} |(X0 + Δh)f|, where the first equality follows because


X0 + Δh is ‖·‖2,PA0-continuous on F w.p.1. (Almost all sample paths of X0 are uniformly ‖·‖2,PA0-continuous by V&W (Addendum 1.5.8), and uniform ‖·‖2,PA0-continuity of Δh follows because |Δh(f)| ≤ 2‖h‖∞‖f‖2,PA0 for all f ∈ F, that is, Δh is a bounded linear functional on F.) Therefore, if B → ∞ as n → ∞,

lim_{n→∞} Pr_{H1n}(n^{1/2}K̂S_F > c*α,B) = Pr(sup_{f∈F} |(X0 + Δh)f| > cα)
    = lim_{j→∞} Pr(sup_{f∈Fj} |(X0 + Δh)f| > cα)    (continuity of prob. meas.)
    ≥ lim_{j→∞} Pr(sup_{f∈Fj} |X0(f)| > cα)    (Anderson's lemma)
    = Pr(sup_{f∈∪∞j=1 Fj} |X0(f)| > cα)    (continuity of prob. meas.)
    = Pr(sup_{f∈F} |X0(f)| > cα) = α,

where the last line uses the ‖·‖2,PA0-continuity of X0 and the fact that cα is the 1 − α quantile of sup_{f∈F} |X0(f)|.

Hence, our test has nontrivial power against the local alternatives.
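To make the local alternatives (15) concrete, one can draw from PZ(θn) = PZ(1 + n^{−1/2}h) by accept–reject. The sketch below uses a hypothetical base law of our own (ε standard normal, W two-point) and the hypothetical perturbation h(u, w) = sin(u)e^{−u²}, which is bounded, odd in u (so ∫h dPZ = 0 whenever PZ is symmetric in its first coordinate), and satisfies h^r ≠ h:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
c = 1.0 / np.sqrt(n)                   # the n^{-1/2} scaling in Equation (15)

def h(u, w):
    """Hypothetical perturbation: bounded by 1, odd in u, with h^r != h."""
    return np.sin(u) * np.exp(-u ** 2)

# Accept-reject draws from P_Z(theta_n) = P_Z (1 + n^{-1/2} h): propose from
# the symmetric base law, accept with probability (1 + c*h)/(1 + c*||h||_inf).
eps, W = [], []
while len(eps) < n:
    e = rng.normal(size=4 * n)                    # base: eps ~ N(0,1)
    w = rng.choice([-1.0, 1.0], size=4 * n)       # base: two-point W
    keep = rng.uniform(size=4 * n) < (1.0 + c * h(e, w)) / (1.0 + c)
    eps.extend(e[keep])
    W.extend(w[keep])
eps, W = np.array(eps[:n]), np.array(W[:n])

# The tilt breaks conditional symmetry only at order n^{-1/2}: e.g. the mean
# of eps is shifted from 0 by roughly c * E[u h(u, W)], vanishing as n grows.
assert abs(eps.mean()) < 0.25
```

Because the deviation shrinks at exactly the n^{−1/2} rate, the statistic n^{1/2}K̂S_F neither diverges nor collapses under H1n, which is what Theorem 5.3 and the display above formalise.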

6. Monte Carlo study

We now investigate the small sample properties of n^{1/2}R̂max for a linear IV model. Results are reported in Tables 1–5 for three different sample sizes, n = 50, 100, 200, and two sets of designs: the first with one instrument and the second with two instruments. (Parameters in the former are

Table 1. Empirical size of n^{1/2}R̂max for the linear IV model with one instrument.

Design                                             n     corr̂(X,U)  R²_{X,W}  F_{X,W}  α = 0.01        α = 0.05        α = 0.10
High endogeneity, weak instrument                  50    0.8        0.03      1.6      0.005 (0.003)   0.021 (0.007)   0.04 (0.009)
  [r = 1, π1 = √(1/99), σ1 = 1,                    100   0.8        0.02      2.2      0.003 (0.003)   0.016 (0.007)   0.038 (0.009)
   σ2 = √2, σV = 1, γ = 0.99]                      200   0.8        0.02      3.1      0.000 (0.003)   0.005 (0.007)   0.019 (0.009)
High endogeneity, strong instrument                50    0.6        0.61      79       0.014 (0.003)   0.06 (0.007)    0.106 (0.009)
  [r = 0.5, π1 = 3, σ1 = √0.3,                     100   0.6        0.6       154      0.01 (0.003)    0.043 (0.007)   0.094 (0.009)
   σ2 = √0.31, σV = √1.5, γ = 0.65]                200   0.6        0.6       305      0.014 (0.003)   0.041 (0.007)   0.087 (0.009)
Low endogeneity, weak instrument                   50    0.06       0.03      1.6      0.009 (0.003)   0.037 (0.007)   0.091 (0.009)
  [r = 1, π1 = 0.1, σ1 = 1,                        100   0.06       0.02      2.2      0.014 (0.003)   0.048 (0.007)   0.088 (0.009)
   σ2 = √2, σV = 1, γ = 0.08]                      200   0.06       0.02      3.2      0.011 (0.003)   0.046 (0.007)   0.084 (0.009)
Low endogeneity, strong instrument                 50    0.07       0.6       82       0.014 (0.003)   0.043 (0.007)   0.083 (0.009)
  [r = 9√11, π1 = 0.04, σ1 = 1,                    100   0.07       0.6       161      0.016 (0.003)   0.044 (0.007)   0.095 (0.009)
   σ2 = √2, σV = 1, γ = 0.14]                      200   0.07       0.6       317      0.007 (0.003)   0.051 (0.007)   0.097 (0.009)

Notes: The last three columns report, for each nominal size α, the fraction of simulations for which n^{1/2}R̂max > c*α,B. Monte Carlo standard errors are given in parentheses.

Table 2. Empirical power of n^{1/2}R̂max for the linear IV model with one instrument.

Design                                             n     corr̂(X,U)  R²_{X,W}  F_{X,W}  α = 0.01  α = 0.05  α = 0.10
High endogeneity, weak instrument                  50    0.98       0.03      1.5      0.048     0.095     0.156
  [π1 = 1/n, γ = 0.99]                             100   0.98       0.01      1.3      0.035     0.101     0.181
                                                   200   0.98       0.01      1.1      0.038     0.111     0.188
High endogeneity, strong instrument                50    0.51       0.62      93       0.132     0.319     0.442
  [π1 = 2.4, γ = 0.95]                             100   0.56       0.6       164      0.315     0.561     0.678
                                                   200   0.59       0.58      291      0.716     0.903     0.950
Low endogeneity, weak instrument                   50    0.07       0.02      1.1      0.02      0.093     0.153
  [π1 = 0.2, γ = 0.1]                              100   0.06       0.01      1.5      0.08      0.237     0.307
                                                   200   0.07       0.01      2.3      0.31      0.45      0.573
Low endogeneity, strong instrument                 50    0.04       0.66      108      0.106     0.3       0.431
  [π1 = 2.6, γ = 0.18]                             100   0.06       0.63      189      0.308     0.569     0.702
                                                   200   0.07       0.6       343      0.706     0.894     0.958

Note: The last three columns report, for each nominal size α, the fraction of simulations for which n^{1/2}R̂max > c*α,B.

Table 3. Empirical size of n^{1/2}R̂max for the linear IV model with two instruments.

Design                                             n     corr̂(X,U)  R²_{X,W}  F_{X,W}  α = 0.01        α = 0.05        α = 0.10
High endogeneity, collectively weak instruments    50    0.8        0.04      1.1      0.013 (0.003)   0.032 (0.007)   0.059 (0.009)
  [r = 1, π1 = 1/n, π2 = 1/n, σ1 = √(W(2) + 0.5),  100   0.8        0.02      1        0.009 (0.003)   0.023 (0.007)   0.052 (0.009)
   σ2 = √10, σV = √10, γ = 9.99]                   200   0.8        0.01      1        0.005 (0.003)   0.016 (0.007)   0.04 (0.009)
High endogeneity, collectively strong instruments  50    0.55       0.61      39       0.014 (0.003)   0.055 (0.007)   0.099 (0.009)
  [r = 0.5, π1 = √(2/3), π2 = √(1/3), σ1 = √0.3,   100   0.55       0.6       76       0.009 (0.003)   0.049 (0.007)   0.103 (0.009)
   σ2 = √(0.31(W(2) + 0.5)), σV = √1.5, γ = 0.65]  200   0.55       0.6       150      0.016 (0.003)   0.051 (0.007)   0.109 (0.009)
Low endogeneity, collectively weak instruments     50    0.06       0.05      1.3      0.019 (0.003)   0.055 (0.007)   0.099 (0.009)
  [r = 1, π1 = 0.1√(2/3), π2 = 0.1√(1/3), σ1 = 1,  100   0.06       0.03      1.5      0.012 (0.003)   0.056 (0.007)   0.099 (0.009)
   σ2 = √(2(W(2) + 0.5)), σV = 1, γ = 0.08]        200   0.05       0.02      1.9      0.009 (0.003)   0.047 (0.007)   0.082 (0.009)
Low endogeneity, collectively strong instruments   50    0.03       0.6       41       0.012 (0.003)   0.051 (0.007)   0.101 (0.009)
  [r = 9√11, π1 = 0.03, π2 = 0.02,                 100   0.03       0.6       79       0.012 (0.003)   0.056 (0.007)   0.099 (0.009)
   σ1 = √(W(2) + 0.5), σ2 = √2, σV = 1, γ = 0.14]  200   0.02       0.6       156      0.018 (0.003)   0.057 (0.007)   0.116 (0.009)

Notes: The last three columns report, for each nominal size α, the fraction of simulations for which n^{1/2}R̂max > c*α,B. Monte Carlo standard errors are given in parentheses.

Table 4. Empirical power of n^{1/2}R̂max for the linear IV model with two instruments.

Design                                             n     corr̂(X,U)  R²_{X,W}  F_{X,W}  α = 0.01  α = 0.05  α = 0.10
High endogeneity, collectively weak instruments    50    0.98       0.05      1.2      0.036     0.105     0.163
  [π1 = (1/n)√(2/3), π2 = (1/n)√(1/3), γ = 0.99]   100   0.99       0.02      1.2      0.035     0.121     0.194
                                                   200   0.99       0.01      1.1      0.052     0.153     0.225
High endogeneity, collectively strong instruments  50    0.6        0.5       30       0.084     0.238     0.383
  [π1 = 2.4√(2/3), π2 = 2.4√(1/3), γ = 0.95]       100   0.6        0.5       55       0.252     0.544     0.707
                                                   200   0.65       0.5       99       0.725     0.917     0.966
Low endogeneity, collectively weak instruments     50    0.1        0.05      1.2      0.056     0.133     0.227
  [π1 = 0.2√(2/3), π2 = 0.2√(1/3), γ = 0.1]        100   0.09       0.05      1.1      0.117     0.255     0.365
                                                   200   0.09       0.01      1.5      0.371     0.526     0.624
Low endogeneity, collectively strong instruments   50    0.07       0.58      37       0.068     0.211     0.331
  [π1 = 2.6√(2/3), π2 = 2.6√(1/3), γ = 0.18]       100   0.08       0.55      64       0.237     0.493     0.644
                                                   200   0.09       0.53      116      0.699     0.896     0.949

Note: The last three columns report, for each nominal size α, the fraction of simulations for which n^{1/2}R̂max > c*α,B.

estimated using a set of just-identified moment conditions; those in the latter are estimated from a set of over-identified moment conditions.) To keep the running time manageable, these results are based on 1000 simulations and 500 replications per simulation for simulating the critical values. Code for the simulation experiment is written in R and is available from the authors.

6.1. Linear IV model with one instrument

We consider the simple linear regression model Y := θ0 + θ1X + U, where the regressor X is correlated with U. The reduced form for the endogenous regressor is given by X := π0 + π1W + V, with W the single instrument. We fix θ0 = θ1 = π0 = 1; the choice of π1 is described subsequently. The random variables U and W are generated such that the conditional distribution of U given W is symmetric about the origin without U being independent of W. This is done as follows. First, for a pre-selected positive number r specified in Table 1, we draw W from the two-point distribution (δr + δ−r)/2. Next, given W, we generate (U, V) bivariate normal with zero mean and covariance γ; the choice of γ is described subsequently. The variance of U, denoted by σU², is allowed to depend upon W by choosing σU² := σ1² 1(W = r) + σ2² 1(W = −r), with σ1 ≠ σ2; the variance of V, denoted by σV², is constant. The parameters (θ0, θ1) are estimated from the just-identified moment conditions EU = 0 and E[WU] = 0 using the simulated data (Y1, X1, W1), …, (Yn, Xn, Wn). That is, the estimator of (θ0, θ1) is given by θ̂ := (θ̂0, θ̂1)2×1 := (Σ_{j=1}^n W̃j X̃'j)^{−1} Σ_{j=1}^n W̃j Yj, where W̃ := (1, W)2×1 and X̃ := (1, X)2×1.

As mentioned earlier in Sections 3 and 4, the finite sample performance of our test may be affected if the instruments are weak and the regressors are highly endogenous. To study the impact that the degree of endogeneity of X and the strength of the instrument W may have on the small sample properties of our test statistic, the empirical size and power of n^{1/2}R̂max are reported in Tables 1 and 2, respectively, for the 2 × 2 contingency table {high endogeneity, low endogeneity} ×

Table 5. Empirical size and power of n^{1/2}R̂max for the linear IV model with one strong (W(1)) and one weak (W(2)) instrument.

Design                                             n     corr̂(X,U)  R²_{X,W}  F_{X,W}  α = 0.01        α = 0.05        α = 0.10
High endogeneity (null hypothesis is true)         50    0.55       0.61      39       0.014 (0.003)   0.045 (0.007)   0.078 (0.009)
  [r = 0.5, π1 = √0.99, π2 = √0.01, σ1 = √0.3,     100   0.55       0.6       76       0.011 (0.003)   0.054 (0.007)   0.098 (0.009)
   σ2 = √(0.31(W(2) + 0.5)), σV = √1.5, γ = 0.65]  200   0.55       0.6       150      0.013 (0.003)   0.05 (0.007)    0.107 (0.009)
High endogeneity (alternative hypothesis is true)  50    0.54       0.6       44       0.082           0.21            0.357
  [π1 = 2.4√0.99, π2 = 2.4√0.01, γ = 0.95]         100   0.56       0.6       79       0.239           0.533           0.693
                                                   200   0.6        0.6       141      0.721           0.91            0.959
Low endogeneity (null hypothesis is true)          50    0.03       0.6       41       0.006 (0.003)   0.045 (0.007)   0.091 (0.009)
  [r = 0.5, π1 = √0.99, π2 = √0.01, σ1 = √0.3,     100   0.03       0.6       80       0.015 (0.003)   0.048 (0.007)   0.09 (0.009)
   σ2 = √(0.31(W(2) + 0.5)), σV = √1.5, γ = 0.65]  200   0.02       0.6       156      0.015 (0.003)   0.062 (0.007)   0.117 (0.009)
Low endogeneity (alternative hypothesis is true)   50    0.06       0.67      53       0.068           0.195           0.323
  [π1 = 2.6√0.99, π2 = 2.6√0.01, γ = 0.18]         100   0.07       0.63      92       0.224           0.488           0.646
                                                   200   0.08       0.61      166      0.681           0.889           0.951

Notes: The last three columns report, for each nominal size α, the fraction of simulations for which n^{1/2}R̂max > c*α,B. Monte Carlo standard errors are given in parentheses.

{weak instrument, strong instrument}. By a strong (resp. weak) instrument we mean here that R²_{X,W} ≥ 0.6 (resp. R²_{X,W} ≤ 0.03), where R²_{X,W} is the R² statistic for the reduced form regression. Tables 1 and 2 report R²_{X,W} and the first-stage F-statistic F_{X,W} for testing π1 = 0, averaged over the simulations. (The strength of the instrument can also be stated in terms of the sample 'concentration parameter' by using the fact that, since EW = 0, the population concentration parameter is given by R²₀/(1 − R²₀), where R²₀ is the population R² for the reduced form regression; that is, R²_{X,W} is the sample analogue of R²₀.) Similarly, high (resp. low) endogeneity means that |ĉorr(X, U)| ≥ 0.8 (resp. |ĉorr(X, U)| ≤ 0.08), where ĉorr denotes sample correlation. A point to note: since the maximum value that corr²(X, W) ∧ corr(X, U) can take is (√5 − 1)/2 (cf. Lemma A.1), for the high endogeneity and strong instrument design we generate the data so that R²_{X,W} ≈ ĉorr(X, U) ≈ 0.6. The values of π1 and γ for the first design in Tables 1 and 2 are chosen to create a combination of highly endogenous X, a weak instrument, and high correlation between U and V. As noted by Maddala and Jeong (1992), this trifecta causes θ̂ to be nonnormal in small samples, so our test is not likely to perform well for this design (recall the discussion after Assumption 4.3). Values of π1 and γ for the remaining three designs were chosen so that θ̂ is approximately normal in finite samples. (This was verified by looking at the quantile plots of θ̂ across the simulations. To save space, these plots are not reported in the paper.) The results in Table 1 suggest that our test is working as expected. When X is highly endogenous, W is a weak instrument, and U and V are highly correlated, the empirical size of n^{1/2} R̂_max is not
close to its nominal value. Except for this worst-case scenario, the empirical size is not statistically different from its nominal value at the 5% level of significance. This makes intuitive sense when the instrument is strong because θ̂ then behaves as expected irrespective of the degree of endogeneity. However, as Table 1 reveals, an interesting finding is that the empirical size can be close to the nominal size even when the instrument is weak, provided the regressor is not very endogenous, that is, the third design. This is because θ̂ is approximately Gaussian for this design so that the critical values are accurately simulated. We also carried out a small simulation to examine the power of n^{1/2} R̂_max in finite samples. Now, θ0 = θ1 = π0 = 1 and W is drawn from (δ₁ + δ_{−1})/2. Independent of W, we generate (Ũ, Ṽ) bivariate normal with mean zero, variance one, covariance γ, and let U := W(e^{Ũ} − e^{0.5}) and V := W(e^{Ṽ} − e^{0.5}), that is, U and V are symmetrised shifted lognormals. Note that U is conditionally asymmetric but marginally symmetric. Moreover, EU = 0 and E[WU] = 0, so the structural parameters (θ0, θ1) remain identified, and θ̂ remains consistent for (θ0, θ1), under the alternative hypothesis. The results in Table 2 again validate our intuition. In particular, as expected, the test exhibits poor power for the first design (where X is highly endogenous, W is a weak instrument, and U and V are highly correlated), and the power does not seem to improve as the sample size gets larger. The performance improves markedly for the weak instrument case when the degree of endogeneity gets smaller: power increases with sample size and is several times higher than for the first design by the time n = 200. The test, of course, works best in terms of power when the instrument is strong; in particular, its power then appears to be robust to the degree of endogeneity.
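To make the Section 6.1 design concrete, the null data-generating process and the just-identified IV estimator can be sketched in a few lines. This is a minimal illustration: the parameter values below (r, π1, γ, σ1, σ2, σV) are placeholders of our own choosing, not the values used in Tables 1 and 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_iv(n, r=0.5, pi1=1.0, gamma=0.5, s1=0.6, s2=1.2, sV=1.0):
    """One draw from the Section 6.1 null design (illustrative parameter values)."""
    theta0 = theta1 = pi0 = 1.0
    W = rng.choice([r, -r], size=n)            # W ~ (delta_r + delta_{-r})/2
    sU = np.where(W == r, s1, s2)              # sd of U depends on W, so U is not indep. of W
    # (U, V) | W bivariate normal, mean zero, Cov(U, V | W) = gamma, Var(V) = sV^2 constant
    Z1, Z2 = rng.standard_normal(n), rng.standard_normal(n)
    U = sU * Z1                                # U | W symmetric about 0 (normal, mean zero)
    V = (gamma / sU) * Z1 + np.sqrt(sV**2 - (gamma / sU)**2) * Z2
    X = pi0 + pi1 * W + V                      # reduced form
    Y = theta0 + theta1 * X + U                # structural equation
    return Y, X, W

def iv_estimator(Y, X, W):
    """theta_hat = (sum_j Wt_j Xt_j')^{-1} sum_j Wt_j Y_j, Wt = (1, W)', Xt = (1, X)'."""
    Wt = np.column_stack([np.ones_like(W), W])
    Xt = np.column_stack([np.ones_like(X), X])
    return np.linalg.solve(Wt.T @ Xt, Wt.T @ Y)

Y, X, W = simulate_iv(n=200)
print(iv_estimator(Y, X, W))                   # consistent for (theta0, theta1) = (1, 1)
```

The construction of V guarantees that Cov(U, V | W) = γ while Var(V) stays constant, matching the heteroscedastic-in-W but conditionally symmetric null design described above.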
Next, we enlarge the set of IV to see how that affects the finite sample size and power of n1/2 Rˆ max . We consider a model with two instruments so that in addition to the cases considered earlier, we can now also examine the finite sample properties of n1/2 Rˆ max in a setting where one instrument is strong and the other is weak. (Intuitively, we expect the strong instrument to compensate for the weak one.)
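The symmetrised shifted lognormal errors used in the power study above can be checked numerically: EU = 0 and E[WU] = 0 hold, U given W is asymmetric, yet U is marginally symmetric. A minimal sketch (the sample size and the skewness diagnostic are our own choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
W = rng.choice([1.0, -1.0], size=n)      # W drawn from (delta_1 + delta_{-1})/2
Ut = rng.standard_normal(n)              # U-tilde ~ N(0, 1), independent of W
U = W * (np.exp(Ut) - np.exp(0.5))       # E[exp(U-tilde)] = e^{1/2}, so E[U | W] = 0

def skew(z):
    """Sample skewness: third central moment over cubed standard deviation."""
    return np.mean((z - z.mean())**3) / z.std()**3

print(U.mean(), (W * U).mean())          # both approximately 0: EU = 0, E[WU] = 0
print(skew(U[W == 1.0]))                 # strongly positive: U | W = 1 is asymmetric
print(skew(U))                           # small relative to the conditional skewness
```

Because W is a symmetric sign flip independent of the lognormal shock, the marginal distribution of U is symmetric even though each conditional distribution is a (shifted) lognormal.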

6.2. Linear IV model with two instruments

As in Section 6.1, Y := θ0 + θ1 X + U, where X is correlated with U. However, we now have two instruments W := (W^{(1)}, W^{(2)})_{2×1} such that, under the null hypothesis, the conditional distribution of U given W is symmetric about the origin. The reduced form model for the endogenous regressor is given by X := π0 + π1 W^{(1)} + π2 W^{(2)} + V. We set θ0 = θ1 = π0 = 1 as before; the choice of π1 and π2 is described in Tables 3–5. The first instrument W^{(1)} is identical to the instrument in Section 6.1 and is generated as described earlier under the null hypothesis as well as the alternative hypothesis. The second instrument W^{(2)} is stochastically independent of W^{(1)} and is generated as follows. Under the null hypothesis, W^{(2)} is distributed as r√12 · Uniform(0, 1), a rescaled uniform random variable, where r is the pre-selected positive number used to generate W^{(1)}. Under the alternative hypothesis, W^{(2)} is distributed as Uniform(0, 1). The error terms U and V are generated as follows. Under the null hypothesis, (U, V) is drawn from a bivariate normal distribution with mean zero and covariance γ. The variance of U is σ²_U := σ²₁ 1(W^{(1)} = r) + σ²₂ 1(W^{(1)} = −r), with σ₁ and σ₂ depending upon W^{(2)}. Thus, under the null hypothesis, U is heteroscedastic with respect to both W^{(1)} and W^{(2)}. The variance of V is constant. Under the alternative hypothesis, U := W^{(1)}(e^{Ũ} − e^{1/2} + W^{(2)} − 0.5) and V := W^{(1)}(e^{Ṽ} − e^{1/2} + W^{(2)} − 0.5), with (Ũ, Ṽ) as defined in Section 6.1. The values of σ₁, σ₂, σV, and γ for the various designs under the null and the alternative hypotheses are given in Tables 3–5. The distribution of (U, W^{(1)}, W^{(2)}) is chosen such that the moment conditions EU = 0, E[W^{(1)} U] = 0, and E[W^{(2)} U] = 0 hold under the null as well as the alternative hypotheses. Since these moment conditions over-identify (θ0, θ1) under the null and the alternative
hypotheses, (θ0, θ1) is consistently estimated with the two-step optimal GMM estimator
$$\hat\theta := \Big[\Big(\sum_{j=1}^n \tilde X_j \tilde W_j'\Big)\hat\Omega^{-1}(\tilde\theta)\Big(\sum_{j=1}^n \tilde W_j \tilde X_j'\Big)\Big]^{-1}\Big(\sum_{j=1}^n \tilde X_j \tilde W_j'\Big)\hat\Omega^{-1}(\tilde\theta)\Big(\sum_{j=1}^n \tilde W_j Y_j\Big),$$
where W̃ := (1, W^{(1)}, W^{(2)})_{3×1},
$$\tilde\theta := \Big[\Big(\sum_{j=1}^n \tilde X_j \tilde W_j'\Big)\Big(\sum_{j=1}^n \tilde W_j \tilde X_j'\Big)\Big]^{-1}\Big(\sum_{j=1}^n \tilde X_j \tilde W_j'\Big)\Big(\sum_{j=1}^n \tilde W_j Y_j\Big)$$
is a preliminary estimator of (θ0, θ1), and $\hat\Omega(\tilde\theta) := \sum_{j=1}^n \tilde W_j \tilde W_j'(Y_j - \tilde X_j'\tilde\theta)^2/n$ is an estimator of the efficient weighting matrix using θ̃. The size and power of n^{1/2} R̂_max when the instruments are collectively weak or collectively strong are given in Tables 3 and 4. Following Section 6.1, we say that the instruments are collectively weak, which necessarily implies that the instruments are individually weak, if the R² of the reduced form is less than or equal to 0.03. Similarly, the instruments are called collectively strong if the R² of the reduced form is greater than or equal to 0.60. This does not rule out the case, handled separately, that one of the instruments is strong while the other is weak. Table 5 contains results for the case when W^{(1)} is strong and W^{(2)} is weak. By this we mean that W^{(1)} explains about 99% of the R² for the reduced form and W^{(2)} the remaining 1%. (To be consistent with the definition of collectively strong instruments, the R² of the reduced form in Table 5 is set roughly equal to 0.6.) Since W^{(1)} and W^{(2)} are independent with equal variance, their contributions to the population R² are determined by π1 and π2, respectively. Consequently, setting π1/π2 = √99 makes W^{(1)} strong and W^{(2)} weak. Other parameter values are as in Tables 3 and 4. The results in Tables 3–5 are qualitatively very similar to those in Tables 1 and 2. Namely, under the null hypothesis, the empirical size of n^{1/2} R̂_max is not close to its nominal value only when the regressor is highly endogenous, both instruments are weak, and U and V are highly correlated.
In all other cases, the empirical size is not statistically different from its nominal value at the 5% level of significance. Similarly, under the alternative hypothesis, n^{1/2} R̂_max has poor power for the design in which the regressor is highly endogenous, both instruments are weak, and U and V are highly correlated. However, the power improves rapidly as the regressor becomes less endogenous. These findings accord with our intuition that, in general, regardless of the degree of endogeneity, a strong instrument can compensate for a weak one, so that the finite sample properties of n^{1/2} R̂_max are not much affected by a weak instrument provided a strong enough instrument is also present. Overall, the simulation results in Tables 1–5 suggest that a combination of weak instruments and highly endogenous regressors can significantly affect the performance of n^{1/2} R̂_max. Apart from this worst-case scenario, however, our test appears to perform well in terms of both size and power in moderately sized samples.
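For concreteness, the two-step optimal GMM estimator described in this section can be sketched as follows. The data-generating design in the second half of the snippet is a simplified stand-in of our own choosing (not one of the designs in Tables 3–5); the estimator itself follows the two-step formula above.

```python
import numpy as np

rng = np.random.default_rng(2)

def two_step_gmm(Y, X, W1, W2):
    """Two-step optimal GMM for Y = theta0 + theta1*X + U with instruments (1, W1, W2)."""
    n = len(Y)
    Wt = np.column_stack([np.ones(n), W1, W2])   # W-tilde = (1, W1, W2)'
    Xt = np.column_stack([np.ones(n), X])        # X-tilde = (1, X)'
    A = Xt.T @ Wt                                # sum_j Xt_j Wt_j'
    B = Wt.T @ Xt                                # sum_j Wt_j Xt_j'
    g = Wt.T @ Y                                 # sum_j Wt_j Y_j
    # Step 1: preliminary estimator (identity weighting in the moment space).
    theta_tilde = np.linalg.solve(A @ B, A @ g)
    # Step 2: efficient weighting matrix Omega-hat and the optimal GMM estimator.
    e = Y - Xt @ theta_tilde
    Omega = (Wt * (e**2)[:, None]).T @ Wt / n
    Oi = np.linalg.inv(Omega)
    return np.linalg.solve(A @ Oi @ B, A @ Oi @ g)

# Illustrative over-identified design (all parameter values are placeholders):
n = 5000
W1, W2 = rng.choice([0.5, -0.5], size=n), rng.random(n)
V = rng.standard_normal(n)
U = 0.5 * V + rng.standard_normal(n) * (1 + 0.5 * np.abs(W1))  # endogenous, heteroscedastic
X = 1 + W1 + W2 + V
Y = 1 + X + U
theta_hat = two_step_gmm(Y, X, W1, W2)
print(theta_hat)                                 # consistent for (theta0, theta1) = (1, 1)
```

With three moment conditions and two parameters the model is over-identified, which is exactly the situation in which the second-step efficient weighting matters.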

7. Conclusion

We have shown how to consistently test for conditional symmetry in the presence of endogenous regressors without making any distributional assumptions and without doing nonparametric smoothing. The KS-type statistic we propose is easy to implement because it does not require optimisation over an uncountable set, and it can detect n^{1/2}-deviations from the null. Simulation results suggest that our test can work very well in moderately sized samples.

Acknowledgements

We thank an anonymous referee for helpful comments.

References

Ahmad, I.A., and Li, Q. (1997), 'Testing Symmetry of an Unknown Density Function by Kernel Method', Journal of Nonparametric Statistics, 7, 279–293.


Andrews, D.W.K. (1997), 'A Conditional Kolmogorov Test', Econometrica, 65, 1097–1128.
Bai, J., and Ng, S. (2001), 'A Consistent Test for Conditional Symmetry of Time Series', Journal of Econometrics, 103, 225–258.
Bickel, P.J. (1982), 'On Adaptive Estimation', Annals of Statistics, 10, 647–671.
Card, D., and Hyslop, D. (1996), 'Does Inflation "Grease the Wheels of the Labor Market"?', NBER Working Paper 5538.
Choi, I., and Phillips, P.C.B. (1992), 'Asymptotic and Finite Sample Distribution Theory for IV Estimators and Tests in Partially Identified Structural Equations', Journal of Econometrics, 51, 113–150.
Christofides, L.N., and Stengos, T. (2001), 'A Non-parametric Test of the Symmetry of PSID Wage-Change Distributions', Economics Letters, 71, 363–368.
Davydov, Y.A., Lifshits, M.A., and Smorodina, N.V. (1998), Local Properties of Distributions of Stochastic Functionals, Providence, Rhode Island: American Mathematical Society.
Delgado, M.A., Domínguez, M.A., and Lavergne, P. (2006), 'Consistent Tests of Conditional Moment Restrictions', Annales d'Economie et de Statistique, 81, 33–67.
Delgado, M.A., and Escanciano, J.C. (2007), 'Nonparametric Tests for Conditional Symmetry in Dynamic Models', Journal of Econometrics, 141, 652–682.
Fan, Y., and Gencay, R. (1995), 'A Consistent Nonparametric Test of Symmetry in Linear Regression Models', Journal of the American Statistical Association, 90, 551–557.
Hájek, J. (1972), 'Local Asymptotic Minimax and Admissibility in Estimation', in Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 10), Berkeley, CA: University of California Press, pp. 175–194.
Hall, A.R., and Inoue, A. (2003), 'The Large Sample Behaviour of the Generalized Method of Moments Estimator in Misspecified Models', Journal of Econometrics, 114, 361–394.
Hansen, B.E. (1996), 'Inference When a Nuisance Parameter Is Not Identified under the Null Hypothesis', Econometrica, 64, 413–430.
Harvey, C.R., and Siddique, A. (2000), 'Conditional Skewness in Asset Pricing Tests', Journal of Finance, 55, 1263–1295.
Hyndman, R.J., and Yao, Q. (2002), 'Nonparametric Estimation and Symmetry Tests for Conditional Density Functions', Journal of Nonparametric Statistics, 14, 259–278.
Khmaladze, E.V. (1993), 'Goodness of Fit Problem and Scanning Innovation Martingales', Annals of Statistics, 21(2), 798–829.
Khmaladze, E.V., and Koul, H. (2004), 'Martingale Transforms Goodness-of-Fit Tests in Regression Models', Annals of Statistics, 32(3), 995–1034.
Maddala, G.S., and Jeong, J. (1992), 'On the Exact Small Sample Distribution of the Instrumental Variable Estimator', Econometrica, 60, 181–183.
Manski, C. (1984), 'Adaptive Estimation of Nonlinear Regression Models', Econometric Reviews, 3, 145–194.
Nelson, C.R., and Startz, R. (1990), 'Some Further Results on the Exact Small Sample Properties of the Instrumental Variables Estimator', Econometrica, 58, 967–976.
Neumeyer, N., and Dette, H. (2007), 'Testing for Symmetric Error Distribution in Nonparametric Regression Models', Statistica Sinica, 17, 775–795.
Neumeyer, N., Dette, H., and Nagel, E.-R. (2005), 'A Note on Testing Symmetry of the Error Distribution in Linear Regression Models', Journal of Nonparametric Statistics, 17, 697–715.
Newey, W.K. (1988a), 'Adaptive Estimation of Regression Models via Moment Restrictions', Journal of Econometrics, 38, 301–339.
Newey, W.K. (1988b), 'Efficient Estimation of Tobit Models Under Conditional Symmetry', in Nonparametric and Semiparametric Methods in Econometrics and Statistics: Proceedings of the Fifth International Symposium in Economic Theory and Econometrics, eds. W.A. Barnett, J. Powell, and G. Tauchen, New York: Cambridge University Press, pp. 291–336.
Newey, W.K., and McFadden, D. (1994), 'Large Sample Estimation and Hypothesis Testing', in Handbook of Econometrics, Vol. IV, eds. R. Engle and D. McFadden, The Netherlands: Elsevier Science B.V., pp. 2111–2245.
Newey, W.K., and Powell, J.L. (1987), 'Asymmetric Least Squares Estimation and Testing', Econometrica, 55, 819–847.
Phillips, P.C.B. (1989), 'Partially Identified Econometric Models', Econometric Theory, 5, 181–240.
Powell, J.L. (1986), 'Symmetrically Trimmed Least Squares Estimation for Tobit Models', Econometrica, 54, 1435–1460.
Powell, J.L. (1994), 'Estimation of Semiparametric Models', in Handbook of Econometrics, Vol. IV, eds. R. Engle and D. McFadden, The Netherlands: Elsevier Science B.V., pp. 2443–2521.
Randles, R.H., and Wolfe, D.A. (1979), Introduction to the Theory of Nonparametric Statistics, New York: John Wiley & Sons.
Sakata, S. (2007), 'Instrumental Variable Estimation Based on Conditional Median Restriction', Journal of Econometrics, 141, 350–382.
Stengos, T., and Wu, X. (2004), 'Information-Theoretic Distribution Tests with Application to Symmetry and Normality', http://ssrn.com/abstract=512902.
Su, J.Q., and Wei, L. (1991), 'A Lack-of-Fit Test for the Mean Function in a Generalized Linear Model', Journal of the American Statistical Association, 86, 420–426.
van der Vaart, A.W. (1998), Asymptotic Statistics, New York: Cambridge University Press.
van der Vaart, A.W., and Wellner, J.A. (1996), Weak Convergence and Empirical Processes: With Applications to Statistics, New York: Springer-Verlag.
van der Vaart, A.W., and Wellner, J.A. (2007), 'Empirical Processes Indexed by Estimated Functions', in Asymptotics: Particles, Processes and Inverse Problems, eds. Cator, E.A., Jongbloed, G., Kraaikamp, C., Lopuhaa, H.P., and
Wellner, J.A., Vol. 55 of IMS Lecture Notes-Monograph Series, Beachwood, OH: Institute of Mathematical Statistics, pp. 234–252.
Zheng, J.X. (1998), 'Consistent Specification Testing for Conditional Symmetry', Econometric Theory, 14, 139–149.

Appendix 1. Proofs for Section 2


Proof of Equation (3) Let $p_{\varepsilon,W}$ denote the density of $(\varepsilon, W)$ with respect to an appropriate dominating measure. Then,
$$(\varepsilon, W) \stackrel{d}{=} (-\varepsilon, W) \iff p_{\varepsilon,W}(u, w) = p_{-\varepsilon,W}(u, w) \quad \forall (u, w) \in \mathbb{R} \times \operatorname{supp}(W)$$
$$\iff p_{\varepsilon|W=w}(u)\, p_W(w) = p_{-\varepsilon|W=w}(u)\, p_W(w) \quad \forall (u, w) \in \mathbb{R} \times \operatorname{supp}(W)$$
$$\iff p_{\varepsilon|W=w}(u) = p_{-\varepsilon|W=w}(u) \quad \forall (u, w) \in \mathbb{R} \times \operatorname{supp}(W)$$
$$\iff \varepsilon|W \stackrel{d}{=} -\varepsilon|W. \qquad \Box$$

Appendix 2. Proofs for Section 3

Proof of Lemma 3.1 Let $m_{A_0} := (\partial_1 \log p_{A_0})\dot\mu_0$. We only show Equation (7), that is,
$$\sup_{f\in\mathcal F} \big|(\hat P_{\hat Z} - \hat P_Z)f - \langle f, m_{A_0}'\rangle_{P_{A_0}}(\hat\theta - \theta_0)\big| = o_{\Pr^\circ}(n^{-1/2}) + o_{\Pr}(\|\hat\theta - \theta_0\|). \quad (A.1)$$
Since $(\hat P^r_{\hat Z} - \hat P^r_Z)f = (\hat P_{\hat Z} - \hat P_Z)f^r$, Equation (8) follows by replacing $f$ with $f^r$ in Equation (A.1). So, let $f \in \mathcal F$ and recall that $\Delta(X, \theta, \theta_0) := \mu(X, \theta) - \mu(X, \theta_0)$. Since $f(Y - \mu(X,\theta), W) = f(\varepsilon - \Delta(X,\theta,\theta_0), W) =: f^\theta(\varepsilon, X, W)$, we have
$$(\hat P_{\hat Z} - \hat P_Z)f = n^{-1}\sum_{j=1}^n \big[f(\varepsilon_j - \Delta(X_j,\hat\theta,\theta_0), W_j) - f(\varepsilon_j - \Delta(X_j,\theta_0,\theta_0), W_j)\big] = n^{-1}\sum_{j=1}^n \big[f^{\hat\theta}(\varepsilon_j, X_j, W_j) - f^{\theta_0}(\varepsilon_j, X_j, W_j)\big] = \hat P_{A_0}(f^{\hat\theta} - f^{\theta_0}).$$
Hence, we can write $(\hat P_{\hat Z} - \hat P_Z)f = (\hat P_{A_0} - P_{A_0})(f^{\hat\theta} - f^{\theta_0}) + P_{A_0}(f^{\hat\theta} - f^{\theta_0})$. Consequently, to prove Equation (A.1) it suffices to show that
$$\sup_{f\in\mathcal F} \big|n^{1/2}(\hat P_{A_0} - P_{A_0})(f^{\hat\theta} - f^{\theta_0})\big| = o_{\Pr^\circ}(1), \quad (A.2)$$
$$\sup_{f\in\mathcal F} \big|P_{A_0}(f^{\hat\theta} - f^{\theta_0}) - \langle f, m_{A_0}'\rangle_{P_{A_0}}(\hat\theta - \theta_0)\big| = o_{\Pr}(\|\hat\theta - \theta_0\|). \quad (A.3)$$
We will use equicontinuity of the empirical process $n^{1/2}(\hat P_{A_0} - P_{A_0})$ to demonstrate Equation (A.2) and mean-square differentiability of $p_{A_0}^{1/2}$ to show that Equation (A.3) holds. (Compare the proof of Proposition 2.2 in Khmaladze and Koul (2004) for a similar approach.)

We begin with Equation (A.2). Given $f \in \mathcal F$, we know that $f^\theta, f^{\theta_0} \in \mathcal F$ by Assumption 3.3(iv) and $\|f^\theta - f^{\theta_0}\|_{2,P_{A_0}} \le q(\|\theta - \theta_0\|)$ by Assumption 3.3(v), where $q$ is continuous and passes through the origin, that is, $q(0) = 0$. Hence, as
$$\big|(\hat P_{A_0} - P_{A_0})(f^\theta - f^{\theta_0})\big| \le \sup_{g,h\in\mathcal F:\, \|g-h\|_{2,P_{A_0}} \le q(\|\theta-\theta_0\|)} \big|(\hat P_{A_0} - P_{A_0})(g - h)\big|$$
and the right-hand side does not depend upon $f$,
$$\sup_{f\in\mathcal F} \big|(\hat P_{A_0} - P_{A_0})(f^\theta - f^{\theta_0})\big| \le \sup_{g,h\in\mathcal F:\, \|g-h\|_{2,P_{A_0}} \le q(\|\theta-\theta_0\|)} \big|(\hat P_{A_0} - P_{A_0})(g - h)\big|. \quad (A.4)$$
Since $\mathcal F$ is $P_{A_0}$-Donsker by Assumption 3.3(iii), the empirical process $\{n^{1/2}(\hat P_{A_0} - P_{A_0})f : f \in \mathcal F\}$ is asymptotically equicontinuous (V&W, Section 2.1.2), that is, $\forall \epsilon > 0$,
$$\lim_{\theta\to\theta_0}\limsup_{n\to\infty} \Pr^\circ\Big(\sup_{g,h\in\mathcal F:\, \|g-h\|_{2,P_{A_0}} \le q(\|\theta-\theta_0\|)} \big|n^{1/2}(\hat P_{A_0} - P_{A_0})(g - h)\big| > \epsilon\Big) = 0. \quad (A.5)$$
Therefore, Equation (A.2) follows by Equations (A.4) and (A.5) because $\hat\theta$ is a consistent estimator of $\theta_0$ (Assumption 3.4).

Next, we show Equation (A.3). Begin by observing that, for $\theta \in \Theta$,
$$P_{A_0} f^\theta = \int f(u - \Delta(x,\theta,\theta_0), w)\, P_{A_0}(du, dx, dw) = \int f(u - \Delta(x,\theta,\theta_0), w)\, p_{A_0}(u,x,w)\, du\, \kappa(dx, dw) = \int f(t, w)\, p_{A_0}(t + \Delta(x,\theta,\theta_0), x, w)\, dt\, \kappa(dx, dw)$$
by the translation invariance of Lebesgue measure. Hence,
$$P_{A_0}(f^{\hat\theta} - f^{\theta_0}) = \int f(t,w)\Big[\frac{p_{A_0}(t + \Delta(x,\hat\theta,\theta_0), x, w)}{p_{A_0}(t,x,w)} - 1\Big] P_{A_0}(dt, dx, dw). \quad (A.6)$$
Since
$$p_{A_0}^{1/2}(u + \Delta(x,\theta,\theta_0), x, w) - p_{A_0}^{1/2}(u,x,w) \stackrel{(6)}{=} \frac{1}{2}\frac{\partial_1 p_{A_0}(u,x,w)}{p_{A_0}^{1/2}(u,x,w)}\Delta(x,\theta,\theta_0) + r(\Delta(x,\theta,\theta_0), u, x, w),$$
it follows that
$$\frac{p_{A_0}(t + \Delta(x,\hat\theta,\theta_0), x, w)}{p_{A_0}(t,x,w)} - 1 = (\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0) + 0.25(\partial_1\log p_{A_0}(t,x,w))^2\Delta^2(x,\hat\theta,\theta_0) + r^2(\Delta(x,\hat\theta,\theta_0), t, x, w) + 2r(\Delta(x,\hat\theta,\theta_0), t, x, w) + r(\Delta(x,\hat\theta,\theta_0), t, x, w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0). \quad (A.7)$$
We use the decomposition in Equation (A.7) to handle Equation (A.6). By Assumption 3.1,
$$\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw) = \int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\dot\mu_0'(x)(\hat\theta - \theta_0)\, P_{A_0}(dt,dx,dw) + \int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\rho(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw).$$
By Assumptions 3.2 and 3.3(ii), $\|v_{A_0}\|_\infty \vee M_{\mathcal F} < \infty$. Hence, by Jensen and Assumption 3.1,
$$\Big|\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\rho(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw)\Big|^2 \le M_{\mathcal F}^2 \int (\partial_1\log p_{A_0}(t,x,w))^2\Big(\sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}|\rho(x,\theta,\theta_0)|\Big)^2 P_{A_0}(dt,dx,dw)$$
$$= M_{\mathcal F}^2 \int v_{A_0}(x,w)\sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\, P_{X,W}(dx,dw) \le M_{\mathcal F}^2\|v_{A_0}\|_\infty \int \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\, P_X(dx) = o(\|\hat\theta - \theta_0\|^2). \quad (A.8)$$
Therefore,
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw) - \langle f, m_{A_0}'\rangle_{P_{A_0}}(\hat\theta - \theta_0)\Big| = o(\|\hat\theta - \theta_0\|).$$
Next, by Assumption 3.1,
$$\Big|\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))^2\Delta^2(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw)\Big| \le 2M_{\mathcal F}\int(\partial_1\log p_{A_0}(t,x,w))^2\Big(\|\dot\mu_0(x)\|^2\|\hat\theta-\theta_0\|^2 + \Big(\sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}|\rho(x,\theta,\theta_0)|\Big)^2\Big)P_{A_0}(dt,dx,dw)$$
$$= 2M_{\mathcal F}\int v_{A_0}(x,w)\Big(\|\dot\mu_0(x)\|^2\|\hat\theta-\theta_0\|^2 + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\Big)P_{X,W}(dx,dw) \le 2M_{\mathcal F}\|v_{A_0}\|_\infty\int\Big(\|\dot\mu_0(x)\|^2\|\hat\theta-\theta_0\|^2 + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\Big)P_X(dx)$$
$$\le 2M_{\mathcal F}\|v_{A_0}\|_\infty\Big(\int\|\dot\mu_0(x)\|^2\, P_X(dx) + o(1)\Big)\|\hat\theta - \theta_0\|^2.$$
Therefore,
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)(\partial_1\log p_{A_0}(t,x,w))^2\Delta^2(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw)\Big| = O(\|\hat\theta-\theta_0\|^2) = o_{\Pr}(\|\hat\theta-\theta_0\|) \quad (A.9)$$
because $\operatorname{plim}(\hat\theta) = \theta_0$. Next, let $\epsilon > 0$. Then, by Assumption 3.1 and Equation (6),
$$\Big|\int f(t,w)\,r^2(\Delta(x,\hat\theta,\theta_0), t, x, w)\, P_{A_0}(dt,dx,dw)\Big| \le M_{\mathcal F}\int r^2(\Delta(x,\hat\theta,\theta_0), t, x, w)\, P_{A_0}(dt,dx,dw) \le M_{\mathcal F}\,\epsilon\int\Delta^2(x,\hat\theta,\theta_0)\, P_X(dx)$$
$$\le 2M_{\mathcal F}\,\epsilon\int\Big(\|\dot\mu_0(x)\|^2\|\hat\theta-\theta_0\|^2 + \sup_{\theta\in B(\theta_0,\|\hat\theta-\theta_0\|)}\rho^2(x,\theta,\theta_0)\Big)P_X(dx) \le 2M_{\mathcal F}\,\epsilon\Big(\int\|\dot\mu_0(x)\|^2\, P_X(dx) + o(1)\Big)\|\hat\theta-\theta_0\|^2.$$
Therefore, since $\epsilon$ was arbitrary,
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)\,r^2(\Delta(x,\hat\theta,\theta_0), t, x, w)\, P_{A_0}(dt,dx,dw)\Big| = o(\|\hat\theta-\theta_0\|^2). \quad (A.10)$$
A similar argument using Jensen's inequality reveals that
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)\,r(\Delta(x,\hat\theta,\theta_0), t, x, w)\, P_{A_0}(dt,dx,dw)\Big| = o(\|\hat\theta-\theta_0\|).$$
Finally, since
$$\Big|\int f(t,w)\,r(\Delta(x,\hat\theta,\theta_0), t, x, w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw)\Big| \le \Big(\int|f(t,w)|\,r^2(\Delta(x,\hat\theta,\theta_0),t,x,w)\, P_{A_0}(dt,dx,dw)\Big)^{1/2}\Big(\int|f(t,w)|(\partial_1\log p_{A_0}(t,x,w))^2\Delta^2(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw)\Big)^{1/2}$$
by Cauchy–Schwarz, the arguments leading to Equations (A.9) and (A.10) show that
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)\,r(\Delta(x,\hat\theta,\theta_0),t,x,w)(\partial_1\log p_{A_0}(t,x,w))\Delta(x,\hat\theta,\theta_0)\, P_{A_0}(dt,dx,dw)\Big| = o(\|\hat\theta-\theta_0\|^2). \quad (A.11)$$
Therefore, Equation (A.3) follows by Equations (A.6)–(A.11). $\Box$

Proof of Lemma 3.2 Recall that

$$\hat X_0(f) := n^{1/2}(\hat P_Z - P_Z)(f - f^r) + \langle f - f^r, m_{A_0}'\rangle_{P_{A_0}}\, n^{1/2}(\hat\theta - \theta_0), \quad f \in \mathcal F.$$
Assumption 3.3(ii) implies that the sample paths of $\hat X_0$ are bounded functions on $\mathcal F$. Hence, by V&W (Theorem 1.5.4), $\{\hat X_0(f) : f\in\mathcal F\}$ converges in distribution in $\ell^\infty(\mathcal F)$ to a tight limit process $\{X_0(f) : f\in\mathcal F\} \subset \ell^\infty(\mathcal F)$ if $\{\hat X_0(f) : f\in\mathcal F\}$ is asymptotically tight and its marginals $\hat X_0(f_1),\ldots,\hat X_0(f_k)$ converge in distribution in $\mathbb R^k$ to the marginals $X_0(f_1),\ldots,X_0(f_k)$ for every finite subset $f_1,\ldots,f_k$ of $\mathcal F$. From Equation (10), we know that the limiting process, if it exists, is given by $X_0(f) := G_0 f + \langle f - f^r, m_{A_0}'\rangle_{P_{A_0}} N_{\varphi_0}$. To verify its existence, it only remains to show that $\{\hat X_0(f) : f\in\mathcal F\}$ is asymptotically tight. We proceed as follows. First, for each $f\in\mathcal F$, $\hat X_0(f)$ is asymptotically tight in $\mathbb R$ by Equation (10). Next, since $\mathcal F$ is $P_{A_0}$-Donsker (Assumption 3.3(iii)), $(\mathcal F, \|\cdot\|_{2,P_{A_0}})$ is totally bounded by Assumption 3.3(ii) and V&W (Problem 2.1.1). Hence, by V&W (Theorem 1.5.7), $\{\hat X_0(f) : f\in\mathcal F\}$ is asymptotically tight if it is asymptotically uniformly $\|\cdot\|_{2,P_{A_0}}$-equicontinuous in probability, that is, for all $\epsilon > 0$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{f,g\in\mathcal F_{\delta,P_{A_0}}}|\hat X_0(f - g)| > \epsilon\Big) = 0, \quad (A.12)$$
where $\mathcal F_{\delta,P_{A_0}} := \{f,g\in\mathcal F : \|f-g\|_{2,P_{A_0}} \le \delta\}$. Thus, we are done if we can show Equation (A.12). Since $(f-g)^r = f^r - g^r$, by definition of $\hat X_0$ and the triangle inequality $\Pr^\circ(\sup_{f,g\in\mathcal F_{\delta,P_{A_0}}}|\hat X_0(f-g)| > 2\epsilon) \le a_{\delta,n} + b_{\delta,n}$, where $a_{\delta,n} := \Pr^\circ(\sup_{f,g\in\mathcal F_{\delta,P_{A_0}}}|n^{1/2}(\hat P_Z - P_Z)((f - f^r) - (g - g^r))| > \epsilon)$ and
$$b_{\delta,n} := \Pr^\circ\Big(\sup_{f,g\in\mathcal F_{\delta,P_{A_0}}}\big|\langle (f - f^r) - (g - g^r), m_{A_0}'\rangle_{P_{A_0}}\, n^{1/2}(\hat\theta - \theta_0)\big| > \epsilon\Big).$$
Now, $\mathcal F^r$ is $P_{A_0}$-Donsker because this is equivalent to $\mathcal F$ being $P_{A_0}$-Donsker (Assumption 3.3(iii)). Consequently, $\mathcal F - \mathcal F^r := \{f - g^r : f,g \in \mathcal F\}$ is $P_{A_0}$-Donsker by V&W (Theorem 2.10.2) because the difference map is Lipschitz. Hence, since the empirical process $n^{1/2}(\hat P_Z - P_Z)$ indexed by $\mathcal F - \mathcal F^r$ is asymptotically tight under $\|\cdot\|_{2,P_{A_0}}$ (V&W, 2.1.8),
$$\lim_{\delta\to0}\limsup_{n\to\infty} a_{\delta,n} \le \lim_{\delta\to0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{f,g\in(\mathcal F-\mathcal F^r)_{\delta,P_{A_0}}}|n^{1/2}(\hat P_Z - P_Z)(f-g)| > \epsilon\Big) = 0.$$
Next, by repeated applications of Cauchy–Schwarz,
$$\big|\langle (f-f^r) - (g-g^r), m_{A_0}'\rangle_{P_{A_0}}(\hat\theta-\theta_0)\big| \le \|(f-g) - (f^r-g^r)\|_{2,P_{A_0}}\,\|m_{A_0}\|_{2,P_{A_0}}\,\|\hat\theta-\theta_0\| \le 2\|f-g\|_{2,P_{A_0}}\,\|v_{A_0}\|_\infty^{1/2}\,\|(\dot\mu_0'\dot\mu_0)^{1/2}\|_{2,P_X}\,\|\hat\theta-\theta_0\|$$
because $\|f^r - g^r\|_{2,P_{A_0}} = \|f-g\|_{2,P_{A_0}}$ for all $f,g\in\mathcal F$ under the null hypothesis. Indeed,
$$\|f^r-g^r\|^2_{2,P_{A_0}} = \int (f-g)^2(-u,w)\, P_{\varepsilon,X,W}(du,dx,dw) = \int (f-g)^2(-u,w)\, P_{\varepsilon,W}(du,dw) \quad \text{(marginal integration)}$$
$$= \int (f-g)^2(u,w)\, P_{\varepsilon,W}(du,dw) \quad \text{(change of variables and Equation (2.1))}$$
$$= \int (f-g)^2(u,w)\, P_{\varepsilon,X,W}(du,dx,dw) \quad \text{(marginal integration)} \;=\; \|f-g\|^2_{2,P_{A_0}}.$$
Hence,
$$\lim_{\delta\to0}\limsup_{n\to\infty} b_{\delta,n} \le \lim_{\delta\to0}\limsup_{n\to\infty}\Pr^\circ\big(\epsilon < 2\delta\,\|v_{A_0}\|_\infty^{1/2}\,\|(\dot\mu_0'\dot\mu_0)^{1/2}\|_{2,P_X}\, n^{1/2}\|\hat\theta-\theta_0\|\big) = 0$$
because $\hat\theta$ is an $n^{1/2}$-consistent estimator of $\theta_0$ (Assumption 3.5). Therefore, Equation (A.12) holds. $\Box$


Proof of Lemma 3.3 We follow the approach taken by Andrews (1997) in proving his Theorem A.2 while accounting for the fact that (unlike Andrews) our statistic is optimised over the estimated set of observations $\hat Z$. We prove the stated result by showing that the cumulative distribution function (cdf) of the nonnegative random variable $n^{1/2}\hat R_{\max}$ converges pointwise to the cdf of $R_0$. In particular, we show that for all $c \in [0,\infty)$,
$$\limsup_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c) \le \Pr(R_0 > c) \le \liminf_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c). \quad (A.13)$$
The first inequality is easy: since $\hat R_{\max} \le \hat R$, it follows by Corollary 3.1 that
$$\limsup_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c) \le \limsup_{n\to\infty}\Pr(n^{1/2}\hat R > c) = \Pr(R_0 > c). \quad (A.14)$$
The second inequality requires a bit more effort. Let $\hat Y := n^{1/2}(\hat P_{\hat Z} - \hat P^r_{\hat Z})$ and for $g \in \mathcal F_1$ define $B_{P_{A_0}}(g,\epsilon) := \{h\in\mathcal F_1 : \|g-h\|_{2,P_{A_0}} < \epsilon\}$. By Lemma A.4, given $f\in\mathcal F_1$ and $r>0$, there exists $\hat f \in \hat{\mathcal F}_1 \cap B_{P_{A_0}}(f,r)$ w.p.a.1. Consequently, w.p.a.1,
$$\inf_{g\in B_{P_{A_0}}(f,r)}|\hat Y(g)| \le |\hat Y(\hat f)| \le \max_{h\in\hat{\mathcal F}_1}|\hat Y(h)| = n^{1/2}\hat R_{\max}.$$
In fact, since the upper bound does not depend upon $f$, it follows that given $T\in\mathbb N$ and $f_1,\ldots,f_T\in\mathcal F_1$, the event $E_n := \{\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\hat Y(g)| \le n^{1/2}\hat R_{\max}\}$ holds w.p.a.1. Now,
$$\Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\hat Y(g)| > c,\; E_n\Big) \le \Pr(n^{1/2}\hat R_{\max} > c,\; E_n)$$
or equivalently
$$\Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\hat Y(g)| > c\Big) - \Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\hat Y(g)| > c,\; E_n^c\Big) \le \Pr(n^{1/2}\hat R_{\max} > c) - \Pr(n^{1/2}\hat R_{\max} > c,\; E_n^c).$$
Hence, since $\lim_{n\to\infty}\Pr(E_n^c) = 0$,
$$\liminf_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c) \ge \liminf_{n\to\infty}\Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|\hat Y(g)| > c\Big) = \Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|X_0(g)| > c\Big), \quad (A.15)$$
where Equation (A.15) follows by Lemma 3.2 applied to $\mathcal F_1$ plus the continuous mapping theorem. Since this holds for every $T\in\mathbb N$ and $r>0$, the tightest lower bound is
$$\liminf_{n\to\infty}\Pr(n^{1/2}\hat R_{\max} > c) \ge \sup_{T\in\mathbb N}\sup_{r>0}\Pr\Big(\max_{t\le T}\inf_{g\in B_{P_{A_0}}(f_t,r)}|X_0(g)| > c\Big). \quad (A.16)$$
From the proof of Lemma 3.2, we know that $(\mathcal F_1, \|\cdot\|_{2,P_{A_0}})$ is totally bounded, that is, given $r>0$, there exists $T_r < \infty$ such that $\mathcal F_1 \subset \cup_{t=1}^{T_r} B_{P_{A_0}}(f_t, r)$ with $f_1,\ldots,f_{T_r}\in\mathcal F_1$. Moreover, from V&W (Addendum 1.5.8), it follows that almost all sample paths of $\{X_0(f) : f\in\mathcal F_1\}$ are uniformly $\|\cdot\|_{2,P_{A_0}}$-continuous, so that for all $\eta>0$ there exists $r_\eta>0$ such that the probability of the event $U_\eta := \{\sup_{g,h\in\mathcal F_1 : \|g-h\|_{2,P_{A_0}}<2r_\eta}|X_0(g)-X_0(h)| < \eta\}$ approaches one as $\eta\to0$. Therefore, since, being a convex functional of a Gaussian process, the random variable $R_0 := \sup_{f\in\mathcal F_1}|X_0(f)|$ is continuously distributed on $(0,\infty)$ (cf. Davydov, Lifshits, and Smorodina 1998, Theorem 11.1),
$$\Pr(R_0 > c) = \lim_{\eta\to0}\Pr\Big(\sup_{f\in\mathcal F_1}|X_0(f)| > c+\eta\Big) \le \lim_{\eta\to0}\Pr\Big(\max_{t\le T_{r_\eta}}\sup_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c+\eta\Big) = \lim_{\eta\to0}\Pr\Big(\max_{t\le T_{r_\eta}}\sup_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c+\eta,\; U_\eta\Big)$$
$$= \lim_{\eta\to0}\Pr\Big(\sup_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c+\eta\ \text{for some } t\le T_{r_\eta},\; U_\eta\Big) \le \lim_{\eta\to0}\Pr\Big(\inf_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c\ \text{for some } t\le T_{r_\eta}\Big)$$
$$= \lim_{\eta\to0}\Pr\Big(\max_{t\le T_{r_\eta}}\inf_{f\in B_{P_{A_0}}(f_t,r_\eta)}|X_0(f)| > c\Big) \le \sup_{T\in\mathbb N}\sup_{r>0}\Pr\Big(\max_{t\le T}\inf_{f\in B_{P_{A_0}}(f_t,r)}|X_0(f)| > c\Big). \quad (A.17)$$
Here the second equality uses $\Pr(U_\eta^c)\to0$, and the subsequent inequality holds because, on $U_\eta$, the oscillation of $X_0$ over any ball of radius $r_\eta$ is less than $\eta$. Hence, Equation (A.13) follows by Equations (A.14), (A.16), and (A.17). $\Box$

Appendix 3. Proofs for Section 4 Proof of Lemma 4.1 For f ∈ F , n 

(Pˆ ∗Zˆ − Pˆ R(Z(θ1 )) )f = n−1

[f (ˆεj∗ , Wj ) − f (Rj εj (θ1 ), Wj )]

j=1 n 

= n−1

˜ j , Xj , θˆ ∗ , θˆ , θ1 ), Wj ) − f (Rj εj (θ1 ) − (R ˜ j , Xj , θ1 , θ1 , θ1 ), Wj )] [f (Rj εj (θ1 ) − (R

j=1 ˆ∗ ˆ = Pˆ R(A1 ) (f θ ,θ − f θ1 ,θ1 ) ˆ ∗ ,θˆ

= (Pˆ R(A1 ) − PR(A1 ) )(f θ

ˆ ∗ ,θˆ

− f θ1 ,θ1 ).

ˆ ∗ ,θˆ

− f θ1 ,θ1 ) + PR(A1 ) (f θ

− f θ1 ,θ1 ) + PR(A1 ) (f θ

Therefore, ˆ ˆ

(Pˆ ∗Zˆ − Pˆ R(Z(θ1 )) )f − PR(A1 ) (f θ,θ − f θ1 ,θ1 ) = (Pˆ R(A1 ) − PR(A1 ) )(f θ

ˆ ∗ ,θˆ

ˆ ˆ

− f θ ,θ ).

To prove the first result (we only show the first result, as the proof of the second one is similar), it thus suffices to show that ˆ∗ ˆ sup |n1/2 (Pˆ R(A1 ) − PR(A1 ) )(f θ ,θ − f θ1 ,θ1 )| = oPr◦ (1),

f ∈F ˆ ∗ ,θˆ

sup |PR(A1 ) (f θ

f ∈F

ˆ ˆ − f θ,θ ) − 0.5 f − f r , (∂1 log pA1 )μ˙ 1 PA1 (θˆ ∗ − θˆ )| = oPr ( θˆ ∗ − θˆ ).

(A.18) (A.19)

As in the proof of Lemma 3.1, we use equicontinuity of the n1/2 (Pˆ R(A1 ) − PR(A1 ) ) process to demonstrate 1/2 Equation (A.18) and mean-square differentiability of pA1 to show Equation (A.19). We begin with the first claim. Given

302

T. Chen and G. Tripathi ˆ ∗ ,θˆ

f ∈ F , we know that f θ

, f θ1 ,θ1 ∈ F and ˆ ∗ ,θˆ

− f θ1 ,θ1 2,PR(A1 ) ≤ q( (θˆ ∗ , θˆ ) − (θ1 , θ1 ) ) =: qˆ

f θ

ˆ∗ ˆ by Assumption 4.2(iii). Hence, as |(Pˆ R(A1 ) − PR(A1 ) )(f θ ,θ − f θ1 ,θ1 )| is bounded from above by supg,h∈F : g−h 2,P

R(A1 )

|(Pˆ R(A1 ) − PR(A1 ) )(g − h)| and the bound does not depend upon f , ˆ∗ ˆ sup |n1/2 (Pˆ R(A1 ) − PR(A1 ) )(f θ ,θ − f θ1 ,θ1 )| ≤

Downloaded by [Université du Luxembourg], [Gautam Tripathi] at 05:12 10 June 2013

f ∈F

sup g,h∈F : g−h 2,PR(A ) ≤ˆq 1

≤ˆq

|n1/2 (Pˆ R(A1 ) − PR(A1 ) )(g − h)|.

Then, Equation (A.18) follows because the empirical process $\{n^{1/2}(\hat P_{R(A_1)} - P_{R(A_1)})f : f\in\mathcal F\}$ is asymptotically equicontinuous (due to the fact that $\mathcal F$ is $P_{R(A_1)}$-Donsker by Assumption 4.2(iii)) and $\operatorname{plim}(\hat q) = 0$ by Assumption 4.1 and the properties of $q$.

Next, we show Equation (A.19). Begin by observing that
$$P_{R(A_1)} f^{\theta_a,\theta_b} = \int f(su - \tilde\Delta(s,x,\theta_a,\theta_b,\theta_1), w)\,P_R(ds)\,P_{A_1}(du,dx,dw)$$
$$= 0.5\int f(u - \tilde\Delta(1,x,\theta_a,\theta_b,\theta_1), w)\,p_{A_1}(u,x,w)\,du\,\kappa(dx,dw) + 0.5\int f(-u - \tilde\Delta(-1,x,\theta_a,\theta_b,\theta_1), w)\,p_{A_1}(u,x,w)\,du\,\kappa(dx,dw)$$
$$= 0.5\int f(t,w)\,p_{A_1}(t + \tilde\Delta(1,x,\theta_a,\theta_b,\theta_1),x,w)\,dt\,\kappa(dx,dw) + 0.5\int f(t,w)\,p_{A_1}(-t - \tilde\Delta(-1,x,\theta_a,\theta_b,\theta_1),x,w)\,dt\,\kappa(dx,dw)$$
$$= \int f(t,w)\,p_{A_1}(st + s\tilde\Delta(s,x,\theta_a,\theta_b,\theta_1),x,w)\,P_R(ds)\,dt\,\kappa(dx,dw), \tag{A.20}$$
where the third equality follows by the translation invariance of the Lebesgue measure. Hence,
$$P_{R(A_1)}(f^{\hat\theta^*,\hat\theta} - f^{\hat\theta,\hat\theta}) = \int f(t,w)\big(p_{A_1}(st + s\tilde\Delta(s,x,\hat\theta^*,\hat\theta,\theta_1),x,w) - p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\big)\,P_R(ds)\,dt\,\kappa(dx,dw). \tag{A.21}$$
Under Assumption 4.2(ii), $p_{A_1}^{1/2}$ is mean-square differentiable. Hence, expanding as in Equation (A.7) while keeping $st + \Delta(x,\hat\theta,\theta_1)$ fixed,
$$p_{A_1}(st + s\tilde\Delta(s,x,\hat\theta^*,\hat\theta,\theta_1),x,w) - p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)$$
$$= p_{A_1}(st + \Delta(x,\hat\theta,\theta_1) + s\Delta(x,\hat\theta^*,\hat\theta),x,w) - p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)$$
$$= (\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\Delta(x,\hat\theta^*,\hat\theta)$$
$$\quad + 0.25(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))^2\,p_{A_1}^{-1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,(s\Delta(x,\hat\theta^*,\hat\theta))^2$$
$$\quad + r^2(s\Delta(x,\hat\theta^*,\hat\theta), st + \Delta(x,\hat\theta,\theta_1),x,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)$$
$$\quad + 2r(s\Delta(x,\hat\theta^*,\hat\theta), st + \Delta(x,\hat\theta,\theta_1),x,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)$$
$$\quad + r(s\Delta(x,\hat\theta^*,\hat\theta), st + \Delta(x,\hat\theta,\theta_1),x,w)\,(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\Delta(x,\hat\theta^*,\hat\theta).$$
We use this decomposition to handle Equation (A.21). First, by Assumption 4.2(i),
$$\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\Delta(x,\hat\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= \int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\dot\mu'(x,\hat\theta)(\hat\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$\quad + \int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\rho(x,\hat\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw).$$

By a change of variable,
$$\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\rho(x,\hat\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= 0.5\int f(t,w)(\partial_1 p_{A_1}(t + \Delta(x,\hat\theta,\theta_1),x,w))\,\rho(x,\hat\theta^*,\hat\theta)\,dt\,\kappa(dx,dw) - 0.5\int f(t,w)(\partial_1 p_{A_1}(-t + \Delta(x,\hat\theta,\theta_1),x,w))\,\rho(x,\hat\theta^*,\hat\theta)\,dt\,\kappa(dx,dw)$$
$$= 0.5\int f(u - \Delta(x,\hat\theta,\theta_1),w)(\partial_1 p_{A_1}(u,x,w))\,\rho(x,\hat\theta^*,\hat\theta)\,du\,\kappa(dx,dw) - 0.5\int f(-(u - \Delta(x,\hat\theta,\theta_1)),w)(\partial_1 p_{A_1}(u,x,w))\,\rho(x,\hat\theta^*,\hat\theta)\,du\,\kappa(dx,dw)$$
$$= 0.5\int (f - f^r)(u - \Delta(x,\hat\theta,\theta_1),w)(\partial_1\log p_{A_1}(u,x,w))\,\rho(x,\hat\theta^*,\hat\theta)\,P_{A_1}(du,dx,dw). \tag{A.22}$$
Hence, following the argument leading to Equation (A.8),
$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\rho(x,\hat\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big|^2 \le M_{\mathcal F}\|v_{A_1}\|_\infty^2\int\sup_{\theta\in B(\hat\theta,\|\hat\theta^*-\hat\theta\|)}\rho^2(x,\theta,\hat\theta)\,P_X(dx) = o_{\Pr}(\|\hat\theta^* - \hat\theta\|^2) \tag{A.23}$$
by Assumption 4.2(i). Arguing similarly,
$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s(\dot\mu'(x,\hat\theta) - \dot\mu'(x,\theta_1))(\hat\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| \le M_{\mathcal F}\|v_{A_1}\|_\infty\|\dot\mu(\cdot,\hat\theta) - \dot\mu(\cdot,\theta_1)\|_{2,P_X}\|\hat\theta^* - \hat\theta\| = o_{\Pr}(\|\hat\theta^* - \hat\theta\|) \tag{A.24}$$
by Assumption 4.2(i). Therefore, by Equations (A.23) and (A.24),
$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\Delta(x,\hat\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) - \int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\dot\mu'(x,\theta_1)(\hat\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| = o_{\Pr}(\|\hat\theta^* - \hat\theta\|). \tag{A.25}$$
Now, the argument leading to Equation (A.22) shows that
$$\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\dot\mu'(x,\theta_1)(\hat\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) = 0.5\int (f - f^r)(u - \Delta(x,\hat\theta,\theta_1),w)(\partial_1\log p_{A_1}(u,x,w))\,\dot\mu'(x,\theta_1)(\hat\theta^* - \hat\theta)\,P_{A_1}(du,dx,dw).$$
Thus, by Assumption 4.2(iii) and Cauchy–Schwarz,
$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w) - \partial_1 p_{A_1}(st,x,w))\,s\dot\mu'(x,\theta_1)(\hat\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| \le q(\|\hat\theta - \theta_1\|)\,\|v_{A_1}\|_\infty\,\|(\dot\mu_1'\dot\mu_1)^{1/2}\|_{2,P_X}\,\|\hat\theta^* - \hat\theta\| = o_{\Pr}(\|\hat\theta^* - \hat\theta\|) \tag{A.26}$$
because $\operatorname{plim}(\hat\theta) = \theta_1$, $q$ is continuous, and $q(0) = 0$. Therefore, by Equations (A.25) and (A.26),
$$\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\Delta(x,\hat\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) - \int f(t,w)(\partial_1 p_{A_1}(st,x,w))\,s\dot\mu'(x,\theta_1)(\hat\theta^* - \hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| = o_{\Pr}(\|\hat\theta^* - \hat\theta\|).$$

But as in Equation (A.22),
$$\int f(t,w)(\partial_1 p_{A_1}(st,x,w))\,s\dot\mu'(x,\theta_1)\,P_R(ds)\,dt\,\kappa(dx,dw) = 0.5\langle f - f^r, (\partial_1\log p_{A_1})\dot\mu_1'\rangle_{P_{A_1}}.$$
Therefore,
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\Delta(x,\hat\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw) - 0.5\langle f - f^r, (\partial_1\log p_{A_1})\dot\mu_1'\rangle_{P_{A_1}}(\hat\theta^* - \hat\theta)\Big| = o_{\Pr}(\|\hat\theta^* - \hat\theta\|). \tag{A.27}$$
Next, since $R^2 = 1$, the argument leading to Equation (A.22) reveals that
$$\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))^2\,p_{A_1}^{-1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,(s\Delta(x,\hat\theta^*,\hat\theta))^2\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= 0.5\int (f + f^r)(u - \Delta(x,\hat\theta,\theta_1),w)(\partial_1\log p_{A_1}(u,x,w))^2\,\Delta^2(x,\hat\theta^*,\hat\theta)\,P_{A_1}(du,dx,dw).$$
Hence, by Assumption 4.2(ii),
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))^2\,p_{A_1}^{-1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,(s\Delta(x,\hat\theta^*,\hat\theta))^2\,P_R(ds)\,dt\,\kappa(dx,dw)\Big|$$
$$\le M_{\mathcal F}\|v_{A_1}\|_\infty\int\Big(\|\dot\mu(x,\hat\theta)\|^2\|\hat\theta^* - \hat\theta\|^2 + \sup_{\theta\in B(\hat\theta,\|\hat\theta^*-\hat\theta\|)}\rho^2(x,\theta,\hat\theta)\Big)\,P_X(dx)$$
$$\le M_{\mathcal F}\|v_{A_1}\|_\infty\Big((\|\dot\mu(\cdot,\hat\theta) - \dot\mu(\cdot,\theta_1)\|^2_{2,P_X} + \|\dot\mu(\cdot,\theta_1)\|^2_{2,P_X})\|\hat\theta^* - \hat\theta\|^2 + \int\sup_{\theta\in B(\hat\theta,\|\hat\theta^*-\hat\theta\|)}\rho^2(x,\theta,\hat\theta)\,P_X(dx)\Big)$$
$$= O_{\Pr}(\|\hat\theta^* - \hat\theta\|^2). \tag{A.28}$$
Next, since
$$\int f(t,w)\,r^2(s\Delta(x,\hat\theta^*,\hat\theta), st + \Delta(x,\hat\theta,\theta_1),x,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= 0.5\int\big(f(u - \Delta(x,\hat\theta,\theta_1),w)\,r^2(\Delta(x,\hat\theta^*,\hat\theta),u,x,w) + f^r(u - \Delta(x,\hat\theta,\theta_1),w)\,r^2(-\Delta(x,\hat\theta^*,\hat\theta),u,x,w)\big)\,P_{A_1}(du,dx,dw),$$
following the argument leading to Equation (A.28), we obtain that
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)\,r^2(s\Delta(x,\hat\theta^*,\hat\theta), st + \Delta(x,\hat\theta,\theta_1),x,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| \le M_{\mathcal F}\,o(\|\Delta(\cdot,\hat\theta^*,\hat\theta)\|^2_{2,P_X}) = o(\|\hat\theta^* - \hat\theta\|^2).$$
In a similar manner, it can be shown that
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)\,r(s\Delta(x,\hat\theta^*,\hat\theta), st + \Delta(x,\hat\theta,\theta_1),x,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| = o(\|\hat\theta^* - \hat\theta\|^2)$$
and
$$\sup_{f\in\mathcal F}\Big|\int f(t,w)\,r(s\Delta(x,\hat\theta^*,\hat\theta), st + \Delta(x,\hat\theta,\theta_1),x,w)(\partial_1 p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w))\,s\Delta(x,\hat\theta^*,\hat\theta)\,P_R(ds)\,dt\,\kappa(dx,dw)\Big| = o(\|\hat\theta^* - \hat\theta\|^2). \tag{A.29}$$
The desired result follows from Equations (A.27)–(A.29). ∎

Proof of Equation (12) By Equation (A.20) and the fact that $R^2 = 1$,
$$P_{R(A_1)} f^{\hat\theta,\hat\theta} = \int f(t,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,P_R(ds)\,dt\,\kappa(dx,dw),$$
$$P_{R(A_1)} f^{\theta_1,\theta_1} = \int f(t,w)\,p_{A_1}(st,x,w)\,P_R(ds)\,dt\,\kappa(dx,dw).$$
Similarly, by a change of variable and the fact that $\operatorname{supp}(R) = \{-1,1\}$,
$$P_{R(A_1)} f^{r\,\hat\theta,\hat\theta} = \int f^r(t,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= \int f(-t,w)\,p_{A_1}(st + \Delta(x,\hat\theta,\theta_1),x,w)\,P_R(ds)\,dt\,\kappa(dx,dw)$$
$$= \int f(u,w)\,p_{A_1}(-su + \Delta(x,\hat\theta,\theta_1),x,w)\,P_R(ds)\,du\,\kappa(dx,dw)$$
$$= \int f(u,w)\,p_{A_1}(ru + \Delta(x,\hat\theta,\theta_1),x,w)\,P_R(dr)\,du\,\kappa(dx,dw) = P_{R(A_1)} f^{\hat\theta,\hat\theta}.$$
The desired result follows because $P_{R(A_1)} f^{r\,\theta_1,\theta_1} = P_{R(A_1)} f^{\theta_1,\theta_1}$ by setting $\Delta$ to be identically zero in the preceding equation. ∎
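The invariance in the proof of Equation (12) has a simple finite-sample counterpart: multiplying each residual by an independent Rademacher sign with support $\{-1,1\}$ produces an exactly symmetric distribution, whatever the law of the residual. A small Monte Carlo sketch (the exponential residuals are an illustrative assumption, not from the paper):

```python
import numpy as np

# Monte Carlo sketch: multiplying residuals by an independent Rademacher sign
# R with supp(R) = {-1, 1} yields an exactly symmetric distribution, even when
# the residuals themselves are skewed (exponential draws are illustrative).
rng = np.random.default_rng(0)
n = 100_000
eps = rng.exponential(size=n) - 1.0       # skewed, mean-zero residuals
R = rng.choice([-1.0, 1.0], size=n)       # Rademacher signs
sym = R * eps                             # R-symmetrised sample

# For f = 1_{(-inf, t]}, compare P f with P f^r on each sample:
t = 0.5
gap_raw = np.mean(eps <= t) - np.mean(-eps <= t)   # clearly nonzero
gap_sym = np.mean(sym <= t) - np.mean(-sym <= t)   # near zero by construction
```

For the skewed residuals `gap_raw` is bounded away from zero, while on the sign-flipped sample `gap_sym` is of the order $n^{-1/2}$ for every half-line indicator.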

Proof of Lemma 4.2 Since $\mathcal F$ is $P_{R(A_1)}$-Donsker by Assumption 4.2(iii), tightness of $\hat{\mathbb X}_1^*$ can be established as in the proof of Lemma 3.2. It only remains to prove that the empirical process $\{n^{1/2}(\hat P_{R(Z(\theta_1))} - P_{R(Z(\theta_1))})(f - f^r) : f\in\mathcal F\}$ has the same limiting marginal distribution and covariance function as $\mathbb G_1$. So, let $f\in\mathcal F$. Then, since $(f - f^r)(Z^r(\theta_1)) = -(f - f^r)(Z(\theta_1))$ and $1_{\{1\}}(R) - 1_{\{-1\}}(R) = R$,
$$\hat P_{R(Z(\theta_1))}(f - f^r) = n^{-1}\sum_{j=1}^n (f - f^r)(R(Z_j(\theta_1)))$$
$$= n^{-1}\sum_{j=1}^n\big[(f - f^r)(Z_j(\theta_1))\,1_{\{1\}}(R_j) + (f - f^r)(Z_j^r(\theta_1))\,1_{\{-1\}}(R_j)\big]$$
$$= n^{-1}\sum_{j=1}^n(1_{\{1\}}(R_j) - 1_{\{-1\}}(R_j))(f - f^r)(Z_j(\theta_1)) = n^{-1}\sum_{j=1}^n R_j\,(f - f^r)(Z_j(\theta_1)).$$
Hence, as $P_{R(Z(\theta_1))}(f - f^r) = 0$ (cf. the remark at the end of Section 4), the CLT for i.i.d. random variables shows that the empirical process $\{n^{1/2}(\hat P_{R(Z(\theta_1))} - P_{R(Z(\theta_1))})(f - f^r) : f\in\mathcal F\}$ has the same limiting marginal distribution and covariance function as $\mathbb G_1$. ∎
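The sign-flip identity driving the display above, $(f - f^r)(R(Z_j(\theta_1))) = R_j (f - f^r)(Z_j(\theta_1))$, is exact and easy to confirm numerically; a minimal sketch with simulated stand-ins for the residuals:

```python
import numpy as np

# Numeric confirmation of the exact identity used above:
# (f - f^r)(R_j Z_j) = R_j (f - f^r)(Z_j) when R_j takes values in {-1, 1},
# here for f = 1_{(-inf, t]} and simulated stand-in residuals.
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)                    # stand-in for eps_j(theta_1)
R = rng.choice([-1.0, 1.0], size=n)       # Rademacher signs

t = 0.3
f = lambda u: (u <= t).astype(float)      # f = 1_{(-inf, t]}
fr = lambda u: f(-u)                      # reflection f^r(u) = f(-u)

lhs = np.mean(f(R * z) - fr(R * z))       # empirical P_{R(Z(theta_1))}(f - f^r)
rhs = np.mean(R * (f(z) - fr(z)))         # n^{-1} sum_j R_j (f - f^r)(Z_j)
```

The two sample averages agree to machine precision because the identity holds term by term, not merely in expectation.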

Proof of Lemma 4.3 From Corollary 4.1, we know that $n^{1/2}\hat{\mathcal R}^*$ converges in distribution to $\mathcal R_1$ whether the null hypothesis is true or not. Therefore, it suffices to prove the same for $n^{1/2}\hat{\mathcal R}^*_{\max}$. In particular, we show that for all $c\in[0,\infty)$,
$$\limsup_{n\to\infty}\Pr(n^{1/2}\hat{\mathcal R}^*_{\max} > c) \le \Pr(\mathcal R_1 > c) \le \liminf_{n\to\infty}\Pr(n^{1/2}\hat{\mathcal R}^*_{\max} > c).$$
This follows exactly as in the proof of Lemma 3.3 provided $\hat{\mathcal F}_1^* := \{1_{(-\infty,z]} : z\in\hat{\mathcal Z}^*\}$ is dense in $\mathcal F_1$ in the sense that for each $f\in\mathcal F_1$ and $\epsilon > 0$,
$$\lim_{n\to\infty}\Pr(\exists\hat f\in\hat{\mathcal F}_1^* : \|\hat f - f\|_{2,P_{R(A_1)}} \vee \|\hat f^r - f^r\|_{2,P_{R(A_1)}} < \epsilon) = 1.$$
But the denseness result above follows as in the proof of Lemma A.4 upon replacing $\varepsilon$ by $\varepsilon^* := R\varepsilon(\theta_1)$, $Z$ by $Z^* := (\varepsilon^*, W)$, $\hat Z$ by $\hat Z^* := (\hat\varepsilon^*, W)$ with $\hat\varepsilon^*$ as defined in Section 4, $\Delta(X,\hat\theta,\theta_0)$ by $\tilde\Delta(R,X,\hat\theta^*,\hat\theta,\theta_1)$ (because $\hat\varepsilon^* - \varepsilon^* = -\tilde\Delta(R,X,\hat\theta^*,\hat\theta,\theta_1)$), and using the fact that $\operatorname{plim}(\hat\theta^*,\hat\theta) = (\theta_1,\theta_1)$ by Assumption 4.1. ∎

Appendix 4. Proofs for Section 5

Proof of Theorem 5.1 Under the alternative, $P_{Z(\theta)} \ne P^r_{Z(\theta)}$ for each $\theta\in\Theta$; in particular, $P_{Z(\theta_1)} \ne P^r_{Z(\theta_1)}$. Now write
$$\sup_{f\in\mathcal F}|(\hat P_{\hat Z} - \hat P^r_{\hat Z})f| = \sup_{f\in\mathcal F}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f| + \hat D_1(\mathcal F) + \hat D_2(\mathcal F),$$
where $\hat D_1(\mathcal F) := \sup_{f\in\mathcal F}|(\hat P_{\hat Z} - \hat P^r_{\hat Z})f| - \sup_{f\in\mathcal F}|(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f|$ and
$$\hat D_2(\mathcal F) := \sup_{f\in\mathcal F}|(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f| - \sup_{f\in\mathcal F}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f|.$$
By the reverse triangle inequality and Lemma A.2,
$$|\hat D_1(\mathcal F)| \le \sup_{f\in\mathcal F}\big||(\hat P_{\hat Z} - \hat P^r_{\hat Z})f| - |(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f|\big| \le \sup_{f\in\mathcal F}|(\hat P_{\hat Z} - \hat P^r_{\hat Z})f - (\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f|$$
$$\le \sup_{f\in\mathcal F}|(\hat P_{\hat Z} - \hat P_{Z(\theta_1)})f| + \sup_{f\in\mathcal F}|(\hat P^r_{\hat Z} - \hat P^r_{Z(\theta_1)})f| = o_{\Pr^\circ}(1). \tag{A.30}$$
Next, since $\mathcal F$ is $P_{A(\theta_1)}$-Donsker $\Longrightarrow$ $\mathcal F$ and $\mathcal F^r$ are $P_{A(\theta_1)}$-Glivenko–Cantelli $\Longrightarrow$ $\mathcal F$ and $\mathcal F^r$ are $P_{Z(\theta_1)}$-Glivenko–Cantelli,
$$|\hat D_2(\mathcal F)| \le \sup_{f\in\mathcal F}\big||(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f| - |(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f|\big| \le \sup_{f\in\mathcal F}|(\hat P_{Z(\theta_1)} - \hat P^r_{Z(\theta_1)})f - (P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f|$$
$$\le \sup_{f\in\mathcal F}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| + \sup_{f\in\mathcal F}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f^r| = \sup_{f\in\mathcal F}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| + \sup_{f\in\mathcal F^r}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| = o_{\Pr^\circ}(1). \tag{A.31}$$
Hence,
$$\widehat{\mathrm{KS}}_{\mathcal F} := \sup_{f\in\mathcal F}|(\hat P_{\hat Z} - \hat P^r_{\hat Z})f| = \sup_{f\in\mathcal F}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f| + o_{\Pr^\circ}(1). \tag{A.32}$$
Since $\mathcal F$ is measure determining, $\sup_{f\in\mathcal F}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f| = 0 \Longrightarrow P_{Z(\theta_1)} = P^r_{Z(\theta_1)}$. Therefore, under the alternative hypothesis, $\sup_{f\in\mathcal F}|(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f| > 0$. It follows by Equation (A.32) that, under the alternative, $n^{1/2}\widehat{\mathrm{KS}}_{\mathcal F}$ is unbounded w.p.a.1. ∎
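Because $\mathcal F$ consists of half-line indicators, the supremum defining the KS-type statistic is attained at sample points, so no smoothing or optimisation over an uncountable set is needed. A simplified one-dimensional sketch restricted to the residual coordinate (the function name and the simulated designs are illustrative):

```python
import numpy as np

# Simplified sketch of the smoothing-free KS-type statistic on the residual
# coordinate: for half-line indicators, the supremum reduces to a maximum
# over the observed sample points (illustrative setup).
def ks_symmetry(e):
    e = np.asarray(e, dtype=float)
    return max(abs(np.mean(e <= z) - np.mean(-e <= z)) for z in e)

rng = np.random.default_rng(2)
stat_null = ks_symmetry(rng.normal(size=500))             # symmetric errors
stat_alt = ks_symmetry(rng.exponential(size=500) - 1.0)   # skewed errors
```

Under symmetry the statistic is of order $n^{-1/2}$, while under a skewed alternative it converges to a positive constant, which is the unboundedness of $n^{1/2}\widehat{\mathrm{KS}}_{\mathcal F}$ seen in the proof.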

Proof of Theorem 5.2 Let $D(f) := |(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})f|$ for notational convenience. Then, with $\hat D_1$ and $\hat D_2$ as defined in the proof of Theorem 5.1,
$$\hat{\mathcal R}_{\max} := \sup_{f\in\hat{\mathcal F}_1}|(\hat P_{\hat Z} - \hat P^r_{\hat Z})f| = \sup_{f\in\hat{\mathcal F}_1} D(f) + \hat D_1(\hat{\mathcal F}_1) + \hat D_2(\hat{\mathcal F}_1).$$
The argument leading to Equation (A.30) shows that
$$|\hat D_1(\hat{\mathcal F}_1)| \le \sup_{f\in\hat{\mathcal F}_1}|(\hat P_{\hat Z} - \hat P_{Z(\theta_1)})f| + \sup_{f\in\hat{\mathcal F}_1}|(\hat P^r_{\hat Z} - \hat P^r_{Z(\theta_1)})f| \le \sup_{f\in\mathcal F_1}|(\hat P_{\hat Z} - \hat P_{Z(\theta_1)})f| + \sup_{f\in\mathcal F_1}|(\hat P^r_{\hat Z} - \hat P^r_{Z(\theta_1)})f| \quad(\hat{\mathcal F}_1\subset\mathcal F_1)$$
$$= o_{\Pr^\circ}(1).$$
Similarly, the argument leading to Equation (A.31) shows that
$$|\hat D_2(\hat{\mathcal F}_1)| \le \sup_{f\in\mathcal F_1}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| + \sup_{f\in\mathcal F_1^r}|(\hat P_{Z(\theta_1)} - P_{Z(\theta_1)})f| = o_{\Pr^\circ}(1).$$
Therefore, letting $\hat D_3 := \sup_{f\in\hat{\mathcal F}_1} D(f) - \sup_{f\in\mathcal F_1} D(f)$,
$$\hat{\mathcal R}_{\max} = \sup_{f\in\hat{\mathcal F}_1} D(f) + o_{\Pr^\circ}(1) = \sup_{f\in\mathcal F_1} D(f) + \hat D_3 + o_{\Pr^\circ}(1).$$
Since $\sup_{f\in\mathcal F_1} D(f) > 0$ under the alternative hypothesis, $n^{1/2}\hat{\mathcal R}_{\max}$ will be unbounded under the alternative (hence consistent) provided $\hat D_3 = o_{\Pr}(1)$. To show the latter, let $\epsilon > 0$ and observe that $\hat D_3 \le 0$ is always true because $\hat{\mathcal F}_1\subset\mathcal F_1$. Next, by the definition of the supremum, there exists $g_\epsilon\in\mathcal F_1$ such that $\sup_{f\in\mathcal F_1} D(f) - \epsilon < D(g_\epsilon)$. For $\eta > 0$, consider the event $\hat E_\eta := \{\exists h_\eta\in\hat{\mathcal F}_1 : \|h_\eta - g_\epsilon\|_{2,P_{A(\theta_1)}} \vee \|h_\eta^r - g_\epsilon^r\|_{2,P_{A(\theta_1)}} < \eta\}$. If $\hat E_\eta$ is true, then there exists $h_\eta\in\hat{\mathcal F}_1$ such that $\|h_\eta - g_\epsilon\|_{2,P_{A(\theta_1)}} + \|h_\eta^r - g_\epsilon^r\|_{2,P_{A(\theta_1)}} < 2\eta$. But,
$$\sup_{f\in\mathcal F_1} D(f) - \epsilon < D(g_\epsilon) \le |D(g_\epsilon) - D(h_\eta)| + D(h_\eta) \le |D(g_\epsilon) - D(h_\eta)| + \sup_{f\in\hat{\mathcal F}_1} D(f)$$
and
$$|D(g_\epsilon) - D(h_\eta)| \le |(P_{Z(\theta_1)} - P^r_{Z(\theta_1)})(g_\epsilon - h_\eta)| \quad\text{(reverse triangle inequality)}$$
$$\le |P_{Z(\theta_1)}(g_\epsilon - h_\eta)| + |P_{Z(\theta_1)}(g_\epsilon^r - h_\eta^r)| \le \|g_\epsilon - h_\eta\|_{2,P_{A(\theta_1)}} + \|g_\epsilon^r - h_\eta^r\|_{2,P_{A(\theta_1)}} \quad\text{(Jensen)}.$$
Hence, choosing $\eta\le\epsilon$,
$$\hat E_\eta \Longrightarrow \sup_{f\in\mathcal F_1} D(f) - \epsilon < 2\eta + \sup_{f\in\hat{\mathcal F}_1} D(f) \Longleftrightarrow \hat D_3 \ge -\epsilon - 2\eta \ge -3\epsilon \Longrightarrow |\hat D_3| \le 3\epsilon.$$
As we proved Equation (A.35), we can show that $\lim_{n\to\infty}\Pr(\hat E_\eta) = 1$ under Assumptions 3.1, 3.2(ii), and 3.4 with $(\theta_0,\varepsilon)$ replaced by $(\theta_1,\varepsilon(\theta_1))$. (Note that Lemma A.4 holds irrespective of whether the null hypothesis is true or not.) It follows that $\lim_{n\to\infty}\Pr(|\hat D_3| \le 3\epsilon) = 1$, which is equivalent to $\hat D_3 = o_{\Pr}(1)$ because $\epsilon$ was arbitrary. ∎

Proof of Lemma 5.2 The approach (and notation) here is very similar to the proof of Lemma 3.2, the main difference being that we now use empirical process results that allow the underlying data generating measures to depend upon $n$. Recall that
$$\hat{\mathbb X}_{\theta_n}(f) := n^{1/2}(\hat P_{Z(\theta_n)} - P_{Z(\theta_n)})(f - f^r) + \langle f - f^r, m_{A_n}\rangle_{P_{A_n}}\,n^{1/2}(\hat\theta - \theta_n),$$
where $m_{A_n} := (\partial_1\log p_{A_n})\dot\mu_0'$, and $f\in\mathcal F$. To prove that $(\hat{\mathbb X}_{\theta_n} + \Gamma_h)_{\mathcal F}$ is asymptotically tight, it suffices to show (V&W, Section 2.8.3) that

$$\lim_{n\to\infty}\sup_{f,g\in\mathcal F}\big|\|f-g\|_{2,P_{A_n}} - \|f-g\|_{2,P_{A_0}}\big| = 0$$
and, for all $\epsilon > 0$,
$$\lim_{\delta\to 0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{f,g\in\mathcal F_{\delta,P_{A_n}}}|(\hat{\mathbb X}_{\theta_n} + \Gamma_h)(f-g)| > \epsilon\Big) = 0. \tag{A.33}$$
The first condition follows immediately because for all $f,g\in\mathcal F$,
$$(\|f-g\|_{2,P_{A_n}} - \|f-g\|_{2,P_{A_0}})^2 \le \big|\|f-g\|_{2,P_{A_n}} - \|f-g\|_{2,P_{A_0}}\big|\times\big|\|f-g\|_{2,P_{A_n}} + \|f-g\|_{2,P_{A_0}}\big|$$
$$= \big|\|f-g\|^2_{2,P_{A_n}} - \|f-g\|^2_{2,P_{A_0}}\big| = n^{-1/2}\Big|\int (f-g)^2 h\,dP_{A_0}\Big| \le 2M_{\mathcal F}\|h\|_\infty n^{-1/2}.$$
To show Equation (A.33), begin by observing that
$$\Pr^\circ\Big(\sup_{f,g\in\mathcal F_{\delta,P_{A_n}}}|(\hat{\mathbb X}_{\theta_n} + \Gamma_h)(f-g)| > 3\epsilon\Big) \le t_{\delta,n}(1) + t_{\delta,n}(2) + t_{\delta,n}(3),$$
where $t_{\delta,n}(1) := \Pr^\circ(\sup_{f,g\in\mathcal F_{\delta,P_{A_n}}}|n^{1/2}(\hat P_{Z(\theta_n)} - P_{Z(\theta_n)})((f - f^r) - (g - g^r))| > \epsilon)$,
$$t_{\delta,n}(2) := \Pr^\circ\Big(\sup_{f,g\in\mathcal F_{\delta,P_{A_n}}}|\langle (f-g) - (f^r-g^r), m_{A_n}\rangle_{P_{A_n}}\,n^{1/2}(\hat\theta - \theta_n)| > \epsilon\Big),$$
and $t_{\delta,n}(3) := 1(\sup_{f,g\in\mathcal F_{\delta,P_{A_n}}}|\Gamma_h(f-g)| > \epsilon)$ because $\Gamma_h(f-g)$ is nonstochastic. Now,
$$\lim_{\delta\to 0}\limsup_{n\to\infty} t_{\delta,n}(1) \le \lim_{\delta\to 0}\limsup_{n\to\infty}\Pr^\circ\Big(\sup_{f,g\in(\mathcal F-\mathcal F^r)_{\delta,P_{A_n}}}|n^{1/2}(\hat P_{Z(\theta_n)} - P_{Z(\theta_n)})(f-g)| > \epsilon\Big) = 0$$
because $\mathcal F - \mathcal F^r$ is asymptotically equicontinuous uniformly in $(P_{A_n})$, which follows because $\mathcal F - \mathcal F^r$ is Donsker and pre-Gaussian uniformly in $(P_{A_n})$ by Assumptions 5.3(ii) and (iii) (cf. V&W, Section 2.8.3). Next, for the remainder of the proof, let $f,g\in\mathcal F_{\delta,P_{A_n}}$. By Assumption 5.2(ii),
$$\langle (f-g) - (f^r-g^r), m_{A_n}\rangle_{P_{A_n}} \le (\|f-g\|_{2,P_{A_n}} + \|f^r-g^r\|_{2,P_{A_n}})\,\|v_{A_n}\|_\infty\,\|(\dot\mu_0'\dot\mu_0)^{1/2}\|_{2,P_X}.$$
Moreover, recalling that $P_Z = P^r_Z$ was assumed to create the local alternatives,
$$\|f^r-g^r\|^2_{2,P_{A_n}} = \int (f-g)^2(-u,w)(1 + n^{-1/2}h(u,w))\,P_{A_0}(du,dx,dw)$$
$$= \int (f-g)^2(-u,w)(1 + n^{-1/2}h(u,w))\,P_Z(du,dw) \quad\text{(marginal integration)}$$
$$= \int (f-g)^2(u,w)(1 + n^{-1/2}h(-u,w))\,P_Z(du,dw) \quad\text{(change of variables)}$$
$$= \int (f-g)^2(u,w)\,\frac{1 + n^{-1/2}h(-u,w)}{1 + n^{-1/2}h(u,w)}\,P_{A_n}(du,dx,dw) \quad\text{(marginal integration)}$$
$$\le \|f-g\|^2_{2,P_{A_n}}\,\frac{1 + n^{-1/2}\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty} \quad(\text{for } n\ge\|h\|^2_\infty) \le \delta^2\,\frac{1 + n^{-1/2}\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}.$$
Hence, $\lim_{\delta\to 0}\limsup_{n\to\infty} t_{\delta,n}(2) = 0$ because $n^{1/2}(\hat\theta - \theta_n)$ is bounded in probability by Assumption 5.5. Finally, by marginal integration and Jensen,
$$|\Gamma_h(f-g)| \le \int |(f-g)(h - h^r)|\,dP_{A_0} = \int \frac{|(f-g)(h - h^r)|}{1 + n^{-1/2}h}\,dP_{A_n}$$
$$\le \frac{2\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}\int |f-g|\,dP_{A_n} \quad(\text{for } n\ge\|h\|^2_\infty) \le \frac{2\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}\,\|f-g\|_{2,P_{A_n}} \le \frac{2\|h\|_\infty}{1 - n^{-1/2}\|h\|_\infty}\,\delta.$$
Hence, $\lim_{\delta\to 0}\limsup_{n\to\infty} t_{\delta,n}(3) = 0$ as well. Therefore, Equation (A.33) holds. ∎

Proof of Lemma 5.3 In the proof of Lemma 3.3, replace references to Corollary 3.1 by Corollary 5.1, Lemma 3.2 by Lemma 5.2, and $\mathbb X_0$ by $\mathbb X_0 + \Gamma_h$. The argument leading to Equation (A.17) goes through because (a) the denseness result in Lemma A.4 continues to hold under $H_{1n}$ and Assumptions 5.1, 5.2(ii), and 5.4; (b) almost all sample paths of $(\mathbb X_0 + \Gamma_h)_{\mathcal F_1}$ are uniformly $\|\cdot\|_{2,P_{A_0}}$-continuous; and (c) $\sup_{f\in\mathcal F_1}(\mathbb X_0 + \Gamma_h)f$ is continuously distributed. ∎

Appendix 5. Proofs for Section 6

The following inequality, relating the strength of an instrument $W$ to the degree of endogeneity of a regressor $X$, may be of independent interest.

Lemma A.1 Let $Y := \theta_0 + \theta_1 X + U$ and $X := \pi_0 + \pi_1 W + V$ with $\operatorname{cov}(W,U) = \operatorname{cov}(W,V) = 0$, let $R_0^2 := \operatorname{corr}^2(X,W)$ denote the population $R^2$ for the second regression, and assume that $\sigma_X^2 < \infty$ and $\sigma_V^2 > 0$. Then, $R_0^2 \wedge |\operatorname{corr}(X,U)| \le (\sqrt 5 - 1)/2$.

Proof of Lemma A.1 Since the result is trivial if $\sigma_U^2 = 0$, assume without loss of generality that $\sigma_U^2 > 0$. We prove the desired result by contradiction. So, suppose to the contrary that there exists a distribution of $(X,W,U)$ for which $R_0^2 \wedge |\operatorname{corr}(X,U)| \ge c$, where $c\in((\sqrt 5 - 1)/2, 1)$. (The case $c = 1$ is ruled out by the assumption that $\sigma_X^2 < \infty$ and $\sigma_V^2 > 0$ (cf. Equation (A.34)).) Then, since $\operatorname{cov}(W,V) = 0$,
$$c \le R_0^2 = \frac{\operatorname{cov}^2(X,W)}{\sigma_X^2\sigma_W^2} = \frac{(\pi_1\sigma_W^2)^2}{\sigma_X^2\sigma_W^2} = \frac{\pi_1^2\sigma_W^2}{\sigma_X^2} = \frac{\pi_1^2\sigma_W^2}{\pi_1^2\sigma_W^2 + \sigma_V^2} = 1 - \frac{\sigma_V^2}{\pi_1^2\sigma_W^2 + \sigma_V^2}.$$
Consequently,
$$\sigma_X^2 = \pi_1^2\sigma_W^2 + \sigma_V^2 \ge \frac{\sigma_V^2}{1-c}. \tag{A.34}$$
Next, since $\operatorname{cov}(W,U) = 0 \Longrightarrow \operatorname{cov}(X,U) = \operatorname{cov}(V,U)$,
$$c \le |\operatorname{corr}(X,U)| \Longleftrightarrow c^2 \le \frac{\operatorname{cov}^2(X,U)}{\sigma_X^2\sigma_U^2} \Longleftrightarrow \sigma_X^2\sigma_U^2 c^2 \le \operatorname{cov}^2(V,U)$$
$$\Longrightarrow \sigma_V^2\sigma_U^2\,\frac{c^2}{1-c} \le \operatorname{cov}^2(V,U) \quad\text{(by Equation (A.34))}$$
$$\Longleftrightarrow \frac{c^2}{1-c} \le \operatorname{corr}^2(V,U) \Longrightarrow \frac{c^2}{1-c} \le 1 \Longleftrightarrow \Big(c - \frac{\sqrt 5 - 1}{2}\Big)\Big(c + \frac{\sqrt 5 + 1}{2}\Big) \le 0.$$
Therefore, the maximum possible value for $c$ is $(\sqrt 5 - 1)/2$, which contradicts the supposition that $c > (\sqrt 5 - 1)/2$. ∎

Appendix 6. Auxiliary results

Lemma A.2 Under the conditions of Theorem 5.1,
$$\sup_{f\in\mathcal F}|(\hat P_{\hat Z} - \hat P_{Z(\theta_1)})f| = o_{\Pr^\circ}(1) \quad\text{and}\quad \sup_{f\in\mathcal F}|(\hat P^r_{\hat Z} - \hat P^r_{Z(\theta_1)})f| = o_{\Pr^\circ}(1).$$

Proof of Lemma A.2 Since the proof of Lemma 3.1 only requires the existence of $\operatorname{plim}(\hat\theta)$, the argument leading to Equation (A.1) can be replicated word for word with $(\theta_0,\varepsilon)$ replaced by $(\theta_1,\varepsilon(\theta_1))$ and $A_0$ replaced by $A(\theta_1)$ to show that
$$\sup_{f\in\mathcal F}|(\hat P_{\hat Z} - \hat P_{Z(\theta_1)})f - \langle f, (\partial_1\log p_{A(\theta_1)})\dot\mu_1'\rangle_{P_{A(\theta_1)}}(\hat\theta - \theta_1)| = o_{\Pr^\circ}(n^{-1/2}) + o_{\Pr}(\|\hat\theta - \theta_1\|).$$
Therefore, $\sup_{f\in\mathcal F}|(\hat P_{\hat Z} - \hat P_{Z(\theta_1)})f| = o_{\Pr^\circ}(1)$. The second result follows from the first upon noting that $(\hat P^r_{\hat Z} - \hat P^r_{Z(\theta_1)})f = (\hat P_{\hat Z} - \hat P_{Z(\theta_1)})f^r$. ∎

Lemma A.3 Under Assumptions 3.1 and 3.2(ii), $\mathcal F_1$ and $\mathcal F_2$ satisfy Assumption 3.3(v) with $q(t)\propto t^{1/2}$.

Proof of Lemma A.3 If $f\in\mathcal F_1$, then $f = 1_{(-\infty,\tau]\times(-\infty,v]}$ for some $(\tau,v)\in\mathbb R\times\mathbb R^{\dim(W)}$ and
$$f(u - \Delta(x,\theta,\theta_0), w) = 1_{(-\infty,\tau]\times(-\infty,v]}(u - \Delta(x,\theta,\theta_0), w) = 1_{(-\infty,\tau+\Delta(x,\theta,\theta_0)]}(u)\,1_{(-\infty,v]}(w).$$
Since $|1_{(-\infty,a]} - 1_{(-\infty,b]}| = 1_{(a\wedge b,\,a\vee b]}$ and $a\vee b - a\wedge b = |a-b|$, it follows that
$$\int (f(u - \Delta(x,\theta,\theta_0), w) - f(u,w))^2\,P_{\varepsilon|X=x,W=w}(du) = 1_{(-\infty,v]}(w)\int_{((\tau+\Delta(x,\theta,\theta_0))\wedge\tau,\,(\tau+\Delta(x,\theta,\theta_0))\vee\tau]} P_{\varepsilon|X=x,W=w}(du)$$
$$\le \zeta(x,w)|\Delta(x,\theta,\theta_0)| \quad\text{(by Assumption 3.2(ii))}$$
$$\le \zeta(x,w)(\|\dot\mu_0(x)\|\|\theta - \theta_0\| + |\rho(x,\theta,\theta_0)|) \quad\text{(by Assumption 3.1)}$$
$$\le \zeta(x,w)\Big(\|\dot\mu_0(x)\|\|\theta - \theta_0\| + \sup_{\tilde\theta\in B(\theta_0,\|\theta-\theta_0\|)}|\rho(x,\tilde\theta,\theta_0)|\Big).$$
Hence, by Cauchy–Schwarz and Assumption 3.1, for all $\epsilon > 0$,
$$\int (f(u - \Delta(x,\theta,\theta_0), w) - f(u,w))^2\,P_{A_0}(du,dx,dw) \le \|\zeta\|_{2,P_{X,W}}\Big(\|(\dot\mu_0'\dot\mu_0)^{1/2}\|_{2,P_X}\|\theta - \theta_0\| + \Big\|\sup_{\tilde\theta\in B(\theta_0,\|\theta-\theta_0\|)}|\rho(\cdot,\tilde\theta,\theta_0)|\Big\|_{2,P_X}\Big)$$
$$\le \|\zeta\|_{2,P_{X,W}}(\|(\dot\mu_0'\dot\mu_0)^{1/2}\|_{2,P_X} + \epsilon)\|\theta - \theta_0\|.$$
Therefore, since the right-hand side does not depend upon $(\tau,v)$,
$$\sup_{f\in\mathcal F_1}\|f(\cdot - \Delta(X,\theta,\theta_0),\cdot) - f(\cdot,\cdot)\|_{2,P_{A_0}} \le \|\zeta\|^{1/2}_{2,P_{X,W}}(\|(\dot\mu_0'\dot\mu_0)^{1/2}\|_{2,P_X} + \epsilon)^{1/2}\|\theta - \theta_0\|^{1/2};$$
that is, $\mathcal F_1$ satisfies Assumption 3.3(v) with $q(t)\propto t^{1/2}$. Since $f\in\mathcal F_1 \Longrightarrow f^r = 1_{[-\tau,\infty)\times(-\infty,v]}$ and $P_{\varepsilon|X,W}(\{-\tau\}) = 0$, the above argument shows that Assumption 3.3(v) is also satisfied with $f^r$ and $q(t)\propto t^{1/2}$. The same reasoning works for $\mathcal F_2$ as well. ∎

The following denseness result may be of independent interest.

Lemma A.4 Let Assumptions 3.1, 3.2(ii), and 3.4 hold. Then, $\hat{\mathcal F}_1$ is dense in $\mathcal F_1$ in the following sense: for each $f\in\mathcal F_1$ and $\epsilon > 0$,
$$\lim_{n\to\infty}\Pr(\exists\hat f\in\hat{\mathcal F}_1 : \|\hat f - f\|_{2,P_{A_0}} \vee \|\hat f^r - f^r\|_{2,P_{A_0}} < \epsilon) = 1. \tag{A.35}$$

(A.35)

Proof of Lemma A.4 We make use of the following fact (van der Vaart 1998, Theorem 19.1): given > 0 and any cdf F, there exists a partition {−∞ =: t0 < t1 < · · · < tK := ∞} of R such that F(tl −) − F(tl−1 ) < for each l (points at

Journal of Nonparametric Statistics

311

which F jumps more than belong to the partition). Hence, letting Fε and FW (i) denote the marginal cdfs of ε and W (i) and working coordinate by coordinate, we can create a partition of R × Rdim(W ) denoted by Zπ := {τ0 < τ1 < · · · < dim(W ) (i) (i) τK } × ×i=1 {ϑ0 < ϑ1(i) < . . . < ϑK(i)i } such that Fε (τl ) − Fε (τl−1 ) < and FW (i) (ϑl(i) −) − FW (i) (ϑl−1 ) < for each l (remember that ε is assumed to be continuously distributed). For the remainder of the proof, let c := 1 + dim(W ) and d := dim(W ). Fix f ∈ F1 . Then, f = 1(−∞,u]×(−∞,v] for some (u, v) ∈ R × Rd and f r = 1[−u,∞)×(−∞,v] . We begin by showing that, given Zπ , there exist functions fπ and f˜π such that

Downloaded by [Université du Luxembourg], [Gautam Tripathi] at 05:12 10 June 2013

d(f , fπ ; f r , f˜π ) := f − fπ 2,PA0 ∨ f r − f˜π 2,PA0 < (c )1/2 .

(A.36)

Since Zπ partitions R × Rd , we can find zπ := (τl , ϑl(1) , . . . , ϑl(d) ) ∈ Zπ such that (u, v) ∈ [τl , τl+1 ) × ×di=1 [ϑl(i) , ϑl(i) ). i i +1 1 d , . . . , ϑl(d) ) ∈ Zπ such that (−u, v) ∈ [τ˜l , τ˜l+1 ) × ×di=1 [ϑl(i) , ϑl(i) ). Now, if (Ai )ki=1 Similarly, there exists z˜π := (τ˜l , ϑl(1) i i +1 1 d k and (Bi )i=1 are subsets of R then using the fact that (∩i Ai )(∩i Bi ) ⊂ ∪i (Ai Bi ), where Ai Bi := (Ai \ Bi ) ∪ (Bi \ Ai ) is  the symmetric difference of Ai and Bi , it is straightforward to show that 1(×k Ai )(×k Bi ) ≤ ki=1 1Ai Bi . Hence, letting i=1

i=1

, . . . , ϑl(d) ), fπ := 1(−∞,τl ]×(−∞,ϑ] , and f˜π := 1[τ˜l ,∞)×(−∞,ϑ] , we have that ϑ := (ϑl(1) 1 d

|f − fπ |2 = |1(−∞,u]×(−∞,v] − 1(−∞,τl ]×(−∞,ϑ] |2 = 1(−∞,u]×(−∞,v](−∞,τl ]×(−∞,ϑ] ≤ 1(−∞,u](−∞,τl ] +

d  i=1

= 1(u∧τl ,u∨τl ] +

d  i=1

≤ 1(τl ,τl+1 ) +

d 

1(−∞,v(i) ](−∞,ϑ (i) ] li

1(v(i) ∧ϑ (i) ,v(i) ∨ϑ (i) ] li

li

1(ϑ (i) ,ϑ (i) ) . li

i=1

(A.37)

li +1

Similarly, |f r − f˜π |2 = 1[−u,∞)×(−∞,v][τ˜l ,∞)×(−∞,ϑ] ≤ 1[−u∧τ˜l ,−u∨τ˜l ) +

d  i=1

≤ 1[τ˜l ,τ˜l+1 ) +

d  i=1

1(v(i) ∧ϑ (i) ,v(i) ∨ϑ (i) ] li

li

1(ϑ (i) ,ϑ (i) ) . li

li +1

Let supp(W ) := ×di=1 [a(i) , b(i) ] with the convention that [−∞, ∞] := (−∞, ∞), [−∞, ·] := (−∞, ·], and [·, ∞] := [·, ∞) so that coordinates of W are allowed to have unbounded support under this notation. By Equation (A.37), f − fπ 22,PA ≤ 0

τl+1 τl

dPε +

d 

1(ϑ (i) ∨a(i) ,ϑ (i)

li +1 ∧b

li

i=1

≤ Fε (τl+1 ) − Fε (τl ) +

d 

(i) )

dPW (i)

FW (i) (ϑl(i) ∧ b(i) −) − FW (i) (ϑl(i) ∨ a(i) ) < c . i +1 i

(A.38)

i=1

Similarly, using the fact that Fε (τ˜l+1 ) − Fε (τ˜l ) < , f r − f˜π 22,PA < c .

(A.39)

0

and ϑl(i) enter Equation (A.38) only via ϑl(i) ∨ a(i) and Therefore, Equation (A.36) is proved. Next, note that since ϑl(i) i i +1 i ϑl(i) ∧ b(i) , we can assume without loss of generality that each v(i) ∈ [ϑl(i) ∨ a(i) , ϑl(i) ∧ b(i) ) so that (u, v) ∈ Eπ × Wπ ⊂ i +1 i i +1 R × supp(W ) and (−u, v) ∈ E˜π × Wπ with Eπ := [τl , τl+1 ), E˜π := [τ˜ , τ˜ ), Wπ := ×d [ϑ (i) ∨ a(i) , ϑ (i) ∧ b(i) ), l

l+1

i=1

li

li +1

Pε,W (Eπ × Wπ ) > 0, and Pε,W (E˜π × Wπ ) > 0. Next, replacing (u, v) by Zj := (εj , Wj ) and (−u, v) by Zjr := (−εj , Wj )

312

T. Chen and G. Tripathi

in the argument leading to Equation (A.39), we obtain that for fj := 1(−∞,Zj ] = 1(−∞,εj ]×(−∞,Wj ] , Zj ∈ Eπ × Wπ =: Sπ Recall that Zˆ j := (ˆεj , Wj ) and suffices to show that

Zˆ jr

and

Zjr ∈ E˜π × Wπ =: S˜π =⇒

d(fj , fπ ; fjr , f˜π ) < (c )1/2 .

(A.40)

:= (−ˆεj , Wj ) and let fˆj := 1(−∞,Zˆ j ] = 1(−∞,ˆεj ]×(−∞,Wj ] . To prove Equation (A.35), it

lim Pr(∪nj=1 {d(fˆj , fj ; fˆjr , fjr ) ≤ (c )1/2 and (Zj , Zjr ) ∈ Sπ × S˜π }) = 1.

(A.41)

n→∞

Downloaded by [Université du Luxembourg], [Gautam Tripathi] at 05:12 10 June 2013

This is because Equations (A.40) and (A.41) and the triangle inequality imply that the event Kn := {d(fˆj , fπ ; fˆjr , f˜π ) < 2(c )1/2 for some j ≤ n} holds w.p.a.1. Moreover,

d(fˆj , f ; fˆjr , f r ) ≤ d(fˆj , fπ ; fˆjr , f˜π ) + (c )1/2 (by Equation A.36) < 3(c )1/2

for some j ≤ n

(if Kn is true).

Therefore, since Kn holds w.p.a.1, 1 = lim Pr(Kn ) ≤ lim Pr(d(fˆj , f ; fˆjr , f r ) < 3(c )1/2 for some j ≤ n) n→∞

n→∞

and Equation (A.35) follows because was arbitrary. So we now prove Equation (A.41). Since εˆ j = εj − (Xj , θˆ , θ0 ) and |fˆj − fj | = |1(−∞,Zˆ j ] − 1(−∞,Zj ] | ≤ |1(−∞,ˆεj ] − 1(−∞,εj ] | = 1(ˆεj ∧εj ,ˆεj ∨εj ] , we have that

|fˆj − fj |2 dPε|X=x,W =w ≤



εˆ j ∨εj εˆ j ∧εj

dPε|X=x,W =w

≤ ζ (x, w)|(Xj , θˆ , θ0 )|

(by Assumption 3.2(ii))

≤ ζ (x, w)( μ˙ 0 (Xj ) θˆ − θ0 + |ρ(Xj , θˆ , θ0 )|) ⎛ ≤ ζ (x, w) ⎝ μ˙ 0 (Xj ) θˆ − θ0 +

(by Assumption 3.1) ⎞ |ρ(Xj , θ˜ , θ0 )|⎠ .

sup ˆ 0 ) θ˜ ∈B(θ0 , θ−θ

Similarly, |fˆjr − fjr | = |1[−ˆεj ,∞)×(−∞,Wj ] − 1[−εj ,∞)×(−∞,Wj ] | ≤ |1[−ˆεj ,∞) − 1[−εj ,∞) | = 1[−ˆεj ∧−εj ,−ˆεj ∨−εj ) implies that





|fˆjr − fjr |2 dPε|X=x,W =w ≤ ζ (x, w) ⎝ μ˙ 0 (Xj ) θˆ − θ0 +

|ρ(Xj , θ˜ , θ0 )|⎠ .

sup ˆ 0 ) θ˜ ∈B(θ0 , θ−θ

Hence, ⎛



d (fˆj , fj ; fˆjr , fjr ) ≤ ζ 2,PX,W ⎝ μ˙ 0 (Xj ) θˆ − θ0 + 2

|ρ(Xj , θ˜ , θ0 )|⎠ .

sup

(A.42)

ˆ 0 ) θ˜ ∈B(θ0 , θ−θ

Consequently, for each r, η > 0, 2

{d

 r r ˆ ˆ ˆ (fj , fj ; fj , fj ) > r} ∩ { θ − θ0 < η} ⊂ ζ 2,PX,W μ˙ 0 (Xj ) η +

 sup θ˜ ∈B(θ0 ,η)

which, by Equation (A.42), implies that for each j, 2 ({d (fˆj , fj ; fˆjr , fjr ) > r} ∪ {(Zj , Zjr )  ∈ Sπ × S˜π }) ∩ { θˆ − θ0 < η} 

 

⊂ ζ 2,PX,W

μ˙ 0 (Xj ) η +

sup ˜ θ∈B(θ 0 ,η)



|ρ(Xj , θ˜ , θ0 )| > r ,

|ρ(Xj , θ˜ , θ0 )| > r ∪ {(Zj , Zjr )  ∈ Sπ × S˜π }.

Journal of Nonparametric Statistics

313

Hence, we have that 2 ∩nj=1 ({d (fˆj , fj ; fˆjr , fjr ) > r} ∪ {(Zj , Zjr )  ∈ Sπ × S˜π }) ∩ { θˆ − θ0 < η} 

⊂ ∩nj=1 ζ 2,PX,W

μ˙ 0 (Xj ) η +

sup θ˜ ∈B(θ0 ,η)





|ρ(Xj , θ˜ , θ0 )| > r ∪ {(Zj , Zjr )  ∈ Sπ × S˜π }.

Therefore, since the observations are i.i.d. and p := Pr((Zj , Zjr ) ∈ Sπ × S˜π ) > 0 for all j, Pr(∩nj=1 ({d (fˆj , fj ; fˆjr , fjr ) > r} ∪ {(Zj , Zjr )  ∈ Sπ × S˜π }) ∩ { θˆ − θ0 < η})



  n

Downloaded by [Université du Luxembourg], [Gautam Tripathi] at 05:12 10 June 2013

2

≤ Pr ζ 2,PX,W



≤ r −1 ζ 2,PX,W ≤ (r

μ˙ 0 (X1 ) η +

−1

sup ˜ θ∈B(θ 0 ,η)

E μ˙ 0 (X1 ) η + E

ζ 2,PX,W ( (μ˙ 0 μ˙ 0 )1/2 2,PX

|ρ(X1 , θ˜ , θ0 )| > r + 1 − p

sup ˜ θ∈B(θ 0 ,η)



n

|ρ(X1 , θ˜ , θ0 )| + 1 − p

+ 1)η + 1 − p)n ,

where the second inequality is by Chebyshev and the third by Jensen and Assumption 3.1. Let r := c and choose η small enough so that r −1 ζ 2,PX,W ( (μ˙ 0 μ˙ 0 )1/2 2,PX + 1)η + 1 − p < 1. Then, ∞ 

Pr(∩nj=1 ({d (fˆj , fj ; fˆjr , fjr ) > c } ∪ {(Zj , Zjr )  ∈ Sπ × S˜π }) ∩ { θˆ − θ0 < η}) < ∞. 2

n=1

Hence, by Borel–Cantelli, Pr(∩nj=1 ({d (fˆj , fj ; fˆjr , fjr ) > c } ∪ {(Zj , Zjr )  ∈ Sπ × S˜π }) ∩ { θˆ − θ0 < η} i.o.) = 0 2

⇐⇒ Pr(∪nj=1 ({d (fˆj , fj ; fˆjr , fjr ) ≤ c } ∩ {(Zj , Zjr ) ∈ Sπ × S˜π }) ∪ { θˆ − θ0 ≥ η} a.b.f.m n) = 1, 2

where ‘a.b.f.m n’ is short for ‘all but finitely many n’. Therefore, since limn→∞ Pr( θˆ − θ0 ≥ η) = 0 by Assumption 3.4, it follows that lim Pr(∪nj=1 {d(fˆj , fj ; fˆjr , fjr ) ≤ (c )1/2 and (Zj , Zjr ) ∈ Sπ × S˜π }) = 1.

n→∞


