2013.5.7.

Minimax lower bounds via the Neyman-Pearson lemma

Kengo Kato

Suppose that there is a scalar dependent variable $Y$ and a scalar covariate $X$, which we assume has support in $[0,1]$. Consider the nonparametric regression model
\[
Y = f(X) + \epsilon, \quad \epsilon \perp X, \quad \epsilon \sim N(0, \sigma^2), \quad \sigma^2 > 0.
\]
We fix the distribution of $X$ and $\sigma^2 > 0$. Let $\{\phi_j\}_{j=1}^{\infty}$ be an orthonormal system in $L^2([0,1])$. We assume that
\[
L := \sup_{j \ge 1} E[\phi_j^2(X)] < \infty.
\]
For given $\alpha > 0$ and $C_1 > 0$, suppose that $f$ belongs to the class
\[
\mathcal{F}(\alpha, C_1) = \{ f \in L^2([0,1]) : |\langle f, \phi_j \rangle| \le C_1 j^{-\alpha}, \ \forall j \ge 1 \},
\]
where $\langle \cdot, \cdot \rangle$ denotes the inner product in $L^2([0,1])$. Denote by $\| \cdot \|$ the $L^2([0,1])$-norm. Let $(Y_1, X_1), \dots, (Y_n, X_n)$ be i.i.d. observations of $(Y, X)$. The purpose of this note is to prove, in a self-contained manner, the following well-known theorems by means of a simple application of the Neyman-Pearson lemma.^1

Theorem 1. Under the above setup, we have
\[
\inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\| \hat{f} - f \|^2] \gtrsim n^{-(2\alpha-1)/(2\alpha)},
\]
where the infimum is taken over all estimators $\hat{f}$ of $f$.

Remark 1. The idea of the proof of Theorem 1 is borrowed from [1, 2], where minimax lower bounds are derived for the problems of estimating structural functions in nonparametric instrumental variables models and slope functions in functional linear models (see also the proof of [4, Theorem 7]). However, [1, 2] do not present detailed proofs of the minimax lower bounds (though the proofs are correct). Hence I hope that this note will be of some help in understanding their proofs.

Alternatively, we have the following theorem.

Theorem 2. There exists a small constant $c > 0$ such that
\[
\liminf_{n \to \infty} \inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \| \hat{f} - f \|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big) > 0.
\]

^1 For various techniques for deriving minimax lower bounds in nonparametric statistical problems, see [3].
Proof of Theorem 1. Let $M_n$ be the integer part of $n^{1/(2\alpha)}$. For $\theta^{M_n} = (\theta_{M_n+1}, \dots, \theta_{2M_n})^T \in \mathbb{R}^{M_n}$, define
\[
f_{\theta^{M_n}}(\cdot) = C_1 \sum_{j=M_n+1}^{2M_n} j^{-\alpha} \theta_j \phi_j(\cdot).
\]
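As a concrete instance of this construction (not needed anywhere in the proof), one may take the cosine basis $\phi_j(x) = \sqrt{2}\cos(\pi j x)$, which is orthonormal in $L^2([0,1])$. The short sketch below verifies numerically that, for this basis, $\| f_{\theta} - f_{\eta} \|^2 = C_1^2 \sum_j j^{-2\alpha} (\theta_j - \eta_j)^2$, the Parseval identity used implicitly in (1) and in the proof of Theorem 2; all parameter values are illustrative choices.

```python
# Illustrative check (not part of the proof): with the cosine basis
# phi_j(x) = sqrt(2) cos(pi j x), orthonormal in L^2([0,1]), verify Parseval:
# ||f_theta - f_eta||^2 = C1^2 sum_j j^(-2 alpha) (theta_j - eta_j)^2.
# alpha, C1, Mn and the binary vectors below are arbitrary illustrative choices.
import math
import random

alpha, C1, Mn = 1.2, 2.0, 8
random.seed(1)
theta = [random.choice((0, 1)) for _ in range(Mn)]
eta = [random.choice((0, 1)) for _ in range(Mn)]

def phi(j, x):
    # Cosine basis, orthonormal on [0, 1].
    return math.sqrt(2) * math.cos(math.pi * j * x)

def f(coef, x):
    # f_coef(x) = C1 * sum_{j=Mn+1}^{2Mn} j^(-alpha) coef_j phi_j(x)
    return C1 * sum(coef[k] * (j ** -alpha) * phi(j, x)
                    for k, j in enumerate(range(Mn + 1, 2 * Mn + 1)))

# Midpoint-rule approximation of the squared L^2 distance on [0, 1].
grid_n = 20_000
h = 1.0 / grid_n
num = sum((f(theta, (i + 0.5) * h) - f(eta, (i + 0.5) * h)) ** 2
          for i in range(grid_n)) * h
exact = C1 ** 2 * sum((j ** (-2 * alpha)) * (theta[k] - eta[k]) ** 2
                      for k, j in enumerate(range(Mn + 1, 2 * Mn + 1)))
assert abs(num - exact) < 1e-6
```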
Clearly, $f_{\theta^{M_n}} \in \mathcal{F}(\alpha, C_1)$ whenever $\theta^{M_n} \in [0,1]^{M_n}$.

Lemma 1. We have
\[
\inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\| \hat{f} - f \|^2] \ge \inf_{\hat{\theta}^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2],
\]
where the infimum on the right side is taken over all estimators $\hat{\theta}^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$.

Proof of Lemma 1. For arbitrary $\hat{f}$, we have
\[
\sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\| \hat{f} - f \|^2] \ge \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| \hat{f} - f_{\theta^{M_n}} \|^2].
\]
Moreover, by Bessel's inequality,
\[
\| \hat{f} - f \|^2 \ge \sum_{j=1}^{\infty} \big( \langle \hat{f}, \phi_j \rangle - \langle f, \phi_j \rangle \big)^2,
\]
so that when $f = f_{\theta^{M_n}}$ for some $\theta^{M_n} \in \{0,1\}^{M_n}$, it is enough to consider estimators of the form
\[
\hat{f}(\cdot) = \sum_{j=M_n+1}^{2M_n} \hat{\alpha}_j \phi_j(\cdot),
\]
where the $\hat{\alpha}_j$ are data-dependent. By defining $\hat{\theta}_j = C_1^{-1} j^{\alpha} \hat{\alpha}_j$, $\hat{f}$ is of the form
\[
\hat{f}(\cdot) = f_{\hat{\theta}^{M_n}}(\cdot) = C_1 \sum_{j=M_n+1}^{2M_n} j^{-\alpha} \hat{\theta}_j \phi_j(\cdot).
\]
We need to show that we can restrict $\hat{\theta}_j$ in such a way that $0 \le \hat{\theta}_j \le 1$. For given $\hat{\theta}_j$, define
\[
\tilde{\theta}_j =
\begin{cases}
1, & \text{if } \hat{\theta}_j > 1, \\
\hat{\theta}_j, & \text{if } 0 \le \hat{\theta}_j \le 1, \\
0, & \text{if } \hat{\theta}_j < 0.
\end{cases}
\]
Then whenever $\theta^{M_n} \in \{0,1\}^{M_n}$, $\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 \ge \| f_{\tilde{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2$. This completes the proof of the lemma. □
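The truncation step above rests on the elementary fact that clipping a real number to $[0,1]$ never increases its distance to any $b \in \{0,1\}$. A quick brute-force check of this fact over a grid of values (illustrative only, not a proof):

```python
# Sanity check of the clipping step in Lemma 1: for b in {0, 1} and any real t,
# |clip(t) - b| <= |t - b|, where clip truncates to [0, 1].
def clip(t):
    return min(1.0, max(0.0, t))

for b in (0.0, 1.0):
    for i in range(-400, 401):
        t = i / 100.0
        assert abs(clip(t) - b) <= abs(t - b)
```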
For notational convenience, write
\[
\theta^{M_n}_{-j} = (\theta_{M_n+1}, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_{2M_n})^T, \quad M_n + 1 \le j \le 2M_n.
\]
Observe that
\[
\begin{aligned}
\sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2]
&\ge \frac{1}{2^{M_n}} \sum_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \\
&= \frac{C_1^2}{2^{M_n}} \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} \sum_{\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}} \Big\{ E_{\theta^{M_n}_{-j}, \theta_j = 0}[(\hat{\theta}_j - \theta_j)^2] + E_{\theta^{M_n}_{-j}, \theta_j = 1}[(\hat{\theta}_j - \theta_j)^2] \Big\}. \quad (1)
\end{aligned}
\]
We want to lower bound
\[
E_{\theta^{M_n}_{-j}, \theta_j = 0}[(\hat{\theta}_j - \theta_j)^2] + E_{\theta^{M_n}_{-j}, \theta_j = 1}[(\hat{\theta}_j - \theta_j)^2].
\]
To this end, we make use of a variant of the Neyman-Pearson lemma combined with Le Cam's inequality [3, Lemma 2.3].

Lemma 2. Let $(S, \mathcal{S}, \mu)$ be a measure space, and let $p, q$ be probability density functions with respect to $\mu$. Then:

(i) (A variant of the Neyman-Pearson lemma)
\[
\inf \left\{ \int \varphi p \, d\mu + \int \psi q \, d\mu : \varphi \ge 0, \ \psi \ge 0, \ \varphi + \psi \ge 1 \right\} \ge \int (p \wedge q) \, d\mu.
\]

(ii) (Le Cam's inequality)
\[
\int (p \wedge q) \, d\mu \ge \frac{1}{2} \left( \int \sqrt{pq} \, d\mu \right)^2.
\]

Proof of Lemma 2. Part (i): Let $\varphi \ge 0$, $\psi \ge 0$, $\varphi + \psi \ge 1$. Then
\[
\int \varphi p \, d\mu + \int \psi q \, d\mu \ge \int (\varphi \wedge 1) p \, d\mu + \int (1 - \varphi \wedge 1) q \, d\mu.
\]
We lower bound the right side with respect to $\varphi$. Clearly, we may assume that $\varphi \le 1$. The desired conclusion follows from the inequality
\[
\int (p - q)\big( \varphi - 1(p < q) \big) \, d\mu \ge 0.
\]
Part (ii): Since $\int (p \vee q) \, d\mu + \int (p \wedge q) \, d\mu = 2$, we have by the Cauchy-Schwarz inequality
\[
\left( \int \sqrt{pq} \, d\mu \right)^2 = \left( \int \sqrt{(p \wedge q)(p \vee q)} \, d\mu \right)^2 \le \int (p \wedge q) \, d\mu \int (p \vee q) \, d\mu = \int (p \wedge q) \, d\mu \left\{ 2 - \int (p \wedge q) \, d\mu \right\} \le 2 \int (p \wedge q) \, d\mu,
\]
so that the desired inequality is obtained. □
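Lemma 2 can be sanity-checked numerically. The sketch below integrates two unit-variance normal densities on a grid and verifies both part (ii) and the Gaussian identity $\int \sqrt{pq} \, dy = \exp\{-(\mu_1 - \mu_0)^2/(8\sigma^2)\}$, which is what drives the bound in the next step; the means, grid, and tolerances are illustrative choices, not part of the argument.

```python
# Numerical sanity check of Le Cam's inequality, Lemma 2(ii):
#   integral of (p ^ q)  >=  (1/2) * (integral of sqrt(p q))^2
# for p = N(0, 1) and q = N(mu, 1), via a midpoint Riemann sum.
# Also checks that the Hellinger affinity equals exp(-mu^2 / 8) exactly.
import math

def normal_pdf(y, mu, sigma2):
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def riemann(f, lo=-20.0, hi=20.0, n=50_000):
    # Midpoint rule; the tails beyond [-20, 20] are negligible here.
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

p = lambda y: normal_pdf(y, 0.0, 1.0)
for mu in (0.1, 0.5, 1.0, 3.0):
    q = lambda y: normal_pdf(y, mu, 1.0)
    lhs = riemann(lambda y: min(p(y), q(y)))
    hellinger_affinity = riemann(lambda y: math.sqrt(p(y) * q(y)))
    rhs = 0.5 * hellinger_affinity ** 2
    # For equal-variance normals the affinity is exactly exp(-mu^2 / 8).
    assert abs(hellinger_affinity - math.exp(-mu ** 2 / 8)) < 1e-5
    assert lhs >= rhs
```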
For the moment, fix $M_n + 1 \le j \le 2M_n$ and $\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}$. Let $p_{\theta_j}(y \mid x)$ denote the conditional density function of $Y$ given $X = x$ when $f = f_{\theta^{M_n}_{-j}, \theta_j}$:
\[
p_{\theta_j}(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} \big( y - f_{\theta^{M_n}_{-j}, \theta_j}(x) \big)^2 \right\}.
\]
Then
\[
\begin{aligned}
& E_{\theta^{M_n}_{-j}, \theta_j = 0}[\hat{\theta}_j^2] + E_{\theta^{M_n}_{-j}, \theta_j = 1}[(\hat{\theta}_j - 1)^2] \\
&= E\Bigg[ \int \hat{\theta}_j^2((y_1, X_1), \dots, (y_n, X_n)) \prod_{i=1}^n p_{\theta_j = 0}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\qquad + \int \big\{ 1 - \hat{\theta}_j((y_1, X_1), \dots, (y_n, X_n)) \big\}^2 \prod_{i=1}^n p_{\theta_j = 1}(y_i \mid X_i) \, dy_1 \cdots dy_n \Bigg]. \quad (2)
\end{aligned}
\]
Note that $\hat{\theta}_j^2 + (1 - \hat{\theta}_j)^2 \ge 1/2$, i.e., $2\hat{\theta}_j^2 + 2(1 - \hat{\theta}_j)^2 \ge 1$, and
\[
p_{\theta_j = 0}(y \mid x) \, p_{\theta_j = 1}(y \mid x) = \frac{1}{2\pi\sigma^2} \exp\left\{ -\frac{1}{\sigma^2} \left( y - \frac{f_{\theta^{M_n}_{-j}, \theta_j = 0}(x) + f_{\theta^{M_n}_{-j}, \theta_j = 1}(x)}{2} \right)^2 \right\} \times \exp\left\{ -\frac{1}{4\sigma^2} \big( f_{\theta^{M_n}_{-j}, \theta_j = 1}(x) - f_{\theta^{M_n}_{-j}, \theta_j = 0}(x) \big)^2 \right\},
\]
so that, by Lemma 2,
\[
\begin{aligned}
& \int \hat{\theta}_j^2((y_1, X_1), \dots, (y_n, X_n)) \prod_{i=1}^n p_{\theta_j = 0}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\qquad + \int \big\{ 1 - \hat{\theta}_j((y_1, X_1), \dots, (y_n, X_n)) \big\}^2 \prod_{i=1}^n p_{\theta_j = 1}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\ge \frac{1}{4} \exp\left\{ -\frac{C_1^2 j^{-2\alpha}}{8\sigma^2} \sum_{i=1}^n \phi_j^2(X_i) \right\}.
\end{aligned}
\]
By convexity of the map $x \mapsto e^{-x}$ (Jensen's inequality applied to the expectation over $X_1, \dots, X_n$), the right side of (2) satisfies
\[
(2) \ge \frac{1}{4} \exp\left\{ -\frac{C_1^2 j^{-2\alpha} n}{8\sigma^2} E[\phi_j^2(X)] \right\} \ge \frac{1}{4} \exp\left\{ -\frac{C_1^2 j^{-2\alpha} n L}{8\sigma^2} \right\}.
\]
For $j \ge M_n + 1$, $j^{-2\alpha} n \le (M_n + 1)^{-2\alpha} n \le 1$, so that whenever $j \ge M_n + 1$,
\[
E_{\theta^{M_n}_{-j}, \theta_j = 0}[\hat{\theta}_j^2] + E_{\theta^{M_n}_{-j}, \theta_j = 1}[(\hat{\theta}_j - 1)^2] \ge \frac{1}{4} \exp\left\{ -\frac{C_1^2 L}{8\sigma^2} \right\}.
\]
Since $M_n + 1 \le j \le 2M_n$ and $\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}$ are arbitrary, combining this inequality with (1), we have
\[
\inf_{\hat{\theta}^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \ge \frac{C_1^2}{8} \exp\left\{ -\frac{C_1^2 L}{8\sigma^2} \right\} \sum_{j=M_n+1}^{2M_n} j^{-2\alpha}. \quad (3)
\]
Since $\sum_{j=M_n+1}^{2M_n} j^{-2\alpha} \sim M_n^{-2\alpha+1} \sim n^{-(2\alpha-1)/(2\alpha)}$, we obtain the desired conclusion. □
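The final equivalence can be checked numerically: with $M_n = \lfloor n^{1/(2\alpha)} \rfloor$, the sum $\sum_{j=M_n+1}^{2M_n} j^{-2\alpha}$ stays within constant factors of $n^{-(2\alpha-1)/(2\alpha)}$ as $n$ grows. A small sketch ($\alpha$ and the values of $n$ are illustrative):

```python
# Illustration of the rate computation: with M_n = floor(n^(1/(2*alpha))),
# the sum  sum_{j=M_n+1}^{2*M_n} j^(-2*alpha)  has the same order as
# n^(-(2*alpha - 1)/(2*alpha)).  The choice alpha = 1.5 is illustrative.
alpha = 1.5
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    Mn = int(n ** (1 / (2 * alpha)))
    s = sum(j ** (-2 * alpha) for j in range(Mn + 1, 2 * Mn + 1))
    rate = n ** (-(2 * alpha - 1) / (2 * alpha))
    ratio = s / rate
    # The ratio stays bounded away from 0 and infinity as n grows.
    assert 0.05 < ratio < 20
```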
Proof of Theorem 2. It is not difficult to see that
\[
\inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \| \hat{f} - f \|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big) \ge \inf_{\hat{\theta}^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} P_{\theta^{M_n}}\big( \| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big),
\]
where $\inf_{\hat{\theta}^{M_n}}$ is taken over all estimators $\hat{\theta}^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$. Denote by $\delta_{1n}$ the right side of (3). Fix an arbitrary estimator $\hat{\theta}^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$. Since $\{0,1\}^{M_n}$ is a finite set and the supremum over $\{0,1\}^{M_n}$ is attained, there exists a sequence $\theta^{M_n}$ such that
\[
E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \ge \delta_{1n}.
\]
Moreover,
\[
\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 \le C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} (\hat{\theta}_j - \theta_j)^2 \le C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} =: \delta_{2n},
\]
so that $E[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^4] \le \delta_{2n}^2$.
We recall the Paley-Zygmund inequality.

Lemma 3 (Paley-Zygmund inequality). Let $Z$ be a real-valued random variable with finite second moment. Then for every $\lambda \in (0, 1)$,
\[
P(Z \ge \lambda E[Z]) \ge (1 - \lambda)^2 \frac{(E[Z])^2}{E[Z^2]}.
\]

Apply the Paley-Zygmund inequality with $\lambda = 1/2$ and $Z = \| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2$. Then
\[
\begin{aligned}
P_{\theta^{M_n}}\big( \| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 \ge \delta_{1n}/2 \big)
&\ge P_{\theta^{M_n}}\Big( \| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 \ge \frac{1}{2} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \Big) \\
&\ge \frac{1}{4} \frac{\big( E[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \big)^2}{E[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^4]} \ge \frac{\delta_{1n}^2}{4\delta_{2n}^2} \ge \frac{1}{256} \exp\left\{ -\frac{C_1^2 L}{4\sigma^2} \right\}.
\end{aligned}
\]
Therefore, we conclude that
\[
\liminf_{n \to \infty} \inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \| \hat{f} - f \|^2 > \delta_{1n}/2 \big) \ge \frac{1}{256} \exp\left\{ -\frac{C_1^2 L}{4\sigma^2} \right\}.
\]
Since $\delta_{1n} \sim n^{-(2\alpha-1)/(2\alpha)}$, we obtain the desired conclusion. □
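As a closing sanity check, the Paley-Zygmund inequality of Lemma 3 can be verified by Monte Carlo; below $Z$ is taken to be the square of a standard normal and $\lambda = 1/2$, both illustrative choices unrelated to the proof.

```python
# Monte Carlo sanity check of Lemma 3 (Paley-Zygmund) with lambda = 1/2
# and Z = N(0,1)^2, so that E[Z] = 1 and E[Z^2] = 3.
import random

random.seed(0)
n = 200_000
zs = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]
m1 = sum(zs) / n                  # estimates E[Z]
m2 = sum(z * z for z in zs) / n   # estimates E[Z^2]
lam = 0.5
lhs = sum(1 for z in zs if z >= lam * m1) / n  # estimates P(Z >= lam * E[Z])
rhs = (1 - lam) ** 2 * m1 ** 2 / m2
# Here lhs is roughly 0.48 while rhs is roughly 0.08, so the bound holds
# with a wide margin.
assert lhs >= rhs
```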
References

[1] Hall, P. and Horowitz, J.L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Statist. 33 2904-2929.
[2] Hall, P. and Horowitz, J.L. (2007). Methodology and convergence rates for functional linear regression. Ann. Statist. 35 70-91.
[3] Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation. Springer.
[4] Yuan, M. and Cai, T. (2010). A reproducing kernel Hilbert space approach to functional linear regression. Ann. Statist. 38 3412-3444.