2013.5.7.

Minimax lower bounds via the Neyman-Pearson lemma

Kengo Kato

Suppose that there is a scalar dependent variable $Y$ and a scalar covariate $X$, which we assume has support in $[0,1]$. Consider the nonparametric regression model
\[
Y = f(X) + \epsilon, \quad \epsilon \perp X, \quad \epsilon \sim N(0, \sigma^2), \quad \sigma^2 > 0.
\]
We fix the distribution of $X$ and $\sigma^2 > 0$. Let $\{\phi_j\}_{j=1}^{\infty}$ be an orthonormal system in $L^2([0,1])$. We assume that
\[
L := \sup_{j \ge 1} E[\phi_j^2(X)] < \infty.
\]
For given $\alpha > 0$ and $C_1 > 0$, suppose that $f$ belongs to the class
\[
\mathcal{F}(\alpha, C_1) = \{ f \in L^2([0,1]) : |\langle f, \phi_j \rangle| \le C_1 j^{-\alpha}, \ \forall j \ge 1 \},
\]
where $\langle \cdot, \cdot \rangle$ denotes the inner product in $L^2([0,1])$. Denote by $\| \cdot \|$ the $L^2([0,1])$-norm. Let $(Y_1, X_1), \dots, (Y_n, X_n)$ be i.i.d. observations of $(Y, X)$. The purpose of this note is to prove, in a self-contained manner, the following well-known theorems by means of a simple application of the Neyman-Pearson lemma.^1

Theorem 1. Under the above setup, we have
\[
\inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\| \hat{f} - f \|^2] \gtrsim n^{-(2\alpha-1)/(2\alpha)},
\]
where the infimum is taken over all estimators $\hat{f}$ of $f$.

Remark 1. The idea of the proof of Theorem 1 is borrowed from [1, 2], where minimax lower bounds are derived for the problems of estimating structural functions in nonparametric instrumental variables models and slope functions in functional linear models (see also the proof of [4, Theorem 7]). However, [1, 2] do not present detailed proofs of the minimax lower bounds (though the proofs are correct). Hence I hope that this note will be of some help in understanding their proofs.

Alternatively, we have the following theorem.

Theorem 2. There exists a small constant $c > 0$ such that
\[
\liminf_{n \to \infty} \inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \| \hat{f} - f \|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big) > 0.
\]

^1 For various techniques for deriving minimax lower bounds in nonparametric statistical problems, see [3].
Proof of Theorem 1. Let $M_n$ be the integer part of $n^{1/(2\alpha)}$. For $\theta^{M_n} = (\theta_{M_n+1}, \dots, \theta_{2M_n})^T \in \mathbb{R}^{M_n}$, define
\[
f_{\theta^{M_n}}(\cdot) = C_1 \sum_{j=M_n+1}^{2M_n} j^{-\alpha} \theta_j \phi_j(\cdot).
\]
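As a concrete instance of this construction (not needed anywhere in the proof), one may take the cosine basis $\phi_j(x) = \sqrt{2}\cos(\pi j x)$, which is orthonormal in $L^2([0,1])$. The short sketch below verifies numerically that, for this basis, $\| f_{\theta} - f_{\eta} \|^2 = C_1^2 \sum_j j^{-2\alpha} (\theta_j - \eta_j)^2$, the Parseval identity used implicitly in (1) and in the proof of Theorem 2; all parameter values are illustrative choices.

```python
# Illustrative check (not part of the proof): with the cosine basis
# phi_j(x) = sqrt(2) cos(pi j x), orthonormal in L^2([0,1]), verify Parseval:
# ||f_theta - f_eta||^2 = C1^2 sum_j j^(-2 alpha) (theta_j - eta_j)^2.
# alpha, C1, Mn and the binary vectors below are arbitrary illustrative choices.
import math
import random

alpha, C1, Mn = 1.2, 2.0, 8
random.seed(1)
theta = [random.choice((0, 1)) for _ in range(Mn)]
eta = [random.choice((0, 1)) for _ in range(Mn)]

def phi(j, x):
    # Cosine basis, orthonormal on [0, 1].
    return math.sqrt(2) * math.cos(math.pi * j * x)

def f(coef, x):
    # f_coef(x) = C1 * sum_{j=Mn+1}^{2Mn} j^(-alpha) coef_j phi_j(x)
    return C1 * sum(coef[k] * (j ** -alpha) * phi(j, x)
                    for k, j in enumerate(range(Mn + 1, 2 * Mn + 1)))

# Midpoint-rule approximation of the squared L^2 distance on [0, 1].
grid_n = 20_000
h = 1.0 / grid_n
num = sum((f(theta, (i + 0.5) * h) - f(eta, (i + 0.5) * h)) ** 2
          for i in range(grid_n)) * h
exact = C1 ** 2 * sum((j ** (-2 * alpha)) * (theta[k] - eta[k]) ** 2
                      for k, j in enumerate(range(Mn + 1, 2 * Mn + 1)))
assert abs(num - exact) < 1e-6
```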
Clearly, $f_{\theta^{M_n}} \in \mathcal{F}(\alpha, C_1)$ whenever $\theta^{M_n} \in [0,1]^{M_n}$.

Lemma 1. We have
\[
\inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\| \hat{f} - f \|^2] \ge \inf_{\hat{\theta}^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2],
\]
where the infimum on the right side is taken over all estimators $\hat{\theta}^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$.

Proof of Lemma 1. For arbitrary $\hat{f}$, we have
\[
\sup_{f \in \mathcal{F}(\alpha, C_1)} E_f[\| \hat{f} - f \|^2] \ge \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| \hat{f} - f_{\theta^{M_n}} \|^2].
\]
Moreover, by Bessel's inequality,
\[
\| \hat{f} - f \|^2 \ge \sum_{j=1}^{\infty} \big( \langle \hat{f}, \phi_j \rangle - \langle f, \phi_j \rangle \big)^2,
\]
so that when $f = f_{\theta^{M_n}}$ for some $\theta^{M_n} \in \{0,1\}^{M_n}$, it is enough to consider estimators of the form
\[
\hat{f}(\cdot) = \sum_{j=M_n+1}^{2M_n} \hat{\alpha}_j \phi_j(\cdot),
\]
where the $\hat{\alpha}_j$ are data-dependent. By defining $\hat{\theta}_j = C_1^{-1} j^{\alpha} \hat{\alpha}_j$, $\hat{f}$ is of the form
\[
\hat{f}(\cdot) = f_{\hat{\theta}^{M_n}}(\cdot) = C_1 \sum_{j=M_n+1}^{2M_n} j^{-\alpha} \hat{\theta}_j \phi_j(\cdot).
\]
We need to show that we can restrict $\hat{\theta}_j$ in such a way that $0 \le \hat{\theta}_j \le 1$. For given $\hat{\theta}_j$, define
\[
\tilde{\theta}_j =
\begin{cases}
1, & \text{if } \hat{\theta}_j > 1, \\
\hat{\theta}_j, & \text{if } 0 \le \hat{\theta}_j \le 1, \\
0, & \text{if } \hat{\theta}_j < 0.
\end{cases}
\]
Then whenever $\theta^{M_n} \in \{0,1\}^{M_n}$, $\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 \ge \| f_{\tilde{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2$. This completes the proof of the lemma. □
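The truncation step above rests on the elementary fact that clipping a real number to $[0,1]$ never increases its distance to any $b \in \{0,1\}$. A quick brute-force check of this fact over a grid of values (illustrative only, not a proof):

```python
# Sanity check of the clipping step in Lemma 1: for b in {0, 1} and any real t,
# |clip(t) - b| <= |t - b|, where clip truncates to [0, 1].
def clip(t):
    return min(1.0, max(0.0, t))

for b in (0.0, 1.0):
    for i in range(-400, 401):
        t = i / 100.0
        assert abs(clip(t) - b) <= abs(t - b)
```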
For notational convenience, write
\[
\theta^{M_n}_{-j} = (\theta_{M_n+1}, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_{2M_n})^T, \quad M_n + 1 \le j \le 2M_n.
\]
Observe that
\[
\begin{aligned}
\sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2]
&\ge \frac{1}{2^{M_n}} \sum_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \\
&= \frac{C_1^2}{2^{M_n}} \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} \sum_{\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}} \Big\{ E_{\theta^{M_n}_{-j}, \theta_j = 0}[(\hat{\theta}_j - \theta_j)^2] + E_{\theta^{M_n}_{-j}, \theta_j = 1}[(\hat{\theta}_j - \theta_j)^2] \Big\}. \quad (1)
\end{aligned}
\]
We want to lower bound
\[
E_{\theta^{M_n}_{-j}, \theta_j = 0}[(\hat{\theta}_j - \theta_j)^2] + E_{\theta^{M_n}_{-j}, \theta_j = 1}[(\hat{\theta}_j - \theta_j)^2].
\]
To this end, we make use of a variant of the Neyman-Pearson lemma combined with Le Cam's inequality [3, Lemma 2.3].

Lemma 2. Let $(S, \mathcal{S}, \mu)$ be a measure space, and let $p, q$ be probability density functions with respect to $\mu$. Then:

(i) (A variant of the Neyman-Pearson lemma)
\[
\inf \left\{ \int \varphi p \, d\mu + \int \psi q \, d\mu : \varphi \ge 0, \ \psi \ge 0, \ \varphi + \psi \ge 1 \right\} \ge \int (p \wedge q) \, d\mu.
\]

(ii) (Le Cam's inequality)
\[
\int (p \wedge q) \, d\mu \ge \frac{1}{2} \left( \int \sqrt{pq} \, d\mu \right)^2.
\]

Proof of Lemma 2. Part (i): Let $\varphi \ge 0$, $\psi \ge 0$, $\varphi + \psi \ge 1$. Then
\[
\int \varphi p \, d\mu + \int \psi q \, d\mu \ge \int (\varphi \wedge 1) p \, d\mu + \int (1 - \varphi \wedge 1) q \, d\mu.
\]
We lower bound the right side with respect to $\varphi$. Clearly, we may assume that $\varphi \le 1$. The desired conclusion follows from the inequality
\[
\int (p - q)\big( \varphi - 1(p < q) \big) \, d\mu \ge 0.
\]
Part (ii): Since $\int (p \vee q) \, d\mu + \int (p \wedge q) \, d\mu = 2$, we have by the Cauchy-Schwarz inequality
\[
\left( \int \sqrt{pq} \, d\mu \right)^2 = \left( \int \sqrt{(p \wedge q)(p \vee q)} \, d\mu \right)^2 \le \int (p \wedge q) \, d\mu \int (p \vee q) \, d\mu = \int (p \wedge q) \, d\mu \left\{ 2 - \int (p \wedge q) \, d\mu \right\} \le 2 \int (p \wedge q) \, d\mu,
\]
so that the desired inequality is obtained. □
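Lemma 2 can be sanity-checked numerically. The sketch below integrates two unit-variance normal densities on a grid and verifies both part (ii) and the Gaussian identity $\int \sqrt{pq} \, dy = \exp\{-(\mu_1 - \mu_0)^2/(8\sigma^2)\}$, which is what drives the bound in the next step; the means, grid, and tolerances are illustrative choices, not part of the argument.

```python
# Numerical sanity check of Le Cam's inequality, Lemma 2(ii):
#   integral of (p ^ q)  >=  (1/2) * (integral of sqrt(p q))^2
# for p = N(0, 1) and q = N(mu, 1), via a midpoint Riemann sum.
# Also checks that the Hellinger affinity equals exp(-mu^2 / 8) exactly.
import math

def normal_pdf(y, mu, sigma2):
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def riemann(f, lo=-20.0, hi=20.0, n=50_000):
    # Midpoint rule; the tails beyond [-20, 20] are negligible here.
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

p = lambda y: normal_pdf(y, 0.0, 1.0)
for mu in (0.1, 0.5, 1.0, 3.0):
    q = lambda y: normal_pdf(y, mu, 1.0)
    lhs = riemann(lambda y: min(p(y), q(y)))
    hellinger_affinity = riemann(lambda y: math.sqrt(p(y) * q(y)))
    rhs = 0.5 * hellinger_affinity ** 2
    # For equal-variance normals the affinity is exactly exp(-mu^2 / 8).
    assert abs(hellinger_affinity - math.exp(-mu ** 2 / 8)) < 1e-5
    assert lhs >= rhs
```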
For the moment, fix $M_n + 1 \le j \le 2M_n$ and $\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}$. Let $p_{\theta_j}(y \mid x)$ denote the conditional density function of $Y$ given $X = x$ when $f = f_{\theta^{M_n}_{-j}, \theta_j}$:
\[
p_{\theta_j}(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} \big( y - f_{\theta^{M_n}_{-j}, \theta_j}(x) \big)^2 \right\}.
\]
Then
\[
\begin{aligned}
& E_{\theta^{M_n}_{-j}, \theta_j = 0}[\hat{\theta}_j^2] + E_{\theta^{M_n}_{-j}, \theta_j = 1}[(\hat{\theta}_j - 1)^2] \\
&= E\Bigg[ \int \hat{\theta}_j^2((y_1, X_1), \dots, (y_n, X_n)) \prod_{i=1}^n p_{\theta_j = 0}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\qquad + \int \big\{ 1 - \hat{\theta}_j((y_1, X_1), \dots, (y_n, X_n)) \big\}^2 \prod_{i=1}^n p_{\theta_j = 1}(y_i \mid X_i) \, dy_1 \cdots dy_n \Bigg]. \quad (2)
\end{aligned}
\]
Note that $\hat{\theta}_j^2 + (1 - \hat{\theta}_j)^2 \ge 1/2$, i.e., $2\hat{\theta}_j^2 + 2(1 - \hat{\theta}_j)^2 \ge 1$, and
\[
p_{\theta_j = 0}(y \mid x) \, p_{\theta_j = 1}(y \mid x) = \frac{1}{2\pi\sigma^2} \exp\left\{ -\frac{1}{\sigma^2} \left( y - \frac{f_{\theta^{M_n}_{-j}, \theta_j = 0}(x) + f_{\theta^{M_n}_{-j}, \theta_j = 1}(x)}{2} \right)^2 \right\} \times \exp\left\{ -\frac{1}{4\sigma^2} \big( f_{\theta^{M_n}_{-j}, \theta_j = 1}(x) - f_{\theta^{M_n}_{-j}, \theta_j = 0}(x) \big)^2 \right\},
\]
so that, by Lemma 2,
\[
\begin{aligned}
& \int \hat{\theta}_j^2((y_1, X_1), \dots, (y_n, X_n)) \prod_{i=1}^n p_{\theta_j = 0}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\qquad + \int \big\{ 1 - \hat{\theta}_j((y_1, X_1), \dots, (y_n, X_n)) \big\}^2 \prod_{i=1}^n p_{\theta_j = 1}(y_i \mid X_i) \, dy_1 \cdots dy_n \\
&\ge \frac{1}{4} \exp\left\{ -\frac{C_1^2 j^{-2\alpha}}{8\sigma^2} \sum_{i=1}^n \phi_j^2(X_i) \right\}.
\end{aligned}
\]
By convexity of the map $x \mapsto e^{-x}$ (Jensen's inequality applied to the expectation over $X_1, \dots, X_n$), the right side of (2) satisfies
\[
(2) \ge \frac{1}{4} \exp\left\{ -\frac{C_1^2 j^{-2\alpha} n}{8\sigma^2} E[\phi_j^2(X)] \right\} \ge \frac{1}{4} \exp\left\{ -\frac{C_1^2 j^{-2\alpha} n L}{8\sigma^2} \right\}.
\]
For $j \ge M_n + 1$, $j^{-2\alpha} n \le (M_n + 1)^{-2\alpha} n \le 1$, so that whenever $j \ge M_n + 1$,
\[
E_{\theta^{M_n}_{-j}, \theta_j = 0}[\hat{\theta}_j^2] + E_{\theta^{M_n}_{-j}, \theta_j = 1}[(\hat{\theta}_j - 1)^2] \ge \frac{1}{4} \exp\left\{ -\frac{C_1^2 L}{8\sigma^2} \right\}.
\]
Since $M_n + 1 \le j \le 2M_n$ and $\theta^{M_n}_{-j} \in \{0,1\}^{M_n-1}$ are arbitrary, combining this inequality with (1), we have
\[
\inf_{\hat{\theta}^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \ge \frac{C_1^2}{8} \exp\left\{ -\frac{C_1^2 L}{8\sigma^2} \right\} \sum_{j=M_n+1}^{2M_n} j^{-2\alpha}. \quad (3)
\]
Since $\sum_{j=M_n+1}^{2M_n} j^{-2\alpha} \sim M_n^{-2\alpha+1} \sim n^{-(2\alpha-1)/(2\alpha)}$, we obtain the desired conclusion. □
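The final equivalence can be checked numerically: with $M_n = \lfloor n^{1/(2\alpha)} \rfloor$, the sum $\sum_{j=M_n+1}^{2M_n} j^{-2\alpha}$ stays within constant factors of $n^{-(2\alpha-1)/(2\alpha)}$ as $n$ grows. A small sketch ($\alpha$ and the values of $n$ are illustrative):

```python
# Illustration of the rate computation: with M_n = floor(n^(1/(2*alpha))),
# the sum  sum_{j=M_n+1}^{2*M_n} j^(-2*alpha)  has the same order as
# n^(-(2*alpha - 1)/(2*alpha)).  The choice alpha = 1.5 is illustrative.
alpha = 1.5
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    Mn = int(n ** (1 / (2 * alpha)))
    s = sum(j ** (-2 * alpha) for j in range(Mn + 1, 2 * Mn + 1))
    rate = n ** (-(2 * alpha - 1) / (2 * alpha))
    ratio = s / rate
    # The ratio stays bounded away from 0 and infinity as n grows.
    assert 0.05 < ratio < 20
```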
Proof of Theorem 2. It is not difficult to see that
\[
\inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \| \hat{f} - f \|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big) \ge \inf_{\hat{\theta}^{M_n}} \sup_{\theta^{M_n} \in \{0,1\}^{M_n}} P_{\theta^{M_n}}\big( \| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 > c n^{-(2\alpha-1)/(2\alpha)} \big),
\]
where $\inf_{\hat{\theta}^{M_n}}$ is taken over all estimators $\hat{\theta}^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$. Denote by $\delta_{1n}$ the right side of (3). Fix an arbitrary estimator $\hat{\theta}^{M_n} \in [0,1]^{M_n}$ of $\theta^{M_n}$. Since $\{0,1\}^{M_n}$ is a finite set and the supremum over $\{0,1\}^{M_n}$ is attained, there exists a sequence $\theta^{M_n}$ such that
\[
E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \ge \delta_{1n}.
\]
Moreover,
\[
\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 \le C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} (\hat{\theta}_j - \theta_j)^2 \le C_1^2 \sum_{j=M_n+1}^{2M_n} j^{-2\alpha} =: \delta_{2n},
\]
so that $E[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^4] \le \delta_{2n}^2$.
We recall the Paley-Zygmund inequality.

Lemma 3 (Paley-Zygmund inequality). Let $Z$ be a real-valued random variable with finite second moment. Then for every $\lambda \in (0, 1)$,
\[
P(Z \ge \lambda E[Z]) \ge (1 - \lambda)^2 \frac{(E[Z])^2}{E[Z^2]}.
\]

Apply the Paley-Zygmund inequality with $\lambda = 1/2$ and $Z = \| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2$. Then
\[
\begin{aligned}
P_{\theta^{M_n}}\big( \| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 \ge \delta_{1n}/2 \big)
&\ge P_{\theta^{M_n}}\Big( \| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2 \ge \frac{1}{2} E_{\theta^{M_n}}[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \Big) \\
&\ge \frac{1}{4} \frac{\big( E[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^2] \big)^2}{E[\| f_{\hat{\theta}^{M_n}} - f_{\theta^{M_n}} \|^4]} \ge \frac{\delta_{1n}^2}{4\delta_{2n}^2} \ge \frac{1}{256} \exp\left\{ -\frac{C_1^2 L}{4\sigma^2} \right\}.
\end{aligned}
\]
Therefore, we conclude that
\[
\liminf_{n \to \infty} \inf_{\hat{f}} \sup_{f \in \mathcal{F}(\alpha, C_1)} P_f\big( \| \hat{f} - f \|^2 > \delta_{1n}/2 \big) \ge \frac{1}{256} \exp\left\{ -\frac{C_1^2 L}{4\sigma^2} \right\}.
\]
Since $\delta_{1n} \sim n^{-(2\alpha-1)/(2\alpha)}$, we obtain the desired conclusion. □
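As a closing sanity check, the Paley-Zygmund inequality of Lemma 3 can be verified by Monte Carlo; below $Z$ is taken to be the square of a standard normal and $\lambda = 1/2$, both illustrative choices unrelated to the proof.

```python
# Monte Carlo sanity check of Lemma 3 (Paley-Zygmund) with lambda = 1/2
# and Z = N(0,1)^2, so that E[Z] = 1 and E[Z^2] = 3.
import random

random.seed(0)
n = 200_000
zs = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]
m1 = sum(zs) / n                  # estimates E[Z]
m2 = sum(z * z for z in zs) / n   # estimates E[Z^2]
lam = 0.5
lhs = sum(1 for z in zs if z >= lam * m1) / n  # estimates P(Z >= lam * E[Z])
rhs = (1 - lam) ** 2 * m1 ** 2 / m2
# Here lhs is roughly 0.48 while rhs is roughly 0.08, so the bound holds
# with a wide margin.
assert lhs >= rhs
```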
References

[1] Hall, P. and Horowitz, J.L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Statist. 33 2904-2929.
[2] Hall, P. and Horowitz, J.L. (2007). Methodology and convergence rates for functional linear regression. Ann. Statist. 35 70-91.
[3] Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation. Springer.
[4] Yuan, M. and Cai, T. (2010). A reproducing kernel Hilbert space approach to functional linear regression. Ann. Statist. 38 3412-3444.