Universal power of Kolmogorov-Smirnov tests of under-identifying restrictions

Alfred Galichon and Marc Henry
École polytechnique; Université de Montréal, CIREQ, CIRANO

First draft: May 15, 2007. This draft: February 12, 2008
Abstract We provide a test for the specification of a structural econometric model without identifying assumptions. From a dual formulation of a null hypothesis of compatibility of the data generating process with the structure, we derive a Kolmogorov-Smirnov statistic for Choquet capacity functionals, which we use to construct our test. We show that our test is uniform in level over all the data-generating processes compatible with the structure, and has power against local alternatives defined in terms of total variation.
AMS (2000) Subject Classification: Primary 62G10, Secondary 49J35, 49N15 Keywords: Random correspondence, minimax theorem, specification test, total variation
Introduction

A large amount of attention in the econometric literature has recently focused on the issue of conducting inference with economic models that only partially identify the distribution of observed variables because of censoring, non-random sample selection, unobserved heterogeneity and multiple equilibria. The general structure of such models is a binary relation (i.e. a set of restrictions) between observable variables and latent ones, and the
The authors thank Victor Chernozhukov and Pierre-André Chiappori for helpful discussions. Financial support from NSF Grant SBR 9729559 is also gratefully acknowledged by both authors.
inference problem is that of testing the validity of these restrictions based on a sample of the observable variables. The null hypothesis in such tests is characterized by a collection of probability distributions for the observed variables, namely the probability distributions that are set-wise dominated by the Choquet capacity functional implied by the restrictions. We show here how to apply a generalized Kolmogorov-Smirnov testing procedure to this problem, and we describe the main issues that arise in doing so. The Kolmogorov-Smirnov procedure is generalized in the sense that it is a Kolmogorov-Smirnov test for capacity functionals, hence the null hypothesis is composite, which complicates the asymptotic treatment of the test statistic and introduces an additional uniformity requirement. Additionally, a minimax result is required, which is the main innovation of this paper, to interpret our test in terms of distance of the empirical distribution of the sample to the set of distributions compatible with the null (i.e. dominated by the capacity functional). This allows us to show that our test has power against local alternatives defined in total variation, unlike the classical Kolmogorov-Smirnov specification test, which only has power against local alternatives in Kolmogorov-Smirnov distance.
Related research

Chernozhukov, Hong, and Tamer (2007) is both a pioneering contribution and an up-to-date reference on the topic of inference with partially identified economic models. A one-sided version of the Kolmogorov-Smirnov test was proposed by McFadden (1989) to test stochastic dominance hypotheses, and Kolmogorov-Smirnov tests of composite null hypotheses were also considered by Khmaladze (1981).
Organization of the paper

The paper is divided into two main sections. The first describes the test; the second describes the minimax theorem that is instrumental in showing universal power of the test. The first section is divided into three subsections following the presentation of the framework and null hypothesis. The first subsection concentrates on the procedure used to obtain critical values, the second treats the issue of uniformity, and the third is a discussion of power.
1 Tests of non-identifying restrictions
We consider a given set of structural restrictions on the variables that describe an economic system. A variable X is observable, and a sample (X1, ..., Xn) of independent and identically distributed copies of X is available. A variable ε is unobservable, and can be interpreted as a shock, an input or a latent state, depending on the context. X is a random variable on the probability space (Ω, Σ, P) (to avoid confusion we shall use the notation Pr for statements with respect to the underlying probability). Its probability distribution P has compact domain Y ⊂ R. We shall use the short-hand X ∼ P to denote "X is distributed according to P" and X ∼ X̃ to denote "X has the same distribution as X̃," respectively. ε is a random variable on the same probability space (Ω, Σ, P). Its probability distribution ν has compact domain U ⊂ R. The structural restrictions come in the form of a binary relation on U × Y, or equivalently in the form of a correspondence G : U ⇒ Y. The binary relation contains the pairs (e, x) ∈ U × Y that are acceptable according to the given structure, and is equal to the graph of the correspondence G, i.e. Graph(G) = {(e, x) ∈ U × Y : x ∈ G(e)}. The correspondence G is assumed to be a closed-valued random correspondence, in that it satisfies the following basic measurability assumption:

Assumption 1. G is non-empty closed-valued and measurable, i.e. G⁻¹(A) = {e ∈ R : G(e) ∩ A ≠ ∅} is Borel measurable for any open set A.

The objective of this work is to devise a test of validity of the restrictions, which we define in the following way:

H0 : There exists a probability distribution π on Graph(G) with marginals P and ν.

The test we propose is based on an equivalent form of the null hypothesis derived with an appeal to the duality of mass transportation in Galichon and Henry (2006), as stated in the following proposition.

Proposition 1. The following statement holds:

min{Pr(X ∉ G(ε)) : X ∼ P, ε ∼ ν} = sup{P(B) − ν[G⁻¹(B)] : B Borel}.
Hence the null hypothesis of validity of the structural restrictions H0 is equivalent to

sup{P(B) − ν[G⁻¹(B)] : B Borel} = 0.    (1)
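The duality in Proposition 1 can be checked numerically on a toy finite example. Everything below (the two-point spaces, the particular ν, P and G) is our own illustrative choice, not part of the paper's framework:

```python
# Toy finite check of the duality in Proposition 1. The spaces, nu, P and G
# are invented for illustration only.
nu = {0: 0.5, 1: 0.5}            # distribution of the latent variable
P = {0: 0.3, 1: 0.7}             # distribution of the observable
G = {0: {0}, 1: {0, 1}}          # correspondence U => Y

def capacity(B):
    """nu(G^{-1}(B)) = nu{e : G(e) meets B}."""
    return sum(nu[e] for e in nu if G[e] & B)

# Right-hand side: sup over all subsets B of Y = {0, 1}.
rhs = max(sum(P[y] for y in B) - capacity(B)
          for B in [set(), {0}, {1}, {0, 1}])

# Left-hand side: couplings with marginals nu and P are parametrised here by
# t = pi(0,0) in [0, 0.3]; minimise the mass lying off the graph of G.
def miss_mass(t):
    pi = {(0, 0): t, (0, 1): 0.5 - t, (1, 0): 0.3 - t, (1, 1): 0.2 + t}
    return sum(p for (e, x), p in pi.items() if x not in G[e])

lhs = min(miss_mass(t / 1000.0) for t in range(301))
print(lhs, rhs)   # both are approximately 0.2
```

Both sides equal 0.2 in this example, as the proposition predicts: the binding set is B = {1}, with P({1}) − ν(G⁻¹({1})) = 0.7 − 0.5.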
In the case where G is a function, H0 is equivalent to the equality between the probability distributions P and νG⁻¹, the push-forward of ν by G, defined for all A measurable by νG⁻¹(A) = ν{e ∈ U : G(e) ∈ A}. The Kolmogorov-Smirnov test for that hypothesis is based on the statistic

Tn = √n sup{Pn(B) − νG⁻¹(B) : B ∈ C},

with C = {(−∞, x], (x, +∞) : x ∈ R}, and denoting by Pn the empirical distribution relative to the sample (X1, ..., Xn). We consider this same test statistic to test H0 in the general case where G is a correspondence, and three main issues arise that distinguish it from the classical Kolmogorov-Smirnov testing procedure. First, the test statistic has a different asymptotic distribution when G is multi-valued. Second, the null hypothesis is composite, and we need to address the issue of uniformity of the test. Third, the multi-valued nature of G may allow the test to have power against local alternatives that are defined in total variation distance. These three differences are detailed in the following three subsections.
1.1 Critical values
Under the null hypothesis H0, we have P(A) − ν(G⁻¹(A)) ≤ 0 for all A Borel. Calling Gn the empirical process Gn = √n(Pn − P), we have

Tn = √n sup_{A∈C} (Pn(A) − ν(G⁻¹(A))) = sup_{A∈C} (Gn(A) + √n (P(A) − ν(G⁻¹(A)))).

Unlike the case of the classical Kolmogorov-Smirnov test, the second term in the previous display does not vanish under the null, since the "regions of indeterminacy" allow δ(A) := P(A) − ν(G⁻¹(A)) to be strictly negative for some sets A ∈ C. Note that δ is independent of n, so that the scaling factor √n will pull the second term in the previous display to −∞ for all the sets where the inequality is strict. This prompts the following definition, illustrated in figure 1:

Definition 1. We denote the subclass of sets from C where P = νG⁻¹ by Cb, i.e.

Cb := {A ∈ C : P(A) = ν(G⁻¹(A))}.

The empirical process converges weakly to the P-Brownian bridge G, and the convergence is uniform over the class C (i.e. the convergence is in l∞(F), where F is the class of indicator
Figure 1: Examples of sets in Cb (symbolized by the arrows) in a correctly specified case (P and ν are uniform, hence correct specification corresponds to the graph of G containing the diagonal).
functions of sets in C), so that by the continuous mapping theorem, the supremum of the empirical process converges weakly to the supremum of the Brownian bridge. Galichon and Henry (2006) show that under mild regularity conditions on δ, the term √n δ dominates the oscillations of the empirical process, and the sets in C\Cb drop out from the supremum in the asymptotic expression, so that

Tn ⇝ sup_{A∈Cb} G(A),    (2)
where à denotes weak convergence. Naturally, since Cb depends on the unknown P , we need to find a data dependent class of sets to approximate it. Definition 2. We denote by Ch the data dependent subclass of sets from C where Pn ≥ ν(G−1 ) − h, i.e. © ª Ch := A ∈ C : Pn (A) ≥ ν(G−1 (A)) − h .
Galichon and Henry (2006) show that if the data-dependent class is constructed as in definition 2 with a bandwidth sequence h = hn > 0 satisfying

hn⁻¹ √(ln ln n / n) + hn → 0, as n → ∞,    (3)

then

sup_{A∈C_{hn}} Gn(A) ⇝ sup_{A∈Cb} G(A).    (4)
Based on this result, we propose the following testing procedure:

Test Statistic: Tn = √n sup{Pn(B) − νG⁻¹(B) : B ∈ C},

Decision Rule: Reject the null hypothesis H0 if Tn > c̃n(α),

where C = {(−∞, x], (x, +∞) : x ∈ R}, c̃n(α) is the quantile of level α of T̃n = sup_{A∈C_{hn}} Gn(A), and C_{hn} is defined according to definition 2 with hn satisfying (3).

Remark 1. Galichon and Henry (2006) show that the version T̃n* of T̃n, where Gn is replaced by the bootstrapped empirical process, converges to the same limit; hence the critical values c̃n(α) can be replaced by the quantiles c̃n*(α) of T̃n*.
1.2 Uniform level
The second issue that differentiates our test from a classical Kolmogorov-Smirnov test is that we are testing a composite hypothesis when G is multi-valued. To see why, return to formulation (1) of the null hypothesis and rephrase it as H0 : P ∈ Core(ν, G), with the following definition:

Definition 3. We define Core(ν, G) as the set of probability measures that are set-wise dominated by νG⁻¹, i.e.

Core(ν, G) = {P probability on Y : P(A) ≤ ν(G⁻¹(A)), all A Borel}.

Remark 2. Under assumption 1, the set function νG⁻¹ defined for all A measurable by νG⁻¹(A) = ν(G⁻¹(A)) is a Choquet capacity functional, also called an infinitely alternating capacity (see section 26.8, page 209, of Choquet (1954)); hence its core Core(ν, G) is non-empty.
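On a finite outcome space, membership of the core can be checked by brute force over all subsets of Y. The spaces, ν and G below are invented for illustration:

```python
from itertools import chain, combinations

# Brute-force check of core membership: P is in Core(nu, G) iff
# P(A) <= nu(G^{-1}(A)) for every subset A of Y. All objects are toy choices.
U = ['u1', 'u2']
Y = ['y1', 'y2', 'y3']
nu = {'u1': 0.5, 'u2': 0.5}
G = {'u1': {'y1', 'y2'}, 'u2': {'y2', 'y3'}}   # multi-valued correspondence

def capacity(A):
    """nu(G^{-1}(A)) = nu{e : G(e) meets A}."""
    return sum(nu[e] for e in U if G[e] & set(A))

def in_core(P):
    subsets = chain.from_iterable(combinations(Y, r) for r in range(len(Y) + 1))
    return all(sum(P[y] for y in A) <= capacity(A) + 1e-12 for A in subsets)

print(in_core({'y1': 0.4, 'y2': 0.3, 'y3': 0.3}))   # True
print(in_core({'y1': 0.8, 'y2': 0.1, 'y3': 0.1}))   # False: P({y1}) > capacity({y1})
```

The second distribution fails because it puts mass 0.8 on y1 while the capacity of {y1} is only ν({u1}) = 0.5, so no coupling on Graph(G) can produce it.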
Since we are testing a composite hypothesis, good performance will generally require that the level of the test be uniform over all possible observable data generating processes (i.e. probabilities P for X) compatible with the null hypothesis. We therefore show that our test has uniform level over Core(ν, G) in the following theorem.

Assumption 2. P and ν are absolutely continuous with respect to Lebesgue measure.

Theorem 1. Under assumptions 1 and 2, and with hn satisfying (3), we have

lim inf_n inf_{P∈Core(ν,G)} Pr(Tn ≤ c̃n(α) | H0) ≥ 1 − α.
Proof of Theorem 1: It suffices to show that Tn ≤ T̃n. The latter follows from

Tn − T̃n = sup_{A∈C} (Gn(A) + √n δ(A)) − sup_{A∈C_{hn}} Gn(A)
= max{ sup_{A∈C\C_{hn}} (Gn(A) + √n δ(A)) − sup_{A∈C_{hn}} Gn(A), sup_{A∈C_{hn}} (Gn(A) + √n δ(A)) − sup_{A∈C_{hn}} Gn(A) }
≤ max{ sup_{A∈C\C_{hn}} (Gn(A) + √n δ(A)) − sup_{A∈C_{hn}} Gn(A), 0 }
≤ max{ sup_{A∈C\C_{hn}} (Gn(A) + √n δ(A)), 0 }
= max{ sup_{A∈C\C_{hn}} √n (Pn(A) − νG⁻¹(A)), 0 }
≤ max(−hn √n, 0) ≤ 0,

where the first inequality follows from the non-positivity of δ, the second inequality follows from the fact that ∅ ∈ C_{hn}, so that sup_{A∈C_{hn}} Gn(A) ≥ 0, the third inequality follows from the definition of C_{hn}, and the last inequality follows from the positivity of hn in (3). ∎
1.3 Power against local alternatives defined in total variation

We highlight here the third discrepancy between the classical Kolmogorov-Smirnov specification testing procedure and our test when the correspondence G is multi-valued. To do this, we first give two definitions of probability metrics. First, the total variation metric.
Definition 4. The total variation distance on the space of probability measures is defined by TV(Q, Q′) = sup_{A∈B} (Q(A) − Q′(A)) for any Q, Q′ probability measures on R, with B the class of Borel sets. The total variation distance between a probability measure Q and a set of probability measures P will be defined by TV(Q, P) = inf_{Q′∈P} TV(Q, Q′).

Similarly, we define the Kolmogorov-Smirnov metric.

Definition 5. The Kolmogorov-Smirnov metric between two probability measures Q and Q′ is defined as TKS(Q, Q′) = sup_{B∈C} (Q(B) − Q′(B)), with C as in (1.1), and the Kolmogorov-Smirnov distance between a probability measure Q and a set of probability measures P will be defined by TKS(Q, P) = inf_{Q′∈P} TKS(Q, Q′).

To understand how the case where G is multi-valued can differ from the classical Kolmogorov-Smirnov test, note first that the classical Kolmogorov-Smirnov test has power against local alternatives defined in terms of the Kolmogorov-Smirnov metric (as shown in chapter 15 of Lehmann and Romano (2005)). However, it has no power against alternatives defined in terms of the total variation metric, since the total variation distance between the empirical distribution Pn (which is discrete) and the single true distribution (which is absolutely continuous with respect to Lebesgue measure) is constant, equal to 1. On the other hand, the total variation distance between Pn and the set Core(ν, G) of distributions compatible with the null hypothesis H0 can be made arbitrarily small when G is multi-valued and satisfies some regularity conditions listed below. Hence we show that the test of the composite hypothesis H0 has power where the traditional Kolmogorov-Smirnov specification test does not.

To make the heuristic description above more precise, we define a sequence of local alternatives in total variation distance with the following assumption.

Assumption 3. We denote by Qn a sequence of local alternatives in total variation distance, defined by

√n TV(Qn, Core(ν, G)) ≥ γn → ∞.

We assume that G and G⁻¹ have convex values, and that sup_{B∈B} χ(G⁻¹(B)) < ∞, where χ(B) is the number of connected components of a measurable set B. We denote 2 sup_{B∈B} χ(G⁻¹(B)) + 1 by κ.

Theorem 2. Under assumptions 1-3, we have lim_n Pr(Tn > c̃n(α) | Qn) = 1 for all α > 0.
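The gap between the two metrics is easy to see numerically: against an absolutely continuous Q, the empirical distribution sits at total variation distance exactly 1 (take B to be the finite set of sample points, a Lebesgue-null set), while its Kolmogorov-Smirnov distance shrinks at rate 1/√n. A toy illustration with Q = Uniform[0, 1] (our own choice):

```python
import random

# KS distance sup_x max(F_n(x) - F(x), F(x) - F_n(x-)) between the empirical
# cdf and a continuous cdf F; TV(P_n, Q) needs no computation, it is always 1.

def ks_distance(sample, F):
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - F(x), F(x) - i / n) for i, x in enumerate(xs))

random.seed(3)
for n in (100, 1000, 10000):
    s = [random.random() for _ in range(n)]
    print(n, ks_distance(s, lambda x: x))   # shrinks roughly like 1/sqrt(n)
```

This is exactly why the classical test registers nothing against alternatives built in total variation, whereas distance to the set Core(ν, G) behaves differently when G is multi-valued.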
Remark 3. The main ingredient of the proof is a minimax theorem, theorem 3, which is of independent interest, since it allows the interpretation of our test in terms of the distance between the empirical distribution of the sample of observations and the set Core(ν, G) which characterizes the composite null hypothesis. We present and prove the minimax theorem in the following section.

Proof of Theorem 2: Denote by Q̂n the empirical distribution relative to an iid sample of size n drawn from Qn. By the triangle inequality, we have

TKS(Q̂n, Core(ν, G)) ≥ TKS(Qn, Core(ν, G)) − TKS(Q̂n, Qn).

We have

TKS(Q̂n, Core(ν, G)) ≤ TV(Q̂n, Core(ν, G))    (5)
= sup_{B∈B} (Q̂n(B) − ν[G⁻¹(B)])    (6)
≤ κ sup_{B∈C} (Q̂n(B) − ν[G⁻¹(B)])    (7)
= κ Tn / √n,    (8)

where (5) follows from the fact that the total variation metric is stronger than the Kolmogorov-Smirnov metric; (6) follows from theorem 3; (7) follows from lemma 2; and Tn in (8) is our test statistic. In addition,

TKS(Qn, Core(ν, G)) ≥ sup_{B∈C} (Qn(B) − ν[G⁻¹(B)])    (9)
≥ (1/κ) sup_{B∈B} (Qn(B) − ν[G⁻¹(B)])    (10)
= (1/κ) TV(Qn, Core(ν, G))    (11)
≥ γn / (κ √n),    (12)

where (9) follows from lemma 1; (10) follows from lemma 2; (11) follows from theorem 3; and (12) follows from the definition of the alternative sequence Qn. It follows that

Pr(Tn > c̃n(α) | Qn) ≥ Pr(√n TKS(Q̂n, Qn) ≤ γn/κ − κ c̃n(α) | Qn).    (13)

Since γn diverges, and √n TKS(Q̂n, Qn) is tight under Qn (see problem 11.60, page 476, of Lehmann and Romano (2005)), the probability in (13) tends to 1, which proves the result. ∎
2 Minimax Theorem

We now turn to our minimax theorem, which is the main ingredient in the proof of power against local alternatives in total variation. We present additional lemmata and the main theorem, which hold under the assumptions of Theorem 2.

Lemma 1. sup_{B∈C} (Q(B) − ν[G⁻¹(B)]) ≤ TKS(Q, Core(ν, G)).
Proof of Lemma 1: It follows from

sup_{B∈C} (Q(B) − ν[G⁻¹(B)]) = sup_{B∈C} inf_{Q̃∈Core(ν,G)} (Q(B) − Q̃(B))
≤ inf_{Q̃∈Core(ν,G)} sup_{B∈C} (Q(B) − Q̃(B))
= TKS(Q, Core(ν, G)),

where the first equality holds by Corollary 3.4 of Castaldo, Maccheroni, and Marinacci (2004) (the capacity νG⁻¹ is the upper envelope of its core), and the inequality is the standard minimax inequality. ∎

Lemma 2. sup_{B∈C} (Q(B) − ν[G⁻¹(B)]) ≤ sup_{B∈B} (Q(B) − ν[G⁻¹(B)]) ≤ κ sup_{B∈C} (Q(B) − ν[G⁻¹(B)]).
Proof of Lemma 2: The first inequality is immediate since C ⊂ B. For the second inequality, let B ∈ B attain sup_{B∈B} (Q(B) − νG⁻¹(B)) (it exists since the set function νG⁻¹ is infinitely (or completely) alternating; see for instance chapter 1 of Molchanov (2005)). Calling p the number of connected components of G⁻¹(B), there exist sets B_{2k}, B_{2k+1}, k ∈ {1, ..., p}, such that

1_B ≤ Σ_{k=1}^p (1_{B_{2k}} + 1_{B_{2k+1}}) − p,
1_{G⁻¹(B)} = Σ_{k=1}^p (1_{G⁻¹(B_{2k})} + 1_{G⁻¹(B_{2k+1})}) − p,

hence

Q(B) − νG⁻¹(B) ≤ Σ_{j=2}^{2p+1} (Q(B_j) − νG⁻¹(B_j)) ≤ (2p + 1) sup_{B∈C} (Q(B) − νG⁻¹(B)).

But as 2p + 1 ≤ κ, we get

Q(B) − νG⁻¹(B) ≤ κ sup_{B∈C} (Q(B) − νG⁻¹(B)). ∎
Theorem 3 (Minimax). For any probability measure Q, we have

TV(Q, Core(ν, G)) = sup_{B∈B} (Q(B) − ν[G⁻¹(B)]).

Proof of Theorem 3: That the left-hand side is larger than the right-hand side is proved identically to lemma 1, with B replacing C and TV replacing TKS everywhere. The converse inequality is more difficult. Recall that by Proposition 1 (writing P for the measure Q for the remainder of the proof),

sup_{B∈B} (P(B) − ν[G⁻¹(B)]) = inf_{ε∼ν, X∼P} Pr{X ∉ G(ε)},
and the infimum is attained. Let (ε, X) be a pair that attains the infimum, and call π ∈ M(ν, P) its distribution. We construct a partition of the domain U of ν (the marginal distribution of ε) into bins Ik = [ak, bk), k = 1, ..., N, such that b1 − a1 ≤ b2 − a2 = · · · = bN − aN. Denote the boundaries of Graph(G) by γᵘ = sup G and γˡ = inf G. Note that under assumption 3, the functions γᵘ and γˡ are monotonic. If there are jumps in γᵘ or γˡ, or points where one of the latter functions has an infinite derivative, there is a finite number of them, and the partition is adjusted so that no point of jump or infinite derivative is in the interior of a bin. Let πk be the distribution π conditioned on ε ∈ Ik, and let νk and Pk be the marginal distributions of πk. We can decompose πk into πkᵘ + πkˡ + πk°, where πkᵘ is supported on Graph(γᵘ), πkˡ is supported on Graph(γˡ), and πk° is supported on U × Y \ {Graph(γᵘ) ∪ Graph(γˡ)}. We partition [ak, bk) into [ak, ck) ∪ [ck, dk) ∪ [dk, bk) such that

(ck − ak)/(bk − ak) = πkᵘ(Ik × Y)    and    (bk − dk)/(bk − ak) = πkˡ(Ik × Y).
We now consider three quantile mappings. First, there exist two increasing measurable mappings

α : [ak, ck] → [ak, bk]    and    β : [dk, bk] → [ak, bk]

such that the distributions of γᵘ(α(ε)) and γˡ(β(ε)), conditional on ε ∈ [ak, ck) and ε ∈ [dk, bk) respectively, are given by

γᵘ(α(ε)) | ε ∈ [ak, ck) ∼ πkᵘ([ak, bk) × ·) / πkᵘ([ak, bk) × Y),
γˡ(β(ε)) | ε ∈ [dk, bk) ∼ πkˡ([ak, bk) × ·) / πkˡ([ak, bk) × Y).

Finally, there exists an increasing measurable map γ° such that the distribution of γ°(ε) conditional on ε ∈ [ck, dk) is given by

γ°(ε) | ε ∈ [ck, dk) ∼ πk°([ak, bk) × ·) / πk°([ak, bk) × Y).
Define the function γk for all e ∈ U by

γk(e) = 1{e ∈ [ak, ck)} γᵘ(α(e)) + 1{e ∈ [ck, dk)} γ°(e) + 1{e ∈ [dk, bk)} γˡ(β(e)).

Conditionally on ε ∈ [ak, bk), γk(ε) is distributed according to γk(ε) | ε ∈ [ak, bk) ∼ Pk. Define the distribution πkN by πkN(B) = νk{e ∈ Ik | (e, γk(e)) ∈ B} for any B measurable in U × Y. By construction, πkN ∈ M(νk, Pk). Define now πN as the distribution with conditionals πkN on Ik, and let us show that πN ∈ M(ν, P). Indeed, the marginal distribution of πN with respect to ε is ν by construction. Denoting by (εN, XN) a pair distributed according to πN and by (ε, X) a pair distributed according to π, we have, for any B measurable in Y,

Pr(XN ∈ B) = Σk Pr(εN ∈ Ik) Pr(XN ∈ B | εN ∈ Ik) = Σk Pr(ε ∈ Ik) νk(γk⁻¹(B)) = Σk Pr(ε ∈ Ik) Pk(B) = P(B),

so that the marginal distribution of πN with respect to X is P. Define now γ such that γ(ε) = γk(ε) for ε ∈ Ik, k = 1, ..., N, and introduce γ̃ such that

γ̃(ε) = γ(ε) for γ(ε) ∈ G(ε),    γ̃(ε) = γ0(ε) otherwise,

where γ0 is an arbitrary measurable selection of G. By construction, γ̃ ∈ Sel(G), and γ̃(ε) ≠ γ(ε) implies γ(ε) ∉ G(ε). Thus, for (εN, XN) distributed according to πN,

Pr(γ(εN) ∉ G(εN)) ≥ Pr(γ̃(εN) ≠ γ(εN)) = Pr(XN ≠ γ̃(εN)),

so that

Pr(γ(εN) ∉ G(εN)) ≥ inf_{γ̃∈Sel(G)} inf_{ε∼ν, X∼P} Pr(X ≠ γ̃(ε))
= inf_{γ̃∈Sel(G)} sup_{B∈B} (P(B) − ν(γ̃⁻¹(B)))    (14)
= inf_{γ̃∈Sel(G)} TV(P, νγ̃⁻¹)
= inf_{Q∈{νγ̃⁻¹ : γ̃∈Sel(G)}} TV(P, Q)
≥ inf_{Q∈WCCH{νγ̃⁻¹ : γ̃∈Sel(G)}} TV(P, Q)    (15)
= inf_{Q∈Core(ν,G)} TV(P, Q)    (16)
= TV(P, Core(ν, G)),

where (14) follows from Proposition 1 applied to γ̃ instead of G, WCCH in (15) refers to the weak closure of the convex hull, and (16) follows from corollary 3.4 of Castaldo, Maccheroni, and Marinacci (2004). Finally, still denoting by (ε, X) a pair distributed according to π, it remains to show that Pr(γ(ε) ∉ G(ε)) converges to Pr(X ∉ G(ε)) when N → ∞ (note that (ε, γ(ε)) is distributed according to πN, as the dependence on N derives from the construction of γ). By construction of γ, for all k, γ(ε) ∈ G(ε) when ε ∈ [ak, ck) ∪ [dk, bk), so that

Pr(γ(ε) ∉ G(ε)) = Σ_{k=1}^N Pr(ε ∈ [ck, dk)) Pr(γk(ε) ∉ G(ε) | ε ∈ [ck, dk)).
In addition, for each k, if we define Rk as the rectangle with vertices (ak, γˡ(ak)), (bk, γˡ(ak)), (ak, γᵘ(ak)) and (bk, γᵘ(ak)), we have

Pr(X ∉ Rk | ε ∈ [ak, bk)) = Pr(γ(ε) ∉ Rk | ε ∈ [ck, dk)),

since by construction, (ε, γ(ε)) is distributed according to πN, which has the same marginal distributions as π, and is hence equal on a rectangle. Define further the set rk for each k as the union of the rectangle with vertices (ak, γˡ(ak)), (bk, γˡ(ak)), (ak, γˡ(bk)) and (bk, γˡ(bk)) and the rectangle with vertices (ak, γᵘ(ak)), (bk, γᵘ(ak)), (ak, γᵘ(bk)) and (bk, γᵘ(bk)). We have therefore

|Pr(X ∉ G(ε)) − Pr(γ(ε) ∉ G(ε))| ≤ Σ_{k=1}^N Pr(X ∈ rk | ε ∈ [ck, dk)) → Pr(X ∈ Graph(γˡ) ∪ Graph(γᵘ) | ε ∈ ∪_{k=1}^N [ck, dk)) as N → ∞,

which is zero by construction of γ° (the conditional distribution of X given ε ∈ [ck, dk) puts no mass on the two boundary graphs). Hence the result is proven. ∎
Conclusion

We have shown how a Kolmogorov-Smirnov type of procedure can be applied to testing an incomplete specification, that the test is uniform in level, and that, unlike classical Kolmogorov-Smirnov specification tests, it has power against local alternatives defined in total variation. The proof of the latter relies on a minimax theorem, which allows the interpretation of our test in terms of the distance between the empirical distribution and the set of distributions compatible with the null.
References

Castaldo, A., F. Maccheroni, and M. Marinacci (2004): "Random sets and their distributions," Sankhya (Series A), 66, 409-427.

Chernozhukov, V., H. Hong, and E. Tamer (2007): "Estimation and Confidence Regions for Parameter Sets in Econometric Models," Econometrica, 75, 1243-1285.

Choquet, G. (1954): "Theory of capacities," Annales de l'Institut Fourier, 5, 131-295.

Galichon, A., and M. Henry (2006): "Inference in incomplete models," Columbia University Discussion Paper 0506-28.

Khmaladze, E. (1981): "Martingale approach to the theory of goodness-of-fit tests," Probability Theory and its Applications, 26, 240-257.

Lehmann, E., and J. Romano (2005): Testing Statistical Hypotheses. Springer: New York.

McFadden, D. (1989): "Testing for stochastic dominance," in Studies in the Economics of Uncertainty (in honor of J. Hadar), Part II, T. Fomby and T. Seo, eds., pp. 113-134. Springer-Verlag: New York.

Molchanov, I. (2005): Theory of Random Sets. Springer: New York.