A Least-Squares Estimator for Monotone Index Models

Debopam Bhattacharya
Department of Economics, Dartmouth College
September 29, 2004

Address for correspondence: Department of Economics, 301 Rockefeller Hall, Dartmouth College, Hanover, NH 03755; e-mail: [email protected]

I am grateful to Han Hong, Shakeeb Khan and Elie Tamer for helpful discussions and to Bo Honore for continuous support, help and encouragement. I am grateful to an anonymous individual for pointing out a limitation in a previous draft, which helped me generalize the paper. All errors are mine. Financial support from the Wilson fellowship at Princeton University is gratefully acknowledged.

Abstract: In this paper, I propose a new technique for estimating the parameters (up to location and scale) of a monotone-index model, based on sorting the data. The key observation guiding the procedure is that the sum of distances between pairs of adjacent observations is minimized (over all possible permutations) when the observations are sorted by their values. I demonstrate that this estimator can be interpreted as a generalization of Cavanagh and Sherman's monotone rank estimator and can also be related to Ichimura's semiparametric least-squares (SPLS) estimator, thereby providing a connection between the two types of estimators. The proposed estimator is $\sqrt{n}$-consistent and asymptotically normal with a consistently estimable covariance matrix. Unlike the SPLS, it does not require a subjective bandwidth choice, but it assumes a monotonicity condition which is weaker than that required by Han's maximum rank correlation estimator. Finally, I extend the analysis to cover monotone-index panel data models where every individual is observed over at least two periods. JEL Classification Codes: C1, C2.
1 Introduction

This paper is concerned with the estimation of the finite-dimensional parameters of the well-known monotone-index model, in which the dependent variable $y$ is an unknown function of the index $x'\beta_0$ and an unobserved error term $\varepsilon$, satisfying $E(y|x) = g(x'\beta_0)$, where $g(\cdot)$ is monotone. Two $\sqrt{n}$-consistent estimators for $\beta_0$ that exist in the literature are Han's (1987) maximum rank correlation (MRC) estimator and its generalization, the monotone rank estimator (MRE) of Cavanagh and Sherman (1998). In this paper, I propose a new class of estimators for this model, which I call the monotone least-squares estimator (MLSE). These estimators are of the least-squares type, minimizing the distance between an ordered set of observations. Next, I demonstrate that the MLSE can be interpreted as a generalization of Cavanagh and Sherman's MRE. Finally, I show that the MLSE can be related to Ichimura's semiparametric least-squares (SPLS) estimator, in that a conceptually different estimate of the conditional mean function is used inside a least-squares objective function. I further demonstrate that the MLSE is $\sqrt{n}$-consistent and asymptotically normal and, unlike the SPLS, does not require a subjective bandwidth choice, but assumes a monotonicity condition which is weaker than what is required for consistency of Han's MRC.

The main idea behind the minimum-distance criterion is as follows. When one has the true $\beta_0$, sorting (arranging in ascending order) the data by $x'\beta_0$ is 'like' (i.e., but for the random error) sorting the data by the true $y$'s. But when one sorts by $x'\beta$ for $\beta \neq \beta_0$, one produces a permutation of the true ordering of the $y$'s. Therefore, the sum of distances (measured by the sum of squared differences) between adjacent $y$'s is likely to be smallest when one has sorted by the true $x'\beta_0$, so minimizing this (ordered) sum of squares with respect to $\beta$ will yield the true $\beta_0$.
This class of estimators, like the MRE, does not require kernel-based estimation. The key condition for consistency of the estimator is that the expectation of the dependent variable $y$, conditional on the independent variables $x$, is a monotonic function of $x'\beta_0$ for some $\beta_0$. This is the same assumption as Cavanagh and Sherman's and is weaker than Han's and Abrevaya's, who require that the entire distribution of $y$ given $x$ depend on $x$ through $x'\beta_0$. Although our assumption is neither weaker nor stronger than theirs, it is a property of the univariate distribution of $y$ given $x$ that can be 'tested', and it holds for a larger class of standard models. Cavanagh and Sherman (1998) provide a simple multiplicative heteroskedastic model where Han's assumption does not hold but ours does. In Section 2.4, we briefly compare the three methods and show a further connection of the MLSE with Ichimura's (1993) semiparametric least-squares estimator.

Section 2 outlines the model, discusses the motivation behind the estimation procedure, describes the estimation technique in detail, and compares the estimator with three alternative estimators proposed previously for monotone-index models (footnote 1). Section 3 contains a lemma and the main theorem for consistency. Section 4 states the assumptions and theorem for asymptotic normality; since that proof closely resembles Sherman's proof for the maximum rank correlation estimator, I outline the basic steps and mention the modifications necessary in my case. In Section 5, I suggest a method for estimating the covariance matrix consistently. Section 6 discusses the single-index panel data model. Section 7 summarizes and concludes with directions for future research. All proofs are collected in the appendix.

Footnote 1: Different authors have assumed different conditions when using the term "monotone index"; this distinction is made clear in Section 2.
2 Model & Estimation Procedure

2.1 Model

The model is given by

$$y = h(x'\beta_0, \varepsilon), \qquad (1)$$

where $x \in \mathbb{R}^d$, $\beta_0 \in B \subset \mathbb{R}^d$, and $x$ is independent of $\varepsilon$. The parameter $\beta_0$, the function $h(\cdot,\cdot)$ and $\mathrm{supp}(x)$ are unknown; $\beta_0$ is the parameter of interest. $h(\cdot,\cdot)$ is non-constant and weakly increasing in the index $x'\beta_0$. The model includes, as special cases, most common parametric models, including those for mean and quantile regression (footnote 2), binary choice, and models with censored and truncated dependent variables. The problem is to estimate $\beta_0$ (up to scale and location normalization), given a sample of $n$ observations $(y_i, x_i)$ which are i.i.d. draws from the above model.

Footnote 2: For most of this paper, we shall work with the assumption that $E(Y|X)$ is a non-decreasing function of $X'\beta_0$. This does not apply to quantile regression models, which assume that the conditional quantiles of $Y$ given $X$ depend on $X$ through the index $X'\beta_0$. For an analysis of single-index quantile estimation, see Khan (2001).
2.2 Estimation procedure

We first describe the MLSE. Then we provide an intuition for why it 'works' and compare it with some of the competing estimators mentioned above.

Let $l, u$ denote the minimum and maximum possible values of $y$. If $y$ is not of bounded support, we can transform $y$ to $\Phi(y)$, where $\Phi(\cdot)$ denotes the normal or any other known cdf; the transformed model $z \equiv \Phi(y) = \Phi(h(x'\beta_0, \varepsilon))$ still has the same structure as (1), and then $l = 0$ and $u = 1$ (footnote 3). First choose a $\beta$ from the parameter space. Then pick $m$ observations without replacement from the original $n$ observations, where $m \ge 2$ ($m$ will stay fixed as $n$ tends to infinity in the asymptotics of the estimator); there are $\binom{n}{m}$ such choices. For each such $m$-tuple, order the $m$ observations by the values of $x'\beta$ and compute the sum of squared differences between the adjacent $y$'s, with $l$ and $u$ subtracted respectively from the first and last (ordered) observations. Average across the $m$-tuples and choose the $\beta$ that minimizes the resulting sum.

Footnote 3: If $Y$ is bounded w.p. 1, then our results depend only on the property that $E(Y|X)$ is an increasing function of $X'\beta_0$; it is not necessary that the entire distribution of $Y$ given $X$ depend on $X$ through $X'\beta_0$. If, however, $Y$ cannot be assumed to be bounded, then we need this transformation and require that $E(\Phi(Y)|X)$ depend on $X$ through $X'\beta_0$ and be monotonic in $X'\beta_0$. The specification (1) guarantees this.

Formally, consider the objective function:
$$Q_n(\beta) = \binom{n}{m}^{-1} \sum_{1 \le i_1 < i_2 < \dots < i_m \le n} Q_m(y_{i_1}, \dots, y_{i_m}; x_{i_1}, \dots, x_{i_m}; \beta),$$

where

$$Q_m(y_{i_1}, \dots, y_{i_m}; x_{i_1}, \dots, x_{i_m}; \beta) = \sum_{\{j_1,\dots,j_m\} \in \wp\{i_1,\dots,i_m\}} \left[ \left( \sum_{k=1}^{m-1} (y_{j_k} - y_{j_{k+1}})^2 + (y_{j_1} - l)^2 + (y_{j_m} - u)^2 \right) 1\{x_{j_1}'\beta < x_{j_2}'\beta < \dots < x_{j_m}'\beta\} \right]$$

and $\wp\{i_1,\dots,i_m\}$ denotes the set of permutations of the integers $\{i_1,\dots,i_m\}$. For example, when $m = 3$, we have

$$\begin{aligned}
Q_3(y_i, y_j, y_k; x_i, x_j, x_k; \beta)
&= \left[(y_i - l)^2 + (y_j - y_i)^2 + (y_k - y_j)^2 + (y_k - u)^2\right] 1\{x_i'\beta < x_j'\beta < x_k'\beta\} \\
&\quad + \left[(y_i - l)^2 + (y_k - y_i)^2 + (y_j - y_k)^2 + (y_j - u)^2\right] 1\{x_i'\beta < x_k'\beta < x_j'\beta\} \\
&\quad + \left[(y_j - l)^2 + (y_i - y_j)^2 + (y_k - y_i)^2 + (y_k - u)^2\right] 1\{x_j'\beta < x_i'\beta < x_k'\beta\} \\
&\quad + \left[(y_j - l)^2 + (y_k - y_j)^2 + (y_i - y_k)^2 + (y_i - u)^2\right] 1\{x_j'\beta < x_k'\beta < x_i'\beta\} \\
&\quad + \left[(y_k - l)^2 + (y_i - y_k)^2 + (y_j - y_i)^2 + (y_j - u)^2\right] 1\{x_k'\beta < x_i'\beta < x_j'\beta\} \\
&\quad + \left[(y_k - l)^2 + (y_j - y_k)^2 + (y_i - y_j)^2 + (y_i - u)^2\right] 1\{x_k'\beta < x_j'\beta < x_i'\beta\}
\end{aligned}$$

and

$$Q_n(\beta) = \binom{n}{3}^{-1} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=j+1}^{n} Q_3(y_i, y_j, y_k; x_i, x_j, x_k; \beta).$$

The MLSE is then defined as

$$\hat\beta = \arg\min_{\beta \in B} Q_n(\beta).$$
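To make the construction concrete, here is a minimal numerical sketch (my own illustration, not code from the paper) that evaluates the $m = 3$ objective $Q_n(\beta)$ on simulated binary-choice data and minimizes it over a one-dimensional grid; the data-generating process, the sample size, the grid, and all function names are assumptions made only for this example.

```python
import itertools
import numpy as np

def q3_objective(y, xb, l=0.0, u=1.0):
    """m = 3 MLSE objective: for every triplet, sort the three y's by x'beta and sum the
    squared adjacent differences, anchoring the first at l and the last at u."""
    total, count = 0.0, 0
    for i, j, k in itertools.combinations(range(len(y)), 3):
        ys = y[[i, j, k]][np.argsort(xb[[i, j, k]])]
        total += (ys[0] - l) ** 2 + (ys[1] - ys[0]) ** 2 + (ys[2] - ys[1]) ** 2 + (ys[2] - u) ** 2
        count += 1
    return total / count

rng = np.random.default_rng(0)
n = 60
x = rng.normal(size=(n, 2))
y = (x @ np.array([1.0, 1.0]) + rng.logistic(size=n) > 0).astype(float)  # bounded y in {0, 1}

# Scale normalization: fix the last coefficient at 1 and search over the first one.
grid = np.linspace(0.2, 2.0, 19)
values = [q3_objective(y, x @ np.array([b1, 1.0])) for b1 in grid]
print("grid minimizer of Q_n:", grid[int(np.argmin(values))])  # should be near the true value 1.0
```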
2.3 Motivation

The main observation that motivates the MLSE is that if $a < y_1 < y_2 < \dots < y_m < b$ are $m$ real numbers and $y_{i_1}, y_{i_2}, \dots, y_{i_m}$ is a permutation of the $y$'s, then

$$(y_1 - a)^2 + \sum_{j=2}^{m} (y_j - y_{j-1})^2 + (y_m - b)^2 \;\le\; (y_{i_1} - a)^2 + \sum_{j=2}^{m} (y_{i_j} - y_{i_{j-1}})^2 + (y_{i_m} - b)^2.$$

This is most easily proved for $m = 3$, which we do below and which is enough to suggest an estimator. The proof works in the same way for $m \ge 3$ and gives us a whole 'class' of estimators.

Let $a < y_1 < y_2 < y_3 < b$. Consider, e.g.,

$$S = (y_1 - a)^2 + (y_2 - y_1)^2 + (y_3 - y_2)^2 + (b - y_3)^2,$$
$$T = (y_2 - a)^2 + (y_2 - y_1)^2 + (y_3 - y_1)^2 + (b - y_3)^2,$$

so that

$$\begin{aligned}
T - S &= (y_2 - a)^2 + (y_3 - y_1)^2 - (y_1 - a)^2 - (y_3 - y_2)^2 \\
&= \left[ (y_2 - y_1)^2 + 2(y_2 - y_1)(y_1 - a) + (y_1 - a)^2 \right] - (y_1 - a)^2 \\
&\quad + \left[ (y_3 - y_2)^2 + 2(y_2 - y_1)(y_3 - y_2) + (y_2 - y_1)^2 \right] - (y_3 - y_2)^2 \\
&= 2(y_2 - y_1)^2 + 2(y_2 - y_1)(y_1 - a) + 2(y_2 - y_1)(y_3 - y_2) \\
&\ge 0,
\end{aligned}$$

since $a < y_1 < y_2 < y_3 < b$.

The intuition behind the estimation procedure is the following. When one has the true $\beta_0$, sorting (arranging in ascending order) the data by $x'\beta_0$ is 'like' sorting the data by the true $y$'s. But when one sorts by $x'\beta$ for $\beta \neq \beta_0$, one produces a permutation of the true ordering of the $y$'s. Therefore, from the previous derivation, the sum of squared differences between the adjacent $y$'s is likely to be smallest when one has sorted by the true $x'\beta_0$.
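As a quick numerical sanity check on this inequality (mine, not part of the paper), the snippet below enumerates all permutations of a few numbers lying strictly between two anchors a and b and confirms that the sorted arrangement attains the smallest anchored sum of squared adjacent differences; the particular values are arbitrary.

```python
import itertools
import numpy as np

def chain_ss(seq, a, b):
    """(seq[0]-a)^2 + sum of squared adjacent differences + (seq[-1]-b)^2."""
    seq = np.asarray(seq)
    return (seq[0] - a) ** 2 + np.sum(np.diff(seq) ** 2) + (seq[-1] - b) ** 2

rng = np.random.default_rng(1)
a, b = 0.0, 1.0
y = np.sort(rng.uniform(a, b, size=6))              # a < y_1 < ... < y_m < b

best = min(itertools.permutations(y), key=lambda p: chain_ss(p, a, b))
assert np.allclose(best, y)                          # the ascending (sorted) order is the minimizer
print("minimum value, attained by the sorted order:", chain_ss(y, a, b))
```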
2.4 Relation with other estimators

2.4.1 Cavanagh and Sherman's MRE

I first show that the MLSE can be viewed as a generalization of the MRE. Letting $(n)_m = n(n-1)(n-2)\cdots(n-m+1)$, recall that the objective function of the MLSE is given by

$$Q_n(\beta) = \frac{1}{(n)_m} \sum_{i_1, \dots, i_m} Q_m(y_{i_1}, \dots, y_{i_m}; x_{i_1}, \dots, x_{i_m}; \beta),$$

$$Q_m(y_{i_1}, \dots, y_{i_m}; x_{i_1}, \dots, x_{i_m}; \beta) = \sum_{\{j_1,\dots,j_m\} \in \wp\{i_1,\dots,i_m\}} \left[ \left( \sum_{k=1}^{m-1} (y_{j_k} - y_{j_{k+1}})^2 + (y_{j_1} - l)^2 + (y_{j_m} - u)^2 \right) 1\{x_{j_1}'\beta < x_{j_2}'\beta < \dots < x_{j_m}'\beta\} \right].$$

Setting $u = 1$ and $l = 0$ w.l.o.g. and expanding the squares in $Q_m(\cdot)$ gives

$$-2 \sum_{\{j_1,\dots,j_m\} \in \wp\{i_1,\dots,i_m\}} \left( \sum_{k=1}^{m-1} y_{j_k} y_{j_{k+1}} + y_{j_m} \right) 1\{x_{j_1}'\beta < x_{j_2}'\beta < \dots < x_{j_m}'\beta\} + \text{terms not involving } \beta.$$

Thus, for $m = 2$, the objective function is equivalent to

$$\begin{aligned}
& -\frac{2}{n(n-1)} \sum_{i \neq j} \left[ (y_i y_j + y_j)\, 1\{x_i'\beta < x_j'\beta\} + (y_j y_i + y_i)\, 1\{x_j'\beta < x_i'\beta\} \right] \\
=\; & -\frac{2}{n(n-1)} \sum_{i \neq j} \left[ y_j\, 1\{x_i'\beta < x_j'\beta\} + y_i\, 1\{x_j'\beta < x_i'\beta\} \right] + \text{terms not depending on } \beta \\
\simeq\; & -\frac{1}{n(n-1)} \sum_{i \neq j} y_i\, 1\{x_j'\beta < x_i'\beta\} + \text{terms not depending on } \beta,
\end{aligned}$$

which is the negative of Cavanagh and Sherman's objective function (footnote 4) with $M(y) = y$, where "$f(\beta) \simeq g(\beta)$" means $f(\beta) = c\,g(\beta) + d$ with $c > 0$ known and $c, d$ not depending on $\beta$. For $m = 3$, the objective function is equivalent to

$$-\frac{1}{n(n-1)(n-2)} \sum_{i \neq j \neq k} \tilde Q_3(x_i, y_i; x_j, y_j; x_k, y_k; \beta), \quad \text{where}$$

$$\begin{aligned}
\tilde Q_3(x_i, y_i; x_j, y_j; x_k, y_k; \beta)
&= (y_j y_i + y_k y_j + y_k)\, 1\{x_i'\beta < x_j'\beta < x_k'\beta\} + (y_k y_i + y_k y_j + y_j)\, 1\{x_i'\beta < x_k'\beta < x_j'\beta\} \\
&\quad + (y_j y_i + y_k y_i + y_k)\, 1\{x_j'\beta < x_i'\beta < x_k'\beta\} + (y_j y_k + y_k y_i + y_i)\, 1\{x_j'\beta < x_k'\beta < x_i'\beta\} \\
&\quad + (y_j y_k + y_i y_j + y_i)\, 1\{x_k'\beta < x_j'\beta < x_i'\beta\} + (y_k y_i + y_i y_j + y_j)\, 1\{x_k'\beta < x_i'\beta < x_j'\beta\},
\end{aligned}$$

and so on for larger $m$.

Footnote 4: I am grateful to an anonymous individual for pointing out that for $m = 2$, my estimator is identical to the MRE.
Moreover, for any known monotone function $M(\cdot)$ satisfying $E(M(y)|x) = g(x'\beta_0)$ with $g'(\cdot) \ge 0$, the above estimator can be generalized (normalizing $M(l) = 0$ and $M(u) = 1$) to

$$Q_n(\beta) = \frac{1}{(n)_m} \sum_{i_1,\dots,i_m} Q_m(y_{i_1}, \dots, y_{i_m}; x_{i_1}, \dots, x_{i_m}; \beta),$$

$$\begin{aligned}
Q_m(y_{i_1}, \dots, y_{i_m}; x_{i_1}, \dots, x_{i_m}; \beta)
&= \sum_{\{j_1,\dots,j_m\} \in \wp\{i_1,\dots,i_m\}} \left[ \sum_{k=1}^{m-1} \big(M(y_{j_k}) - M(y_{j_{k+1}})\big)^2 + \big(M(y_{j_1})\big)^2 + \big(M(y_{j_m}) - 1\big)^2 \right] 1\{x_{j_1}'\beta < x_{j_2}'\beta < \dots < x_{j_m}'\beta\} \\
&\simeq -\sum_{\{j_1,\dots,j_m\} \in \wp\{i_1,\dots,i_m\}} \left[ M(y_{j_m}) + \sum_{k=1}^{m-1} M(y_{j_k})\, M(y_{j_{k+1}}) \right] 1\{x_{j_1}'\beta < x_{j_2}'\beta < \dots < x_{j_m}'\beta\}.
\end{aligned}$$

Thus the MLSE can be viewed as a generalization of the MRE, in which one uses subsets of observations larger than pairs and, for each such subset, uses not just the $y$ corresponding to the largest $x'\beta$ (as in the MRE) but also the products of successive pairs of ordered observations.
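The $m = 2$ equivalence can also be checked numerically. In the sketch below (my own illustration with an assumed data-generating process), the pairwise MLSE objective plus twice the Cavanagh and Sherman rank objective with M(y) = y stays constant across β, so the former is minimized exactly where the latter is maximized.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
x = rng.normal(size=(n, 2))
y = 1.0 / (1.0 + np.exp(-(x @ np.array([1.0, 1.0]) + rng.normal(size=n))))  # y bounded in (0, 1)

def mlse_m2(y, xb, l=0.0, u=1.0):
    """Pairwise MLSE objective: for each pair, anchor the index-smaller y at l and the
    index-larger y at u and add the squared difference between the two y's."""
    total = 0.0
    n = len(y)
    for i in range(n):
        for j in range(n):
            if i != j and xb[i] < xb[j]:
                total += (y[i] - l) ** 2 + (y[j] - y[i]) ** 2 + (y[j] - u) ** 2
    return total / (n * (n - 1))

def mre(y, xb):
    """Cavanagh-Sherman objective with M(y) = y: average of y_i over pairs with x_j'b < x_i'b."""
    n = len(y)
    total = sum(y[i] for i in range(n) for j in range(n) if i != j and xb[j] < xb[i])
    return total / (n * (n - 1))

for b1 in (0.5, 1.0, 1.5):
    xb = x @ np.array([b1, 1.0])
    print(b1, round(mlse_m2(y, xb) + 2 * mre(y, xb), 6))  # the same number for every b1
```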
2.4.2 Ichimura's SPLS
It is interesting to see that there is a connection between the MLSE and Ichimura's (1993) semiparametric least-squares (SPLS) estimator, which solves

$$\hat\beta = \arg\min_{\beta \in B} \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat E(y | x_i'\beta) \right)^2, \qquad (2)$$

where $\hat E(y|x_i'\beta)$ is a consistent estimator of $E(y|x_i'\beta)$. Note that if we order all observations by $x_i'\beta$ and take the squared differences between the adjacent $y$'s, we are effectively replacing $\hat E(y|x_i'\beta)$ in (2) by the one-sided 1-nearest neighbor (in terms of $x'\beta$) of $y_i$, which can be viewed as a crude (but simpler than a kernel-based) estimator of $E(y|x_i'\beta)$. However, unlike the SPLS, the MLSE does not require a subjective choice of bandwidth; on the other hand, the SPLS is consistent even without monotonicity, which is required for the MLSE to be consistent.
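The following toy sketch (again my own illustration, not the paper's algorithm) makes the plug-in interpretation concrete: sorting the whole sample by x'β and summing squared differences between consecutive y's is criterion (2) with the conditional mean replaced by the next sorted observation's y.

```python
import numpy as np

def nn_spls_criterion(y, xb):
    """Sort by x'beta and sum (y_i - y_next)^2, i.e. criterion (2) with a one-sided
    1-nearest-neighbour plug-in for E(y | x'beta)."""
    ys = y[np.argsort(xb)]
    return np.sum((ys[1:] - ys[:-1]) ** 2)

rng = np.random.default_rng(3)
x = rng.normal(size=(300, 2))
y = 1.0 / (1.0 + np.exp(-(x @ np.array([1.0, 1.0]) + rng.normal(size=300))))

for b1 in (0.25, 0.5, 1.0, 2.0, 4.0):                # last coefficient normalized to 1
    print(b1, round(nn_spls_criterion(y, x @ np.array([b1, 1.0])), 3))
# The criterion tends to be smallest near the true coefficient ratio b1 = 1, mirroring
# the behaviour of the triplet-based MLSE objective.
```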
2.4.3 Abrevaya's Leapfrog

A third available estimator was suggested by Abrevaya (2001), who maximizes

$$\binom{n}{3}^{-1} \sum_{C_3} \left( 1\{y_i > y_j\} - 1\{y_j > y_k\} \right) 1\{(x_i - x_j)'\beta > (x_j - x_k)'\beta\},$$

where $C_3$ represents all possible choices of distinct triplets from $1, \dots, n$. As mentioned above, the key condition for consistency of the MLSE is the same as that for Cavanagh and Sherman's and is weaker than that for Han's and Abrevaya's: Han and Abrevaya require the entire distribution of $y$ given $x$ to be monotone in the index, whereas the MLSE and the MRE require only that $E(y|x)$ be weakly monotonic in $x'\beta_0$.

Note that the MLSE can easily be generalized to the case where $E(M(y)|x)$ depends on $x$ through an index (not necessarily linear) function $\psi(x, \beta_0)$, where $M(\cdot)$ is a known monotonic increasing function and $\psi(\cdot,\cdot)$ is a known function, provided $E(M(y)|x)$ is monotonic in the index. One simply sorts by $\psi(x_i, \beta)$ and replaces $y_i$ by $M(y_i)$ in the objective function.
3 Consistency

From now on, I shall work with the estimator computed with $m = 3$, the smallest value of $m$ for which the MLSE differs from the MRE. The proof for a general $m$ is completely analogous but notationally messier; in what follows, I also indicate what the analogous steps are for general $m$.

Assumptions. The following assumptions will be used in the proof of consistency:

(A0) $E(y|x)$ depends on $x$ only through the index $x'\beta_0$ and is monotonic increasing (non-constant and non-decreasing) in $x'\beta_0$.

(A1) The support of $x$ is not contained in a proper linear subspace of $\mathbb{R}^d$.

(A2) The $d$-th component of $x$ has an everywhere positive Lebesgue density, conditional on the other components.

(A3) $B$ is a compact subset of $\{\beta \in \mathbb{R}^d : \beta_d = 1\}$.

(A4a) $E y^2 < \infty$.

(A4b) $E y^4 < \infty$.

In order to prove consistency of the estimator, I shall use the following lemma.

Lemma 1. Consider three observations $(x_1, y_1), (x_2, y_2), (x_3, y_3)$ from the above model satisfying A0-A4a. Let $f(x_1, x_2, x_3)$ be any non-negative (w.p. 1) function defined on the support of $x = (x_1, x_2, x_3)$. Then for all $i \neq j \neq k \in \{1, 2, 3\}$,

$$E\left\{ f(x_1, x_2, x_3)\, 1\{x_1'\beta_0 < x_2'\beta_0 < x_3'\beta_0\} \left[ (y_1 - l)^2 + (y_2 - y_1)^2 + (y_3 - y_2)^2 + (y_3 - u)^2 \right] \right\}
\le E\left\{ f(x_1, x_2, x_3)\, 1\{x_1'\beta_0 < x_2'\beta_0 < x_3'\beta_0\} \left[ (y_i - l)^2 + (y_j - y_i)^2 + (y_k - y_j)^2 + (y_k - u)^2 \right] \right\}.$$

Lemma 1 essentially translates the motivating observation of Section 2.3 for real numbers into the analogous inequality for (conditional expectations of) random variables. This lemma will be used repeatedly in proving the main consistency theorem (footnote 5).

Footnote 5: For a general $m$, the statement of the lemma is
$$E\left\{ f(x_1, \dots, x_m)\, 1\{x_1'\beta_0 < \dots < x_m'\beta_0\} \left[ (y_1 - l)^2 + \sum_{j=2}^{m} (y_j - y_{j-1})^2 + (y_m - u)^2 \right] \right\}
\le E\left\{ f(x_1, \dots, x_m)\, 1\{x_1'\beta_0 < \dots < x_m'\beta_0\} \left[ (y_{i_1} - l)^2 + \sum_{j=2}^{m} (y_{i_j} - y_{i_{j-1}})^2 + (y_{i_m} - u)^2 \right] \right\}$$
for any permutation $(y_{i_1}, \dots, y_{i_m})$ of $(y_1, \dots, y_m)$.

Theorem 1. Under assumptions A0-A3 and A4a, $\hat\beta - \beta_0 = o_p(1)$.

The proof of consistency proceeds through the following steps:

1. $E(Q_n(\beta)) = E(Q_3(\beta))$ is uniquely minimized at $\beta = \beta_0$.

2. $\sup_{\beta \in B} \left| Q_n(\beta) - E(Q_n(\beta)) \right| \to 0$ in probability.

3. $E(Q_3(\beta))$ is continuous in $\beta$.

Proofs of steps 2 and 3 and the proof of uniqueness in step 1 are analogous to those in Cavanagh and Sherman (1998). It is the proof of the minimization step that is substantially different; it is worked out in detail in the appendix.
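As an illustration of Lemma 1 (mine, with an assumed design and f identically equal to 1), both sides of the inequality can be approximated by simulation for the permutation (2, 1, 3):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 150_000                                          # divisible by 3
x = rng.normal(size=(N, 2))
beta0 = np.array([1.0, 1.0])
y = 1.0 / (1.0 + np.exp(-(x @ beta0 + rng.normal(size=N))))   # y bounded in (0, 1): l = 0, u = 1

# Form independent triplets and the event {x_1'b0 < x_2'b0 < x_3'b0}.
idx = x @ beta0
y1, y2, y3 = y[0::3], y[1::3], y[2::3]
i1, i2, i3 = idx[0::3], idx[1::3], idx[2::3]
Z = (i1 < i2) & (i2 < i3)

sorted_side = np.mean(Z * ((y1 - 0) ** 2 + (y2 - y1) ** 2 + (y3 - y2) ** 2 + (y3 - 1) ** 2))
permuted_side = np.mean(Z * ((y2 - 0) ** 2 + (y1 - y2) ** 2 + (y3 - y1) ** 2 + (y3 - 1) ** 2))
print(sorted_side, "<=", permuted_side)              # the sorted arrangement has the smaller mean
```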
4 Asymptotic Normality

The proof of asymptotic normality is analogous to Sherman's (1993) proof of asymptotic normality of the maximum rank correlation estimator. Instead of repeating the entire proof, we shall therefore only point out the modifications to that proof that the MLSE warrants. We list the assumptions, describe the basic idea of the proof, and then refer to the relevant results in Sherman (1993, 1994), mentioning the modifications along the way.

First note that the $d$-th component of the parameter and of its estimate equals 1. Let $\theta$ denote the first $d-1$ components of $\beta$ and write $\beta(\theta) = (\theta, 1)$. Define $Z = (Z_1, Z_2, Z_3) = \{(x_1, y_1), (x_2, y_2), (x_3, y_3)\}$ and

$$\tau(z, \theta) = E_{12} Q_3(\cdot, \cdot, z; \beta(\theta)) + E_{13} Q_3(\cdot, z, \cdot; \beta(\theta)) + E_{23} Q_3(z, \cdot, \cdot; \beta(\theta)), \qquad (3)$$

where $E_{13} Q_3(\cdot, z, \cdot; \beta)$ denotes the expectation of $Q_3(Z_1, Z_2, Z_3; \beta)$ taken with respect to the first and third arguments, keeping the second argument fixed at $z$, and so on. The proof works by showing that the objective function is asymptotically equivalent to an empirical process indexed by $\theta$ with kernel $\tau(\cdot, \theta)$. Minimization of this empirical process with respect to $\theta$ yields the result. The outline of the proof is sketched in the appendix.

Assumptions

(B1) $\theta_0$ belongs to the interior of a compact subset of $\mathbb{R}^{d-1}$, where $\beta_0 = (\theta_0, 1)$. Let $N$ denote a neighborhood of $\theta_0$.

(B2a) For each $z \in \mathrm{supp}(x, y)$, all mixed partial derivatives of $\tau(z, \cdot)$ exist on $N$.

(B2b) There is an integrable function $\Lambda(z)$ such that for all $z \in \mathrm{supp}(x, y)$ and $\theta \in N$,
$$\|\nabla_2 \tau(z, \theta) - \nabla_2 \tau(z, \theta_0)\| \le \Lambda(z)\, \|\theta - \theta_0\|,$$
where all relevant norms are Euclidean and the partial derivatives, denoted by $\nabla$, are taken with respect to the first $d-1$ components of $\beta$.

(B2c) $E\left( \sum_{j=1}^{d-1} \left| \frac{\partial}{\partial \theta_j} \tau(z, \theta_0) \right| \right)^2 < \infty$.

(B2d) $E\left| \frac{\partial^2}{\partial \theta_i \partial \theta_j} \tau(z, \theta_0) \right| < \infty$ for all $i, j$.

(B2e) $E\, \nabla_2 \tau(\cdot, \theta_0)$ is positive definite.

Theorem 2. Under assumptions A0-A3, A4b, and B1, B2a-e,
$$\sqrt{n}\,(\hat\beta - \beta_0) \Rightarrow_d (W, 0),$$
where $W \sim N(0, \Delta)$ with
$$\Delta = V^{-1} \Sigma V^{-1}, \qquad V = \frac{1}{3} E\, \nabla_2 \tau(\cdot, \theta_0), \qquad \Sigma = E\left[ \nabla_1 \tau(\cdot, \theta_0)\, \nabla_1 \tau(\cdot, \theta_0)' \right].$$

The proof works by decomposing the objective function, a third-order U-statistic, by the Hoeffding method and showing that the asymptotic properties of its minimizer are determined by an empirical process with kernel $\tau(\cdot, \theta)$. For a general $m$, the asymptotic variance is
$$\Delta = V^{-1} \Sigma V^{-1}, \qquad V = \frac{1}{m} E\, \nabla_2 \tau(\cdot, \theta_0), \qquad \Sigma = E\left[ \nabla_1 \tau(\cdot, \theta_0)\, \nabla_1 \tau(\cdot, \theta_0)' \right],$$
where
$$\tau(z, \theta) = \sum_{r=1}^{m} E_{1, 2, \dots, r-1,\, r+1, \dots, m}\, Q_m(\dots, z, \dots; \beta(\theta))$$
and $E_{1, \dots, r-1,\, r+1, \dots, m}\, Q_m(\dots, z, \dots; \beta(\theta))$ denotes the expectation of $Q_m$ taken with respect to all arguments except the $r$-th, the $r$-th argument being fixed at $z$.
5 Asymptotic Covariance Matrix and its Estimation

5.1 Numeric derivatives

In this section we describe how to use numeric derivatives to compute a consistent estimator of $V^{-1}\Sigma V^{-1}$, along the lines of Section 7 of Sherman (1993). I will use the notation $\tau(z, \theta)$ interchangeably with $\tau(z, \beta(\theta))$. Since

$$V = \frac{1}{3} E\, \nabla_2 \tau(\cdot, \theta_0), \qquad \Sigma = E\left[ \nabla_1 \tau(\cdot, \theta_0)\, \nabla_1 \tau(\cdot, \theta_0)' \right],$$

a natural estimate of these quantities replaces the population expectations by their sample analogues and the analytic derivatives by numeric ones. We refer to the proof of Theorem 4 of Sherman (1993) to deduce the consistency of these estimates for their population counterparts. The construction is as follows. For $z = (x, y)$, consider

$$\tau_n(z, \theta) = \frac{3}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} Q_3(y, x, y_i, x_i, y_j, x_j; \beta(\theta)).$$

Then $E\, \tau_n(z, \theta) = \tau(z, \theta)$, and the sequence
$$\{\tau_n(z, \theta) - \tau(z, \theta) : \theta \in B,\ z \in S\}$$
is a mean-zero U-process with a kernel belonging to a Euclidean class, given that $l \le y \le u < \infty$. Hence, by Corollary 8 of Sherman (1994),

$$\sup_{\theta \in B,\, z \in S} \|\tau_n(z, \theta) - \tau(z, \theta)\| = O_p\!\left(\frac{1}{n^{1/2}}\right). \qquad (4)$$

Now define

$$\hat p_j(z, \theta) = \frac{1}{\varepsilon_n}\left[\tau_n(z, \theta + \varepsilon_n e_j) - \tau_n(z, \theta)\right],$$
$$\hat p_{jk}(z, \theta) = \frac{1}{\delta_n^2}\left[\tau_n(z, \theta + \delta_n e_j + \delta_n e_k) - \tau_n(z, \theta + \delta_n e_j) - \tau_n(z, \theta + \delta_n e_k) + \tau_n(z, \theta)\right],$$

where $e_j$ is the $j$-th unit vector. Then, by (4),

$$\hat p_j(z, \theta) = \frac{1}{\varepsilon_n}\left[\tau(z, \theta + \varepsilon_n e_j) - \tau(z, \theta)\right] + \frac{1}{\varepsilon_n}\, O_p\!\left(\frac{1}{n^{1/2}}\right),$$
$$\hat p_{jk}(z, \theta) = \frac{1}{\delta_n^2}\left[\tau(z, \theta + \delta_n e_j + \delta_n e_k) - \tau(z, \theta + \delta_n e_j) - \tau(z, \theta + \delta_n e_k) + \tau(z, \theta)\right] + \frac{1}{\delta_n^2}\, O_p\!\left(\frac{1}{n^{1/2}}\right). \qquad (5)$$

Then, if $n^{1/2}\varepsilon_n \to \infty$, given that $\hat\theta \xrightarrow{P} \theta_0$ and that $\tau(z, \cdot)$ is differentiable at $\theta_0$, we have
$$\hat p_j(z, \hat\theta) \xrightarrow{P} \nabla_j \tau(z, \theta_0), \quad \text{and therefore} \quad \frac{1}{n}\sum_{i=1}^{n} \hat p_j(z_i, \hat\theta) \xrightarrow{P} E\, \nabla_j \tau(z, \theta_0).$$

Similarly, if $n^{1/4}\delta_n \to \infty$, given that $\hat\theta \xrightarrow{P} \theta_0$ and that $\tau(z, \cdot)$ is twice differentiable at $\theta_0$, we have
$$\hat p_{jk}(z, \hat\theta) \xrightarrow{P} \nabla_{jk} \tau(z, \theta_0), \quad \text{and therefore} \quad \frac{1}{n}\sum_{i=1}^{n} \hat p_{jk}(z_i, \hat\theta) \xrightarrow{P} E\, \nabla_{jk} \tau(z, \theta_0).$$

Therefore suitable choices are $\varepsilon_n = n^{-1/4}$ and $\delta_n = n^{-1/6}$ (footnote 6). Our final expressions for the estimates of the variance components are

$$\hat V_{jk} = \frac{1}{3}\,\frac{1}{n}\sum_{i=1}^{n} \hat p_{jk}(z_i, \hat\theta), \qquad \hat\Sigma_{jk} = \frac{1}{n}\sum_{i=1}^{n} \hat p_j(z_i, \hat\theta)\, \hat p_k(z_i, \hat\theta), \qquad \hat\Delta = \hat V^{-1} \hat\Sigma\, \hat V^{-1},$$

where the $\hat p(\cdot)$'s are defined in (5).

Footnote 6: This comes from the consideration that $\hat p_j(z, \theta) = \nabla_j \tau(z, \theta) + \varepsilon_n \nabla_{jj} \tau(z, \tilde\theta) + O_p\!\left(\frac{1}{\varepsilon_n \sqrt{n}}\right)$, which is minimized when $\varepsilon_n \propto n^{-1/4}$.
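The construction above can be coded directly. The sketch below (my own, for the m = 3 case with a single free parameter and an assumed design) forms tau_n by averaging Q_3 over pairs of other observations, takes numeric first and second differences with step sizes epsilon_n = n^(-1/4) and delta_n = n^(-1/6), and assembles the plug-in sandwich estimate; all names, and the simplification of excluding the evaluation point from the pairwise average, are choices of this illustration.

```python
import itertools
import numpy as np

def Q3(ys, xbs, l=0.0, u=1.0):
    """Q_3 for one triplet: sort the y's by x'beta and form the anchored sum of squares."""
    s = ys[np.argsort(xbs)]
    return (s[0] - l) ** 2 + (s[1] - s[0]) ** 2 + (s[2] - s[1]) ** 2 + (s[2] - u) ** 2

def tau_n(z_index, theta, y, x):
    """Estimate of tau(z, theta): average of Q_3 over pairs of other observations,
    with observation z_index held fixed, scaled by 3 as in the text."""
    beta = np.array([theta, 1.0])                    # last coefficient normalized to 1
    others = [i for i in range(len(y)) if i != z_index]
    vals = [Q3(y[[z_index, i, j]], x[[z_index, i, j]] @ beta)
            for i, j in itertools.combinations(others, 2)]
    return 3.0 * np.mean(vals)

def sandwich_variance(theta_hat, y, x):
    """Numeric-derivative plug-in for V^{-1} Sigma V^{-1} (scalar theta)."""
    n = len(y)
    eps, delta = n ** -0.25, n ** (-1.0 / 6.0)
    p1 = np.array([(tau_n(i, theta_hat + eps, y, x) - tau_n(i, theta_hat, y, x)) / eps
                   for i in range(n)])
    p2 = np.array([(tau_n(i, theta_hat + 2 * delta, y, x) - 2 * tau_n(i, theta_hat + delta, y, x)
                    + tau_n(i, theta_hat, y, x)) / delta ** 2 for i in range(n)])
    V_hat = p2.mean() / 3.0                          # V = (1/3) E grad2 tau
    Sigma_hat = np.mean(p1 ** 2)                     # Sigma = E (grad1 tau)^2
    return Sigma_hat / V_hat ** 2

rng = np.random.default_rng(5)
n = 40
x = rng.normal(size=(n, 2))
y = (x @ np.array([1.0, 1.0]) + rng.logistic(size=n) > 0).astype(float)
print("plug-in asymptotic variance at theta = 1:", sandwich_variance(1.0, y, x))
```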
6 Panel data

Consider a panel data model where data are available for at least two periods for each individual. The model is given by

$$y_{it} = h(\alpha_i, x_{it}'\beta_0, \varepsilon_{it}), \qquad (6)$$

where $h(\cdot,\cdot,\cdot)$ is increasing and non-constant in the second argument, and $\varepsilon_{it}$ is independent of $x_{it}$ for all $i, t$, given $\alpha_i$; $t$ ranges from 1 to $T_i \ge 2$ and $i$ ranges from 1 to $n$. We think of $\alpha_i$ as a fixed effect or, equivalently, we condition our analysis on the realized values (in the sample) of a random effect.

For the consistency of the estimator we propose, it is enough that $E(y_{it}|x_{it}, \alpha_i) = E(y_{it}|x_{it}'\beta_0, \alpha_i)$ be monotone in $x_{it}'\beta_0$. We propose estimating $\beta_0$ by the following mechanism. First choose a $\beta$ in the parameter space (suitably normalized for identification). Then, for each individual $i$, sort the $T_i$ observations (as $t$ varies from 1 to $T_i$) according to $x_{it}'\beta$, and compute the sum of squared differences between the adjacent ordered observations. Average this sum over $i = 1, \dots, n$ and choose $\beta$ to minimize this average. Formally, letting $l, u$ denote the minimum and maximum possible values of $y$,

$$\hat\beta = \arg\min_{\beta \in B} G_n(\beta) \equiv \arg\min_{\beta \in B} \frac{1}{n}\sum_{i=1}^{n} Q_i(\beta), \quad \text{where}$$

$$Q_i(y_{i1}, \dots, y_{iT_i}; x_{i1}, \dots, x_{iT_i}; \beta; \alpha_i) = \sum_{\{j_1,\dots,j_{T_i}\} \in \wp\{1,\dots,T_i\}} \left( \sum_{k=1}^{T_i - 1} (y_{ij_k} - y_{ij_{k+1}})^2 + (y_{ij_1} - l)^2 + (y_{ij_{T_i}} - u)^2 \right) 1\{x_{ij_1}'\beta < x_{ij_2}'\beta < \dots < x_{ij_{T_i}}'\beta\}$$

and $\wp\{1,\dots,T_i\}$ denotes the set of permutations of the $T_i$ integers $\{1,\dots,T_i\}$. Now, $G(\beta) \equiv E\, G_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} E\, Q_i(\beta)$. Exactly by the same logic as in the consistency proof, $E\, Q_i(\beta)$ is minimized uniquely at $\beta_0$, given $\alpha_i$.
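For concreteness, here is a small sketch (my own, under an assumed fixed-effects binary-choice design) of the panel criterion G_n(β): each individual's T_i observations are sorted by x_it'β and the anchored sum of squared adjacent differences is averaged over individuals.

```python
import numpy as np

def Gn(beta, y_panel, x_panel, l=0.0, u=1.0):
    """Panel MLSE criterion: average over individuals of the anchored sum of squared
    adjacent differences of y, after sorting each individual's observations by x'beta."""
    total = 0.0
    for y_i, x_i in zip(y_panel, x_panel):
        ys = y_i[np.argsort(x_i @ beta)]
        total += (ys[0] - l) ** 2 + np.sum(np.diff(ys) ** 2) + (ys[-1] - u) ** 2
    return total / len(y_panel)

rng = np.random.default_rng(6)
n, T = 200, 4
alpha = rng.normal(size=n)                                       # individual fixed effects
x_panel = rng.normal(size=(n, T, 2))
y_panel = (x_panel @ np.array([1.0, 1.0]) + alpha[:, None]
           + rng.logistic(size=(n, T)) > 0).astype(float)        # binary outcome, bounded

for b1 in (0.25, 0.5, 1.0, 2.0, 4.0):                            # last coefficient normalized to 1
    print(b1, round(Gn(np.array([b1, 1.0]), y_panel, x_panel), 4))
# The criterion tends to be smallest near the true ratio b1 = 1.
```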
The continuity and uniform convergence results are completely analogous to the single-cross-section case described above. But now our objective function is no longer a U-process; it resembles the objective function of Manski's maximum score estimator. Therefore the estimator is no longer $\sqrt{n}$-consistent, and one may replace the indicator functions in the objective function by smooth alternatives, along the lines of Horowitz's (1992) smoothed maximum score estimator, to get arbitrarily close to $\sqrt{n}$-consistency.

Abrevaya (1999) considers the model

$$h(y_{it}) = x_{it}'\beta + \alpha_i + \gamma_t + \varepsilon_{it}, \qquad (7)$$

where $h(\cdot)$ is strictly increasing, $\alpha_i$ is a fixed effect and $\gamma_t$ is a time effect. He suggests estimating $\beta$ by the so-called 'leapfrog' estimator, which maximizes

$$\frac{1}{n(n-1)} \sum_{i \neq j} \mathrm{sign}\left( (\Delta x_i - \Delta x_j)'\beta \right) \left( 1\{y_{i2} > y_{j2}\} - 1\{y_{i1} > y_{j1}\} \right).$$

The attractive features of this estimator are that it eliminates both the individual-specific and the period-specific effects and is $\sqrt{n}$-consistent. However, the strict monotonicity of $h(\cdot)$ rules out many interesting and common parametric models with limited dependent variables, such as binary choice. Moreover, the linear structure of the right-hand side is crucial for this estimator to work. While the MLSE will not be consistent in the presence of a time effect, it can handle models with limited dependent variables and does not require the linear relationship among $x_{it}'\beta$, the time effect, $\alpha_i$ and $\varepsilon_{it}$ in (7). Note also that, without the time effect, (7) implies (6).
7 Conclusion

A possible extension of this work is to investigate whether non-monotonic models can be estimated by ordering either m-tuples of observations or, more plausibly, all of the observations. This is motivated by the apparent similarity between our objective function and Ichimura's SPLS. I am currently investigating this direction.
References

[1] Abrevaya, J. (2001): "Pairwise-difference rank estimation of the transformation model," mimeo.

[2] Abrevaya, J. (1999): "Leapfrog estimation of a fixed effect model with an unknown transformation of the dependent variable," Journal of Econometrics, 93, 203-228.

[3] Cavanagh, C. and Sherman, R.P. (1998): "Rank estimators for monotonic index models," Journal of Econometrics, 84, 351-381.

[4] Han, A.K. (1987): "Non-parametric analysis of a generalized regression model," Journal of Econometrics, 35, 303-316.

[5] Horowitz, J.L. (1992): "A smoothed maximum score estimator for the binary response model," Econometrica, 60, 505-531.

[6] Ichimura, H. (1993): "Semiparametric least squares (SLS) and weighted SLS estimation of single-index models," Journal of Econometrics, 58, 71-120.

[7] Khan, S. (2001): "Two-stage rank estimation of quantile index models," Journal of Econometrics, 100, 319-356.

[8] Serfling, R.J. (1980): Approximation Theorems of Mathematical Statistics. New York: Wiley.

[9] Sherman, R.P. (1994): "Maximal inequalities for degenerate U-processes with applications to optimization estimators," Annals of Statistics, 22, 439-459.

[10] Sherman, R.P. (1993): "The limiting distribution of the maximum rank correlation estimator," Econometrica, 61, 123-137.
8 Appendix

Proof of Lemma 1. Consider three observations $(x_1, y_1), (x_2, y_2), (x_3, y_3)$ from the above model satisfying A0-A4a, and let $f(x_1, x_2, x_3)$ be any non-negative (w.p. 1) function defined on the support of $x = (x_1, x_2, x_3)$. We must show that for all $i \neq j \neq k \in \{1, 2, 3\}$,

$$E\left\{ f(x_1, x_2, x_3)\, 1\{x_1'\beta_0 < x_2'\beta_0 < x_3'\beta_0\} \left[ (y_1 - l)^2 + (y_2 - y_1)^2 + (y_3 - y_2)^2 + (y_3 - u)^2 \right] \right\}
\le E\left\{ f(x_1, x_2, x_3)\, 1\{x_1'\beta_0 < x_2'\beta_0 < x_3'\beta_0\} \left[ (y_i - l)^2 + (y_j - y_i)^2 + (y_k - y_j)^2 + (y_k - u)^2 \right] \right\}.$$

Proof. Let $Z = 1\{x_1'\beta_0 < x_2'\beta_0 < x_3'\beta_0\}$ and consider, w.l.o.g., the permutation $(2, 1, 3)$. Then we have

$$\begin{aligned}
& E\left\{ f Z \left[ (y_2 - l)^2 + (y_2 - y_1)^2 + (y_3 - y_1)^2 + (y_3 - u)^2 \right] \right\}
- E\left\{ f Z \left[ (y_1 - l)^2 + (y_2 - y_1)^2 + (y_3 - y_2)^2 + (y_3 - u)^2 \right] \right\} \\
&= E\left[ f Z \left\{ (y_2 - l)^2 + (y_3 - y_1)^2 - (y_1 - l)^2 - (y_3 - y_2)^2 \right\} \right].
\end{aligned}$$

The term inside $\{\cdot\}$ equals

$$L = -2 y_2 l - 2 y_1 y_3 + 2 y_1 l + 2 y_2 y_3.$$

Taking expectations conditional on $x_1, x_2, x_3$,

$$\begin{aligned}
E(L \mid x_1, x_2, x_3) &= -2 g(x_2'\beta_0)\, l - 2 g(x_1'\beta_0)\, g(x_3'\beta_0) + 2 g(x_1'\beta_0)\, l + 2 g(x_2'\beta_0)\, g(x_3'\beta_0) \\
&= 2 \left[ g(x_3'\beta_0) - l \right] \left[ g(x_2'\beta_0) - g(x_1'\beta_0) \right].
\end{aligned}$$

Note that when $Z = 1$, we have $x_1'\beta_0 < x_2'\beta_0 < x_3'\beta_0$. Given that $g(\cdot)$ is monotone increasing and that $l \le g(x_i'\beta_0) \le u$ for $i = 1, 2, 3$ w.p. 1 in $x$, we have

$$\left[ g(x_3'\beta_0) - l \right] \left[ g(x_2'\beta_0) - g(x_1'\beta_0) \right] \ge 0 \quad \text{w.p. 1}.$$

Therefore

$$E\left\{ f Z \left[ (y_2 - l)^2 + (y_2 - y_1)^2 + (y_3 - y_1)^2 + (y_3 - u)^2 \right] \right\} \ge E\left\{ f Z \left[ (y_1 - l)^2 + (y_2 - y_1)^2 + (y_3 - y_2)^2 + (y_3 - u)^2 \right] \right\}.$$

The other permutations are handled in exactly the same way.

Proof of Theorem 1.

Step 1: Minimization. Now,

$$E\, Q_3(\beta) = \sum_{\sigma} E\left\{ \left[ (y_{\sigma(1)} - l)^2 + (y_{\sigma(2)} - y_{\sigma(1)})^2 + (y_{\sigma(3)} - y_{\sigma(2)})^2 + (y_{\sigma(3)} - u)^2 \right] 1\{x_{\sigma(1)}'\beta < x_{\sigma(2)}'\beta < x_{\sigma(3)}'\beta\} \right\},$$

where the sum runs over the six permutations $\sigma$ of $\{1, 2, 3\}$. Write the first term (the one with $\sigma$ equal to the identity) as the sum of six terms, one for each ordering of $x_1'\beta_0, x_2'\beta_0, x_3'\beta_0$:

$$T_{1\cdot} = \sum_{\nu} E\left\{ \left[ (y_1 - l)^2 + (y_1 - y_2)^2 + (y_2 - y_3)^2 + (y_3 - u)^2 \right] 1\{x_1'\beta < x_2'\beta < x_3'\beta\}\, 1\{x_{\nu(1)}'\beta_0 < x_{\nu(2)}'\beta_0 < x_{\nu(3)}'\beta_0\} \right\} \equiv T_{11} + T_{12} + \dots + T_{16},$$

where $\nu$ runs over the six permutations of $\{1, 2, 3\}$. Let

$$S_{1\cdot} = \sum_{\nu} E\left\{ \left[ (y_{\nu(1)} - l)^2 + (y_{\nu(2)} - y_{\nu(1)})^2 + (y_{\nu(3)} - y_{\nu(2)})^2 + (y_{\nu(3)} - u)^2 \right] 1\{x_1'\beta < x_2'\beta < x_3'\beta\}\, 1\{x_{\nu(1)}'\beta_0 < x_{\nu(2)}'\beta_0 < x_{\nu(3)}'\beta_0\} \right\} \equiv S_{11} + S_{12} + \dots + S_{16};$$

that is, in each $S_{1\nu}$ the $y$'s are arranged according to the ordering of the true index $x'\beta_0$. Lemma 1 with $f(x_1, x_2, x_3) = 1\{x_1'\beta < x_2'\beta < x_3'\beta\}$ implies

$$T_{1\nu} \ge S_{1\nu} \quad \text{for } \nu = 1, \dots, 6, \qquad (8)$$

so that $T_{1\cdot} \ge S_{1\cdot}$. Similarly, writing the term of $E\, Q_3(\beta)$ with indicator $1\{x_1'\beta < x_3'\beta < x_2'\beta\}$ as

$$T_{2\cdot} = E\left\{ \left[ (y_1 - l)^2 + (y_3 - y_1)^2 + (y_2 - y_3)^2 + (y_2 - u)^2 \right] 1\{x_1'\beta < x_3'\beta < x_2'\beta\} \right\} = T_{21} + T_{22} + \dots + T_{26}$$

and defining $S_{2\cdot} = S_{21} + \dots + S_{26}$ analogously (the same pair of indicators in each term, with the $y$'s arranged according to the $\beta_0$-ordering), Lemma 1 gives $T_{2\cdot} \ge S_{2\cdot}$, and so on up to $T_{6\cdot} \ge S_{6\cdot}$. Therefore

$$E(Q_3(\beta)) = \sum_{\mu=1}^{6} T_{\mu\cdot} \ge \sum_{\mu=1}^{6} S_{\mu\cdot}.$$

Now note that, because the six indicators of the orderings under $\beta$ sum to one w.p. 1,

$$\sum_{\mu=1}^{6} S_{\mu 1} = E\left\{ \left[ (y_1 - l)^2 + (y_1 - y_2)^2 + (y_2 - y_3)^2 + (y_3 - u)^2 \right] 1\{x_1'\beta_0 < x_2'\beta_0 < x_3'\beta_0\} \right\},$$

$$\sum_{\mu=1}^{6} S_{\mu 2} = E\left\{ \left[ (y_1 - l)^2 + (y_3 - y_1)^2 + (y_2 - y_3)^2 + (y_2 - u)^2 \right] 1\{x_1'\beta_0 < x_3'\beta_0 < x_2'\beta_0\} \right\},$$

and so on, up to

$$\sum_{\mu=1}^{6} S_{\mu 6} = E\left\{ \left[ (y_3 - l)^2 + (y_3 - y_2)^2 + (y_2 - y_1)^2 + (y_1 - u)^2 \right] 1\{x_3'\beta_0 < x_2'\beta_0 < x_1'\beta_0\} \right\},$$

so that

$$\sum_{\mu=1}^{6} S_{\mu\cdot} = \sum_{\mu=1}^{6} S_{\mu 1} + \sum_{\mu=1}^{6} S_{\mu 2} + \dots + \sum_{\mu=1}^{6} S_{\mu 6} = E(Q_3(\beta_0)).$$

This proves that $\beta_0$ minimizes $E(Q_3(\beta))$.

Uniqueness. Analogous to Cavanagh and Sherman (1998), pp. 352-354.

Step 2: Uniform convergence. The uniform convergence of the objective function to its expectation follows from the results in Sherman (1994) for general U-processes, given the Euclidean properties of the underlying class of functions.

Step 3: Continuity. We want to show that for any sequence $\beta_k \to \beta$, $E(Q_3(\beta_k)) \to E(Q_3(\beta))$ as $k \to \infty$. Consider

$$\begin{aligned}
t(\beta_k) - t(\beta) &= \left[ (y_1 - l)^2 + (y_1 - y_2)^2 + (y_2 - y_3)^2 + (y_3 - u)^2 \right] 1\{x_1'\beta_k < x_2'\beta_k < x_3'\beta_k\} \\
&\quad - \left[ (y_1 - l)^2 + (y_1 - y_2)^2 + (y_2 - y_3)^2 + (y_3 - u)^2 \right] 1\{x_1'\beta < x_2'\beta < x_3'\beta\} \\
&= \left[ (y_1 - l)^2 + (y_1 - y_2)^2 + (y_2 - y_3)^2 + (y_3 - u)^2 \right] \left( 1\{x_1'\beta_k < x_2'\beta_k < x_3'\beta_k\} - 1\{x_1'\beta < x_2'\beta < x_3'\beta\} \right).
\end{aligned}$$

For $\beta_k$ lying in a small enough neighborhood of $\beta$, the indicators $1\{x_1'\beta_k < x_2'\beta_k < x_3'\beta_k\}$ and $1\{x_1'\beta < x_2'\beta < x_3'\beta\}$ must be either both 0 or both 1 w.p. 1. Suppose not. Then, for instance, we would have with positive probability $x_1'\beta_k < x_2'\beta_k$ and $x_1'\beta \ge x_2'\beta$, i.e.

$$(x_1 - x_2)'\beta_k < 0 \le (x_1 - x_2)'\beta.$$

Taking limits as $k \to \infty$ gives $(x_1 - x_2)'\beta \le 0 \le (x_1 - x_2)'\beta$, which can hold iff $(x_1 - x_2)'\beta = 0$, a zero-probability event under assumption A2. It follows that, w.p. 1, $t(\beta_k) - t(\beta) \to 0$. Now assumption A4a, together with the dominated convergence theorem, implies $E(t(\beta_k) - t(\beta)) \to 0$. A similar argument holds for the other terms in $Q_3(\beta)$.

Outline of the proof of Theorem 2.

Proof. Analogous to the proof of Theorem 4 of Sherman (1993), one can write the criterion function as

$$\Gamma_n(\theta) = \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + \frac{1}{\sqrt{n}}(\theta - \theta_0)' W_n + o_p\!\left(\|\theta - \theta_0\|^2\right) + o_p\!\left(n^{-3/2}\right),$$

where

$$W_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \nabla_1 \tau(z_i, \theta_0), \qquad V = \frac{1}{3} E\{\nabla_2 \tau(z, \theta_0)\}.$$

The following steps outline the main argument. First,

$$\Gamma_n(\theta) \equiv \binom{n}{3}^{-1} \sum_{i < j < k} Q_3(z_i, z_j, z_k; \beta(\theta)) = \frac{1}{n}\sum_{i=1}^{n} \tau(z_i, \beta(\theta)) - 2 E(Q_3(\beta(\theta))) + U_n^{(3)} h(\theta),$$

where $U_n^{(3)} h(\theta) = \binom{n}{3}^{-1} \sum_{i < j < k} h(z_i, z_j, z_k; \theta)$ is a degenerate U-process indexed by $\theta$ with kernel

$$h(z_i, z_j, z_k; \theta) = Q_3(z_i, z_j, z_k; \beta(\theta)) - \left[ E_{23} Q_3(z_i, \cdot, \cdot; \beta(\theta)) + E_{13} Q_3(\cdot, z_j, \cdot; \beta(\theta)) + E_{12} Q_3(\cdot, \cdot, z_k; \beta(\theta)) \right] + 2 E(Q_3(\beta(\theta))).$$

Using properties of degenerate U-processes as in Sherman (1994, Corollary 8), one can show that this degenerate U-process is small ($o_p(n^{-3/2})$) uniformly in $\theta$. Thus we have

$$\Gamma_n(\theta) \simeq \frac{1}{n}\sum_{i=1}^{n} \tau(z_i, \beta(\theta)) - 2 E(Q_3(\beta(\theta))). \qquad (9)$$

Using a second-order Taylor expansion of $\tau(\cdot, \theta)$ around $\tau(\cdot, \theta_0)$ and using the assumptions to bound the remainder terms, we have

$$\frac{1}{n}\sum_{i=1}^{n} \tau(z_i, \beta(\theta)) \simeq (\theta - \theta_0)' \frac{1}{n}\sum_{i=1}^{n} \nabla_1 \tau(z_i, \beta(\theta_0)) + \frac{1}{2}(\theta - \theta_0)' \left( \frac{1}{n}\sum_{i=1}^{n} \nabla_2 \tau(z_i, \beta(\theta_0)) \right) (\theta - \theta_0) + \text{terms that do not depend on } \theta, \qquad (10)$$

and

$$\begin{aligned}
E(Q_3(\beta(\theta))) &\simeq E(Q_3(\beta(\theta_0))) + (\theta - \theta_0)' \nabla E(Q_3(\beta(\theta_0))) + \frac{1}{2}(\theta - \theta_0)' \left( \nabla_2 E(Q_3(\beta(\theta_0))) \right) (\theta - \theta_0) \\
&= \frac{1}{2}(\theta - \theta_0)' \left( \nabla_2 E(Q_3(\beta(\theta_0))) \right) (\theta - \theta_0) + \text{terms that do not depend on } \theta,
\end{aligned}$$

where the last line uses the fact that $\theta_0$ minimizes $E(Q_3(\beta(\theta)))$, so that the gradient term vanishes. From (9) and (10), we have

$$\Gamma_n(\theta) = (\theta - \theta_0)' \frac{1}{n}\sum_{i=1}^{n} \nabla_1 \tau(z_i, \theta_0) + \frac{1}{2}(\theta - \theta_0)' \left( E\{\nabla_2 \tau(z_i, \theta_0)\} \right) (\theta - \theta_0) - (\theta - \theta_0)' \left( \nabla_2 E(Q_3(\beta(\theta_0))) \right) (\theta - \theta_0) + \text{terms that do not depend on } \theta.$$

Using the fact that

$$E(Q_3(\beta(\theta))) = \frac{1}{3} E\{\tau(z, \beta(\theta))\},$$

we have

$$\begin{aligned}
\Gamma_n(\theta) &= (\theta - \theta_0)' \frac{1}{n}\sum_{i=1}^{n} \nabla_1 \tau(z_i, \theta_0) + \frac{1}{3}\cdot\frac{1}{2}(\theta - \theta_0)' \left( E\{\nabla_2 \tau(z_i, \theta_0)\} \right) (\theta - \theta_0) + \text{terms that do not depend on } \theta \\
&= \frac{1}{\sqrt{n}}(\theta - \theta_0)' W_n + \frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + \text{terms that do not depend on } \theta,
\end{aligned}$$

where

$$W_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \nabla_1 \tau(z_i, \theta_0), \qquad V = \frac{1}{3} E\{\nabla_2 \tau(z_i, \theta_0)\}.$$

The result follows by noting that the minimizer of $\Gamma_n(\theta)$ and the minimizer of $\frac{1}{2}(\theta - \theta_0)' V (\theta - \theta_0) + \frac{1}{\sqrt{n}}(\theta - \theta_0)' W_n$ differ by at most an $o_p(n^{-1/2})$ term (Theorem 2, Sherman, 1993). The latter minimizer is $\theta_0 - \frac{1}{\sqrt{n}} V^{-1} W_n$. Since $W_n \to_d N(0, \Sigma)$ by an ordinary CLT, this latter minimizer, and therefore the former minimizer $\hat\theta$, satisfies

$$\sqrt{n}(\hat\theta - \theta_0) \to_d N\left(0,\ V^{-1} \Sigma V^{-1}\right).$$