Inference for ordered parameters in multinomial distributions

Shifeng Xiong, Guoying Li*

Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing 100080, China

Abstract

This paper discusses inference for ordered parameters of multinomial distributions. We first show that the asymptotic distributions of their maximum likelihood estimators (MLEs) are not always normal, and that the bootstrap distribution estimators of the MLEs can be inconsistent. A class of weighted sum estimators (WSEs) of the ordered parameters is then proposed. Properties of the WSEs are studied, including their asymptotic normality. Based on these results, large sample inferences for smooth functions of the ordered parameters can be made. In particular, confidence intervals for the maximum cell probability are constructed. Simulation results indicate that this interval estimation performs much better than the bootstrap approaches in the literature. Finally, the above results for ordered parameters of multinomial distributions are extended to more general distribution models.

Keywords: multinomial distribution, ordered parameters, weighted sum estimator, asymptotic normality

1  Introduction

Let Xn = (Xn1, ..., Xnk) be the cell frequencies from a multinomial distribution MNDk(n; p), where Σ_{i=1}^k Xni = n, with parameter p = (p1, ..., pk), pi > 0, i = 1, ..., k, and Σ_{i=1}^k pi = 1. Denote the ith largest component of p by p[i] and let p[·] = (p[1], ..., p[k]); these are called the ordered parameters of MNDk(n; p). Inference for p[·] often arises in practice. For example, Ethier (1982) applied inference for p[1] to the problem of testing for favorable numbers on a roulette wheel, and Gelfand et al. (1992) analyzed a crime data set using interval estimation of p[1].

* Corresponding author. Tel: +86-10-62651425; fax: +86-10-82568364. E-mail address: [email protected]

Recently, Li (2006)

discussed estimation of gene expression profiles based on EST (Expressed Sequence Tag) samples; the corresponding statistical issue is to estimate the ordered parameters of a multinomial distribution with unknown k. Furthermore, some other statistical problems are related to inference for p[·], for example selecting the most or least likely event (Alam and Thompson, 1972) and the study of qualitative dispersion (Dykstra et al., 2002).

In more general models, inference for ordered parameters is also useful in both theoretical and practical problems. Let θ1, ..., θk be (part of) the parameters of a statistical model and θ[1] ≥ ··· ≥ θ[k] be the ordered parameters. Silvapulle and Sen (2005) presented many examples in which one needs to answer whether H: θ1 < 0, ..., θk < 0 (i.e. θ[1] < 0) holds. Also, multiple comparisons with the best (Hsu, 1996) and intrinsic diversity profiles (Marcheselli, 2000, 2003) are essentially inference problems for θ[1], ..., θ[k].

Little of the literature discusses inference for ordered parameters in general models. However, several studies concern inference for p[1] or p[k] in multinomial distributions, including testing hypotheses about p[1] (Ethier, 1982; Xiong and Li, 2005), point and interval estimation of p[1] (Gelfand et al., 1992), and interval estimation of p[1] and p[k] (Glaz and Sison, 1999). The inferences in these papers are mostly based on the MLE of p[1] or p[k]. However, as shown in Section 2, the MLE of p[1] (p[k]) overestimates (underestimates) it on average. Moreover, if p[1] = p[2] (p[k−1] = p[k]), the MLE of p[1] (p[k]) is not asymptotically normal, and the bootstrap distribution estimator of the MLE of p[1] (p[k]) is not consistent. Thus the performance of the bootstrap approaches to interval estimation of p[1] (p[k]) is unstable (see Gelfand et al., 1992; Glaz and Sison, 1999; and Section 4 below).

In this paper we continue the discussion of inference for functions of p[·]. In Section 3, a class of weighted sum estimators (WSEs) of p[·] is proposed, and their properties, including strong consistency and asymptotic normality, are proved. Based on these results, large sample inferences for smooth functions of the ordered parameters can be made easily in one-sample and two-sample cases. In particular, confidence intervals for p[1] can be constructed. Section 4 is devoted to simulation studies: the proposed interval estimation of p[1] is compared with two bootstrap approaches (Glaz and Sison, 1999). Section 5 extends the results of Sections 2 and 3 to more general cases and provides their proofs. Therefore our method for ordered parameters can be used in many distribution models.

2  Discussion of the MLE

All theorems in this section and the next are special cases of corresponding theorems in Section 5, so their proofs are not presented here. Define Xn[1] ≥ ··· ≥ Xn[k] to be the order statistics of Xn. Obviously, the MLE of p[·] is

    p̂n[·] = (p̂n[1], ..., p̂n[k]) = (Xn[1]/n, ..., Xn[k]/n).

We now investigate properties of this MLE.

First, we study the biases of p̂n[1] and p̂n[k]. The functions f1(x1, ..., xk) = max_{1≤i≤k} xi and f2(x1, ..., xk) = min_{1≤i≤k} xi are convex and concave, respectively. By Jensen's inequality,

    E(Xn[1]/n) ≥ p[1],    E(Xn[k]/n) ≤ p[k].

Gelfand et al. (1992) and Glaz and Sison (1999) pointed out the first inequality. In fact, both inequalities hold strictly for all p.

Theorem 2.1. E p̂n[1] > p[1]; E p̂n[k] < p[k].
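The strict bias in Theorem 2.1 is easy to check by simulation. The sketch below (standard-library Python; the sample size, repetition count and the uniform p are arbitrary choices of ours) draws multinomial samples and averages the largest and smallest relative frequencies:

```python
import random
from collections import Counter

random.seed(0)
k, n, reps = 4, 50, 20000
p = [0.25] * k          # uniform p: here p[1] = p[k] = 1/4

sum_max = sum_min = 0.0
for _ in range(reps):
    draws = random.choices(range(k), weights=p, k=n)
    counts = Counter(draws)
    freqs = [counts.get(i, 0) for i in range(k)]
    sum_max += max(freqs) / n    # MLE of p[1] in this replicate
    sum_min += min(freqs) / n    # MLE of p[k] in this replicate

# The first average sits clearly above 0.25, the second clearly below it.
print(sum_max / reps, sum_min / reps)
```

The gap does not vanish here even as reps grows, since E p̂n[1] > p[1] holds for each fixed n.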

Usually, p[1] and p[k] are of most interest among the ordered parameters p[i], i = 1, ..., k. From this theorem we see that the MLE of p[1] (p[k]) overestimates (underestimates) the true parameter on average.

Second, we discuss asymptotic properties of the MLE p̂n[·]. Obviously it is strongly consistent. However, asymptotic normality does not always hold for it. Given a vector y = (y1, ..., yk), denote the distinct elements of y by {ȳ1, ..., ȳl}, where ȳ1 > ··· > ȳl. Let {ri1, ..., rik_i} = {j : yj = ȳi}, where ri1 < ··· < rik_i, i = 1, ..., l. For the given y, define the function hy: R^k → R^k by hy(x1, ..., xk) = (x̃1, ..., x̃l), where the components of the vector x̃i are the nonincreasing rearrangement of x_{ri1}, ..., x_{rik_i}, i = 1, ..., l.

Theorem 2.2.

    √n (p̂n[·] − p[·]) →L hp(ξ),

where "→L" denotes convergence in distribution, ξ ∼ N(0, (pi(δij − pj))k×k), and δij = 1 for i = j, δij = 0 for i ≠ j.

The above theorem indicates that the limit distribution of p̂n[i] is not always normal. For example, if k1 = #{i : pi = p[1]} > 1, then √n(p̂n[1] − p[1]) converges in distribution to max_{1≤i≤k1} ζi, which does not have a normal distribution, where (ζ1, ..., ζk1) ∼ N(0, (p[1](δij − p[1]))k1×k1). Note that this distribution depends on k1, which is usually unknown and, being a dimension-like quantity, difficult to estimate (Xiong and Li, 2005). Therefore it is difficult to construct tests and interval estimates for p[1] directly from the asymptotic distribution of p̂n[1], except for testing H0: p = (1/k, ..., 1/k) against H1: p ≠ (1/k, ..., 1/k). It is worth noting that the distribution of p̂n[1] in the case p = (1/k, ..., 1/k) was studied by several authors: Yusas (1972) discussed bounds on the distribution function of p̂n[1] for any fixed n and k, and Kolchin et al. (1978, Section II.6) derived the asymptotic distribution of p̂n[1] as n, k → ∞. Their results cannot be applied to make inference for all possible p[1] either.

Finally, we show that the bootstrap distribution estimators of the MLEs of p[i], i = 1, ..., k, are not always consistent, i.e. they do not always tend to the limit distributions of the MLEs. The following example shows that when #{i : pi = p[1]} > 1, the bootstrap distribution estimator of p̂n[1] is inconsistent. General results can be obtained similarly.

Example 2.1. Suppose that (Xn1, Xn2) ∼ MND2(n; (1/2, 1/2)). If (X*n1, X*n2) is a bootstrap sample from MND2(n; (Xn1/n, Xn2/n)), then for each x ∈ R,

    P(√n (X*n[1]/n − Xn[1]/n) ≤ x | Xn1, Xn2) →L I(Z > −2x or Z < 2x) (Φ(2|Z| + 2x) + Φ(2x) − 1),

where Z ∼ N(0, 1) and Φ is the distribution function of Z.

Proof. For each x ∈ R,

    P(√n (X*n[1]/n − Xn[1]/n) ≤ x | Xn1, Xn2)
    = P(√n (X*n1/n − Xn[1]/n) ≤ x, √n (X*n2/n − Xn[1]/n) ≤ x | Xn1, Xn2)
    = P(√n (X*n1/n − Xn1/n) ≤ x, √n (X*n2/n − Xn2/n) ≤ x + √n (Xn1/n − Xn2/n) | Xn1, Xn2) · I(Xn1 ≥ Xn2)
      + P(√n (X*n1/n − Xn1/n) ≤ x + √n (Xn2/n − Xn1/n), √n (X*n2/n − Xn2/n) ≤ x | Xn1, Xn2) · I(Xn1 < Xn2).

From Shao and Tu (1995, p. 80), given (Xn1, Xn2), the conditional distribution of (√n (X*n1/n − Xn1/n), √n (X*n2/n − Xn2/n)) converges almost surely to the limit distribution of (√n (Xn1/n − 1/2), √n (Xn2/n − 1/2)), i.e. the distribution of (Z/2, −Z/2). Denote the distribution function of (Z/2, −Z/2) by F(·, ·). For all u, v ∈ R,

    F(u, v) = P(Z/2 ≤ u, −Z/2 ≤ v) = (Φ(2u) + Φ(2v) − 1) I(u + v ≥ 0),

so F is a continuous function. Therefore,

    P(√n (X*n1/n − Xn1/n) ≤ x, √n (X*n2/n − Xn2/n) ≤ x + √n (Xn1/n − Xn2/n) | Xn1, Xn2)
        − F(x, x + √n (Xn1/n − Xn2/n)) → 0 almost surely;
    P(√n (X*n1/n − Xn1/n) ≤ x + √n (Xn2/n − Xn1/n), √n (X*n2/n − Xn2/n) ≤ x | Xn1, Xn2)
        − F(x + √n (Xn2/n − Xn1/n), x) → 0 almost surely.

Thus the remainder Rn below tends to 0 almost surely, and we have

    P(√n (X*n[1]/n − Xn[1]/n) ≤ x | Xn1, Xn2)
    = F(x, x + √n (Xn1/n − Xn2/n)) I(Xn1 ≥ Xn2) + F(x + √n (Xn2/n − Xn1/n), x) I(Xn1 < Xn2) + Rn
    →L F(x, x + Z) I(Z > 0) + F(x − Z, x) I(Z < 0)
    = I(Z > −2x or Z < 2x) (Φ(2|Z| + 2x) + Φ(2x) − 1). □

Gelfand et al. (1992) and Glaz and Sison (1999) derived bootstrap confidence intervals for p[1] using bootstrap distribution estimators of p̂n[1], or of statistics asymptotically equivalent to p̂n[1]. From the above discussion we see that such bootstrap intervals are inconsistent when #{i : pi = p[1]} > 1.

Remark 2.1. It is difficult to make large sample inference for p[1] (p[k]) directly from the asymptotic properties of its MLE. However, in small sample cases we can use the MLE to construct tests for p[1] (p[k]) by applying certain inequalities (see Ethier, 1982; Xiong and Li, 2005).
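The randomness of the limit in Example 2.1 can be seen in a small experiment: the bootstrap probability P*(√n (X*n[1]/n − Xn[1]/n) ≤ 0) fluctuates from data set to data set instead of settling at one number. A sketch (the function name and all tuning constants are ours):

```python
import random

def boot_cdf_at_zero(n, boot_reps, rng):
    # One data set from MND_2(n; (1/2, 1/2)), then the bootstrap estimate of
    # P( sqrt(n) (X*_[1]/n - X_[1]/n) <= 0 | X_n1, X_n2 ).
    x1 = sum(rng.random() < 0.5 for _ in range(n))   # X_n1 ~ Binomial(n, 1/2)
    m = max(x1, n - x1)                              # observed X_[1]
    hits = 0
    for _ in range(boot_reps):
        y1 = sum(rng.random() < x1 / n for _ in range(n))  # resample from p-hat
        hits += max(y1, n - y1) <= m
    return hits / boot_reps

rng = random.Random(1)
vals = [boot_cdf_at_zero(400, 500, rng) for _ in range(8)]
# With a consistent bootstrap these would concentrate around a single value as n
# grows; here they keep fluctuating, in line with the random limit of Example 2.1.
print(vals)
```

The fluctuation reflects that the limit in Example 2.1 is itself a function of the random Z, not a fixed distribution function value.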

3  Weighted sum estimator

As seen in the previous section, the MLEs are not good estimators of the ordered parameters. Xiong and Li (2005) considered the estimator p̂[1]n = Σ_{i=1}^{k̂n} Xn[i] / (n k̂n) of p[1], where k̂n is a consistent estimator of #{i : pi = p[1]}, and proved its asymptotic normality. But this estimator has some shortcomings. First, it is not easy to obtain a good estimator of #{i : pi = p[1]}. Second, not using all Xn[i], i = 1, ..., k, may cause a loss of efficiency, because P(Xni0 = Xn[i]) > 0 for each i = 1, ..., k, where i0 ∈ {1, ..., k} and pi0 = p[1]. Third, this estimation method is difficult to extend to the estimation of p[i], i = 2, ..., k−1.

In this section we propose a class of new estimators of p[i], i = 1, ..., k, which are weighted sums of Xn[1], ..., Xn[k]; we call them weighted sum estimators (WSEs for short). The WSE p̃n[i] of p[i] is defined as follows. If Xn[i] = 0, let p̃n[i] = 0; otherwise, let

    p̃n[i] = (Σ_{j=1}^k wnj^[i] Xn[j]) / (n Σ_{j=1}^k wnj^[i]),    i = 1, ..., k,

where

    wnj^[i] = (Xn[i]/Xn[j])^λn for j < i,    wnj^[i] = (Xn[j]/Xn[i])^λn for j ≥ i,

and {λn} is a sequence of positive constants. In the rest of this section we discuss properties of the WSEs.

Note that p̃n[1] ≤ p̂n[1] and p̃n[k] ≥ p̂n[k]; the biases of p̃n[1] and p̃n[k] are usually smaller than those of the MLEs. We first show that (p̃n[1], ..., p̃n[k]) and (p[1], ..., p[k]) are ordered in the same way.

Theorem 3.1. p̃n[1] ≥ ··· ≥ p̃n[k], and p̃n[i] = p̃n[i+1] if and only if Xn[i] = Xn[i+1], i = 1, ..., k−1.
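The definition above translates directly into code. A sketch (the function name is ours; every weight ratio is at most 1, so order statistics far from Xn[i] get negligible weight when λn is large):

```python
def wse(counts, lam):
    """Weighted sum estimators of the ordered cell probabilities p[1] >= ... >= p[k].

    counts: observed frequencies (X_n1, ..., X_nk); lam: the constant lambda_n.
    """
    n = sum(counts)
    x = sorted(counts, reverse=True)        # X_[1] >= ... >= X_[k]
    k = len(x)
    est = []
    for i in range(k):
        if x[i] == 0:
            est.append(0.0)
            continue
        # w_nj^[i] = (X_[i]/X_[j])^lam for j < i and (X_[j]/X_[i])^lam for j >= i
        w = [(x[i] / x[j]) ** lam if j < i else (x[j] / x[i]) ** lam
             for j in range(k)]
        est.append(sum(wj * xj for wj, xj in zip(w, x)) / (n * sum(w)))
    return est

print(wse([10, 9, 1], lam=9))   # close frequencies are averaged, the outlier is not
```

In this example the two nearly tied frequencies are pulled toward their common value, while the small third frequency is left essentially at its MLE, illustrating Theorem 3.1 and the bias reduction relative to the MLE.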

Denote (p̃n[1], ..., p̃n[k]) by p̃n[·]. Next we discuss its consistency and asymptotic normality.

Theorem 3.2. If λn → ∞ as n → ∞, then p̃n[·] → p[·] almost surely.

Let {p̄1, ..., p̄l} = {p1, ..., pk} and {ri1, ..., rik_i} = {j : pj = p̄i}, where p̄1 > ··· > p̄l and ri1 < ··· < rik_i, i = 1, ..., l.

Theorem 3.3. Let ξ = (ξ1, ..., ξk) be the random vector given in Theorem 2.2. Suppose that, as n → ∞, λn/√n → 0 and n/e^{cλn} → 0 for each c > 0. Then

    √n (p̃n[·] − p[·]) →L (ξ̃1, ..., ξ̃l),

where

    ξ̃i = ((ξ_{ri1} + ... + ξ_{rik_i})/ki, ..., (ξ_{ri1} + ... + ξ_{rik_i})/ki)

is distributed according to a ki-dimensional normal distribution, i = 1, ..., l.

Remark 3.1. The conditions in Theorem 3.3 are easily satisfied. For example, we can set λn = cn^d, where c > 0 and d ∈ (0, 1/2) are constants. In the rest of this section we assume that {λn} satisfies these conditions.

From Theorem 3.3, for any smooth function of p[·] we can obtain an asymptotically normal estimator.

Corollary 3.1. Denote ui = #{j : pj = p[i]}, i = 1, ..., k. Then, for i = 1, ..., k,

    √n (p̃n[i] − p[i]) →L 0 if p1 = ··· = pk = 1/k;
    √n (p̃n[i] − p[i]) →L N(0, (p[i] − ui p[i]²)/ui) otherwise.

Remark 3.2. From the proof of Theorem 5.5, Σ_{j=1}^k wnj^[i] is a consistent estimator of ui. Thus we can take

    σ̂ni² = p̃n[i] (1 − ûni p̃n[i]) / ûni

as an estimator of the asymptotic variance, where

    ûni = Σ_{j=1}^k wnj^[i] if 1 − p̃n[i] Σ_{j=1}^k wnj^[i] > 0;
    ûni = max{j : j is a positive integer with 1 − j p̃n[i] > 0} otherwise.

Therefore, if p ≠ (1/k, ..., 1/k),

    √n (p̃n[i] − p[i]) / σ̂ni →L N(0, 1).

Following Glaz and Sison (1999), define r(p) = p[1] − p[k] and ζ(p) = p[1]/p[k], which can be used to measure the dispersion of p. For them we have the two results below.

Corollary 3.2. √n ((p̃n[1] − p̃n[k]) − r(p)) →L 0 if p1 = ··· = pk = 1/k; otherwise it converges in distribution to

    N(0, (p[1] − u1 p[1]²)/u1 + (p[k] − uk p[k]²)/uk + 2 p[1] p[k]).

Corollary 3.3. √n (p̃n[1]/p̃n[k] − ζ(p)) →L 0 if p1 = ··· = pk = 1/k; otherwise it converges in distribution to

    N(0, (1/p[k]²)(p[1] − u1 p[1]²)/u1 + (p[1]²/p[k]⁴)(p[k] − uk p[k]²)/uk + 2 p[1]²/p[k]²).

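Remark 3.3 below replaces the WSE vector by its projection onto the probability simplex. That adjustment can be sketched as follows (the function name is ours, and the input is assumed to be the nonincreasing WSE vector):

```python
def project_wse(p_tilde):
    """Project the WSE vector onto the simplex, as described in Remark 3.3."""
    k = len(p_tilde)
    shift = (1.0 - sum(p_tilde)) / k
    q = [v + shift for v in p_tilde]        # projection onto {x : sum x_i = 1}
    if min(q) >= 0:
        return q
    # Second case: zero the negative tail and spread its (negative) mass over
    # the s nonnegative entries; for large n this branch cannot occur.
    s = sum(1 for v in q if v >= 0)
    spread = sum(v for v in q if v < 0) / s
    return [v + spread if v >= 0 else 0.0 for v in q]

print(project_wse([0.6, 0.6, 0.01]))
```

The first step only recenters the vector; the second step is reached when the recentered vector leaves the nonnegative orthant, as in the printed example.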
Remark 3.3. Computation shows that usually Σ_{i=1}^k p̃n[i] ≠ 1. To improve this, we can replace p̃n[·] by

    p̄n[·] = arg min{‖p − p̃n[·]‖ : p ∈ D},

where D = {(p1, ..., pk) : pi ≥ 0, i = 1, ..., k, Σ_{i=1}^k pi = 1} and ‖·‖ is the Euclidean norm. We now give the explicit expression of p̄n[·]. Denote the projection of p̃n[·] onto π = {(x1, ..., xk) : Σ_{i=1}^k xi = 1} by p̃^π = (p̃1^π, ..., p̃k^π); it is easy to see that p̃i^π = p̃n[i] + (1 − Σ_{j=1}^k p̃n[j])/k, i = 1, ..., k. If p̃i^π ≥ 0 for each i = 1, ..., k, then p̄n[·] = p̃^π. Otherwise, assuming p̃1^π ≥ ··· ≥ p̃s^π ≥ 0 > p̃_{s+1}^π ≥ ··· ≥ p̃k^π with 1 ≤ s ≤ k−1, p̄n[·] equals the projection

    (p̃1^π + Σ_{i=s+1}^k p̃i^π / s, ..., p̃s^π + Σ_{i=s+1}^k p̃i^π / s, 0, ..., 0)

of p̃^π onto {(x1, ..., xk) : Σ_{i=1}^k xi = 1, x_{s+1} = ··· = xk = 0}. When n is sufficiently large the second case cannot occur, so the second projection is not needed. Theorems 3.1, 3.2 and 3.3 also hold for p̄n[·].

The WSEs are also applicable in two-sample cases. For example, we have

Corollary 3.4. Let Xm ∼ MNDk(m; p) and Yn ∼ MNDk(n; q), where p = (p1, ..., pk) and q = (q1, ..., qk), and suppose that Xm is independent of Yn. Then, as m → ∞, n → ∞ and m/n → µ ∈ (0, ∞),

    √(mn/(m+n)) ((p̃m[i] − q̃n[i]) − (p[i] − q[i])) →L 0 if p1 = ··· = pk = q1 = ··· = qk = 1/k;

otherwise it converges in distribution to

    N(0, (p[i] − ui p[i]²)/((1+µ)ui) + µ(q[i] − vi q[i]²)/((1+µ)vi)),

for i = 1, ..., k, where q[i], q̃n[i] and vi for MNDk(n; q) are defined in the same way as p[i], p̃m[i] and ui for MNDk(m; p), respectively.

By use of the above results, large sample inferences for smooth functions of the ordered parameters in one-sample and two-sample cases can be made. Generally speaking, p[1] is of the most concern among p[i], i = 1, ..., k. Xiong and Li (2005) constructed tests of one-sided hypotheses concerning p[1] based on another asymptotically normal estimator of it. Next we construct an interval estimate of p[1] based on its WSE.

Let α ∈ (0, 1) and let u_{α/2} be the upper α/2 quantile of the standard normal distribution. Note that p[1] ∈ [1/k, 1). Then an asymptotic 1 − α confidence interval for p[1] is [an, bn], where an = max{1/k, p̃n[1] − u_{α/2} σ̂n1/√n} and bn = min{1, p̃n[1] + u_{α/2} σ̂n1/√n}. From Remark 3.2 it follows that if p ≠ (1/k, ..., 1/k),

    P(p[1] ∈ [an, bn]) → 1 − α.

Applying Remark 3.2, one-sided and two-sided tests for p[1] can also be constructed.
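Combining the WSE of p[1], the variance estimate of Remark 3.2 and the truncation to [1/k, 1] gives the interval [an, bn] in a few lines. A sketch (the function name is ours; the normal quantile is taken from the standard library):

```python
import math
from statistics import NormalDist

def wse_ci_max(counts, lam, alpha=0.05):
    """Asymptotic 1 - alpha confidence interval [a_n, b_n] for p[1]."""
    n = sum(counts)
    x = sorted(counts, reverse=True)
    k = len(x)
    w = [(xj / x[0]) ** lam for xj in x]      # weights w_nj^[1]
    p1 = sum(wj * xj for wj, xj in zip(w, x)) / (n * sum(w))   # WSE of p[1]
    u = sum(w)                                # estimates u_1 = #{i : p_i = p[1]}
    if 1.0 - u * p1 <= 0.0:                   # fallback of Remark 3.2
        u = max((j for j in range(1, k + 1) if 1.0 - j * p1 > 0.0), default=1)
    sigma = math.sqrt(p1 * (1.0 - u * p1) / u)        # sigma-hat_n1
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sigma / math.sqrt(n)
    return max(1.0 / k, p1 - half), min(1.0, p1 + half)

lo, hi = wse_ci_max([25, 10, 8, 4, 3], lam=12)
print(lo, hi)
```

The truncation at 1/k and 1 reflects that p[1] always lies in [1/k, 1).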


Table 1: Confidence intervals for p[1], 1 − α = 0.95

                        n = 20, λ = 9          n = 50, λ = 12
                     Coverage   Average     Coverage   Average
                       Rate      Length       Rate      Length

(1) p = (0.5, 0.15, 0.15, 0.1, 0.1)
  Bootstrap I          0.768     0.359       0.903     0.272
  Bootstrap II         0.965     0.360       0.955     0.272
  WSE                  0.931     0.406       0.936     0.274

(2) p = (0.7, 0.075, 0.075, 0.075, 0.075)
  Bootstrap I          0.939     0.387       0.959     0.252
  Bootstrap II         0.941     0.388       0.959     0.252
  WSE                  0.950     0.388       0.933     0.251

(3) p = (0.3, 0.175, 0.175, 0.175, 0.175)
  Bootstrap I          0.842     0.293       0.734     0.199
  Bootstrap II         0.954     0.293       0.979     0.199
  WSE                  0.980     0.301       0.931     0.207

(4) p = (0.24, 0.24, 0.24, 0.24, 0.04)
  Bootstrap I          0.963     0.290       0.963     0.186
  Bootstrap II         0.449     0.290       0.132     0.186
  WSE                  0.903     0.310       0.916     0.197

(5) p = (0.25, 0.24, 0.24, 0.15, 0.12)
  Bootstrap I          0.962     0.289       0.957     0.190
  Bootstrap II         0.638     0.290       0.474     0.190
  WSE                  0.958     0.299       0.957     0.196

(6) p = (0.25, 0.25, 0.25, 0.15, 0.1)
  Bootstrap I          0.955     0.291       0.949     0.192
  Bootstrap II         0.586     0.292       0.361     0.192
  WSE                  0.946     0.305       0.948     0.201

(7) p = (0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05)
  Bootstrap I          0.939     0.257       0.681     0.170
  Bootstrap II         0.941     0.271       0.989     0.170
  WSE                  0.976     0.286       0.956     0.192

(8) p = (0.25, 0.25, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05)
  Bootstrap I          0.901     0.293       0.893     0.205
  Bootstrap II         0.970     0.298       0.933     0.205
  WSE                  0.964     0.345       0.959     0.225

(9) p = (0.28, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08, 0.08)
  Bootstrap I          0.532     0.288       0.771     0.214
  Bootstrap II         0.997     0.296       0.974     0.214
  WSE                  0.927     0.337       0.919     0.240

(10) p = (0.145, 0.095, 0.095, 0.095, 0.095, 0.095, 0.095, 0.095, 0.095, 0.095)
  Bootstrap I          0.998     0.243       0.987     0.153
  Bootstrap II         0.573     0.262       0.488     0.153
  WSE                  0.982     0.258       0.984     0.158

4  Simulation

This section presents numerical results comparing the interval estimation of p[1] based on the WSE with the two bootstrap methods of Glaz and Sison (1999). The comparison criteria are the coverage rate and the average length of the intervals. In our simulation study the confidence level is 1 − α = 0.95 and each setting is repeated 10,000 times. Ten different p's, given in Table 1, and two sample sizes, n = 20 and n = 50, are considered. For the choice of λn we tried λn = 1, 2, ..., n; larger λn's result in almost the same performance. Here we choose λn = 9 for n = 20 and λn = 12 for n = 50.

The simulation results are given in Table 1, from which we find the following. (i) The performance of the two bootstrap confidence intervals is not stable: their coverage rates vary from 0.132 to 0.997, and many are lower than 0.8. (ii) The coverage rates of the WSE confidence intervals are much closer to 0.95, all falling in [0.903, 0.984].

We conclude that the interval estimation based on the WSE performs much better than the two bootstrap approaches. Moreover, our method is computationally much simpler.
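The WSE rows of Table 1 can be reproduced in outline. The sketch below (helper names, the seed and the reduced 2,000-replication budget are ours) re-implements the interval of Section 3 and estimates its coverage for case (2) with n = 50; the result should land near the reported 0.933:

```python
import math, random
from statistics import NormalDist

def wse_interval(freqs, lam, alpha=0.05):
    # WSE interval for p[1] as constructed at the end of Section 3 (names ours).
    n, x = sum(freqs), sorted(freqs, reverse=True)
    w = [(xj / x[0]) ** lam for xj in x]
    p1 = sum(wj * xj for wj, xj in zip(w, x)) / (n * sum(w))
    u = sum(w)
    if 1.0 - u * p1 <= 0.0:
        u = max((j for j in range(1, len(x) + 1) if 1.0 - j * p1 > 0.0), default=1)
    half = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(p1 * (1 - u * p1) / u / n)
    return max(1.0 / len(x), p1 - half), min(1.0, p1 + half)

def coverage(p, n, lam, reps, rng):
    # Monte Carlo coverage rate of the interval for the true p[1] = max(p).
    p1, hit = max(p), 0
    for _ in range(reps):
        draws = rng.choices(range(len(p)), weights=p, k=n)
        freqs = [draws.count(i) for i in range(len(p))]
        lo, hi = wse_interval(freqs, lam)
        hit += lo <= p1 <= hi
    return hit / reps

rng = random.Random(0)
cov = coverage((0.7, 0.075, 0.075, 0.075, 0.075), n=50, lam=12, reps=2000, rng=rng)
print(cov)
```

The other rows of Table 1 correspond to changing `p`, `n` and `lam` accordingly; the bootstrap competitors of Glaz and Sison (1999) are not reproduced here.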

5  Generalization

Gelfand et al. (1992) mentioned how to make inference for the maximum coordinate of the location vector of a multivariate location-scale model. Here, using the method of Section 3, we discuss estimation of ordered parameters in more general distribution models. Generally speaking, for a given distribution model, if there exists an asymptotically normal estimator of its parameter vector, we can construct weighted sum estimators (WSEs) of the ordered parameters of the model that are asymptotically normal.

θ[1] > · · · > θ[k] be the nonincreasing permutation of θ1 , . . . , θk . Write θ[·] for (θ[1] , . . . , θ[k] ). Let ˆ n = (θˆn1 , . . . , θˆnk ) be a reasonable estimator of θ (MLE usually) with θˆni ∈ Θ ⊂ R, i = 1, . . . , k. θ ˆn. Consider the following three assumptions for θ Assumption 1. Assumption 2.

ˆ n = θ, and P(θˆni > θˆnj ) > 0, ∀ i 6= j. Eθ ˆ n is a (strongly) consistent estimator of θ. θ

Assumption 3. There exist a sequence {an } with an → ∞ (n → ∞) and a random vector η, L ˆ n − θ) −→ such that an (θ η. Remark 5.1. In the multinomial distribution M N Dk (n; p) discussed in the previous sections, √ the MLE of p, (Xn1 /n, . . . , Xnk /n), satisfies the three assumptions obviously. In this case, an = n and η = ξ (given in Theorem 2.2). Remark 5.2. Let X 1 , . . . , X n i.i.d ∼ k-dimensional normal distribution N (θ, Σ) with Σ > 0. ˆ n = (Pn Xi1 /n, . . . , Pn Xik /n), satisfies the three assumptions. The MLE of θ, θ i=1

i=1

Remark 5.3. Because the asymptotic properties, especially asymptotic normality, are of more interest here. For common distribution models, it is not difficult to find an estimator satisfying Assumption 3 and η follows normal distribution. ˆ n[·] = Let θˆn[1] > · · · > θˆn[k] be the nonincreasing permutation of θˆn1 , . . . , θˆnk . Then θ (θˆn[1] , . . . , θˆn[k] ) is a natural estimator of θ[·] . The following two theorems are extensions of Theorem 2.1 and Theorem 2.2 , respectively. Theorem 5.1. Under Assumption 1, we have E θˆn[1] > θ[1] ,

E θˆn[k] < θ[k] .

Proof. Let i0 ∈ {1, ..., k} with θi0 = θ[1], and denote the event {θ̂ni0 = θ̂n[1]} by A. By Assumption 1, P(Aᶜ) > 0. Then

    E θ̂n[1] = E[θ̂n[1] I(A)] + E[θ̂n[1] I(Aᶜ)] > E[θ̂ni0 I(A)] + E[θ̂ni0 I(Aᶜ)] = E θ̂ni0 = θ[1].

The second inequality can be obtained similarly. □

Theorem 5.2. Under Assumption 3, we have

    an(θ̂n[·] − θ[·]) →L hθ(η).

Proof. Let {θ̄1, ..., θ̄l} = {θ1, ..., θk} and {ri1, ..., rik_i} = {j : θj = θ̄i}, where θ̄1 > ··· > θ̄l and ri1 < ··· < rik_i, i = 1, ..., l. Define the random k-vector In = (In1, ..., Ink) to be a permutation of (1, ..., k) satisfying, for each i = 1, ..., l, In,ri1 < ··· < In,rik_i and {θ̂n,In,ri1, ..., θ̂n,In,rik_i} = {θ̂n[k1+···+k_{i−1}+1], ..., θ̂n[k1+···+ki]}. Note that θ̂n → θ in probability. Then for i = 1, ..., l,

    P({θ̂n[k1+···+k_{i−1}+1], ..., θ̂n[k1+···+ki]} = {θ̂n,ri1, ..., θ̂n,rik_i}) → 1,

and therefore

    P((In,ri1, ..., In,rik_i) = (ri1, ..., rik_i)) → 1.

It follows that P(In = (1, ..., k)) → 1. Then for x1, ..., xk ∈ R,

    P(∩_{i=1}^k {an(θ̂n,Ini − θi) ≤ xi})
    = P(∩_{i=1}^k {an(θ̂n,Ini − θi) ≤ xi} ∩ {In = (1, ..., k)}) + P(∩_{i=1}^k {an(θ̂n,Ini − θi) ≤ xi} ∩ {In ≠ (1, ..., k)})
    = P(∩_{i=1}^k {an(θ̂ni − θi) ≤ xi} ∩ {In = (1, ..., k)}) + o(1)
    = P(∩_{i=1}^k {an(θ̂ni − θi) ≤ xi}) + o(1).

It follows that

    an(θ̂n,In − θ) →L η,

where θ̂n,In = (θ̂n,In1, ..., θ̂n,Ink). Since an(θ̂n[·] − θ[·]) = hθ(an(θ̂n,In − θ)) and hθ is continuous, the proof is completed. □

Remark 5.4. Marcheselli (2000) essentially derived the result of this theorem using a generalized delta method. The proof presented here is more direct and more elementary.

The above two theorems show that the estimator θ̂n[·] of θ[·], obtained directly from the estimator θ̂n of θ, is not satisfactory. Thus we construct and investigate the WSE of θ[i]. Define

    θ̃n[i] = (Σ_{j=1}^k wnj^[i] θ̂n[j]) / (Σ_{j=1}^k wnj^[i]),    i = 1, ..., k,

where

    wnj^[i] = fn(θ̂n[i]) / fn(θ̂n[j]) for j < i,    wnj^[i] = fn(θ̂n[j]) / fn(θ̂n[i]) for j ≥ i,

and {fn} is a sequence of functions. Consider the following conditions on {fn}.

Condition 1. For each n = 1, 2, ..., fn is an increasing function from Θ, the parameter space of the θi, i = 1, ..., k, to the positive real numbers.

Condition 2. For each x, y ∈ Θ with x < y, fn(x + o(1)) / fn(y + o(1)) = o(1) (n → ∞).

Condition 3. For each x, y ∈ Θ and α, β ∈ R,

    fn(x + (α + o(1))/an) / fn(y + (β + o(1))/an) = 1 + o(1) if x = y;  = o(1/an) if x < y  (n → ∞).

Remark 5.5. Suppose that an = √n and that {λn} satisfies the conditions in Theorem 3.3. Let g be a strictly increasing and differentiable function on Θ. It is easy to show that fn(x) = exp(g(x)λn) satisfies the above three conditions. For the multinomial distribution, taking g(x) = log x in this fn yields exactly the WSEs discussed in Sections 3 and 4.

Theorem 5.3. Under Condition 1, θ̃n[1] ≥ ··· ≥ θ̃n[k].

Proof. We need to show that for u1 ≥ ··· ≥ uk, an increasing function f taking positive values, and i = 1, ..., k−1,

    (Σ_{j=1}^i (f(ui)/f(uj)) uj + Σ_{j=i+1}^k (f(uj)/f(ui)) uj) / (Σ_{j=1}^i f(ui)/f(uj) + Σ_{j=i+1}^k f(uj)/f(ui))
    ≥ (Σ_{j=1}^{i+1} (f(u_{i+1})/f(uj)) uj + Σ_{j=i+2}^k (f(uj)/f(u_{i+1})) uj) / (Σ_{j=1}^{i+1} f(u_{i+1})/f(uj) + Σ_{j=i+2}^k f(uj)/f(u_{i+1})).

Multiplying the numerator and denominator of the left-hand side by f(ui)/f(u_{i+1}), and noting that the j = i+1 terms on the right-hand side coincide, this is equivalent to

    (α1 + A)/(β1 + B) ≥ (α2 + A)/(β2 + B),

where

    α1 = Σ_{j=1}^i (f(ui)² / (f(uj) f(u_{i+1}))) uj,    β1 = Σ_{j=1}^i f(ui)² / (f(uj) f(u_{i+1})),
    α2 = Σ_{j=1}^i (f(u_{i+1}) / f(uj)) uj,    β2 = Σ_{j=1}^i f(u_{i+1}) / f(uj),
    A = Σ_{j=i+1}^k (f(uj) / f(u_{i+1})) uj,    B = Σ_{j=i+1}^k f(uj) / f(u_{i+1}).

Note that α1 β2 = α2 β1. Then we only need to prove (α1 − α2)B ≥ (β1 − β2)A. Since

    (α1 − α2)B − (β1 − β2)A = Σ_{s=1}^i Σ_{t=i+1}^k (f(ui)² / (f(u_{i+1}) f(us)) − f(u_{i+1}) / f(us)) (f(ut) / f(u_{i+1})) (us − ut) ≥ 0,

the theorem follows. □

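The choice fn(x) = exp(g(x)λn) of Remark 5.5 above is simple to implement for a general parameter estimate. The sketch below (the function name is ours) uses the identity for g, which suits location parameters such as normal means; g = log recovers the multinomial WSE of Section 3:

```python
import math

def general_wse(theta_hat, lam, g=lambda x: x):
    """WSE of the ordered parameters, with f_n(x) = exp(g(x) * lambda_n)."""
    t = sorted(theta_hat, reverse=True)      # theta-hat_[1] >= ... >= theta-hat_[k]
    k = len(t)
    out = []
    for i in range(k):
        # w_nj^[i] = f_n(t[i])/f_n(t[j]) for j < i, f_n(t[j])/f_n(t[i]) for j >= i
        w = [math.exp((g(t[i]) - g(t[j])) * lam) if j < i
             else math.exp((g(t[j]) - g(t[i])) * lam) for j in range(k)]
        out.append(sum(wj * tj for wj, tj in zip(w, t)) / sum(w))
    return out

# Sample means 3.1 and 3.0 are pulled toward their common value; 1.2 is left alone.
print(general_wse([3.1, 3.0, 1.2], lam=8))
```

Computing the weight ratios on the log scale, as above, avoids overflow of exp(g(x)λn) for large λn.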
Theorem 5.4. Under Assumption 2 and Condition 2, θ̃n[·] is a (strongly) consistent estimator of θ[·].

Proof. By Condition 2, if i > j and θ[i] < θ[j], then fn(θ̂n[i]) / fn(θ̂n[j]) → 0 in probability (almost surely). Therefore, if θ[j] ≠ θ[i], then wnj^[i] → 0 in probability (almost surely). For each i = 1, ..., k,

    |θ̃n[i] − θ[i]| = |Σ_{j=1}^k wnj^[i] (θ̂n[j] − θ[i]) / Σ_{s=1}^k wns^[i]|
    ≤ Σ_{j:θ[j]=θ[i]} (wnj^[i] / Σ_{s=1}^k wns^[i]) |θ̂n[j] − θ[j]| + Σ_{j:θ[j]≠θ[i]} (wnj^[i] / Σ_{s=1}^k wns^[i]) |θ̂n[j] − θ[i]|
    ≤ Σ_{j:θ[j]=θ[i]} |θ̂n[j] − θ[j]| + Σ_{j:θ[j]≠θ[i]} wnj^[i] |θ̂n[j] − θ[i]| → 0 in probability (almost surely),

which implies θ̃n[·] → θ[·] in probability (almost surely). □

Theorem 5.5. Under Assumption 3 and Condition 3, we have

    an(θ̃n[·] − θ[·]) →L (η̃1, ..., η̃l),

where

    η̃i = ((η_{ri1} + ... + η_{rik_i})/ki, ..., (η_{ri1} + ... + η_{rik_i})/ki)

is a random ki-vector, i = 1, ..., l.

Proof. Denote Zni = an(θ̂n[i] − θ[i]), i = 1, ..., k. If i > j, then

    fn(θ̂n[i]) / fn(θ̂n[j]) = fn(θ[i] + Zni/an) / fn(θ[j] + Znj/an) = 1 + op(1) if θ[i] = θ[j];  = op(1/an) if θ[i] < θ[j].

Therefore

    wnj^[i] = 1 + op(1) if θ[i] = θ[j];    wnj^[i] = op(1/an) if θ[i] ≠ θ[j],    i, j = 1, ..., k.

For each i = 1, ..., k,

    an(θ̃n[i] − θ[i]) = (Σ_{j:θ[j]=θ[i]} an wnj^[i] (θ̂n[j] − θ[i]) + Σ_{j:θ[j]≠θ[i]} an wnj^[i] (θ̂n[j] − θ[i])) / Σ_{j=1}^k wnj^[i].

We have

    Σ_{j:θ[j]=θ[i]} an wnj^[i] (θ̂n[j] − θ[i]) = Σ_{j:θ[j]=θ[i]} (an wnj^[i] (θ̂n[j] − θ[j]) + an wnj^[i] (θ[j] − θ[i]))
        = Σ_{j:θ[j]=θ[i]} an(θ̂n[j] − θ[j]) + op(1),

    Σ_{j:θ[j]≠θ[i]} an wnj^[i] (θ̂n[j] − θ[i]) = op(1).

Let ui = #{j : θj = θ[i]}. We obtain

    Σ_{j=1}^k wnj^[i] = Σ_{j:θ[j]=θ[i]} (1 + op(1)) + Σ_{j:θ[j]≠θ[i]} op(1/an) = ui + op(1).

Consequently,

    an(θ̃n[i] − θ[i]) = (Σ_{j:θ[j]=θ[i]} an(θ̂n[j] − θ[j]) + op(1)) / (ui + op(1)),    i = 1, ..., k.

By Slutsky's theorem and Theorem 5.2, the proof is completed. □

From this theorem, if η has a normal distribution, then θ̃n[i] is asymptotically normal for each i = 1, ..., k. In addition, if η ∼ N(0, Σ) and Σ̂n = (d̂ij,n)k×k is a consistent estimator of Σ, then for i ∈ {rm1, ..., rmk_m},

    an(θ̃n[i] − θ[i]) →L (η_{rm1} + ··· + η_{rmk_m}) / ui ∼ N(0, 1′Var(η_{rm1}, ..., η_{rmk_m})1 / ui²),

where 1 = (1, ..., 1)′. Note that

    σ̃ni² = Σ_{s=1}^k Σ_{t=1}^k wns^[i] wnt^[i] d̂st,n / (Σ_{s=1}^k wns^[i])²

is a consistent estimator of 1′Var(η_{rm1}, ..., η_{rmk_m})1 / ui². Consequently we have

Corollary 5.1. If 1′Var(η_{rm1}, ..., η_{rmk_m})1 > 0, then

    an(θ̃n[i] − θ[i]) / σ̃ni →L N(0, 1).

Acknowledgements

The authors thank the Associate Editor and two referees for their constructive comments. This work was supported in part by the National Natural Science Foundation of China (Grant No. 10371126).

References

Alam, K., Thompson, J.R., 1972. On selecting the least probable multinomial event. Ann. Math. Statist. 43, 1981-1990.
Dykstra, R.L., Lang, J.B., Oh, M., Robertson, T., 2002. Order restricted inference for hypotheses concerning qualitative dispersion. J. Statist. Plann. Inference 107, 249-265.
Ethier, S.N., 1982. Testing for favorable numbers on a roulette wheel. J. Amer. Statist. Assoc. 77, 660-665.
Gelfand, A.E., Glaz, J., Kuo, L., Lee, T.M., 1992. Inference for the maximum cell probability under multinomial sampling. Naval Res. Logist. 39, 97-114.
Glaz, J., Sison, C.P., 1999. Simultaneous confidence intervals for multinomial proportions. J. Statist. Plann. Inference 82, 251-262.
Kolchin, V.F., Sevast'yanov, B.A., Chistyakov, V.P., 1978. Random Allocations. V.H. Winston & Sons, Washington, D.C.
Li, Q., 2006. EST-based analysis for gene expression profile. Technical Report, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences.
Marcheselli, M., 2000. A generalized delta method with applications to intrinsic diversity profiles. J. Appl. Prob. 37, 504-510.
Marcheselli, M., 2003. Asymptotic results in jackknifing nonsmooth functions of the sample mean vector. Ann. Statist. 31, 1885-1904.
Shao, J., Tu, D., 1995. The Jackknife and Bootstrap. Springer, New York.
Xiong, S., Li, G., 2005. Testing for the maximum cell probabilities in multinomial distributions. Science in China Ser. A: Mathematics 48, 972-985.
Yusas, I.S., 1972. On the distribution of the maximum frequency of a multinomial distribution. Theory Prob. Applications 17, 712-717.

