Acta Mathematica Sinica, English Series, Oct. 2008, Vol. 24, No. 10, pp. 1637–1650. Published online: October 5, 2008. DOI: 10.1007/s10114-008-6220-6. http://www.ActaMath.com


Simultaneous Optimality of LSE and ANOVA Estimate in General Mixed Models

Mi Xia WU
College of Applied Sciences, Beijing University of Technology, Beijing 100022, P. R. China
and Division of Epidemiology, Statistics and Prevention Research, National Institute of Child Health and Human Development, NIH/DHHS, 6100 Executive Boulevard, Rockville, MD 20852, USA
E-mail: [email protected]

Song Gui WANG
College of Applied Sciences, Beijing University of Technology, Beijing 100022, P. R. China
E-mail: [email protected]

Kai Fun YU
Division of Epidemiology, Statistics and Prevention Research, National Institute of Child Health and Human Development, NIH/DHHS, 6100 Executive Boulevard, Rockville, MD 20852, USA
E-mail: [email protected]

Abstract  Problems of simultaneous optimal estimation and optimal testing in general mixed models are considered. A necessary and sufficient condition is presented for the least squares estimate of the fixed effects and the analysis of variance (Henderson III's) estimate of the variance components to be uniformly minimum variance unbiased estimates simultaneously. The result can be applied to finding uniformly most powerful unbiased tests and uniformly most accurate unbiased confidence intervals for the parameters of interest, and to establishing the equivalence of several common estimates of the variance components.

Keywords  Linear mixed model, Least squares estimate, Analysis of variance estimate, Minimum variance unbiased estimate, Uniformly most powerful unbiased test

MR(2000) Subject Classification  62F10, 62J10, 62F03

1 Introduction

Linear mixed effects models are an important class of statistical models used in many fields of application, such as the biological and health sciences, economics, computer graphics and mechanical engineering; see Baltagi [1], Davidian and Giltinan [2], and Diggle et al. [3]. The parameters in these models, consisting of fixed effects and variance components, are mainly estimated by the following popular methods: least squares (LS), analysis of variance (ANOVA), maximum likelihood (ML), restricted maximum likelihood (REML) and minimum norm quadratic unbiased (MINQU) estimation. In general, the ML and REML estimates require solving a system of non-linear equations that usually has no explicit solution and must be computed by iterative algorithms; many such algorithms have been developed, including EM iterations and penalized least squares (see Searle et al. [4] and Bates and DebRoy [5]), and the available comparisons in the literature are mainly simulation based. The MINQU estimate depends strongly on the prior values of the variance components. In practice, the least squares estimate (LSE) and the ANOVA estimate are more commonly employed because of their simple forms and their good properties under some models, especially in hypothesis testing. Baltagi [1] also showed, through large-scale simulations, that the ANOVA-type estimate has high efficiency in two-stage estimation of the fixed effects. Regarding the possibility that the ANOVA estimate takes negative values, improved versions of the ANOVA estimate have been proposed (see Kelly and Mathew [6]); moreover, in some fields such as biology and genetics, negative values of variance components are permitted as long as the covariance matrix remains nonnegative definite, see Nelder [7], Smith and Murray [8], and Hazelton and Gurrin [9].

In this paper, we mainly consider the simultaneous optimality of the LSE and the ANOVA estimate in general linear mixed models. Graybill and Hultquist [10] discussed this problem for general random effect models and presented a sufficient condition for the LSE of the fixed effect and the ANOVA estimate of the variance components to be uniformly minimum variance unbiased (UMVU) estimates simultaneously; this is the case for all random effect models with balanced data, see Searle et al. [4]. For general linear mixed models, Rao [11] obtained necessary and sufficient conditions for the LSE of the fixed effects to be the UMVU estimate, and Seely [12] presented a necessary and sufficient condition for the existence of the uniformly minimum variance invariant unbiased (UMVIU) estimator of the variance components. Brown [13] proved that an ANOVA estimate obtained from certain independent quadratic forms in the total sum of squares is UMVIU under some conditions. Khuri et al. [14] studied the optimality of tests constructed from these quadratic forms. However, the optimality of Henderson III's estimate, the most common form of the ANOVA estimate, remains unresolved in general, apart from some results for special models, see Searle et al. [4] and Wu and Wang [15]. This paper gives fairly general results on simultaneous optimal estimation and optimal testing in general mixed models.

The outline of this paper is as follows. In Section 2, we describe the LSE of any estimable function and the ANOVA estimate (Henderson III's estimate) of the variance components under general linear mixed models. In Section 3, necessary and sufficient conditions are obtained for the LSE and the ANOVA estimate to be UMVU estimates simultaneously, and optimal tests and confidence intervals are presented. In the last section, the equivalence of several common estimates of the variance components, namely the ANOVA estimate, the REML estimate and the MINQU estimate, is established.

Received April 27, 2006; Revised May 27, 2007; Accepted May 30, 2008. The work is supported by the Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality PHR (IHLB), the National Natural Science Foundation of China (NSFC) (10801005 and 10771010), and the Intramural Research Program of the National Institute of Child Health and Human Development, National Institutes of Health.

2 LSE and ANOVA Estimate

Consider a general linear mixed model
  y = Xβ + Σ_{i=1}^{k} U_i ξ_i + e,    (2.1)
where y is an n × 1 vector of observations, X is an n × p known matrix, β is an unknown p × 1 vector of fixed effects, each U_i is an n × q_i known matrix, each ξ_i is a q_i × 1 vector of random effects having a normal distribution with mean vector 0 and covariance matrix σ_i² I_{q_i}, e is an n × 1 error vector having a normal distribution with mean vector 0 and covariance matrix σ_e² I_n, and ξ_1, ..., ξ_k and e are mutually independently distributed. Denote U_{k+1} = I_n, ξ_{k+1} = e and σ_{k+1}² = σ_e². Then the covariance matrix of y is

  V(σ²) = Cov(y) = Σ_{i=1}^{k+1} σ_i² U_i U_i′,    (2.2)
where σ² = (σ_1², ..., σ_k², σ_{k+1}²)′. The parameter space of σ² is defined to be
  Ω = {σ² : σ_i² ≥ 0, i = 1, ..., k, σ_{k+1}² > 0}.
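To make the covariance structure (2.2) concrete, the short NumPy sketch below assembles V(σ²) for a toy balanced one-way layout; the design matrices and variance values are our own choices for illustration and are not taken from the paper.

```python
import numpy as np

# Toy balanced one-way layout (a = 3 groups, b = 4 replicates), k = 1 random factor.
a, b = 3, 4
U1 = np.kron(np.eye(a), np.ones((b, 1)))     # n x q_1 incidence matrix of the random effect
U2 = np.eye(a * b)                           # U_{k+1} = I_n for the error term

def V(sigma2, U_list):
    """Covariance matrix (2.2): V(sigma^2) = sum_i sigma_i^2 U_i U_i'."""
    return sum(s2 * U @ U.T for s2, U in zip(sigma2, U_list))

print(V([2.0, 1.0], [U1, U2])[:6, :6])       # sigma_1^2 = 2, sigma_{k+1}^2 = 1
```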

The LSE of any estimable function c′β under model (2.1) is
  c′β̂ = c′(X′X)⁻X′y = α′P_X y,    (2.3)
where α is defined by c = X′α, P_X = X(X′X)⁻X′, and A⁻ denotes a g-inverse of A.

The Henderson III estimate, the most common form of the ANOVA estimate, is based on the sums of squares obtained by the method of fitting constants. Without loss of generality, we assume that the order of U_1, U_2, ..., U_{k+1} satisfies the condition
  M(U_i) ⊂ M(U_j) implies i < j,    (2.4)
where M(U_i) ⊂ M(U_j) defines the partial ordering ≼ on S_n, the set of all matrices with n rows, and is denoted by U_i ≼ U_j. Here M(A) stands for the column space of a matrix A. Then an ANOVA estimate of σ² is the solution to the following equations:
  a_i(σ²) = y′M_i y,  i = 1, ..., k + 1,    (2.5)
where a_i(σ²) = E(y′M_i y) and
  M_1 = P_(X:U_1) − P_X,   M_i = P_(X:U_1:···:U_i) − P_(X:U_1:···:U_{i−1}),  i = 2, ..., k + 1,    (2.6)
and E(η) denotes the expectation of the random variable η. Clearly, P_(X:U_1:···:U_k:U_{k+1}) = I_n from the fact U_{k+1} = I_n, and
  a_i(σ²) = Σ_{j=i}^{k+1} tr(M_i U_j U_j′) σ_j²,  i = 1, ..., k + 1,    (2.7)
where tr(A) denotes the trace of the matrix A. Let C = (c_ij) be a (k + 1) × (k + 1) upper triangular matrix such that
  c_ij = tr(M_i U_j U_j′) if i ≤ j, and c_ij = 0 otherwise.


Denote a = (a_1, ..., a_{k+1})′. Then (2.7) can be rewritten as
  a = Cσ².    (2.8)
If C is invertible, then the ANOVA estimate of σ² exists and can be written as
  σ̂² = C⁻¹s,    (2.9)
where s = (s_1, ..., s_{k+1})′ with elements s_i = y′M_i y, i = 1, ..., k + 1. Clearly, the ANOVA estimate σ̂² is unbiased.

Remark 2.1  The partial ordering (2.4) required on U_1, U_2, ..., U_{k+1} ensures that each M_i ≠ 0 if the ANOVA estimate of σ² exists. In what follows, we take the matrix C = (c_ij) to be invertible.
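The construction (2.5)–(2.9) translates directly into code. The following NumPy sketch (ours, not the authors' code) builds the fitting-constants projectors M_i, the upper triangular matrix C and the Henderson III estimate σ̂² = C⁻¹s; the helper names anova_estimate and proj are hypothetical, and a Moore–Penrose pseudoinverse stands in for the g-inverses.

```python
import numpy as np

def proj(A):
    """Orthogonal projector onto the column space of A (pinv handles rank
    deficiency, playing the role of a g-inverse)."""
    return A @ np.linalg.pinv(A)

def anova_estimate(y, X, U_list):
    """Henderson III (ANOVA) estimate of (sigma_1^2, ..., sigma_k^2, sigma_e^2)
    for y = X beta + sum_i U_i xi_i + e, following (2.5)-(2.9).  U_list must
    already respect the partial ordering (2.4)."""
    n = len(y)
    U = U_list + [np.eye(n)]                 # append U_{k+1} = I_n for the error
    k1 = len(U)                              # k + 1
    # M_1 = P_(X:U1) - P_X,  M_i = P_(X:U1:...:Ui) - P_(X:U1:...:U_{i-1})
    M, Z, P_prev = [], X, proj(X)
    for Ui in U:
        Z = np.hstack([Z, Ui])
        P_cur = proj(Z)
        M.append(P_cur - P_prev)
        P_prev = P_cur
    # upper triangular C with c_ij = tr(M_i U_j U_j') for i <= j, cf. (2.7)
    C = np.zeros((k1, k1))
    for i in range(k1):
        for j in range(i, k1):
            C[i, j] = np.trace(M[i] @ U[j] @ U[j].T)
    s = np.array([y @ Mi @ y for Mi in M])   # s_i = y' M_i y
    return np.linalg.solve(C, s)             # sigma_hat^2 = C^{-1} s, eq. (2.9)

# toy balanced one-way layout: 6 groups, 5 replicates per group
rng = np.random.default_rng(0)
a, b = 6, 5
X = np.ones((a * b, 1))
U1 = np.kron(np.eye(a), np.ones((b, 1)))
y = X @ np.array([2.0]) + U1 @ rng.normal(0, 1.5, a) + rng.normal(0, 1.0, a * b)
print(anova_estimate(y, X, [U1]))            # roughly (1.5**2, 1.0**2)
```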

3 Necessary and Sufficient Conditions for the Simultaneous Optimality

In order to prove our main results, we need the following lemmas.

Lemma 3.1  For any matrices A and B with the same number of rows, if P_A P_B = P_B P_A, then
(i) P_A P_B is the orthogonal projector onto M(A) ∩ M(B);
(ii) P_(A:B) = P_A + P_B − P_B P_A P_B = P_A + Q_A P_B, where Q_A = I − P_A.
For the proof see Theorem 1 of Baksalary [16].

Lemma 3.2  For any n × n positive definite matrix V and n × p matrix X, the following two statements are equivalent:
(i) P_X and V commute;
(ii) V can be decomposed as
  V = Σ_{i=1}^{m} λ_i P_i + Σ_{i=m+1}^{m_0} λ_i P_i,    (3.1)
where λ_1, ..., λ_m and λ_{m+1}, ..., λ_{m_0} are the distinct non-zero eigenvalues of Q_X V Q_X and P_X V P_X, respectively, and the orthogonal projectors P_1, ..., P_{m_0} satisfy P_i P_j = 0 (i ≠ j), Σ_{i=1}^{m} P_i = Q_X and Σ_{i=m+1}^{m_0} P_i = P_X.

Proof  Obviously, (i) is a direct consequence of (ii); we prove (i) ⇒ (ii). Assume that (i) is true. Then V = V(Q_X + P_X) = Q_X V Q_X + P_X V P_X. Let the spectral decompositions of Q_X V Q_X and P_X V P_X be
  Q_X V Q_X = Σ_{i=1}^{m} λ_i P_i,   P_X V P_X = Σ_{i=m+1}^{m_0} λ_i P_i,
where λ_1, ..., λ_m and λ_{m+1}, ..., λ_{m_0} are the distinct non-zero eigenvalues of Q_X V Q_X and P_X V P_X, respectively. Since V is a positive definite matrix, it is easy to show that M(Q_X V Q_X) = M(Q_X) and M(P_X V P_X) = M(P_X). We have
  Σ_{i=1}^{m} P_i = Q_X,   Σ_{i=m+1}^{m_0} P_i = P_X,   P_i P_j = 0 (i ≠ j),


so (ii) holds. The proof of Lemma 3.2 is completed.

Remark 3.1  Some eigenvalues of Q_X V Q_X and P_X V P_X may coincide, that is, λ_{m+j} = λ_i may hold for some i, j (≤ m) in (3.1).

By (ii) of Lemma 3.2, it is easy to verify that if P_X and V commute, then
  Q_X V⁻¹ Q_X = Σ_{i=1}^{m} (1/λ_i) P_i    (3.2)
and
  |V⁻¹| = Π_{i=1}^{m_0} (1/λ_i)^{r_i} = Π_{i=1}^{m} (1/λ_i)^{r_i} · |H′V⁻¹H|,    (3.3)
where r_i = tr(P_i), i = 1, 2, ..., m_0, r = rk(X), and H is an n × r column orthogonal matrix such that H′H = I_r and P_H = HH′ = P_X.

In the following theorem, we present the necessary and sufficient conditions for c′β̂ and σ̂² to be the UMVU estimates.

Theorem 3.1  Under model (2.1), c′β̂ and σ̂² are the UMVU estimates of the estimable function c′β and of σ², respectively, if and only if Q_X V(σ²) is symmetric and has exactly k + 1 distinct non-zero eigenvalues, which are linearly independent combinations of the σ_i²'s.

Proof  (Sufficiency)  Let λ_1, λ_2, ..., λ_{k+1} be the k + 1 distinct non-zero eigenvalues of Q_X V(σ²). Since they are linearly independent combinations of the σ_i²'s, there exists a (k + 1) × (k + 1) invertible constant matrix A = (a_ij) such that
  λ = Aσ²,    (3.4)

where λ = (λ_1, λ_2, ..., λ_{k+1})′. Using the symmetry of Q_X V(σ²), we have
  Q_X V(σ²) Q_X = Q_X V(σ²) = Σ_{i=1}^{k+1} λ_i P_i,    (3.5)
where the {P_i} are orthogonal projectors satisfying P_i P_j = 0 (i ≠ j) and Σ_{i=1}^{k+1} P_i = Q_X. By combining (3.2) and (3.3) with Q_X V(σ²) = V(σ²) Q_X, the density function of y can be rewritten as
  f(y) = (2π)^{−n/2} |V(σ²)|^{−1/2} exp{ −(y − Xβ)′V⁻¹(σ²)(y − Xβ)/2 }
       = (2π)^{−(n−r)/2} Π_{i=1}^{k+1} (1/λ_i)^{r_i/2} exp{ −Σ_{i=1}^{k+1} y′P_i y/(2λ_i) }
         · (2π)^{−r/2} |H′V⁻¹(σ²)H|^{1/2} exp{ −(H′y − H′Xβ)′ H′V⁻¹(σ²)H (H′y − H′Xβ)/2 },    (3.6)
where H is defined in (3.3) and σ² = A⁻¹λ. Noting that Xβ = H(H′Xβ), any estimable function c′β (= α′Xβ) is determined by the estimable vector H′Xβ. Reparametrizing the regression parameter as γ = H′Xβ, we conclude that H′y and y′P_i y, i = 1, 2, ..., k + 1, form a set of complete and sufficient statistics according to Theorem 15.16 of Arnold [17]. Furthermore, combining (2.2) and (3.4) with (3.5) yields
  Q_X V(σ²) Q_X = Σ_{j=1}^{k+1} σ_j² Q_X U_j U_j′ Q_X = Σ_{i=1}^{k+1} λ_i P_i = Σ_{j=1}^{k+1} σ_j² ( Σ_{i=1}^{k+1} a_ij P_i )


for any σ² ∈ Ω, which implies that each Q_X U_j U_j′ Q_X can be decomposed as
  Q_X U_j U_j′ Q_X = Σ_{i=1}^{k+1} a_ij P_i.    (3.7)
Denote
  b_ij = 0 if a_ij = 0, and b_ij = 1 if a_ij ≠ 0.
Thus the orthogonal projector onto the subspace M(Q_X U_j) is given by
  P_{Q_X U_j} = Σ_{i=1}^{k+1} b_ij P_i,  j = 1, 2, ..., k + 1,    (3.8)
which shows that P_{Q_X U_i} and P_{Q_X U_j} commute (1 ≤ i, j ≤ k + 1), that is,
  P_{Q_X U_i} P_{Q_X U_j} = P_{Q_X U_j} P_{Q_X U_i},  1 ≤ i, j ≤ k + 1.
According to (ii) of Lemma 3.1, we have
  P_{(Q_X U_1 : Q_X U_2)} = Σ_{i=1}^{k+1} (b_i1 + b_i2 − b_i1 b_i2) P_i = Σ_{i=1}^{k+1} max{b_i1, b_i2} P_i.    (3.9)
Clearly, P_{(Q_X U_1 : Q_X U_2)} P_{Q_X U_3} = P_{Q_X U_3} P_{(Q_X U_1 : Q_X U_2)}. Note that each max{b_i1, b_i2} still takes only the value 1 or 0. Similarly, we can deduce
  P_{(Q_X U_1 : Q_X U_2 : ··· : Q_X U_l)} = Σ_{i=1}^{k+1} max_{j ≤ l}{b_ij} P_i,  1 ≤ l ≤ k + 1.    (3.10)
By the matrix identity P_{(A:B)} = P_A + P_{Q_A B}, it is easy to verify
  M_1 = P_{Q_X U_1} = Σ_{i=1}^{k+1} b_i1 P_i,
  M_l = P_{(Q_X U_1 : Q_X U_2 : ··· : Q_X U_l)} − P_{(Q_X U_1 : Q_X U_2 : ··· : Q_X U_{l−1})} = Σ_{i=1}^{k+1} ( max_{j ≤ l}{b_ij} − max_{j ≤ l−1}{b_ij} ) P_i,  2 ≤ l ≤ k + 1.    (3.11)
Noting that
  c′β̂ = c′(X′X)⁻X′y = α′HH′y,
  s_1 = y′M_1 y = Σ_{i=1}^{k+1} b_i1 y′P_i y,
  s_l = y′M_l y = Σ_{i=1}^{k+1} ( max_{j ≤ l}{b_ij} − max_{j ≤ l−1}{b_ij} ) y′P_i y,  2 ≤ l ≤ k + 1,    (3.12)
we conclude that c′β̂ and σ̂² are the UMVU estimates of c′β and σ², respectively.

(Necessity)  For any known (k + 1) × 1 vector f = (f_1, f_2, ..., f_{k+1})′, it is obvious by (2.5) that f′σ̂² belongs to the class of unbiased quadratic invariant estimates of f′σ²,
  D = {y′Gy : G = G′, GX = 0, E(y′Gy) = f′σ²}.
Under the normality assumption on y, the variance of each y′Gy ∈ D is
  Var(y′Gy) = 2 tr(GV(σ²))².


If the ANOVA estimate σ̂² is the UMVU estimate of σ², then f′σ̂² is the UMVU estimate of f′σ². Thus f′σ̂² is in particular the minimum variance estimate within the class D, a subclass of the unbiased estimates of f′σ². Write f′σ̂² = y′G_0 y. Then the matrix G_0 is the solution to the following minimum trace problem:
  min_G { tr(GV(σ²))² : GX = 0, tr(GU_iU_i′) = f_i, i = 1, ..., k + 1 },
which, in analogy with the minimum trace problem for the MINQU estimate, can be shown to be equivalent to the fact that σ̂² is the explicit solution to the equations
  Σ_{j=1}^{k+1} tr(P(σ²)U_iU_i′P(σ²)U_jU_j′) σ_j² = y′P(σ²)U_iU_i′P(σ²)y,  i = 1, 2, ..., k + 1,    (3.13)
where P(σ²) = V⁻¹(σ²) − V⁻¹(σ²)X(X′V⁻¹(σ²)X)⁻X′V⁻¹(σ²) = B(B′V(σ²)B)⁻¹B′. Here B is an n × (n − r) column orthogonal matrix such that B′B = I_{n−r} and BB′ = Q_X. Applying Theorem 5 of Szatrowski [18], we conclude that (3.13) has an explicit solution if and only if B′V(σ²)B has exactly k + 1 distinct non-zero eigenvalues which are linearly independent combinations of the σ_i²'s. Noting that Q_X V(σ²) Q_X and B′V(σ²)B have the same non-zero eigenvalues, and that the symmetry of Q_X V(σ²) is a direct consequence of c′β̂ being the UMVU estimate (see Searle et al. [4]), the necessity is verified. The proof of Theorem 3.1 is completed.

By Lemma 3.2, the two conditions that P_X V(σ²) = V(σ²) P_X and that Q_X V(σ²) Q_X has k + 1 distinct non-zero eigenvalues which are linearly independent combinations of the σ_i²'s are together equivalent to the statement that V(σ²) can be decomposed as
  V(σ²) = Σ_{i=1}^{k+1} λ_i P_i + Σ_{i=k+2}^{m_0} λ_i P_i,    (3.14)
where λ_1, ..., λ_{k+1} are the same as those defined in Theorem 3.1, and the orthogonal projectors P_1, ..., P_{m_0} do not depend on σ² and satisfy P_i P_j = 0 (i ≠ j), Σ_{i=1}^{k+1} P_i = Q_X and Σ_{i=k+2}^{m_0} P_i = P_X. Thus we obtain the following simple and useful corollaries.

Corollary 3.1  c′β̂ and σ̂² are the UMVU estimates of the estimable function c′β and of σ², respectively, if V(σ²) can be decomposed as
  V(σ²) = Σ_{i=1}^{k+1} λ_i P_i + λ_0 P_X,    (3.15)
where the eigenvalues λ_1, λ_2, ..., λ_{k+1} are linearly independent combinations of the σ_i²'s and the orthogonal projectors P_1, ..., P_{k+1} satisfy P_X P_i = 0, P_i P_j = 0 (i ≠ j) and Σ_{i=1}^{k+1} P_i = Q_X.

Corollary 3.2  c′β̂ and σ̂² are the UMVU estimates of the estimable function c′β and of σ², respectively, if V(σ²) can be decomposed as
  V(σ²) = Σ_{i=1}^{k+1} λ_i G_i,    (3.16)
where the eigenvalues λ_1, λ_2, ..., λ_{k+1} are linearly independent combinations of the σ_i²'s and the orthogonal projectors G_1, ..., G_{k+1} satisfy G_i P_X = P_X G_i, G_i G_j = 0 (i ≠ j) and Σ_{i=1}^{k+1} G_i = I_n.
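For a given design, the condition of Theorem 3.1 can be checked numerically. The sketch below is our own diagnostic (evaluated at one randomly drawn point of Ω rather than over the whole parameter space): it tests whether P_X and V(σ²) commute, whether Q_X V(σ²) Q_X has exactly k + 1 distinct non-zero eigenvalues, and whether the coefficient matrix A in λ = Aσ², recovered from the eigenprojectors via (3.7), is invertible. The function name is hypothetical.

```python
import numpy as np

def check_theorem_3_1(X, U_list, tol=1e-8):
    """Heuristic numerical check of the condition of Theorem 3.1 at one
    randomly drawn generic point of Omega.  Returns (commutes, right_count, A):
      commutes    : does P_X V = V P_X hold (i.e. is Q_X V symmetric)?
      right_count : does Q_X V Q_X have exactly k+1 distinct nonzero eigenvalues?
      A           : matrix with lambda = A sigma^2, or None if the
                    decomposition (3.7) fails; det(A) != 0 corresponds to the
                    eigenvalues being linearly independent combinations of sigma^2.
    This is our own sketch, not part of the paper."""
    n = X.shape[0]
    U = U_list + [np.eye(n)]                    # include U_{k+1} = I_n
    k1 = len(U)
    PX = X @ np.linalg.pinv(X)
    QX = np.eye(n) - PX

    rng = np.random.default_rng(1)
    sigma2 = rng.uniform(0.5, 2.0, k1)          # a "generic" point of Omega
    V = sum(s2 * Ui @ Ui.T for s2, Ui in zip(sigma2, U))
    commutes = np.allclose(PX @ V, V @ PX, atol=tol)

    # distinct nonzero eigenvalues of Q_X V Q_X and their eigenprojectors P_i
    w, Q = np.linalg.eigh(QX @ V @ QX)
    nz = w > tol
    w, Q = w[nz], Q[:, nz]
    groups = []                                 # indices sharing one eigenvalue
    for j in range(len(w)):
        if groups and abs(w[j] - w[groups[-1][0]]) < 1e-6 * max(1.0, w[j]):
            groups[-1].append(j)
        else:
            groups.append([j])
    P = [Q[:, g] @ Q[:, g].T for g in groups]   # eigenprojectors
    if len(P) != k1:
        return commutes, False, None

    # coefficients a_ij with Q_X U_j U_j' Q_X = sum_i a_ij P_i, cf. (3.7)
    A = np.empty((k1, k1))
    for j, Uj in enumerate(U):
        W = QX @ Uj @ Uj.T @ QX
        for i, Pi in enumerate(P):
            A[i, j] = np.trace(Pi @ W) / np.trace(Pi)
        if not np.allclose(W, sum(A[i, j] * P[i] for i in range(k1)), atol=1e-6):
            return commutes, True, None         # decomposition (3.7) fails
    return commutes, True, A

# balanced one-way layout: the condition holds and A is invertible
a, b = 3, 4
X = np.ones((a * b, 1))
U1 = np.kron(np.eye(a), np.ones((b, 1)))
ok_lse, ok_eigs, A = check_theorem_3_1(X, [U1])
print(ok_lse, ok_eigs, A is not None and abs(np.linalg.det(A)) > 1e-8)
```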


As a special case of Corollaries 3.1 and 3.2, for random effect models (wherein Xβ = 1_n μ and μ̂ = ȳ = Σ_{i=1}^{n} y_i/n), we obtain Theorem 5 of Graybill and Hultquist [10], that is,

Corollary 3.3  The necessary and sufficient condition for the simultaneous optimality of μ̂ and σ̂² is that W = V(σ²) + nμ² J̄_n has exactly k + 2 distinct eigenvalues, which are linearly independent combinations of the σ_i²'s and μ², where J̄_n = 1_n 1_n′/n and 1_n = (1, ..., 1)′.

Remark 3.2  Under the condition of Corollary 3.2, the spectral decomposition of Q_X V(σ²) Q_X is
  Q_X V(σ²) Q_X = Σ_{i=1}^{k+1} λ_i (Q_X G_i Q_X) = Σ_{i=1}^{k+1} λ_i P_i.

Remark 3.3  From the proof of Theorem 3.1, the symmetry of Q_X V(σ²) is a necessary condition for σ̂² to be the UMVU estimate of σ². That is, if c′β̂ is not UMVU, then σ̂² cannot be UMVU.

Remark 3.4  Combining Corollary 3.2 with Theorem 5.1 of Szatrowski [18], if the MLE of (c′β, σ²) has an explicit form, then σ̂² must be UMVU. However, the converse is not true.

Furthermore, even if c′β̂ is not a UMVU estimate, σ̂² may still be the uniformly minimum variance estimate within the class of unbiased estimates that are invariant under the group {y → y + Xa}, provided Q_X V(σ²) Q_X has exactly k + 1 distinct non-zero eigenvalues which are linearly independent combinations of the σ_i²'s; see Brown [13]. The following theorem shows the uniqueness of the ANOVA estimate σ̂² under the latter condition.

Theorem 3.2  If Q_X V(σ²) Q_X has exactly k + 1 distinct non-zero eigenvalues λ_1, λ_2, ..., λ_{k+1} which are linearly independent combinations of the σ_i²'s, say λ_i = Σ_{j=1}^{k+1} a_ij σ_j², then the ANOVA estimate σ̂² is unique and equals the solution to the equations
  λ_i = Σ_{j=1}^{k+1} a_ij σ_j² = y′P_i y / tr(P_i),  i = 1, 2, ..., k + 1,    (3.17)
where P_i is defined by (3.5).

Proof  Denote A = (a_ij); then λ = Aσ². By the proof of Theorem 3.1, if Q_X V(σ²) Q_X has exactly k + 1 distinct non-zero eigenvalues which are linearly independent combinations of the σ_i²'s, then (3.11) holds. Thus equations (2.5) can be rewritten as
  a_1(σ²) = E(s_1) = Σ_{i=1}^{k+1} b_i1 λ_i tr(P_i) = Σ_{i=1}^{k+1} b_i1 y′P_i y,
  a_l(σ²) = E(s_l) = Σ_{i=1}^{k+1} ( max_{j ≤ l} b_ij − max_{j ≤ l−1} b_ij ) λ_i tr(P_i) = Σ_{i=1}^{k+1} ( max_{j ≤ l} b_ij − max_{j ≤ l−1} b_ij ) y′P_i y,  l = 2, ..., k + 1.    (3.18)
Let D = (d_il) be the (k + 1) × (k + 1) matrix with d_i1 = b_i1 and d_il = max_{j ≤ l} b_ij − max_{j ≤ l−1} b_ij for l ≥ 2. Then the matrix form of (3.18) is
  a = D′ diag(tr(P_1), ..., tr(P_{k+1})) λ = D′ (y′P_1 y, ..., y′P_{k+1} y)′.
Together with (2.7) and (3.4), we obtain D′ = C A⁻¹ diag(tr(P_1), ..., tr(P_{k+1}))⁻¹, which is invertible, and hence equations (3.18) are equivalent to equations (3.17). Thus the ANOVA estimate σ̂² is unique and equals the solution of (3.17). The proof of Theorem 3.2 is completed.

Remark 3.5  Theorem 3.2 implies that if neither U_i ≼ U_j nor U_j ≼ U_i holds, then U_i and U_j may be permuted in (2.4).

To illustrate the results above, we consider the following three simple examples.

Example 3.1  Consider the two-way classification random model
  y_ij = μ + α_i + β_j + e_ij,  i = 1, 2, ..., a,  j = 1, 2, ..., b,    (3.19)
where μ is the overall mean and both α_i and β_j are random effects. Assume that α_i ~ N(0, σ_α²), β_j ~ N(0, σ_β²) and e_ij ~ N(0, σ_e²), and that all the α_i's, β_j's and e_ij's are mutually independent. Denote y = (y_11, ..., y_1b, y_21, ..., y_ab)′. Then Cov(y) is given by
  V(σ²) = σ_α² I_a ⊗ J_b + σ_β² J_a ⊗ I_b + σ_e² I_ab,
and its spectral decomposition is
  V(σ²) = (bσ_α² + aσ_β² + σ_e²) J̄_ab + (bσ_α² + σ_e²) (I_a − J̄_a) ⊗ J̄_b + (aσ_β² + σ_e²) J̄_a ⊗ (I_b − J̄_b) + σ_e² (I_a − J̄_a) ⊗ (I_b − J̄_b),
where J_m = 1_m 1_m′ and J̄_m = J_m/m. By Corollary 3.1, μ̂ = ȳ and the ANOVA estimate of σ² = (σ_α², σ_β², σ_e²)′ are UMVU. Applying (3.17), we can easily obtain the ANOVA estimate σ̂², which is the solution of the following equations:
  bσ_α² + σ_e² = y′[(I_a − J̄_a) ⊗ J̄_b]y / (a − 1),
  aσ_β² + σ_e² = y′[J̄_a ⊗ (I_b − J̄_b)]y / (b − 1),    (3.20)
  σ_e² = y′[(I_a − J̄_a) ⊗ (I_b − J̄_b)]y / ((a − 1)(b − 1)).
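As a numerical illustration of Example 3.1 (our own simulation, with arbitrarily chosen true values), the following sketch forms the three quadratic forms in (3.20) and solves for (σ_α², σ_β², σ_e²).

```python
import numpy as np

# Simulate the balanced two-way random model (3.19) and solve the system (3.20).
rng = np.random.default_rng(7)
a, b = 6, 5
mu = 10.0
sig_a2, sig_b2, sig_e2 = 4.0, 2.25, 1.0          # true variance components
alpha = rng.normal(0.0, np.sqrt(sig_a2), a)
beta  = rng.normal(0.0, np.sqrt(sig_b2), b)
Y = mu + alpha[:, None] + beta[None, :] + rng.normal(0.0, np.sqrt(sig_e2), (a, b))
y = Y.ravel()                                    # ordering (y_11, ..., y_1b, y_21, ...)

Jbar = lambda m: np.full((m, m), 1.0 / m)
Pa = np.kron(np.eye(a) - Jbar(a), Jbar(b))       # eigenprojector, lambda = b*sig_a2 + sig_e2
Pb = np.kron(Jbar(a), np.eye(b) - Jbar(b))       # eigenprojector, lambda = a*sig_b2 + sig_e2
Pe = np.kron(np.eye(a) - Jbar(a), np.eye(b) - Jbar(b))   # eigenprojector, lambda = sig_e2

lam1 = y @ Pa @ y / (a - 1)
lam2 = y @ Pb @ y / (b - 1)
lam3 = y @ Pe @ y / ((a - 1) * (b - 1))

print("sigma_e^2    :", lam3)
print("sigma_alpha^2:", (lam1 - lam3) / b)       # solves b*sig_a2 + sig_e2 = lam1
print("sigma_beta^2 :", (lam2 - lam3) / a)       # solves a*sig_b2 + sig_e2 = lam2
```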

Example 3.2  Consider the two-way classification mixed model
  y_ij = μ + α_i + β_j + e_ij,  i = 1, 2, ..., a,  j = 1, 2, ..., b,
where β_j is a fixed effect and μ, α_i, e_ij satisfy the same assumptions as in (3.19). Thus
  Cov(y) = V(σ²) = (bσ_α² + σ_e²) I_a ⊗ J̄_b + σ_e² I_a ⊗ (I_b − J̄_b),
and X = (1_a ⊗ 1_b : 1_a ⊗ I_b). Denote γ = (μ, β_1, ..., β_b)′, the vector of fixed effects. By Corollary 3.2, the LSE c′γ̂ of any estimable function c′γ and the ANOVA estimate σ̂² = (σ̂_α², σ̂_e²)′ are UMVU, where γ̂ = (ȳ, ȳ_{·1} − ȳ, ..., ȳ_{·b} − ȳ)′ and ȳ_{·j} is the sample mean at the j-th level of β. Using Remark 3.2 and (3.17), we obtain
  σ̂_e² = y′[(I_a − J̄_a) ⊗ (I_b − J̄_b)]y / ((a − 1)(b − 1)),
  σ̂_α² = { y′[(I_a − J̄_a) ⊗ J̄_b]y − (a − 1)σ̂_e² } / (b(a − 1)).
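A similar sketch (again our own simulation, with illustrative true values) recovers σ̂_α² and σ̂_e² in the mixed model of Example 3.2.

```python
import numpy as np

# Balanced two-way mixed model of Example 3.2: beta_j fixed, alpha_i random.
rng = np.random.default_rng(11)
a, b = 6, 5
mu = 10.0
beta = np.linspace(-2.0, 2.0, b)                    # fixed effects (our choice)
sig_a2, sig_e2 = 4.0, 1.0
alpha = rng.normal(0.0, np.sqrt(sig_a2), a)
Y = mu + alpha[:, None] + beta[None, :] + rng.normal(0.0, np.sqrt(sig_e2), (a, b))
y = Y.ravel()

Jbar = lambda m: np.full((m, m), 1.0 / m)
Ma = np.kron(np.eye(a) - Jbar(a), Jbar(b))          # (I_a - Jbar_a) (x) Jbar_b
Me = np.kron(np.eye(a) - Jbar(a), np.eye(b) - Jbar(b))

s_e2 = y @ Me @ y / ((a - 1) * (b - 1))
s_a2 = (y @ Ma @ y - (a - 1) * s_e2) / (b * (a - 1))
print(s_a2, s_e2)                                   # compare with 4.0 and 1.0
```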

Example 3.3  Consider the random regression coefficient model
  y_i = 1_n u_i + X b_i + e_i,  i = 1, 2, ..., k,    (3.21)
where y_i = (y_{i1}, ..., y_{in})′ is the observation vector of the i-th subclass, X is an n × p matrix with rk(X) = p, u_i is a random intercept with distribution N(μ, σ_u²), b_i is a p × 1 vector of random regression coefficients with distribution N(β, σ_b² I_p), e_i ~ N(0, σ_e² I_n), and all the u_i's, b_i's and e_i's are mutually independent. Write u_i = μ + η_i and b_i = β + ξ_i, where η_i ~ N(0, σ_u²) and ξ_i ~ N(0, σ_b² I_p). Let y = (y_1′, ..., y_k′)′, η = (η_1, ..., η_k)′, ξ = (ξ_1′, ..., ξ_k′)′ and e = (e_1′, ..., e_k′)′. Then (3.21) can be rewritten as
  y = 1_{kn} μ + (1_k ⊗ X)β + (I_k ⊗ 1_n)η + (I_k ⊗ X)ξ + e.
The covariance matrix of y is given by
  V(σ²) = σ_u² I_k ⊗ J_n + σ_b² I_k ⊗ (XX′) + σ_e² I_{kn}.
Let X_0 = (1_{kn} : 1_k ⊗ X). Clearly, P_{X_0} V(σ²) = V(σ²) P_{X_0}, so the LSE of c′β is the UMVU estimate. Moreover, if X satisfies the orthogonality condition X′X = aI_p (a > 0) and 1_n′X = 0, then the spectral decomposition of Q_{X_0} V(σ²) Q_{X_0} is given by
  Q_{X_0} V(σ²) Q_{X_0} = (nσ_u² + σ_e²) (I_k − J̄_k) ⊗ J̄_n + (aσ_b² + σ_e²) (I_k − J̄_k) ⊗ P_X + σ_e² I_k ⊗ (I_n − J̄_n − P_X).
By Theorems 3.1 and 3.2, the ANOVA estimate of (σ_e², σ_b², σ_u²) is UMVU and unique, and it is the solution to the following equations:
  nσ_u² + σ_e² = λ_1 = y′[(I_k − J̄_k) ⊗ J̄_n]y / (k − 1),
  aσ_b² + σ_e² = λ_2 = y′[(I_k − J̄_k) ⊗ P_X]y / ((k − 1)p),    (3.22)
  σ_e² = λ_3 = y′[I_k ⊗ (I_n − J̄_n − P_X)]y / (k(n − 1 − p)).

Theorems 3.1 and 3.2 not only extend the results on optimality in Graybill and Hultquist [10] to general mixed models, but also give an explicit formula for the ANOVA estimate of σ², namely the solution to equations (3.17).

Corollary 3.4  If both the LSE c′β̂ and the ANOVA estimate σ̂² are UMVU estimates, then c′β̂, λ̂_1, ..., λ̂_{k+1} are mutually independent, and each λ̂_i is distributed as (λ_i/r_i)·χ²_{r_i}, where λ̂_i = y′P_i y/r_i, r_i = tr(P_i), and χ²_m denotes the chi-square distribution with m degrees of freedom.


Corollary 3.4 is a direct consequence of (3.6). It is very important for constructing exact tests and exact confidence intervals for the parameters based on the LSE and the ANOVA estimate.

Firstly, we consider the hypothesis
  H_0: σ_i² = 0  vs  H_a: σ_i² > 0    (3.23)
under model (2.1). If, among λ_1, ..., λ_{k+1}, there exist two eigenvalues λ_{i1}, λ_{i2} such that λ_{i1} − λ_{i2} = m_i σ_i² for some positive real number m_i, then
  F_i(y) = λ̂_{i1}/λ̂_{i2}    (3.24)
is an exact F-statistic for testing the hypothesis H_0. A convenient example is the test of σ_u² = 0 by the F-statistic λ̂_1/λ̂_3 in Example 3.3. Since the statistics H′y, y′P_1 y, y′P_2 y, ..., y′P_{k+1} y are sufficient and complete, by standard multiparameter exponential family results (Lehmann [19]), the test
  φ(F_i(y)) = 1 if F_i(y) ≥ F_{r_{i1}, r_{i2}}(1 − α), and φ(F_i(y)) = 0 otherwise,
is the uniformly most powerful unbiased (UMPU) test, with power function
  g(λ_{i2}/λ_{i1}) = P_{σ_i²}{ F_i(y) ≥ F_{r_{i1}, r_{i2}}(1 − α) } = P{ F_{r_{i1}, r_{i2}} ≥ (λ_{i2}/λ_{i1}) F_{r_{i1}, r_{i2}}(1 − α) },    (3.25)
where F_{m,n} denotes the F distribution with m and n degrees of freedom. Furthermore, it is easy to verify that the probability that the ANOVA estimate of σ_i² takes a negative value is equal to
  P{ λ̂_{i1} ≤ λ̂_{i2} } = P{ F_{r_{i1}, r_{i2}} ≤ λ_{i2}/λ_{i1} }.    (3.26)

Secondly, we consider tests on fixed parameters of interest. If X can be partitioned as X = (X_1, X_2, ..., X_q) with X_i′X_j = 0 (i ≠ j), then each X_i β_i is an estimable vector and its LSE is
  X_i β̂_i = X_i(X_i′X_i)⁻X_i′y = P_{X_i} y,  i = 1, 2, ..., q,    (3.27)
where β_i is the subvector of the parameter β corresponding to X_i, that is, β = (β_1′, β_2′, ..., β_q′)′. By Corollary 3.4, we can conclude that X_1β̂_1, ..., X_qβ̂_q, λ̂_1, ..., λ̂_{k+1} are mutually independent. Thus there exists a UMPU test for the hypothesis
  H_0: c_i′β_i = d_i  vs  H_a: c_i′β_i ≠ d_i    (3.28)

as long as P_{X_i} V(σ²) = λ_j P_{X_i} and λ_j coincides with one of λ_1, ..., λ_{k+1}, where c_i ∈ M(X_i′). The test statistic is given by
  T(y) = (c_i′β̂_i − d_i) / √( c_i′(X_i′X_i)⁻c_i · λ̂_j ),    (3.29)
which has a t-distribution with r_j degrees of freedom under the null hypothesis c_i′β_i = d_i.


From (3.29), we can directly derive that the exact (1 − α) confidence interval for c_i′β_i,
  ( c_i′β̂_i − t_{r_j}(α/2) √(c_i′(X_i′X_i)⁻c_i · λ̂_j),  c_i′β̂_i + t_{r_j}(α/2) √(c_i′(X_i′X_i)⁻c_i · λ̂_j) ),    (3.30)
is the uniformly most accurate unbiased (UMAU) confidence interval. In Example 3.3, if X′X = aI_p (a > 0) and 1_n′X = 0, then
  Cov(μ̂) = λ_1/(kn),   Cov(β̂) = (λ_2/(ak)) I_p,   Cov(μ̂, β̂) = 0,
and the exact 1 − α confidence intervals for the fixed effects μ and c′β are
  ( μ̂ − t_{k−1}(α/2) √(λ̂_1/(kn)),  μ̂ + t_{k−1}(α/2) √(λ̂_1/(kn)) )
and
  ( c′β̂ − t_{p(k−1)}(α/2) √(c′c λ̂_2/(ak)),  c′β̂ + t_{p(k−1)}(α/2) √(c′c λ̂_2/(ak)) ),
respectively.
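The following sketch (our own, assuming NumPy and SciPy) simulates model (3.21) with an orthogonal design X′X = aI_p, 1_n′X = 0, computes the ANOVA estimates via (3.22), carries out the exact F test of H_0: σ_u² = 0 from (3.24), and forms the exact confidence interval (3.30) for c′β; all numerical choices are illustrative.

```python
import numpy as np
from scipy import stats

# Model (3.21) of Example 3.3 with an orthogonal design: X'X = a I_p, 1'X = 0.
rng = np.random.default_rng(3)
k, n, p, a = 12, 8, 2, 3.0
Z = rng.normal(size=(n, p))
Z -= Z.mean(axis=0)                                 # center so that 1'X = 0
Q, _ = np.linalg.qr(Z)                              # orthonormal columns
X = np.sqrt(a) * Q                                  # now X'X = a I_p

mu, beta = 5.0, np.array([1.0, -2.0])
sig_u2, sig_b2, sig_e2 = 2.0, 1.5, 1.0
u  = rng.normal(mu, np.sqrt(sig_u2), k)             # random intercepts u_i
bi = beta + rng.normal(0.0, np.sqrt(sig_b2), (k, p))  # random coefficients b_i
Y  = u[:, None] + bi @ X.T + rng.normal(0.0, np.sqrt(sig_e2), (k, n))
y  = Y.ravel()                                      # y = (y_1', ..., y_k')'

Jk, Jn = np.full((k, k), 1 / k), np.full((n, n), 1 / n)
PX = X @ X.T / a                                    # projector onto M(X)
P1 = np.kron(np.eye(k) - Jk, Jn)                    # lambda_1 = n*sig_u2 + sig_e2
P2 = np.kron(np.eye(k) - Jk, PX)                    # lambda_2 = a*sig_b2 + sig_e2
P3 = np.kron(np.eye(k), np.eye(n) - Jn - PX)        # lambda_3 = sig_e2
r1, r2, r3 = k - 1, (k - 1) * p, k * (n - 1 - p)
lam1, lam2, lam3 = y @ P1 @ y / r1, y @ P2 @ y / r2, y @ P3 @ y / r3
print("ANOVA estimates (sig_e2, sig_b2, sig_u2):",
      lam3, (lam2 - lam3) / a, (lam1 - lam3) / n)   # solves (3.22)

# Exact F test of H0: sig_u2 = 0, cf. (3.24): lambda_1 - lambda_3 = n*sig_u2.
F = lam1 / lam3
print("F =", F, "reject at 5% level:", F >= stats.f.ppf(0.95, r1, r3))

# Exact 95% confidence interval (3.30) for c'beta.  Because 1'X = 0, the LSE is
# beta_hat = (k X'X)^{-1} (1_k (x) X)' y = X' (sum_i y_i) / (k a).
c = np.array([1.0, 1.0])
beta_hat = X.T @ Y.sum(axis=0) / (k * a)
half = stats.t.ppf(0.975, r2) * np.sqrt(c @ c * lam2 / (a * k))
print("95% CI for c'beta:", (c @ beta_hat - half, c @ beta_hat + half))
```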

4 Equivalent Estimates of ANOVA Estimate

In this section, we prove that the condition of Theorem 3.2 is necessary and sufficient for the equivalence of the ANOVA estimate, the REML estimate and the MINQU estimate.

Theorem 4.1  The condition of Theorem 3.2 is necessary and sufficient for the equivalence of the ANOVA estimate σ̂² and the explicit solution of the REML equations of model (2.1).

Proof  (Sufficiency)  The log-likelihood function based on B′y is
  l(σ²) = log L(σ² | B′y) = −(1/2)(n − rk(X)) log 2π − (1/2) log |B′V(σ²)B| − (1/2) y′B(B′V(σ²)B)⁻¹B′y,    (4.1)
where B is an n × (n − r) column orthogonal matrix such that B′B = I_{n−r} and BB′ = Q_X. Maximizing l(σ²) with respect to σ² = (σ_1², σ_2², ..., σ_{k+1}²)′, we obtain the REML equations
  tr(P(σ²)U_iU_i′) = y′P(σ²)U_iU_i′P(σ²)y,  i = 1, 2, ..., k + 1,    (4.2)
which are an equivalent form of equations (3.13). Noting that
  P(σ²) = V⁻¹(σ²) − V⁻¹(σ²)X(X′V⁻¹(σ²)X)⁻X′V⁻¹(σ²) = Q_X(Q_X V(σ²) Q_X)⁺ Q_X,    (4.3)
under the condition of Theorem 3.2 we have
  P(σ²) = Σ_{i=1}^{k+1} (1/λ_i) Q_X P_i Q_X = Σ_{i=1}^{k+1} (1/λ_i) P_i.
Combining this with (3.7), the REML equations (4.2) reduce to
  Σ_{i=1}^{k+1} (a_ij/λ_i) tr(P_i) = Σ_{i=1}^{k+1} (a_ij/λ_i²) y′P_i y,  j = 1, 2, ..., k + 1,    (4.4)
where the a_ij and λ_i are the same as those defined in the proof of Theorem 3.1. By the invertibility of A = (a_ij) (whose rows a_i = (a_{i1}, a_{i2}, ..., a_{i,k+1}), i = 1, 2, ..., k + 1, are linearly independent), equations (4.4) are equivalent to equations (3.17); that is, the REML equations (4.4) have an explicit solution, which equals the ANOVA estimate σ̂² by Theorem 3.2.

(Necessity)  If the ANOVA estimate σ̂² and the explicit solution to the REML equations are equal, then the explicit solution to the REML equations (3.13) must exist. Thus Q_X V(σ²) Q_X has exactly k + 1 distinct non-zero eigenvalues which are linearly independent combinations of σ², as was shown in the proof of Theorem 3.1. The proof of Theorem 4.1 is completed.

If a prior value σ̇² is available, then the MINQU estimate is another important explicit estimate of σ²; it is the solution to the following equations:
  Σ_{j=1}^{k+1} tr(P(σ̇²)U_iU_i′P(σ̇²)U_jU_j′) σ_j² = y′P(σ̇²)U_iU_i′P(σ̇²)y,  i = 1, 2, ..., k + 1,    (4.5)

where P(σ̇²) has the same form as in (4.3), obtained by replacing σ² in P(σ²) with the prior value σ̇²; see [20]. Thus we have the following theorem.

Theorem 4.2  The condition of Theorem 3.2 is also necessary and sufficient for the equivalence of the ANOVA estimate σ̂² and the MINQU estimate.

Proof  The REML iterative equations (3.13) and the MINQU equations (4.5) have the same form provided the prior value λ̇ is replaced by the current iterative value of λ (see [20]); hence the MINQU estimate being independent of the prior value σ̇² is equivalent to the REML equations (3.13) having an explicit solution. Therefore we only need to prove the sufficiency part of Theorem 4.2. Using the fact
  Σ_{j=1}^{k+1} P(σ̇²)U_iU_i′P(σ̇²)U_jU_j′ σ_j² = P(σ̇²)U_iU_i′P(σ̇²)V(σ²)
together with (3.7) and (4.3), we conclude that if the condition of Theorem 3.2 holds, then equation (4.5) is equivalent to
  Σ_{i=1}^{k+1} a_ij λ_i tr(P_i)/λ̇_i² = Σ_{i=1}^{k+1} a_ij y′P_i y/λ̇_i²,  j = 1, 2, ..., k + 1,    (4.6)
which simplifies to (3.17) by the invertibility of A = (a_ij) and the fact that each λ̇_i > 0. Thus the ANOVA estimate σ̂² and the MINQU estimate are equal. The proof of Theorem 4.2 is completed.

For example, the ANOVA estimate, the solution to the REML equations and the MINQU estimate under model (3.21) are equal if and only if Q_{X_0} V(σ²) Q_{X_0} has exactly three distinct non-zero eigenvalues which are linearly independent combinations of (σ_u², σ_b², σ_e²). It can be shown that the latter is equivalent to X′X = aI_p (a > 0) and 1_n′X = 0 in model (3.21).

Acknowledgments  We are grateful to the anonymous referees and the editor for their detailed suggestions, which have considerably improved the quality of the paper.

References
[1] Baltagi, B. H.: Econometric Analysis of Panel Data, John Wiley, New York, 1995


[2] Davidian, M., Giltinan, D. M.: Nonlinear Models for Repeated Measurement Data, Chapman and Hall, London, 1996
[3] Diggle, P. J., Liang, K. Y., Zeger, S. L.: Analysis of Longitudinal Data, Oxford Science, New York, 2000
[4] Searle, S. R., Casella, G., McCulloch, C. E.: Variance Components, Wiley, New York, 1992
[5] Bates, D. M., DebRoy, S.: Linear mixed models and penalized least squares. J. Multiv. Anal., 91, 1–17 (2004)
[6] Kelly, R. J., Mathew, T.: Improved estimators of variance components with smaller probability of negativity. J. R. Statist. Soc. B, 55, 897–911 (1994)
[7] Nelder, J. A.: The interpretation of negative variance components. Biometrika, 41, 544–548 (1954)
[8] Smith, D. W., Murray, L. W.: An alternative to Eisenhart's model II and mixed model in the case of negative variance estimates. J. Amer. Statist. Assoc., 79, 145–151 (1984)
[9] Hazelton, M. L., Gurrin, L. C.: A note on genetic variance components in mixed models. Genetic Epidemiology, 24, 297–301 (2003)
[10] Graybill, F. A., Hultquist, R. A.: Theorems concerning Eisenhart's model II. Ann. Math. Statist., 32, 261–269 (1961)
[11] Rao, C. R.: Least squares theory using an estimated dispersion matrix and its application to measurement of signals. Proc. Fifth Berkeley Symposium on Math. Statist. & Prob., LeCam, L. M., Neyman, J. (eds.), Univ. California Press, Berkeley, 1, 355–372 (1967)
[12] Seely, J.: Quadratic subspaces and completeness. Ann. Math. Statist., 42, 710–721 (1971)
[13] Brown, K. G.: On analysis of variance in the mixed model. Ann. Statist., 12, 1488–1499 (1984)
[14] Khuri, A. I., Mathew, T., Sinha, B. K.: Statistical Tests for Mixed Linear Models, Wiley, New York, 1998
[15] Wu, M. X., Wang, S. G.: Simultaneously optimal estimation for fixed effects and variance components in the mixed model. Sci. China Ser. A, 47(5), 787–799 (2004)
[16] Baksalary, J. K.: Algebraic characterizations and statistical implications of the commutativity of orthogonal projectors. Proc. Second International Tampere Conference in Statistics, Pukkila, T., Puntanen, S. (eds.), 113–142, 1987
[17] Arnold, S. F.: The Theory of Linear Models and Multivariate Analysis, John Wiley, New York, 1981
[18] Szatrowski, T. H.: Necessary and sufficient conditions for explicit solutions in the multivariate normal estimation problem for patterned means and covariances. Ann. Statist., 8, 802–810 (1980)
[19] Lehmann, E. L.: Testing Statistical Hypotheses, 2nd ed., John Wiley & Sons, New York, 1986
[20] Rao, C. R., Kleffe, J.: Estimation of Variance Components and Applications, Elsevier Science Publishers B. V., New York, 1988
