A Stable Estimator for the Information Matrix under EM

Jin-Chuan Duan and Andras Fulop∗

(First Draft: October 10, 2006) (This Draft: March 1, 2007)

Abstract

This article develops a new and stable estimator for the information matrix when the EM algorithm is used in maximum likelihood estimation. This estimator is constructed using the smoothed individual complete-data scores that are readily available from running the EM algorithm. In comparison to the approach of Louis (1982), this new estimator is more stable and easier to implement. Both real and simulated data are used to demonstrate the use of this new estimator.

Keywords: Particle filter, EM, Information matrix, Kalman filter, AR, GARCH, Maximum likelihood.



∗Duan is with the Joseph L. Rotman School of Management, University of Toronto, and Fulop is with ESSEC Paris and CREST. Duan acknowledges support received as the Manulife Chair in Financial Services and research funding from the Social Sciences and Humanities Research Council of Canada. Duan also thanks the Guanghua School of Management, Peking University and the Risk Management Institute, National University of Singapore, where this article was completed during his visits. E-mail: [email protected]. Fulop would like to acknowledge the hospitality of the National Bank of Hungary, where parts of this research were completed. E-mail: [email protected].


1 Introduction

The EM algorithm of Dempster, et al. (1977) is a well-established way of obtaining maximum likelihood (ML) estimates for models with missing data. However, the method has a drawback: it does not provide a natural estimator for the information matrix, and thus there is no readily available covariance matrix for the ML estimates. In this article, we propose a simple and stable estimator for the information matrix under EM. This estimator works for both independent and dependent data samples. Two dynamic latent variable models are used to demonstrate the use of this estimator.

Several approaches to estimating the information matrix under EM have been proposed in the literature. One way is through the empirical (incomplete-data) Hessian. Louis (1982) shows how to express the incomplete-data Hessian in terms of the conditional expectations of the complete-data Hessian and scores, which are by design much easier to work with. Meng and Rubin (1991) use the trajectory of the EM iterates to arrive at the incomplete-data Hessian. Jamshidian and Jennrich (2000) propose numerically differentiating the incomplete-data (sample) score vector. However, all of these approaches are susceptible to the fact that the empirical Hessian of a finite sample may be unstable and cannot even be guaranteed to remain negative definite. Moreover, in many applications of the EM algorithm, simulation is used to perform the expectation step, and the simulation output may unavoidably become an irregular function of the parameters. In those cases, Louis' (1982) method appears to be the only solution, despite its deficiencies.

In principle, Louis' (1982) method always works because the conditional expectations of the complete-data score and Hessian are computed with the conditioning parameters held fixed. Irregularities caused by the simulator thus present no difficulty in computing numerical derivatives. But the same does not apply to the other techniques, where the parameters used in conditioning do change at some stages. Simulation thus creates discontinuities that cannot be avoided even with common random numbers. The resampling step required by a particle filter is such an example.

Apart from being potentially unstable, Louis' (1982) method can sometimes be cumbersome to apply. A well-known alternative, proposed by Meilijson (1989) in the context of EM, is to use the outer product of the individual incomplete-data scores instead of the incomplete-data Hessian. This alternative is known to be more stable and is guaranteed to be positive definite. The theoretical basis is that the individual incomplete-data scores equal the smoothed individual scores (expectations of the individual complete-data scores conditional on the observed data set). However, Meilijson's (1989) method is limited to settings where data points are independent. This limitation arises from the fact that with data dependency, the smoothed individual scores are no longer martingale differences. In his discussion of limitations, Meilijson (1989) suggests expressing the sample score as a martingale, but he also acknowledges the practical difficulty of doing so. It appears that Louis' (1982) method has until now remained the most widely applicable, despite its shortcomings.

Our proposal is to compute the smoothed individual scores, which are available directly from running the EM algorithm. We then estimate the variance of the incomplete-data (sample) score via the approach of Newey and West (1987) for estimating the variance of a sum of dependent random variables. The variance of the sample score is then theoretically linked back to the expected value of the negative Hessian of the incomplete-data set via a martingale representation. This martingale representation is only used as a conceptual device, without the actual need to compute its elements. In short, our method uses the Newey-West estimator on the smoothed individual scores as the information matrix for the incomplete-data sample.

Two examples are used to demonstrate the numerical performance of this new estimator via simulation. For a latent AR(1) model with i.i.d. Gaussian noises, both this new estimator and Louis' (1982) estimator are found to perform equally well as compared to the two commonly used estimators directly computable with the Kalman filter and a standard prediction error decomposition of the incomplete-data log-likelihood. Although there is no need to use the EM algorithm for this model, it allows us to benchmark the performance of this new estimator along with Louis' (1982) estimator. The second case considered is a latent AR(1) model with GARCH noises, for which the EM algorithm is clearly needed for maximum likelihood estimation. We show that Louis' (1982) estimator is not computable for about 6% of the simulated samples, whereas the new estimator is by design always computable. When Louis' (1982) estimator is computable, the new estimator is found to yield coverage rates that are comparable to those of Louis' (1982) estimator.

We then apply the latent AR(1) model with GARCH noises to real stock price data to examine the performance of Louis' method and this new estimator. We randomly sample 100 firms from the CRSP database. For each firm, the first 1000 daily sampled stock prices form the time series for the estimation. Our empirical results suggest that for 17% of the firms, Louis' method cannot produce a valid estimate for the covariance matrix, whereas the new estimator always generates valid estimates. Our findings also point out that measurement errors in stock prices are significant enough (for 90 out of 100 firms) to warrant the state-space formulation.

2 A new estimator for the information matrix

Denote the observed data set up to time $i$ by $D_i = \{s_j;\ j = 1, \cdots, i\}$. The complete-data vector at $i$ is denoted by $z_i$, which is connected to the observed data by a many-to-one mapping, i.e., $s_i = F(z_i)$. The complete data up to time $i$ is $\mathcal{F}_i = \{z_j;\ j = 1, \cdots, i\}$, and its log-likelihood based on the $k$-dimensional parameter $\theta$ can be written as

$$\ln g(z_i;\ i = 1, \cdots, n \mid \theta) = \sum_{i=1}^{n} \ln g_i(z_i \mid \mathcal{F}_{i-1}, \theta). \tag{1}$$

In a similar fashion, the observed-data (or incomplete-data) log-likelihood can be written as

$$\ln f(s_i;\ i = 1, \cdots, n \mid \theta) = \sum_{i=1}^{n} \ln f(s_i \mid D_{i-1}, \theta). \tag{2}$$

Note that $g_i(z_i \mid \mathcal{F}_{i-1}, \theta)$ in the complete-data log-likelihood is typically simple by the model assumption. But $f(s_i \mid D_{i-1}, \theta)$ in the observed-data log-likelihood is likely to be very complex, which motivates the use of the EM algorithm.


Let the observed-data score $S_n(\theta)$ and Hessian $H_n(\theta)$ be

$$S_n(\theta) \equiv \frac{\partial \ln f(s_i;\ i = 1, \cdots, n \mid \theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ln f(s_i \mid D_{i-1}, \theta)}{\partial \theta} \tag{3}$$

$$H_n(\theta) \equiv \frac{\partial^2 \ln f(s_i;\ i = 1, \cdots, n \mid \theta)}{\partial \theta^2} = \sum_{i=1}^{n} \frac{\partial^2 \ln f(s_i \mid D_{i-1}, \theta)}{\partial \theta^2}. \tag{4}$$

Then, the standard argument implies that the maximum likelihood estimator, $\hat{\theta}$, under some regularity conditions, converges to the true parameter value $\theta_0$ and is asymptotically normally distributed with a covariance matrix $-H(\theta_0)^{-1}$; that is,

$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \stackrel{D}{\Longrightarrow} N\left(0, -H(\theta_0)^{-1}\right) \tag{5}$$

where $H(\theta_0) = \lim_{n \to \infty} \frac{H_n(\theta_0)}{n} = \lim_{n \to \infty} \frac{E(H_n(\theta_0))}{n}$. Thus, $H(\theta_0)$ can be approximated by its sample analogue, $\frac{H_n(\theta_0)}{n}$, but this estimator is often unstable in a finite sample and sometimes is not even negative definite due to numerical inaccuracy. A more stable alternative can be derived using the following result:

$$\begin{aligned}
Var(S_n(\theta_0)) &= E\left[S_n(\theta_0) S_n(\theta_0)'\right] \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} E\left[ E\left( \frac{\partial \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta} \left(\frac{\partial \ln f(s_j \mid D_{j-1}, \theta_0)}{\partial \theta}\right)' \,\middle|\, D_{i-1} \right) \right] \\
&= \sum_{i=1}^{n} E\left[ E\left( \frac{\partial \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta} \left(\frac{\partial \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta}\right)' \,\middle|\, D_{i-1} \right) \right] \\
&= \sum_{i=1}^{n} E\left[ -E\left( \frac{\partial^2 \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta^2} \,\middle|\, D_{i-1} \right) \right] \\
&= -E(H_n(\theta_0)).
\end{aligned} \tag{6}$$

The first and second equalities follow from the definition of $S_n(\theta)$ and the law of iterated expectations. The third equality is due to the fact that $S_n(\theta)$ is a martingale with increments $\frac{\partial \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta}$. The fourth equality follows from the conditional version of Fisher's equality, and the last one is due to the definition of $H_n(\theta)$ and the law of iterated expectations.

A popular and stable way to approximate $H(\theta_0)$ is to use the third equality in (6); that is,

$$\begin{aligned}
H(\theta_0) &\approx \frac{E(H_n(\theta_0))}{n} = -\frac{1}{n} \sum_{i=1}^{n} E\left( \frac{\partial \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta} \left(\frac{\partial \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta}\right)' \right) \\
&\approx -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta} \left(\frac{\partial \ln f(s_i \mid D_{i-1}, \theta_0)}{\partial \theta}\right)'.
\end{aligned} \tag{7}$$
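To make the outer-product approximation in (7) concrete, here is a minimal Python sketch, assuming the per-observation score vectors can be evaluated (which is exactly what fails in the latent variable models below); the function name and array layout are illustrative, not from the original paper.

```python
import numpy as np

def opg_covariance(scores):
    """Outer-product-of-gradients (OPG) approximation to the asymptotic
    covariance of the MLE, following equation (7).

    scores : (n, k) array; row i holds d ln f(s_i | D_{i-1}, theta_hat) / d theta.
    Returns the (k, k) estimated covariance matrix of theta_hat.
    """
    scores = np.asarray(scores)
    # -E(H_n) is approximated by the sum of outer products of the scores.
    info = scores.T @ scores
    # Var(theta_hat) is approximated by the inverse of the information matrix.
    return np.linalg.inv(info)
```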

Such an approximation is typically more stable than a direct use of the sample Hessian, and is guaranteed to be negative semidefinite. However, this approach requires evaluating the scores $\frac{\partial \ln f(s_i \mid D_{i-1}, \theta)}{\partial \theta}$. In the current context, this is a challenging task because the conditional density function $f(s_i \mid D_{i-1}, \theta)$ may not have an explicit expression and needs to be approximated numerically. More seriously, the numerical approximation may not be differentiable with respect to $\theta$.

Our solution is to approximate $Var(S_n(\theta_0))$ using a different decomposition of $S_n(\theta_0)$ that is naturally embedded in the EM algorithm. Specifically, we will use the smoothed individual scores, knowing full well that they are not martingale differences. By the arguments of Dempster, et al. (1977) and Louis (1982), we can decompose the observed-data score into the sum of smoothed individual scores:

$$S_n(\theta) = E\left( \frac{\partial \ln g(z_i;\ i = 1, \cdots, n \mid \theta)}{\partial \theta} \,\middle|\, D_n, \theta \right) = \sum_{i=1}^{n} a_i(\theta), \tag{8}$$

where $a_i(\theta) = E\left( \frac{\partial \ln g_i(z_i \mid \mathcal{F}_{i-1}, \theta)}{\partial \theta} \,\middle|\, D_n, \theta \right)$. The smoothed individual scores $a_i(\theta)$ can be computed as a by-product of the EM algorithm.

Since the smoothed individual scores are not martingale differences, approximating $Var(S_n(\theta))$ must take into account the dependence among the lagged terms. Assume that beyond some lag, say $l$, the dependence among the $a_i$'s becomes negligible. Following Newey and West (1987), we have

$$Var(S_n(\theta_0)) \approx \Omega_0 + \sum_{j=1}^{l} w(j)\left(\Omega_j + \Omega_j'\right) \tag{9}$$

where

$$\Omega_j = \sum_{i=1}^{n-j} a_i(\hat{\theta})\, a_{i+j}(\hat{\theta})' \quad \text{and} \quad w(j) = 1 - \frac{j}{l}.$$
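The estimator is straightforward to code once the smoothed scores $a_i(\hat{\theta})$ have been collected from the final EM pass. Below is a minimal Python sketch of (9), assuming the scores are stacked in an $(n, k)$ array; the function and variable names are ours, not the paper's.

```python
import numpy as np

def smoothed_score_covariance(a, l):
    """Newey-West estimate of Var(S_n) from the smoothed individual scores,
    following equation (9), and the implied covariance matrix of theta_hat.

    a : (n, k) array; row i holds the smoothed score a_i(theta_hat).
    l : truncation lag beyond which dependence is assumed negligible.
    """
    a = np.asarray(a)
    n, k = a.shape
    var_S = a.T @ a                       # Omega_0 = sum_i a_i a_i'
    for j in range(1, l + 1):
        w = 1.0 - j / l                   # Bartlett-type weight w(j) = 1 - j/l
        omega_j = a[:n - j].T @ a[j:]     # Omega_j = sum_i a_i a_{i+j}'
        var_S += w * (omega_j + omega_j.T)
    # Var(S_n) estimates -E(H_n); invert it to get the covariance of theta_hat.
    return np.linalg.inv(var_S)
```

With Bartlett-type weights, the estimated variance is symmetric and positive semidefinite by construction, which is the stability advantage stressed in the text.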

3 Examples

3.1 A latent AR(1) model with i.i.d. Gaussian noises

Assume that the latent state variable, $x_i$, follows an AR(1) process with i.i.d. Gaussian innovations, and we observe the latent process with Gaussian noises, denoted by $s_i$. So the state-space model becomes

$$x_i = A + B x_{i-1} + \sigma \epsilon_i \tag{10}$$

$$s_i = x_i + \delta \eta_i \tag{11}$$

where the standard normal disturbances $\epsilon_i$ and $\eta_i$ are independent of each other and across time. This state-space model can be estimated using the log-likelihood obtained with the standard Kalman filter and the prediction error decomposition. However, we opt for an EM approach, not out of necessity, but because this formulation lets us demonstrate the use of our proposed estimator for the information matrix.

We set up the complete-data space as $z_i = (s_i, \eta_i)$. This choice reflects the fact that having the measurement error, $\eta_i$, in the complete-data space, instead of the typical $x_i$, makes the EM algorithm converge faster for lower values of $\delta$. The unknown parameter vector is $\theta = (A, B, \sigma, \delta)$. The individual complete-data log-likelihood for this model is

$$\ln g_i(z_i \mid \mathcal{F}_{i-1}, \theta) = -\frac{\ln \sigma^2}{2} - \frac{\left[s_i - \delta\eta_i - A - B(s_{i-1} - \delta\eta_{i-1})\right]^2}{2\sigma^2}. \tag{12}$$
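As a quick illustration of the model and of equation (12), the sketch below simulates a path from (10)-(11) and evaluates the individual complete-data log-likelihoods; the parameter values follow the simulation design in Table 1, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, A, B, sigma, delta = 1000, 0.1, 0.8, 0.02, 0.01

# Simulate the latent AR(1) state and the noisy observations, eqs (10)-(11).
x = np.empty(n + 1)
x[0] = A / (1.0 - B)                      # start at the stationary mean
eps = rng.standard_normal(n + 1)
eta = rng.standard_normal(n + 1)
for i in range(1, n + 1):
    x[i] = A + B * x[i - 1] + sigma * eps[i]
s = x + delta * eta

# Individual complete-data log-likelihood terms, eq (12), for i = 1, ..., n;
# note that s_i - delta*eta_i recovers the latent state x_i.
resid = (s[1:] - delta * eta[1:]) - A - B * (s[:-1] - delta * eta[:-1])
loglik_i = -0.5 * np.log(sigma**2) - resid**2 / (2.0 * sigma**2)
print(loglik_i.sum())
```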

One iteration of the EM algorithm, moving the parameter vector from $\theta^{(k-1)}$ to $\theta^{(k)}$, takes the standard form as follows.

• E-step: Compute the conditional expectation of the complete-data log-likelihood given the observed data, evaluated at $\theta^{(k-1)}$; that is,

$$Q(\theta \mid \theta^{(k-1)}) = E\left( \ln g(z_i,\ i = 1, \cdots, n \mid \theta) \mid D_n, \theta^{(k-1)} \right).$$

This Q-function can be computed with a forward and then a backward pass of the Kalman filter (see Shumway and Stoffer (1982)); a sketch of these two passes follows below.

• Generalized M-step: Find an updated parameter value, $\theta^{(k)}$, that ensures an increase in the objective function; that is, $Q(\theta^{(k)} \mid \theta^{(k-1)}) > Q(\theta^{(k-1)} \mid \theta^{(k-1)})$.
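As a sketch of the forward/backward passes mentioned in the E-step above, the code below runs a forward Kalman filter followed by a Rauch-Tung-Striebel backward smoother for the scalar state in (10)-(11), producing the smoothed moments $E(x_i \mid D_n)$ and $Var(x_i \mid D_n)$; this is our own minimal implementation for illustration, not the authors' code.

```python
import numpy as np

def kalman_rts_smoother(s, A, B, sigma, delta):
    """Forward Kalman filter and backward RTS smoother for the model
    x_i = A + B x_{i-1} + sigma*eps_i,  s_i = x_i + delta*eta_i.
    Returns the smoothed means and variances of x_i given all data D_n."""
    n = len(s)
    xp, Pp = np.zeros(n), np.zeros(n)   # one-step predicted moments
    xf, Pf = np.zeros(n), np.zeros(n)   # filtered moments
    # Initialize at the first observation, ignoring its measurement error.
    xf[0], Pf[0] = s[0], 0.0
    for i in range(1, n):               # forward pass
        xp[i] = A + B * xf[i - 1]
        Pp[i] = B**2 * Pf[i - 1] + sigma**2
        K = Pp[i] / (Pp[i] + delta**2)  # Kalman gain
        xf[i] = xp[i] + K * (s[i] - xp[i])
        Pf[i] = (1.0 - K) * Pp[i]
    xs, Ps = xf.copy(), Pf.copy()       # backward (RTS) pass
    for i in range(n - 2, -1, -1):
        J = Pf[i] * B / Pp[i + 1]
        xs[i] = xf[i] + J * (xs[i + 1] - xp[i + 1])
        Ps[i] = Pf[i] + J**2 * (Ps[i + 1] - Pp[i + 1])
    return xs, Ps
```

Since $z_i = (s_i, \eta_i)$ and $\eta_i = (s_i - x_i)/\delta$, the smoothed moments of $x_i$ translate directly into the smoothed moments of $\eta_i$ needed to assemble the Q-function and the smoothed scores $a_i(\theta)$.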

We avoid a full maximization to speed up iterations. Specifically, we first perform optimization over $\delta$ and then over the remaining parameters. Both the E-step and the generalized M-step can be executed using closed-form solutions.

It is important to note that the present model's observed-data log-likelihood can also be computed with one forward pass of the Kalman filter using the prediction error decomposition. Thus, one can compute the asymptotic covariance matrix using either the standard martingale difference representation in (7) or the observed-data Hessian. The former provides a useful benchmark to check our proposed estimator, whereas the latter can be used to benchmark Louis' (1982) estimator. Louis' (1982) estimator relies on the decomposition:

$$H_n(\theta) = E\left( \frac{\partial^2 \ln g(z_i;\ i = 1, \cdots, n \mid \theta)}{\partial \theta^2} \,\middle|\, D_n, \theta \right) + Var\left( \frac{\partial \ln g(z_i;\ i = 1, \cdots, n \mid \theta)}{\partial \theta} \,\middle|\, D_n, \theta \right). \tag{13}$$

The first element of the right-hand side can be computed with finite differencing. For the second term, however, we need to assume that the dependence among the individual scores dies away after some lag, say $K$. That is,

$$Cov\left( \frac{\partial \ln g_i(z_i \mid \mathcal{F}_{i-1}, \theta)}{\partial \theta}, \left(\frac{\partial \ln g_{i+j}(z_{i+j} \mid \mathcal{F}_{i+j-1}, \theta)}{\partial \theta}\right)' \,\middle|\, D_n, \theta \right) \approx 0, \quad \text{for } j > K.$$

Then, we can compute the variance of the sum by summing the auto-covariances up to lag $K$; a sketch of this truncated variance term appears at the end of this subsection.

Table 1 presents the results based on simulated data. Basically, all methods show a very good performance, as indicated by the empirical coverage rates. For this example, one should note that there is dependency among the smoothed individual scores. Our proposed estimator thus requires a Newey-West correction up to some lag $l$. Table 1 presents the results for $l = 0$, 5 and 10. Not surprisingly, the coverage rates using $l = 0$ are indeed off. Once we set $l = 5$, the performance is basically the same as the estimator based on the martingale difference representation.
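For reference, here is a minimal sketch (our own, with hypothetical names) of the truncated second term in Louis' decomposition (13): the conditional variance of the complete-data score, estimated from simulation draws of the individual complete-data scores with auto-covariances summed up to lag $K$.

```python
import numpy as np

def louis_variance_term(u, K):
    """Estimate Var( sum_i d ln g_i / d theta | D_n ), the second term in (13).

    u : (M, n, k) array; u[m, i] is the complete-data score of observation i
        evaluated on the m-th smoothing draw (e.g. a saved particle path).
    K : lag beyond which conditional score covariances are assumed to vanish.
    """
    M, n, k = u.shape
    c = u - u.mean(axis=0, keepdims=True)   # center each score across draws
    V = np.zeros((k, k))
    for j in range(K + 1):
        # Monte Carlo estimate of sum_i Cov(u_i, u_{i+j} | D_n)
        cov_j = np.einsum('mik,mil->kl', c[:, :n - j], c[:, j:]) / M
        V += cov_j if j == 0 else cov_j + cov_j.T
    return V
```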

3.2 A latent AR(1) model with GARCH noises

The example in this section extends the i.i.d. Gaussian noises of the previous example to GARCH(1,1) noises. This choice is motivated by the fact that financial data are typically characterized by volatility clustering, and the GARCH model is found to account for this feature well. The state-space model thus becomes

$$x_i = A + B x_{i-1} + \sigma_i \epsilon_i \tag{14}$$

$$\sigma_i^2 = a_0 + a_1 \sigma_{i-1}^2 + b_1 \sigma_{i-1}^2 \epsilon_{i-1}^2 \tag{15}$$

$$s_i = x_i + \delta \eta_i \tag{16}$$

where the standard normal disturbances $\epsilon_i$ and $\eta_i$ are independent of each other and across time. The complete-data space is the same as in the previous example; that is, $z_i = (s_i, \eta_i)$. The unknown parameter vector is $\theta = (A, B, a_0, a_1, b_1, \delta)$. The individual complete-data log-likelihood thus becomes

$$\ln g_i(z_i \mid \mathcal{F}_{i-1}, \theta) = -\frac{\ln \sigma_i^2}{2} - \frac{\left(s_i - \delta\eta_i - A - B(s_{i-1} - \delta\eta_{i-1})\right)^2}{2\sigma_i^2} \tag{17}$$

where $\sigma_i^2 = a_0 + a_1 \sigma_{i-1}^2 + b_1 \left(s_{i-1} - \delta\eta_{i-1} - A - B(s_{i-2} - \delta\eta_{i-2})\right)^2$, as implied by (15).
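A short sketch of how the terms in (17) can be evaluated along a complete-data path (all names are illustrative; the variance recursion is seeded at its stationary value, as in the appendix):

```python
import numpy as np

def complete_data_loglik(s, eta, A, B, a0, a1, b1, delta):
    """Evaluate the individual complete-data log-likelihood terms (17) for
    the latent AR(1)-GARCH(1,1) model, given an eta path from the smoother."""
    x = s - delta * eta                    # recover the latent state
    innov = x[1:] - A - B * x[:-1]         # sigma_i * eps_i, i = 1, ..., n
    n = len(innov)
    sig2 = np.empty(n)
    sig2[0] = a0 / (1.0 - a1 - b1)         # stationary variance as the seed
    for i in range(1, n):
        sig2[i] = a0 + a1 * sig2[i - 1] + b1 * innov[i - 1] ** 2  # eq (15)
    return -0.5 * np.log(sig2) - innov**2 / (2.0 * sig2)
```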

is defined as before:

$$Q(\theta \mid \theta^{(k-1)}) = E\left( \ln g(z_i,\ i = 1, \cdots, n \mid \theta) \mid D_n, \theta^{(k-1)} \right) = \sum_{i=1}^{n} q_i(\theta \mid \theta^{(k-1)})$$

where $q_i(\theta \mid \theta^{(k-1)}) = E\left( \ln g_i(z_i \mid \mathcal{F}_{i-1}, \theta) \mid D_n, \theta^{(k-1)} \right)$.

Note that the GARCH error structure prevents us from having a closed-form solution for $q_i(\theta \mid \theta^{(k-1)})$. Nevertheless, we can use a particle filter to approximate this quantity. Particle filtering sequentially generates sets of $M$ particles, $\{z_i^{(m)},\ m = 1, \cdots, M\}$ for $i = 1, 2, \cdots$, to represent the filtering distribution of the latent state variable(s) at different $i$'s, i.e., $f(x_i \mid D_i)$. Bayes' rule is used to determine proper weights for the different particles, and resampling is employed to ensure that the system has good stochastic behavior by discarding particles with low weights. For a general review of particle filtering, we refer readers to Doucet, et al. (2001). In the appendix, we have designed an efficient particle filter specifically for the latent AR(1)-GARCH(1,1) model.

Also, by saving the particle paths, one has an empirical representation of the smoothing distributions $f(z_i \mid D_{i+L})$ for any $L > 0$. Denote the particles representing the smoothing distribution as $z_{i|i+L}^{(m)}$. Then, the conditional expectation of any function $h(\cdot)$ can be approximated as

$$E\left(h(z_i) \mid D_{i+L}\right) \approx \frac{1}{M} \sum_{m=1}^{M} h\left(z_{i|i+L}^{(m)}\right).$$

Following Cappe and Moulines (2005), it is actually not advisable to condition on the entire data set, i.e., $D_n$, in computing the smoothed estimate. A fixed-lag smoothing, say $L = 10$, will drastically decrease the Monte Carlo error and yet introduce only minor bias. Thus we use the following particle approximation of the $i$-th element of the Q-function:

$$q_i(\theta \mid \theta^{(k-1)}) \approx E\left( \ln g_i(z_i \mid \mathcal{F}_{i-1}, \theta) \mid D_{i+L \wedge n}, \theta^{(k-1)} \right) \approx \frac{1}{M} \sum_{m=1}^{M} \ln g\left(z_{i|i+L}^{(m)} \mid \mathcal{F}_{i-1|(i+L \wedge n)}^{(m)}, \theta\right) = \hat{q}_i(\theta \mid \theta^{(k-1)})$$

where $\mathcal{F}_{i-1|i+L \wedge n}^{(m)}$ denotes the particle paths for $m = 1, \cdots, M$, based on information up to $i + L \wedge n$. Note that the particles are obtained by running the filter at $\theta^{(k-1)}$. Also note that $\hat{q}_i(\theta \mid \theta^{(k-1)})$ is a smooth function of $\theta$, but it is not continuous in $\theta^{(k-1)}$ due to the resampling step in the particle filter. The benefit of using the EM algorithm is that the M-step is performed with respect to $\theta$ while fixing $\theta^{(k-1)}$, so that the irregularity introduced by resampling becomes inconsequential. In fact, it is this irregularity that makes direct maximization of the observed-data log-likelihood problematic. In short, one cannot obtain the information matrix for this latent AR(1)-GARCH model using either the martingale difference representation or the observed-data Hessian.

One iteration of the generalized EM algorithm, moving the parameter vector from $\theta^{(k-1)}$ to $\theta^{(k)}$, again takes the standard form.

• E-step: Compute the Q-function by running the particle filter at $\theta^{(k-1)}$; that is,

$$\hat{Q}(\theta \mid \theta^{(k-1)}) = \sum_{i=1}^{n} \hat{q}_i(\theta \mid \theta^{(k-1)}).$$

• Generalized M-step: Find an updated parameter value, $\theta^{(k)}$, that ensures an increase in the objective function; that is,

$$\hat{Q}(\theta^{(k)} \mid \theta^{(k-1)}) > \hat{Q}(\theta^{(k-1)} \mid \theta^{(k-1)}).$$

In updating, we again separate $\delta$ from the remaining parameters.

Table 2 presents the results from a simulation study. We use 200 particles in running the filter to obtain the parameter estimates. To compute the information matrix, 5000 particles are used in the filter. Our fixed-lag smoothing is conducted using $L = 10$. Altogether, we have simulated 500 data samples. It turns out that Louis' (1982) estimator cannot be computed in 28 cases because the information matrix evaluated at the maximum likelihood estimate is not positive definite. This failure rate of about 6% indicates that Louis' (1982) method may be unstable when numerical approximation is used. In contrast, there is no failure case with our proposed estimator. This is hardly surprising because our estimator is by design positive semidefinite, and practically speaking it will always be positive definite. The results reported in Table 2 are based on the 472 samples for which Louis' estimate can be computed. It is evident from these results that our proposed estimator produces results that are comparable to Louis' (1982) estimator when the failure cases are discarded.

4 Empirical analysis of the stock prices of 100 randomly chosen firms

We apply the proposed estimator to real data using the latent AR(1)-GARCH(1,1) model in equations (14)-(16). The AR(1)-GARCH(1,1) model without measurement errors is a commonly adopted specification for modeling logarithmic stock prices, where the value of $B$ is preset to 1. Note that with $B = 1$, the first-order difference in the logarithmic stock prices becomes the continuously compounded return (excluding dividends). In our estimation, we choose to let $B$ be a free parameter so that the empirical results will indicate to us whether the typical specification in this regard is a sensible one.

A more fundamental departure from the standard specification is our allowance for measurement errors in stock prices. The stock market is undoubtedly subject to microstructure noises arising from liquidity- or information-motivated trades. Microstructure noises can also come in via the bid-ask spread and the minimum tick size. It is inconceivable that one can observe the intrinsic stock value without a measurement error. Thus, the latent AR(1)-GARCH(1,1) model is a more appealing specification for observed stock prices.

We randomly sample 100 firms from the US stock population as represented by the CRSP database. For each sampled firm, we record 1000 daily stock prices starting from the first trading day of 2000. To conserve space, our findings are reported in summary form in Table 3.

There are several results worth noting. First, presetting $B = 1$ in the standard specification is warranted because the estimates are all extremely close to 1. This finding is not at all surprising because stock prices are expected to behave like a random walk with a drift. Second, measurement errors are indeed present in stock prices: 90% of the firms have a significant measurement error based on the 5% Wald test, corrected for the fact that the null hypothesis of no measurement error corresponds to a boundary situation. Third, firms do exhibit the GARCH effect, as reflected in the estimates for $a_1$ and $b_1$.


Finally, Louis’s method when applied to real data seems to be even less stable. Earlier in the simulation analysis, we found a failure rate of about 6%. Out of the 100 randomly sampled firms, there are 17 firms whose estimate for the covariance matrix is not positively definite. In contrast and as expected, our proposed estimator yields no failure case for all 100 firms.

5 Conclusion

In this article, we have developed a new and stable estimator for the information matrix under the EM algorithm, which is often needed in the maximum likelihood analysis of models that involve latent variables or censoring. This estimator is constructed by recognizing that the variance of the observed-data sample score can be approximated with the smoothed individual complete-data scores, which are readily available from running the EM algorithm. In addition, using a martingale-difference representation for the observed-data sample score allows us to link its variance to the expected value of the negative of the observed-data Hessian, which justifies this new estimator.

Two dynamic latent variable models are used to demonstrate the numerical properties of this new estimator. Our findings indicate that this new estimator is competitive with other estimators when the model does not need the EM algorithm for maximum likelihood estimation. When the EM algorithm is truly needed, this new estimator is shown to be more stable than Louis' (1982) estimator on both simulated and real data.


Appendix: An efficient particle filter for the latent AR(1)-GARCH(1,1) model

The transition system is clearly Markovian in $(\sigma_i, x_{i-1})$. Thus, we only need to keep track of these quantities by particles. An efficient importance sampler can be constructed by recognizing that $(x_i, s_i)$ is, conditional on $(\sigma_i, x_{i-1})$, bivariate normal. Let $\phi(\cdot;\ a, b)$ denote a Gaussian density function with mean $a$ and standard deviation $b$. The conditional density function for $x_i$ thus equals

$$f(x_i \mid \sigma_i, x_{i-1}, s_i) = \phi\left(x_i;\ \mu_i + C_i(s_i - \mu_i),\ \sigma_i\sqrt{1 - C_i}\right) \tag{18}$$

where $\mu_i = A + B x_{i-1}$ and $C_i = \sigma_i^2 / (\sigma_i^2 + \delta^2)$. The specific particle filter based on the above conditional density as the importance sampler is summarized below.

Step 1: Initialize $M$ particles ($m = 1, \cdots, M$) by setting the initial volatility to its stationary value and ignoring the measurement error; that is, $x_0^{(m)} = s_0$ and $\sigma_1^{(m)} = \sqrt{\frac{a_0}{1 - a_1 - b_1}}$.

Step 2: To advance the system from $i-1$ to $i$, generate particles $\hat{x}_i^{(m)}$ ($m = 1, \cdots, M$) using the conditional density in equation (18) with $\mu_i^{(m)} = A + B x_{i-1}^{(m)}$ and $C_i^{(m)} = (\sigma_i^{(m)})^2 / \left((\sigma_i^{(m)})^2 + \delta^2\right)$. Then use equation (15) to compute $(\hat{\sigma}_{i+1}^{(m)})^2 = a_0 + a_1 (\sigma_i^{(m)})^2 + b_1 (\hat{x}_i^{(m)} - \mu_i^{(m)})^2$. Together, we have generated $(\hat{x}_i^{(m)}, \hat{\sigma}_{i+1}^{(m)})$ for $m = 1, \cdots, M$.

Step 3: Attach importance weight, $w_i^{(m)}$, to $(\hat{x}_i^{(m)}, \hat{\sigma}_{i+1}^{(m)})$, where

$$w_i^{(m)} = \frac{\phi\left(s_i;\ \hat{x}_i^{(m)}, \delta\right) \phi\left(\hat{x}_i^{(m)};\ \mu_i^{(m)}, \sigma_i^{(m)}\right)}{\phi\left(\hat{x}_i^{(m)};\ \mu_i^{(m)} + C_i^{(m)}(s_i - \mu_i^{(m)}),\ \sigma_i^{(m)}\sqrt{1 - C_i^{(m)}}\right)}.$$

Step 4: Resample using the particle set $(\hat{x}_i^{(m)}, \hat{\sigma}_{i+1}^{(m)})$ ($m = 1, \cdots, M$) with the normalized weights $\pi_i^{(m)} = \frac{w_i^{(m)}}{\sum_{m=1}^{M} w_i^{(m)}}$ to obtain an equal-weight sample $(x_i^{(m)}, \sigma_{i+1}^{(m)})$ ($m = 1, \cdots, M$).

Repeat Steps 2-4 until reaching $i = n$.
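The four steps above translate almost line for line into code. Below is a minimal Python sketch of one full pass of the filter (our own implementation for illustration; it returns the filtered particles and omits the path-saving needed for fixed-lag smoothing):

```python
import numpy as np

def particle_filter(s, A, B, a0, a1, b1, delta, M=200, seed=0):
    """One pass of the particle filter in Steps 1-4 for the latent
    AR(1)-GARCH(1,1) model. Returns filtered particles of (x_i, sigma_{i+1})."""
    rng = np.random.default_rng(seed)
    n = len(s)
    # Step 1: initialize at the first observation and the stationary volatility.
    x = np.full(M, s[0])
    sig = np.full(M, np.sqrt(a0 / (1.0 - a1 - b1)))
    out_x, out_sig = [], []
    for i in range(1, n):
        # Step 2: propose x_i from the conditionally normal density (18).
        mu = A + B * x
        C = sig**2 / (sig**2 + delta**2)
        m_post = mu + C * (s[i] - mu)
        s_post = sig * np.sqrt(1.0 - C)
        x_hat = m_post + s_post * rng.standard_normal(M)
        sig_next = np.sqrt(a0 + a1 * sig**2 + b1 * (x_hat - mu) ** 2)  # eq (15)
        # Step 3: importance weights in log space for numerical stability
        # (constants common to all particles are dropped).
        logw = (-0.5 * ((s[i] - x_hat) / delta) ** 2 - np.log(delta)
                - 0.5 * ((x_hat - mu) / sig) ** 2 - np.log(sig)
                + 0.5 * ((x_hat - m_post) / s_post) ** 2 + np.log(s_post))
        w = np.exp(logw - logw.max())
        # Step 4: multinomial resampling with the normalized weights.
        idx = rng.choice(M, size=M, p=w / w.sum())
        x, sig = x_hat[idx], sig_next[idx]
        out_x.append(x.copy()); out_sig.append(sig.copy())
    return np.array(out_x), np.array(out_sig)
```

Saving the resampling indices alongside the particles would recover the particle paths $\mathcal{F}_{i|i+L}^{(m)}$ used for the fixed-lag smoothed scores in Section 3.2.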

References

[1] Cappe, O., and E. Moulines, (2005), On the use of particle filtering for maximum likelihood parameter estimation, in European Signal Processing Conference (EUSIPCO), Antalya, Turkey.

[2] Dempster, A. P., N. M. Laird, and D. B. Rubin, (1977), Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39, 1–38.

[3] Doucet, A., N. de Freitas, and N. Gordon, eds., (2001), Sequential Monte Carlo Methods in Practice, Springer-Verlag.

[4] Jamshidian, M., and R. I. Jennrich, (2000), Standard errors for EM estimation, Journal of the Royal Statistical Society, Series B 62, 257–270.

[5] Louis, T. A., (1982), Finding the observed information matrix when using the EM algorithm, Journal of the Royal Statistical Society, Series B 44, 226–233.

[6] Meilijson, I., (1989), A fast improvement to the EM algorithm on its own terms, Journal of the Royal Statistical Society, Series B 51, 127–138.

[7] Meng, X. L., and D. B. Rubin, (1991), Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm, Journal of the American Statistical Association 86, 899–909.

[8] Newey, W. K., and K. D. West, (1987), A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix, Econometrica 55, 703–708.

[9] Shumway, R. H., and D. S. Stoffer, (1982), An approach to time series smoothing and forecasting using the EM algorithm, Journal of Time Series Analysis 3, 253–264.


Table 1: Simulation results for the latent AR(1) model with i.i.d. Gaussian noises

Nominal Rates              A        B        σ        δ
Martingale Differences
  50%                    0.498    0.496    0.500    0.490
  75%                    0.746    0.734    0.738    0.750
  90%                    0.928    0.926    0.902    0.924
Smoothed scores, 0 lags
  50%                    0.486    0.492    0.592    0.568
  75%                    0.740    0.746    0.846    0.862
  90%                    0.910    0.912    0.962    0.976
Smoothed scores, 5 lags
  50%                    0.490    0.498    0.518    0.504
  75%                    0.758    0.758    0.748    0.776
  90%                    0.924    0.926    0.902    0.936
Smoothed scores, 10 lags
  50%                    0.506    0.510    0.518    0.500
  75%                    0.776    0.782    0.750    0.780
  90%                    0.926    0.926    0.912    0.934
Hessian
  50%                    0.488    0.490    0.496    0.482
  75%                    0.744    0.742    0.736    0.754
  90%                    0.928    0.926    0.898    0.926
Louis' method
  50%                    0.488    0.490    0.498    0.482
  75%                    0.744    0.744    0.736    0.754
  90%                    0.930    0.928    0.900    0.926

This table shows the empirical coverage rates of symmetric confidence intervals using different estimates for the asymptotic covariance matrix. For Louis' method, we have assumed that the conditional covariances of the scores die out after K = 5. The parameter values are A = 0.1, B = 0.8, σ = 0.02, and δ = 0.01. The results are based on 500 simulated samples, each of length T = 1000.

Table 2: Simulation results for the latent AR(1) model with GARCH noises

Nominal Rates          A        B        a0       a1       b1       δ
Louis' method
  50%                0.472    0.475    0.566    0.549    0.536    0.443
  75%                0.695    0.693    0.833    0.797    0.780    0.693
  90%                0.879    0.881    0.917    0.915    0.903    0.871
  Failure rate       28/500
Smoothed scores, 0 lags
  50%                0.487    0.481    0.650    0.564    0.566    0.521
  75%                0.703    0.695    0.854    0.805    0.807    0.790
  90%                0.890    0.890    0.932    0.932    0.932    0.953
  Failure rate       0/500
Smoothed scores, 5 lags
  50%                0.475    0.487    0.619    0.551    0.557    0.458
  75%                0.706    0.699    0.828    0.807    0.805    0.714
  90%                0.890    0.886    0.915    0.930    0.930    0.898
  Failure rate       0/500
Smoothed scores, 10 lags
  50%                0.496    0.494    0.608    0.555    0.549    0.466
  75%                0.714    0.712    0.826    0.811    0.801    0.725
  90%                0.898    0.898    0.915    0.930    0.936    0.909
  Failure rate       0/500

This table shows the empirical coverage rates of symmetric confidence intervals using different estimates of the asymptotic covariance matrix. The statistics for Louis' method are computed with the 28 failed cases out of 500 samples excluded. For the new estimator, we have also excluded those 28 cases. For Louis' method, we have assumed that the conditional covariances of the scores die out after K = 5. The parameter values are A = 0.1, B = 0.8, a0 = 4 × 10−5, a1 = 0.8, b1 = 0.1, and δ = 0.01. The results are based on 500 simulated samples, each of length T = 1000.


Table 3: Summary results for the latent AR(1)-GARCH(1,1) model based on 100 randomly chosen firms in the CRSP database (1000 daily stock prices starting from the first trading day of year 2000)

Summary statistics      A         B       a0 × 10−5     a1        b1        δ
Median                0.0048    0.9980      7.03       0.5491    0.4255    0.0102
10 Percentile        -0.0044    0.9928      0.3        0.0121    0.1037    0.0025
90 Percentile         0.0244    1.0014     36          0.8888    0.9851    0.0235
Min                  -0.0133    0.9838      0.0436     0.0000    0.0338    0.0000
Max                   0.0662    1.0044    102          0.9626    1.0000    0.0385

Significant δ (5%): 90/100

Failure rates
Louis' method: 17/100
New method: 0/100

A case is considered a failure if its covariance matrix estimate is not positive definite. The test for the significance of δ is a Wald test corrected for the null being at the boundary value. For Louis' method, we have assumed that the conditional covariances of the scores die out after K = 5.
