
Infinite-State Markov-switching for Dynamic Volatility

Arnaud Dufays(a,b)

(a) Université catholique de Louvain, CORE, B-1348 Louvain-la-Neuve, Belgium
(b) Ecole Nationale de la Statistique et de l'Administration Economique, CREST, France

Abstract: A GARCH model with changing parameters is specified using the sticky infinite hidden Markov-chain framework. Estimation by Bayesian inference determines the adequate number of regimes as well as the optimal specification (Markov-switching or change-point). The new algorithms are studied in terms of mixing properties and computational time. Applications highlight the flexibility of the model and compare it to existing parametric alternatives.

KEY WORDS: Bayesian inference, Markov-switching, GARCH, Infinite hidden Markov model, Beam sampler, Dirichlet process
JEL Classification: C11, C15, C22, C58.


* Correspondence to: Email: [email protected] or [email protected]. Phone: +32 (0) 10 47 43 36. Website: https://sites.google.com/site/websiteofarnauddufays/.



Introduction

Univariate GARCH (generalized auto-regressive conditional heteroskedastic) models are widely used to forecast volatilities. However, GARCH models with fixed parameters are restrictive since financial time series are prone to exhibit breaks, especially over long periods. From time to time, the level of volatility increases sharply due to financial tensions and decreases when they end. Ignoring these breaks by assuming constant parameters in econometric models typically leads to forecasts that are far from the realizations (see for example Stock and Watson (1996)) and often gives the spurious impression of a nearly integrated time series (see, e.g., Diebold (1986), Lamoureux and Lastrapes (1990), and Hillebrand (2005)). An important econometric challenge is therefore to detect a structural break as soon as possible.

An interesting way to introduce breaks in GARCH models is to enrich them with a dynamic discrete latent state Markov process, in such a way that the parameters can abruptly switch from one value to another. These models are called Markov-switching (MS) GARCH models when the Markov chain is recurrent (see e.g. Francq and Zakoian (2008) and Bauwens, Preminger, and Rombouts (2010)) and change-point (CP) GARCH models when the states are not recurrent (see e.g. He and Maheu (2010) and Bauwens, Dufays, and De Backer (2011)). Estimation of these models by maximum likelihood is numerically infeasible, due to the path dependence problem: the conditional variance at time t depends on the entire sequence of regimes visited up to time t. Bayesian estimation by an MCMC algorithm is practicable, by embedding the vector of states in the parameter space and therefore simulating them, as proposed by Bauwens, Preminger, and Rombouts (2010) and Bauwens, Dufays, and Rombouts (2013).

Choosing the number of regimes in an MS or CP model can be done, in Bayesian inference, by maximizing the marginal likelihood with respect to the number of regimes. Doing this for MS- and CP-GARCH models requires the estimation of the model for a given number of regimes, followed by the marginal likelihood computation itself (see Bauwens, Dufays, and Rombouts (2013)). This procedure is repeated several times, up to a maximum number of regimes, which makes the search for the best model very time-consuming. The sticky infinite hidden Markov chain model (IHMM), proposed by Fox, Sudderth, Jordan, and Willsky (2011), allows us to bypass these repeated computations by treating the number of regimes as an unknown parameter. It relies on a Markov chain with a potentially infinite number of states (regimes) and is suited for series exhibiting persistence. The structure encompasses the MS and the CP specifications as special cases. Its building blocks are the Dirichlet process (Ferguson (1973) and Sethuraman (1994)) and the hierarchical Dirichlet process (Teh, Jordan, Beal, and Blei (2006)). The sticky IHMM has already been applied in fields such as genetics (Beal and Krishnamurthy (2006)), visual recognition (Kivinen, Sudderth, and Jordan (2007)), and economics (Jochmann (2013) and Song (2013)) for models without path dependence. Our main contribution is to develop Bayesian inference for sticky infinite hidden MS-GARCH models, that is, for models with path dependence. More precisely, our contribution is twofold.
First, although one can apply the forward-backward algorithm (Rabiner (1989), Hamilton (1989) and Chib (1996)) within a Gibbs sampler for inferring a model with structural breaks,1 this does not apply to models subject to path dependence. The issue is usually circumvented either by applying the model of Haas, Mittnik, and Paolella (2004), who suggest switching from one stationary GARCH process to another stationary one instead of moving from one state (explosive or not) to another, or by estimating the model of Klaassen (2002), who gets rid of the path dependence problem by approximating the conditional variance with a mixture. We generalize these two models by embedding a Metropolis-Hastings (MH) algorithm that allows us to estimate the MS-GARCH model. We additionally illustrate that these new algorithms outperform the existing MCMC alternatives in terms of computational time, while the mixing properties remain competitive with respect to the Particle MCMC algorithm (PMCMC) of Bauwens, Dufays, and Rombouts (2013). Our method renders inference on MS- and CP-GARCH models almost as fast as inference on MS- and CP-ARCH models, without adding much complexity to the coding structure.

Moreover, even if our algorithms can be used to compute the marginal likelihood using Chib's formula (see Chib and Jeliazkov (2001)) or bridge sampling (see Meng and Wong (1996)), so that this criterion could be maximized in order to select a specific number of regimes, we avoid such a time-consuming approach. This leads to our second contribution: we use an MCMC algorithm to determine directly the number of regimes as well as the specification (MS or CP), by incorporating the sticky IHMM into our MS-GARCH model. Currently this approach is limited to models without path dependence (Jochmann (2013) and Song (2013)). We thus extend the scope of the IHMM modeling framework to richer models, such as GARCH (instead of ARCH).

While the proposed model is very useful for revealing different volatility patterns, its predictive ability is also investigated on four different financial time series. The proposed process turns out to drastically outclass the GARCH as well as the spline-GARCH models (see Engle and Rangel (2008)) in terms of marginal likelihoods. Furthermore, predictive likelihoods also provide evidence (although less categorical) in favor of our specification.


In Sections 1 and 2, we present the infinite hidden Markov-switching GARCH model (IHMM-GARCH), its Bayesian estimation and the computation of the marginal likelihood for model comparison purposes. In Section 3, we discuss the two new algorithms in the light of existing MCMC alternatives and illustrate them on simulated data. We highlight that the IHMM-GARCH model accurately estimates CP- and MS-GARCH models. Section 4 provides detailed results on the S&P 500 daily index series, and we compare our results with those of Bauwens, Dufays, and Rombouts (2013) for the MS-GARCH model. A comparison exercise with other volatility models on several financial time series is also reported in that section. A conclusion ends the paper.

1 Infinite-States Markov Switching Models

In this Section, we develop a Markov-switching framework with an infinite number of regimes for the univariate GARCH model. To motivate our setup, we first present earlier regime-switching models in order of increasing complexity. Our model, presented in Subsection 1.1, follows naturally as the continuation of these processes. Let y_{1:T} = {y_1, ..., y_T}' be a time series, where T denotes the sample size, and let s_{1:T} = {s_1, ..., s_T} be a random variable taking positive integer values. A regime-switching GARCH(1,1) model is defined, for t > 1, as

y_t = σ_t(s_{1:t}) ε_t,   with ε_t ∼ i.i.d. N(0, 1),   (1)
σ_t^2(s_{1:t}) = ω_{s_t} + α_{s_t} y_{t-1}^2 + β_{s_t} σ_{t-1}^2(s_{1:t-1}),   (2)

where σ_t^2(s_{1:t}) emphasizes that the variance at time t depends on s_{1:t}. The process exhibits a GARCH dynamic but with time-varying parameters controlled by the random variable s_t, which indicates the parameters that are active at time t. When the latent variable s_t, which refers to a state or regime, changes from one value to another, the GARCH parameters abruptly switch and the variance dynamic is affected. The model complexity mostly depends on the underlying process of the state variable. One may adopt an independent state transition (e.g. Wong and Li (2000)) or an exogenous process related to some observed time series (e.g. Koop, Pesaran, and Potter (1996)). Richer dynamics assume an endogenous dependent process based on Markov chains (e.g. Chib (1996)) or semi-Markov chains (e.g. Bulla and Bulla (2006)). In this paper, the state vector is Markov and the regimes can be recurrent. For notational convenience, we define the vectors θ_i = {ω_i, α_i, β_i} ∀i ∈ N. A Markov-switching (MS) GARCH model with a finite number K of regimes is specified by equations (1)-(2) along with a Markov chain driving s_{1:T}. The latter process is characterized by an initial distribution over the K states and a full transition probability matrix given by

P_MS = [ p_{1,1}  p_{1,2}  ...  p_{1,K} ;
         p_{2,1}  p_{2,2}  ...  p_{2,K} ;
         ...
         p_{K,1}  p_{K,2}  ...  p_{K,K} ],   (3)

where p_{i,j} denotes the probability of moving from state i to state j and Σ_{j=1}^{K} p_{i,j} = 1 ∀i ∈ [1, K]. The MS-GARCH model with K states reduces to a mixture model with finitely many components and time-varying probabilities, since the predictive density is

f(y_t | y_{1:t-1}, s_{1:t-1}, P_MS, θ_1, ..., θ_K) = Σ_{j=1}^{K} p_{s_{t-1},j} f_N(y_t | 0, σ_t^2(s_{1:t-1}, s_t = j)),   (4)

where f_N(.|a, b) denotes a Normal density with mean a and variance b. This specification is quite challenging to estimate due to the lagged variance appearing in the GARCH dynamic. The main issue, known as the path dependence problem, arises because the likelihood function at time t depends on the entire path s_{1:t} through the (unobserved) lagged volatility, rendering the filtering of the state vector by the forward-backward algorithm infeasible (see Rabiner (1989)). Focusing on MCMC methods, alternative inference schemes have been proposed by Bauwens, Preminger, and Rombouts (2010) and by Bauwens, Dufays, and Rombouts (2013). In Section 2, we provide a third solution for inferring such a model and we empirically show that it outperforms these two.

The MS-GARCH model with finite states has pitfalls, among them the selection of K. The standard approach (e.g. Chib (1996), Pesaran, Pettenuzzo, and Timmermann (2006)) consists in estimating the model for several numbers of regimes and then determining the optimal number according to a chosen criterion. As the estimations last several hours for daily financial time series, the computational burden is heavy. In addition, since the number of regimes is fixed, the predictions do not take into account the uncertainty about K.


Bayesian Model Averaging (e.g. Raftery, Madigan, and Hoeting (1998)) could solve the problem, but it requires the computation of the marginal likelihood, which is difficult to estimate in MS-GARCH frameworks (see Bauwens, Dufays, and Rombouts (2013)). Finally, it may be difficult to set the upper bound on the number of regimes when no prior information exists on the studied time series. For these (non-exhaustive) reasons, we want to elaborate an MS-GARCH process allowing for a potentially infinite number of regimes. One interesting alternative to fixing the number of regimes a priori is based on Dirichlet processes. Instead of equation (2), the GARCH model with infinite states, which is our object of interest, is defined as

σ_t^2 = ω_t + α_t y_{t-1}^2 + β_t σ_{t-1}^2.   (5)

The model tractability depends on the prior distribution of the time-varying parameters φ_t = {ω_t, α_t, β_t}, which should somewhat limit the possible dynamics. To mimic the abrupt switches of the MS-GARCH parameters, the time-varying parameters are driven by a Dirichlet process as follows:

φ_t ∼ G_0,   (6)
G_0 | η, H ∼ DP(η, H),   (7)

where DP(η, H) denotes the Dirichlet process, fully characterized by the (scalar) concentration parameter η and the base distribution H, assumed to be absolutely continuous in our specification. The DP is a discrete distribution whose expectation and variance are given by H and H(1 − H)/(η + 1) respectively, and it exhibits the interesting clustering property that makes inference tractable. Considering that K ≤ t − 1 stands for the number of existing regimes at time t − 1, we denote by θ_i with i ∈ [1, K] the parameter of regime i and by n_i the number of observations belonging to regime i until time t − 1. Then, using the Pólya urn scheme to integrate out the DP G_0 (see Blackwell and MacQueen (1973)), the GARCH parameters at time t conditionally on the previous parameters are distributed as

φ_t | θ_1, ..., θ_K ∼ Σ_{i=1}^{K} (n_i / (η + t − 1)) δ_{θ_i} + (η / (η + t − 1)) H,   (8)

in which δ_{θ_i} is the probability measure concentrated at θ_i. The Pólya urn reveals the discreteness as well as the clustering property of the DP. The probability that φ_t takes a new value (from H) is given by η/(η + t − 1), while it is assigned to a previously observed value otherwise. Note that the likelihood of having new values, which stand for new regimes in the finite-state model above, decreases over time and highly depends on η. As the number of observations grows, the expected number of regimes grows approximately as η log(t). The DP-GARCH model (equations (5)-(7)) boils down to a mixture with an infinite number of components. To highlight this assertion, Sethuraman (1994) shows that any DP has a stick-breaking representation, which relies on two independent sequences of i.i.d. random variables {π_i}_{i=1}^∞ and {θ_i}_{i=1}^∞, given by

β_i ∼ Beta(1, η),   θ_i ∼ H,   π_i = β_i Π_{l=1}^{i−1} (1 − β_l),   G_0 = Σ_{i=1}^{∞} π_i δ_{θ_i};

it then ensures that G_0 ∼ DP(η, H). The sequence {π_i}_{i=1}^∞, conveniently written π ∼ Stick(η), satisfies Σ_{i=1}^∞ π_i = 1 with probability one and therefore defines a distribution over the positive integers. To link the DP-GARCH model with the finite-state MS-GARCH model, instead of sampling the variable s_{1:T} from the transition matrix (3), it is now drawn from π. Under this form, looking at the predictive density exposes the mixture nature of the model since

f(y_t | y_{1:t-1}, s_{1:t-1}, {θ_i}_{i=1}^∞, π) = Σ_{j=1}^{∞} π_j f_N(y_t | 0, σ_t^2(s_{1:t-1}, s_t = j)).   (9)
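To make the stick-breaking construction and the resulting state draws concrete, the following minimal sketch (in Python; the value of η, the base distribution over the GARCH parameters and the truncation tolerance are illustrative choices, not the paper's settings) generates a truncated approximation of G_0 and samples states i.i.d. from π, as in the predictive density (9).

import numpy as np

def stick_breaking(eta, base_sampler, tol=1e-8, rng=None):
    """Truncated draw from DP(eta, H): weights pi_i and atoms theta_i."""
    rng = np.random.default_rng() if rng is None else rng
    weights, atoms, remaining = [], [], 1.0
    while remaining > tol:
        b = rng.beta(1.0, eta)           # beta_i ~ Beta(1, eta)
        weights.append(remaining * b)    # pi_i = beta_i * prod_{l<i} (1 - beta_l)
        atoms.append(base_sampler(rng))  # theta_i ~ H
        remaining *= 1.0 - b
    return np.array(weights), atoms

# Illustrative base distribution H over (omega, alpha, beta); not the paper's prior.
def h_sampler(rng):
    return {"omega": rng.uniform(0.01, 0.5),
            "alpha": rng.uniform(0.02, 0.15),
            "beta": rng.uniform(0.70, 0.95)}

rng = np.random.default_rng(0)
pi, theta = stick_breaking(eta=3.0, base_sampler=h_sampler, rng=rng)
# Under the DP-GARCH model the states are drawn i.i.d. from pi (normalized after truncation).
states = rng.choice(len(pi), size=10, p=pi / pi.sum())
print(len(pi), states)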

Recently, this framework has been used in multivariate GARCH and stochastic volatility models (e.g. Jensen and Maheu (2010), Jensen and Maheu (2013) and Jensen and Maheu (2014)). While the DP prior allows for an infinite number of clusters, the predictive density (9) makes explicit that the probabilities do not depend on time anymore, implying that the underlying process of s1:T is i.i.d. Although convenient, this assumption is not suitable for time series.


The hierarchical Dirichlet process (HDP) of Teh, Jordan, Beal, and Blei (2006) extends the infinite mixture model to the infinite hidden Markov model (IHMM) by making the probabilities depend on a Markov chain. The HDP-GARCH model is an MS-GARCH model with an undetermined number of regimes and, coupled with equations (1) and (5), is defined as

φ_t | φ_{t-1} ∼ G_{φ_{t-1}},   (10)
G_{φ_{t-1}} | λ, G_0 ∼ DP(λ, G_0),   (11)
G_0 | η, H ∼ DP(η, H).   (12)

Rather than depending on one DP as in the infinite mixture model, the GARCH parameters are driven by multiple DPs connected to each other through the same base distribution G_0. As this distribution is discrete (since it is a DP) and takes values from H, it ensures that every G_{φ_i} ∀i ∈ [1, T − 1] shares the same discrete support; a feature that is essential to build the transition dynamic between the different DPs. Using Sethuraman's form of the HDP (demonstrated in Teh, Jordan, Beal, and Blei (2006)), the predictive density of the HDP-GARCH model is

f(y_t | y_{1:t-1}, s_{1:t-1}, {θ_i}_{i=1}^∞, {p_{s_{t-1},j}}_{j=1}^∞) = Σ_{j=1}^{∞} p_{s_{t-1},j} f_N(y_t | 0, σ_t^2(s_{1:t-1}, s_t = j)),   (13)

where θ_j ∼ G_0, G_0 = Σ_{j=1}^∞ π_j δ_{θ_j}, π ∼ Stick(η) and {p_{s_{t-1},j}}_{j=1}^∞ | λ, π ∼ DP(λ, π). The predictive density (13) is a mixture with infinitely many components that exhibits time-varying probabilities. The HDP-GARCH model, although more flexible than the DP-GARCH one, is only adapted to series without persistent regimes. Indeed, the transition matrix {p_{i,j}}_{j=1}^∞ ∀i ∈ N does not distinguish a self-transition from a transition to another state. Regardless of the previous state i, the expected probability of moving to state j is given by E(p_{i,j} | π) = π_j; therefore, the HDP dynamic encourages rapid switches between (sometimes redundant) regimes, which is not realistic for modeling time series. Fox, Sudderth, Jordan, and Willsky (2011) solve the problem by introducing a parameter κ that controls the persistence of the HDP. Their structure, called the sticky HDP, constitutes the keystone of our model.

1.1 The Model

Relying on the sticky HDP framework, we propose the infinite hidden Markov-switching GARCH model (IHMM-GARCH), which is specified as

y_t = σ_t ε_t with ε_t ∼ i.i.d. N(0, 1),   σ_t^2 = ω_t + α_t y_{t-1}^2 + β_t σ_{t-1}^2,   (14)
φ_t ≡ {ω_t, α_t, β_t} | φ_{t-1} ∼ G_{φ_{t-1}},
G_{φ_{t-1}} | λ, κ, G_0 ∼ DP(λ + κ, (λ G_0 + κ δ_{φ_{t-1}}) / (λ + κ)),   (15)
G_0 | η, H ∼ DP(η, H).   (16)

This hierarchical structure exhibits properties similar to those of the HDP setup. To begin with, using Sethuraman's form of the sticky HDP (see Fox, Sudderth, Jordan, and Willsky (2011)), the predictive density of the IHMM-GARCH model is given by

f(y_t | y_{1:t-1}, s_{1:t-1}, {θ_i}_{i=1}^∞, {p_{s_{t-1},j}}_{j=1}^∞) = Σ_{j=1}^{∞} p_{s_{t-1},j} f_N(y_t | 0, σ_t^2(s_{1:t-1}, s_t = j)),   (17)

where θ_j ∼ G_0, G_0 = Σ_{j=1}^∞ π_j δ_{θ_j}, π ∼ Stick(η) and {p_{s_{t-1},j}}_{j=1}^∞ | λ, κ, π ∼ DP(λ + κ, (λπ + κδ_i)/(λ + κ)). The predictive density (17) is again an infinite mixture with time-varying probabilities, but the transition matrix encourages persistent regimes through the parameter κ, as the expected probability of staying in a regime is given by E(p_{i,i} | π, λ, κ) = (λπ_i + κ)/(λ + κ) ∀i ∈ N. Furthermore, the IHMM-GARCH model is tractable thanks to the parsimony of regimes implied by the DPs. As for the mixture model with infinite states, the Pólya urn scheme (see Fox, Sudderth, Jordan, and Willsky (2011)) is used to integrate out the DPs of the GARCH parameters at time t conditionally on the previous parameters.


Considering that K ≤ t − 1 denotes the number of existing regimes at time t − 1 and that φ_{t−1} belongs to regime i, we gather all the previous GARCH parameters that have been sampled from regime i in the set A_i = {φ_j | φ_{j−1} = θ_i, j = 2, ..., t − 1} and denote by n_{i,.} the number of elements in A_i and by n_{i,j} the number of parameters in A_i that are equal to θ_j with j ∈ [1, K]. Then, the GARCH parameters at time t > 1 conditionally on the previous parameters are distributed as

φ_t | A_1, ..., A_K, G_0 ∼ Σ_{j=1}^{K} (n_{i,j} / (λ + κ + n_{i,.})) δ_{θ_j} + (1 / (λ + κ + n_{i,.})) [λ G_0 + κ δ_{θ_i}],   (18)

φ_t | A_1, ..., A_K ∼ Σ_{j=1}^{K} (n_{i,j} / (λ + κ + n_{i,.})) δ_{θ_j} + (1 / (λ + κ + n_{i,.})) [λ (Σ_{k=1}^{K} (n_{k,.} / (η + t − 1)) δ_{θ_k} + (η / (η + t − 1)) H) + κ δ_{θ_i}].   (19)

Since the probability of having a new regime depends on G_0 multiplied by λ/(λ + κ + n_{i,.}) < 1, as exemplified by equation (18), the expected number of states cannot grow faster than for the DP-GARCH model. From equation (19), we observe that the probability of having a new regime at time t amounts to λη/((λ + κ + n_{i,.})(η + t − 1)), which depends on the concentration parameters η, λ and κ but also on the number of times the process has been in a particular state. For example, when only one regime exists at time t − 1, this probability reaches ηλ/((λ + κ + t − 1)(η + t − 1)).2

To end the discussion on DPs, note that the sticky HDP framework reduces to an HDP one when κ = 0. For convenience, the set H_Dir = {η, λ + κ, ρ}, where ρ = κ/(λ + κ), stands for the random concentration parameters. Their prior distributions are detailed in Section 3. In addition, the infinite transition matrix composed of Dirichlet processes is denoted by P = {p_{i,j}}_{j=1}^∞ ∀i ∈ N.
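As a concrete illustration of the conditional draw implied by equation (19), the sketch below (Python; the transition counts, the concentration values and the base sampler are illustrative placeholders, not the paper's settings) samples φ_t given the previous regime i and the recorded counts.

import numpy as np

def draw_phi_t(i, counts, thetas, lam, kappa, eta, t, base_sampler, rng):
    """One draw from the urn representation in the spirit of equation (19).

    counts[i, j] = n_{i,j}: number of transitions recorded from regime i to regime j.
    thetas[j]    = theta_j: GARCH parameters of regime j.
    """
    K = len(thetas)
    n_i_dot = counts[i].sum()
    denom = lam + kappa + n_i_dot
    probs = counts[i] / denom                                            # n_{i,j} / (lam + kappa + n_{i,.})
    probs[i] += kappa / denom                                            # sticky self-transition mass
    probs = probs + (lam / denom) * counts.sum(axis=1) / (eta + t - 1)   # G_0 urn over existing regimes
    p_new = (lam / denom) * eta / (eta + t - 1)                          # mass of a brand new regime from H
    full = np.append(probs, p_new)
    choice = rng.choice(K + 1, p=full / full.sum())                      # normalize (counts are illustrative)
    return base_sampler(rng) if choice == K else thetas[choice]

rng = np.random.default_rng(1)
counts = np.array([[40.0, 2.0], [3.0, 60.0]])
thetas = [{"omega": 0.05, "alpha": 0.05, "beta": 0.90},
          {"omega": 0.30, "alpha": 0.10, "beta": 0.80}]
base = lambda r: {"omega": r.uniform(0.01, 0.5), "alpha": r.uniform(0.02, 0.15), "beta": r.uniform(0.7, 0.95)}
print(draw_phi_t(i=0, counts=counts, thetas=thetas, lam=1.0, kappa=50.0, eta=3.0, t=106, base_sampler=base, rng=rng))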

When a new regime appears, equation (19) makes explicit that the realization is drawn from the base distribution H. In order to take full advantage of the volatility parameters of existing regimes, we add a hierarchical layer on this distribution as follows:

H | µ, Σ ∼ N(µ, Σ),   (20)
µ ∼ N(µ_0, Σ_0),   (21)
Σ^{-1} ∼ Wishart(V, v),   (22)

where µ_0, Σ_0, V and v are fixed hyperparameters.

This structure improves the proposals for new regimes since, if a new state emerges, we draw the related GARCH parameters from N(µ, Σ), whose expectation and variance-covariance matrix are updated by taking into account the parameters of existing regimes. As the GARCH parameter supports are constrained but the parameters are sampled from a Normal distribution, we map them onto the real line. The one-to-one transformation ω̃_i = log(ω_i/(5 − ω_i)), α̃_i = log(α_i/(1 − α_i)), β̃_i = log(β_i/(1 − α_i − β_i)) for i = 1, 2, ..., denoted by {ω̃_i, α̃_i, β̃_i}, ensures the positivity of the variance as well as the stationarity of the model. More specifically, the constrained support of each parameter is such that ω_i ∈ [0, 5], α_i ∈ [0, 1] and β_i ∈ [0, 1 − α_i]. To ease the presentation of the inference, we also define the set Θ = {ω̃_1, ..., ω̃_∞, α̃_1, ..., α̃_∞, β̃_1, ..., β̃_∞}, which includes all the GARCH parameters of the model, and the set θ̃_i = {ω̃_i, α̃_i, β̃_i}, which contains the GARCH parameters of regime i. In the paper, we restrict the specification to the GARCH(1,1) model, but the inference could be adapted to other types of GARCH equations or to other innovation distributions.
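A minimal sketch of this reparameterization and its inverse (Python; the helper names are purely illustrative) shows how draws on the real line map back to the constrained GARCH parameters.

import numpy as np

def to_real_line(omega, alpha, beta):
    """Map constrained GARCH parameters to R^3 via the logit-type transforms above."""
    return (np.log(omega / (5.0 - omega)),
            np.log(alpha / (1.0 - alpha)),
            np.log(beta / (1.0 - alpha - beta)))

def to_constrained(omega_t, alpha_t, beta_t):
    """Inverse map: real-line values back to omega in [0,5], alpha in [0,1], beta in [0, 1 - alpha]."""
    omega = 5.0 / (1.0 + np.exp(-omega_t))
    alpha = 1.0 / (1.0 + np.exp(-alpha_t))
    beta = (1.0 - alpha) / (1.0 + np.exp(-beta_t))
    return omega, alpha, beta

print(to_constrained(*to_real_line(0.1, 0.08, 0.9)))  # recovers (0.1, 0.08, 0.9)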

2 Bayesian inference

Inference of the IHMM-GARCH model requires coping with the path dependence issue as well as with the infinite number of regimes. These two complications are apparent in the predictive density (17), where the conditional variance depends on all the previous states and the summation over the regimes is infinite. Nevertheless, Bayesian inference is feasible by explicitly treating s_{1:T} as parameters. We also need to augment our targeted posterior distribution with auxiliary variables u_{1:T} = {u_1, ..., u_T} to deal with the infinite structure. This technique, the beam sampler of Van Gael, Saatci, Teh, and Ghahramani (2008), which is based on the slice sampler (Neal (2003)), is exposed in the next subsection. Our sampling scheme iteratively draws from each full conditional distribution of Table 1.

Insert Table 1

Updating the state vector s_{1:T} from its full conditional distribution is the most challenging part of the MCMC sampler. The other full conditional distributions have already been detailed in the literature.


The last two distributions and f(P | s_{1:T}, Θ, H_Dir, π, y_{1:T}) constitute the sticky IHMM blocks, whose sampling has been described in Fox, Sudderth, Jordan, and Willsky (2011). The hierarchical structure µ, Σ has the usual Normal-Wishart priors in order to obtain conjugate posterior distributions. Drawing from the full conditional of Θ is standard; in this paper, we use an adaptive Metropolis method (see Atchadé and Rosenthal (2005)). The entire MCMC sampler is sketched in Appendix A. We now concentrate on the sampling of a complete state vector.

2.1 Sampling the state vector s_{1:T}

Updating each state s_t separately given the others (Bauwens, Preminger, and Rombouts (2010)) produces poor mixing properties due to the dependence between the states. We therefore propose another strategy for sampling the state vector. The method relies on the forward-backward algorithm of Chib (1996). However, a straightforward application is infeasible due to the infinite number of regimes and to the path dependence problem. We use the beam sampler to circumvent the first issue. The second one is tackled by combining an approximation of the MS-GARCH model with a Metropolis-Hastings step. It is worth noticing that we could also use the Particle MCMC algorithm of Bauwens, Dufays, and Rombouts (2013) combined with the beam sampler, but our approach is less time-consuming and easier to implement, as empirically shown in Section 3.3.

Beam sampler: dealing with the infinite number of components

The random variables u_{1:T} have been embedded in the targeted posterior distribution in order to make the update of s_{1:T} possible. They act as slice-sampling variables that truncate the infinite summation implied by the DPs, which appears for instance in the predictive density (17), into a finite one. In the initial paper (Van Gael, Saatci, Teh, and Ghahramani (2008)), the distribution of u_t | s_t, s_{t-1}, P is uniform, i.e. U[0, p_{s_{t-1},s_t}]. Embedding the variable u_t in the predictive density of the IHMM-GARCH yields

f(y_t, u_t | s_{1:t-1}, Θ, P) = Σ_{j=1}^{∞} f(y_t | s_{1:t-1}, s_t = j, Θ, P) f(u_t | s_{t-1}, s_t = j, P) f(s_t = j | s_{t-1}),   (23)
                             = Σ_{j=1}^{∞} f(y_t | s_{1:t-1}, s_t = j, Θ, P) δ_{0 < u_t ≤ p_{s_{t-1},j}}.   (24)

Consequently, the beam sampler variable u_t ensures that the summation becomes computable, since only a finite number of probabilities will verify the inequality 0 < u_t ≤ p_{s_{t-1},j}. Moreover, as we recover the predictive density (17) when we integrate over u_t, the posterior distribution is not altered. To sample an entire vector u_{1:T}, we use the decomposition

f(u_{1:T} | s_{1:T}, P) = f(u_T | s_T, s_{T-1}, P) f(u_{T-1} | s_{T-1}, s_{T-2}, P) ... f(u_2 | s_2, s_1, P) f(u_1 | s_1, P),

where f(u_1 | s_1, P) = δ_{0 ≤ u_1 ≤ p_{s_1,s_1}} / p_{s_1,s_1}. The size of the set of admissible paths (those which verify the inequality) directly affects the mixing properties of the sampler. The probability of staying in the same state for the next period is generally high for time series due to their persistence. This stylized fact can drastically reduce the number of possible paths allowed by the beam sampler at each MCMC iteration. As a result, the draws of the forward-backward algorithm can become highly dependent. To quote Fox, Sudderth, Jordan, and Willsky (2011), p. 1038:
[...] the beam sampler can be slow to mix. The issue is that in order to consider a transition which has low prior probability, one needs a correspondingly rare slice variable sample at that time. Thus, even if the likelihood cues are strong, to be able to consider state sequences with several low-prior-probability transitions, one needs to wait for several rare events to occur when drawing slice variables.

To avoid this problem, instead of using a uniform distribution we introduce a modified uniform distribution that concentrates more probability on small values of u_t for self-transitions of the states. The modified density is as follows:

f(u_t | s_t, s_{t-1}, P) = δ_{0 < u_t ≤ k p_{s_{t-1},s_t}} / (k p_{s_{t-1},s_t})   if s_{t-1} = s_t,   (25)
                         = δ_{0 < u_t ≤ p_{s_{t-1},s_t}} / p_{s_{t-1},s_t}   otherwise,   (26)


where k ∈ [0, 1] is a user-defined parameter. Given this adjustment, the joint and marginal distributions of the state s_t are

f(s_t, u_t | s_{t-1}, P) = (1/k) δ_{0 < u_t ≤ k p_{s_{t-1},s_t}} δ_{s_t = s_{t-1}} + δ_{0 < u_t ≤ p_{s_{t-1},s_t}} δ_{s_t ≠ s_{t-1}},
f(s_t | s_{t-1}, P) = p_{s_{t-1},s_t}.

The modified distribution is designed to improve the MCMC mixing compared to the initial proposal, and when the constant k is set to one it collapses to the standard uniform distribution. More precisely, although the MCMC algorithm converges to the invariant distribution for any value of k, a small k can drastically shorten its convergence time. Indeed, small values of u_t imply that more switching possibilities of the states are allowed in the forward-backward algorithm, rendering the draws from the latter less persistent. The modified uniform distribution acts accordingly by increasing, by a factor of 1/k, the probability of drawing values u_t ≤ k p_{s_{t-1},s_t} relative to a standard uniform distribution when the move is a self-transition. Naturally, increasing the number of possibilities has a computational cost, since more regimes are visited. For the forward-backward algorithm, the cost is particularly acute when many new regimes appear in the sampler, and this number of new states is directly dependent on the smallest value of u_{1:T}. From this point of view, the probability that the smallest element of u_{1:T} comes from distribution (25) (from a self-transition) is approximately multiplied by 1/k compared to the initial uniform distribution. Multiple values of the constant k have been empirically tested and, from our experience, the range [0.1, 1] does not increase the computational burden too much, as it does not dramatically increase the average number of regimes allowed in the predictive density. The impact of k on computational resources is discussed in Section 4.4, but in the remainder of the paper we set its value to 0.1.

The alternative to the beam sampler consists in truncating the infinite summation at a large number of regimes K (e.g. Teh, Jordan, Beal, and Blei (2006), Song (2013)). On the one hand, the computational burden of this strategy can be substantial if the number of regimes K is large. On the other hand, if the chosen upper bound on the number of regimes is too small, the posterior samples are misleading. In that respect, the beam sampler has the advantage of not fixing any upper bound on the number of states and of ensuring no truncation of the posterior distribution, but at the cost of degrading the MCMC mixing.
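A minimal sketch of drawing the slice variables u_{1:T} under the modified uniform distribution (25)-(26) is given below (Python; the transition matrix and the state path are illustrative placeholders).

import numpy as np

def draw_slice_variables(states, P, k=0.1, rng=None):
    """Draw u_t ~ U(0, k * p_{s_{t-1}, s_t}) for self-transitions and U(0, p_{s_{t-1}, s_t}) otherwise.

    states: integer state path s_{1:T}; P: the (currently instantiated) finite block of the transition matrix.
    The first slice variable uses p_{s_1, s_1}, as in the decomposition of f(u_{1:T} | s_{1:T}, P).
    """
    rng = np.random.default_rng() if rng is None else rng
    T = len(states)
    u = np.empty(T)
    u[0] = rng.uniform(0.0, P[states[0], states[0]])
    for t in range(1, T):
        p = P[states[t - 1], states[t]]
        bound = k * p if states[t] == states[t - 1] else p
        u[t] = rng.uniform(0.0, bound)
    return u

# Illustrative two-regime transition matrix and a persistent state path.
P = np.array([[0.99, 0.01],
              [0.02, 0.98]])
states = np.array([0] * 50 + [1] * 50)
u = draw_slice_variables(states, P, k=0.1, rng=np.random.default_rng(2))
# Only transitions with p_{s_{t-1}, j} > u_t are retained in the forward-backward pass.
print(u.min(), u.max())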

MS-GARCH approximation: handling the path dependence issue

The forward-backward algorithm fails when it faces a GARCH model because of the path dependence induced by the lag of the conditional variance. We propose a Metropolis-Hastings approach that avoids the path dependence problem. We first sample an entire state vector from an approximate GARCH model, which allows us to apply the forward-backward algorithm, and the proposal is accepted or rejected according to a Metropolis-Hastings ratio that preserves the required balance. Although an approximate model is used to sample the state vector, the posterior distribution is not altered thanks to the Metropolis-Hastings step. Two approximate models are discussed in the paper (see also the sketch after this list).

• MS-GARCH models are well approximated by the model of Klaassen (2002), defined as

  y_t = σ_t ε_t,
  σ_t^2 = ω_{s_t} + α_{s_t} y_{t-1}^2 + β_{s_t} σ̃_{t-1,s_t}^2,   (27)

  where σ̃_{t-1,s_t}^2 = E[σ_{t-1}^2 | y_{1:t-1}, s_t, u_{1:T}, Θ, P]. We derive the computation of σ̃_{t-1,s_t}^2 in Appendix B. Under this specification, the likelihood function at time t (i.e. q(y_t | y_{1:t-1}, Θ, s_t) = f_N(y_t | 0, σ_t^2(s_t))) is computable, as the conditional variance only depends on the current state.

• Relying on the research of Haas, Mittnik, and Paolella (2004), we also consider the following approximation of the MS-GARCH model:

  y_t = σ_{t,s_t} ε_t,
  σ_{t,1}^2 = ω_1 + α_1 y_{t-1}^2 + β_1 σ_{t-1,1}^2,
  σ_{t,2}^2 = ω_2 + α_2 y_{t-1}^2 + β_2 σ_{t-1,2}^2,
  ...
  σ_{t,K}^2 = ω_K + α_K y_{t-1}^2 + β_K σ_{t-1,K}^2,

  where K stands for the maximum number of regimes. The approximation gets rid of the path dependence problem since each state at any time is related to a known variance. Given the state s_t, the likelihood function at time t is computable and is given by q(y_t | y_{1:t-1}, Θ, s_t) = f_N(y_t | 0, σ_{t,s_t}^2). In the following, this approximate model is called HMP for convenience.
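The sketch referenced above (Python; the parameter values and the return series are illustrative, not estimates from the paper) computes the K parallel variance paths of the HMP approximation, which is what makes q(y_t | y_{1:t-1}, Θ, s_t) available to the forward-backward algorithm.

import numpy as np

def hmp_variances(y, omega, alpha, beta):
    """K parallel GARCH(1,1) recursions: sigma2[t, j] depends on y and regime j only (no path dependence)."""
    T, K = len(y), len(omega)
    sigma2 = np.empty((T, K))
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at regime-specific unconditional variances
    for t in range(1, T):
        sigma2[t] = omega + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def loglik_given_states(y, sigma2, states):
    """Sum over t of log q(y_t | y_{1:t-1}, Theta, s_t) under the HMP approximation."""
    var = sigma2[np.arange(len(y)), states]
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + y ** 2 / var)

# Illustrative two-regime parameters and simulated standard normal returns.
rng = np.random.default_rng(3)
y = rng.standard_normal(500)
omega = np.array([0.05, 0.30]); alpha = np.array([0.05, 0.10]); beta = np.array([0.90, 0.80])
sigma2 = hmp_variances(y, omega, alpha, beta)
states = np.zeros(len(y), dtype=int)
print(loglik_given_states(y, sigma2, states))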


Procedure for updating the state vector

A draw of s_{1:T} is recursively obtained from the proposal distribution as follows (letting s_{i:T} = {s_i, ..., s_T} and omitting the conditioning on the sets of parameters Θ, P, H_Dir and π):

q(s_{1:T} | y_{1:T}, u_{1:T}) = q(s_T | y_{1:T}, u_{1:T}) q(s_{T-1} | y_{1:T}, s_T, u_{1:T}) q(s_{T-2} | y_{1:T}, s_{T-1:T}, u_{1:T}) ... q(s_1 | y_{1:T}, s_{2:T}, u_{1:T}).   (28)

As the MS-GARCH approximation rules out the path dependence problem, each conditional density q(s_t | y_{1:T}, u_{1:T}, s_{t+1:T}) is proportional to

q(s_t | y_{1:T}, u_{1:T}, s_{t+1:T}) ∝ q(s_{t:T}, y_{t+1:T}, u_{t+1:T} | y_{1:t}, u_{1:t})
  ∝ q(s_t | y_{1:t}, u_{1:t}) f(s_{t+1}, u_{t+1} | s_t) q(s_{t+2:T}, y_{t+1:T}, u_{t+2:T} | y_{1:t}, u_{1:t+1}, s_t, s_{t+1})
  ∝ q(s_t | y_{1:t}, u_{1:t}) f(s_{t+1}, u_{t+1} | s_t)   (because there is no path dependence)
  ∝ q(s_t | y_{1:t}, u_{1:t}) [ (1/k) δ_{0 ≤ u_{t+1} ≤ k p_{s_t,s_{t+1}}} δ_{s_t = s_{t+1}} + δ_{0 < u_{t+1} ≤ p_{s_t,s_{t+1}}} δ_{s_t ≠ s_{t+1}} ],

and q(s_t | y_{1:t}, u_{1:t}) is computed by forward filtering:

q(s_t | y_{1:t}, u_{1:t}) ∝ q(y_t | y_{1:t-1}, s_t, u_t) Σ_{i=1}^{∞} f(s_t, u_t | s_{t-1} = i) q(s_{t-1} = i | y_{1:t-1}, u_{1:t-1})
  ∝ q(y_t | y_{1:t-1}, s_t, u_t) Σ_{i=1}^{∞} [ (1/k) δ_{0 ≤ u_t ≤ k p_{s_{t-1}^i, s_t}} δ_{s_{t-1}^i = s_t} + δ_{0 < u_t ≤ p_{s_{t-1}^i, s_t}} δ_{s_{t-1}^i ≠ s_t} ] q(s_{t-1} = i | y_{1:t-1}, u_{1:t-1}),   (29)
where s_{t-1}^i stands for s_{t-1} = i, and the infinite sum in the last equation is handled thanks to the beam sampler. As we use an approximate MS-GARCH model, the likelihood q(y_t | y_{1:t-1}, s_t) = f_N(y_t | 0, σ̂_{t,s_t}^2) (where σ̂_{t,s_t}^2 denotes either the conditional variance of Klaassen's model or that of the HMP model) is computable because the conditional variance no longer depends on the previous states. The state vector s'_{1:T}, sampled from (28), is accepted according to the Metropolis-Hastings ratio

α(s_{1:T}, s'_{1:T} | y_{1:T}, Θ, P, u_{1:T}, H_Dir, π) = min{ 1, [ f(s'_{1:T} | y_{1:T}, u_{1:T}, Θ, P, H_Dir, π) q(s_{1:T} | y_{1:T}, u_{1:T}, Θ, P) ] / [ f(s_{1:T} | y_{1:T}, u_{1:T}, Θ, P, H_Dir, π) q(s'_{1:T} | y_{1:T}, u_{1:T}, Θ, P) ] }.

The proposal distribution is not always a good approximation of the full conditional one. It occasionally causes the algorithm to get stuck at a fixed state vector. In order to avoid this situation, we sample the state vector by randomized blocks instead of drawing an entire vector in a single piece. At each MCMC iteration we randomly set the size of the block. The method has been proposed in a stochastic volatility context by Jensen and Maheu (2010). In our empirical exercise the block size randomly varies from fifty observations to the whole sample size. Note that the resulting randomized block sampler can be applied to any block size, which makes a relevant difference with respect to the Metropolis-Hastings method of Billio, Monfort, and Robert (1999) or the single-move sampler of Bauwens, Preminger, and Rombouts (2010), which can hardly be extended to block sizes greater than 50 observations. It is worth emphasizing that the proposed sampler does not need the sticky HDP to operate. Applications where the number of regimes K is fixed a priori (for instance Bauwens, Preminger, and Rombouts (2010), Henneke, Rachev, Fabozzi, and Nikolov (2011)) could also benefit from the current method. The number of regimes could then be determined using the marginal likelihood computed by bridge sampling (e.g. Meng and Wong (1996) and Fruhwirth-Schnatter (2004)).
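To illustrate the correction performed by the Metropolis-Hastings step, the sketch below (Python; a deliberately small two-regime example with made-up parameters) evaluates the path-dependent MS-GARCH log-likelihood of a proposed state path together with its log-likelihood under the path-independent HMP approximation, the two quantities entering the acceptance ratio through f and q respectively. The prior term on the state path and the proposal draw itself are omitted, so only the likelihood part of the ratio is shown.

import numpy as np

def msgarch_loglik(y, states, omega, alpha, beta):
    """Path-dependent MS-GARCH(1,1) likelihood: sigma2_t depends on the whole path s_{1:t}."""
    sigma2 = omega[states[0]] / (1.0 - alpha[states[0]] - beta[states[0]])
    ll = -0.5 * (np.log(2 * np.pi * sigma2) + y[0] ** 2 / sigma2)
    for t in range(1, len(y)):
        j = states[t]
        sigma2 = omega[j] + alpha[j] * y[t - 1] ** 2 + beta[j] * sigma2  # uses lagged path-dependent variance
        ll += -0.5 * (np.log(2 * np.pi * sigma2) + y[t] ** 2 / sigma2)
    return ll

def hmp_loglik(y, states, omega, alpha, beta):
    """Same path evaluated under the HMP approximation (parallel, state-free variance recursions)."""
    sig2 = omega / (1.0 - alpha - beta)
    ll = -0.5 * (np.log(2 * np.pi * sig2[states[0]]) + y[0] ** 2 / sig2[states[0]])
    for t in range(1, len(y)):
        sig2 = omega + alpha * y[t - 1] ** 2 + beta * sig2
        j = states[t]
        ll += -0.5 * (np.log(2 * np.pi * sig2[j]) + y[t] ** 2 / sig2[j])
    return ll

rng = np.random.default_rng(4)
y = rng.standard_normal(300)
omega = np.array([0.05, 0.30]); alpha = np.array([0.05, 0.10]); beta = np.array([0.90, 0.80])
current = np.zeros(300, dtype=int)
proposal = np.concatenate([np.zeros(150, dtype=int), np.ones(150, dtype=int)])
# Log acceptance ratio (likelihood part only): the target uses the exact model, the proposal the approximation.
log_ratio = (msgarch_loglik(y, proposal, omega, alpha, beta) - msgarch_loglik(y, current, omega, alpha, beta)) \
          - (hmp_loglik(y, proposal, omega, alpha, beta) - hmp_loglik(y, current, omega, alpha, beta))
accept = np.log(rng.uniform()) < min(0.0, log_ratio)
print(log_ratio, accept)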

2.2 Model comparison: computation of the marginal likelihood

Being able to compute the marginal likelihood (ML) of a model is of principal interest in Bayesian inference, as it remains a standard tool to compare models. The ML of the IHMM-GARCH model can be sequentially estimated by decomposing the quantity into a product of predictive densities as follows:

f(y_{1:T}) = f(y_1) Π_{t=1}^{T-1} f(y_{t+1} | y_{1:t}).   (30)


Gathering all the sampled random variables of the MCMC algorithm in the set Ω_t = {Θ, P, s_{1:t}, u_{1:t}, H_Dir, π, µ, Σ}, the predictive likelihood f(y_{t+1} | y_{1:t}) is approximated by

f(y_{t+1} | y_{1:t}) = ∫ f(y_{t+1} | y_{1:t}, Ω_t, s_{t+1}) f(s_{t+1} | Ω_t) f(Ω_t | y_{1:t}) dΩ_t
                     ≈ (1/N) Σ_{i=1}^{N} f(y_{t+1} | y_{1:t}, Ω_t^i, s_{t+1}^i),   (31)

where the MCMC draws {Ω_t^i, s_{t+1}^i}_{i=1}^N are simulated from the posterior distribution f(s_{t+1} | Ω_t) f(Ω_t | y_{1:t}). The ML is therefore tractable by computing each term of (30) using equation (31). Proceeding from the very first observation to the end of the sample by adding only one observation at a time is computationally demanding, especially for long time series. A faster strategy consists in computing an h-variate density f(y_{t+1}, ..., y_{t+h} | y_{1:t}) given one MCMC output conditional on y_{1:t} and iterating by adding h observations. Unfortunately, the predictive density estimation becomes less and less accurate as h increases. Given a fixed number of MCMC draws, one can monitor the accuracy of the predictive density f(y_{t+1}, ..., y_{t+h} | y_{1:t}) by choosing an appropriate number of forecasts h. To do so, we borrow the Effective Sample Size (ESS) from the Sequential Monte Carlo literature, where it is used to evaluate particle discrepancies (e.g. Doucet, de Freitas, and Gordon (2001)). In our case, the ESS denotes the number of MCMC draws that contain information to compute the predictive density. The measure is defined as ESS_h = (Σ_{i=1}^N w_{i,h}^2)^{-1}, where

w_{i,h} = f(y_{t+h}, ..., y_{t+1} | y_{1:t}, Ω_t^i, s_{t+1}^i, ..., s_{t+h}^i) / Σ_{j=1}^N f(y_{t+h}, ..., y_{t+1} | y_{1:t}, Ω_t^j, s_{t+1}^j, ..., s_{t+h}^j).

One can easily verify that ESS_h lies between 1 and N. If only one MCMC sample represents all the weights (i.e. w_{1,h} = 1 and w_{i,h} = 0 ∀ i ∈ [2, N]), the ESS reaches its lowest bound. On the contrary, if all the draws are equally informative, the ESS amounts to N. The criterion allows for selecting h on the fly after each MCMC estimation. In practice, we sequentially proceed from t = 1 until T by adding h observations to the current ones and by launching an MCMC simulation to compute the multivariate predictive density. We fix the number of posterior draws to N = 4500 after convergence (see Section 3.1), and the horizon h is dynamically chosen by precluding any ESS smaller than N/2.3
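A minimal sketch of the ESS-based choice of the forecast horizon is given below (Python; the predictive-density values are random placeholders standing in for f(y_{t+1}, ..., y_{t+h} | y_{1:t}, Ω_t^i, ...) evaluated at each MCMC draw).

import numpy as np

def effective_sample_size(log_pred):
    """ESS_h = 1 / sum_i w_{i,h}^2 with self-normalized weights built from log predictive densities."""
    w = np.exp(log_pred - log_pred.max())   # stabilize before normalizing
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def choose_horizon(log_pred_by_h, threshold):
    """Largest h whose ESS stays above the threshold (e.g. N/2), as in the sequential ML computation."""
    best_h = 1
    for h, log_pred in enumerate(log_pred_by_h, start=1):
        if effective_sample_size(log_pred) >= threshold:
            best_h = h
        else:
            break
    return best_h

# Placeholder: N = 4500 draws, log predictive densities that degrade as h grows.
rng = np.random.default_rng(5)
N = 4500
log_pred_by_h = [rng.normal(scale=0.2 * h, size=N) for h in range(1, 11)]
print(choose_horizon(log_pred_by_h, threshold=N / 2))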

3 A Simulation Study

The algorithm is illustrated on simulated data in this Section. We first document our prior choices and some practical issues for the MCMC implementation. We devote the next three subsections to results on series simulated from six different data generating processes (DGPs). We start by describing our simulation strategy, which includes the chosen DGPs. The MCMC mixing properties are then compared with those of existing MCMC alternatives. The last subsection reports summary statistics of the posterior distributions for several simulated data sets. A discussion of the MCMC mixing properties of the two approximate models can be found in the web Appendix.4 Based on a Monte Carlo study, Klaassen's model turns out to approximate the MS-GARCH model better than the HMP model does.

3.1 Starting point, priors, burn-in and label switching

As shown in Table 2, we use standard prior distributions for the model. The transformed GARCH parameters of each regime are independently driven by Normal distributions. Figure 1 displays the distributions of the corresponding GARCH parameters. Although slightly informative, the prior distributions do not embody strong a priori beliefs. Regarding the IHMM settings, we fix the same prior distributions on the Dirichlet process parameters as Fox, Sudderth, Jordan, and Willsky (2011) and Jochmann (2013), but two sets of hyper-parameters are investigated to empirically evaluate their impact on the number of regimes. The first one, referred to as the strong prior throughout the paper, reflects the high persistence of daily financial time series. The expectations of λ + κ and ρ are set to 1000 and 0.9994, respectively. These choices of persistence are closely related to those of Bauwens, Dufays, and Rombouts (2013), who require a very high persistence for the PMCMC to operate. The second set of hyper-parameter values, denoted the loose prior for convenience, ensures relatively diffuse distributions. The value of ρ implies an expected regime duration of 100 observations, which represents a reasonable persistence for long time series (3000 observations). The two concentration parameters η and λ + κ exhibit completely diffuse priors, with expectations equal to 10 and variances amounting to 100.

Insert Figure 1
Insert Table 2


The starting point of an MCMC algorithm is a relevant practical issue. Typically, MCMC algorithms start at the maximum likelihood estimates (MLE), but these cannot be computed in the presence of structural breaks except by using a demanding simulation-based method (see Augustyniak, Boudreault, and Morales (2013)). To simplify matters, our MCMC starting point for each simulation is the MLE of a GARCH(1,1) model without breaks. The starting values of λ, κ and η are set to 1.2, 1000 and 3, respectively.

The posterior distribution of the model parameters is invariant to the labels of the regimes. As a consequence, the label of one regime can switch to another one during the MCMC simulation. If this label switching happens, summary statistics that are label dependent, such as the posterior means of the GARCH parameters, are misleading. Some methods have been proposed to alleviate the permutation problem, either by imposing constraints on the parameters within the MCMC or by building a sample of coherent labels at the end of the MCMC simulation by minimizing a loss function. In this paper we circumvent the problem by using statistics that are invariant to label switching. For instance, instead of showing posterior means for each regime, we display posterior means over time (≈ E(φ_t | y_{1:T}) where φ_t = {ω_t, α_t, β_t}), which do not depend on the state labels. We thus permit label switches during the simulation, as advocated by Geweke (2007). For assessing MCMC convergence we use Geweke's diagnostic (Geweke (1992)) on some of the GARCH parameters over time, φ_t | y_{1:T}. We select ten parameters φ_t | y_{1:T} equally spaced in time in order to cover the whole sample and we apply Geweke's diagnostic to them. Once the MCMC has converged for these ten variables, we save the next 60,000 samples as draws of the posterior distribution. The IHMM-GARCH program, coded in C++, is available for the Windows platform on Arnaud Dufays' website.
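A minimal sketch of the label-invariant summary used here (Python; the draws are random placeholders) computes E(φ_t | y_{1:T}) by averaging, for each t, the parameter value of whatever regime each draw assigns to time t, so regime labels never enter the statistic.

import numpy as np

def posterior_mean_over_time(state_draws, theta_draws):
    """Average phi_t over MCMC draws.

    state_draws: (M, T) integer regime indices per draw.
    theta_draws: list of length M; draw m maps regime index -> parameter value (here a scalar, e.g. omega).
    """
    M, T = state_draws.shape
    phi = np.empty((M, T))
    for m in range(M):
        phi[m] = theta_draws[m][state_draws[m]]   # label-invariant: index by the state active at each t
    return phi.mean(axis=0)

# Placeholder posterior output: 200 draws, 100 time points, at most 3 regimes per draw.
rng = np.random.default_rng(6)
M, T = 200, 100
state_draws = rng.integers(0, 3, size=(M, T))
theta_draws = [rng.uniform(0.05, 0.5, size=3) for _ in range(M)]
print(posterior_mean_over_time(state_draws, theta_draws)[:5])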

3.2 Data Generating Processes

By exhibiting an infinite number of regimes, the IHMM-GARCH model encompasses the MS- and the CP-GARCH ones. We thus revisit some simulations from the CP and MS literature. We consider the same simulations as He and Maheu (2010) (HM), Bauwens, Preminger, and Rombouts (2010) (BPR), and Bauwens, Dufays, and Rombouts (2013) (BDR), one DGP whose true parameters are based on the S&P 500 results of the empirical section (Emp-SP), and a last one without breaks. The DGP of He and Maheu consists of a CP model with three regimes. The BPR simulation is an MS model with two regimes. BDR consider a change-point and a Markov-switching DGP. The Emp-SP DGP also presents MS behavior. The last series does not exhibit any break, in order to test the ability of the IHMM-GARCH model to estimate a standard specification. All the simulated DGPs are summarized in Table 3. Each series has 3000 observations, except the BPR simulation (1500 observations).

Insert Table 3

The CP specification of HM assumes structural breaks in the unconditional variance while the persistence parameters (α and β) remain constant across regimes. The unconditional variance (i.e. ω_{s_t}/(1 − α_{s_t} − β_{s_t})) before the first break is equal to 2 and then increases to 6. It decreases to 1 at the end of the sample. It tries to mimic a financial market that switches from quiet to more volatile periods. On the contrary, each parameter of the CP DGP in BDR varies across regimes. The unconditional variance evolves from 2 to 0.67, with an intermediate state where it is equal to 7. The three Markov-switching specifications differ, among other things, by their transition matrices. The BPR DGP assumes expected durations of 50 and 25 observations for the first and second states, respectively, whereas the transition probabilities of the BDR MS model are very low (with durations of 10,000 and 2,000 observations). For the Emp-SP DGP, the volatility dynamic oscillates from one regime displaying a high persistence (0.975) and a low unconditional variance (0.4) to another one exhibiting the opposite characteristics, i.e. a smaller persistence (0.95) with a high unconditional variance (2.8). The expected durations of the two regimes are equal to 278 and 556 observations, respectively, which more or less represent one and two financial years.
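For concreteness, a minimal sketch of simulating a two-regime MS-GARCH series of the kind listed in Table 3 is given below (Python; the GARCH parameter values are illustrative and not the exact DGP settings, while the transition matrix reproduces the expected durations of 50 and 25 observations mentioned for the BPR DGP).

import numpy as np

def simulate_ms_garch(T, omega, alpha, beta, P, rng):
    """Simulate an MS-GARCH(1,1) path whose regimes follow a Markov chain with transition matrix P."""
    y = np.empty(T)
    s = np.empty(T, dtype=int)
    s[0] = 0
    sigma2 = omega[0] / (1.0 - alpha[0] - beta[0])
    y[0] = np.sqrt(sigma2) * rng.standard_normal()
    for t in range(1, T):
        s[t] = rng.choice(len(omega), p=P[s[t - 1]])
        sigma2 = omega[s[t]] + alpha[s[t]] * y[t - 1] ** 2 + beta[s[t]] * sigma2  # path dependent
        y[t] = np.sqrt(sigma2) * rng.standard_normal()
    return y, s

rng = np.random.default_rng(7)
omega = np.array([0.05, 0.50]); alpha = np.array([0.05, 0.10]); beta = np.array([0.90, 0.80])
P = np.array([[0.98, 0.02],
              [0.04, 0.96]])   # expected durations of 50 and 25 observations
y, s = simulate_ms_garch(1500, omega, alpha, beta, P, rng)
print(y[:5], s.mean())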

3.3 Comparison of the sampler with other MCMC algorithms

We compare our method for sampling the state vector with two other MCMC samplers, namely BPR for Bauwens, Preminger, and Rombouts (2010) and PMCMC for Bauwens, Dufays, and Rombouts (2013). BPR draw each state one by one, which leads to highly autocorrelated samples. It requires many MCMC iterations to explore the entire support of the posterior distribution. The PMCMC method samples the entire state vector in one block using a Sequential Monte Carlo (SMC) algorithm within the MCMC. The mixing properties are by far improved, but at a computational cost. The complexity order of the SMC is O(N^2 T), where N stands for the number of particles. They choose N = 250 for an MS model and N = 150 for a CP one. The two competing algorithms are sketched in Appendix C.


We expect the mixing properties of our MH approach to lie between those of these two methods, since we randomize the size of the block that we sample and we use a Metropolis-Hastings step. However, the computation time is drastically reduced compared to the SMC algorithm. Indeed, the computational burden is equivalent to that of the forward-backward algorithm (O(K^2 T), where K denotes the number of regimes). To compare the different methods, we launch the samplers on series simulated from the five DGPs that exhibit structural breaks. For a fair comparison, we have implemented a finite hidden Markov structure with a fixed number of regimes for all the samplers (as in Bauwens, Dufays, and Rombouts (2013), see Appendix C) and we estimate the correct specification for each simulated time series (i.e. the number of regimes is equal to the true one exhibited by the DGP, and we use constrained transition matrices for the CP DGPs and MS transition matrices for the DGPs with recurrent regimes). All the simulations have been executed on the same computer and the programs only differ in the way of sampling s_{1:T}. Finally, we compute 30,000 MCMC iterations and store one out of every three samples for each simulation, providing 10,000 posterior draws.

Figure 2 documents the performance of each algorithm in terms of elapsed time in minutes, the normalized Hamming distance between the posterior draws and the true state vector (calculated using Munkres' algorithm, see Munkres (1957)), as well as the autocorrelation time of this distance.5 For each statistic, lower is better. Compared with BPR, the new samplers (HMP-MH for Haas, Mittnik and Paolella Metropolis-Hastings and Kl-MH for Klaassen Metropolis-Hastings) are almost as fast as the single-move algorithm for CP setups and become by far the swiftest ones for MS configurations. In addition, they display much smaller autocorrelation times than the BPR approach for all the simulated data and, contrary to the BPR algorithm, which may exhibit strongly correlated samples (see the normalized Hamming distance of the two CP setups), the new approaches have always converged within the first 1000 stored MCMC draws, as emphasized by the Hamming distance graphics. With respect to the other competitor, the proposed algorithms are by far faster than the SMC-based approach while they exhibit similar (but slightly higher) autocorrelation times. For each simulated data set, the Hamming distance of the new algorithms is always as good as that of the PMCMC. Considering that 30,000 posterior draws are sufficient for summarizing the posterior distributions, one requires at least 853 minutes to obtain such a sample (for a time series of 3000 observations) with the PMCMC algorithm, whereas it only takes 27 minutes using the Kl-MH method; this makes the new approach more than 31 times faster than the SMC-based one.

Table 4 details the acceptance rate of the state vector using the HMP-MH or the Kl-MH algorithms. The latter always exhibits higher values. Moreover, as exemplified by extra Monte Carlo results documented in the web Appendix, the Kl-MH approach seems to provide slightly better MCMC mixing properties. Therefore, throughout the paper we only report results derived from this algorithm. Nevertheless, all the simulations have also been run with the HMP-MH one. Results, which are comparable, are available on request.

Insert Figure 2
Insert Table 4

3.4 IHMM-GARCH results on simulated data

We present results of the IHMM-GARCH model on the different data sets simulated from the DGPs of Table 3. To begin with, Table 5 displays the posterior probability of having a specific number of regimes; for each simulation/prior, the probabilities are maximized at the true number. It is also worth noticing that we never underestimate the number of regimes. However, more regimes are sometimes counted and, depending on the prior, the algorithm quickly or gradually comes back to the true setting. As the likelihood does not decrease when more parameters are added, it is not surprising to observe such a pattern, especially for the loose prior.

Insert Table 5

Figures 3 and 4 show the posterior means of the parameters and the maximum likelihood estimates given the true states over time for each simulation. The IHMM-GARCH model closely tracks the MLE and sharply identifies the break points. It seems capable of reproducing both change-point and Markov-switching behaviors. Furthermore, the acceptance rate for updating the state vector does not fall below 70 percent for any of the simulated data sets.6

Insert Figure 3
Insert Figure 4

Focusing now on the BPR series, which exhibits the least persistent regimes, Figure 5 displays the posterior duration distribution of the current regime over time (i.e. 1/(1 − p_{s_t,s_t}) | y_{1:T}).


This indicator of regime persistence highlights that both models, whatever the prior, have been capable of capturing small persistences. The loose prior exhibits a posterior distribution that comprises the two average durations of the BPR DGP (i.e. 50 and 25 observations). On the contrary, the true average durations have not been accurately estimated with the other prior, although the posterior distribution departs strongly from the high persistence imposed by its initial setup.

Insert Figure 5

To end this study, Table 6 documents the minimum, the mean and the maximum normalized Hamming distances between the true state vector and the posterior ones. The two types of priors provide similar results. Not surprisingly, the mean and the maximum distances are almost always higher for the loose prior, since the number of regimes is less stable than with the strong prior.

Insert Table 6

Summarizing this simulation study, the IHMM-GARCH model is able to identify the turning points exhibited by the series dynamics as well as the correct number of regimes, independently of the two IHMM priors. Nevertheless, these comments are strongly related to the simulation design, and this exercise can be made as difficult (or easy) as needed. Robust results based on Monte Carlo studies are left for further research.

4 An Empirical Study

We now investigate the IHMM-GARCH fit compared to standard volatility models on several financial time series. The next subsection presents the considered alternatives to the IHMM-GARCH process. We then move to a detailed comparison of the various models on the S&P 500 daily returns in subsection 4.2 before studying their performances through marginal likelihoods and predictive likelihoods on three other time series in the last subsection.

4.1 Some other volatility models

The IHMM-GARCH process has many competitors. This section documents the different volatility models that are used below as alternatives to the proposed one.

• Making use of the IHMM structure, we extend the MS-GARCH model of Haas, Mittnik, and Paolella (2004) to potentially take into account an infinite number of regimes. The process is denoted the IHMM-HMP model in the next sections. Its estimation only requires small changes to the MCMC approach exposed in the paper, so we do not detail the scheme. For interested users, the program is available on the author's website.

• The spline-GARCH model of Engle and Rangel (2008) also generates a time-varying long-term variance, relying on a spline function. The specification is as follows (a small numerical sketch follows this list):

  y_t = τ_t g_t ε_t,   ε_t ∼ N(0, 1),
  g_t^2 = (1 − α − β) + α (y_{t-1}/τ_{t-1})^2 + β g_{t-1}^2,
  τ_t^2 = ω_0 exp( ω_1 t/T + Σ_{i=2}^{k+1} ω_i (max[(t − K_{i-1})/T, 0])^2 ),

where {K_1 = 0, K_2 = T/k, ..., K_k = (k − 1)T/k} are turning points equally spaced over the sample and (α, β, ω_0, ..., ω_{k+1}) are parameters. The number of knots (i.e. k) is selected by maximizing the marginal likelihood. In this setup, the function τ_t^2 coincides with the unconditional variance, whereas the parameter α + β still denotes the persistence of the conditional variance. Note that time in the spline function is normalized to avoid numerical instabilities. The parameter prior distributions, documented in Table 7, are as uninformative as possible.

Insert Table 7

To our knowledge, no paper has developed Bayesian inference for the spline-GARCH model so far. As in the GARCH model, none of the full conditional posterior distributions of the spline-GARCH parameters are known in closed form, rendering estimation by MCMC difficult. Moreover, the marginal likelihood cannot be directly derived from an MCMC output and therefore requires an extra method, such as bridge sampling, to estimate it.


We propose to bypass these difficulties by employing the Sequential Monte Carlo (SMC) sampler (see Neal (1998), Del Moral, Doucet, and Jasra (2006)). The algorithm applies the sequential importance sampling technique to an artificial sequence of distributions that converges to the posterior distribution of interest. The sequence of importance distributions relies on MCMC kernels that are adapted on the fly, rendering the method fully generic (for theoretical convergence, see Beskos, Jasra, and Thiery (2013)). As a final advantage, the SMC sampler provides an estimate of the marginal likelihood. The algorithm is detailed in the web Appendix.

• As a standard benchmark, we add the GARCH model to the list of alternatives.
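The sketch announced in the spline-GARCH item above (Python; the knot coefficients and their number are illustrative values, not estimates from the paper) evaluates the quadratic-spline long-run variance τ_t^2 over a sample of length T.

import numpy as np

def spline_tau2(T, omega0, omega1, omega_spline):
    """tau_t^2 = omega0 * exp(omega1 * t/T + sum_i omega_i * max((t - K_{i-1})/T, 0)^2) with equally spaced knots."""
    k = len(omega_spline)                                            # number of knots
    knots = np.array([(i - 1) * T / k for i in range(1, k + 1)])     # K_1 = 0, K_2 = T/k, ...
    t = np.arange(1, T + 1)
    expo = omega1 * t / T
    for w, K in zip(omega_spline, knots):
        expo = expo + w * np.maximum((t - K) / T, 0.0) ** 2
    return omega0 * np.exp(expo)

# Illustrative values: 3 knots, long-run variance first rising then falling.
tau2 = spline_tau2(T=3000, omega0=1.0, omega1=0.5, omega_spline=[1.5, -3.0, 2.0])
print(tau2[[0, 1499, 2999]])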

4.2 Model comparisons on the S&P 500 daily returns

We begin by revisiting the empirical exercise of Bauwens, Dufays, and Rombouts (2013). They use an MS and a CP model on the S&P 500 daily percentage returns from May 20, 1999 to April 25, 2011 (3000 observations). They further choose the optimal model and the number of regimes with the marginal likelihood. They find evidence in favor of a Markov-switching model with two regimes. Table 8 documents the posterior distribution of the number of regimes given the two distinct priors. The IHMM-GARCH posterior distribution under the strong prior covers five different numbers of regimes (2 to 7) and the most frequently observed one is 2. As also shown in Table 8, the possible number of regimes is more spread out under the loose prior and the most likely number amounts to 6.

Insert Table 8
Insert Figure 6

Focusing on the strong prior, which exhibits an a priori regime persistence similar to that of the PMCMC model of BDR, Figure 6 displays the time series with the estimated mode of the state vector at the most likely number of regimes and highlights that the IHMM-GARCH process captures regime switches comparable to those of the best MS-GARCH model of Bauwens, Dufays, and Rombouts (2013). Interestingly, an extreme value occurring before the financial crisis, on February 27, 2007, has been detected (by a rapid switch in the break dynamic) although the prior distributions are in favor of long regime durations. The crisis had begun to affect the financial sector five days earlier, when HSBC, the world's largest bank at that time, laid off its US mortgage head following a loss of 10.5 billion dollars. This rapid change stresses the flexibility of the IHMM-GARCH model to accommodate extreme values.

To extend the parallel with the PMCMC results, we show the posterior medians of the parameters over time given the strong prior in Figure 7 below. We easily identify the regime switches in the graphic. The (local) unconditional variance jumps between volatile and quiet periods. The turbulent episodes correspond to the dot-com bubble and the financial crisis eras and are clearly observable in Figure 6. Moreover, the two other GARCH parameters (α and β) evolve over time, emphasizing the relevance of not fixing them. We observe a spike in the density interval of the β parameter, which is related to the rapid switch of regimes occurring at the beginning of the financial crisis. PMCMC posterior medians are also reported and they deviate only slightly (in particular for β_t) from the estimated posterior medians of the IHMM-GARCH model.

Insert Figure 7

Moving to the comparison with other models, Figure 8 documents the time-varying long-term component of the IHMM-GARCH, the IHMM-HMP and the spline-GARCH models.7 These specifications lead to similar dynamics and track the variations observed in the time series. However, in comparison with the IHMM results, a smooth function for driving the long-term variance does not look sufficient to capture all the changes. The two IHMM long-term components evolve similarly, which highlights an ability to detect abrupt changes in the time series dynamic. In addition, note that the two loose configurations display more flexibility by rapidly switching from high (local) unconditional variances to smaller ones. Focusing now on the IHMM-HMP specification, the model exhibits 2 to 7 regimes, with the most likely number of regimes equal to three for the strong prior, while more regimes are counted with the loose prior. Table 9 documents the different posterior probabilities of observing a specific number of regimes.

Insert Figure 8
Insert Table 9

4.3   Predictability of the IHMM-GARCH model

Looking at the marginal log-likelihood (MLL), Table 10 provides strong evidence in favor of the three flexible models relative to the GARCH model. Relying on the informal rule of Kass and Raftery (1995), by which a difference in MLL smaller than 1 is "not worth more than a bare mention", a difference larger than 1 constitutes positive evidence and a difference above 3 strong evidence, the IHMM models dominate the spline-GARCH model whatever the chosen prior. This performance can be explained by the flexibility of the IHMM models: not only does the long-term component vary over time, but the short-term dynamics also change abruptly. Concentrating on the IHMM results, when the prior favors persistent regimes the two IHMM models lead to very close marginal log-likelihoods. Consequently, when only a few regime switches occur, the two underlying volatility dynamics are similar, making the HMP model a good approximation of the MS-GARCH model in such a case. This result is in line with Bauwens, Dufays, and De Backer (2011), who estimate CP-GARCH models (with a small number of structural breaks) and obtain very close in-sample fits with the CP version of the HMP model. Finally, observe that the IHMM models with the loose prior outclass all the other setups. Not surprisingly, the stock market is therefore better described by rapid abrupt changes in the volatility dynamics than by long-lasting regimes or smooth long-run volatility movements.

Insert Table 10

To deepen the comparison, we also explore the predictive performance of the models by means of the predictive likelihood. Given a user-defined value h ∈ [2, t], the predictive likelihood (PL) is defined as

f(y_h, ..., y_t | y_{1:h-1}, M) = f(y_{1:t} | M) / f(y_{1:h-1} | M),        (32)

where M denotes the model used to fit the data y_{1:t} = {y_1, ..., y_t} and t ∈ [h, T]. The PL is therefore the ratio of two marginal likelihoods, and it is easily checked that the PL of a model (say M1) can be higher than that of another model (say M2) even if both marginal likelihoods of M1 are smaller than those of M2. According to formula (32), the quantity focuses only on the ability to predict the realizations from h to t given the previous observations. We concentrate on one-step-ahead predictions (h = t).8

To compare the predictability of the different competitors, we focus on a linear model combination that is meant to improve the density forecasts by maximizing the log-score function. In contrast with Bayes factors or model averaging methods, this approach, put forward by Geweke and Amisano (2011), does not assume that the true model lies in the set of models under study. More specifically, we study predictive densities of the form

Σ_{i=1}^{n} w_i f(y_t | y_{1:t-1}, M_i);   w_i ≥ 0 ∀ i ∈ [1, n];   Σ_{i=1}^{n} w_i = 1,        (33)

where M_i denotes one of the n considered models. The objects of interest are the weights that maximize the log-score function defined as

LS(y_{τ:T}, w) = Σ_{t=τ}^{T} log[ Σ_{i=1}^{n} w_i f(y_t | y_{1:t-1}, M_i) ].        (34)

The log-score function (34) is interpreted through the weights attributed to the different models: the highest weight identifies the model that contributes the most to the prediction. Table 11 documents the maximum log-score values and the corresponding weights for several model combinations, with the first prediction date τ set to May 09, 2005 (τ = 1500, i.e. the 1500th observation). Whatever the combination, the IHMM-based models always exhibit the highest weights. In addition, the IHMM-GARCH model slightly outperforms the IHMM-HMP one regarding the weights of the predictive density (33).

Insert Table 11

The same analysis is carried out on three other financial time series, namely Bank of America Corporation (BAC), the Boeing Company (BA) and gold (GOLD), over the same time span as the S&P 500. Table 12 shows the MLLs of the various specifications for each return series. The IHMM setups again strongly dominate the GARCH and spline-GARCH models for each time series. The table also highlights that a high number of regimes is sometimes required to fit the volatility dynamics of the individual stocks.

Insert Table 12
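To make the pooling exercise concrete, the following minimal sketch shows how the weights maximizing the log-score function (34) can be computed numerically. It assumes that the one-step-ahead predictive densities f(y_t | y_{1:t-1}, M_i) of the competing models have already been evaluated and stored in a matrix; the function name and the use of scipy are illustrative choices, not part of the original implementation.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_pool_weights(pred_dens):
    """Maximize the log score (34) over the weight simplex.

    pred_dens : array of shape (T_out, n) with entry (t, i) equal to the
                one-step-ahead predictive density f(y_t | y_{1:t-1}, M_i)
                of model M_i at out-of-sample date t (assumed precomputed).
    Returns the weights w and the maximized log score.
    """
    T_out, n = pred_dens.shape

    def neg_log_score(w):
        pooled = pred_dens @ w                 # pooled density at each date
        return -np.sum(np.log(np.maximum(pooled, 1e-300)))

    w0 = np.full(n, 1.0 / n)                   # equal weights as starting point
    res = minimize(
        neg_log_score, w0, method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x, -res.fun
```

Because the log score is concave in the weights, any local optimizer that respects the simplex constraints recovers the global solution; for Table 11 the input matrix would contain the four columns of predictive densities of the IHMM-GARCH, IHMM-HMP, spline-GARCH and GARCH models from the 1500th observation onwards.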


Finally, as with the S&P 500 series, a predictive study based on model combination is carried out. Table 13 displays the maximum log-score values and the corresponding weights for several model combinations, with the first prediction date τ set to May 09, 2005 (the 1500th observation). Remarkably, the IHMM setups always exhibit markedly superior weights, confirming that they improve on the GARCH and spline-GARCH models.

Insert Table 13

4.4   Computational cost of the modified uniform distribution

So far, the constant k appearing in the modified distribution of the beam sampler has been held fixed. We now discuss the computational impact of its value in relation to the mixing improvements. As k gets smaller, the number of regimes and the number of possible trajectories in the forward-backward algorithm increase. There is therefore a trade-off between improving the mixing of the sampler and the computational time lost in spurious regimes during the forward-backward algorithm. Van Gael, Saatci, Teh, and Ghahramani (2008) suggest monitoring the average number of regimes allowed by the beam sampler in the forward equation (29), as it directly controls the computational cost as well as the mixing properties. When a standard uniform distribution is used (i.e. k = 1), this average number rapidly declines and reaches a steady state around one, as empirically shown in their paper. We perform the same analysis for a panel of values of k, carrying out five estimations of the S&P 500 series for each considered value. Figure 9 displays the mean (over the five simulations) of the average number of possible trajectories allowed by the beam sampler during the first 10.000 MCMC iterations. To begin with, when the standard uniform distribution is used (k = 1), the average number of possible regimes remains close to one during the entire simulation. The interesting cases start when k ≤ 0.5, since the average number of trajectories then departs from one. For those values, the modified uniform improves the mixing of the MCMC compared to the standard beam sampler. Moreover, as the modified distribution only acts on the self-transition probabilities, the number of possible paths does not explode even for extremely low values of k and remains quite small. This feature is remarkable since the complexity of the forward-backward algorithm directly depends on the number of regimes, and this number does not grow sharply with the modified distribution. To be explicit, the complexity of the forward-backward algorithm amounts to O(K²T), where K denotes the number of regimes. In this example, for k > 0.5 the complexity is roughly linear since only one regime is allowed on average. In the extreme case k = 0.01, the complexity is approximately equivalent to estimating an MS-GARCH model with three regimes, which remains computationally light.

Insert Figure 9
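The diagnostic plotted in Figure 9 can be mimicked with a short sketch such as the one below, which counts how many states pass the slice condition at each date for a given transition matrix. The exact form of the modified uniform distribution is defined in Section 2.1, so the scaling of the self-transition slice by k used here is only an assumption for illustration, and the transition matrix and state path are toy inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def slice_variables(states, P, k):
    """Draw slice variables u_t, scaling the self-transition slice by k
    (an illustrative stand-in for the modified uniform of Section 2.1)."""
    T = len(states)
    u = np.empty(T)
    u[0] = rng.uniform(0.0, k * P[states[0], states[0]])
    for t in range(1, T):
        i, j = states[t - 1], states[t]
        bound = k * P[i, j] if i == j else P[i, j]
        u[t] = rng.uniform(0.0, bound)
    return u

def average_allowed_states(states, P, k):
    """Average number of states j with p_{s_{t-1}, j} > u_t, i.e. the number
    of trajectories kept by the beam sampler at each date."""
    u = slice_variables(states, P, k)
    counts = [(P[states[t - 1]] > u[t]).sum() for t in range(1, len(states))]
    return float(np.mean(counts))

# toy example: a sticky three-state chain; smaller k lets more states through
P = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.98, 0.01],
              [0.01, 0.01, 0.98]])
path = np.zeros(3000, dtype=int)       # a persistent path staying in state 0
for k in (1.0, 0.5, 0.1, 0.01):
    print(k, average_allowed_states(path, P, k))
```

On such a persistent path, shrinking k shrinks the slice variables drawn at self-transitions, so more alternative states satisfy the slice condition, which is the mechanism behind the trade-off discussed above.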

5   Conclusion

The GARCH model with fixed parameters has exhausted its potential and more flexible models are now required. We first propose an IHMM-GARCH model that allows for a potentially infinite number of regimes. Our MCMC sampler relies on the sticky infinite hidden Markov model, which has already been applied to autoregressive models but never to volatility models. Furthermore, it accommodates the path-dependence problem by using a novel Metropolis-Hastings method. As a result, an entire state vector is sampled in a few blocks and with small autocorrelations. A comparison with existing MCMC alternatives confirms that the sampler outperforms the others in terms of computational time, while its mixing properties remain almost as good as those of the best known MCMC method. The IHMM encompasses the CP and the MS models, and simulations in the paper show that the IHMM-GARCH model accurately estimates both specifications. We also detail results for the S&P 500 from May 20, 1999 to April 25, 2011 (3000 observations) and highlight strong similarities with the MS-GARCH results of Bauwens, Dufays, and Rombouts (2013). The relevance of embedding an IHMM structure in a volatility model is empirically assessed against other popular models on four financial time series, through marginal likelihoods and predictive likelihoods. As Bayesian results depend on the prior distributions, two prior configurations have been used throughout the paper. The posterior distributions of the two setups are robust on simulated data and differ on empirical time series, mostly because the strong prior implies very persistent regimes that are not in agreement with the dynamics of the financial series. As noted in the model specification section, the algorithm could also handle distributional assumptions other than the Normal one. Further research will investigate this feature.


Appendix A   IHMM-GARCH MCMC sampler: Implementation

Before developing the implementation of the sampler, we summarize some useful notation. Sums are denoted by dots: for instance, Σ_a x_{a,b} = x_{.,b} and Σ_a Σ_b x_{a,b} = x_{.,.}. The vector {x_1, x_2, ..., x_r} is briefly denoted by x_{1:r}. Vectors are row vectors and the transpose operator is denoted by ′. The number K stands for the number of regimes. To avoid any confusion about the density function of the Gamma distribution, we always use the following parametrization: X ∼ G(k, s) if

f(x | k, s) = x^{k-1} e^{-x/s} / (s^k Γ(k)).
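As a quick sanity check of this shape/scale parametrization (mean ks), one can compare it with NumPy's gamma generator; the shape and scale values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
k, s = 3.0, 2.0
draws = rng.gamma(shape=k, scale=s, size=100_000)   # G(k, s) as defined above
print(draws.mean())                                 # close to k * s = 6.0
```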

After finding a starting point (see subsection 3.1), we iterate over the following steps until convergence.

1. Sampling u_{1:T} from f(u_{1:T} | s_{1:T}, P): for t = 2, ..., T, sample u_t ∼ Ũ[0, p_{s_{t-1}, s_t}] and u_1 ∼ Ũ[0, p_{s_1, s_1}], where Ũ[a, b] denotes the modified uniform distribution described in Section 2.1 on the interval (a, b).

2. Generate the new required states: while max{p_{i,K+1}}_{i=1}^{K} > min({u_t}_{t=1}^{T}):
   • Sample p_{K+1, 1:K+1} ∼ Dir(λπ_{1:K+1} + κδ_{K+1}).
   • Break the last stick of π:
     (a) Draw ξ ∼ Beta(1, η).
     (b) Set π_{K+2} = (1 − ξ)π_{K+1}.
     (c) Afterwards, set π_{K+1} = ξπ_{K+1}.
   • Increase the dimension of each vector p_i: for i = 1, ..., K+1,
     (a) Draw ξ_i ∼ Beta(λπ_{K+1} + κ1{i = K+1}, λπ_{K+2}).
     (b) Set p_{i,K+2} = (1 − ξ_i)p_{i,K+1} and p_{i,K+1} = ξ_i p_{i,K+1}.
   • Draw Θ_{K+1} ∼ N(µ, Σ).
   • Set K = K + 1.

3. Sampling s_{1:T} from f(s_{1:T} | Θ, P, H_Dir, π, u_{1:T}, y_{1:T}) = f(s_{1:T} | Θ, P, u_{1:T}, y_{1:T}): see Section 2.1.

4. According to the new vector s_{1:T}, remove the unvisited states, re-label s_{1:T} appropriately and adapt K, π, Θ, P.

5. Sampling P from f(P | s_{1:T}, Θ, H_Dir, π, y_{1:T}): for i = 1, ..., K, sample p_{i,1:K+1} ∼ Dir(λπ_1 + n_{i,1}, ..., λπ_i + κ + n_{i,i}, ..., λπ_{K+1}), where n_{i,j} denotes the number of transitions from state i to state j observed in the state vector s_{1:T}.

6. Sampling H_Dir from f(H_Dir | π, Θ, P, s_{1:T}, y_{1:T}):
   (a) Introduce auxiliary variables:
     • Sampling m: for j = 1, ..., K and k = 1, ..., K, set m_{j,k} = 0; for i = 1, ..., n_{j,k}, sample x_i ∼ Bernoulli((λπ_k + κ1{j=k}) / (i − 1 + λπ_k + κ1{j=k})) and increment m_{j,k} by one if x_i = 1.
     • Sampling r: for j = 1, ..., K, r_j ∼ Binomial(m_{j,j}, ρ / ((1 − ρ)π_j + ρ)), where ρ = κ/(λ + κ).
     • Set m̄_{j,k} = m_{j,k} if j ≠ k and m̄_{j,k} = m_{j,k} − r_j if j = k.
     • Set K̄ = 0; for k = 1, ..., K, if m̄_{.,k} > 0 then increment K̄.
   (b) Sampling λ and κ:
     • Sample auxiliary variables: for i = 1, ..., K, q_i ∼ Beta(λ + κ + 1, n_{i,.}) and s_i ∼ Bernoulli(n_{i,.} / (n_{i,.} + λ + κ)).
     • Sample ρ = κ/(λ + κ) ∼ Beta(ρ_hyp1 + r_., ρ_hyp2 + m_{.,.} − r_.), where ρ_hyp1 and ρ_hyp2 denote the hyper-parameters of ρ (see Table 2).
     • Sample λ + κ ∼ G(a_hyp + m_{.,.} − s_., (1/b_hyp − Σ_{j=1}^{K} log q_j)^{-1}), where a_hyp and b_hyp denote the hyper-parameters of λ + κ (see Table 2).
     • Set λ = (1 − ρ)(λ + κ) and κ = ρ(λ + κ).
   (c) Sampling η:
     • Sample auxiliary variables: q̃ ∼ Beta(η + 1, m̄_{.,.}) and s̃ ∼ Bernoulli(m̄_{.,.} / (m̄_{.,.} + η)).
     • Sample η ∼ G(η_hyp1 + K̄ − s̃, (1/η_hyp2 − log q̃)^{-1}), where η_hyp1 and η_hyp2 denote the hyper-parameters of η (see Table 2).

7. Sampling π from f(π | H_Dir, Θ, P, s_{1:T}, y_{1:T}) ∼ Dir(m̄_{.,1}, m̄_{.,2}, ..., m̄_{.,K}, η).

8. Sampling Θ from f(Θ | s_{1:T}, µ, Σ, y_{1:T}) by the adaptive Metropolis algorithm (see Atchadé and Rosenthal (2005)). More precisely, for each parameter θ̃_i of each regime, accept the proposal θ̂_i ∼ N(θ̃_i, k_{θ_i}) according to a Metropolis ratio. Then update the acceptance rate AC_{θ̃_i} of the sampler for that parameter and adapt the variance k_{θ̃_i} as follows: k_{θ̃_i} = max{k_{θ̃_i} + (AC_{θ̃_i} − 0.333)/M^{0.6}, 0.001}, where M denotes the current MCMC iteration.

9. Sampling µ from f(µ | Σ, Θ) ∼ N(µ̄, Σ̄), where µ̄ = Σ̄(Σ^{-1} Σ_{i=1}^{K} θ̃_i + Σ_0^{-1} µ_0), Σ̄ = (K Σ^{-1} + Σ_0^{-1})^{-1} and µ_0, Σ_0 denote the prior hyper-parameters of µ (see Table 2).

10. Sampling Σ^{-1} from f(Σ^{-1} | µ, Θ) ∼ Wishart({Σ_{i=1}^{K} (θ̃_i − µ)(θ̃_i − µ)′ + V}^{-1}, v + K).
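A minimal sketch of the scale adaptation used in step 8 is given below. It assumes a Gaussian random-walk proposal on a single transformed GARCH parameter, a placeholder log full-conditional log_post, and a simple running update of the acceptance rate; the adaptation rate M^{0.6} is the reading adopted above for the garbled exponent and may differ from the exact choice of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def adaptive_mh_step(theta, k_var, acc_rate, m, log_post):
    """One adaptive Metropolis update of a scalar parameter (step 8 sketch).

    theta    : current value of the transformed parameter
    k_var    : current proposal variance k_theta
    acc_rate : running acceptance rate AC_theta
    m        : current MCMC iteration
    log_post : callable returning the log full-conditional (placeholder)
    """
    proposal = theta + np.sqrt(k_var) * rng.standard_normal()
    accept = np.log(rng.uniform()) < log_post(proposal) - log_post(theta)
    if accept:
        theta = proposal
    acc_rate += (accept - acc_rate) / m                       # running average
    # adapt the proposal variance towards a 0.333 acceptance rate, as in step 8
    k_var = max(k_var + (acc_rate - 0.333) / m ** 0.6, 0.001)
    return theta, k_var, acc_rate
```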

B   Klaassen's approximations

We first derive the expression f(s_{t-1} | s_t, y_{1:t-1}, u_{1:T}, Θ, P), which is used in the approximations derived next:

f(s_{t-1} | s_t, y_{1:t-1}, u_{1:T}, Θ, P)
  = f(s_t, s_{t-1}, u_{t:T} | y_{1:t-1}, u_{1:t-1}, Θ, P) / f(s_t, u_{t:T} | y_{1:t-1}, u_{1:t-1}, Θ, P)
  = f(s_{t-1} | y_{1:t-1}, u_{1:t-1}, Θ, P) f(s_t | s_{t-1}, P) f(u_t | s_t, s_{t-1}, P) f(u_{t+1:T} | s_t, P) / f(s_t, u_{t:T} | y_{1:t-1}, u_{1:t-1}, Θ, P)
  = f(s_{t-1} | y_{1:t-1}, u_{1:t-1}, Θ, P) f(s_t | s_{t-1}, P) f(u_t | s_t, s_{t-1}, P) / Σ_i f(s^i_{t-1} | y_{1:t-1}, u_{1:t-1}, Θ, P) f(s_t | s^i_{t-1}, P) f(u_t | s_t, s^i_{t-1}, P),

where s^i_{t-1} stands for s_{t-1} = i and f(u_t | s_t, s_{t-1}, P) denotes the modified uniform distribution defined in Section 2.1. Note that the product f(s_t | s_{t-1}, P) f(u_t | s_t, s_{t-1}, P) reduces to δ{u_t ∈ [0, k p_{s_{t-1}, s_t}]} δ{s_{t-1} = s_t} / k + δ{u_t ∈ [0, p_{s_{t-1}, s_t}]} δ{s_{t-1} ≠ s_t}.

B.1   Approximate GARCH: computation of the modified variance

Recall that σ̃²_{t-1, s_t} = E[σ²_{t-1} | y_{1:t-1}, s_t, u_{1:T}, Θ, P]. Then

E[σ²_{t-1} | y_{1:t-1}, s_t, u_{1:T}, Θ, P]
  = Σ_{i=1}^{∞} σ²_{t-1, s^i_{t-1}} f(s^i_{t-1} | s_t, y_{1:t-1}, u_{1:T}, Θ, P)
  = Σ_{i=1}^{∞} (ω_{s^i_{t-1}} + α_{s^i_{t-1}} y²_{t-2} + β_{s^i_{t-1}} σ̃²_{t-2, s^i_{t-1}}) f(s^i_{t-1} | s_t, y_{1:t-1}, u_{1:T}, Θ, P),

where s^i_{t-1} stands for s_{t-1} = i.
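The recursion above can be written compactly once the probabilities f(s_{t-1} = i | s_t, y_{1:t-1}, u_{1:T}, Θ, P) are available. The sketch below assumes they are supplied by a separate filtering routine and that only the K currently instantiated regimes enter the sum (the beam sampler discards the remaining ones); all names are illustrative.

```python
import numpy as np

def klaassen_modified_variance(omega, alpha, beta, y_prev,
                               sig2_tilde_prev, prob_prev_given_curr):
    """Modified conditional variance of Appendix B.1 (sketch).

    omega, alpha, beta   : length-K arrays of regime-specific GARCH parameters
    y_prev               : observation y_{t-2}
    sig2_tilde_prev      : length-K array of sigma_tilde^2_{t-2, i}
    prob_prev_given_curr : (K, K) matrix with entry (i, j) equal to
                           f(s_{t-1} = i | s_t = j, y_{1:t-1}, u_{1:T}, Theta, P),
                           assumed computed elsewhere
    Returns the length-K vector sigma_tilde^2_{t-1, j}, one value per current regime j.
    """
    # regime-specific variances sigma^2_{t-1, i} built on the collapsed past
    sig2_prev_regime = omega + alpha * y_prev ** 2 + beta * sig2_tilde_prev
    # integrate s_{t-1} out, separately for each current regime s_t = j
    return sig2_prev_regime @ prob_prev_given_curr
```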

C   Other MCMC algorithms for inferring the MS-GARCH model with finite regimes

Two papers (BPR, Bauwens, Preminger, and Rombouts (2010), and PMCMC, Bauwens, Dufays, and Rombouts (2013)) propose an MCMC method to infer the MS-GARCH model with a finite number of states. While the techniques for drawing a state vector are very different, the remainder of the MCMC procedure is shared. Let K denote the fixed number of regimes, Θ = {θ_1, ..., θ_K} the set of GARCH parameters within each regime, and P ∈ ℜ^{K×K} the transition matrix with Σ_{k=1}^{K} p_{i,k} = 1 for i = 1, ..., K. The MCMC method iterates over three full conditional distributions:

1. Sampling Θ ∼ f(Θ | y_{1:T}, P, s_{1:T}),
2. Sampling P ∼ f(P | y_{1:T}, Θ, s_{1:T}),
3. Sampling s_{1:T} ∼ f(s_{1:T} | y_{1:T}, Θ, P).

The two subsections below sketch the procedures used in the two papers to sample the state vector from its full conditional distribution.

C.1   Sampling a state vector using BPR's algorithm

Bauwens, Preminger, and Rombouts (2010) update one state variable at a time given the other states. While this method renders the posterior sample highly correlated, it has the advantage of being straightforward to understand and to code. The full conditional distribution of one state variable is given by, for t ∈ [2, T − 1],

f(s_t | y_{1:T}, Θ, P, s_{1:t-1}, s_{t+1:T}) ∝ f(y_{1:T} | Θ, P, s_{1:T}) f(s_t | s_{t-1}, P) f(s_{t+1} | s_t, P)
  = f(y_{1:T} | Θ, P, s_{1:T}) p_{s_{t-1}, s_t} p_{s_t, s_{t+1}}
  ∝ p_{s_{t-1}, s_t} p_{s_t, s_{t+1}} Π_{i=t}^{T} f_N(y_i | 0, σ²_i(s_{1:i})).

Algorithm 1 details how to draw an entire state vector using the previous derivation.

Algorithm 1  BPR's algorithm
  for k = 1 to K do
    Compute w^k_1 = p_{1,k} p_{k,s_2} Π_{i=1}^{T} f_N(y_i | 0, σ²_i(s_1 = k, s_{2:T}))
  end for
  for k = 1 to K do
    Compute the normalized weight W^k_1 = w^k_1 / Σ_{j=1}^{K} w^j_1
  end for
  Sample k ∼ W_1 and set s_1 = k
  for t = 2 to T − 1 do
    for k = 1 to K do
      Compute w^k_t = p_{s_{t-1},k} p_{k,s_{t+1}} Π_{i=t}^{T} f_N(y_i | 0, σ²_i(s_{1:t-1}, s_t = k, s_{t+1:T}))
    end for
    for k = 1 to K do
      Compute the normalized weight W^k_t = w^k_t / Σ_{j=1}^{K} w^j_t
    end for
    Sample k ∼ W_t and set s_t = k
  end for
  for k = 1 to K do
    Compute w^k_T = p_{s_{T-1},k} f_N(y_T | 0, σ²_T(s_{1:T-1}, s_T = k))
  end for
  for k = 1 to K do
    Compute the normalized weight W^k_T = w^k_T / Σ_{j=1}^{K} w^j_T
  end for
  Sample k ∼ W_T and set s_T = k
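For illustration, the sketch below implements the single-site update of Algorithm 1 for an interior date t, assuming a Gaussian MS-GARCH likelihood with a path-dependent variance recursion of the form σ²_t = ω_{s_t} + α_{s_t} y²_{t-1} + β_{s_t} σ²_{t-1}; the variance initialization and the helper names are illustrative choices rather than the paper's exact implementation.

```python
import numpy as np
from scipy.stats import norm

def conditional_variances(y, states, omega, alpha, beta, sigma2_init=1.0):
    """sigma_t^2 along a given regime path (path-dependent GARCH recursion)."""
    T = len(y)
    sigma2 = np.empty(T)
    sigma2[0] = sigma2_init                      # illustrative initialization
    for t in range(1, T):
        k = states[t]
        sigma2[t] = omega[k] + alpha[k] * y[t - 1] ** 2 + beta[k] * sigma2[t - 1]
    return sigma2

def bpr_update_state(t, y, states, P, omega, alpha, beta, rng):
    """Single-site Gibbs draw of s_t for an interior date (Algorithm 1 sketch)."""
    K = P.shape[0]
    log_w = np.empty(K)
    for k in range(K):
        trial = states.copy()
        trial[t] = k
        sigma2 = conditional_variances(y, trial, omega, alpha, beta)
        # prior transition terms times the likelihood of y_t, ..., y_T
        log_w[k] = (np.log(P[states[t - 1], k]) + np.log(P[k, states[t + 1]])
                    + norm.logpdf(y[t:], 0.0, np.sqrt(sigma2[t:])).sum())
    w = np.exp(log_w - log_w.max())              # normalize on the log scale
    return rng.choice(K, p=w / w.sum())
```

Because the conditional variance at every date from t onwards depends on the trial value of s_t, the product of Normal densities from t to T must be re-evaluated for each candidate regime, which is precisely the cost that path dependence imposes on single-site updating.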

C.2   Sampling a state vector using the PMCMC

At each MCMC iteration, the PMCMC algorithm runs a sequential Monte Carlo (SMC) in order to approximate the full conditional distribution of s_{1:T} | y_{1:T}, Θ, P, and samples a full state vector from this approximation. The posterior draws are therefore less correlated than those of BPR's algorithm, but at the computational cost of running an SMC at each MCMC iteration. Denoting by ŝ_{1:T} the realization of the state vector at the previous MCMC step and by N the number of particles employed in the SMC, a new state vector is sampled using the backward Algorithm 3, which requires a run of the forward Algorithm 2.

Funding
This work was supported by the contract "Projet d'Actions de Recherches Concertées" of the "Communauté française de Belgique", granted by the "Académie universitaire Louvain" [12/17-045], by the Belgian F.S.R. Research program and by the contract "Investissements d'Avenir" ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047 granted by the Centre de Recherche en Economie et Statistique.


Algorithm 2  Forward algorithm
  Compute the un-normalized weight w^1_1 = f(y_1 | Θ, P, ŝ_1)
  for i = 2 to N do
    Sample s^i_1 ∼ f(s_1)
    Compute the un-normalized weight w^i_1 = f(y_1 | Θ, P, s^i_1)
  end for
  Compute the normalized weights W^i_1 = w^i_1 / Σ_{j=1}^{N} w^j_1
  for t = 2 to T do
    for i = 1 to N do
      Compute the auxiliary weight g^i_t = W^i_{t-1} Σ_{k=1}^{K} f(s_t = k | s^i_{t-1}) f(y_t | y_{1:t-1}, Θ, P, s_t = k)
    end for
    Compute the normalized weights Π^i_{t|t-1} = g^i_t / Σ_{j=1}^{N} g^j_t
    Compute the un-normalized weight w^1_t = f(y_t | y_{1:t-1}, Θ, P, ŝ_t)
    for i = 2 to N do
      Sample A^i_t ∼ Π_{t|t-1} with A^i_t ∈ [1, N]
      Sample s^i_t ∼ f(s_t | s^{A^i_t}_{t-1})
      Compute the un-normalized weight w^i_t = f(y_t | y_{1:t-1}, Θ, P, s^i_t) / g^{A^i_t}_t
    end for
    for i = 1 to N do
      Compute the normalized weight W^i_t = w^i_t / Σ_{j=1}^{N} w^j_t
    end for
  end for

Algorithm 3  Backward algorithm - sampling a state vector
  Sample k ∼ W_T and set s_T = s^k_T
  for t = T − 1 to 1 do
    for i = 1 to N do
      Compute w^i_{t|T} = w^i_t λ^{s^i_t}_t p(s_{t+1} | s^i_t, P), where
      λ^q_t = Σ_{i=1}^{N} f(y_{t+1}, ..., y_{t+τ} | s^i_t, s^i_{t+1}, ..., s^i_{t+τ}, σ^i_{t-1}, Θ, P) 1{s^i_t = q} / Σ_{i=1}^{N} f(y_{t+1}, ..., y_{t+τ} | s^i_t, s^i_{t+1}, ..., s^i_{t+τ}, σ^i_{t-1}, Θ, P)
      and τ solves the equation β^{τ_i}_i = 0.001 for i = 1, ..., K + 1.
    end for
    Compute the normalized weights W^i_{t|T} = w^i_{t|T} / Σ_{j=1}^{N} w^j_{t|T}
    Sample k ∼ W_{t|T} and set s_t = s^k_t
  end for
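The backward pass of Algorithm 3 builds on standard backward simulation from stored SMC output. The generic sketch below shows that basic step only; it deliberately omits the path-dependence correction λ_t of Algorithm 3, and the array layout is an assumption for illustration.

```python
import numpy as np

def backward_sample_path(particles, weights, P, rng):
    """Generic backward simulation of a state path from stored SMC output.

    particles : int array (T, N) of particle states s_t^i from the forward pass
    weights   : float array (T, N) of normalized forward weights W_t^i
    P         : (K, K) transition matrix
    NOTE: this sketch omits the lambda_t correction of Algorithm 3 that accounts
    for path dependence; it only illustrates the standard backward step.
    """
    T, N = particles.shape
    path = np.empty(T, dtype=int)
    i = rng.choice(N, p=weights[-1])
    path[-1] = particles[-1, i]
    for t in range(T - 2, -1, -1):
        w = weights[t] * P[particles[t], path[t + 1]]   # W_t^i p(s_{t+1} | s_t^i)
        path[t] = particles[t, rng.choice(N, p=w / w.sum())]
    return path
```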

Acknowledgements
The author warmly thanks Luc Bauwens for his valuable advice and support. He is also grateful to Jeroen Rombouts as well as to the participants of the 5th International Conference of the ERCIM WG on Computing and Statistics and of the IAAE 2014 conference for useful comments that improved the quality of the manuscript. The author assumes the scientific responsibility; all errors are his own.

References
Atchadé, Y., and J. Rosenthal (2005): "On adaptive Markov chain Monte Carlo algorithms," Bernoulli, 11(5), 815–828.
Augustyniak, M., M. Boudreault, and M. Morales (2013): "Estimating the Markov-Switching GARCH Model with a Deterministic Particle Filter," Working paper, Available at SSRN: http://ssrn.com/abstract=2365763 or http://dx.doi.org/10.2139/ssrn.2365763.
Bauwens, L., A. Dufays, and B. De Backer (2011): "Estimating and forecasting structural breaks in financial time series," Journal of Empirical Finance, Forthcoming, DOI: 10.1016/j.jempfin.2014.06.008.
Bauwens, L., A. Dufays, and J. Rombouts (2013): "Marginal Likelihood for Markov Switching and Change-point GARCH Models," Journal of Econometrics, 178(3), 508–522.
Bauwens, L., A. Preminger, and J. Rombouts (2010): "Theory and Inference for a Markov-switching GARCH Model," Econometrics Journal, 13, 218–244.
Beal, M. J., and P. Krishnamurthy (2006): "Gene expression time course clustering with countably infinite hidden Markov models," Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Beskos, A., A. Jasra, and A. Thiery (2013): "On the Convergence of Adaptive Sequential Monte Carlo Methods," Available at http://arxiv.org/pdf/1306.6462v2.pdf.


Billio, M., A. Monfort, and C. Robert (1999): "Bayesian estimation of switching ARMA models," Journal of Econometrics, 93(2), 229–255.
Blackwell, D., and J. MacQueen (1973): "Ferguson distributions via Pólya urn schemes," Annals of Statistics, 1, 353–355.
Bulla, J., and I. Bulla (2006): "Stylized facts of financial time series and hidden semi-Markov models," Computational Statistics & Data Analysis, 51(4), 2192–2209.
Chib, S. (1996): "Calculating Posterior Distributions and Modal Estimates in Markov Mixture Models," Journal of Econometrics, 75, 79–97.
Chib, S., and I. Jeliazkov (2001): "Marginal Likelihood from the Metropolis-Hastings Output," Journal of the American Statistical Association, 96, 270–281.
Del Moral, P., A. Doucet, and A. Jasra (2006): "Sequential Monte Carlo samplers," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 411–436.
Diebold, F. (1986): "Comment on Modeling the Persistence of Conditional Variances," Econometric Reviews, 5, 51–56.
Doucet, A., N. de Freitas, and N. Gordon (2001): Sequential Monte Carlo Methods in Practice.
Engle, R., and J. Rangel (2008): "The Spline-GARCH Model for Low-Frequency Volatility and its Global Macroeconomic Causes," Review of Financial Studies, 21, 1187–1222.
Ferguson, T. S. (1973): "A Bayesian analysis of some nonparametric problems," The Annals of Statistics, 1(2), 209–230.
Fox, E., E. Sudderth, M. Jordan, and A. Willsky (2011): "A sticky HDP-HMM with Application to Speaker Diarization," Annals of Applied Statistics, 5(2A), 1020–1056.
Francq, C., and J. Zakoian (2008): "Deriving the autocovariances of powers of Markov-switching GARCH models, with applications to statistical inference," Computational Statistics and Data Analysis, 52, 3027–3046.
Fruhwirth-Schnatter, S. (2004): "Estimating Marginal Likelihoods for Mixture and Markov-switching Models Using Bridge Sampling Techniques," Econometrics Journal, 7, 143–167.
Geweke, J. (1992): "Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments," Bayesian Statistics, Oxford University Press, 4, 169–193.
Geweke, J. (2007): "Interpretation and Inference in Mixture Models: Simple MCMC works," Computational Statistics and Data Analysis, 51, 3259–3550.
Geweke, J., and G. Amisano (2011): "Optimal prediction pools," Journal of Econometrics, 164(1), 130–141.
Geyer, C. J. (1992): "Practical Markov Chain Monte Carlo," Statistical Science, 7(4), 473–511.
Haas, M., S. Mittnik, and M. Paolella (2004): "A New Approach to Markov-Switching GARCH Models," Journal of Financial Econometrics, 2, 493–530.
Hamilton, J. (1989): "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.
He, Z., and J. Maheu (2010): "Real Time Detection of Structural Breaks in GARCH Models," Computational Statistics and Data Analysis, 54, 2628–2640.
Henneke, J., S. Rachev, F. J. Fabozzi, and M. Nikolov (2011): "MCMC-based estimation of Markov Switching ARMA-GARCH models," Applied Economics, 43(3), 259–271.
Hilebrand, E. (2005): "Neglecting parameter changes in GARCH models," Journal of Econometrics, 129, 121–138.
Jensen, M. J., and J. M. Maheu (2010): "Bayesian Semiparametric Stochastic Volatility Modeling," Journal of Econometrics, 157(2), 306–316.
Jensen, M. J., and J. M. Maheu (2013): "Bayesian semiparametric multivariate GARCH modeling," Journal of Econometrics, 176(1), 3–17.
Jensen, M. J., and J. M. Maheu (2014): "Estimating a semiparametric asymmetric stochastic volatility model with a Dirichlet process mixture," Journal of Econometrics, 178(P3), 523–538.
Jochmann, M. (2013): "Modeling U.S. Inflation Dynamics: A Bayesian Nonparametric Approach," Econometric Reviews, Forthcoming.
Kass, R., and A. Raftery (1995): "Bayes Factors," Journal of the American Statistical Association, 90, 773–795.
Kivinen, J., E. Sudderth, and M. Jordan (2007): "Learning multiscale representations of natural scenes using Dirichlet processes," Proceedings of the IEEE International Conference on Computer Vision.
Klaassen, F. (2002): "Improving GARCH volatility forecasts with regime-switching GARCH," Empirical Economics, 27(2), 363–394.
Koop, G., M. H. Pesaran, and S. M. Potter (1996): "Impulse response analysis in nonlinear multivariate models," Journal of Econometrics, 74(1), 119–147.
Lamoureux, C., and W. Lastrapes (1990): "Persistence in Variance, Structural Change, and the GARCH Model," Journal of Business and Economic Statistics, 8, 225–234.
Meng, X.-L., and W. Wong (1996): "Simulating Ratios of Normalizing Constants via a Simple Identity: A Theoretical Exploration," Statistica Sinica, 6, 831–860.
Munkres, J. (1957): "Algorithms for the Assignment and Transportation Problems," Journal of the Society of Industrial and Applied Mathematics, 5(1), 32–38.
Neal, R. (2003): "Slice sampling," The Annals of Statistics, 31, 705–741.
Neal, R. M. (1998): "Annealed Importance Sampling," Statistics and Computing, 11, 125–139.
Pesaran, M. H., D. Pettenuzzo, and A. Timmermann (2006): "Forecasting Time Series Subject to Multiple Structural Breaks," Review of Economic Studies, 73, 1057–1084.
Rabiner, L. R. (1989): "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, pp. 257–286.
Raftery, A. E., D. Madigan, and J. A. Hoeting (1998): "Bayesian Model Averaging for Linear Regression Models," Journal of the American Statistical Association, 92, 179–191.
Sethuraman, J. (1994): "A constructive definition of Dirichlet priors," Statistica Sinica, 4, 639–650.
Song, Y. (2013): "Modelling Regime Switching and Structural Breaks with an Infinite Hidden Markov Model," Journal of Applied Econometrics, forthcoming, doi: 10.1002/jae.2337.
Stock, J. H., and M. W. Watson (1996): "Evidence on Structural Instability in Macroeconomic Time Series Relations," Journal of Business & Economic Statistics, 14, 11–30.
Teh, Y., M. Jordan, M. Beal, and D. M. Blei (2006): "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, 101, 1566–1581.


Van Gael, J., Y. Saatci, Y. Teh, and Z. Ghahramani (2008): "Beam sampling for the infinite hidden Markov model," Proceedings of the 25th International Conference on Machine Learning.
Wong, C., and W. Li (2000): "On a Mixture Autoregressive Model," Journal of the Royal Statistical Society, Series B, 62, 95–115.

D   Notes

Section 0
1. Hamilton (1989) used a forward algorithm to integrate out the latent variables and, by doing so, was able to compute the likelihood at any parameter value. Chib (1996) embedded a backward step in the algorithm in order to ease the sampling of the latent variables. These methods rely on the assumption that the likelihood at time t only depends on the current state (i.e. no path dependence).

Section 1
2. A new regime becomes more and more unlikely as the number of observations increases.

Section 2
3. Note that if the model badly fits the future observations, the ESS can exhibit values smaller than N/2 for any horizon, which could create an infinite loop. To avoid such a case, we add at least one new observation at each iteration.

Section 3
4. Available on the author's website.
5. The autocorrelation time is computed by batch means (see Geyer (1992)) and is defined as 1 + 2 Σ_{i=1}^{∞} ρ_i, where ρ_i is the autocorrelation coefficient of order i between the posterior draws of a state variable.
6. This high acceptance rate is partly due to the beam sampler, which drastically reduces the possible paths in the forward-backward algorithm.

Section 4
7. The marginal likelihood of the spline-GARCH is maximized with k = 3.
8. For the spline-GARCH model, the one-step-ahead predictive marginal log-likelihood is computed by assuming that the spline function is constant after its last observed value (i.e. τ²_h = τ²_{h-1}).


1. f(s_{1:T} | Θ, P, H_Dir, π, u_{1:T}, y_{1:T})
2. f(u_{1:T} | Θ, P, H_Dir, π, s_{1:T}, y_{1:T})
3. f(P | Θ, H_Dir, π, u_{1:T}, s_{1:T}, y_{1:T})
4. f(Θ | µ, Σ, P, H_Dir, π, u_{1:T}, s_{1:T}, y_{1:T})
5. f(µ, Σ | Θ, P, H_Dir, π, u_{1:T}, s_{1:T}, y_{1:T})
6. f(H_Dir | Θ, P, π, u_{1:T}, s_{1:T}, y_{1:T})
7. f(π | Θ, P, H_Dir, u_{1:T}, s_{1:T}, y_{1:T})

Table 1: IHMM-GARCH MCMC sampler.

Prior distributions of the GARCH parameters
  For each regime i: θ̃_i ≡ {ω̃_i, α̃_i, β̃_i} ∼ N(µ, Σ)
  Hierarchical parameter µ: µ ∼ N(µ_0, Σ_0) with µ_0 = {log(0.5/4.5), log(0.15/0.85), log(0.7/0.8)} and Σ_0 = I_3
  Hierarchical parameter Σ: Σ^{-1} ∼ W(V, v) with V = (1/(5v)) I_3 and v = 50

Prior distributions of the Dirichlet processes
                  η                 λ + κ                 ρ = κ/(λ + κ)
  Strong prior    η ∼ G(10, 1/2)    λ + κ ∼ G(1000, 1)    ρ ∼ Beta(10000, 6)
  Loose prior     η ∼ G(1, 10)      λ + κ ∼ G(1, 10)      ρ ∼ Beta(100, 1)

Table 2: Prior distributions. The d-dimensional identity matrix is denoted by I_d; µ_0 and Σ_0 denote the hyper-parameters of the hierarchical prior on µ.

Name          Type   Regimes   Break points         ω                 α                 β                 Transition matrix P
DGP_HM        CP     3         {1000, 2000} obs.    {0.2; 0.6; 0.1}   {0.1}             {0.8}             —
DGP_BDR1      CP     3         {1000, 2000} obs.    {0.2; 0.7; 0.4}   {0.1; 0.2; 0.2}   {0.8; 0.7; 0.4}   —
DGP_BPR       MS     2         —                    {0.3; 2}          {0.35; 0.1}       {0.2; 0.6}        (0.98, 0.02; 0.04, 0.96)
DGP_BDR2      MS     2         —                    {0.6; 0.4}        {0.1; 0.2}        {0.8; 0.4}        (0.9999, 0.0001; 0.0005, 0.9995)
DGP_Emp-SP    MS     2         —                    {0.14; 0.01}      {0.12; 0.035}     {0.83; 0.95}      (0.9964, 0.0035; 0.0018, 0.9982)
DGP_noBreak   —      1         —                    {0.5}             {0.2}             {0.7}             —

Table 3: Data Generating Processes of the six simulated series. Transition matrices are given row by row, with rows separated by semicolons.

           DGP_HM   DGP_BDR1   DGP_BPR   DGP_BDR2   DGP_Emp-SP
HMP-MH     77%      74%        51%       86%        46%
Kl-MH      78%      79%        57%       90%        59%

Table 4: Acceptance rate of the state vector using the two different approximated models.

# of regimes     1      2      3      4      5      6      7      up to 7
Strong prior
DGP_HM           0      0      0.54   0.35   0.09   0.01   0.00   0.00
DGP_BDR1         0      0      0.67   0.30   0.03   0.00   0      0
DGP_BPR          0      0.96   0.04   0.00   0      0      0      0
DGP_BDR2         0      0.89   0.10   0.01   0      0      0      0
DGP_Emp-SP       0      0.86   0.10   0.03   0.00   0.00   0      0
DGP_noBreak      0.99   0.01   0.00   0      0      0      0      0
Loose prior
DGP_HM           0      0      0.37   0.21   0.20   0.11   0.07   0.03
DGP_BDR1         0      0      0.54   0.26   0.13   0.05   0.02   0.00
DGP_BPR          0      0.71   0.17   0.07   0.03   0.01   0.01   0.00
DGP_BDR2         0      0.70   0.22   0.07   0.01   0.00   0.00   0.00
DGP_Emp-SP       0      0.61   0.32   0.05   0.01   0.00   0.00   0.00
DGP_noBreak      0.97   0.03   0      0      0      0      0      0

Table 5: Posterior probabilities of the number of regimes for six simulated series generated from the DGPs displayed in Table 3, given two different sets of IHMM hyper-parameters. The correct number of regimes for each DGP is given in Table 3.

           DGP_HM               DGP_BDR1             DGP_BPR
           Min    Mean   Max    Min    Mean   Max    Min    Mean   Max
Strong     0.00   0.07   0.38   0      0.03   0.24   0.03   0.06   0.31
Loose      0.00   0.09   0.40   0      0.04   0.37   0.04   0.10   0.65

           DGP_BDR2             DGP_SP               DGP_noBreak
           Min    Mean   Max    Min    Mean   Max    Min    Mean   Max
Strong     0      0.01   0.29   0.05   0.13   0.62   0      0.00   0.15
Loose      0      0.02   0.30   0.04   0.17   0.55   0      0.00   0.09

Table 6: Minimum, mean and maximum Hamming distances of the simulated state vectors compared to the true one.

GARCH parameters: α ∼ U[0, 1], β | α ∼ U[0, 1 − α]
Weight parameters {ω_0, ..., ω_{k+1}}: ω_0 ∼ U[0, 15], ω_i ∼ N(0, 20) ∀ i ∈ [1, k+1]

Table 7: Prior distributions of the spline-GARCH parameters.

# of regimes                  1    2      3      4      5      6      7      8      9      10     up to 10
IHMM-GARCH - Strong prior     0    0.47   0.33   0.12   0.06   0.02   0.00   0.00   0      0      0
IHMM-GARCH - Loose prior      0    0      0      0      0.28   0.34   0.21   0.09   0.02   0.03   0.01

Table 8: Posterior probabilities of the number of regimes for the S&P 500 daily index.

# of regimes                  1    2      3      4      5      6      7      up to 7
IHMM-HMP - Strong prior       0    0.10   0.41   0.34   0.11   0.03   0.00   0
IHMM-HMP - Loose prior        0    0.23   0.29   0.17   0.13   0.07   0.05   0.03

Table 9: Posterior probabilities of the number of regimes for the S&P 500 daily index when estimating the IHMM-HMP model.

        GARCH     spline-GARCH   IHMM-HMP (Strong)   IHMM-HMP (Loose)   IHMM-GARCH (Strong)   IHMM-GARCH (Loose)
MLL     -4515.8   -4506          -4495.41            -4490.17           -4495.23              -4483.52

Table 10: Marginal log-likelihoods of various volatility models.

Log-score    IHMM-GARCH (loose)   IHMM-HMP (loose)   spline-GARCH   GARCH
-2178.61     0.53                 0.18               0.29           0
-2182.02     x                    0.64               0.33           0.02
-2178.86     0.65                 x                  0.35           0
-2180.21     0.53                 0.40               x              0.07
-2178.61     0.53                 0.18               0.29           x

Table 11: Maximum log-score function and the corresponding weights considering several model combinations for the S&P 500 daily returns. The letter x means that the model is not taking part in the combination.

Series   GARCH     spline-GARCH           IHMM-HMP (Strong)      IHMM-HMP (Loose)       IHMM-GARCH (Strong)    IHMM-GARCH (Loose)
         MLL       MLL        nb. knots   MLL        nb. reg.    MLL        nb. reg.    MLL        nb. reg.    MLL        nb. reg.
BAC      -6128.4   -6125.4    5           -6031.10   7           -6027.04   9           -6058.21   5           -6037.2    9
BA       -6174.9   -6177.37   3           -6163.20   3           -6147.46   3           -6155.29   4           -6140.1    4
GOLD     -4350.4   -4342.7    2           -4251.94   4           -4176.13   5           -4256.47   5           -4196.18   8

Table 12: MLLs of various volatility models and financial time series. The number of regimes corresponds to the most observed number of regimes during the MCMC sampler. The number of knots is selected by maximizing the MLL. Description of the time series: BAC: Bank of America Corporation; BA: Boeing Company; GOLD: Gold Bullion LBM USD/Troy ounce.

BAC
Log-score    IHMM-GARCH (loose)   IHMM-HMP (loose)   spline-GARCH   GARCH
-3138.62     0.79                 0                  0              0.21
-3158.53     x                    0.74               0              0.26
-3138.62     0.79                 x                  0              0.21
-3138.62     0.79                 0                  x              0.21
-3140.64     0.83                 0.07               0.10           x

BA
Log-score    IHMM-GARCH (loose)   IHMM-HMP (loose)   spline-GARCH   GARCH
-2974.27     0.57                 0.12               0.09           0.22
-2977.38     x                    0.48               0.20           0.32
-2974.39     0.66                 x                  0.10           0.24
-2974.48     0.62                 0.13               x              0.25
-2975.17     0.63                 0.20               0.16           x

GOLD
Log-score    IHMM-GARCH (loose)   IHMM-HMP (loose)   spline-GARCH   GARCH
-2367.96     0.24                 0.54               0.22           0.00
-2370.56     x                    0.67               0.28           0.05
-2394.47     0.67                 x                  0.19           0.14
-2369.31     0.26                 0.53               x              0.21
-2367.96     0.24                 0.54               0.22           x

Table 13: Maximum log-score function and the corresponding weights considering several model combinations and various time series. The letter x means that the model is not taking part in the combination.

Figure 1: Prior densities of the GARCH parameters. The density of ω is depicted on the left, the middle graphic displays the prior density of α and the last plot is devoted to the density of β.

Figure 2: For each DGP, the histograms display the elapsed time in minutes for sampling 30.000 MCMC draws as well as the autocorrelation time of the Hamming distance statistic over the 10.000 stored draws obtained from each algorithm (KL-MH, HMP-MH, BPR, PMCMC). In addition, a second graphic shows the normalized Hamming distance of the first 1000 saved draws of the state vector with the true state of the simulated series.

Figure 3: Simulation study with the strong prior: ML estimates given the true states (solid grey lines) compared to the posterior medians (dashed black lines) and the corresponding 90% density interval. Results for each simulated series are presented by row in the same order as in Table 3 (i.e. HM, BDR1, BPR, BDR2, Emp-SP and no break). The first column of graphics displays the unconditional variance (ω_t/(1 − α_t − β_t) | y_{1:T}) whereas the two others respectively show the parameters α_t | y_{1:T} and β_t | y_{1:T}.

Figure 4: Simulation study with the loose prior: ML estimates given the true states (solid grey lines) compared to the posterior medians (dashed black lines) and the corresponding 90% density interval. Results for each simulated series are presented by row in the same order as in Table 3 (i.e. HM, BDR1, BPR, BDR2, Emp-SP and no break). The first column of graphics displays the unconditional variance (ω_t/(1 − α_t − β_t) | y_{1:T}) whereas the two others respectively show the parameters α_t | y_{1:T} and β_t | y_{1:T}.

Figure 5: Marginal posterior distributions of the duration of the current regime over time for the BPR simulated series, given the strong prior (left) and the loose prior (right).

Figure 6: The left graphic shows the S&P 500 daily index with structural breaks estimated by the PMCMC algorithm of BDR in vertical lines. The right one displays the same time series with structural breaks estimated by the IHMM-GARCH model with the strong prior.

Figure 7: Posterior medians of IHMM-GARCH parameters (dashed lines) with 90% density interval (dashed grey lines) for the S&P 500 daily index with the strong prior, compared to the posterior medians of PMCMC parameters (grey lines) over time. The graphic at the top displays the unconditional variance (ω_t/(1 − α_t − β_t) | y_{1:T}) while the two others respectively show the parameters α_t | y_{1:T} and β_t | y_{1:T}.

Figure 8: Posterior modes of the long-term standard deviation of the spline-GARCH (i.e. τ, top graphic) and of the IHMM models (i.e. √(ω_t/(1 − α_t − β_t)), middle and bottom graphics). On the middle graphic, the IHMM-GARCH long-term component with the strong prior appears as a solid red line while the loose prior is drawn with a black dash-dotted line. The last graphic shows the same dynamics for the IHMM-HMP (strong prior in solid red and loose prior in dashed black).

Figure 9: Average number of allowed trajectories in (29) for several values of k (k = 0.01, 0.05, 0.1, 0.5, 0.9, 1) computed from 10.000 MCMC draws on the S&P 500 daily returns.
