IMA Journal of Mathematics Applied in Medicine & Biology (1998) 15, 19-40

Estimating parameters in stochastic compartmental models using Markov chain methods

GAVIN J. GIBSON
Biomathematics and Statistics Scotland, King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, UK

ERIC RENSHAW
Department of Statistics and Modelling Science, Livingstone Tower, University of Strathclyde, 26 Richmond Street, Glasgow G1 1XH, UK

[Received 7 March 1997 and in revised form 18 July 1997]

Markov chain Monte Carlo methodology is presented for estimating parameters in stochastic compartmental models from incomplete observations of the corresponding Markov process. The methods, which are based on the Metropolis-Hastings algorithm, are developed in the context of epidemic models. Their use is illustrated for the particular case where only susceptible, infective, and removed states are represented, using simulated realizations of the process. By comparing estimated likelihoods with theoretical forms, in cases where these can be derived, or with the known model parameters, we show that the methods can be used to provide meaningful estimates of parameters and parameter uncertainty. Potential applications of the techniques are also discussed.

Keywords: stochastic compartment models; parameter estimation; Markov chain Monte Carlo methods; hidden Markov models.

1. Introduction

The field of stochastic modelling of biological and ecological systems (Durrett & Levin, 1994) is currently undergoing considerable development as advances in computing power make investigation of complex stochastic models by simulation methods more feasible. Mathematicians have contributed to the understanding of stochastic models, and a range of papers can be found in the literature of probability theory and statistical physics characterizing the theoretical properties of a large variety of stochastic models. For example, Cox & Durrett (1988) have provided a rigorous mathematical treatment of the large-scale 'shapes' and velocities of spatio-temporal models for epidemics and forest fires. Other questions of interest include critical phenomena (for example, phase transitions between persistence and extinction), and several researchers have investigated this topic for the contact process (Liggett, 1985; Brower et al., 1978; Buttell et al., 1993). Because a full analytic treatment of a dynamical stochastic system is rarely feasible, attention has also focused on how models might be approximated by systems which are amenable to analysis. Pairwise approximation (see, for example, Sato et al., 1994; Levin & Durrett, 1996), a technique borrowed from statistical physics which simplifies spatio-temporal stochastic models by neglecting correlations beyond nearest neighbours, is one example. Another approach is the normal approximation (Whittle, 1957; Isham, 1991), whereby assumptions of normality are made in order to render tractable the analysis of nonlinear stochastic processes via the moment-generating function.


All these approaches share the feature of being aimed primarily at characterizing the dependence of model behaviour, or its statistical properties, on the model parameters. In this paper we are concerned with the inverse problem. That is, given an observation of a process which is believed to be governed by a known stochastic model, how can we obtain an indication of the range of model parameters which could plausibly explain the observation? This problem is, of course, one of statistical inference and, depending on our philosophy, the 'range' may correspond to a posterior density or a confidence interval. Although a vast literature on statistical inference exists, there are relatively few instances where continuous-time nonlinear stochastic models of the kind considered in this paper have been fitted to observations. This is partly due to the fact that, although likelihoods can be written down when observations are in some sense complete, observations of biological processes from practical experiments typically record only a subset of the information that defines the evolution of the system. Computing the likelihood of the observations necessitates 'integrating out' such uncertainty, and this may present great difficulty. Because of the difficulties inherent in working directly with stochastic dynamical systems, researchers may choose to fit descriptive models, such as time-series models (Box & Jenkins, 1970), that do not necessarily reflect their belief about underlying processes. Alternatively, they may opt to fit deterministic models to observations, for example by least-squares minimization. Whilst this latter approach can be extremely useful, there may be problems in assigning measures of uncertainty to estimated parameters. Deterministic models accurately reflect the behaviour of stochastic versions in many situations, but in other cases the agreement, both qualitative and quantitative, can be poor. Here we describe a methodology for fitting stochastic dynamical models directly to observations which allows parameter uncertainty to be treated within a statistical framework. Advances in stochastic integration methods, in particular Markov chain Monte Carlo (MCMC) methods (see, for example, Metropolis et al., 1953; Hastings, 1970; Smith & Roberts, 1993; Besag & Green, 1993), are certainly relevant to our needs. For example, Gibson (1997) has applied MCMC methods to fit spatio-temporal SI (susceptible-infective) models to observations of the spread of virus diseases in orchards where temporal sampling is infrequent. A feature of this situation is that the events that occur between observation times, these being infections of previously healthy individuals, are specified uniquely by the observations. An MCMC algorithm is then applied to integrate out the uncertainty regarding the order in which these events have occurred, to enable parameter likelihoods to be computed. In this paper we consider a more complex situation where the observations do not uniquely specify the events that occur. Specifically, we focus attention on systems whose behaviour is governed by nonlinear stochastic epidemic compartment models such as the SIR (susceptible-infected-removed) or the more general SEIR model, where E denotes a latent state.
These models and their many variants (see, for example, Murray, 1989; Renshaw, 1991) have seen widespread application in studies of epidemics in humans, plants, and animals. Furthermore, related compartment models, such as the Lotka-Volterra and other predator-prey formulations, have been applied in ecology. Thus the simple models considered here are often the building blocks for modelling studies of complex epidemiological or ecological systems (for example, Yakowitz et al., 1996) and are therefore an appropriate starting point for our


own investigations. In many systems to which compartment models are applied, not all the transitions between compartments, such as that from the susceptible to the latent state in an SEIR model, may be observable. However, the corresponding transition rate may be an important determinant of the system dynamics. We believe the methodology described in this paper represents progress on the problem of estimating transition rates between unobservable states in stochastic compartment models.

2. Parameter estimation in stochastic epidemic models

Consider a stochastic SEIR compartment model for an epidemic in a population whose state at time t is defined by the vector of non-negative integers s(t) = (S(t), E(t), I(t)) denoting the numbers of individuals in the susceptible, latent, and infective states. Without ambiguity these variables are also used to denote the sets of individuals in these classes at time t. The model is specified by a parameter vector a = (a_1, a_2, a_3, a_4). Assume that each susceptible gives birth at rate a_1 and enters the latent state at rate a_2 I(t). An individual in E becomes infective at rate a_3, while any infected individual dies, corresponding to removal, at rate a_4. Events of these four types are respectively denoted by B, L, I, and D. We adopt the convention that s(t) is left-continuous and therefore represents the state of the system prior to the occurrence of any event at t. The notation s(t+), and obvious extensions, will be used to denote the state of the system immediately after the occurrence of any events at time t. Formally, the model is specified by the equations

    \Pr(S(t + dt) = S(t) + 1) = a_1 S(t)\,dt \qquad (= \eta_B(a, s(t))\,dt),
    \Pr(E(t + dt) = E(t) + 1) = a_2 S(t) I(t)\,dt \qquad (= \eta_L(a, s(t))\,dt),
    \Pr(I(t + dt) = I(t) + 1) = a_3 E(t)\,dt \qquad (= \eta_I(a, s(t))\,dt),
    \Pr(I(t + dt) = I(t) - 1) = a_4 I(t)\,dt \qquad (= \eta_D(a, s(t))\,dt).        (2.1)
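As an aside, the dynamics defined by (2.1) can be simulated exactly by the standard direct (Gillespie-type) method: draw an exponential waiting time from the total event rate and then select the event type with probability proportional to its rate. The following is a minimal illustrative sketch, not the authors' implementation; the parameter values, initial state, random seed, and output format are assumptions made for this example.

```c
/* Illustrative Gillespie-type simulation of the SEIR model (2.1).
 * Parameter values, initial state and output format are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double unif(void) { return (rand() + 1.0) / (RAND_MAX + 2.0); }   /* U(0,1) */

int main(void)
{
    double a[5] = {0.0, 1.0, 0.05, 0.5, 1.0};   /* a[1]..a[4]: birth, infection, latency, removal; a[0] unused */
    int S = 15, E = 0, I = 15;                  /* assumed initial state s(0) */
    double t = 0.0, T = 3.0;

    srand(1);
    while (t < T) {
        /* event rates eta_B, eta_L, eta_I, eta_D of (2.1) at the current state */
        double rB = a[1] * S;
        double rL = a[2] * (double)S * I;
        double rI = a[3] * E;
        double rD = a[4] * I;
        double rTot = rB + rL + rI + rD;
        if (rTot <= 0.0) break;                 /* process has become extinct */

        t += -log(unif()) / rTot;               /* exponential waiting time to the next event */
        if (t >= T) break;

        double u = unif() * rTot;               /* choose the event type */
        if (u < rB)                { S++; }             /* B: birth of a susceptible */
        else if (u < rB + rL)      { S--; E++; }        /* L: susceptible becomes latent */
        else if (u < rB + rL + rI) { E--; I++; }        /* I: latent becomes infective */
        else                       { I--; }             /* D: removal (death) of an infective */

        printf("%8.4f  S=%d E=%d I=%d  N=%d\n", t, S, E, I, S + E + I);
    }
    return 0;
}
```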

Assume that a is unknown and that the population is observed over a period [0, T] during which only the occurrence and times of events of type B and D are recorded. Events of type L and I are referred to as hidden events. Thus the observations record the evolution of the quantity N(t) = S(t) + E(t) + I(t) over the period [0, T]. We consider the problems of estimating the unknown parameter a from knowledge of the initial state s_0 and N(t) over t ∈ [0, T], and of quantifying the uncertainty in the estimate. These problems are tackled using a likelihood approach. However, rather than dealing explicitly with the likelihood of the observations given the parameters, we consider only its relative values over the parameter space. This information is, nevertheless, sufficient for likelihood-based parameter estimation and, furthermore, it enables the problem to be recast in a form which can exploit the powerful computational methodology more commonly applied in Bayesian inference. This is done as follows. For simplicity we restrict attention to a bounded parameter space A, in which all rates are strictly positive and, initially, we assume that s_0 = s(0) is known. A realization s of the process over [0, T] is specified by the initial state s(0), the sequence of events of type B, L, I, and D occurring in [0, T], and their times of occurrence. Let k(s) denote the


number of events occurring, ε_j ∈ {B, L, I, D} = Π (1 ≤ j ≤ k(s)) denote the type of the jth event, and t_j ∈ (0, T) denote its time of occurrence. Let Ω be the set of all such s for which s(0) = s_0. For a selected uniformly from A, and s ∈ Ω simulated from the model with parameter a, the joint density of (a, s) ∈ A × Ω is defined, up to a constant of proportionality, by

    L(a, s)\, da\, ds,        (2.2)

where

    L(a, s) = \exp\Big[-(T - t_{k(s)}) \sum_{n \in \Pi} \eta_n(a, s(T))\Big] \prod_{j=1}^{k(s)} \eta_{\epsilon_j}(a, s(t_j)) \exp\Big[-(t_j - t_{j-1}) \sum_{n \in \Pi} \eta_n(a, s(t_j))\Big],        (2.3)

with t_0 = 0.
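For a complete realization s, in which all four event types and their times are known, the likelihood (2.3) factorizes over the inter-event intervals, and it is convenient to evaluate it on the log scale. The sketch below illustrates such an evaluation; the event encoding, structure and function names are assumptions made for this example rather than the authors' code.

```c
/* Illustrative evaluation of log L(a, s) in (2.3) for a complete realization.
 * Event encoding, types and names are assumptions of this sketch. */
#include <math.h>

enum { EV_B, EV_L, EV_I, EV_D };

typedef struct { double t; int type; } Event;   /* event time and type */

/* Rates eta_B, eta_L, eta_I, eta_D of (2.1) at state (S, E, I). */
static double rate(const double a[5], int type, int S, int E, int I)
{
    switch (type) {
    case EV_B: return a[1] * S;
    case EV_L: return a[2] * (double)S * I;
    case EV_I: return a[3] * E;
    default:   return a[4] * I;
    }
}

/* Returns log L(a, s) for events ev[0..k-1] ordered in time, starting from
 * state (S, E, I) at t = 0; returns -HUGE_VAL if some recorded event has zero
 * rate, i.e. the realization is infeasible under the model. */
double log_likelihood(const double a[5], const Event *ev, int k,
                      int S, int E, int I, double T)
{
    double logL = 0.0, tprev = 0.0;
    for (int j = 0; j < k; j++) {
        double total = rate(a, EV_B, S, E, I) + rate(a, EV_L, S, E, I)
                     + rate(a, EV_I, S, E, I) + rate(a, EV_D, S, E, I);
        double rj = rate(a, ev[j].type, S, E, I);
        if (rj <= 0.0) return -HUGE_VAL;
        logL += log(rj) - (ev[j].t - tprev) * total;   /* event term and survival term */
        tprev = ev[j].t;
        switch (ev[j].type) {                          /* update the state to s(t+) */
        case EV_B: S++; break;
        case EV_L: S--; E++; break;
        case EV_I: E--; I++; break;
        default:   I--; break;
        }
    }
    /* final survival term over (t_k, T] */
    double total = rate(a, EV_B, S, E, I) + rate(a, EV_L, S, E, I)
                 + rate(a, EV_I, S, E, I) + rate(a, EV_D, S, E, I);
    return logL - (T - tprev) * total;
}
```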

Now let R ⊂ Ω denote the set of realizations consistent with the observation N(t) (t ∈ [0, T]). Any realizations s¹, s² ∈ R are identical in terms of the times of events of type B and D, but they may differ in terms of the number and timings of events of type L or I. Our parameter-estimation approach is to consider a density π(a, s) on A × R which is proportional to L(a, s). Thus π(a, s) represents the posterior joint density of a and s conditional on the observations N(t) and s_0, assuming a uniform prior for a. For simplicity, we adopt the common practice of using the symbol π to denote the joint density π(a, s) and its associated marginal and conditional densities. Now, the marginal density

    \pi(a) = \int_{A \times R} \pi(a \mid s')\, \pi(a', s')\, ds'\, da'        (2.4)

can be interpreted as giving the relative values of a parameter likelihood G(a) over A, thereby enabling us to make inference on a. As we will show, the space A × R has a complicated high-dimensional structure, so that estimation of the integral (2.4) is problematic. Nevertheless, progress can be made using MCMC methods to exploit our knowledge of the density π(a, s) up to a constant of proportionality. Specifically, we can construct a Markov chain on A × R whose equilibrium distribution is π(a, s), and which is subsequently used to generate a sequence of samples (a_j, s_j) from π(a, s). This sequence can then be used to estimate the marginal density π(a), for example by averaging the conditional densities π(a | s_j) over j or by forming a histogram from the a_j.

3. Construction of reversible-jump MCMC sampler

In this section we describe the construction of the Markov chain. An upper bound for the number of hidden events in any s ∈ R for which L(a, s) ≠ 0 can be deduced from knowledge of N(t). (Since any individual can undergo at most two hidden transitions, the total number of hidden events cannot exceed 2 × {N(0) + number of births in [0, T]}.) Thus A × R can be decomposed as a finite union of subspaces A × R_{m,n}, where each R_{m,n} consists of those realizations on [0, T] incorporating m events of type L and n events of type I. We can therefore express R_{m,n} as


    R_{m,n} = \{(t^L_1, \ldots, t^L_m) \mid t^L_j \in [0, T],\ t^L_j < t^L_{j+1}\} \times \{(t^I_1, \ldots, t^I_n) \mid t^I_j \in [0, T],\ t^I_j < t^I_{j+1}\},        (3.1)

where the two components specify, respectively, the times of occurrence of events of type L and I. The Markov chain is therefore required to move between subspaces of differing dimension, a task at which reversible-jump methodology (Green, 1995) is specifically aimed. This approach can be applied, for example, to problems of Bayesian model determination where the subspaces are the parameter spaces of competing models. Recently Richardson & Green (1997) have proposed a reversible-jump framework for fitting mixture distributions with unknown numbers of components. The following presents a formulation of a reversible-jump sampler for the specific scenario of our paper. Our approach has some similarity with that used by Newton et al. (1992) to fit linear stochastic models in haematology, in that the Markov chain constructed involves moving, inserting, or deleting hidden events in the time-window of the observations. The construction of a suitable Markov chain is not unique; here we describe a Metropolis-Hastings sampler (Hastings, 1970) which generates a new state ω_{j+1} = (a_{j+1}, s_{j+1}) from the current state ω_j = (a_j, s_j) according to the following two-step process. First, we propose a candidate for the next state ω' = (a', s') and select a', the candidate parameter, uniformly from the parameter space A. Second, a new realization s' is proposed by modifying s_j according to one of the following possibilities: (a) with probability p_1 > 0 delete a randomly chosen hidden event from the realization s_j; (b) with probability p_2 > 0 insert a new hidden event at time t, where t is selected uniformly from [0, T] and the new event is of type I or L with equal probability; (c) with probability p_3 = 1 - p_1 - p_2 > 0 move a randomly chosen hidden event to a new time t chosen uniformly from [0, T]. These options define a density q(ω', ω_j) for the candidate state ω' conditional on the current state ω_j. The state ω' is then accepted with probability given by the ratio of measures

    p = \min\left\{1,\ \frac{\pi(d\omega')\, q(\omega', d\omega_j)}{\pi(d\omega_j)\, q(\omega_j, d\omega')}\right\},        (3.2)

in which case we set ω_{j+1} = ω'; otherwise the move is rejected and ω_{j+1} = ω_j. From (2.2) the acceptance probability can be calculated explicitly. With n denoting the number of hidden events in the current realization s_j, we obtain, for the three proposal types,

    (a)\quad p = \min\left\{1,\ \frac{n\, p_2\, L(a', s')}{2 T p_1\, L(a_j, s_j)}\right\},

    (b)\quad p = \min\left\{1,\ \frac{2 T p_1\, L(a', s')}{(n + 1)\, p_2\, L(a_j, s_j)}\right\},        (3.3)

    (c)\quad p = \min\left\{1,\ \frac{L(a', s')}{L(a_j, s_j)}\right\}.

Note that in none of the above cases can the denominator vanish, since L(a_j, s_j) must be nonzero in order for (a_j, s_j) to be the current state of the Markov chain.
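In an implementation, the three cases of (3.3) reduce to a common accept/reject step once the likelihoods of the current and candidate realizations are available; working on the log scale avoids underflow. The sketch below computes the acceptance probability using the ratios as reconstructed in (3.3); the function and argument names are assumptions of this sketch, not the authors' code.

```c
/* Illustrative computation of the acceptance probabilities in (3.3).
 * logL_prop = log L(a', s'), logL_cur = log L(a_j, s_j); n is the number of
 * hidden events in the current realization s_j (n >= 1 when a deletion is
 * proposed). Names and interface are assumptions of this sketch. */
#include <math.h>

enum { MOVE_DELETE, MOVE_INSERT, MOVE_TRANSLATE };

double acceptance_prob(int move, double logL_prop, double logL_cur,
                       int n, double p1, double p2, double T)
{
    double logratio = logL_prop - logL_cur;   /* likelihood ratio L(a',s')/L(a_j,s_j) */

    switch (move) {
    case MOVE_DELETE:        /* reverse move: re-insert the deleted event (time and type) */
        logratio += log(p2) + log((double)n) - log(2.0 * T * p1);
        break;
    case MOVE_INSERT:        /* reverse move: delete the newly inserted event */
        logratio += log(2.0 * T * p1) - log(p2) - log((double)(n + 1));
        break;
    default:                 /* translation of an event is a symmetric proposal */
        break;
    }
    double p = exp(logratio);
    return p < 1.0 ? p : 1.0;
}

/* Usage: accept the candidate state if u < acceptance_prob(...), with u ~ U(0,1). */
```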


If α(ω', ω) denotes the density for the next state ω' conditional on the current state ω, then, by construction, the above chain has the property of detailed balance (see, for example, Smith & Roberts, 1993) with respect to π, namely α(ω', ω)π(ω) = α(ω, ω')π(ω'). The transition probabilities P(X | ω) = Pr(ω_{j+1} ∈ X | ω_j = ω) therefore satisfy

    \pi(X) = \int_{A \times R} P(X \mid \omega)\, \pi(\omega)\, d\omega        (3.4)

for all measurable X ⊆ A × R. From standard theory (see, for example, Theorem 1 of Smith & Roberts, 1993) it follows that if, in addition to satisfying (3.4), the chain is aperiodic and π-irreducible (see below), then

    \lim_{\nu \to \infty} \int_{A \times R} \left| p^{(\nu)}(\omega_1, \omega) - \pi(\omega) \right| d\omega = 0,        (3.5)

where p^{(ν)}(ω_1, ω) is the probability density of the state ω reached after ν iterations of the chain from initial state ω_1. Moreover, for any realization of the chain {ω_i},

    \frac{1}{K} \sum_{i=1}^{K} f(\omega_i) \to \int_{A \times R} f(\omega)\, \pi(\omega)\, d\omega \quad \text{almost surely as } K \to \infty,        (3.6)

for all real-valued π-integrable functions f. A consequence of (3.5) is that, irrespective of the starting state ω_1, the state ω reached after a sufficiently large number of iterations of the chain can be considered to be drawn from π. Setting f(ω_i) = f(a_i, s_i) = π(a | s_i) in (3.6), it follows that, for all a ∈ A,

    \frac{1}{K} \sum_{i=1}^{K} \pi(a \mid s_i) \to \int_{A \times R} \pi(a \mid s')\, \pi(\omega')\, d\omega' = \pi(a) \quad \text{almost surely as } K \to \infty,        (3.7)

where ω' = (a', s'). Hence, conditional on aperiodicity and irreducibility, given a sequence of samples {(a_i, s_i)} generated from the chain, we may estimate the marginal density π(a) by averaging the conditional densities π(a | s_i) in the manner of Gelfand & Smith (1990). For the chain to be aperiodic, it must not cycle between the subsets in some nontrivial partition of A × R. For our chain, aperiodicity is an immediate consequence of the fact that q(ω, ω) is nonzero for all ω ∈ A × R. The chain is π-irreducible if, for all ω_1, ω_2 with π(ω_1), π(ω_2) > 0, there exists a positive integer ν such that p^{(ν)}(ω_1, ω_2) > 0. Intuitively, an irreducible chain connects all states which have nonzero weight under the density π, so that regardless of the initial state the entire space will be explored by the chain. In practical applications of MCMC, demonstrating the irreducibility of a chain can be a nontrivial problem, and this property should certainly not be taken for granted. On the grounds that moving a hidden event from t to t' can be achieved by a deletion at t followed by an insertion of an identical event at t', one might reasonably propose a simpler chain involving operations of types (a) and (b) only. However, the simpler chain fails to be irreducible in general. To see this, suppose that s(0) = (1, 0, 1) and that our observations consist of deaths at t_1 and t_2 (t_1 < t_2). Clearly, this implies that an event of type L has occurred at some time t_L (0 < t_L < t_1), and an event of type I has occurred at some time t_I (t_L < t_I < t_2). Although the nature and order of the hidden events is fixed, any combination (t_L, t_I) satisfying the above inequalities is possible. However, the Markov chain with only


deletions and insertions is incapable of exploring the space of possibilities, since either operation results in an impossible realization. In this case the ability to alter the time of hidden events through operations of type (c) is necessary to explore the space of feasible realizations. The irreducibility of the chain described above is a consequence of the following result, which is proved in the Appendix. First let R+ denote the set of all s ∈ R for which L(a, s) ≠ 0 for some (and therefore all) a ∈ A.

PROPOSITION 3.1 Let s¹, s² ∈ R+ be realizations. Then for some q ≥ 2 there exist realizations y¹, y², ..., y^q ∈ R+, where s¹ = y¹ and s² = y^q, and, for all 2 ≤ j ≤ q, y^j is obtained from y^{j-1} by one of the following procedures: (i) the insertion of a hidden event at some time t ∈ [0, T]; (ii) the deletion of a hidden event; (iii) an alteration to the time of occurrence of a hidden event in [0, T].

Showing that the Markov chain has the required equilibrium distribution is not the only issue to be considered. Although Proposition 3.1 shows theoretically that any feasible state can be reached with a nonzero probability from any initial state after sufficiently many iterations of the Markov chain, it does not preclude this probability being extremely small for a given pair of states, even after many iterations. For a chain to be of practical use in exploring the equilibrium distribution we also require that it 'mixes' efficiently, in the sense that the distribution p^{(ν)}(ω_1, ω) converges rapidly to the stationary distribution as ν → ∞. A chain which mixes slowly can pose problems in several ways. For example, the resultant correlation between successive states means that standard errors in expectations estimated from a sequence of samples may be underestimated. This problem can be partly overcome by batching (Ripley, 1987), which involves partitioning the sequence into blocks of a fixed size and estimating expectations on each block. As the block size increases, the correlation in the sequence of within-block estimates decreases, so that realistic standard errors can be estimated from it. In some situations a slowly mixing chain can fail to explore the target distribution adequately, although a sequence of outputs may give the misleading impression that convergence has occurred. It is straightforward to construct examples involving mixtures of bivariate normal distributions which illustrate this point. Theoretically, the convergence properties of finite-state Markov chains can be understood in terms of the eigenvalues of the corresponding state-transition matrix, and considerable progress has been made on bounding the mixing properties of some chains arising in combinatorial problems (Sinclair, 1993). In practice, however, the convergence of chains is most often investigated using the statistical properties of sequences of samples generated from the chain. A wide range of techniques now exists. However, recent studies (Cowles & Carlin, 1996) have demonstrated that tests of convergence based on a simulated sequence of states may give misleading results, and this should be borne in mind whenever the MCMC methodology is used. In this paper we investigate the validity of the MCMC approach in two ways. The first is to apply the algorithms in situations where the target density can be derived analytically, and to observe whether density estimates are in accord with the theoretical predictions (to within estimated uncertainty).
The second method is to use the techniques to estimate model parameters from data sets simulated from the model with known parameter values.
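The batching procedure mentioned above admits a very simple implementation: the output sequence is divided into blocks, the block means are treated as approximately independent, and their spread gives a standard error for the overall mean. A minimal sketch, with assumed function and variable names, is as follows.

```c
/* Illustrative batch-means estimate of the Monte Carlo standard error of a
 * sample mean (cf. Ripley, 1987). Names are assumptions of this sketch. */
#include <math.h>

/* x: sequence of K correlated MCMC outputs; nblocks (>= 2) should divide K. */
double batch_means_stderr(const double *x, int K, int nblocks)
{
    int blen = K / nblocks;
    double grand = 0.0;
    for (int i = 0; i < nblocks * blen; i++) grand += x[i];
    grand /= (double)(nblocks * blen);

    double ssq = 0.0;
    for (int b = 0; b < nblocks; b++) {
        double bm = 0.0;
        for (int i = 0; i < blen; i++) bm += x[b * blen + i];
        bm /= (double)blen;
        ssq += (bm - grand) * (bm - grand);      /* spread of the within-block means */
    }
    double var_block_means = ssq / (double)(nblocks - 1);
    return sqrt(var_block_means / (double)nblocks);   /* s.e. of the overall mean */
}
```

The block length is increased until successive block means show no significant correlation, as is done for the examples reported in Section 4.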


A topic which has not yet been discussed is how a feasible realization can be obtained to initiate the MCMC algorithm for the SEIR model from knowledge of s_0 and the times of all events of type B and D in [0, T]. Suppose that m deaths occur in [0, T] at times t_1 ≤ t_2 ≤ ... ≤ t_m. Here we produce a feasible realization by inserting hidden events so that for each j (1 ≤ j ≤ m − 1) we have I(t_j) ≥ 2 (bearing in mind that the variables are left-continuous). If I(0) ≥ m, then no hidden events are required for a feasible realization. If I(0) < m, then no hidden events are required in the interval [0, t_{I(0)−1}], but for I(0) ≤ j ≤ m − 1 we must insert hidden events immediately prior to t_j, and after any events of type B or D occurring prior to t_j, to ensure that I(t_j) = 2. If E(t_j) ≥ 1 then only an event of type I is required. On the other hand, if E(t_j) = 0 then an event of type L followed immediately by one of type I must be inserted prior to t_j. It can be shown that this procedure will yield a feasible realization with which to initiate the MCMC algorithm, if one exists. Furthermore, the realization produced has the minimum number of hidden events of any feasible realization.

4. Illustration of reversible-jump sampler for SIR model

The methods described in Section 3 have been implemented, with some modifications, in the C programming language. Here we illustrate their use by applying them to simulated data sets. Attention is restricted to an SIR model, since for this case analytic solutions to the likelihood calculations exist for some scenarios and they can be used to validate the results obtained using the MCMC methodology. We further simplify matters by assuming that the parameters specifying the birth rate, a_1, and the death rate, a_4, are both equal to unity. The problem therefore reduces to estimation of the infection parameter (henceforth denoted as a) from knowledge of s_0 = (S(0), I(0)) and observation of N(t) over the interval [0, T]. A Metropolis-Hastings reversible-jump sampler for this problem could be constructed in the way described in Section 3, with the important distinction that only one type of hidden event, signifying the transition from S to I and denoted by I, is involved, and the state space A × R is accordingly simpler. The irreducibility of the chain can be verified by repeating the logic of the proof of Proposition 3.1 with appropriate simplifications. A feasible realization with which to initiate the MCMC sampler can be constructed by a straightforward adaptation of the procedure described at the end of Section 3. The chain implemented here is a slight variant of the Metropolis-Hastings sampler described in Section 3. Specifically, the parameter a and the realization s are updated separately, and not within a single step of the Metropolis-Hastings algorithm. The modified algorithm is now described. For simplicity, we constrain a to take only a finite range of regularly spaced values in the parameter space A. To construct our Markov chain, the new state ω_{i+1} = (a_{i+1}, s_{i+1}) is generated from the current state ω_i = (a_i, s_i) in the following two steps (see Fig. 1).
(i) The parameter a_{i+1} is sampled from the conditional density

    \pi(a \mid s_i) = \frac{L(a, s_i)}{\sum_{a' \in A} L(a', s_i)}, \qquad a \in A.        (4.1)

(ii) The realization s_{i+1} is then obtained by applying the realization-updating procedure M times to s_i, to generate a sequence of states σ_{i,0}, σ_{i,1}, ..., σ_{i,M}, where s_i = σ_{i,0} and

ESTIMATING PARAMETERS IN STOCHASTIC COMPARTMENTAL MODELS

27

FIG. 1. Schematic representation of a single iteration of the Markov chain on A × R described in Section 4. The model parameter and the realization are updated separately in steps (i) and (ii), respectively.

s_{i+1} = σ_{i,M}. Each σ_{i,j} is obtained from σ_{i,(j−1)} by inserting, deleting, or translating an event of type I in σ_{i,(j−1)}, to produce a candidate state σ'. Mutations of each type are proposed with equal probability (p_1 = p_2 = p_3 = 1/3). Depending on the kind of mutation proposed, the state σ' is accepted or rejected with the appropriate acceptance probability from (3.3). The marginal density π(a) is then estimated by

    \hat{\pi}(a) = \frac{1}{K} \sum_{i=H+1}^{K+H} \pi(a \mid s_i),        (4.2)

and the algorithm is therefore specified by the values of H, K, and M.
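Step (i) amounts to normalizing the likelihood over the finite grid of parameter values and drawing from the resulting discrete distribution, while (4.2) is a running average of these conditional densities. The sketch below shows one way this might be coded; it works on the log scale to avoid underflow, and the interface and names are assumptions made for this example rather than the authors' implementation.

```c
/* Illustrative version of step (i): sampling a_{i+1} from the conditional
 * density (4.1) over a finite grid of parameter values. Names and interface
 * are assumptions of this sketch; the caller seeds rand(). */
#include <stdlib.h>
#include <math.h>

static double unif(void) { return (rand() + 1.0) / (RAND_MAX + 2.0); }   /* U(0,1) */

/* loglik[q] = log L(a_q, s_i) for the nA grid points a_q. On return cond[q]
 * holds pi(a_q | s_i); the index of the sampled grid point is returned. */
int sample_conditional(const double *loglik, double *cond, int nA)
{
    double maxl = loglik[0], sum = 0.0, u, c = 0.0;
    for (int q = 1; q < nA; q++) if (loglik[q] > maxl) maxl = loglik[q];
    for (int q = 0; q < nA; q++) {            /* normalize, guarding against underflow */
        cond[q] = exp(loglik[q] - maxl);
        sum += cond[q];
    }
    for (int q = 0; q < nA; q++) cond[q] /= sum;   /* equation (4.1) */

    u = unif();                               /* inverse-CDF draw from the grid */
    for (int q = 0; q < nA; q++) {
        c += cond[q];
        if (u <= c) return q;
    }
    return nA - 1;
}

/* After H burn-in iterations, accumulating pi_hat[q] += cond[q] / K over K
 * further iterations implements the estimator (4.2). */
```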


The results presented in Example 4.1 below provide evidence that this approach leads to meaningful estimates of π(a) and other aspects of the joint posterior distribution π(a, s).

EXAMPLE 4.1 Suppose that the observations consist of S(0), I(0), and N(t), where N(t) = K for t ∈ [0, T]; that is, no births or deaths occur during the period [0, T]. Since a_1 = a_4 = 1, Pr{B or D occurs in [t, t + dt)} = N(t) dt. So the probability

    \Pr\{N(t) = K,\ t \in [0, T]\} = \Pr\{\text{no event of type B or D occurs in } [0, T]\} = \exp(-KT)        (4.3)

is independent of the parameter a. It follows that the marginal posterior density π(a) = 1/n_A, where n_A is the number of points in the parameter space. Hence, for this case, π(a) is known and can be used to test the MCMC algorithm. Other facets of π(a, s) can be derived analytically for this case. The Markov chain can be used to explore the distribution of any real-valued function of (a, s), and in particular we consider N_I(s), the number of events of type I which occur for the realization s. We show that the distribution π(N_I(s)) can be derived analytically, and used to assess further the validity of the algorithm. First, fix a and consider π(s | a). Then if t_1, ..., t_k ∈ (0, T) denote the times of the events of type I specified by s, we have

    \pi(ds \mid a) = \frac{\exp[-(K + \alpha_{k+1})(T - t_k)] \prod_{j=1}^{k} \alpha_j \exp[-(K + \alpha_j)(t_j - t_{j-1})]\, dt_j}{\Pr\{N(t) = K,\ t \in [0, T]\}}
                   = \exp[-\alpha_{k+1}(T - t_k)] \prod_{j=1}^{k} \alpha_j \exp[-\alpha_j(t_j - t_{j-1})]\, dt_j,        (4.4)

where t_0 = 0 and α_j = a(I(0) + j − 1)(S(0) − j + 1). It follows that π(N_I(s) | a) is the distribution of the number of events occurring in the interval [0, T], where the events occur at times T_1, T_2, ... and T_j − T_{j−1} is exponentially distributed with mean λ_j = 1/α_j. Now p_n(t), the density of T_n, is a generalized Erlang distribution (Johnson & Kotz, 1970). When the λ_j are all distinct this is given by

    p_n(t) = \sum_{j=1}^{n} \Big[ \prod_{k=1,\, k \neq j}^{n} \frac{\lambda_j}{\lambda_j - \lambda_k} \Big] \lambda_j^{-1} \exp(-t/\lambda_j).        (4.5)

It follows that

    \Pr\{N_I(s) = n \mid a\} = \int_0^T p_n(t) \exp[-(T - t)/\lambda_{n+1}]\, dt
    = \sum_{j=1}^{n} \Big[ \prod_{k=1,\, k \neq j}^{n} (\lambda_j - \lambda_k)^{-1} \Big] \lambda_j^{\,n-1} \lambda_{n+1} (\lambda_j - \lambda_{n+1})^{-1} \left[ \exp(-T/\lambda_j) - \exp(-T/\lambda_{n+1}) \right].        (4.6)

For the case of the SIR model the condition λ_j ≠ λ_k will clearly not hold if I(0) < [N/2], since for some j there will exist k ≠ j satisfying λ_j = λ_k, for which the above expression is not defined. However, these singularities can be removed using limiting arguments. In


the case of λ_j = λ_k ≠ λ_{n+1}, we can replace the two terms in (4.6) which contain the factor (λ_j − λ_k)^{-1} with the single term g'(λ_j), where

    g(\lambda) = \Big[ \prod_{l \le n,\, l \neq j, k} (\lambda - \lambda_l)^{-1} \Big] \lambda^{\,n-1} \lambda_{n+1} (\lambda - \lambda_{n+1})^{-1} \left[ \exp(-T/\lambda) - \exp(-T/\lambda_{n+1}) \right].        (4.7)

In the case of λ_j = λ_{n+1}, the term involving (λ_j − λ_{n+1})^{-1} is replaced by

    \Big[ \prod_{k \neq j} (\lambda_j - \lambda_k)^{-1} \Big] \lambda_j^{\,n-2}\, T \exp(-T/\lambda_j).        (4.8)

To obtain an analytic expression for Pr{N_I(s) = n}, we integrate Pr{N_I(s) = n | a} with respect to the discrete density π(a) to obtain

    \Pr\{N_I(s) = n\} = \sum_{q=1}^{n_A} \pi(a_q) \Pr\{N_I(s) = n \mid a_q\}.        (4.9)

The computation of Pr{N_I(s) = n} involves sums of products and, for large values of n, each of these may be extremely large in modulus. The arithmetic necessary to yield a probability less than unity may involve prohibitively many significant figures. Although analytic computations are used where possible, we evaluate Pr{N_I(s) = n | a} in other cases by directly simulating the variates T_n as defined above, from which we can estimate Pr{N_I(s) = n | a} as Pr{T_n ≤ T, T_{n+1} > T}. This simulation is straightforward to perform and allows Pr{N_I(s) = n} to be estimated for larger values of n.
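The simulation estimate described above requires only the repeated generation of the variates T_n as sums of independent exponential intervals with means λ_j = 1/α_j. A minimal sketch follows; the parameter values, the replicate count, and the names are assumptions made for illustration and not the authors' code.

```c
/* Illustrative simulation estimate of Pr{N_I(s) = n | a} = Pr{T_n <= T < T_{n+1}},
 * obtained by summing independent exponential gaps with rates alpha_j.
 * Parameter values, replicate count and names are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double unif(void) { return (rand() + 1.0) / (RAND_MAX + 2.0); }   /* U(0,1) */

int main(void)
{
    double a = 0.01, T = 3.0;
    int S0 = 15, I0 = 15, nmax = S0;          /* at most S(0) hidden events of type I */
    long reps = 100000, *count = calloc((size_t)nmax + 1, sizeof(long));

    srand(1);
    for (long r = 0; r < reps; r++) {
        double t = 0.0;
        int n = 0;
        while (n < nmax) {
            double alpha = a * (I0 + n) * (S0 - n);   /* rate alpha_{n+1} of the next event */
            t += -log(unif()) / alpha;                /* exponential gap T_{n+1} - T_n */
            if (t > T) break;
            n++;
        }
        count[n]++;                                    /* N_I(s) = n on this replicate */
    }
    for (int n = 0; n <= nmax; n++)
        printf("Pr{N_I = %2d | a} ~ %.4f\n", n, (double)count[n] / (double)reps);

    free(count);
    return 0;
}
```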

We now present results of estimates of π(a) and π(N_I(s)) obtained using the MCMC algorithm for the following three cases:

    I(0) = 15,  N(t) = 30  (t ∈ [0, 3]);
    I(0) = 1,   N(t) = 30  (t ∈ [0, 3]);
    I(0) = 5,   N(t) = 30  (t ∈ [0, 3]).

In all cases the discrete parameter space is A = {0.001, 0.002, ..., 0.05}, and 5020 samples (a_j, s_j) are generated from the chain, applying the realization-updating procedure M = 500 times to obtain each s_j from s_{j−1}. The marginal density π(a) is estimated by averaging K = 5000 densities π(a | s_j) after discarding the first H = 20 densities to allow burn-in of the Markov chain. Marginal confidence intervals for the estimates of the π(a_j) are obtained by batching (Ripley, 1987). For these examples, a sequence of 500 blocks of 10 densities produced a sequence of within-block estimates which were not significantly correlated. The estimates of π(N_I(s)) are obtained by forming a histogram of the values of N_I(s), where the value of N_I(s) is recorded at each iteration of the realization-update steps (a)-(c), giving a total of 2.5 × 10^6 values with which to form the histogram. Standard errors in the estimates of π(N_I(s)) are obtained by partitioning the 2.5 × 10^6 values into blocks of 5000 and forming a separate estimate of π(N_I(s)) for each block. The standard error of π(N_I(s)) is then estimated from this sequence. The values of H, K, and M used in this experiment, and those which follow, were chosen after some experimentation, and we do not claim that they optimize the algorithm's performance. The value of M was selected to be large in relation to the maximal number of hidden events to ensure substantial mutation of the realization s_i before calculation of the

FIG. 2. Estimated and known values of (a, c, e) π(a) (a = 0.001, 0.002, ..., 0.05) and (b, d, f) π(N_I(s)) (N_I(s) = 1, 2, ..., 29) when N(t) = 30 (t ∈ [0, 3]), with initial conditions given by: (a, b) I(0) = 15; (c, d) I(0) = 1; and (e, f) I(0) = 5. (---) The exact value of π(a). The error bars on the MCMC estimates indicate ±1 standard error.

next density π(a | s_{i+1}). The burn-in period H was selected by monitoring the sequence {(a_i, N_I(s_i))}, for a range of initial conditions, to determine how quickly the distribution of these variables converged to its asymptotic form. Our investigations suggested that this convergence occurred within a very few iterations. Figure 2 shows the estimated densities π(a) and π(N_I(s)) for the three cases described above. Figure 2 also shows the known values of these densities obtained from analytic calculations or, in the case of π(N_I(s)) for larger values of N_I(s), obtained by simulation.


FIG. 3. The evolution of N(t) (t ∈ [0, 3]) for simulated realizations of the SIR model with I(0) = S(0) = 15: (a) a = 0.05, and (b) a = 0.1.

We can see that the results obtained by MCMC are in accord with the known densities to within the estimated standard errors for the estimates of both π(a) and π(N_I(s)). Although the accurate estimation of marginal properties of π(a, s) does not in itself guarantee that the joint density π(a, s) is fully explored by the Markov chain, the results do provide strong evidence that the algorithm is capable of providing meaningful estimates of parameter likelihoods.

EXAMPLE 4.2 Figure 3(a) shows the function N(t) (t ∈ [0, 3]) for three simulated realizations of the model with the birth and death rates fixed at unity and a = 0.05. In all cases S(0) = I(0) = 15. Note that the trajectories are quite distinct from each other. Figure 4(a) shows the posterior density π_i(a) (i = 1, 2, 3) conditional on each simulated set of observations, as estimated using the MCMC algorithm for the simulated trajectories. The parameter range is {0.001, 0.002, ..., 0.1}, and the other algorithm parameters are given by M = 1000, K = 5000, and H = 100. The standard errors in the estimates, obtained by batching using blocks of 25 densities, were of the order of 1% of the estimated value of

FIG. 4. Estimates of π(a) obtained by MCMC for simulated realizations: (a) simulations of Fig. 3(a), a = 0.020, 0.021, ..., 0.08; and (b) simulations of Fig. 3(b), a = 0.040, 0.042, ..., 0.2.

π_i(a), and they are not displayed in Fig. 4. Figure 4 also shows an estimate of the posterior density conditional on all the observations jointly, which was obtained by normalizing the product π_1(a)π_2(a)π_3(a). The estimated densities are clearly consistent with the parameters used in the simulation. Moreover, the densities are consistent with each other, in the sense that there is no significant evidence that the simulations were not obtained using the same value of a for each. Therefore, the differences between the simulations for a given parameter value are correctly interpreted as being consistent with the inherent stochasticity in the model. This experiment is repeated for a = 0.1 and the parameter range {0.002, 0.004, ..., 0.2}, keeping the same values of M, K, and H. The simulated trajectories and the corresponding densities are plotted in Figs 3(b) and 4(b), which show similar findings to the case where a = 0.05.


5. The case of unknown initial conditions

Thus far we have assumed that the initial state s_0 = (S(0), I(0)) is known. Suppose now that this information is lacking but that we can assume I(0) > 0. Such an assumption could be made, for example, whenever the observations include at least one death. We will now show that the MCMC routines described in the previous sections can be modified for application to this scenario. The approach exploits the idea that uncertainty in the initial conditions can be interpreted as uncertainty in the hidden events that have occurred prior to the start of the observation interval. This avoids the need to consider the initial conditions explicitly as additional parameters in the Markov chain state space. We 'expand' the observations by supposing that the observations commence at some time T_c < 0 and that S(T_c) = N(0) − 1 and I(T_c) = 1. Suppose further that N(t) = N(0) for all t ∈ [T_c, 0]. It is clear that any initial condition s_0 = (S(0), I(0)) consistent with the observations N(t) (t ∈ [0, T]) can be reached from the state s_c = (S(T_c), I(T_c)) = (N(0) − 1, 1) through the occurrence of the appropriate hidden events during the interval (T_c, 0). Let R_c denote the set of realizations of the process on (T_c, T) which are consistent with s_c and N(t), for t ∈ (T_c, T). Then there is a surjective mapping f : R_c → W, where W denotes the set of all realizations s defined on [0, T] which are consistent with the observations N(t) (t ∈ [0, T]). Applying Proposition 3.1 to the interval [T_c, T], it follows that any element in R_c can be 'reached' from any other by a finite number of transitions of type (i)-(iii) above, and these operations induce transitions between elements of W. Since f is surjective, it follows that any element of W can be reached from any other by a sequence of induced transitions. Thus, when the initial conditions are unknown, the above 'expansion' process allows the space of all possible combinations of initial conditions and realizations to be explored using transitions of the form (a) to (c). We note that this procedure can equally well be applied to the SEIR model discussed in Section 2. Here the universal antecedent for arbitrary initial conditions is given by S(T_c) = N(0) − 1 and E(T_c) = 1. This argument allows us to construct a Markov chain on A × R_c, whose stationary distribution is the posterior density π_c(a, s_c | N(t), t ∈ [0, T]), this time assuming a uniform joint prior for (a, s(0)). This is achieved almost exactly as above, except that an amended likelihood L_c must be used in the acceptance steps, defined by



    L_c(s_c, a) = \frac{q!}{|T_c|^{\,q}}\, L(a, s),        (5.1)

where s denotes the restriction of s_c to [0, T] and q denotes the number of hidden events of s_c occurring in (T_c, 0). The likelihood has this form because the q-dimensional volume of the set {(t_1, ..., t_q) ∈ [T_c, 0]^q | t_1 ≤ t_2 ≤ ... ≤ t_q} is equal to |T_c|^q / q!, making L_c(s_c, a) consistent with a uniform prior for I(0) over the range of values consistent with the observations. To illustrate the above extension, consider simulation 1 with a = 0.05, assuming no knowledge of the initial condition I(0) = 15. The MCMC algorithm is used to generate estimates of the marginal parameter density π_c(a) over a = 0.001, 0.002, ..., 0.1 and the marginal initial-condition density π_c(I(0)). The first of these is estimated by averaging 5000 conditional densities π_c(a | s_c), with M = 1000 mutations being proposed between the calculation of each density. The density π_c(I(0)) is estimated from a histogram formed from the set of all 5 × 10^6 realizations σ_{ij} (1 ≤ i ≤ 5000, 1 ≤ j ≤ 1000) generated in the simulation. Note that the joint density π_c(a, I(0)) can be estimated as a histogram over a


FIG. 5. (a) Estimate of π_c(a) (a = 0.020, 0.021, ..., 0.08) for simulation 1 in Fig. 3(a) obtained by MCMC. The estimate of the corresponding π(a) (see Fig. 4(a)), for which knowledge of I(0) is required, is shown for comparison. (b) Histogram estimate of π_c(I(0)) (I(0) = 1, 2, ..., 29) for simulation 1 in Fig. 3(a).

two-dimensional grid of cells indexed by I(0) and the current value of a. Figure 5(a) shows the marginal density π_c(a), which may be compared with the density π(a) estimated in Fig. 4(a). The histogram estimate of π_c(I(0)) is shown in Fig. 5(b). It is clear for this case that estimates of a or I(0) based on the mode or mean of these marginals are consistent with the values used for the simulation (a = 0.05, I(0) = 15). Note also that the density π_c(a) lies close to the density π(a), indicating that, in this case, knowledge of the initial condition has little effect on our estimate of a. However, this is not true in general. Figure 6(a) shows the evolution of N(t) over t ∈ [0, 3] for a simulated realization of the SIR model with I(0) = 8 and a = 0.05. In Figure 6(b) we plot π(a), estimated by MCMC assuming knowledge of I(0). Once again K = 5000, M = 1000, and H = 100. The marginal density π_c(a) is also shown, as estimated by MCMC for the case where I(0) is unknown, using the same algorithm parameters.


FIG. 6. (a) Evolution of N(t) (t ∈ [0, 3]) for a simulated realization of the SIR model with I(0) = 8, S(0) = 22, and a = 0.05. (b) MCMC estimates of π(a), π_c(a), and the histogram estimate of π_c(a | I(0) = 8) (a = 0.030, 0.031, ..., 0.1).

A clear difference between the modal values of a for the two distributions is observed. As a check on the validity of the extended MCMC methods, we note that the density π(a) = π_c(a | I(0) = 8), and estimates of this obtained by the two methods should therefore be consistent. By first forming a histogram estimate of π_c(a, I(0)) directly from the set of all 5 × 10^6 parameter/realization pairs {(a_i, σ_{ij}) | 1 ≤ i ≤ 5000, 1 ≤ j ≤ 1000}, we can obtain a histogram estimate of π(a) by considering the relative values of π_c(a, 8) (a = 0.001, 0.002, ..., 0.1). This estimate is also plotted in Fig. 6(b), and good agreement with π(a) can be seen. Figure 7(a) shows the histogram estimate of π_c(a, I(0)) produced using Mathematica (Wolfram, 1991), and we note that it has features which are intuitively plausible. For example, there is a negative correlation between a and I(0), indicating that the observations may be explained by a low initial number of infectives coupled with a higher transmission rate, or the converse situation of higher I(0) and lower a values. Figure


FIG. 7. (a) Histogram estimate of the joint posterior density π_c(a, I(0)) for the simulated realization of Fig. 6(a). The lighter regions correspond to cells with higher frequencies. (b) The marginal density π_c(I(0)) (I(0) = 1, 2, ..., 29) from the histogram estimate in (a).

7(b) shows a histogram estimate of the marginal density of the initial conditions π_c(I(0)) estimated from the samples, showing consistency with the true value I(0) = 8.

6. Discussion

We have demonstrated how recent developments in stochastic integration, specifically reversible-jump MCMC methods, are highly relevant to the problem of estimating parameter values in stochastic epidemic compartment models from observations of a process which are incomplete, in the sense that transitions between certain states or initial conditions may be unobserved. In particular we have developed techniques for estimating

ESTIMATING PARAMETERS IN STOCHASTIC COMPARTMENTAL MODELS

37

relative-likelihood values as a first step in the process of inference. Although we have considered discrete parameter spaces, in which all parameter values are a priori equally likely, we remark that the techniques generalize to the Bayesian situation in which some other prior parameter density may be imposed. The results demonstrate that the algorithms developed are capable of providing meaningful estimates of parameter likelihoods for the contact rate in a stochastic SIR model. This is borne out by the agreement of estimated densities with their known form, for cases where this can be derived analytically, or by comparison with parameter values used to simulate data. We have demonstrated that the approach can be extended to the case where the initial conditions are unknown. Moreover, we have shown theoretically how MCMC algorithms can be formulated for application to the more complex SEIR model. As with any stochastic integration method there is some uncertainty in the estimated quantities. An important question for MCMC methods in particular is whether the estimates of this uncertainty obtained from a sequence of samples from the chain are realistic, and whether or not the distributional properties of the generated samples are a true reflection of the target distribution. The results of this paper provide significant evidence that the chains applied do provide a meaningful representation of their equilibrium distributions and that error bounds are realistic. However, for unknown densities, it is not possible to verify this with certainty, and results based on MCMC methods should be qualified with this caveat. Nevertheless, we stress that the problems tackled here essentially involve integration of functions over spaces of very high dimension, and that in the absence of an analytic solution the use of stochastic methods may be the only feasible approach. Although our results concern only simulated scenarios, we believe there is great scope for applying them to the study of real situations. One example is in the study of Clostridium difficile, a diarrhoea-causing pathogen which commonly affects patients in geriatric wards. Recently Starr et al. (1997) have proposed a compartmental model for the dynamics of this pathogen which shares some features of the models considered here. Any experimental study of this system is likely to encounter the problem of unobservable transitions between states, and the techniques developed here may be valuable in estimating transition rates in the model. It is our intention to pursue this line of research. We have considered exclusively the problem of estimating parameters in models. However, in many cases data may be collected with the purpose of selecting between models. For example, in epidemiology we may wish to investigate whether a contact rate is density-dependent, rather than merely estimate its value within a single assumed model. One motivation for the development of the reversible-jump methodology is its potential for the investigation of state spaces composed of subspaces, each of which represents the parameter space of a different model. It therefore offers a computational approach to the Bayesian methods of model choice advocated by several researchers (Draper, 1995). The state spaces considered in this paper could be expanded in this way to allow the methods to be applied to model selection, and this would enable, for example, the comparison of underlying generating mechanisms for ecological time-series data.
Acknowledgements

This work was supported in part by the Scottish Office Agriculture, Environment and Fisheries Department. This article is based upon a paper read at the Seventh IMA Conference


on the Application of Mathematics in Medicine and Biology, held in Oxford, 10-12 July 1996.

REFERENCES

BESAG, J., & GREEN, P. J. 1993 Spatial statistics and Bayesian computation. J. R. Statist. Soc. B 55, 25-37.
BOX, G. E. P., & JENKINS, G. M. 1970 Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day.
BROWER, R. C., FURMAN, M. A., & MOSHE, M. 1978 Critical exponents for the Reggeon quantum spin model. Phys. Lett. B 76, 213-219.
BUTTELL, L., COX, J. T., & DURRETT, R. 1993 Estimating the critical values of stochastic growth models. J. Appl. Probab. 30, 455-461.
COWLES, M. K., & CARLIN, B. P. 1996 Markov chain Monte Carlo convergence diagnostics: a comparative review. J. Am. Statist. Assoc. 91, 883-904.
COX, J. T., & DURRETT, R. 1988 Limit theorems for the spread of epidemics and forest fires. Stoch. Proc. Applic. 30, 171-191.
DRAPER, D. 1995 Assessment and propagation of model uncertainty (with discussion). J. R. Statist. Soc. B 57, 45-97.
DURRETT, R., & LEVIN, S. 1994 Stochastic spatial models - a user's guide to ecological applications. Phil. Trans. R. Soc. Lond. B 343, 363-394.
GELFAND, A. E., & SMITH, A. F. M. 1990 Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc. 85, 398-409.
GIBSON, G. J. 1997 Markov chain Monte Carlo methods for fitting spatio-temporal stochastic models in plant epidemiology. Appl. Statist. 46, 215-233.
GREEN, P. J. 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711-732.
HASTINGS, W. K. 1970 Monte Carlo sampling methods using Markov chains, and their applications. Biometrika 57, 97-109.
ISHAM, V. 1991 Assessing the variability of stochastic epidemics. Math. Biosci. 107, 209-224.
JOHNSON, N. L., & KOTZ, S. 1970 Continuous Univariate Distributions, Vol. 1. New York: Wiley.
LEVIN, S., & DURRETT, R. 1996 From individuals to epidemics. Phil. Trans. R. Soc. Lond. B 351, 1615-1621.
LIGGETT, T. M. 1985 Interacting Particle Systems. New York: Springer-Verlag.
METROPOLIS, N., ROSENBLUTH, A. W., ROSENBLUTH, M. N., TELLER, A. H., & TELLER, E. 1953 Equations of state calculations by fast computing machines. J. Chem. Phys. 21, 1087-1093.
MURRAY, J. 1989 Mathematical Biology. Springer-Verlag.
NEWTON, M. A., GUTTORP, P., & ABKOWITZ, J. L. 1992 Bayesian inference by simulation in a stochastic model from hematology. In: Computing Science and Statistics, Vol. 24 (H. J. Newton, ed.), pp. 449-455. Fairfax Station, VA: Interface Foundation of North America.
RENSHAW, E. 1991 Modelling Biological Populations in Space and Time. Cambridge: Cambridge University Press.
RICHARDSON, S., & GREEN, P. J. 1997 On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. R. Statist. Soc. B 59, 731-792.
RIPLEY, B. D. 1987 Stochastic Simulation. New York: Wiley.
SATO, K., MATSUDA, H., & SASAKI, A. 1994 Pathogen invasion and host extinction in lattice structured populations. J. Math. Biol. 32, 251-268.
SINCLAIR, A. 1993 Algorithms for Random Generation and Counting. Boston: Birkhauser.
SMITH, A. F. M., & ROBERTS, G. O. 1993 Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J. R. Statist. Soc. B 55, 3-23.
STARR, J. M., RODGERS, T. R., & IMPALLOMENI, M. 1997 Hospital-acquired Clostridium difficile diarrhoea and herd immunity. Lancet 349, 426-428.
WHITTLE, P. 1957 On the use of the normal approximation in the treatment of stochastic processes. J. R. Statist. Soc. B 19, 266-281.


WOLFRAM, S. 1991 Mathematica. Reading, MA: Addison-Wesley.
YAKOWITZ, S., BLOUNT, M., & GANI, J. 1996 Computing marginal expectations for large compartment models with applications to AIDS evolution in a prison system. IMA J. Math. Appl. Med. Biol. 13, 223-244.

Appendix A: Proof of Proposition 3.1

We first prove the result for the particular case when s¹(T) = s²(T), before proceeding to the general case. If s¹(T) = s²(T) then it is easily verified that s¹ and s² have the same number of events of both types L and I occurring in [0, T], although the times of the events may differ between s¹ and s². We make the following claim:

    s¹ can be 'mutated' to s² by a sequence of steps of type (iii) (defined in Proposition 3.1) only.        (A.1)

This is verified by induction on m, the common number of hidden events for s¹ and s². When m = 1, (A.1) is obvious since s² is obtained from s¹ directly by a single move. Now let m > 1. If the first event in both s¹ and s² is of type B or D, then we can redefine the observation interval so that it commences immediately after this event, but prior to the occurrence of any other event in s¹ or s². This procedure can be repeated until at least one of the realizations s¹ and s² commences with a hidden event. Hence, by relabelling s¹ and s² and redefining the observation interval as appropriate, we may assume that the first event occurring after t = 0 for the realization s² is a hidden event. Denote this event by e²_1, and let e¹ be the first event of the same type in the realization s¹. Consider the new realization s' obtained by altering the time of e¹ so that it occurs first in s'. Then s' ∈ R+ so long as no event e'_2, ..., e'_k is rendered impossible by the prior occurrence of e¹. So consider e'_j (= e¹_{j−1}) for any 2 ≤ j ≤ k, and let τ denote its time of occurrence. There are two cases to consider.

Case 1: e²_1 = L. If e'_j = D, this is feasible since I'(τ) = I¹(τ). If e'_j = B, then it is a feasible event since a birth occurs in the realization s² and S'(τ) ≥ S²(τ). If e'_j = I, it is also possible since E'(τ) ≥ E¹(τ).

Case 2: e²_1 = I. If e'_j = D, this is feasible since I'(τ) ≥ I¹(τ). If e'_j = B, then it is also a feasible event since S'(τ) = S¹(τ). If e'_j = L, it is feasible a fortiori since I'(τ) ≥ I¹(τ) and S'(τ) = S¹(τ).

By moving events in s' (without affecting their order) we obtain a realization s″ such that e″_1 = e²_1, t(e″_1) = t(e²_1), and s″(T) = s²(T). Now consider the restriction of these realizations to the period (t(e²_1), T). Since m − 1 hidden events occur in this interval, the claim (A.1) follows from the induction hypothesis.

Suppose now that s¹(T) ≠ s²(T), and suppose, without loss of generality, that S²(T) < N(T). If S¹(T) = N(T), then we can transform s¹ to s' satisfying S'(T) < N(T) by inserting a single event at an appropriate time in s¹. To see this let τ = sup{t ∈ [0, T] | S¹(t) ≠ N(t)}. It follows that a death D must occur at time τ in s¹, and that S¹(t) = N(t) must be monotonic increasing on (τ, T]. There are three possibilities to consider.

N(τ+) = 1 < N(T). This implies S(T) = N(T) for any consistent realization s, contradicting the assumption that S²(T) < N(T).

N(τ+) = 1 = N(T). This implies that no events occur in s¹ after the death at τ. Since both S¹(τ) > 0 and I¹(τ) > 0, we can insert an event L immediately prior to τ to produce the realization s' satisfying S'(T) < N(T).

N(τ+) > 1. Again insert an event L immediately prior to the death at τ to produce a new realization s'. Any births occurring after τ remain feasible since S'(t) ≥ 1 for all t ∈ (τ, T].


It now remains to show that s' can be transformed to s² via a sequence of permissible realizations by mutations of the forms (i)-(iii) defined in Proposition 3.1. Since S'(T), S²(T) < N(T), it follows that by appending appropriate events of type L and I in (T, T'], where T' > T, we can 'extend' both s² and s' to form realizations s²_e and s'_e which satisfy I²_e(T') = I'_e(T') = N(T'). By (A.1), proved above, we can transform s'_e into s²_e via a sequence of permissible realizations using only mutations of type (iii). The proposition follows on noting that a mutation of type (iii) which moves an event from (T, T'] to [0, T] corresponds to a type (i) operation on a realization defined on [0, T], whilst moving an event from [0, T] to (T, T'] corresponds to a deletion (type (ii)). This completes the proof of the proposition.
