4 Review of selected material, Lectures 1-5

4.1 Orthogonal representations

4.1.1 Covariance kernels

Suppose X is a scalar, zero-mean process (EX(r) = 0, for all r), which is (uniformly) square integrable (sup_r E|X(r)|² < ∞), and has continuous paths. Assume further that X is continuous in quadratic mean, meaning that E|X(r′) − X(r)|² → 0 as r′ → r. Then the covariance kernel of X, defined as k(r, s) := EX(r)X(s), is itself continuous. Covariance kernels are trivially symmetric (k(r, s) = k(s, r)) and have the important property of positive semidefiniteness: for any g ∈ L2[0, 1],

    ∫_0^1 ∫_0^1 g(r) k(r, s) g(s) dr ds ≥ 0.    (1)

(This quadratic form is always well defined, since k(r, s) is square integrable on [0, 1]², by virtue of its being continuous there.) The inequality follows straightforwardly by computing the variance of the random variable Z = ∫_0^1 g(r)X(r) dr, which must be nonnegative, and which is equal to the LHS of (1). Covariance kernels can be treated more or less as though they were symmetric, positive semidefinite matrices. Thus, analogous to the eigenvalue-eigenvector decomposition for a matrix K (with columns {k_i}),

    ∑_{i=1}^n k_i φ_i = Kφ = λφ,

we have the eigenvalue-eigenfunction decomposition of the covariance kernel k,

    ∫_0^1 k(r, s) φ(s) ds = λ φ(r).    (2)

(2) has a countable infinity of solutions, denoted by the sequences {λ_j} and {φ_j(r)}, where φ_j ∈ L2[0, 1] denotes the eigenfunction corresponding to λ_j. The {φ_j} form an orthonormal sequence, in the sense that

    ∫_0^1 φ_j(s) φ_k(s) ds = δ_jk

(where δ_jk = 1 if j = k, and 0 otherwise). It is convenient to order the eigenvalues λ_1 ≥ λ_2 ≥ · · · . As in the matrix case, it follows from positive semidefiniteness that λ_j ≥ 0 for all j; the inequality is strict if k is positive definite, which here means that (1) always holds as a strict inequality, unless g(r) = 0 for almost all r ∈ [0, 1]. In the matrix case, the eigenvalues and eigenvectors can be used to construct the spectral decomposition,

    K = ∑_{j=1}^m λ_j φ_j φ_j′.

An analogous relationship holds for a covariance kernel k, but proving this requires considerably more work. These difficulties arise because the decomposition

    k(r, s) = ∑_{j=1}^∞ λ_j φ_j(r) φ_j(s)    (3)

now involves an infinite series. By Mercer's theorem, (3) converges absolutely and uniformly (over (r, s) ∈ [0, 1]²), provided that k is a continuous, symmetric, positive semidefinite function.
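To make the matrix analogy concrete, one can discretise a kernel on a grid and check that the eigen-decomposition of the resulting matrix reproduces it, in the spirit of (3). The following sketch is not part of the original notes; the kernel k(r, s) = exp(−|r − s|), the grid size, and the truncation point are arbitrary illustrative choices, and numpy is assumed available.

```python
import numpy as np

# Discretise [0,1] into m points; the kernel matrix K approximates k(r, s),
# and the integral in (2) becomes a matrix-vector product scaled by 1/m.
m = 500
r = (np.arange(m) + 0.5) / m
K = np.exp(-np.abs(r[:, None] - r[None, :]))   # illustrative kernel k(r,s) = exp(-|r-s|)

# Eigen-decomposition of the (symmetric, PSD) kernel matrix.  Dividing by m makes
# the eigenvalues approximate those of the integral operator in (2); the
# eigenvectors, scaled by sqrt(m), approximate the orthonormal eigenfunctions.
evals, evecs = np.linalg.eigh(K / m)
evals, evecs = evals[::-1], evecs[:, ::-1] * np.sqrt(m)   # sort largest first

# Truncated Mercer expansion (3): sum_{j<=J} lambda_j phi_j(r) phi_j(s).
J = 50
K_J = (evecs[:, :J] * evals[:J]) @ evecs[:, :J].T
print("max abs error of truncated expansion:", np.abs(K_J - K).max())
```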


4.1.2 Orthogonal decomposition for X

The main interest in this result lies in the orthogonal decomposition for the process X, associated to the covariance kernel k, that it gives rise to. The first step is to define the random variables

    ξ_k := (1/√λ_k) ∫_0^1 X(s) φ_k(s) ds

(which are well defined, since X has continuous paths). The {ξ_k} have mean zero, unit variance and are orthogonal (but not necessarily independent), since by (2),

    Eξ_k ξ_ℓ = (1/√(λ_k λ_ℓ)) ∫_0^1 ∫_0^1 φ_ℓ(r) k(r, s) φ_k(s) dr ds = √(λ_ℓ/λ_k) ∫_0^1 φ_ℓ(s) φ_k(s) ds = δ_kℓ.

Next, define

    X_n(r) := ∑_{k=1}^n √λ_k φ_k(r) ξ_k.

Then by the orthogonality, and direct calculation (see lecture notes),

    E|X(r) − X_n(r)|² = k(r, r) − ∑_{k=1}^n λ_k φ_k(r)² → 0

uniformly in r, by Mercer's theorem, since the RHS gives a uniformly convergent representation for k(r, r). This means that X(r) has the L2 representation, termed the Karhunen-Loève (KL) representation,

    X(r) = ∑_{k=1}^∞ √λ_k φ_k(r) ξ_k,    (4)

which holds only in the sense that the RHS converges to X(r), in L2, uniformly in r. Note the 'double orthogonality' here: the sample paths of X(r) are built up from a weighted collection of orthogonal functions, {φ_k(r)}, just as one can represent a deterministic function by a Fourier series; moreover, the coefficients {√λ_k ξ_k} are random, so as to reflect the stochastic nature of X(r), and are themselves mutually orthogonal. In general, the convergence of (4) cannot be strengthened beyond L2 convergence, unless additional restrictions are imposed on the process X(r). (This parallels the convergence of Fourier series representations for functions on [0, 1], which converge in L2[0, 1], but need not converge pointwise, unless the function being approximated is continuous and has left and right derivatives.) In the case of Brownian motion (see below), (4) converges uniformly and absolutely, a fact that is extremely useful in establishing both the existence of Brownian motion, and some of its properties.
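Read constructively, (4) says that approximate sample paths can be drawn by summing finitely many weighted eigenfunctions with orthogonal random coefficients. A minimal sketch (not from the notes), assuming eigenvalues and grid-evaluated eigenfunctions as computed in the previous snippet, and taking the ξ_k i.i.d. N[0, 1], which is one way of obtaining an orthonormal sequence:

```python
import numpy as np

def simulate_kl_paths(evals, evecs, n_terms=50, n_paths=3, rng=None):
    """Draw approximate sample paths via a truncated KL sum, as in (4).

    evals, evecs: operator eigenvalues and grid-evaluated eigenfunctions,
    e.g. from a discretised kernel; the xi_k are taken i.i.d. N(0, 1), so the
    resulting process is Gaussian with (approximately) the given kernel.
    """
    rng = np.random.default_rng(rng)
    lam = np.clip(evals[:n_terms], 0.0, None)          # guard tiny negative eigenvalues
    xi = rng.standard_normal((n_terms, n_paths))       # orthonormal random coefficients
    return evecs[:, :n_terms] @ (np.sqrt(lam)[:, None] * xi)   # shape (grid, n_paths)
```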

4.2 Brownian motion

4.2.1 Definition and existence

A process W(r) on [0, 1] is called a standard Brownian motion if it has the following properties: (i) initialised at zero: W(0) = 0; (ii) independent increments: W(r) − W(s) ⊥⊥ W(s), for r > s; (iii) Gaussianity: W(r) ∼ N[0, r]; (iv) continuous sample paths: W(·, ω) ∈ C[0, 1] for almost all ω ∈ Ω.


It is easy to write down what a Brownian motion is; it is rather more difficult to show that the definition is meaningful, in the sense that there actually exist processes that satisfy all of these requirements. The approach taken in lectures was to prove existence by constructing the KL representation, as follows. Note first that a Brownian motion (if one exists) has the covariance kernel k(r, s) = EW(r)W(s) = r ∧ s. This function is positive semidefinite; we verified in lectures that its eigenvalues and eigenfunctions are

    λ_k = 1/((k − 1/2)²π²),    φ_k(r) = √2 sin((k − 1/2)πr).
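(A quick numerical check of these eigenpairs, not part of the notes: discretise k(r, s) = r ∧ s on a grid and compare the leading eigenvalues of the resulting matrix with the analytic values.)

```python
import numpy as np

m = 1000
r = (np.arange(m) + 0.5) / m
K = np.minimum(r[:, None], r[None, :])              # k(r, s) = r ∧ s on the grid
evals = np.linalg.eigvalsh(K / m)[::-1]             # operator eigenvalues, largest first
k = np.arange(1, 6)
print(evals[:5])                                    # numerical eigenvalues
print(1.0 / ((k - 0.5) ** 2 * np.pi ** 2))          # analytic eigenvalues, for comparison
```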

By Mercer's theorem,

    ∑_{k=1}^n λ_k φ_k(r) φ_k(s) = ∑_{k=1}^n [2 sin((k − 1/2)πr) sin((k − 1/2)πs)] / ((k − 1/2)²π²) → k(r, s) = r ∧ s,

absolutely, and uniformly in (r, s) ∈ [0, 1]². (Actually, this is easily verified without the aid of Mercer's theorem, since |φ_k(r)| ≤ √2 and ∑_k λ_k < ∞.) Now let {ξ_k} be a sequence of i.i.d. N[0, 1] random variables. Then the process X, defined by the L2 limit (uniform in r)

    X(r) := √2 ∑_{k=1}^∞ [sin((k − 1/2)πr) / ((k − 1/2)π)] ξ_k,

satisfies properties (i)-(iii), but cannot be called a Brownian motion until we have verified that it has continuous paths. (To construct X, we have run the KL representation 'backwards', so to speak, by first computing the eigenvalues and eigenfunctions of the covariance kernel, and then constructing X from (4) and an orthonormal Gaussian sequence {ξ_k}; the Gaussianity was introduced in view of requirement (iii).) To that end, we must show the series on the RHS is uniformly convergent a.s., which requires some additional work. Continuity (for almost all ω) then follows from the continuity of the uniform limit of a sequence of continuous functions. Hence (iv) is satisfied, and X is indeed a standard Brownian motion.
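As an illustration of this construction (a sketch, not from the notes; the truncation point, grid, and number of paths are arbitrary choices), approximate Brownian paths can be simulated by cutting the sine series off after finitely many terms:

```python
import numpy as np

def brownian_paths_kl(n_grid=1000, n_terms=500, n_paths=5, rng=None):
    """Approximate standard Brownian motion on [0,1] via the truncated KL series
    X(r) = sqrt(2) * sum_k sin((k-1/2) pi r) / ((k-1/2) pi) * xi_k, xi_k i.i.d. N(0,1)."""
    rng = np.random.default_rng(rng)
    r = np.linspace(0.0, 1.0, n_grid)
    k = np.arange(1, n_terms + 1)
    freq = (k - 0.5) * np.pi                                   # (k - 1/2) * pi
    basis = np.sqrt(2.0) * np.sin(np.outer(r, freq)) / freq    # shape (n_grid, n_terms)
    xi = rng.standard_normal((n_terms, n_paths))
    return r, basis @ xi                                       # each column is one path

# Sanity check: Var W(1) should be close to 1 (up to truncation and Monte Carlo error).
r, W = brownian_paths_kl(n_paths=2000, rng=0)
print(W[-1].var())
```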

4.2.2 Brownian motion as a limit of partial sums

Let W(r) now denote a vector of k mutually independent standard Brownian motions. Then we can construct a vector Brownian motion with variance matrix Σ, by setting B(r) := Σ^{1/2} W(r), where Σ^{1/2} is any 'square root' of Σ (there are, of course, infinitely many of these). We denote this by B(r) ∼ BM(Σ). Consider a vector-valued, discrete-time process {X_t}, satisfying ∆X_t = u_t, where {u_t} is a vector of stationary, weakly-dependent processes, satisfying in particular that ∑_{h=−∞}^∞ ‖Γ_u(h)‖ < ∞. Such a process is said to be integrated of order 1, denoted X_t ∼ I(1), with reference to the fact that it needs to be differenced once in order to obtain a (covariance) stationary process; processes that are already stationary are said to be integrated of order zero (∼ I(0)). Define the scaled partial sum process

    X_n(r) := X_[nr]/√n = (1/√n) ∑_{t=1}^{[nr]} u_t,

where [x] denotes the integer part of x (round towards 0). X_n(r) is a process on [0, 1], but does not have continuous sample paths; rather, it has jumps (of size u_t/√n) at each multiple of 1/n. Let Σ := lrvar(u_t) = ∑_{h=−∞}^∞ Γ_u(h) (assumed nonzero, but finite). By a central limit theorem for weakly dependent processes,

    X_n(r) ⇒ N[0, rΣ],    (5)


for all r ∈ [0, 1]. The key difference between the case where {u_t} is serially dependent, as opposed to i.i.d., is the appearance of the long-run variance matrix in the limiting distribution, in place of the variance Γ_u(0). This follows from the fact that, e.g. when r = 1,

    EX_n(1)X_n(1)′ = (1/n) ∑_{s,t=1}^n Eu_s u_t′ = ∑_{h=−(n−1)}^{n−1} Γ_u(h)(1 − |h|/n) → ∑_{h=−∞}^∞ Γ_u(h).

Owing to the serial dependence, the cross-products u_s u_t′ do not vanish, but instead contribute to the limiting variance of the partial sums. Observe that if the dependence between the u_t's is strong, in the sense that ∑_{h=−∞}^∞ Γ_u(h) diverges, then a CLT of the form (5) will not hold. To obtain a meaningful limit theory in such a case, we would at the very least have to re-standardise the partial sum process, so as to bring the limiting variance back under control. Note also that, for r < s,

    ‖EX_n(r)[X_n(s) − X_n(r)]′‖ = ‖E[((1/√n) ∑_{t=1}^{[nr]} u_t)((1/√n) ∑_{t=[nr]+1}^{[ns]} u_t)′]‖ ≤ (1/n) ∑_{h=1}^n h‖Γ_u(h)‖ → 0.    (6)

Thus the increments of X_n(r) are asymptotically orthogonal. (5) and (6) suggest that X_n(r) is going to behave like a scaled vector Brownian motion B(r) ∼ BM(Σ) in the limit. These are not by themselves sufficient to establish that X_n ⇒ B, as processes on [0, 1]; for this, further conditions are needed, which give us some control over the behaviour of the entire collection {X_n(r)}_{r∈[0,1]} of random variables. (Exactly what 'X_n ⇒ B' means will be explained in a subsequent class.) This is the reason why Brownian motion is so central to this course; it is the stochastic process analogue of the normal distribution, and will appear in limiting distributions of estimators whenever we have to work with integrated processes.
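The role of the long-run variance is easily seen in a small simulation (a sketch, not part of the notes; the scalar AR(1) specification for u_t is an arbitrary illustrative choice): the sample variance of X_n(1) is close to lrvar(u_t), not to Γ_u(0).

```python
import numpy as np

def simulate_Xn1(n=2000, n_reps=5000, rho=0.5, rng=None):
    """Scalar illustration: u_t an AR(1) with coefficient rho, X_n(1) = n^{-1/2} sum u_t."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal((n_reps, n))
    u = np.empty_like(eps)
    u[:, 0] = eps[:, 0] / np.sqrt(1 - rho**2)      # start from the stationary distribution
    for t in range(1, n):
        u[:, t] = rho * u[:, t - 1] + eps[:, t]
    return u.sum(axis=1) / np.sqrt(n)

Xn1 = simulate_Xn1(rng=0)
rho, sigma2 = 0.5, 1.0
gamma0 = sigma2 / (1 - rho**2)                     # Var(u_t) = Gamma_u(0)
lrv = sigma2 / (1 - rho)**2                        # long-run variance sum_h Gamma_u(h)
print(Xn1.var(), "vs lrvar =", lrv, "vs Gamma_u(0) =", gamma0)
```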

4.2.3 Singular Brownian motion and cointegration

Now consider the case where {X_t} can be partitioned as per

    X_1t = A X_2t + u_1t,
    ∆X_2t = u_2t,

where A is some coefficient matrix, and the {u_t} are stationary and weakly dependent. Note that u_1t and u_2t are permitted to be mutually correlated, and the X_1t and X_2t jointly determined, so that the first equation may have an endogenous 'regressor', so to speak. In this case, X_t is an integrated process, but the linear combination X_1t − A X_2t is stationary. This phenomenon is termed cointegration; the vectors α for which α′X_t ∼ I(0) are termed cointegrating vectors. As per the discussion above,

    X_{2,n}(r) := X_{2,[nr]}/√n = (1/√n) ∑_{t=1}^{[nr]} u_2t ⇒ B_2(r) ∼ BM(Ω).

Note ∆X_1t = A∆X_2t + ∆u_1t = Au_2t + ∆u_1t, so that when we consider the scaled partial sums,

    X_{1,n}(r) := (1/√n) ∑_{t=1}^{[nr]} ∆X_1t = A(1/√n) ∑_{t=1}^{[nr]} u_2t + (1/√n)(u_{1,[nr]} − u_{1,0}) ⇒ AB_2(r),

the u_1t's telescope out, the second term being o_p(1). Hence X_n(r) converges weakly to a singular Brownian motion B(r) = [A′ I]′ B_2(r), with variance matrix

    Σ = [A′ I]′ Ω [A′ I].

Observe that any row, α′, of the matrix [I, −A] has the property that α′Σ = 0, and hence α′X_n(r) ⇒ α′B(r) = 0 a.s. These α are, of course, the cointegrating vectors defined above. The rank deficiency of Σ is thus equal to the number of linearly independent cointegrating relations, termed the cointegrating rank of {X_t}. The significance of this result is that it provides a nonparametric criterion for cointegration. That is, we could imagine determining whether or not the elements of an observable series {X_t} are cointegrated, simply by estimating the long-run variance matrix Σ = lrvar(∆X_t) (using nonparametric methods to be described later in the course), and testing for a rank deficiency of Σ.
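A numerical illustration of this criterion (a sketch under assumed specifics: a bivariate system with A = 1, i.i.d. Gaussian errors, and a Bartlett-kernel (Newey-West-type) estimate of lrvar(∆X_t) standing in for the nonparametric estimators to be covered later):

```python
import numpy as np

def lrvar_bartlett(d, bandwidth):
    """Bartlett-kernel (Newey-West-type) estimate of the long-run variance of the rows of d."""
    d = d - d.mean(axis=0)
    n = d.shape[0]
    S = d.T @ d / n
    for h in range(1, bandwidth + 1):
        G = d[h:].T @ d[:-h] / n                    # sample autocovariance at lag h
        S += (1 - h / (bandwidth + 1)) * (G + G.T)
    return S

rng = np.random.default_rng(0)
n, A = 5000, 1.0
u1, u2 = rng.standard_normal(n), rng.standard_normal(n)
X2 = np.cumsum(u2)
X1 = A * X2 + u1                                    # cointegrated: X1 - A*X2 = u1 ~ I(0)
dX = np.column_stack([np.diff(X1), np.diff(X2)])

Sigma_hat = lrvar_bartlett(dX, bandwidth=int(n ** (1 / 3)))
print(np.linalg.eigvalsh(Sigma_hat))                # one eigenvalue should be near zero
```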

4.3 Stochastic integration for continuous martingales

Let M(r) be a martingale on [0, 1], with respect to the natural filtration F(r) := σ{M(s) | s ≤ r}. Suppose further that M(0) = 0, and that M has continuous sample paths. (Clearly, Brownian motion is one such process.) Now let H(r) be a process of the form

    H(r) = ∑_{i=1}^n h(t_i) 1{t_i < r ≤ t_{i+1}},

where {t_i}_{i=1}^{n+1} is a partition of [0, 1], and the h(t_i)'s are square-integrable, F(t_i)-measurable random variables. H is thus an F(r)-adapted process, with left-continuous sample paths. H is termed a simple process; let S denote the collection of all such processes. For H ∈ S, we define the stochastic integral with respect to M to be

    I_M(H)[r] := ∑_{i=1}^n h(t_i)(M(t_{i+1} ∧ r) − M(t_i ∧ r))
               = ∑_{t_{i+1} ≤ r} h(t_i)(M(t_{i+1}) − M(t_i)) + h(t_r)(M(r) − M(t_r)),

where t_r denotes the largest element of the partition strictly smaller than r. In the conventional integral notation, we write I_M(H)[r] = ∫_0^r H(s) dM(s). It is easily verified that I_M(H)[r] is a linear function of H (just as every good integral should be). Moreover, it maps each H ∈ S into a martingale with (clearly) continuous sample paths. To see this, note that for r > s,

    E[I_M(H)[r] | F(s)] = E[∑_{t_i ≤ s} h(t_i)(M(t_{i+1} ∧ r) − M(t_i ∧ r)) | F(s)]    (i)
                         + E[∑_{t_i > s} h(t_i)(M(t_{i+1} ∧ r) − M(t_i ∧ r)) | F(s)].   (ii)

In (i), each h(t_i) is F(s)-measurable, so

    (i) = ∑_{t_i ≤ s} h(t_i)(E[M(t_{i+1} ∧ r) | F(s)] − E[M(t_i ∧ r) | F(s)]) = ∑_{t_i ≤ s} h(t_i)(M(t_{i+1} ∧ s) − M(t_i ∧ s)) = I_M(H)[s],

since M is a martingale. Moreover, by taking iterated expectations,

    (ii) = E[∑_{t_i > s} h(t_i) E[M(t_{i+1} ∧ r) − M(t_i ∧ r) | F(t_i)] | F(s)] = 0,

since the inner conditional expectation vanishes by the martingale property. Hence I_M(H)[r], for r ∈ [0, 1], is a martingale.
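For intuition, the integral of a simple process is just a finite sum of martingale increments weighted by the previously-determined h(t_i). A sketch (not from the notes) with M taken to be a simulated Brownian motion, and h(t_i) = M(t_i) as an arbitrary adapted integrand:

```python
import numpy as np

def stochastic_integral_simple(h, t, M_path, grid, r=1.0):
    """I_M(H)[r] for the simple process H = sum_i h_i * 1{t_i < s <= t_{i+1}}.

    h:      values h(t_i), i = 1..n (each determined by information up to t_i)
    t:      partition t_1 < ... < t_{n+1} of [0, 1]
    M_path: values of M on `grid` (one simulated path)
    """
    M = lambda s: np.interp(s, grid, M_path)        # evaluate M at arbitrary times
    total = 0.0
    for i in range(len(h)):
        total += h[i] * (M(min(t[i + 1], r)) - M(min(t[i], r)))
    return total

# Example: M = Brownian motion from a random-walk approximation.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 10001)
M_path = np.concatenate([[0.0], np.cumsum(rng.standard_normal(10000)) * np.sqrt(1 / 10000)])
t = np.linspace(0, 1, 6)                                        # partition with n = 5 intervals
h = np.array([np.interp(ti, grid, M_path) for ti in t[:-1]])    # h(t_i) = M(t_i): adapted
print(stochastic_integral_simple(h, t, M_path, grid, r=0.7))
```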

We want to extend the domain of I_M(·) to a broader class of processes, which we do by establishing that I_M(·) is an isometry (a norm-preserving map). Taking r = 1 (and suppressing it) for simplicity, write

    E|I_M(H)|² = E|∑_{i=1}^n h(t_i)(M(t_{i+1}) − M(t_i))|²

and observe that the Y(t_i) := h(t_i)(M(t_{i+1}) − M(t_i)) are mutually orthogonal, since for t_i < t_j,

    EY(t_i)Y(t_j) = E[Y(t_i) h(t_j) E[M(t_{j+1}) − M(t_j) | F(t_j)]] = 0.

Therefore

    E|I_M(H)|² = ∑_{i=1}^n E[|h(t_i)|² E[(M(t_{i+1}) − M(t_i))² | F(t_i)]],    (c)

where (c) refers to the inner conditional expectation.

Now to make sense of (c), recall the definition of quadratic variation,

    [M]_t := lim_{|∆_n|→0} ∑_{t_i ∈ ∆_n} (M(t_{i+1}) − M(t_i))²,

where ∆_n is a partition of [0, t], and |∆_n| denotes the mesh of the partition (the greatest distance between adjacent points); it can be shown that the limit obtains in L1 when M is square integrable. Therefore, for ∆_n a partition of [s, r],

    E(M(r) − M(s))² = E ∑_{t_i ∈ ∆_n} (M(t_{i+1}) − M(t_i))² → E([M]_r − [M]_s) = E ∫_s^r d[M]_t,

where the RHS is well-defined as an ordinary Lebesgue-Stieltjes integral, since [M]_t is an increasing (and therefore finite-variation) process. Hence, applying the same argument conditionally on F(t_i),

    E|I_M(H)|² = E ∑_{i=1}^n |h(t_i)|² ∫_{t_i}^{t_{i+1}} d[M]_s = E ∫_0^1 |H(s)|² d[M]_s.

The RHS defines a norm ‖·‖_M for processes on [0, 1], and so I_M(·) can be regarded as an isometry between the spaces (S, ‖·‖_M) → (M, ‖·‖_2), where M denotes the continuous martingales on [0, 1], and ‖·‖_2 the L2 norm. Our final step is to use the isometry to extend the domain of the integral to the ‖·‖_M-closure of S. In particular, let X be a continuous process for which ∫_0^1 |X(s)|² d[M]_s < ∞. By taking progressively finer partitions {t_i^n} of [0, 1], we can construct a sequence of processes X^n ∈ S,

    X^n(r) := ∑_{i=1}^n X(t_i^n) 1{t_i^n < r ≤ t_{i+1}^n},

for which ‖X^n − X‖_M → 0. From the isometry (and linearity), ‖I_M(X^n)[r] − I_M(X^m)[r]‖_2 ≤ ‖X^n − X^m‖_M, so it follows that {I_M(X^n)[r]} is a Cauchy sequence, and so has a limit in L2. For each r, we can therefore define

    I_M(X)[r] := lim_{n→∞} I_M(X^n)[r].    (7)

It remains to verify:

(i) The construction is well-defined, in the sense of not depending on the approximating sequence X^n. This follows straightforwardly from the isometry.

(ii) I_M(X)[r], for r ∈ [0, 1], is a martingale. This follows because (7) holds in L2, and so

    E[I_M(X)[r] | F(s)] = lim_{n→∞} E[I_M(X^n)[r] | F(s)] = lim_{n→∞} I_M(X^n)[s] = I_M(X)[s].

(iii) I_M(X)[r] has continuous sample paths. (The proof of this is rather more involved.)
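As a closing illustration (a sketch, not from the notes), the isometry can be checked numerically for M = W, for which [W]_s = s, using left-point sums with the adapted integrand H(s) = W(s), an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_paths, dt = 2000, 5000, 1.0 / 2000

# Simulate Brownian increments and paths; W has quadratic variation [W]_s = s.
dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
W = np.cumsum(dW, axis=1) - dW                      # left-endpoint values W(t_i), starting at 0

H = W                                               # adapted integrand H(s) = W(s)
I = (H * dW).sum(axis=1)                            # approximate ∫_0^1 H dW (left-point sums)
lhs = (I ** 2).mean()                               # E|I_M(H)|^2
rhs = (H ** 2 * dt).sum(axis=1).mean()              # E ∫_0^1 H(s)^2 d[W]_s = ∫_0^1 s ds = 1/2
print(lhs, rhs)                                     # both ≈ 0.5
```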

