Economics 704
Steven N. Durlauf
Fall 2015
Lecture Notes 5. Mathematics of Time Series
1. Basic ideas

The Hilbert space framework provides a very powerful language for discussing the relationships among random variables. Collections of random variables are called stochastic processes; in common usage, stochastic processes are usually understood to refer to collections of random variables whose elements are indexed by time.
We focus on the case of a scalar stochastic process $x_t$, where $t$ is an integer, as it is convenient to think of time as stretching from $-\infty$ to $\infty$. We assume that the process is zero-mean and second-order stationary. Second-order stationarity, also known as weak stationarity, means that the autocovariances between $x_{t-j}$ and $x_t$ do not depend on $t$. Formally, assume i) $E\, x_t = 0$ and ii) $E\, x_t x_{t-j} = \gamma_j$. For random variables such as the elements of the stochastic process $x_t$, the natural metric, i.e. notion of length, for a random variable is its standard deviation,

$$\|x_t\| = \left(E\, x_t^2\right)^{1/2} \tag{5.1}$$

with covariance as the associated notion of inner product,

$$\langle x_t, x_{t-j} \rangle = E\, x_t x_{t-j} \tag{5.2}$$
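To make these definitions concrete, here is a minimal Python sketch that checks conditions i) and ii) on simulated data; the AR(1) process and its closed-form autocovariance $\gamma_j = \rho^j/(1-\rho^2)$ are illustrative choices, not part of the development above.

```python
import numpy as np

def sample_autocov(x, j):
    """Sample analogue of gamma_j = E[x_t x_{t-j}] for a zero-mean series."""
    x = np.asarray(x, dtype=float)
    return x[j:] @ x[:len(x) - j] / len(x)

# Simulate a weakly stationary AR(1) process, x_t = 0.8 x_{t-1} + eps_t.
rng = np.random.default_rng(0)
n, rho = 100_000, 0.8
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]

# For this process gamma_j = rho**j / (1 - rho**2), the same for every t.
for j in range(4):
    print(j, sample_autocov(x, j), rho**j / (1 - rho**2))
```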
One can generate a Hilbert space around the sequence $x_t, x_{t-1}, x_{t-2}, \ldots$ What this means is that one forms a space by taking these elements, adding all linear combinations of the elements, all limits of those linear combinations, etc.¹ We denote this Hilbert space as $H_t(x)$. The entire history of the stochastic process from $-\infty$ to $\infty$ generates $H(x)$. By construction, $H_{t-1}(x) \subseteq H_t(x)$. The general properties of Hilbert spaces described in Lecture Notes 3 allow one to characterize the linear structure of $H_t(x)$ in ways that are very important in macroeconomics.
First, observe that by the Hilbert space decomposition theorem, one can decompose $H_t(x)$ so that

$$H_t(x) = H_{t-1}(x) \oplus G_t \tag{5.3}$$
where $G_t$ is another Hilbert space. The dimension of this Hilbert space is either 0 or 1. This is so because the Hilbert space $G_t$ must be spanned by the single random variable $x_t - \mathrm{proj}\left(x_t \mid H_{t-1}(x)\right)$, where $\mathrm{proj}\left(x_t \mid H_{t-1}(x)\right)$ is the projection of $x_t$ onto $H_{t-1}(x)$. To say the space $G_t$ has dimension 0 means that $x_t \in H_{t-1}(x)$, i.e. $\mathrm{var}\left(x_t - \mathrm{proj}\left(x_t \mid H_{t-1}(x)\right)\right) = 0$.
If one again applies the Hilbert space decomposition theorem, it is the case that

$$H_t(x) = H_{t-2}(x) \oplus G_{t-1} \oplus G_t \tag{5.4}$$

where $G_{t-1}$ is spanned by $x_{t-1} - \mathrm{proj}\left(x_{t-1} \mid H_{t-2}(x)\right)$. This decomposition may be repeated an arbitrary number of times; the $G_{t-j}$ spaces are by construction mutually orthogonal.
¹ Technically, we are working with the smallest Hilbert space that contains the elements of the stochastic process.
Notice that it is not necessarily the case that, if this decomposition is carried out an infinite number of times, the $G_t$'s may be used to reconstruct $H_t(x)$. The reason for this is that each space is constituted by elements that appear in the space $H_{t-j}(x)$ but not in the space $H_{t-j-1}(x)$; if there are elements that appear in every member of the sequence $H_t(x), H_{t-1}(x), \ldots$, they will not appear in any of the $G_t$'s. Elements that are common to all of the $H_t(x)$'s form a Hilbert space as well. Formally, this space is defined as

$$H_{-\infty}(x) = \bigcap_t H_t(x) \tag{5.5}$$

The Hilbert space generated by current and past $x_t$'s can therefore be decomposed as

$$H_t(x) = G_t \oplus G_{t-1} \oplus \cdots \oplus H_{-\infty}(x) \tag{5.6}$$
2. Wold decomposition theorems
Equation (5.6) is the basis for two fundamental theorems in time series analysis, each due to Herman Wold; his 1948 article is still worth reading. Rozanov (1967) is a deep treatment. I find Ash and Gardner's (1975) discussion to be especially insightful.
Theorem 5.1. Wold decomposition theorem I

Any zero-mean, finite-variance, second-order stationary process $x_t$ may be decomposed as

$$x_t = x_{1t} + x_{2t} \tag{5.7}$$

where

$$x_{1t} \in G_t \oplus G_{t-1} \oplus G_{t-2} \oplus \cdots \tag{5.8}$$

and

$$x_{2t} \in H_{-\infty}(x) \tag{5.9}$$

In this decomposition, $x_{1t}$ is called the indeterministic component and $x_{2t}$ the deterministic component of $x_t$.
The terms refer to whether or not the components may be perfectly (linearly) predicted from the past. When a time series contains a nontrivial indeterministic component, the time series itself is said to be indeterministic. If the process does not contain a deterministic component, it is purely indeterministic. The term $x_{2t}$ may be perfectly predicted from information in the arbitrarily distant past. One example of a deterministic component is a seasonal. Consider the stochastic process $\cos(\lambda t + \theta)$, where $\theta$ is uniformly distributed on $[-\pi, \pi]$. Since $\int_{-\pi}^{\pi} \cos(\lambda t + \theta)\, \frac{d\theta}{2\pi} = 0$ and $\int_{-\pi}^{\pi} \cos(\lambda(t+j) + \theta)\cos(\lambda t + \theta)\, \frac{d\theta}{2\pi}$ does not depend on $t$,² $\cos(\lambda t + \theta)$ is a candidate for $x_{2t}$. The second Wold theorem characterizes the linear structure of the indeterministic part of a time series.
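A quick Monte Carlo check of this example (the frequency value $\lambda = 0.7$ is an arbitrary illustrative choice): both moments should be invariant to $t$.

```python
import numpy as np

# Monte Carlo check: for theta ~ U[-pi, pi], E[cos(lam*t + theta)] = 0 and
# E[cos(lam*(t+j) + theta) * cos(lam*t + theta)] = cos(lam*j)/2 for every t.
rng = np.random.default_rng(1)
lam, n_draws = 0.7, 500_000          # lam is an arbitrary frequency
theta = rng.uniform(-np.pi, np.pi, n_draws)

for t in (0, 5, 50):                 # the moments should not vary with t
    for j in (0, 1, 2):
        mean = np.cos(lam * t + theta).mean()
        acov = (np.cos(lam * (t + j) + theta) * np.cos(lam * t + theta)).mean()
        print(t, j, round(mean, 4), round(acov, 4), round(np.cos(lam * j) / 2, 4))
```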
² This follows from the identity $\cos(\lambda(t+j) + \theta)\cos(\lambda t + \theta) = \frac{1}{2}\left[\cos(\lambda j) + \cos(\lambda(2t+j) + 2\theta)\right]$.
Theorem 5.2. Wold decomposition theorem II
If $x_t$ is a purely indeterministic, zero-mean, finite-variance, second-order stationary process, then there exists a representation of the process of the form

$$x_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}, \qquad \psi_0 = 1 \tag{5.10}$$

where $\varepsilon_t \in G_t$, $E\, \varepsilon_t^2 = \sigma^2$, and $E\, \varepsilon_t \varepsilon_{t-j} = 0$ for all $j \neq 0$. The sum $\sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$ is known as the fundamental moving average (MA) representation of $x_t$ and is unique.
Pf. Since $H_t(x) = H_{t-1}(x) \oplus G_t$, by construction $G_t$ is a Hilbert space of dimension at most 1; the space is spanned by $\varepsilon_t$, which is a scalar random variable. If the dimension were zero (which would mean that $\mathrm{var}(\varepsilon_t) = 0$), then $H_t(x) = H_{t-1}(x)$, which contradicts the assumption that the process is purely indeterministic. Since the process is indeterministic, one may find an element $\varepsilon_t$ in $G_t$ such that the projection of $x_t$ onto $G_t$ is $\varepsilon_t$; this is nothing more than a choice of axis for the one-dimensional space. For the spaces $G_{t-j}$ ($j > 0$), one can find an element in each of them, denoted $\varepsilon_{t-j}$, whose variance equals that of $\varepsilon_t$; each $\varepsilon_{t-j}$ spans its respective space. Since $x_t$ is purely indeterministic, $H_t(x) = G_t \oplus G_{t-1} \oplus \cdots$. Letting $\mathrm{proj}\left(x_t \mid G_{t-j}\right)$ denote the projection of $x_t$ onto $G_{t-j}$, by the Hilbert space projection theorem

$$x_t = \sum_{j=0}^{\infty} \mathrm{proj}\left(x_t \mid G_{t-j}\right) = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} \tag{5.11}$$

where the second equality follows from the definition of the $\varepsilon_t$'s. This verifies the theorem, except for uniqueness.
To prove uniqueness, suppose that there existed another MA representation $x_t = \sum_{j=0}^{\infty} \tilde{\psi}_j \varepsilon_{t-j}$. For this to be the case, the variance of $\sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} - \sum_{j=0}^{\infty} \tilde{\psi}_j \varepsilon_{t-j}$ must equal zero, since by assumption the two parts of the expression are the same. The variance of this expression equals $\sigma^2 \sum_{j=0}^{\infty} \left(\psi_j - \tilde{\psi}_j\right)^2$, which equals zero iff $\psi_j - \tilde{\psi}_j = 0 \;\; \forall j$.
An immediate implication of the second Wold theorem is that the fundamental moving average coefficients are square summable, i.e. $\sum_{j=0}^{\infty} \psi_j^2 < \infty$. This follows from the fact that

$$\mathrm{var}(x_t) = \mathrm{var}\left(\sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}\right) = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2$$

and the assumption that $\sigma_x^2 < \infty$.
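A small simulation of this variance identity, under the illustrative assumptions $\psi_j = 0.9^j$ (square summable) and $\sigma^2 = 1$, with the infinite sum truncated at a long lag:

```python
import numpy as np

# Check var(x_t) = sigma^2 * sum_j psi_j^2 for a truncated MA(infinity).
rng = np.random.default_rng(2)
n, trunc = 200_000, 200
psi = 0.9 ** np.arange(trunc)          # square-summable MA coefficients
eps = rng.standard_normal(n + trunc)   # fundamental innovations, sigma = 1

# x_t = sum_{j < trunc} psi_j * eps_{t-j}, computed by convolution.
x = np.convolve(eps, psi, mode="valid")[:n]

print(x.var())              # sample variance of x_t
print((psi ** 2).sum())     # sum psi_j^2 = 1/(1 - 0.81), about 5.26
```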
What is meant by the term fundamental in the description of the moving average representation? It is a way of designating a particular orthogonal basis for $H_t(x)$. There exist an uncountable infinity of different orthogonal bases for $H_t(x)$, just as there are for spaces such as $R^k$. As far as I know, the term is taken from Rozanov (1967); it was popularized by Christopher Sims. For a purely indeterministic process, there is an equivalence between the Hilbert space generated around the stochastic process $x_t$ and the Hilbert space generated around the fundamental innovations $\varepsilon_t$.
Corollary 5.1. Equivalence between the Hilbert space of a purely indeterministic time series and its associated fundamental innovations.
Assume $x_t$ is zero-mean, weakly stationary, and purely indeterministic. Let $H_t(\varepsilon)$ denote the Hilbert space generated by $\varepsilon_t, \varepsilon_{t-1}, \ldots$, the fundamental moving average errors. Then $H_t(\varepsilon) = H_t(x)$.
Pf. This is left as an exercise; the proof amounts to showing that each of the two Hilbert spaces is a subset of the other.
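To make the corollary concrete, consider an invertible MA(1) (an illustrative special case, not the general argument): the fundamental innovations can be recovered exactly from current and past $x$'s by the recursion $\varepsilon_t = x_t - \theta \varepsilon_{t-1}$, which is the substance of $H_t(\varepsilon) \subseteq H_t(x)$; the MA representation itself gives the reverse inclusion. A minimal sketch:

```python
import numpy as np

# x_t = eps_t + theta * eps_{t-1} with |theta| < 1 (invertible, illustrative).
rng = np.random.default_rng(3)
theta, n = 0.5, 1_000
eps = rng.standard_normal(n)
x = eps.copy()
x[1:] += theta * eps[:-1]              # x_0 = eps_0 here (no pre-sample shock)

# Recover the innovations from the x's alone: eps_t = x_t - theta * eps_{t-1}.
eps_hat = np.zeros(n)
eps_hat[0] = x[0]
for t in range(1, n):
    eps_hat[t] = x[t] - theta * eps_hat[t - 1]

print(np.max(np.abs(eps_hat - eps)))   # 0: each eps_t lies in H_t(x)
```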
3. Prediction
We now consider the question of how to optimally predict a time series given its history. Let $\hat{x}_{t|t-j}$ denote the projection of $x_t$ onto $H_{t-j}(x)$. This projection is important in that it is also the solution to the linear prediction problem for $x_t$ relative to the information set $H_{t-j}(x)$.
Theorem 5.3. Optimal linear predictor

The projection $\hat{x}_{t+k|t}$ is the solution to

$$\min_{\phi \in H_t(x)} E\left(x_{t+k} - \phi\right)^2$$

Pf. Let $\phi$ solve the minimization problem. The prediction error equals $x_{t+k} - \phi = \left(x_{t+k} - \hat{x}_{t+k|t}\right) + \left(\hat{x}_{t+k|t} - \phi\right)$. The variance of this term will equal $\sigma^2_{x_{t+k} - \hat{x}_{t+k|t}} + \sigma^2_{\hat{x}_{t+k|t} - \phi}$, since $x_{t+k} - \hat{x}_{t+k|t}$ is orthogonal to $\hat{x}_{t+k|t} - \phi$. This variance must exceed $\sigma^2_{x_{t+k} - \hat{x}_{t+k|t}}$ unless $\hat{x}_{t+k|t} - \phi$ is zero. Uniqueness of the projection then verifies the result.

This theorem implies that $\varepsilon_t$ is the forecast error associated with the optimal (in a minimum variance sense) linear forecast of $x_t$ given the information set $H_{t-1}(x)$. The term linear means that the forecast has to be an element of the space, and so is either a linear combination of $x_{t-1}, x_{t-2}, \ldots$ or the limit of such a sequence.
Hence, one can think of a time series as a weighted average of current and past forecast errors. This is intuitive, since these forecast errors reveal aspects of the process that are realized each time period. The optimal linear predictor theorem provides insight into the nature of information sets and associated predictions. By construction, $\hat{x}_{t|t-j} \in H_{t-j}(x)$ and $x_t - \hat{x}_{t|t-j} \in G_t \oplus G_{t-1} \oplus \cdots \oplus G_{t-j+1}$. The fundamental innovations $\varepsilon_t, \ldots, \varepsilon_{t-j+1}$ represent the part of $x_t$ that is revealed after the forecast. The fundamental innovations can be thought of as information increments between time periods. From the perspective of the Wold theorems, the moving average representation of a time series can thus be seen as a natural way of thinking about its underlying linear structure; much of time series analysis is based on this perspective, which is often referred to as the time domain approach. That said, nothing in these derivations rules out the presence of nonlinear structure in the stochastic process.

What is the structure of a $k$-step-ahead forecast? Since $x_{t+k} = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t+k-j}$, the equivalence of $H_t(x)$ and $H_t(\varepsilon)$ makes it immediate that $\hat{x}_{t+k|t}$ may be written as

$$\hat{x}_{t+k|t} = \sum_{j=0}^{\infty} \psi_{j+k} \varepsilon_{t-j} \tag{5.12}$$
The reason for this is simple: at time $t$, one can condition forecasts on $\varepsilon_t, \varepsilon_{t-1}, \ldots$, i.e. $H_t(\varepsilon)$. The component of $x_{t+k}$ that is determined by $\varepsilon_{t+k}, \ldots, \varepsilon_{t+1}$, i.e. $\sum_{j=0}^{k-1} \psi_j \varepsilon_{t+k-j}$, is orthogonal to $H_t(\varepsilon)$, and so its linear projection onto this space is 0. Put differently, at time $t$ one has no basis for constructing predictions of the future shocks.
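Equation (5.12) translates directly into code: the $k$-step-ahead forecast shifts the moving average coefficients forward by $k$ and weights the observed innovation history. The coefficients $\psi_j = 0.9^j$ and the truncation point in this sketch are illustrative assumptions.

```python
import numpy as np

def k_step_forecast(psi, eps_history, k):
    """x_hat_{t+k|t} = sum_{j>=0} psi_{j+k} * eps_{t-j}, as in (5.12).

    eps_history holds eps_t, eps_{t-1}, ... (most recent first)."""
    m = min(len(psi) - k, len(eps_history))
    if m <= 0:
        return 0.0
    return float(np.dot(psi[k:k + m], eps_history[:m]))

psi = 0.9 ** np.arange(200)               # illustrative truncated MA weights
rng = np.random.default_rng(4)
eps_history = rng.standard_normal(200)    # eps_t, eps_{t-1}, ...

for k in (1, 2, 5):
    print(k, k_step_forecast(psi, eps_history, k))
```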
To achieve a more parsimonious expression for (5.12), I introduce the annihilation operator.
Definition 5.1. Annihilation operator

Let $\alpha(L)$ be a polynomial lag operator of the form

$$\alpha(L) = \cdots + \alpha_{-2} L^{-2} + \alpha_{-1} L^{-1} + \alpha_0 + \alpha_1 L + \alpha_2 L^2 + \cdots \tag{5.13}$$

The annihilation operator, denoted $\left[\,\cdot\,\right]_+$, eliminates all lag terms with negative exponents, i.e.

$$\left[\alpha(L)\right]_+ = \alpha_0 + \alpha_1 L + \alpha_2 L^2 + \cdots \tag{5.14}$$
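In code, a lag polynomial can be represented as a map from exponents to coefficients, in which case the annihilation operator simply discards the negative-exponent terms. A minimal sketch (the dictionary representation is my own convention, not standard notation):

```python
# Represent alpha(L) = sum_j alpha_j L^j as a dict {exponent: coefficient}.
def annihilate(poly):
    """[alpha(L)]_+ : keep only the terms with nonnegative exponents."""
    return {j: a for j, a in poly.items() if j >= 0}

# Example: alpha(L) = 2 L^-2 + 1 L^-1 + 3 + 4 L + 5 L^2.
alpha = {-2: 2.0, -1: 1.0, 0: 3.0, 1: 4.0, 2: 5.0}
print(annihilate(alpha))    # {0: 3.0, 1: 4.0, 2: 5.0}
```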
Using the annihilation operator, one can re-express the optimal predictor as

$$\hat{x}_{t+k|t} = \left[\frac{\psi(L)}{L^k}\right]_+ \varepsilon_t \tag{5.15}$$

When the MA polynomial is invertible, $\psi(L)^{-1} x_t = \varepsilon_t$, so that one can explicitly express the predictor as a weighted average of current and lagged $x_t$'s; (5.15) implies

$$\hat{x}_{t+k|t} = \left[\frac{\psi(L)}{L^k}\right]_+ \psi(L)^{-1} x_t \tag{5.16}$$

This formula expresses forecasts as weighted averages of observables. Equations (5.15) and (5.16) are known as the Wiener-Kolmogorov prediction formulas, named after their independent discoverers Norbert Wiener and Andrei Kolmogorov.
Example 5.1. MA(1) process

Let $x_t = \varepsilon_t + \theta \varepsilon_{t-1}$. Then the Wiener-Kolmogorov formula for the optimal linear prediction of the process is

$$\hat{x}_{t+k|t} = \left[\frac{1 + \theta L}{L^k}\right]_+ (1 + \theta L)^{-1} x_t \tag{5.17}$$

For $k = 1$ this gives

$$\hat{x}_{t+1|t} = \theta (1 + \theta L)^{-1} x_t \tag{5.18}$$

while (5.17) equals 0 for $k > 1$, as one would expect since the process is unaffected by shocks more than one period in the past.
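A numerical check of this example, with the illustrative value $\theta = 0.5$: expanding $(1 + \theta L)^{-1}$ gives the one-step forecast $\theta \sum_j (-\theta)^j x_{t-j}$, which should reproduce $\theta \varepsilon_t$.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n = 0.5, 500
eps = rng.standard_normal(n)
x = eps.copy()
x[1:] += theta * eps[:-1]        # MA(1) with x_0 = eps_0 (no pre-sample shock)

t = n - 1                        # forecast origin
# theta * (1 + theta L)^{-1} x_t = theta * sum_j (-theta)**j * x_{t-j}
weights = theta * (-theta) ** np.arange(t + 1)
forecast = weights @ x[t::-1]    # x_t, x_{t-1}, ..., x_0

print(forecast, theta * eps[t])  # identical here, since the sum telescopes
```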
Example 5.2. AR(1) process

If $x_t = \rho x_{t-1} + \varepsilon_t$, then the Wiener-Kolmogorov formula for the optimal linear projection is

$$\hat{x}_{t+k|t} = \left[\sum_{i=0}^{\infty} \rho^{i+k} L^i\right] (1 - \rho L)\, x_t = \rho^k x_t \tag{5.19}$$

Forecasts of an AR(1) process over a fixed horizon thus possess a very simple structure: $\hat{x}_{t+k+1|t} = \rho\, \hat{x}_{t+k|t}$.
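A numerical illustration with the illustrative value $\rho = 0.8$: the $k$-step forecast is $\rho^k x_t$, and the 3-step forecast error $\varepsilon_{t+3} + \rho \varepsilon_{t+2} + \rho^2 \varepsilon_{t+1}$ has mean zero and variance $1 + \rho^2 + \rho^4$.

```python
import numpy as np

# AR(1): x_t = rho * x_{t-1} + eps_t.
rng = np.random.default_rng(6)
rho, n = 0.8, 100_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]

# k-step forecasts from origin t: x_hat_{t+k|t} = rho**k * x[t],
# equivalently x_hat_{t+k+1|t} = rho * x_hat_{t+k|t}.
t = 50_000
print([rho**k * x[t] for k in range(1, 6)])

# The 3-step forecast error eps_{t+3} + rho*eps_{t+2} + rho^2*eps_{t+1}
# should have mean 0 and variance 1 + rho**2 + rho**4 (about 2.05).
err = x[3:] - rho**3 * x[:-3]
print(err.mean(), err.var())
```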
References

Ash, R. and M. Gardner (1975), Topics in Stochastic Processes. New York: Academic Press.

Rozanov, Y. (1967), Stationary Time Series. San Francisco: Holden-Day.

Wold, H. (1948), "On Prediction in Stationary Time Series," Annals of Mathematical Statistics, 19(4), 558-567.